
NAME

       bloomfilter.sh - Filters reads potentially sharing a kmer with a reference

SYNOPSIS

       bloomfilter.sh in=<input file> outu=<nonmatches> outm=<matches> ref=<reference>

DESCRIPTION

       Filters reads potentially sharing a kmer with a reference.  The more memory available, the higher the
       accuracy.  Reads going to outu are guaranteed not to match the reference, but reads going to outm may or
       may not match it.

EXAMPLES

       bloomfilter.sh in=reads.fq outu=nonhuman.fq outm=human.fq k=31 minhits=3 ref=human.fa

       Error correction and depth filtering can be done simultaneously.
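
       For paired reads in two files, a comparable call can be sketched as a small shell wrapper.  The
       function name filter_pairs and the output file names are illustrative assumptions; only the parameter
       names come from the option list below.

```shell
# Hypothetical wrapper: split a read pair into matched and unmatched files.
# $1 = read 1 file, $2 = read 2 file, $3 = reference (file names are placeholders)
filter_pairs() {
    bloomfilter.sh in="$1" in2="$2" ref="$3" \
        outm=matched_1.fq outm2=matched_2.fq \
        outu=unmatched_1.fq outu2=unmatched_2.fq
}

# Usage: filter_pairs reads_1.fq reads_2.fq human.fa
```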

OPTIONS

   File parameters:
       in=<file>
              Primary input, or read 1 input.

       in2=<file>
              Read 2 input if reads are in two files.

       outm=<file>
              (out) Primary matched read output.

       outm2=<file>
              (out2) Matched read 2 output if reads are in two files.

       outu=<file>
              Primary unmatched read output.

       outu2=<file>
              Unmatched read 2 output if reads are in two files.

       ref=<file>
              Reference sequence file, or a comma-delimited list.

              For depth-based filtering, set this to the same as the input.

       overwrite=t
              (ow) Set to false to force the program to abort rather than overwrite an existing file.
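
       Depth-based filtering, as mentioned under ref=, can be sketched by passing the same file as in= and
       ref=.  The helper name depth_filter, the output file names and the mincount value of 3 are assumptions
       chosen for illustration; mincount is described under the reference-matching parameters, and this
       sketch assumes it is the threshold that applies here.

```shell
# Hypothetical helper: route reads by how often their kmers recur in the input.
# $1 = reads file, used as both the input and the reference
depth_filter() {
    bloomfilter.sh in="$1" ref="$1" \
        outm=high_depth.fq outu=low_depth.fq \
        mincount=3
}

# Usage: depth_filter reads.fq
```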

   Hashing parameters:
       k=31   Kmer length.

       hashes=2
              Number of hashes per kmer.  Higher generally reduces false positives at the expense of speed.

       minprob=0.5
              Ignore  reference  kmers  with  probability  of being correct below this (affects fastq references
              only).

       memmult=1.0
              Fraction of free memory to use for Bloom filter.   1.0  should  generally  work;  if  the  program
              crashes with an out of memory error, set this lower.  Higher increases specificity.

       cells= Option  to set the number of cells manually.  By default this will be autoset to use all available
              memory.  The only reason to set this is to ensure deterministic output.

       seed=0 This will change the hash function used.
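
       Because cells= is otherwise autoset from available memory, a reproducible run would fix it
       explicitly.  The sketch below also passes seed= so the hash function is pinned.  The helper name
       filter_fixed and the file names are assumptions, and a suitable cell count depends on the machine, so
       it is left to the caller.

```shell
# Hypothetical helper: fixed filter size and seed for repeatable output.
# $1 = number of cells (caller-supplied), $2 = seed
filter_fixed() {
    bloomfilter.sh in=reads.fq ref=human.fa \
        outm=human.fq outu=nonhuman.fq \
        cells="$1" seed="$2"
}

# Usage: filter_fixed <cells> 0
```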

   Reference-matching parameters:
       minhits=3
              Number of consecutive kmer hits required for a read to be considered matched.

              Higher reduces false positives at the expense of sensitivity.

       mincount=1
              Minimum number of times a read kmer must occur in the reference to be considered a match.

       requireboth=f
              Require both reads in a pair to match the ref in order to go to outm.

              By default, pairs go to outm if either matches.
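
       A stricter configuration, trading sensitivity for fewer false positives, might combine a higher
       minhits with requireboth=t.  This is a sketch only: the helper name filter_strict and the file names
       are assumptions, and the right minhits value depends on the data.

```shell
# Hypothetical helper: send a pair to outm only if both reads match,
# each with at least $1 consecutive kmer hits.
filter_strict() {
    bloomfilter.sh in=reads_1.fq in2=reads_2.fq ref=human.fa \
        outm=human_1.fq outm2=human_2.fq \
        outu=other_1.fq outu2=other_2.fq \
        minhits="$1" requireboth=t
}

# Usage: filter_strict 5
```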

   Java parameters:
       -Xmx   This will set Java's memory usage, overriding autodetection.

              -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs. The max is typically  85%
              of physical memory.

       -eoom  This  flag  will  cause  the  process to exit if an out-of-memory exception occurs.  Requires Java
              8u92+.

       -da    Disable assertions.
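
       The Java flags are given on the same command line as the other parameters.  A minimal sketch,
       assuming placeholder file names and a hypothetical helper name filter_with_heap, that sets the heap
       size explicitly and exits on an out-of-memory error:

```shell
# Hypothetical helper: run with an explicit Java heap size.
# $1 = heap size in -Xmx notation, e.g. 20g or 200m
filter_with_heap() {
    bloomfilter.sh -Xmx"$1" -eoom \
        in=reads.fq ref=human.fa outm=human.fq outu=nonhuman.fq
}

# Usage: filter_with_heap 20g
```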

AUTHOR

       Written by Brian Bushnell

       Please contact Brian Bushnell at bbushnell@lbl.gov if you encounter any problems.

       This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
       of the program.

bloomfilter.sh 38.43                               March 2019                                  BLOOMFILTER.SH(1)