Provided by: biobambam2_2.0.185+ds-1_amd64

NAME

       bamadapterfind - find adapter contamination in sequencing reads

SYNOPSIS

       bamadapterfind [options]

DESCRIPTION

       bamadapterfind scans a BAM file for contamination by sequencing adapters. It uses two separate methods
       for this detection:

       list:  each read is matched against a predefined list of adapter sequences. A sequence is considered a
              match if all of the following hold: the overlap is at least adpmatchminscore bases long, the
              overlap covers at least a fraction adpmatchminfrac of the adapter's length, and the indel-free
              local alignment between the adapter and the read covers at least a fraction adpmatchminpfrac of
              the length of the possible overlap between the two. If such a match is found, the auxiliary
              field as is filled with the length of the match, af with the fraction of the adapter sequence
              matched and aa with the name of the matched adapter sequence.

       overlap:
              the two mates need to have a match similar to the following two lines

                  s0s1s2s3s4s5s6s7s8s9s10s11s12s13s14s15s16t0t1t2t3
          x3x2x1x0s0s1s2s3s4s5s6s7s8s9s10s11s12s13s14s15s16

              where an infix s0s1s2... of the first read matches a suffix  of  the  reverse  complement  of  the
              second  read.  In  this case it is likely that the first read has been sequenced beyond the end of
              the payload sequence and into the attached adapter. This overlap needs to be at least  MIN_OVERLAP
              bases  long to be considered. If such an overlap is found, then the adjacent sequences are checked
              for a match, where in the example x3x2x1x0 needs to be the reverse  complement  of  t0t1t2t3.  The
              adjacent sequences are checked up to a limit of ADAPTER_MATCH base pairs. If such a match is found
              then  the  auxiliary  field  ah  is  set  to 1 and a3 is used to store the length of the suspected
              adapter sequence.
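
       As a rough illustration of the two tests described above, the following Python sketch applies the
       three list thresholds to precomputed lengths and runs a simplified overlap check. It is not the
       program's implementation: the function names and the way lengths are passed in are assumptions made
       here, and the overlap check uses exact matching only, ignoring SEED_LENGTH, PCT_MISMATCH and
       MAX_SEED_MISMATCHES.

```python
# Illustrative sketch only; names and structure are assumptions, not biobambam code.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def list_match_passes(overlap_len, adapter_len, aligned_len, possible_overlap,
                      adpmatchminscore=16, adpmatchminfrac=0.75, adpmatchminpfrac=0.8):
    """Apply the three list-method thresholds to lengths computed elsewhere."""
    return (overlap_len >= adpmatchminscore
            and overlap_len >= adpmatchminfrac * adapter_len
            and aligned_len >= adpmatchminpfrac * possible_overlap)


def overlap_adapter_length(read1, read2, min_overlap=32, adapter_match=12):
    """Return the suspected adapter length in read1, or None if no adapter is found."""
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    # Try the longest candidate overlap first, down to the minimum length.
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        # As in the diagram: the start of read1 must equal the end of rc2.
        if read1[:k] != rc2[-k:]:
            continue
        t = read1[k:]             # t0t1t2...: bases of read1 beyond the overlap
        x = rc2[:len(rc2) - k]    # ...x2x1x0: bases of rc2 before the overlap
        n = min(len(t), len(x), adapter_match)
        # The flanks next to the overlap must be reverse complements of each other.
        if n > 0 and reverse_complement(t[:n]) == x[-n:]:
            return len(read1) - k
    return None
```

       In this sketch the value returned by overlap_adapter_length corresponds to what the page describes as
       the content of the a3 field; a real implementation would also tolerate mismatches in the overlap.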

       The following key=value pairs can be given at the program start:

       level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If libmaus has been compiled with support for igzip (see
       https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data)
       then an additional valid value is

       11:    igzip compression

       verbose=<1>: controls progress reporting. Valid values are

       1:     print progress report on standard error

       0:     do not print progress report

       mod=<1048576>: if verbose=1 then this sets the frequency of progress reports, i.e. a report is given  for
       each mod'th input read/alignment

       adaptersbam=<>: file name of the BAM file containing the list of adapters used for the adapter matching
       described above under list. The program contains an internal list which is used if this key is not given.

       SEED_LENGTH=<12>: length of the seed used for detecting overlaps in overlap based matching  (see  overlap
       above, default value is 12 base pairs).

       PCT_MISMATCH=<10>: percentage of mismatches allowed for overlap matching. This only includes the overlap,
       not the suspected attached adapter sequence. The default value is 10.

       MAX_SEED_MISMATCHES=<SEED_LENGTH*PCT_MISMATCH>:  maximum  number  of  mismatches  allowed in the seed. By
       default this value is computed as SEED_LENGTH*PCT_MISMATCH.

       MIN_OVERLAP=<32>: minimum length of overlap for overlap matching in base pairs (see above).  The  default
       value is 32.

       ADAPTER_MATCH=<12>:  maximum  number  of  base  pairs  to  check  for  matching adapters in overlap based
       matching. The default value is 12.

       adpmatchminscore=<16>: minimum score for list based adapter matching (see above, default value is 16)

       adpmatchminfrac=<0.75>: minimum fraction of adapter sequence which needs to match (see above, default
       value is 0.75=75%)

       adpmatchminpfrac=<0.8>: minimum fraction of overlap for adapter list matching (see above, default value
       is 0.8=80%)

       clip=<0>: clip the adapters off and move the corresponding sequence part to the qs auxiliary field and
       the corresponding quality string part to the qq auxiliary field

       reflen=<3000000000>: length of reference sequence/genome

       pA=<0.25>: relative frequency of base A in reference sequence/genome

       pC=<0.25>: relative frequency of base C in reference sequence/genome

       pG=<0.25>: relative frequency of base G in reference sequence/genome

       pT=<0.25>: relative frequency of base T in reference sequence/genome
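
       Because all options are plain key=value pairs, a command line can be assembled mechanically. The
       Python helper below is a hypothetical convenience (build_command and KNOWN_KEYS are names invented
       here, not part of biobambam); it only checks keys against those listed on this page and builds the
       argument list. How the input and output BAM files are supplied is not described on this page and is
       therefore left out.

```python
import shlex

# Option keys taken from the list above; the set name is an assumption of this sketch.
KNOWN_KEYS = {
    "level", "verbose", "mod", "adaptersbam",
    "SEED_LENGTH", "PCT_MISMATCH", "MAX_SEED_MISMATCHES", "MIN_OVERLAP", "ADAPTER_MATCH",
    "adpmatchminscore", "adpmatchminfrac", "adpmatchminpfrac",
    "clip", "reflen", "pA", "pC", "pG", "pT",
}


def build_command(**options):
    """Return an argument list of key=value pairs for bamadapterfind."""
    unknown = sorted(set(options) - KNOWN_KEYS)
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(unknown))
    return ["bamadapterfind"] + [f"{key}={value}" for key, value in options.items()]


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(build_command(level=1, verbose=0, MIN_OVERLAP=32, clip=1)))
```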

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright  ©  2009-2013  German  Tischler,  © 2011-2013 Genome Research Limited.  License GPLv3+: GNU GPL
       version 3 <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to  the  extent
       permitted by law.

BIOBAMBAM                                           July 2013                                  BAMADAPTERFIND(1)