Ubuntu Manpage: bamstreamingmarkduplicates

Provided by: biobambam2_2.0.185+ds-1_amd64

NAME

       bamstreamingmarkduplicates - mark duplicate reads

SYNOPSIS

       bamstreamingmarkduplicates [options]

DESCRIPTION

       bamstreamingmarkduplicates reads a coordinate sorted BAM, SAM or CRAM file that has previously been
       processed by bamsort with the options fixmates=1 and adddupmarksupport=1, marks duplicate read pairs and
       reads, and writes the resulting file in BAM, SAM or CRAM format. Preprocessing the file with bamsort
       using the stated options is mandatory, i.e. bamstreamingmarkduplicates will fail without it. In
       contrast to bammarkduplicates and bammarkduplicates2, the streaming variant bamstreamingmarkduplicates
       processes the file in a single pass. bamstreamingmarkduplicates cannot handle files containing orphan
       pair ends (pairs where one of the two ends is missing from the file).
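
       The mandatory preprocessing step and the single-pass marking can be chained in one pipeline. The
       following is a minimal sketch rather than a definitive recipe: the function name and file names are
       placeholders, and it assumes that bamsort reads standard input, writes standard output and sorts by
       coordinate when no other keys are given.

```shell
#!/bin/sh
# Sketch: sort with the two mandatory bamsort options, then mark
# duplicates in a single streaming pass. All file names are placeholders.
markdup_pipeline() {
    in_bam=$1    # input BAM, not yet prepared for streaming duplicate marking
    out_bam=$2   # duplicate-marked output BAM
    metrics=$3   # metrics file (M key); standard error is used if M is unset

    # Assumption: bamsort uses stdin/stdout and coordinate order by default.
    bamsort fixmates=1 adddupmarksupport=1 < "$in_bam" |
        bamstreamingmarkduplicates M="$metrics" > "$out_bam"
}

# Usage (placeholder file names):
#   markdup_pipeline reads.bam reads.markdup.bam reads.metrics.txt
```

       Because both tools stream, no intermediate sorted file has to be kept on disk in this arrangement.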

       The following key=value pairs can be given:

       M=<>: file name for metrics data. By default the metrics data is written on the standard error channel.

       level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If libmaus has been compiled with support for igzip (see
       https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data)
       then an additional valid value is

       11:    igzip compression

       verbose=<1>: sets whether a progress report is printed. Valid values are

       1:     print progress report on standard error

       0:     do not print progress report

       tmpfile=<filename>: set the prefix for temporary file names

       disablevalidation=<0|1>: sets whether input validation is performed. Valid values are

       0:     validation is enabled (default)

       1:     validation is disabled

       md5=<0|1>: md5 checksum creation for output file. This option can only be given if outputformat=bam. Then
       valid values are

       0:     do not compute checksum. This is the default.

       1:     compute checksum. If the md5filename key is set, then the checksum is written to the  given  file.
              If md5filename is unset, then no checksum will be computed.

       md5filename=<filename>: file name for md5 checksum if md5=1.

       index=<0|1>:  compute  BAM index for output file. This option can only be given if outputformat=bam. Then
       valid values are

       0:     do not compute BAM index. This is the default.

       1:     compute BAM index. If the indexfilename key is set, then the BAM index is  written  to  the  given
              file. If indexfilename is unset, then no BAM index will be computed.

       indexfilename=<filename>: file name for output BAM index if index=1.
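
       Since neither a checksum nor an index is produced unless the corresponding file name key is also set,
       md5 and index are best given together with md5filename and indexfilename. A minimal sketch, assuming
       BAM output; the function name and the .md5 and .bai suffixes are illustrative choices, not something
       the program requires:

```shell
#!/bin/sh
# Sketch: write the marked BAM together with an md5 checksum and a BAM index.
# md5=1 and index=1 have no effect unless the file name keys are set as well.
markdup_with_checksum_and_index() {
    in_bam=$1    # input prepared by bamsort (fixmates=1 adddupmarksupport=1)
    out_bam=$2   # duplicate-marked output BAM

    bamstreamingmarkduplicates \
        I="$in_bam" O="$out_bam" \
        outputformat=bam \
        md5=1 md5filename="$out_bam.md5" \
        index=1 indexfilename="$out_bam.bai"
}

# Usage (placeholder file names):
#   markdup_with_checksum_and_index sorted.bam marked.bam
```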

       inputformat=<bam>:  input  file format.  All versions of bamstreamingmarkduplicates come with support for
       the BAM input format. If the program in addition is linked to the  io_lib  package,  then  the  following
       options are valid:

       bam:   BAM (see http://samtools.sourceforge.net/SAM1.pdf)

       sam:   SAM (see http://samtools.sourceforge.net/SAM1.pdf)

       cram:  CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)

       outputformat=<bam>: output file format.  All versions of bamstreamingmarkduplicates come with support for
       the  BAM  output  format.  If the program in addition is linked to the io_lib package, then the following
       options are valid:

       bam:   BAM (see http://samtools.sourceforge.net/SAM1.pdf)

       sam:   SAM (see http://samtools.sourceforge.net/SAM1.pdf)

       cram:  CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit). This format  is  not  advisable  for  data
              sorted by query name.

       I=<[stdin]>: input filename, standard input if unset.

       O=<[stdout]>: output filename, standard output if unset.

       inputthreads=<[1]>: input helper threads, only valid for inputformat=bam.

       outputthreads=<[1]>: output helper threads, only valid for outputformat=bam.

       reference=<[]>:  reference FastA file for inputformat=cram and outputformat=cram. An index file (.fai) is
       required.
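
       For CRAM input and output the reference and its index must be in place before the program starts. The
       sketch below checks for the index first. It assumes a build linked against io_lib and assumes the
       index sits next to the FastA file under the usual name <reference>.fai; the function name and file
       names are placeholders.

```shell
#!/bin/sh
# Sketch: CRAM in, CRAM out, with a check for the FastA index beforehand.
markdup_cram() {
    in_cram=$1   # input CRAM prepared by bamsort
    out_cram=$2  # duplicate-marked output CRAM
    ref=$3       # reference FastA file

    # Assumption: the index is stored as "<reference>.fai".
    if [ ! -f "$ref.fai" ]; then
        echo "missing FastA index: $ref.fai" >&2
        return 1
    fi

    bamstreamingmarkduplicates \
        inputformat=cram outputformat=cram reference="$ref" \
        I="$in_cram" O="$out_cram"
}

# Usage (placeholder file names):
#   markdup_cram sorted.cram marked.cram genome.fa
```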

       tag=<tag>: name of auxiliary field storing tag information in string form. Read fragments or pairs with
       different tags will not be considered duplicates, even if they would be according to their mapping
       coordinates. For pairs, the tag field information of the first and second mate is concatenated to obtain
       the tag of the pair.

       nucltag=<tag>: this option works like the tag option but is restricted to sequences of nucleotides (A, C,
       G or T) as tags. The length of each tag sequence must not exceed 15 bases, and all tags are required to
       have the same length. Each non-nucleotide symbol is mapped to A. Compared to the tag option, nucltag
       uses less memory for processing and can be expected to be faster.
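
       The two per-tag rules stated for nucltag (at most 15 bases, non-nucleotide symbols mapped to A) can be
       illustrated outside the program. The helper below is a hypothetical sketch for inspecting tag values,
       not the program's own code; how lower-case letters are treated internally is not stated here, so the
       sketch simply treats anything other than upper-case A, C, G and T as a non-nucleotide symbol.

```shell
#!/bin/sh
# Sketch: apply the documented nucltag rules to a single tag value.
# Hypothetical helper for illustration; it does not call biobambam.
normalize_nucltag() {
    tag=$1

    # Rule 1: a tag sequence must not exceed 15 bases.
    if [ "${#tag}" -gt 15 ]; then
        echo "tag longer than 15 bases: $tag" >&2
        return 1
    fi

    # Rule 2: every symbol other than A, C, G or T is mapped to A.
    printf '%s\n' "$tag" | sed 's/[^ACGT]/A/g'
}

# Usage:
#   normalize_nucltag ACNGT    # prints ACAGT
```

       The requirement that all tags share the same length concerns the whole file and is therefore not
       checked by this single-value helper.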

       filterdupmarktags=<[0]>: remove the auxiliary fields MC, MQ, MS, and  MT  used  for  streaming  duplicate
       marking when producing the output file. By default the fields are not removed.

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright  ©  2009-2014  German  Tischler,  © 2011-2014 Genome Research Limited.  License GPLv3+: GNU GPL
       version 3 <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to  the  extent
       permitted by law.

BIOBAMBAM                                          August 2014                     BAMSTREAMINGMARKDUPLICATES(1)