Provided by: biobambam2_2.0.185+ds-1_amd64 bug

NAME

       fastqtobam - convert FastQ to unmapped BAM

SYNOPSIS

       fastqtobam [options]

DESCRIPTION

       fastqtobam  reads  one or two FastQ files and converts them to a BAM file in which each read is marked as
       unmapped. If no input file name is given, then a single FastQ file is read from standard  input.  If  one
       file name is given, then a single FastQ file is read from the given file. In both cases the read names in
       the  file are parsed to determine whether the contained reads are paired or not if the name scheme is not
       set to pairedfiles.  If two file names are given, then the program assumes to find two FastQ files  which
       are  synchronous, i.e. where the first read in the first file is the mate of the first read in the second
       file etc. Input file names can be given either via the I key or after the key=value pairs on the  command
       line. The program accepts read name formats as described below under the key namescheme.

       The following key=value pairs can be given:

       verbose=<[0|1]> print progress report. By default progress is not reported.

       I=<filename>:  input  file  name (data is read from standard input if this option is not given). This key
       can be given twice.

       level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If libmaus has been compiled with support for igzip (see https://software.intel.com/en-us/articles/igzip-
       a-high-performance-deflate-compressor-with-optimizations-for-genomic-data) then an additional valid value
       is

       11:    igzip compression

       md5=<0|1>: md5 checksum creation for output file. Valid values are

       0:     do not compute checksum. This is the default.

       1:     compute checksum. If the md5filename key is set, then the checksum is written to the  given  file.
              If md5filename is unset, then no checksum will be computed.

       md5filename file name for md5 checksum if md5=1.

       gz=<[0|1]> input is gzip compressed FastQ. By default input is assumed to be uncompressed FastQ.

       threads=<1> additional BAM encoding helper threads.

       PGID=<>  read group identifier for reads. By default no read group identifier is set.  The fields CN, DS,
       DT, FO, KS, LB, PG, PI, PL, PU and SM of the corresponding @RG header line can be set by using  the  keys
       RGCN, RGDS, etc.  respectively.

       qualityoffset=<33> FastQ quality offset. This value is subtracted from the ASCII character representation
       to get the quality score value.

       qualitymax=<41> maximum valid quality value, 41 by default. Higher values may indicate a wrong setting of
       the qualityoffset parameter. BAM allows quality values up to the value of 94.

       qualityhist=<0>  compute  a quality histogram and print it on the standard error channel after processing
       has finished successfully. Lines for the  quality  histogram  are  prefixed  with  [H]  and  contain  tab
       separated values. The histogram enumerates quality scores from high to low values. The histogram has four
       columns (after the [H] marker). The first is the ASCII representation of the quality with offset 33, i.e.
       the  symbol  !  denotes quality 0. The second column gives the absolute frequency of the value. The third
       column stores the relative frequency of the value, i.e. the fraction  of  all  values  assigned  to  this
       value.   The  fourth  column gives a cumulative relative frequency value over all quality for the current
       line and those for higher quality values.

       checkquality=<1> check whether quality values  are  in  range  and  terminate  if  an  invalid  value  is
       encountered.

       namescheme=<generic> read name scheme. This determines how read names are parsed. There are four possible
       options:

       generic:
              the  first  sequence of non whitespace characters is extracted from the @ line of the FastQ record
              and the rest of the @ line is discarded. If the retained name ends in /1 or /2, then the  read  is
              part  of a read pair, otherwise it is the single read for the template. For a pair the part of the
              name before the /1 or /2 is considered  the  template  name.  For  a  single  the  whole  name  is
              considered the name of the template.

       c18s:  The  name  is  expected  to consist of two sequences of non white-space characters where the first
              contains seven colon separated fields and the second four colon separated fields. The first of the
              two is considered to be the name of the template. It is assumed that this read is  the  only  read
              for the template.

       c18pe: As  for c18s, the name is expected to consist of two sequences of non white-space characters where
              the first contains seven colon separated fields and the second four colon  separated  fields.  The
              first of the two is considered to be the name of the template. The read is assumed to be part of a
              read pair. The first field in the second non-whitespace sequence of the @ line designates, whether
              it  is  the  first  or  second of the pair depending on whether the field stores the number 1 or 2
              respectively.

       pairedfiles:
              The input framgents are assumed to be paired. If there is a single input file then the  pairs  are
              expected  consecutive in the file. If there are two input files then the read names in the two are
              expected to be synchronous.  All characters in read names beginning from  the  first  white  space
              character  are  discarded.  If  the  two  (so  reduced)  read  names  in question end on /1 and /2
              respectively, then those suffixes will be clipped off also. The remaining read names  are  checked
              for equality. If they are not equal, then the program will reject the input and terminate.

       chksumfn=<>  File  name used for storing bamseqchksum like information about the output file.  By default
       no such file is produced.

       hash=<crc32prod> Hash used for producing bamseqchksum type information. The information produced is  only
       stored if the chksumfn option is set.

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright  ©  2009-2014  German  Tischler,  © 2011-2014 Genome Research Limited.  License GPLv3+: GNU GPL
       version 3 <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to  the  extent
       permitted by law.

BIOBAMBAM                                           July 2013                                      FASTQTOBAM(1)