Ubuntu Manpage: fastqtobam - convert FastQ to unmapped BAM

Provided by: biobambam2_2.0.185+ds-1_amd64

NAME

       fastqtobam - convert FastQ to unmapped BAM

SYNOPSIS

       fastqtobam [options]

DESCRIPTION

fastqtobam reads one or two FastQ files and converts them to a BAM file in which each read is marked as
unmapped. If no input file name is given, then a single FastQ file is read from standard input. If one
file name is given, then a single FastQ file is read from the given file. In both cases, unless the name
scheme is set to pairedfiles, the read names in the file are parsed to determine whether the contained
reads are paired. If two file names are given, then the program expects two synchronous FastQ files,
i.e. files in which the first read in the first file is the mate of the first read in the second file,
and so on. Input file names can be given either via the I key or after the key=value pairs on the command
line. The program accepts read name formats as described below under the key namescheme.
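
The notion of two synchronous input files can be illustrated with a short Python sketch. It is not
the program's own implementation. It assumes plain four-line FastQ records, and the helper names
read_fastq and paired_records are hypothetical. Raising an error when the files differ in length
is likewise an assumption made for the sketch.

```python
import io
from itertools import zip_longest

def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FastQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of input (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored in this sketch
        qual = handle.readline().rstrip("\n")
        # keep only the first run of non-whitespace characters after '@'
        yield header[1:].split()[0], seq, qual

def paired_records(first, second):
    """Pair the i-th record of `first` with the i-th record of `second`."""
    for a, b in zip_longest(read_fastq(first), read_fastq(second)):
        if a is None or b is None:
            raise ValueError("input files contain different numbers of reads")
        yield a, b

# Two tiny in-memory stand-ins for the two input files
file1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTTT\n+\nIIII\n")
file2 = io.StringIO("@r1/2\nTGCA\n+\nIIII\n@r2/2\nAAAA\n+\nIIII\n")
pairs = list(paired_records(file1, file2))
for mate1, mate2 in pairs:
    print(mate1[0], mate2[0])
```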

The following key=value pairs can be given:

verbose=<[0|1]> print progress report. By default progress is not reported.

I=<filename>: input file name (data is read from standard input if this option is not given). This key
can be given twice.

level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

-1: zlib/gzip default compression level

0: uncompressed

1: zlib/gzip level 1 (fast) compression

9: zlib/gzip level 9 (best) compression

If libmaus has been compiled with support for igzip (see
https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data)
then an additional valid value is

11: igzip compression
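
To get a feeling for what the zlib levels -1, 0, 1 and 9 mean, the following Python sketch compresses
the same buffer at each level using the standard zlib module. It only illustrates the levels
themselves. It does not produce BAM output, and the test data is made up.

```python
import zlib

# Highly repetitive stand-in for real sequence data (an assumption for this sketch)
data = b"ACGTACGTACGT" * 1000

# Compressed size for each zlib level named above
sizes = {level: len(zlib.compress(data, level)) for level in (-1, 0, 1, 9)}
for level, size in sizes.items():
    print(f"level={level}: {size} bytes (input {len(data)} bytes)")
```

Level 0 stores the data without compression, so its output is slightly larger than the input.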

md5=<0|1>: md5 checksum creation for output file. Valid values are

0: do not compute checksum. This is the default.

1: compute checksum. If the md5filename key is set, then the checksum is written to the given file.
If md5filename is unset, then no checksum will be computed.

md5filename=<filename>: file name for the md5 checksum if md5=1.
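
A checksum over an output stream can be computed incrementally while the data is written. The Python
sketch below shows the general technique with the standard hashlib module. The function name
md5_of_stream is hypothetical, and the sketch makes no claim about how fastqtobam computes its
checksum internally.

```python
import hashlib

def md5_of_stream(chunks):
    """Return the hex md5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)  # feed the data piece by piece, as it would be written
    return digest.hexdigest()

# Feeding the data in pieces gives the same digest as hashing it in one go
checksum = md5_of_stream([b"first block", b"second block"])
print(checksum)
```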

gz=<[0|1]> input is gzip compressed FastQ. By default input is assumed to be uncompressed FastQ.

threads=<1> additional BAM encoding helper threads.

RGID=<> read group identifier for reads. By default no read group identifier is set. The fields CN, DS,
DT, FO, KS, LB, PG, PI, PL, PU and SM of the corresponding @RG header line can be set by using the keys
RGCN, RGDS, etc. respectively.

qualityoffset=<33> FastQ quality offset. This value is subtracted from the ASCII character representation
to get the quality score value.
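
The subtraction described for qualityoffset can be written out as a minimal Python sketch. The
function name decode_qualities is hypothetical.

```python
def decode_qualities(quality_string, offset=33):
    """Turn a FastQ quality string into a list of integer quality scores."""
    return [ord(symbol) - offset for symbol in quality_string]

# With offset 33, '!' (ASCII 33) is 0, '5' (ASCII 53) is 20, 'I' (ASCII 73) is 40
scores = decode_qualities("!5I")
print(scores)  # prints [0, 20, 40]
```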

qualitymax=<41> maximum valid quality value, 41 by default. Higher values may indicate a wrong setting of
the qualityoffset parameter. BAM allows quality values up to the value of 93.

qualityhist=<0> compute a quality histogram and print it on the standard error channel after processing
has finished successfully. Lines of the quality histogram are prefixed with [H] and contain tab
separated values. The histogram enumerates quality scores from high to low values and has four
columns (after the [H] marker). The first is the ASCII representation of the quality with offset 33, i.e.
the symbol ! denotes quality 0. The second column gives the absolute frequency of the value. The third
column gives the relative frequency of the value, i.e. the fraction of all quality values that equal
this value. The fourth column gives the cumulative relative frequency, i.e. the sum of the relative
frequencies of the current line and of all lines for higher quality values.
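
The four histogram columns can be reproduced with a few lines of Python. This is a sketch of the
description above, not the program's code. The function name quality_histogram is hypothetical and
the exact number formatting of the real output is not specified here.

```python
from collections import Counter

def quality_histogram(scores, offset=33):
    """Return rows (symbol, count, relative, cumulative), highest quality first."""
    counts = Counter(scores)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for quality in sorted(counts, reverse=True):
        cumulative += counts[quality]
        rows.append((chr(quality + offset),      # ASCII symbol with offset 33
                     counts[quality],            # absolute frequency
                     counts[quality] / total,    # relative frequency
                     cumulative / total))        # cumulative relative frequency
    return rows

rows = quality_histogram([40, 40, 20, 0])
for symbol, count, relative, cumulative in rows:
    print(f"[H]\t{symbol}\t{count}\t{relative}\t{cumulative}")
```

The last row always has a cumulative value of 1, since it covers all quality values.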

checkquality=<1> check whether quality values are in range and terminate if an invalid value is
encountered.
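
A range check in the spirit of checkquality and qualitymax might look like the following Python
sketch. The function name check_qualities is hypothetical. Rejecting values below 0 is an
assumption, as is reporting the problem through an exception.

```python
def check_qualities(quality_string, offset=33, maximum=41):
    """Decode a quality string and raise ValueError on an out-of-range value."""
    scores = []
    for position, symbol in enumerate(quality_string):
        score = ord(symbol) - offset
        if not 0 <= score <= maximum:
            raise ValueError(
                f"quality {score} at position {position} is outside 0..{maximum}"
            )
        scores.append(score)
    return scores

# 'h' (ASCII 104) is valid for offset 64 but far too high for offset 33
print(check_qualities("h", offset=64))
try:
    check_qualities("h", offset=33)
except ValueError as error:
    print("rejected:", error)
```

A quality string written with offset 64 but read with offset 33 fails this check, which is the kind
of misconfiguration the qualitymax description warns about.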

namescheme=<generic> read name scheme. This determines how read names are parsed. There are four possible
options:

generic:
the first sequence of non-whitespace characters is extracted from the @ line of the FastQ record
and the rest of the @ line is discarded. If the retained name ends in /1 or /2, then the read is
part of a read pair, otherwise it is the single read for the template. For a pair the part of the
name before the /1 or /2 is considered the template name. For a single read the whole name is
considered the name of the template.

c18s: The name is expected to consist of two sequences of non-whitespace characters where the first
contains seven colon-separated fields and the second four colon-separated fields. The first of the
two is considered to be the name of the template. It is assumed that this read is the only read
for the template.

c18pe: As for c18s, the name is expected to consist of two sequences of non-whitespace characters where
the first contains seven colon-separated fields and the second four colon-separated fields. The
first of the two is considered to be the name of the template. The read is assumed to be part of a
read pair. The first field in the second non-whitespace sequence of the @ line designates whether
it is the first or second read of the pair, depending on whether the field stores the number 1 or 2
respectively.

pairedfiles:
The input fragments are assumed to be paired. If there is a single input file then the two reads of
each pair are expected to be consecutive in the file. If there are two input files then the read
names in the two files are expected to be synchronous. All characters in read names beginning from
the first whitespace character are discarded. If the two (so reduced) read names in question end in
/1 and /2 respectively, then those suffixes are clipped off as well. The remaining read names are
checked for equality. If they are not equal, then the program will reject the input and terminate.
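
The name schemes can be summarised in a Python sketch that follows the descriptions above. It is an
illustration only. All function names are hypothetical, the mate number 0 for single reads is a
convention of the sketch, and errors are reported through exceptions rather than by terminating.

```python
def first_token(header):
    """Return the first run of non-whitespace characters after the leading '@'."""
    return header[1:].split()[0]

def parse_generic(header):
    """generic: return (template name, mate), where mate 0 marks a single read."""
    name = first_token(header)
    if name.endswith("/1"):
        return name[:-2], 1
    if name.endswith("/2"):
        return name[:-2], 2
    return name, 0

def parse_c18pe(header):
    """c18pe: seven colon-separated fields, then four; the first of the four is the mate."""
    tokens = header[1:].split()
    if len(tokens) < 2:
        raise ValueError("expected two whitespace-separated parts")
    template, flags = tokens[0], tokens[1]
    if template.count(":") != 6 or flags.count(":") != 3:
        raise ValueError("unexpected number of colon-separated fields")
    mate = int(flags.split(":")[0])
    if mate not in (1, 2):
        raise ValueError("mate field must be 1 or 2")
    return template, mate

def check_pairedfiles(header_a, header_b):
    """pairedfiles: reduce both names, clip /1 and /2, and require equality."""
    a, b = first_token(header_a), first_token(header_b)
    if a.endswith("/1") and b.endswith("/2"):
        a, b = a[:-2], b[:-2]
    if a != b:
        raise ValueError(f"read names {a!r} and {b!r} do not match")
    return a

print(parse_generic("@read7/2 extra text"))
print(parse_c18pe("@EAS139:136:FC706VJ:2:2104:15343:197393 2:Y:18:ATCACG"))
print(check_pairedfiles("@r1/1 first", "@r1/2 second"))
```

For c18s the same field layout applies, but the read is treated as the only read of its template, so
the mate field is not needed to assign it to a pair.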

chksumfn=<> File name used for storing bamseqchksum-like information about the output file. By default
no such file is produced.

hash=<crc32prod> Hash used for producing bamseqchksum type information. The information produced is only
stored if the chksumfn option is set.

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright © 2009-2014 German Tischler, © 2011-2014 Genome Research Limited. License GPLv3+: GNU GPL
       version 3 <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent
       permitted by law.

BIOBAMBAM                                           July 2013                                      FASTQTOBAM(1)