Ubuntu Manpage: samtools-fasta, samtools-fastq - converts a SAM/BAM/CRAM file to FASTA or FASTQ

Provided by: samtools_1.19.2-1build2_amd64

NAME

       samtools-fasta, samtools-fastq - converts a SAM/BAM/CRAM file to FASTA or FASTQ

SYNOPSIS

       samtools fastq [options] in.bam
       samtools fasta [options] in.bam

DESCRIPTION

Converts a SAM, BAM or CRAM file into either FASTQ or FASTA format, depending on the command invoked.
Output files are automatically compressed if their names have a .gz, .bgz, or .bgzf extension.
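
As an illustration of the extension rule, a minimal Python sketch follows. The helper name will_be_compressed is hypothetical, and case-sensitive suffix matching is an assumption of the sketch rather than documented behaviour.

```python
# Extensions that the text above says trigger automatic compression.
COMPRESSED_SUFFIXES = (".gz", ".bgz", ".bgzf")


def will_be_compressed(filename: str) -> bool:
    """Return True if the output name ends in one of the listed extensions."""
    # Assumption: the comparison is case-sensitive.
    return filename.endswith(COMPRESSED_SUFFIXES)
```

For example, will_be_compressed("paired1.fq.gz") is True, while will_be_compressed("paired1.fq") is False.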

Note that this command attempts to reverse the alignment process, so if the aligner took a single input
FASTQ record and produced multiple SAM records via supplementary and/or secondary alignments, then
converting back to FASTQ should produce the original single FASTA / FASTQ record. By default it will not
output records for supplementary and secondary alignments, but see the -F option for more details.

If the input contains read-pairs which are to be interleaved or written to separate files in the same
order, then the input should be first collated by name. Use samtools collate or samtools sort -n to
ensure this.
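
What "collated by name" means can be sketched in Python: every QNAME must occupy a single contiguous run of the input. The function name is hypothetical and the input is modelled as a plain list of read names; this is a check one might run on extracted names, not part of samtools.

```python
def is_name_collated(qnames):
    """Return True if every QNAME occupies one contiguous run of the input."""
    seen = set()
    previous = None
    for name in qnames:
        if name != previous:
            if name in seen:
                # The name reappears after other names intervened.
                return False
            seen.add(name)
            previous = name
    return True
```

A coordinate-sorted file typically fails this check, because the two reads of a pair are usually separated by reads from other templates.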

For each different QNAME, the input records are categorised according to the state of the READ1 and READ2
flag bits. The three categories used are:

1 : Only READ1 is set.

2 : Only READ2 is set.

0 : Either both READ1 and READ2 are set; or neither is set.

The exact meaning of these categories depends on the sequencing technology used. It is expected that
ordinary single and paired-end sequencing reads will be in categories 1 and 2 (in the case of paired-end
reads, one read of the pair will be in category 1, the other in category 2). Category 0 is essentially a
“catch-all” for reads that do not fit into a simple paired-end sequencing model.
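
The categorisation can be sketched as a small Python function. The bit values 0x40 (READ1) and 0x80 (READ2) come from the SAM specification; the function name is hypothetical.

```python
READ1 = 0x40  # SAM FLAG bit: first segment in the template
READ2 = 0x80  # SAM FLAG bit: last segment in the template


def category(flag: int) -> int:
    """Map a SAM FLAG value to category 1, 2 or 0 as described above."""
    read1 = bool(flag & READ1)
    read2 = bool(flag & READ2)
    if read1 and not read2:
        return 1
    if read2 and not read1:
        return 2
    return 0  # both bits set, or neither bit set
```

For instance, a FLAG of 99 (a common value for the first read of a properly paired template) falls in category 1, and 147 in category 2.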

For each category only one sequence will be written for a given QNAME. If more than one record is
available for a given QNAME and category, the first in input file order that has quality values will be
used. If none of the candidate records has quality values, then the first in input file order will be
used instead.
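
The selection rule can be sketched as follows. The Record type is a hypothetical stand-in for a SAM record, and the sketch assumes the SAM text convention that a QUAL field of "*" means no quality values are stored.

```python
from typing import NamedTuple, Optional, Sequence


class Record(NamedTuple):
    qname: str
    seq: str
    qual: str  # "*" means no quality values are stored (SAM text convention)


def pick_record(candidates: Sequence[Record]) -> Optional[Record]:
    """Pick one record from those sharing a QNAME and category.

    candidates must be in input file order.
    """
    for rec in candidates:
        if rec.qual != "*":
            return rec  # first record that has quality values
    # No candidate has qualities: fall back to the first in file order.
    return candidates[0] if candidates else None
```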

Sequences will be written to standard output unless one of the -1, -2, -o, or -0 options is used, in
which case sequences for that category will be written to the specified file. The same filename may be
specified with multiple options, in which case the sequences will be multiplexed in order of occurrence.

If a singleton file is specified using the -s option then only paired sequences will be output for
categories 1 and 2; paired meaning that for a given QNAME there are sequences for both category 1 and 2.
If there is a sequence for only one of categories 1 or 2 then it will be diverted into the specified
singletons file. This can be used to prepare fastq files for programs that cannot handle a mixture of
paired and singleton reads.

The -s option only affects category 1 and 2 records. The output for category 0 will be the same
irrespective of the use of this option.
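
The routing rules from the last three paragraphs can be summarised in a short Python sketch. The function name and the dictionary representation of the options are hypothetical; -o FILE is modelled by setting both "-1" and "-2" to the same name, following its description under OPTIONS.

```python
STDOUT = "<stdout>"  # placeholder label for standard output


def destination(cat: int, mate_present: bool, files: dict) -> str:
    """Return where a sequence of category `cat` would be written.

    files maps "-1", "-2", "-0" and "-s" to file names; a missing key means
    the option was not given. mate_present is True when the QNAME has
    sequences for both category 1 and category 2.
    """
    if cat == 0:
        # -s never affects category 0.
        return files.get("-0", STDOUT)
    if "-s" in files and not mate_present:
        # Only one of categories 1 and 2 is present: divert to singletons.
        return files["-s"]
    return files.get("-1" if cat == 1 else "-2", STDOUT)
```

With files = {"-1": "paired1.fq", "-2": "paired2.fq", "-s": "single.fq"}, a category 2 read without a category 1 mate goes to single.fq, and category 0 reads still go to standard output because -0 was not given.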

The sequence generated will be for the entire sequence recorded in the SAM record (and quality if
appropriate). This means if it has soft-clipped CIGAR records then the soft-clipped data will be in the
output FASTA/FASTQ. Hard-clipped data is, by definition, absent from the SAM record and hence will be
absent in any FASTA/FASTQ produced.
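
To see how many bases a record contributes, one can count the CIGAR operations that consume the query sequence. The sketch below relies on the SAM specification's rule that M, I, S, = and X consume query bases while H does not; the function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # operations whose bases are stored in SEQ


def stored_and_hard_clipped(cigar: str):
    """Return (bases stored in SEQ, bases removed by hard clipping)."""
    stored = 0
    hard = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op in QUERY_CONSUMING:
            stored += int(length)
        elif op == "H":
            hard += int(length)
    return stored, hard
```

A record with CIGAR 5S90M5S yields all 100 bases in the output, whereas 5H90M5H yields only 90 because the 10 hard-clipped bases are not stored in the record.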

The order of precedence of the filter options is -d, -f, -F, --rf and -G.

OPTIONS

-n By default, either '/1' or '/2' is added to the end of read names where the corresponding READ1
or READ2 FLAG bit is set. Using -n causes read names to be left as they are.

-N Always add either '/1' or '/2' to the end of read names even when put into different files.

-O Use quality values from OQ tags in preference to standard quality string if available.

-s FILE Write singleton reads to FILE.

-t Copy RG, BC and QT tags to the FASTQ header line, if they exist.

-T TAGLIST
Specify a comma-separated list of tags to copy to the FASTQ header line, if they exist. TAGLIST
can be blank or * to indicate all tags should be copied to the output. If using *, be careful to
quote it to avoid unwanted shell expansion.

-1 FILE Write reads with the READ1 FLAG set (and READ2 not set) to FILE instead of writing them to
standard output. If the -s option is used, only paired reads will be written to this file.

-2 FILE Write reads with the READ2 FLAG set (and READ1 not set) to FILE instead of writing them to
standard output. If the -s option is used, only paired reads will be written to this file.

-o FILE Write reads with either the READ1 FLAG or the READ2 FLAG set to FILE instead of writing them to
standard output. This is equivalent to -1 FILE -2 FILE.

-0 FILE Write reads where the READ1 and READ2 FLAG bits are either both set or both unset to FILE
instead of writing them to standard output.

-f INT Only output alignments with all bits set in INT present in the FLAG field. INT can be specified
in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e.
/^0[0-7]+/) [0].
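
The accepted INT spellings can be sketched in Python. The function name is hypothetical. Plain decimal input follows the --rf description, and accepting lowercase hex digits is an assumption of the sketch (the pattern above lists only 0-9A-F). Flag names are not handled here.

```python
import re


def parse_flag_int(text: str) -> int:
    """Parse a FLAG filter value written in hex, octal or decimal."""
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", text):
        return int(text, 16)  # hex: begins with 0x
    if re.fullmatch(r"0[0-7]+", text):
        return int(text, 8)   # octal: begins with 0
    if re.fullmatch(r"0|[1-9][0-9]*", text):
        return int(text, 10)  # decimal: does not begin with 0
    raise ValueError(f"not a valid flag value: {text!r}")
```

All of "0x900", "04400" and "2304" parse to the same value, 2304.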

-F INT, --excl-flags INT, --exclude-flags INT
Do not output alignments with any bits set in INT present in the FLAG field. INT can be
specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0'
(i.e. /^0[0-7]+/) [0x900]. This defaults to 0x900 representing filtering of secondary and
supplementary alignments.
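
To check which bits a value such as 0x900 selects, the FLAG bits defined by the SAM specification can be decoded as in this sketch. The names follow those printed by samtools flags; the function name is hypothetical.

```python
# FLAG bits from the SAM specification, in increasing bit order.
FLAG_NAMES = {
    0x1: "PAIRED", 0x2: "PROPER_PAIR", 0x4: "UNMAP", 0x8: "MUNMAP",
    0x10: "REVERSE", 0x20: "MREVERSE", 0x40: "READ1", 0x80: "READ2",
    0x100: "SECONDARY", 0x200: "QCFAIL", 0x400: "DUP", 0x800: "SUPPLEMENTARY",
}


def flag_names(value: int):
    """Return the names of all FLAG bits set in value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


print(flag_names(0x900))  # ['SECONDARY', 'SUPPLEMENTARY']
```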

--rf INT, --incl-flags INT, --include-flags INT
Only output alignments with any bits set in INT present in the FLAG field. INT can be specified
in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e.
/^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated list of flag
names [0].

-G INT Only EXCLUDE reads with all of the bits set in INT present in the FLAG field. INT can be
specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0'
(i.e. /^0[0-7]+/) [0].
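
The four FLAG filters differ in whether they test for all bits or any bit. A minimal Python sketch of the semantics described above follows, with the documented defaults as keyword defaults. Treating a value of 0 for --rf and -G as "no constraint" is an assumption of this sketch, and the function is not the samtools implementation.

```python
def passes_flag_filters(flag: int, f: int = 0, F: int = 0x900,
                        rf: int = 0, G: int = 0) -> bool:
    """Return True if a record with this FLAG would be output."""
    if (flag & f) != f:
        return False  # -f: all bits in f must be set
    if flag & F:
        return False  # -F: no bit in F may be set
    if rf and not (flag & rf):
        return False  # --rf: at least one bit in rf must be set
    if G and (flag & G) == G:
        return False  # -G: excluded only if all bits in G are set
    return True
```

For example, with -G 0xC a FLAG of 77 (read unmapped and mate unmapped) is excluded, while a FLAG of 73 (only the mate unmapped) is kept.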

-d TAG[:VAL]
Only output alignments containing an auxiliary tag matching both TAG and VAL. If VAL is omitted
then any value is accepted. The tag types supported are i, f, Z, A and H. "B" arrays are not
supported. This is comparable to the method used in samtools view -d, but for single values only
(i.e. there is no sibling -D option).
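
The TAG[:VAL] matching rule can be sketched as below. Both function names are hypothetical, auxiliary tags are modelled as a plain dictionary, and comparing values as strings is a simplifying assumption; the real command works with typed tag values.

```python
def parse_tag_spec(spec: str):
    """Split "TAG[:VAL]" into (tag, value); value is None when VAL is omitted."""
    tag, sep, val = spec.partition(":")
    return tag, (val if sep else None)


def matches_tag(aux: dict, spec: str) -> bool:
    """Return True if the record's tags (aux) satisfy the TAG[:VAL] filter."""
    tag, val = parse_tag_spec(spec)
    if tag not in aux:
        return False
    # Assumption: values are compared by their string form.
    return val is None or str(aux[tag]) == val
```

With aux = {"RG": "grp1"}, the filter "RG" matches and "RG:grp2" does not.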

-i Add Illumina Casava 1.8 format entry to header (e.g. 1:N:0:ATCACG)
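
In the Casava 1.8 convention the four colon-separated fields are the read number, the "is filtered" indicator (Y or N), the control number and the index sequence. The sketch below only assembles such an entry from values supplied by the caller; how samtools derives each field is not stated here, and the function name is hypothetical.

```python
def casava_entry(read_number: int, filtered: bool, barcode: str,
                 control: int = 0) -> str:
    """Build a Casava 1.8 style entry: read:is_filtered:control:index."""
    return f"{read_number}:{'Y' if filtered else 'N'}:{control}:{barcode}"


print(casava_entry(1, False, "ATCACG"))  # 1:N:0:ATCACG
```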

-c [0..9]
set compression level when writing gz or bgzf fastq files.

--i1 FILE
write first index reads to FILE

--i2 FILE
write second index reads to FILE

--barcode-tag TAG
aux tag to find index reads in [default: BC]

--quality-tag TAG
aux tag to find index quality in [default: QT]

-@, --threads INT
Number of input/output compression threads to use in addition to main thread [0].

--index-format STR
string to describe how to parse the barcode and quality tags. For example:

i14i8 the first 14 characters are index 1, the next 8 characters are index 2

n8i14 ignore the first 8 characters, and use the next 14 characters for index 1

If the tag contains a separator, then the numeric part can be replaced with '*' to mean
'read until the separator or end of tag', for example:

n*i* ignore the left part of the tag until the separator, then use the second part
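
How a format string is applied to a barcode tag can be sketched in Python. This is an illustration under stated assumptions, not the samtools parser: the separator is assumed to be '-', a separator at the start of a field is stepped over, and the function name is hypothetical.

```python
import re

FIELD = re.compile(r"([in])(\d+|\*)")
SEPARATOR = "-"  # assumption: the character separating indexes in the tag


def split_indexes(fmt: str, tag: str):
    """Return the index sequences selected from tag by an --index-format string."""
    if FIELD.sub("", fmt):
        raise ValueError(f"unrecognised index format: {fmt!r}")
    indexes = []
    pos = 0
    for kind, count in FIELD.findall(fmt):
        if pos < len(tag) and tag[pos] == SEPARATOR:
            pos += 1  # step over the separator left by the previous field
        if count == "*":
            # Read until the next separator, or to the end of the tag.
            end = tag.find(SEPARATOR, pos)
            if end == -1:
                end = len(tag)
        else:
            end = min(pos + int(count), len(tag))
        if kind == "i":  # 'i' keeps the characters, 'n' ignores them
            indexes.append(tag[pos:end])
        pos = end
    return indexes


print(split_indexes("n*i*", "ACGT-TTGCA"))  # ['TTGCA']
```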

EXAMPLES

       Starting from a coordinate sorted file, output paired reads to separate files, discarding singletons,
       supplementary and secondary reads. The resulting files can be used with, for example, the bwa aligner.

           samtools collate -u -O in_pos.bam | \
           samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n

       Starting with a name collated file, output paired and singleton reads in a single file, discarding
       supplementary and secondary reads. To get all of the reads in a single file, it is necessary to redirect
       the output of samtools fastq. The output file is suitable for use with bwa mem -p which understands
       interleaved files containing a mixture of paired and singleton reads.

           samtools fastq -0 /dev/null in_name.bam > all_reads.fq

       Output paired reads in a single file, discarding supplementary and secondary reads. Save any singletons
       in a separate file. Append /1 and /2 to read names. This format is suitable for use by NextGenMap when
       using its -p and -q options. With this aligner, paired reads must be mapped separately to the
       singletons.

           samtools fastq -0 /dev/null -s single.fq -N in_name.bam > paired.fq

BUGS

       o The way of specifying output files is far too complicated and easy to get wrong.

AUTHOR

       Written by Heng Li, with modifications by Martin Pollard and Jennifer Liddle, all from the Sanger
       Institute.