Provided by: samtools_1.19.2-1build2_amd64

NAME

       samtools-import - converts FASTQ files to unmapped SAM/BAM/CRAM

SYNOPSIS

       samtools import [options] [ fastq_file ... ]

DESCRIPTION

       Reads  one  or  more  FASTQ files and converts them to unmapped SAM, BAM or CRAM.  The input files may be
       automatically decompressed if they have a .gz extension.

       The simplest usage in the absence of any other command line options is to provide one or two input files.

       If a single file is given, it is treated as single-ended data unless the read names end with /1 and /2,
       in which case the records are labelled as PAIRED with the READ1 or READ2 BAM flag set.  If a pair of
       filenames is given, the files are read alternately to produce an interleaved output file, also setting
       the PAIRED and READ1 / READ2 flags.
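
       As a minimal sketch of the single-file case, the following writes a two-record interleaved FASTQ
       whose read names carry /1 and /2 suffixes and, if samtools is on the PATH, imports it and lists the
       name and FLAG columns.  The file name interleaved.fq and the read contents are made up for
       illustration.

```shell
# Write a tiny interleaved FASTQ (hypothetical file name and reads).
# The /1 and /2 name suffixes are what mark the two records as a pair.
cat > interleaved.fq <<'EOF'
@frag1/1
ACGTACGT
+
IIIIIIII
@frag1/2
TTGCAAGC
+
IIIIIIII
EOF

# A FASTQ record is four lines, so the record count is lines / 4.
records=$(( $(wc -l < interleaved.fq) / 4 ))
echo "records: $records"

# Import and show the read name and FLAG columns, if samtools is installed.
if command -v samtools >/dev/null 2>&1; then
    samtools import interleaved.fq | samtools view | cut -f 1,2
fi
```

       Comparing the FLAG column against samtools flags is one way to confirm that PAIRED and READ1 / READ2
       were set as described.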

       The filenames may be explicitly labelled using -1 and -2 for READ1  and  READ2  data  files,  -s  for  an
       interleaved  paired file (or one half of a paired-end run), -0 for unpaired data and explicit index files
       specified with --i1 and --i2.  These correspond to typical output  produced  by  Illumina  bcl2fastq  and
       match the output from samtools fastq.  The index files will set both the BC barcode tag and its
       associated QT quality tag.

       The Illumina CASAVA identifiers may also be processed when the -i option is given.  Each identifier is
       parsed for the READ1 / READ2 status, for whether or not the read failed processing (QCFAIL flag), and
       for the barcode sequence, which is added to the BC tag.  This can be an alternative to explicitly
       specifying the index files, although note that doing so will not fill out the barcode quality tag.
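
       To make the fields concrete, here is a small sketch that splits a CASAVA 1.8 style header by hand.
       It only illustrates the layout of the identifier (read number, filter flag, control number, barcode)
       and is not how samtools itself parses it.  The header values are invented.

```shell
# A CASAVA 1.8 style header line (made-up values for illustration).
header='@M00123:45:000000000-ABCDE:1:1101:15589:1331 2:Y:0:ATCACG'

# The identifier is everything after the first space:
#   <read>:<is_filtered>:<control_number>:<barcode>
comment=${header#* }

# Split the identifier on ':' into its four fields.
old_ifs=$IFS
IFS=:
set -- $comment
IFS=$old_ifs
read_no=$1 filtered=$2 control=$3 barcode=$4

echo "read number : $read_no"   # 1 or 2, i.e. READ1 or READ2
echo "filtered    : $filtered"  # Y means the read failed filtering
echo "barcode     : $barcode"   # sequence destined for the BC tag
```

       In this example the record would be a READ2 that failed filtering, with barcode ATCACG.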

OPTIONS

       -s FILE Import paired interleaved data from FILE.

       -0 FILE Import single-ended (unpaired) data from FILE.

               Operationally there is no difference between the -s and -0 options.  Given an interleaved file
               with /1 and /2 read name endings, both will correctly set the PAIRED, READ1 and READ2 flags;
               given data with no suffixes and no CASAVA identifiers being processed, both will leave the data
               as unpaired.  The two options exist to allow more descriptive command lines and to improve the
               header comment describing the samtools fastq decode command.

       -1 FILE, -2 FILE
               Import paired data from a pair of FILEs.  The BAM flag PAIRED will be set, but not PROPER_PAIR as
               it has not  been  aligned.   READ1  and  READ2  will  be  stored  in  their  original,  unmapped,
               orientation.

       --i1 FILE, --i2 FILE
               Specifies  index  barcodes  associated with the -1 and -2 files.  These will be appended to READ1
               and READ2 records in the barcode (BC) and quality (QT) tags.

       -i      Specifies that the Illumina CASAVA identifiers should be processed.   This  may  set  the  READ1,
               READ2 and QCFAIL flags and add a barcode tag.

       -N, --name2
               Assume  the  read  names  are  encoded  in  the  SRA  and  ENA formats where the first word is an
               automatically generated name with the second field being the original name.  This option extracts
               that second field instead.
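
               The layout this option expects can be sketched with a hand-written header.  The header below
               is illustrative, and the awk calls only show which word is which; they are not part of
               samtools.

```shell
# An SRA-style header (illustrative values): the generated name comes
# first and the original instrument name second.
header='@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36'

# First word, with the leading '@' removed: the name used by default.
first=$(printf '%s\n' "$header" | awk '{ sub(/^@/, "", $1); print $1 }')

# Second word: the name that -N / --name2 is described as extracting.
second=$(printf '%s\n' "$header" | awk '{ print $2 }')

echo "default name : $first"
echo "with -N      : $second"
```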

       --barcode-tag TAG
               Changes the auxiliary tag used for barcode sequence.  Defaults to BC.

       --quality-tag TAG
               Changes the auxiliary tag used for barcode quality.  Defaults to QT.

       -o FILE Output to FILE.  By default output will be written to stdout.

       --order TAG
               When outputting a SAM record, also output an integer tag containing the Nth record number.   This
               may  be  useful  if  the  data  is to be sorted or collated in some manner and we wish this to be
               reversible.  In this case the tag may be used  with  samtools  sort  -t  TAG  to  regenerate  the
               original input order.

               Note  integer  tags can only hold up to 2^32 record numbers (approximately 4 billion).  Data sets
               with more records can switch to using a fixed-width string tag instead, with leading 0s to ensure
               sort works.  To do this specify TAG:LENGTH.  E.g. --order rn:12 will be able  to  sort  up  to  1
               trillion records.
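
               The reason for the leading zeros can be seen with plain shell tools.  This sketch prints the
               2^32 limit and then compares a string sort of unpadded and zero-padded record numbers; the
               sample numbers are arbitrary.

```shell
# 2^32, the limit quoted above for integer tags.
limit=$(( 1 << 32 ))
echo "integer limit: $limit"

# Unpadded numbers compared as strings come out in the wrong order ...
unpadded=$(printf '%s\n' 2 10 1 | LC_ALL=C sort | paste -sd' ' -)
echo "unpadded: $unpadded"

# ... while fixed-width, zero-padded strings (12 digits, as in rn:12)
# sort in true numeric order.
padded=$(printf '%012d\n' 2 10 1 | LC_ALL=C sort | paste -sd' ' -)
echo "padded:   $padded"
```

               A 12-digit field covers record numbers up to 999999999999, which is where the 1 trillion
               figure comes from.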

       -r RG_line, --rg-line RG_line
               A complete @RG header line may be specified, with or without the initial "@RG" component.  If
               specified this will also use the ID field from RG_line in each SAM record's RG auxiliary tag.

               If specified multiple times this appends to  the  RG  line,  automatically  adding  tabs  between
               invocations.
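
               Because the fields of an @RG line are tab separated, the argument needs real tab characters.
               One possible way to build it is with printf, which handles \t consistently across shells,
               whereas echo -e is not supported by every shell.  The variable name rg_line is only an
               example.

```shell
# Build a tab-separated RG line; printf expands \t into a real tab.
rg_line=$(printf 'ID:xyz\tPL:ILLUMINA')

# Count the tab-separated fields to confirm the tab is present.
fields=$(printf '%s\n' "$rg_line" | awk -F '\t' '{ print NF }')
echo "RG fields: $fields"
```

               The result can then be passed as -r "$rg_line", keeping the quotes so the tab survives word
               splitting.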

       -R RG_ID, --rg RG_ID
               This  is  a  shorter  form  of  the  option above, equivalent to --rg-line ID:RG_ID.  If both are
               specified then this option is ignored.

       -u      Output BAM or CRAM as uncompressed data.

       -T TAGLIST
               This looks for any SAM-format auxiliary tags in the comment field of a fastq  read  name.   These
               must   match   the   <alpha-num><alpha-num>:<type>:<data>   pattern   as  specified  in  the  SAM
               specification.  TAGLIST can be blank or * to indicate all tags should be copied  to  the  output,
               otherwise it is a comma-separated list of tag types to include with all others being discarded.
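
               As a rough sketch of the pattern above, the following picks out the words of a comment field
               that look like SAM auxiliary tags.  It assumes the comment entries are tab separated, and the
               regular expression is a simplified approximation of the pattern, not the check samtools
               performs.  The comment contents are invented.

```shell
# A comment field with two SAM-style tags and one plain word
# (assumed to be tab separated).
header_comment=$(printf 'BC:Z:ACGT\tRG:Z:grp1\tlength=36')

# Keep only entries shaped like <alpha-num><alpha-num>:<type>:<data>.
matching=$(printf '%s\n' "$header_comment" | tr '\t' '\n' |
    grep -E '^[A-Za-z0-9]{2}:[A-Za-z]:.+$' | paste -sd, -)

echo "tag-like entries: $matching"
```

               Here length=36 is skipped because it does not follow the tag pattern; a TAGLIST of BC would
               further restrict the copied tags to the first entry.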

EXAMPLES

       Convert a single-ended fastq file to an unmapped CRAM.  Both of these commands perform the same action.

           samtools import -0 in.fastq -o out.cram
           samtools import in.fastq > out.cram

       Convert a pair of Illumina fastqs containing CASAVA identifiers to BAM, adding the barcode information to
       the BC auxiliary tag.

           samtools import -i -1 in_1.fastq -2 in_2.fastq -o out.bam
           samtools import -i in_[12].fastq > out.bam

       Specify the read group.  These commands are equivalent.

           samtools import -r "$(echo -e 'ID:xyz\tPL:ILLUMINA')" in.fq
           samtools import -r "$(echo -e '@RG\tID:xyz\tPL:ILLUMINA')" in.fq
           samtools import -r ID:xyz -r PL:ILLUMINA in.fq

       Create an unmapped BAM file from a set of 4 Illumina fastqs from bcl2fastq, consisting of two read and
       two index tags.  The CASAVA identifier is used only for setting QC pass / failure status.

           samtools import -i -1 R1.fq -2 R2.fq --i1 I1.fq --i2 I2.fq -o out.bam

       Convert a pair of CASAVA barcoded fastq files to unmapped CRAM with an incremental record  counter,  then
       sort  this by minimiser in order to reduce file space.  The reversal process is also shown using samtools
       sort and samtools fastq.

           samtools import -i in_1.fq in_2.fq --order ro -O bam,level=0 | \
               samtools sort -@4 -M -o out.srt.cram -

           samtools sort -@4 -O bam -u -t ro out.srt.cram | \
               samtools fastq -1 out_1.fq -2 out_2.fq -i --index-format "i*i*"

AUTHOR

       Written by James Bonfield of the Wellcome Sanger Institute.

SEE ALSO

       samtools(1), samtools-fastq(1)

       Samtools website: <http://www.htslib.org/>

samtools-1.19.2                                  24 January 2024                              samtools-import(1)