Ubuntu Manpage: featureCounts - toolkit for processing next-gen sequencing data

NAME

       featureCounts - toolkit for processing next-gen sequencing data

SYNOPSIS

       featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...

DESCRIPTION

       Version 2.0.4

       ## Mandatory arguments:

       -a <string>
              Name  of an annotation file. GTF/GFF format by default. See -F option for more format information.
              Inbuilt annotations (SAF format) is available in 'annotation' directory of  the  package.  Gzipped
              file is also accepted.

       -o <string>
              Name  of  output  file  including  read  counts.  A  separate file including summary statistics of
              counting results is also included in the  output  ('<string>.summary').  Both  files  are  in  tab
              delimited format.

       input_file1 [input_file2] ...
              A list of SAM or BAM format files. They can be

       either name or location sorted. If no files provided,
              <stdin>  input  is  expected.  Location-sorted  paired-end  reads are automatically sorted by read
              names.

       ## Optional arguments: # Annotation

       -F <string>
              Specify format of the provided annotation file. Acceptable formats include  'GTF'  (or  compatible
              GFF format) and 'SAF'. 'GTF' by default.  For SAF format, please refer to Users Guide.

       -t <string>
              Specify  feature  type(s)  in  a  GTF  annotation.  If multiple types are provided, they should be
              separated by ',' with no space in between. 'exon' by  default.  Rows  in  the  annotation  with  a
              matched feature will be extracted and used for read mapping.

       -g <string>
              Specify  attribute  type  in  GTF  annotation.  'gene_id'  by default. Meta-features used for read
              counting will be extracted from annotation using the provided value.

       --extraAttributes
              Extract extra attribute types from the provided GTF annotation and include them  in  the  counting
              output.  These attribute types will not be used to group features. If more than one attribute type
              is provided they should be separated by comma.

       -A <string>
              Provide a chromosome name alias file to match chr names in annotation with  those  in  the  reads.
              This should be a twocolumn comma-delimited text file. Its first column should include chr names in
              the  annotation  and  its  second column should include chr names in the reads. Chr names are case
              sensitive. No column header should be included in the file.

       # Level of summarization

       -f     Perform read counting at feature level (eg. counting reads for exons rather than genes).

       # Overlap between reads and features

       -O     Assign reads to all their overlapping meta-features (or features if -f is specified).

       --minOverlap <int>
              Minimum number of overlapping bases in a read that is required for read assignment. 1 by  default.
              Number  of  overlapping  bases  is  counted  from both reads if paired end. If a negative value is
              provided, then a gap of up to specified size will be allowed between read and the feature that the
              read is assigned to.

       --fracOverlap <float> Minimum fraction of overlapping bases in a read that is
              required for read assignment. Value should  be  within  range  [0,1].  0  by  default.  Number  of
              overlapping  bases  is  counted from both reads if paired end. Both this option and '--minOverlap'
              option need to be satisfied for read assignment.

       --fracOverlapFeature <float> Minimum fraction of overlapping bases in a
              feature that is required for read assignment. Value should be within range [0,1]. 0 by default.

       --largestOverlap
              Assign reads to a meta-feature/feature that has the largest number of overlapping bases.

       --nonOverlap <int>
              Maximum number of non-overlapping bases in a read (or a read pair)  that  is  allowed  when  being
              assigned to a feature. No limit is set by default.

       --nonOverlapFeature <int> Maximum number of non-overlapping bases in a feature
              that is allowed in read assignment. No limit is set by default.

       --readExtension5 <int> Reads are extended upstream by <int> bases from their
              5' end.

       --readExtension3 <int> Reads are extended upstream by <int> bases from their
              3' end.

       --read2pos <5:3>
              Reduce  reads  to their 5' most base or 3' most base. Read counting is then performed based on the
              single base the read is reduced to.

       # Multi-mapping reads

       -M     Multi-mapping reads will also be counted. For a multimapping read,  all  its  reported  alignments
              will be counted. The 'NH' tag in BAM/SAM input is used to detect multi-mapping reads.

       # Fractional counting

       --fraction
              Assign fractional counts to features. This option must be used together with '-M' or '-O' or both.
              When  '-M'  is  specified,  each reported alignment from a multi-mapping read (identified via 'NH'
              tag) will carry a fractional count of 1/x, instead of 1 (one), where x  is  the  total  number  of
              alignments  reported  for  the  same  read.  When '-O' is specified, each overlapping feature will
              receive a fractional count of 1/y, where y is the total number of features  overlapping  with  the
              read.  When  both  '-M'  and  '-O'  are specified, each alignment will carry a fractional count of
              1/(x*y).

       # Read filtering

       -Q <int>
              The minimum mapping quality score a read must satisfy in  order  to  be  counted.  For  paired-end
              reads, at least one end should satisfy this criteria. 0 by default.

       --splitOnly
              Count split alignments only (ie. alignments with CIGAR string containing 'N'). An example of split
              alignments is exon-spanning reads in RNA-seq data.

       --nonSplitOnly
              If specified, only non-split alignments (CIGAR strings do not contain letter 'N') will be counted.
              All the other alignments will be ignored.

       --primary
              Count  primary  alignments only. Primary alignments are identified using bit 0x100 in SAM/BAM FLAG
              field.

       --ignoreDup
              Ignore duplicate reads in read counting. Duplicate reads are identified using bit Ox400 in BAM/SAM
              FLAG field. The whole read pair is ignored if one of the reads is a duplicate read for paired  end
              data.

       # Strandness

       -s <int or string>
              Perform  strand-specific  read  counting. A single integer value (applied to all input files) or a
              string of commaseparated values (applied to each corresponding input  file)  should  be  provided.
              Possible  values  include: 0 (unstranded), 1 (stranded) and 2 (reversely stranded).  Default value
              is 0 (ie. unstranded read counting carried out for all input files).

       # Exon-exon junctions

       -J     Count number of reads supporting each exon-exon junction.  Junctions were identified from all  the
              exon-spanning reads in the input (containing 'N' in CIGAR string). Counting results are saved to a
              file named '<output_file>.jcounts'

       -G <string>
              Provide the name of a FASTA-format file that contains the reference sequences used in read mapping
              that  produced  the provided SAM/BAM files. This optional argument can be used with '-J' option to
              improve read counting for junctions.

       # Parameters specific to paired end reads

       -p     If specified, libraries are assumed to contain paired-end reads. For  any  library  that  contains
              paired-end  reads,  the  'countReadPairs'  parameter  controls  if  read  pairs or reads should be
              counted.

       --countReadPairs
              If specified, fragments (or templates) will be counted instead  of  reads.  This  option  is  only
              applicable for paired-end reads. For single-end data, it is ignored.

       -B     Only count read pairs that have both ends aligned.

       -P     Check validity of paired-end distance when counting read pairs. Use -d and -D to set thresholds.

       -d <int>
              Minimum fragment/template length, 50 by default.

       -D <int>
              Maximum fragment/template length, 600 by default.

       -C     Do  not  count  read pairs that have their two ends mapping to different chromosomes or mapping to
              same chromosome but on different strands.

       --donotsort
              Do not sort reads in BAM/SAM input. Note that reads from the same pair are required to be  located
              next to each other in the input.

       # Number of CPU threads

       -T <int>
              Number of the threads. 1 by default.

       # Read groups

       --byReadGroup
              Assign reads by read group. "RG" tag is required to be present in the input BAM/SAM files.

       # Long reads

       -L     Count  long reads such as Nanopore and PacBio reads. Long read counting can only run in one thread
              and only reads (not read-pairs) can be counted. There is  no  limitation  on  the  number  of  'M'
              operations allowed in a CIGAR string in long read counting.

       # Assignment results for each read

       -R <format>
              Output  detailed assignment results for each read or readpair. Results are saved to a file that is
              in one of the following formats: CORE, SAM and BAM. See Users Guide  for  more  info  about  these
              formats.

       --Rpath <string>
              Specify  a  directory to save the detailed assignment results. If unspecified, the directory where
              counting results are saved is used.

       # Miscellaneous

       --tmpDir <string>
              Directory under which intermediate files are saved (later removed). By default, intermediate files
              will be saved to the directory specified in '-o' argument.

       --maxMOp <int>
              Maximum number of 'M' operations allowed in a CIGAR string. 10 by default. Both 'X'  and  '='  are
              treated as 'M' and adjacent 'M' operations are merged in the CIGAR string.

       --verbose
              Output verbose information for debugging, such as unmatched chromosome/contig names.

       -v     Output version of the program.

AUTHOR

        This manpage was written by Alexandre Mestiashvili for the Debian distribution and
        can be used for any other usage of the program.

featureCounts 2.0.3                                March 2023                                   FEATURECOUNTS(1)