Ubuntu Manpage: sniffles - structural variation caller using third-generation sequencing

NAME

       sniffles - structural variation caller using third-generation sequencing

DESCRIPTION

       usage:  sniffles  --input  SORTED_INPUT.bam [--vcf OUTPUT.vcf] [--snf MERGEABLE_OUTPUT.snf] [--threads 4]
       [--non-germline]

       Sniffles2: A fast structural variant (SV) caller for long-read sequencing data

              Version 2.0.2
              Contact: moritz.g.smolka@gmail.com

              Usage example A - Call SVs for a single sample:

              sniffles --input sorted_indexed_alignments.bam --vcf output.vcf

              ... OR, with CRAM input and bgzipped+tabix indexed VCF output:

              sniffles --input sample.cram --vcf output.vcf.gz

              ... OR, producing only a SNF file with SV candidates for later multi-sample calling:

              sniffles --input sample1.bam --snf sample1.snf

              ... OR, simultaneously producing a single-sample VCF and SNF file for later multi-sample calling:

              sniffles --input sample1.bam --vcf sample1.vcf.gz --snf sample1.snf

              ... OR, with additional options to specify tandem repeat annotations (for improved call accuracy),
              reference (for DEL sequences) and non-germline mode for detecting rare SVs:

              sniffles --input sample1.bam --vcf sample1.vcf.gz --tandem-repeats tandem_repeats.bed  --reference
              genome.fa --non-germline

              Usage example B - Multi-sample calling:

              Step 1. Create .snf for each sample:

              sniffles --input sample1.bam --snf sample1.snf

              Step 2. Combined calling:

              sniffles --input sample1.snf sample2.snf ... sampleN.snf --vcf multisample.vcf

              ... OR, using a .tsv file containing a list of .snf files, and custom sample IDs in an optional
              second column (one sample per line):

              Step 2. Combined calling:

              sniffles --input snf_files_list.tsv --vcf multisample.vcf
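
              The .tsv list described above can be generated with a few lines of Python. This is only a
              sketch: the helper name snf_list_tsv and the sample IDs are hypothetical, and it assumes the
              file is plain tab-separated text with the .snf path in the first column and the optional
              sample ID in the second.

```python
import csv
import io


def snf_list_tsv(samples):
    """Render (snf_path, sample_id_or_None) pairs as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for snf_path, sample_id in samples:
        # The second column is optional, so it is omitted when no custom ID is given.
        row = [snf_path] if sample_id is None else [snf_path, sample_id]
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample IDs; one sample per line.
    text = snf_list_tsv([("sample1.snf", "patient_A"), ("sample2.snf", None)])
    with open("snf_files_list.tsv", "w") as handle:
        handle.write(text)
```

              The resulting snf_files_list.tsv would then be passed to --input in place of the
              individual .snf files.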

              Usage example C - Determine genotypes for a set of known SVs (force calling):

              sniffles --input sample.bam --genotype-vcf input_known_svs.vcf --vcf output_genotypes.vcf

              Use --help for full parameter/usage information

   optional arguments:
       -h, --help
              show this help message and exit

       --version
              show program's version number and exit

   Common parameters:
       -i IN [IN ...], --input IN [IN ...]
              For  single-sample  calling:  A  coordinate-sorted  and  indexed .bam/.cram (BAM/CRAM format) file
              containing aligned reads. - OR - For multi-sample calling: Multiple .snf files  (generated  before
              by running Sniffles2 for individual samples with --snf) (default: None)

       -v OUT.vcf, --vcf OUT.vcf
              VCF  output  filename to write the called and refined SVs to. If the given filename ends with .gz,
              the VCF file will be automatically bgzipped and a .tbi index built for it. (default: None)

       --snf OUT.snf
              Sniffles2 file (.snf) output filename to store candidates for later multi-sample calling (default:
              None)

       --reference reference.fasta
              (Optional) Reference sequence the reads were aligned against. To  enable  output  of  deletion  SV
              sequences, this parameter must be set. (default: None)

       --tandem-repeats IN.bed
              (Optional)  Input  .bed  file  containing  tandem  repeat  annotations  for  the reference genome.
              (default: None)

       --non-germline
              Call non-germline SVs (rare, somatic or mosaic SVs) (default: False)

       --phase
              Determine phase for SV calls (requires the input alignments to be phased) (default: False)

       -t N, --threads N
              Number of parallel threads to use (speed-up for multi-core CPUs) (default: 4)

   SV Filtering parameters:
       --minsupport auto
              Minimum number of supporting reads for an SV to be reported. With "auto", the threshold is
              chosen automatically based on coverage. (default: auto)

       --minsupport-auto-mult 0.1/0.025
              Coverage  based  minimum  support  multiplier  for  germline/non-germline  modes  (only  for  auto
              minsupport) (default: None)

       --minsvlen N
              Minimum SV length (in bp) (default: 35)

       --minsvlen-screen-ratio N
              Minimum length for SV candidates (as fraction of --minsvlen) (default: 0.95)

       --mapq N
              Alignments with mapping quality lower than this value will be ignored (default: 25)

       --no-qc
              Output all SV candidates, disregarding quality control steps. (default: False)

       --qc-stdev True
              Apply filtering based on SV start position and length standard deviation (default: True)

       --qc-stdev-abs-max N
              Maximum standard deviation for SV start position and length (in bp) (default: 500)

       --qc-strand False
              Apply filtering based on strand support of SV calls (default: False)

       --qc-coverage N
              Minimum surrounding region coverage of SV calls (default: 1)

       --long-ins-length 2500
              Insertion SVs longer than this value are considered hard to detect based on the aligner and
              read length, and are subjected to more sensitive filtering. (default: 2500)

       --long-del-length 50000
              Deletion SVs longer than this value are subjected to central coverage  drop-based  filtering  (Not
              applicable for --non-germline) (default: 50000)

       --long-del-coverage 0.66
              Long  deletions  with  central  coverage (in relation to upstream/downstream coverage) higher than
              this value will be filtered (Not applicable for --non-germline) (default: 0.66)

       --long-dup-length 50000
              Duplication SVs longer than this value are subjected to central coverage increase-based  filtering
              (Not applicable for --non-germline) (default: 50000)

       --long-dup-coverage 1.33
              Long  duplications  with central coverage (in relation to upstream/downstream coverage) lower than
              this value will be filtered (Not applicable for --non-germline) (default: 1.33)

       --max-splits-kb N
              Additional number of splits per kilobase read sequence allowed before reads are ignored  (default:
              0.1)

       --max-splits-base N
              Base  number of splits allowed before reads are ignored (in addition to --max-splits-kb) (default:
              3)

       --min-alignment-length N
              Reads with alignments shorter than this length (in bp) will be ignored (default: 1000)

       --phase-conflict-threshold F
              Maximum fraction of conflicting reads permitted for SV phase information to be  labelled  as  PASS
              (only for --phase) (default: 0.1)

       --detect-large-ins True
              Infer insertions that are longer than most reads and are therefore spanned by only a few
              alignments. (default: True)
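
              To make the arithmetic behind --max-splits-base, --max-splits-kb, --long-del-length and
              --long-del-coverage concrete, the following Python sketch applies the documented default
              values. It is an illustration only: the function names, the way the two split allowances
              are added, and the strict comparisons are assumptions, not taken from the Sniffles2 source.

```python
def max_allowed_splits(read_length_bp, base=3, per_kb=0.1):
    """Split allowance for one read: base count plus a per-kilobase share."""
    return base + per_kb * (read_length_bp / 1000.0)


def read_is_ignored(n_splits, read_length_bp, base=3, per_kb=0.1):
    """Assumed rule: ignore a read whose split count exceeds its allowance."""
    return n_splits > max_allowed_splits(read_length_bp, base, per_kb)


def long_deletion_is_filtered(sv_length_bp, central_cov, flank_cov,
                              long_del_length=50000, long_del_coverage=0.66):
    """Assumed rule: filter a long deletion whose central coverage did not drop."""
    if abs(sv_length_bp) <= long_del_length or flank_cov <= 0:
        # Not a long deletion, or no flanking coverage to compare against.
        return False
    return central_cov / flank_cov > long_del_coverage


if __name__ == "__main__":
    # A 20 kb read: 3 base splits plus 0.1 splits per kilobase.
    print(max_allowed_splits(20000))
    print(read_is_ignored(6, 20000))
    # A 60 kb deletion whose central coverage (28x) barely differs from the flanks (30x).
    print(long_deletion_is_filtered(60000, 28, 30))
```

              Under these assumptions, a 20 kb read is tolerated up to about 5 splits, and a 60 kb
              deletion with nearly unchanged central coverage is filtered.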

   SV Clustering parameters:
       --cluster-binsize N
              Initial screening bin size in bp (default: 100)

       --cluster-r R
              Multiplier for SV start position standard deviation criterion in cluster merging (default: 2.5)

       --cluster-repeat-h H
              Multiplier for mean SV length criterion for tandem repeat cluster merging (default: 1.5)

       --cluster-repeat-h-max N
              Max.  merging  distance  based  on SV length criterion for tandem repeat cluster merging (default:
              1000)

       --cluster-merge-pos N
              Max. merging distance for insertions and deletions on the same  read  and  cluster  in  non-repeat
              regions (default: 150)

       --cluster-merge-len F
              Max. size difference for merging SVs as fraction of SV length (default: 0.33)

       --cluster-merge-bnd N
              Max. merging distance for breakend SV candidates. (default: 1500)

   SV Genotyping parameters:
       --genotype-ploidy N
              Sample ploidy (currently fixed at value 2) (default: 2)

       --genotype-error N
              Estimated false positive rate for leads (relating to total coverage) (default: 0.05)

       --sample-id SAMPLE_ID
              Custom ID for this sample, used for later multi-sample calling (stored in .snf) (default: None)

       --genotype-vcf IN.vcf
              Determine  the  genotypes  for all SVs in the given input .vcf file (forced calling). Re-genotyped
              .vcf will be written to the output file specified with --vcf. (default: None)

   Multi-Sample Calling / Combine parameters:
       --combine-high-confidence F
              Minimum fraction of samples in which an SV needs to have individually passed QC for it to be
              reported in combined output (a value of zero will report all SVs that pass QC in at least one of
              the input samples) (default: 0.0)

       --combine-low-confidence F
              Minimum fraction of samples in which an SV needs to be present (failed QC) for it to be reported
              in combined output (default: 0.2)

       --combine-low-confidence-abs N
              Minimum absolute number of samples in which an SV needs to be present (failed QC) for it to be
              reported in combined output (default: 3)

       --combine-null-min-coverage N
              Minimum coverage for a sample genotype to be reported as 0/0 (sample genotypes with coverage below
              this threshold at the SV location will be output as ./.) (default: 5)

       --combine-match N
              Maximum deviation of multiple SVs' start/end positions for them to be combined across samples.
              Given by max_dev=M*sqrt(min(SV_length_a,SV_length_b)), where M is this parameter. (default: 500)

       --combine-consensus
              Output the consensus genotype of all samples (default: False)

       --combine-separate-intra
              Disable combination of SVs within the same sample (default: False)

       --combine-output-filtered
              Include low-confidence / putative non-germline SVs in multi-calling (default: False)
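
              The --combine-match formula max_dev=M*sqrt(min(SV_length_a,SV_length_b)) can be worked
              through in Python as follows. This is a sketch, not the Sniffles2 implementation: the
              function names are hypothetical, taking the absolute value of the SV lengths and using
              an inclusive comparison on the position difference are assumptions.

```python
import math


def max_deviation(sv_length_a, sv_length_b, m=500):
    """max_dev = M * sqrt(min(SV_length_a, SV_length_b)), with M defaulting to 500."""
    # Assumption: lengths are compared by magnitude, so negative deletion
    # lengths are handled the same way as positive ones.
    shorter = min(abs(sv_length_a), abs(sv_length_b))
    return m * math.sqrt(shorter)


def positions_match(pos_a, pos_b, sv_length_a, sv_length_b, m=500):
    """Assumed rule: two SVs match if their positions differ by at most max_dev."""
    return abs(pos_a - pos_b) <= max_deviation(sv_length_a, sv_length_b, m)


if __name__ == "__main__":
    # For SVs of 100 bp and 400 bp: 500 * sqrt(100) = 5000 bp.
    print(max_deviation(100, 400))  # 5000.0
    print(positions_match(10000, 14000, 100, 400))
```

              Because the shorter SV sets the limit, small SVs must lie much closer together than
              large ones to be combined across samples.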

   SV Postprocessing, QC and output parameters:
       --output-rnames
              Output names of all supporting reads for each SV in the RNAMEs info field (default: False)

       --no-consensus
              Disable consensus sequence generation for insertion SV calls (may improve  performance)  (default:
              False)

       --no-sort
              Do not sort output VCF by genomic coordinates (may slightly improve performance) (default: False)

       --no-progress
              Disable progress display (default: False)

       --quiet
              Disable all logging, except errors (default: False)

       --max-del-seq-len N
              Maximum deletion sequence length to be output. Deletion SVs longer than this value will be written
              to the output as symbolic SVs. (default: 50000)

       --symbolic
              Output  all  SVs  as symbolic, including insertions and deletions, instead of reporting nucleotide
              sequences.  (default: False)

AUTHOR

        This manpage was written by Andreas Tille for the Debian distribution and
        can be used for any other usage of the program.

sniffles 2.0.2                                    February 2022                                      SNIFFLES(1)