Provided by: libvcflib-tools_1.0.12+dfsg-1_amd64

NAME

       vcflib index

DESCRIPTION

       vcflib  contains tools and libraries for dealing with the Variant Call Format (VCF) which is a flat-file,
       tab-delimited textual format intended to describe reference-indexed variations between individuals.

       VCF provides a common interchange format for the description of variation in individuals and  populations
       of  samples, and has become the de facto standard reporting format for a wide array of genomic variant
       detectors.

       vcflib provides methods to manipulate and interpret sequence variation as it can be described by VCF.  It
       is both:

       • an API for parsing and operating on records of genomic variation as it can be described by the VCF for‐
         mat,

       • and a collection of command-line utilities for executing complex manipulations on VCF files.

       The API itself provides a quick and extremely permissive method to read and write VCF files.   Extensions
       and  applications of the library provided in the included utilities (*.cpp) comprise the vast bulk of the
       library’s utility for most users.

   filter
       filter command                             description
       ──────────────────────────────────────────────────────────────────────────
       vcffilter                                  Filter the specified VCF file
                                                  using a given set of filters.
       vcfuniq                                    List  unique genotypes.  Simi‐
                                                  lar to GNU uniq, but aimed  at
                                                  VCF  records.  vcfuniq removes
                                                  records which  have  the  same
                                                  position,  ref, and alt as the
                                                  previous record  on  a  sorted
                                                  VCF  file.   Note that it does
                                                  not  adjust/combine  genotypes
                                                  in   the  output,  but  simply
                                                  takes the first  record.   See
                                                  also  vcfcreatemulti  for com‐
                                                  bining records.
       vcfuniqalleles                             List unique alleles.  For each
                                                  record,  remove  any duplicate
                                                  alternate  alleles  that   may
                                                  have   resulted  from  merging
                                                  separate VCF files.
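
       The vcfuniq behaviour described in the table above can be illustrated with a
       minimal Python sketch. It is not the vcflib implementation: the function name
       uniq_records is hypothetical, and including CHROM in the comparison key is an
       assumption added here (the description only names position, ref, and alt). The
       input is assumed to be sorted, tab-delimited VCF lines.

```python
def uniq_records(lines):
    """Yield VCF lines, dropping a record that repeats the previous record's key.

    Hypothetical helper for illustration only. The key is (CHROM, POS, REF, ALT);
    using CHROM is an assumption, the description mentions position, ref and alt.
    """
    previous_key = None
    for line in lines:
        if line.startswith("#"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1], fields[3], fields[4])
        if key != previous_key:
            # First record with this key wins; genotypes are not merged.
            yield line
        previous_key = key


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "chr1\t10\t.\tA\tT\t.\t.\tDP=3",
        "chr1\t10\t.\tA\tT\t.\t.\tDP=5",
        "chr1\t10\t.\tA\tG\t.\t.\tDP=7",
    ]
    for record in uniq_records(example):
        print(record)
```

       As in the description, the sketch keeps only the first of a run of matching
       records and does not combine genotypes.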

   metrics
       metrics command                            description
       ──────────────────────────────────────────────────────────────────────────
       vcfcheck                                   Validate integrity and identi‐
                                                  ty of  the  VCF  by  verifying
                                                  that   the  VCF  record’s  REF
                                                  matches  a   given   reference
                                                  file.
       vcfdistance                                Adds a tag to each variant
                                                  record which indicates the
                                                  distance to the nearest
                                                  variant (defaults to
                                                  BasesToClosestVariant if no
                                                  custom tag name is given).
       vcfentropy                                 Annotate VCF records with  the
                                                  Shannon  entropy  of  flanking
                                                  sequence.  Annotates the output
                                                  VCF  file   with,   for   each
                                                  record,  EntropyLeft, Entropy‐
                                                  Right,  EntropyCenter,   which
                                                  are  the  entropies of the se‐
                                                  quence  of  the  given  window
                                                  size  to  the left, right, and
                                                  center of  the  record.   Also
                                                  adds EntropyRef and EntropyAlt
                                                  for each alt.
       vcfhetcount                                Calculate  the  heterozygosity
                                                  rate: count the number of  al‐
                                                  ternate  alleles  in heterozy‐
                                                  gous genotypes in all  records
                                                  in the vcf file
       vcfhethomratio                             Generates  the  het/hom  ratio
                                                  for  each  individual  in  the
                                                  file
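
       To make the vcfentropy entry above concrete, the following Python sketch
       computes the Shannon entropy of the sequence flanking a record. It is an
       illustration under stated assumptions, not vcflib's code: entropy is taken
       over single-base frequencies and reported in bits, coordinates are 0-based,
       and the names shannon_entropy and flank_entropies are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the base composition of seq.

    Assumption: entropy is computed over single-base frequencies.
    """
    total = len(seq)
    if total == 0:
        return 0.0
    counts = Counter(seq.upper())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def flank_entropies(reference, start, length, window):
    """Entropy of the window to the left and to the right of a record.

    reference: reference sequence as a string (hypothetical input)
    start:     0-based start of the record on the reference
    length:    length of the REF allele
    window:    number of flanking bases to inspect on each side
    """
    left = reference[max(0, start - window):start]
    right = reference[start + length:start + length + window]
    return shannon_entropy(left), shannon_entropy(right)


if __name__ == "__main__":
    ref = "AAAACACGT"
    print(flank_entropies(ref, start=4, length=1, window=4))
```

       A homopolymer flank scores 0 bits and a flank with all four bases at equal
       frequency scores 2 bits, so low values flag low-complexity context.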

   phenotype
       phenotype command                          description
       ──────────────────────────────────────────────────────────────────────────
       permuteGPAT++                              permuteGPAT++  is a method for
                                                  adding empirical p-values to a
                                                  GPAT++ score.

   genotype
       genotype command                           description
       ──────────────────────────────────────────────────────────────────────────
       abba-baba                                  abba-baba calculates the tree
                                                  pattern for four individuals.
                                                  This tool assumes reference is
                                                  ancestral and ignores
                                                  non-abba-baba sites.  The
                                                  output is a Boolean value
                                                  (1 = true, 0 = false) for abba
                                                  and baba.  The tree argument
                                                  should be specified from the
                                                  most basal taxon to the most
                                                  derived.
       hapLrt                                     HapLRT is a  likelihood  ratio
                                                  test  for  haplotype  lengths.
                                                  The lengths are  modeled  with
                                                  an  exponential  distribution.
                                                  The sign denotes if the target
                                                  has longer haplotypes  (1)  or
                                                  the background (-1).
       normalize-iHS                              normalizes   iHS   or   XP-EHH
                                                  scores.
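
       The site patterns named in the abba-baba entry above can be sketched as
       follows. This is a generic illustration rather than the tool's code. It
       assumes haploid alleles coded 0 (reference, taken as ancestral) and 1
       (derived), and it uses the common (P1, P2, P3, outgroup) ordering, which may
       differ from the order expected by the tool's tree argument. The function
       name site_pattern is hypothetical.

```python
def site_pattern(p1, p2, p3, outgroup):
    """Classify one biallelic site as 'ABBA', 'BABA', or None.

    Alleles are 0 (ancestral/reference) or 1 (derived). The argument order
    (P1, P2, P3, outgroup) is an assumption made for this sketch.
    """
    if outgroup != 0:
        # The outgroup is expected to carry the ancestral allele.
        return None
    pattern = (p1, p2, p3)
    if pattern == (0, 1, 1):
        return "ABBA"
    if pattern == (1, 0, 1):
        return "BABA"
    # Every other configuration is uninformative and is ignored.
    return None


if __name__ == "__main__":
    sites = [(0, 1, 1, 0), (1, 0, 1, 0), (1, 1, 0, 0)]
    for site in sites:
        label = site_pattern(*site)
        # Report the two Boolean flags: abba, baba.
        print(site, int(label == "ABBA"), int(label == "BABA"))
```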

   transformation
       transformation command                     description
       ──────────────────────────────────────────────────────────────────────────
       dumpContigsFromHeader                      Dump contigs from header
       smoother                                   smoother is a method for
                                                  window smoothing many of the
                                                  GPAT++ formats.
       vcf2dag                                    Modify VCF to be able to build
                                                  a directed acyclic graph (DAG)
       vcf2fasta                                  Generates sample_seq:N.fa  for
                                                  each   sample,  reference  se‐
                                                  quence, and chromosomal copy N
                                                  in [0,1...  ploidy].  Each se‐
                                                  quence in the  fasta  file  is
                                                  named  using  the same pattern
                                                  used for the file name, allow‐
                                                  ing them to be combined.
       vcf2tsv                                    Converts VCF to per-allele or
                                                  per-genotype tab-delimited
                                                  format, using a null string to
                                                  replace empty values in the
                                                  table.  Specifying -g will
                                                  output one line per sample
                                                  with genotype information.
                                                  When there is more than one
                                                  alt allele there will be
                                                  multiple rows, one for each
                                                  allele, and the info will
                                                  match the `A' index.
       vcfaddinfo                                 Adds info fields from the sec‐
                                                  ond file which are not present
                                                  in the first vcf file.
       vcfafpath                                  Display genotype paths
       vcfallelicprimitives                       WARNING: this tool is  consid‐
                                                  ered  legacy  and  is only re‐
                                                  tained  for  older  workflows.
                                                  It  will emit a warning!  Even
                                                  though it can use the WFA  you
                                                  should use vcfwave instead.
       vcfannotate                                Intersect the records in the
                                                  VCF file with targets provided
                                                  in a BED file.  Intersections
                                                  are done on the reference
                                                  sequences in the VCF file.  If
                                                  no VCF filename is specified
                                                  on the command line (last
                                                  argument) the VCF is read from
                                                  stdin.
       vcfannotategenotypes                       Examine genotype
                                                  correspondence.  Annotate
                                                  genotypes in the first file
                                                  with genotypes in the second,
                                                  adding the genotype as another
                                                  flag to each sample field in
                                                  the first file.
                                                  annotation-tag is the name of
                                                  the sample flag which is added
                                                  to store the annotation.  Also
                                                  adds a `has_variant' flag for
                                                  sites where the second file
                                                  has a variant.
       vcfbreakmulti                              If multiple alleles are speci‐
                                                  fied in a single record, break
                                                  the   record   into   multiple
                                                  lines,  preserving allele-spe‐
                                                  cific INFO fields.
       vcfcat                                     Concatenates VCF files
       vcfclassify                                Creates a new VCF  where  each
                                                  variant  is  tagged  by allele
                                                  class: snp, ts/tv, indel, mnp
       vcfcleancomplex                            Removes reference-matching se‐
                                                  quence  from  complex  alleles
                                                  and adjusts records to reflect
                                                  positional change.
       vcfcombine                                 Combine  VCF files positional‐
                                                  ly,  combining  samples   when
                                                  sites  and alleles are identi‐
                                                  cal.  Any number of VCF  files
                                                  may  be  combined.   The  INFO
                                                  field and  other  columns  are
                                                  taken  from  one  of the files
                                                  which   are   combined    when
                                                  records   in   multiple  files
                                                  match.   Alleles   must   have
                                                  identical  ordering to be com‐
                                                  bined  into  one  record.   If
                                                  they  do not, multiple records
                                                  will be emitted.
       vcfcommonsamples                           Generates each record  in  the
                                                  first  file,  removing samples
                                                  not present in the second
       vcfcreatemulti                             Go through sorted VCF and when
                                                  overlapping alleles are repre‐
                                                  sented     across     multiple
                                                  records,  merge  them  into  a
                                                  single multi-ALT record.   See
                                                  the documentation for more in‐
                                                  formation.
       vcfecho                                    Echo VCF to stdout (simple de‐
                                                  mo)
       vcfevenregions                             Generates  a  list of regions,
                                                  e.g. chr20:10..30  using   the
                                                  variant   density  information
                                                  provided in the  VCF  file  to
                                                  ensure  that  the regions have
                                                  even  numbers   of   variants.
                                                  This can be used to reduce the
                                                  variance in runtime  when  di‐
                                                  viding  variant  detection  or
                                                  genotyping by genomic  coordi‐
                                                  nates.
       vcffixup                                   Generates  a  VCF stream where
                                                  AC and NS have been  generated
                                                  for  each  record using sample
                                                  genotypes
       vcfflatten                                 Removes multi-allelic sites by
                                                  picking the most common alter‐
                                                  nate.   Requires  allele  fre‐
                                                  quency  specification `AF' and
                                                  use of `G' and `A' to  specify
                                                  the  fields which vary accord‐
                                                  ing to the Allele or Genotype.
                                                  VCF file may be  specified  on
                                                  the  command  line or piped as
                                                  stdin.
       vcfgeno2alleles                            modifies the  genotypes  field
                                                  to provide the literal alleles
                                                  rather than indexes
       vcfgeno2haplo                              Convert  genotype-based phased
                                                  alleles within --window-size
                                                  into  haplotype alleles.  Will
                                                  break  haplotype  construction
                                                  when  encountering  non-phased
                                                  genotypes on input.
       vcfgenosamplenames                         Get samplenames
       vcfglbound                                 Adjust GLs so that the maximum
                                                  GL is 0 by  dividing  all  GLs
                                                  for each sample by the max.
       vcfglxgt                                   Set  genotypes using the maxi‐
                                                  mum  genotype  likelihood  for
                                                  each sample.
       vcfindex                                   Adds  an  index  number to the
                                                  INFO field (id=position)
       vcfinfo2qual                               Sets QUAL from info field  tag
                                                  keyed  by [key].  The VCF file
                                                  may be omitted and  read  from
                                                  stdin.   The  average  of  the
                                                  field is used if  it  contains
                                                  multiple values.
       vcfinfosummarize                           Take  annotations given in the
                                                  per-sample fields and add  the
                                                  mean,  median,  min, or max to
                                                  the site-level INFO.
       vcfintersect                               VCF set analysis
       vcfkeepgeno                                Reduce file size  by  removing
                                                  FORMAT  fields  not  listed on
                                                  the command line  from  sample
                                                  specifications in the output
       vcfkeepinfo                                To  decrease  file size remove
                                                  INFO fields not listed on  the
                                                  command line
       vcfkeepsamples                             outputs each record in the vcf
                                                  file,   removing  samples  not
                                                  listed on the command line
       vcfld                                      Compute LD
       vcfleftalign                               Left-align indels and  complex
                                                  variants  in the input using a
                                                  pairwise   ref/alt   alignment
                                                  followed by a heuristic, iter‐
                                                  ative left realignment process
                                                  that  shifts indel representa‐
                                                  tions to their absolute  left‐
                                                  most (5’) extent.
       vcflength                                  Add length info field
       vcfnullgenofields                          Makes   the  FORMAT  for  each
                                                  variant line  the  same  (uses
                                                  all   the  FORMAT  fields  de‐
                                                  scribed in the header).  Fills
                                                  out per-sample fields to match
                                                  FORMAT.  Expands GT values  of
                                                  `.'  with  number  of  alleles
                                                  based on ploidy (e.g. `./.' for
                                                  diploid).
       vcfnumalt                                  outputs a VCF stream where NU‐
                                                  MALT has  been  generated  for
                                                  each record using sample geno‐
                                                  types
       vcfoverlay                                 Overlay  records  in the input
                                                  vcf files with order as prece‐
                                                  dence.
       vcfprimers                                 For each VCF  record,  extract
                                                  the  flanking  sequences,  and
                                                  write them to stdout as  FASTA
                                                  records  suitable  for  align‐
                                                  ment.
       vcfqual2info                               Puts QUAL into an  info  field
                                                  tag keyed by [key].
       vcfremap                                   For each alternate allele, at‐
                                                  tempt  to  realign against the
                                                  reference  with  lowered   gap
                                                  open  penalty.  If realignment
                                                  is possible, adjust the  cigar
                                                  and  reference/alternate alle‐
                                                  les.   Observe  how  different
                                                  alignment  parameters, includ‐
                                                  ing context and entropy-depen‐
                                                  dent ones,  influence  variant
                                                  classification and interpreta‐
                                                  tion.
       vcfremoveaberrantgenotypes                 strips  samples  which are ho‐
                                                  mozygous but have observations
                                                  implying heterozygosity.   Re‐
                                                  move samples for which the re‐
                                                  ported  genotype  (GT) and ob‐
                                                  servation counts disagree (AO,
                                                  RO).
       vcfremovesamples                           outputs each record in the vcf
                                                  file, removing samples  listed
                                                  on the command line
       vcfsample2info                             Take  annotations given in the
                                                  per-sample fields and add  the
                                                  mean,  median,  min, or max to
                                                  the site-level INFO.
       vcfsamplediff                              Establish   putative   somatic
                                                  variants  using  reported dif‐
                                                  ferences between germline  and
                                                  somatic  samples.   Tags  each
                                                  record where the listed sample
                                                  genotypes differ with <tag>.
                                                  The first sample is assumed to
                                                  be germline, the second
                                                  somatic.  Each record is
                                                  tagged with
                                                  <tag>={germline,somatic,loh} to
                                                  specify  the  type  of variant
                                                  given the genotype  difference
                                                  between the two samples.
       vcfsamplenames                             List sample names
       vcfstreamsort                              Sorts  the input (either stdin
                                                  or  file)  using  a  streaming
                                                  sort   algorithm.   Guarantees
                                                  that the positional  order  is
                                                  correct  provided out-of-order
                                                  variants are no more than  100
                                                  positions   in  the  VCF  file
                                                  apart.
       vcfwave                                    Realign reference  and  alter‐
                                                  nate alleles with WFA, parsing
                                                  out  the  `primitive'  alleles
                                                  into  multiple  VCF   records.
                                                  New records have IDs that ref‐
                                                  erence  the  source record ID.
                                                  Genotypes/samples are  handled
                                                  correctly.  Deletions generate
                                                  haploid/missing  genotypes  at
                                                  overlapping sites.
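
       The vcfglbound and vcfglxgt entries in the table above can be illustrated
       with a short Python sketch. It relies on the VCF convention that GL values
       are log10-scaled likelihoods, so dividing likelihoods by the maximum is the
       same as subtracting the maximum GL in log space. The function names
       bound_gls and call_genotype are hypothetical, and the genotype labels assume
       a diploid, biallelic site with the VCF ordering 0/0, 0/1, 1/1. This is a
       sketch, not the vcflib source.

```python
def bound_gls(gls):
    """Shift log10 genotype likelihoods so that the largest one becomes 0.

    Subtracting the maximum in log space corresponds to dividing the
    underlying likelihoods by the maximum likelihood.
    """
    best = max(gls)
    return [gl - best for gl in gls]


def call_genotype(gls, labels=("0/0", "0/1", "1/1")):
    """Return the genotype label with the highest likelihood.

    The default labels assume a diploid, biallelic site (an assumption of
    this sketch); ties resolve to the first maximum.
    """
    best_index = max(range(len(gls)), key=lambda i: gls[i])
    return labels[best_index]


if __name__ == "__main__":
    sample_gls = [-1.5, -0.5, -3.0]
    print(bound_gls(sample_gls))
    print(call_genotype(sample_gls))
```

       Bounding does not change which genotype is most likely, so the two steps
       can be applied in either order.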

   statistics
       statistics command                         description
       ──────────────────────────────────────────────────────────────────────────
       bFst                                       bFst is a Bayesian approach to
                                                  Fst.   Importantly  bFst   ac‐
                                                  counts for genotype uncertain‐
                                                  ty in the model using genotype
                                                  likelihoods.   For  a more de‐
                                                  tailed  description  see:   `A
                                                  Bayesian approach to inferring
                                                  population structure from dom‐
                                                  inant markers’ by Holsinger et
                                                  al. Molecular  Ecology Vol 11,
                                                  issue 7 2002.  The  likelihood
                                                  function  has been modified to
                                                  use genotype likelihoods  pro‐
                                                  vided   by   variant  callers.
                                                  There are five free parameters
                                                  estimated in the  model:  each
                                                  subpopulation’s   allele  fre‐
                                                  quency and Fis  (fixation  in‐
                                                  dex,  within  each  subpopula‐
                                                  tion), a  free  parameter  for
                                                  the  total population’s allele
                                                  frequency, and Fst.
       genotypeSummary                            Generates a table of  genotype
                                                  counts.   Summarizes  genotype
                                                  counts for bi-allelic SNVs and
                                                  indels.
       iHS                                        iHS calculates the  integrated
                                                  haplotype score which measures
                                                  the relative decay of extended
                                                  haplotype  homozygosity  (EHH)
                                                  for the reference and alterna‐
                                                  tive alleles at a  site  (see:
                                                  Voight et al. 2006, Szpiech &
                                                  Hernandez 2014).
       meltEHH
       pFst                                       pFst is  a  probabilistic  ap‐
                                                  proach  for  detecting differ‐
                                                  ences  in  allele  frequencies
                                                  between two populations.
       pVst                                       pVst calculates vst, a measure
                                                  of CNV stratification.
       permuteSmooth                              permuteSmooth is a method for
                                                  adding empirical p-values to
                                                  smoothed wcFst scores.
       plotHaps                                   plotHaps  provides the format‐
                                                  ted output that  can  be  used
                                                  with `bin/plotHaplotypes.R'.
       popStats                                   General   population   genetic
                                                  statistics for each SNP
       segmentFst                                 segmentFst   creates   genomic
                                                  segments  (bed  file)  for re‐
                                                  gions with high wcFst
       segmentIhs                                 Creates genomic segments  (bed
                                                  file)  for  regions  with high
                                                  wcFst
       sequenceDiversity                          The sequenceDiversity program
                                                  calculates two popular metrics
                                                  of haplotype diversity: pi and
                                                  extended haplotype
                                                  homozygosity (eHH).  Pi is
                                                  calculated using the Nei and
                                                  Li 1979 formulation.  eHH is a
                                                  convenient way to think about
                                                  haplotype diversity.  When
                                                  eHH = 0 all haplotypes in the
                                                  window are unique and when
                                                  eHH = 1 all haplotypes in the
                                                  window are identical.
       vcfaltcount                                Count the number of alternate
                                                  alleles in all records in the
                                                  VCF file
       vcfcountalleles                            Count alleles
       vcfgenosummarize                           Adds   summary  statistics  to
                                                  each record summarizing quali‐
                                                  ties reported in called  geno‐
                                                  types.   Uses:  RO  (reference
                                                  observation count), QR (quali‐
                                                  ty sum reference observations),
                                                  AO   (alternate    observation
                                                  count), QA (quality sum alter‐
                                                  nate observations)
       vcfgenotypecompare                         Adds statistics to the INFO
                                                  field of the VCF file describ‐
                                                  ing the amount of discrepancy
                                                  between the genotypes (GT) in
                                                  the VCF file and the genotypes
                                                  reported in the
                                                  <other-genotype-tag>.  Use this
                                                  after vcfannotategenotypes to
                                                  get correspondence statistics
                                                  for two VCFs.
       vcfgenotypes                               Report  the genotypes for each
                                                  sample, for  each  variant  in
                                                  the  VCF.  Convert the numeri‐
                                                  cal representation of genotypes
                                                  provided by the GT field to  a
                                                  human-readable  genotype  for‐
                                                  mat.
       vcfparsealts                               Alternate    allele    parsing
                                                  method.    This   method  uses
                                                  pairwise alignment of REF  and
                                                  ALTs  to  determine  component
                                                  allelic  primitives  for  each
                                                  alternate allele.
       vcfrandom                                  Generate a random VCF file
       vcfrandomsample                            Randomly  sample sites from an
                                                  input VCF file, which  may  be
                                                  provided  as stdin.  Scale the
                                                  sampling  probability  by  the
                                                  field  specified in KEY.  This
                                                  may be used to provide uniform
                                                  sampling  across  allele  fre‐
                                                  quencies, for instance.
       vcfroc                                     Generates  a  pseudo-ROC curve
                                                  using sensitivity  and  speci‐
                                                  ficity estimated against a pu‐
                                                  tative  truth set.  Threshold‐
                                                  ing is provided by  successive
                                                  QUAL cutoffs.
       vcfsitesummarize                           Summarize by site
       vcfstats                                   Prints  statistics about vari‐
                                                  ants in the input VCF file.
       wcFst                                      wcFst is Weir & Cockerham’s
                                                  Fst for two populations.  Neg‐
                                                  ative values are VALID; such
                                                  sites can be treated as having
                                                  zero Fst.  For more informa‐
                                                  tion see Evolution, Vol. 38,
                                                  No. 6 (Nov 1984).  Specifi‐
                                                  cally, wcFst uses equations 1,
                                                  2, 3, and 4.
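
       The two sequenceDiversity metrics described above can be illustrated with a small  sketch.   This  is
       not  the  vcflib  implementation:  the  function  names  are hypothetical, haplotypes are assumed to be
       equal-length strings of allele codes, eHH is taken as the fraction of haplotype pairs in the window that
       are  identical,  and  pi  as  the  mean number of pairwise differences per site.  The exact windowing and
       normalization used by sequenceDiversity may differ.

```python
from collections import Counter
from itertools import combinations
from math import comb


def ehh(haplotypes):
    """Fraction of distinct haplotype pairs in the window that are identical.

    Returns 0.0 when every haplotype is unique and 1.0 when all are identical.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    # Pairs drawn from within the same haplotype class are identical pairs.
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences per site (an assumed form of pi)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and of equal length")
    total_differences = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    return total_differences / (comb(n, 2) * length)
```

       For  example,  ehh(["0101", "0101", "0101"]) gives 1.0 and ehh(["00", "01", "10"]) gives 0.0, matching
       the two limiting cases in the sequenceDiversity description.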

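       The  genotype  conversion  that  vcfgenotypes  is  described  as performing can be sketched as follows.
       This is an illustration only, not the tool's code or its output format: decode_gt  is  a  hypothetical
       helper  that  relies  on  the VCF convention that allele index 0 is REF and indices 1 and up refer to the
       ALT alleles in order, and it leaves missing alleles (".") unchanged.

```python
import re


def decode_gt(gt, ref, alts):
    """Translate a numeric GT string such as "0/1" into allele sequences.

    gt   -- GT field value, with "/" (unphased) or "|" (phased) separators
    ref  -- REF allele string
    alts -- list of ALT allele strings, in the order given in the record
    """
    alleles = [ref] + list(alts)
    decoded = []
    # Splitting on a capturing group keeps the separators in the result.
    for token in re.split(r"([/|])", gt):
        if token in ("/", "|"):
            decoded.append(token)      # preserve phasing information
        elif token == ".":
            decoded.append(".")        # missing allele call
        else:
            decoded.append(alleles[int(token)])
    return "".join(decoded)
```

       With REF "A" and ALT alleles "G" and "T", decode_gt("1|2", "A", ["G", "T"]) returns "G|T".
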
SOURCE CODE

       See the source code repository at https://github.com/vcflib/vcflib

CREDIT

       Citations are the bread and butter of Science.  If you are using this software in your research and  want
       to support our future work, please cite the following publication:

       A  spectrum  of  free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2,
       hts-nim  and   slivar   (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009123).
       Garrison  E,  Kronenberg  ZN,  Dawson  ET, Pedersen BS, Prins P (2022), PLoS Comput Biol 18(5): e1009123.
       https://doi.org/10.1371/journal.pcbi.1009123

LICENSE

       Copyright 2011-2024 (C) Erik Garrison and vcflib contributors.  Copyright 2020-2024 (C) Pjotr Prins.  MIT
       licensed.

AUTHORS

       Erik Garrison and vcflib contributors.

vcflib                                                                                                 vcflib(1)