Provided by: libvcflib-tools_1.0.9+dfsg1-3build1_amd64

NAME

       vcflib index

DESCRIPTION

       vcflib  contains tools and libraries for dealing with the Variant Call Format (VCF) which is a flat-file,
       tab-delimited textual format intended to describe reference-indexed variations between individuals.

       VCF provides a common interchange format for the description of variation in individuals and  populations
       of  samples, and has become the de facto standard reporting format for a wide array of genomic variant
       detectors.

       vcflib provides methods to manipulate and interpret sequence variation as it can be described by VCF.  It
       is both:

       • an API for parsing and operating on records of genomic variation as it can be described by the VCF for‐
         mat,

       • and a collection of command-line utilities for executing complex manipulations on VCF files.

       The API itself provides a quick and extremely permissive method to read and write VCF files.   Extensions
       and  applications of the library provided in the included utilities (*.cpp) comprise the vast bulk of the
       library’s utility for most users.
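
       As a rough illustration of the flat-file, tab-delimited layout described above, the Python sketch
       below splits one VCF data line into its columns.  It is not the vcflib API: the function name
       parse_vcf_line and the sample line are hypothetical, and header lines, missing values and FORMAT
       handling are left out.

```python
# Illustrative sketch only; this is NOT the vcflib API.
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns plus sample columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),  # multiple alternate alleles are comma-separated
        "QUAL": qual,
        "FILTER": flt,
        # INFO entries are either key=value pairs or bare flags
        "INFO": dict(
            kv.split("=", 1) if "=" in kv else (kv, True)
            for kv in info.split(";")
        ),
        "SAMPLES": fields[9:],  # fields[8] is FORMAT when samples are present
    }

# Hypothetical record with two samples
record = parse_vcf_line("chr20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DB\tGT\t0|0\t1|0")
print(record["POS"], record["ALT"], record["INFO"])
```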

   filter
       filter command                             description
       ──────────────────────────────────────────────────────────────────────────
       vcffilter                                  Filter the specified VCF file
                                                  using the given set of filters
       vcfuniq                                    List  unique genotypes.  Simi‐
                                                  lar to GNU uniq, but aimed  at
                                                  VCF  records.  vcfuniq removes
                                                  records which  have  the  same
                                                  position,  ref, and alt as the
                                                  previous record  on  a  sorted
                                                  VCF  file.   Note that it does
                                                  not  adjust/combine  genotypes
                                                  in   the  output,  but  simply
                                                  takes the first  record.   See
                                                  also  vcfcreatemulti  for com‐
                                                  bining records.
       vcfuniqalleles                             List unique alleles.  For each
                                                  record,  remove  any duplicate
                                                  alternate  alleles  that   may
                                                  have   resulted  from  merging
                                                  separate VCF files.
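
       The behaviour described for vcfuniq can be sketched in a few lines of Python.  This is only an
       approximation of the idea, not the tool's implementation: records are modelled as hypothetical
       tuples, and the chromosome is included in the comparison key as an assumption.

```python
# Sketch of "keep the first of consecutive identical records" on sorted input.
# Records are hypothetical tuples: (chrom, pos, ref, alt, genotype).
def uniq_records(records):
    """Yield records, skipping any whose key equals the previous record's key."""
    previous_key = None
    for rec in records:
        key = (rec[0], rec[1], rec[2], rec[3])  # chrom, pos, ref, alt
        if key != previous_key:
            yield rec  # first record with this key is kept unchanged
        previous_key = key

sorted_records = [
    ("chr1", 100, "A", "G", "0/1"),
    ("chr1", 100, "A", "G", "1/1"),  # duplicate site and alleles: dropped
    ("chr1", 100, "A", "T", "0/1"),  # same site, different alt: kept
    ("chr1", 250, "C", "T", "0/0"),
]
unique = list(uniq_records(sorted_records))
print(len(unique))
```

       As in the description above, genotypes are not merged; the first record simply wins.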

   metrics
       metrics command                            description
       ──────────────────────────────────────────────────────────────────────────
       vcfcheck                                   Validate integrity and identi‐
                                                  ty of  the  VCF  by  verifying
                                                  that   the  VCF  record’s  REF
                                                  matches  a   given   reference
                                                  file.
       vcfdistance                                Adds a tag to each variant
                                                  record which indicates the
                                                  distance to the nearest
                                                  variant (defaults to
                                                  BasesToClosestVariant if no
                                                  custom tag name is given).
       vcfentropy                                 Annotate VCF records with the
                                                  Shannon entropy of flanking
                                                  sequence.  Annotates the
                                                  output VCF file with, for each
                                                  record, EntropyLeft,
                                                  EntropyRight, EntropyCenter,
                                                  which are the entropies of the
                                                  sequence of the given window
                                                  size to the left, right, and
                                                  center of the record.  Also
                                                  adds EntropyRef and EntropyAlt
                                                  for each alt.
       vcfhetcount                                Calculate  the  heterozygosity
                                                  rate: count the number of  al‐
                                                  ternate  alleles  in heterozy‐
                                                  gous genotypes in all  records
                                                  in the vcf file
       vcfhethomratio                             Generates  the  het/hom  ratio
                                                  for  each  individual  in  the
                                                  file
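
       To make the flanking-entropy idea behind vcfentropy concrete, here is a small Python sketch.  It
       assumes entropy is computed in bits over single-base frequencies and that the windows exclude the
       variant position itself; the real tool may define its windows and units differently, and the
       function names and reference sequence are made up for the example.

```python
import math
from collections import Counter

def shannon_entropy(sequence):
    """Shannon entropy, in bits, of the single-base frequencies in a sequence."""
    total = len(sequence)
    if total == 0:
        return 0.0
    counts = Counter(sequence)
    # sum of p * log2(1/p) over the observed bases
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def flanking_entropies(reference, pos, window):
    """Entropy of the `window` bases to the left and right of 0-based index `pos`."""
    left = reference[max(0, pos - window):pos]
    right = reference[pos + 1:pos + 1 + window]
    return shannon_entropy(left), shannon_entropy(right)

reference = "AAAAAAAACGTACGT"  # made-up reference sequence
left, right = flanking_entropies(reference, pos=7, window=4)
print(left, right)
```

       A homopolymer flank gives an entropy of zero, while a flank using all four bases equally often
       gives the maximum of two bits.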

   phenotype
       phenotype command                          description
       ──────────────────────────────────────────────────────────────────────────
       permuteGPAT++                              permuteGPAT++  is a method for
                                                  adding empirical p-values to a
                                                  GPAT++ score.

   genotype
       genotype command                           description
       ──────────────────────────────────────────────────────────────────────────
       abba-baba                                  abba-baba calculates the tree
                                                  pattern for four individuals.
                                                  This tool assumes reference is
                                                  ancestral and ignores
                                                  non-abba-baba sites.  The
                                                  output is a Boolean value
                                                  (1 = true, 0 = false) for abba
                                                  and baba.  The tree argument
                                                  should be specified from the
                                                  most basal taxon to the most
                                                  derived.
       hapLrt                                     HapLRT is a  likelihood  ratio
                                                  test  for  haplotype  lengths.
                                                  The lengths are  modeled  with
                                                  an  exponential  distribution.
                                                  The sign denotes if the target
                                                  has longer haplotypes  (1)  or
                                                  the background (-1).
       normalize-iHS                              Normalizes iHS or XP-EHH
                                                  scores.

   transformation
       transformation command                     description
       ──────────────────────────────────────────────────────────────────────────
       dumpContigsFromHeader                      Dump contigs from header
       smoother                                   smoother is a method for
                                                  window smoothing many of the
                                                  GPAT++ formats.
       vcf2dag                                    Modify VCF to be able to build
                                                  a directed acyclic graph (DAG)
       vcf2fasta                                  Generates sample_seq:N.fa  for
                                                  each   sample,  reference  se‐
                                                  quence, and chromosomal copy N
                                                  in [0,1... ploidy].  Each  se‐
                                                  quence  in  the  fasta file is
                                                  named using the  same  pattern
                                                  used for the file name, allow‐
                                                  ing them to be combined.
       vcf2tsv                                    Converts VCF to per-allele or
                                                  per-genotype tab-delimited
                                                  format, using null string to
                                                  replace empty values in the
                                                  table.  Specifying -g will
                                                  output one line per sample
                                                  with genotype information.
                                                  When there is more than one
                                                  alt allele there will be
                                                  multiple rows, one for each
                                                  allele, and the info will
                                                  match the `A' index.
       vcfaddinfo                                 Adds info fields from the sec‐
                                                  ond file which are not present
                                                  in the first vcf file.
       vcfafpath                                  Display genotype paths
       vcfallelicprimitives                       WARNING:  this tool is consid‐
                                                  ered legacy and  is  only  re‐
                                                  tained  for  older  workflows.
                                                  It will emit a  warning!  Even
                                                  though  it can use the WFA you
                                                  should use vcfwave instead.
       vcfannotate                                Intersect the records  in  the
                                                  VCF file with targets provided
                                                  in  a BED file.  Intersections
                                                  are done on the reference  se‐
                                                  quences  in  the VCF file.  If
                                                  no VCF filename  is  specified
                                                  on the command line (last
                                                  argument), the VCF is read
                                                  from stdin.
       vcfannotategenotypes                       Examine   genotype  correspon‐
                                                  dence.  Annotate genotypes  in
                                                  the  first file with genotypes
                                                  in the second adding the geno‐
                                                  type as another flag  to  each
                                                  sample field in the first
                                                  file.  annotation-tag is the
                                                  name of the sample flag which
                                                  is added to store the
                                                  annotation.  Also adds a
                                                  `has_variant' flag for sites
                                                  where the second file has a
                                                  variant.
       vcfbreakmulti                              If multiple alleles are speci‐
                                                  fied in a single record, break
                                                  the   record   into   multiple
                                                  lines, preserving  allele-spe‐
                                                  cific INFO fields.
       vcfcat                                     Concatenates VCF files
       vcfclassify                                Creates  a  new VCF where each
                                                  variant is  tagged  by  allele
                                                  class: snp, ts/tv, indel, mnp
       vcfcleancomplex                            Removes reference-matching se‐
                                                  quence  from  complex  alleles
                                                  and adjusts records to reflect
                                                  positional change.
       vcfcombine                                 Combine VCF files  positional‐
                                                  ly,   combining  samples  when
                                                  sites and alleles are  identi‐
                                                  cal.   Any number of VCF files
                                                  may  be  combined.   The  INFO
                                                  field  and  other  columns are
                                                  taken from one  of  the  files
                                                  which    are   combined   when
                                                  records  in   multiple   files
                                                  match.    Alleles   must  have
                                                  identical ordering to be  com‐
                                                  bined  into  one  record.   If
                                                  they do not, multiple  records
                                                  will be emitted.
       vcfcommonsamples                           Generates  each  record in the
                                                  first file,  removing  samples
                                                  not present in the second
       vcfcreatemulti                             Go through sorted VCF and when
                                                  overlapping alleles are repre‐
                                                  sented     across     multiple
                                                  records,  merge  them  into  a
                                                  single  multi-ALT record.  See
                                                  the documentation for more in‐
                                                  formation.
       vcfecho                                    Echo VCF to stdout (simple de‐
                                                  mo)
       vcfevenregions                             Generates a list  of  regions,
                                                  e.g. chr20:10..30   using  the
                                                  variant  density   information
                                                  provided  in  the  VCF file to
                                                  ensure that the  regions  have
                                                  even   numbers   of  variants.
                                                  This can be used to reduce the
                                                  variance in runtime when
                                                  dividing variant detection or
                                                  genotyping by genomic
                                                  coordinates.
       vcffixup                                   Generates a VCF  stream  where
                                                  AC  and NS have been generated
                                                  for each record  using  sample
                                                  genotypes
       vcfflatten                                 Removes multi-allelic sites by
                                                  picking the most common alter‐
                                                  nate.   Requires  allele  fre‐
                                                  quency specification `AF'  and
                                                  use  of `G' and `A' to specify
                                                  the fields which vary  accord‐
                                                  ing to the Allele or Genotype.
                                                  VCF  file  may be specified on
                                                  the command line or  piped  as
                                                  stdin.
       vcfgeno2alleles                            Modifies the genotypes field
                                                  to provide the literal alleles
                                                  rather than indexes
       vcfgeno2haplo                              Convert genotype-based  phased
                                                  alleles within --window-size
                                                  into haplotype alleles.   Will
                                                  break  haplotype  construction
                                                  when  encountering  non-phased
                                                  genotypes on input.
       vcfgenosamplenames                         Get sample names
       vcfglbound                                 Adjust GLs so that the maximum
                                                  GL  is  0  by dividing all GLs
                                                  for each sample by the max.
       vcfglxgt                                   Set genotypes using the  maxi‐
                                                  mum  genotype  likelihood  for
                                                  each sample.
       vcfindex                                   Adds an index  number  to  the
                                                  INFO field (id=position)
       vcfinfo2qual                               Sets  QUAL from info field tag
                                                  keyed by [key].  The VCF  file
                                                  may  be  omitted and read from
                                                  stdin.   The  average  of  the
                                                  field  is  used if it contains
                                                  multiple values.
       vcfinfosummarize                           Take annotations given in  the
                                                  per-sample  fields and add the
                                                  mean, median, min, or  max  to
                                                  the site-level INFO.
       vcfintersect                               VCF set analysis
       vcfkeepgeno                                Reduce  file  size by removing
                                                  FORMAT fields  not  listed  on
                                                  the  command  line from sample
                                                  specifications in the output
       vcfkeepinfo                                To decrease file  size  remove
                                                  INFO  fields not listed on the
                                                  command line
       vcfkeepsamples                             Outputs each record in the vcf
                                                  file,  removing  samples   not
                                                  listed on the command line
       vcfld                                      Compute LD
       vcfleftalign                               Left-align  indels and complex
                                                  variants in the input using  a
                                                  pairwise   ref/alt   alignment
                                                  followed by a heuristic, iter‐
                                                  ative left realignment process
                                                  that shifts indel  representa‐
                                                  tions  to their absolute left‐
                                                  most (5’) extent.
       vcflength                                  Add length info field
       vcfnullgenofields                          Makes  the  FORMAT  for   each
                                                  variant  line  the  same (uses
                                                  all  the  FORMAT  fields   de‐
                                                  scribed in the header).  Fills
                                                  out per-sample fields to match
                                                  FORMAT.   Expands GT values of
                                                  `.' with number of alleles
                                                  based on ploidy (e.g. `./.'
                                                  for diploid).
       vcfnumalt                                  Outputs a VCF stream where
                                                  NUMALT has been generated for
                                                  each record using sample
                                                  genotypes.
       vcfoverlay                                 Overlay records in  the  input
                                                  vcf files with order as prece‐
                                                  dence.
       vcfprimers                                 For  each  VCF record, extract
                                                  the  flanking  sequences,  and
                                                  write  them to stdout as FASTA
                                                  records  suitable  for  align‐
                                                  ment.
       vcfqual2info                               Puts  QUAL  into an info field
                                                  tag keyed by [key].
       vcfremap                                   For each alternate allele, at‐
                                                  tempt to realign  against  the
                                                  reference   with  lowered  gap
                                                  open penalty.  If  realignment
                                                  is  possible, adjust the cigar
                                                  and reference/alternate  alle‐
                                                  les.   Observe  how  different
                                                  alignment parameters,  includ‐
                                                  ing context and entropy-depen‐
                                                  dent  ones,  influence variant
                                                  classification and interpreta‐
                                                  tion.
       vcfremoveaberrantgenotypes                 Strips samples which are
                                                  homozygous but have
                                                  observations implying
                                                  heterozygosity.  Removes
                                                  samples for which the reported
                                                  genotype (GT) and observation
                                                  counts (AO, RO) disagree.
       vcfremovesamples                           Outputs each record in the vcf
                                                  file,  removing samples listed
                                                  on the command line
       vcfsample2info                             Take annotations given in  the
                                                  per-sample  fields and add the
                                                  mean, median, min, or  max  to
                                                  the site-level INFO.
       vcfsamplediff                              Establish putative somatic
                                                  variants using reported
                                                  differences between germline
                                                  and somatic samples.  Tags
                                                  each record where the listed
                                                  sample genotypes differ with
                                                  <tag>.  The first sample is
                                                  assumed to be germline, the
                                                  second somatic.  Each record
                                                  is tagged with
                                                  <tag>={germline,somatic,loh}
                                                  to specify the type of variant
                                                  given the genotype difference
                                                  between the two samples.
       vcfsamplenames                             List sample names
       vcfstreamsort                              Sorts the input (either  stdin
                                                  or  file)  using  a  streaming
                                                  sort  algorithm.    Guarantees
                                                  that  the  positional order is
                                                  correct provided  out-of-order
                                                  variants  are no more than 100
                                                  positions  in  the  VCF   file
                                                  apart.
       vcfwave                                    Realign  reference  and alter‐
                                                  nate alleles with WFA, parsing
                                                  out  the  `primitive'  alleles
                                                  into   multiple  VCF  records.
                                                  New records have IDs that ref‐
                                                  erence the source  record  ID.
                                                  Genotypes/samples  are handled
                                                  correctly.  Deletions generate
                                                  haploid/missing  genotypes  at
                                                  overlapping sites.
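
       The streaming sort mentioned for vcfstreamsort can be approximated with a bounded heap.  The
       Python sketch below is an illustration under stated assumptions, not the tool's code: it reads
       "positions apart" as record offsets in the file, models records as hypothetical (chrom, pos)
       tuples compared lexicographically, and uses a small window for the demonstration.

```python
import heapq

def stream_sort(records, window=100):
    """Yield records in sorted order using memory bounded by `window`.

    Correct only if every record is at most `window` places away from
    its position in the fully sorted stream (the assumption stated above).
    """
    heap = []
    for rec in records:
        heapq.heappush(heap, rec)
        # Once window + 1 records are buffered, the smallest one cannot be
        # preceded by anything still unread, so it is safe to emit.
        if len(heap) > window:
            yield heapq.heappop(heap)
    # Flush whatever is left in the buffer, smallest first.
    while heap:
        yield heapq.heappop(heap)

# Hypothetical, slightly out-of-order input
records = [("chr1", 30), ("chr1", 10), ("chr1", 20), ("chr1", 50), ("chr1", 40)]
ordered = list(stream_sort(records, window=2))
print(ordered)
```

       Records displaced further than the window would be emitted out of order, which matches the
       guarantee quoted in the table only within the stated limit.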

   statistics
       statistics command                         description
       ──────────────────────────────────────────────────────────────────────────
       bFst                                       bFst is a Bayesian approach to
                                                  Fst.    Importantly  bFst  ac‐
                                                  counts for genotype uncertain‐
                                                  ty in the model using genotype
                                                  likelihoods.  For a  more  de‐
                                                  tailed   description  see:  `A
                                                  Bayesian approach to inferring
                                                  population structure from dom‐
                                                  inant markers’ by Holsinger et
                                                  al. Molecular Ecology Vol  11,
                                                  issue  7 2002.  The likelihood
                                                  function has been modified  to
                                                  use  genotype likelihoods pro‐
                                                  vided  by   variant   callers.
                                                  There are five free parameters
                                                  estimated  in  the model: each
                                                  subpopulation’s  allele   fre‐
                                                  quency  and  Fis (fixation in‐
                                                  dex,  within  each  subpopula‐
                                                  tion),  a  free  parameter for
                                                  the total population’s  allele
                                                  frequency, and Fst.
       genotypeSummary                            Generates  a table of genotype
                                                  counts.   Summarizes  genotype
                                                  counts for bi-allelic SNVs and
                                                  indels.
       iHS                                        iHS  calculates the integrated
                                                  haplotype score which measures
                                                  the relative decay of extended
                                                  haplotype  homozygosity  (EHH)
                                                  for the reference and alterna‐
                                                  tive  alleles  at a site (see:
                                                  Voight et al. 2006, Szpiech &
                                                  Hernandez 2014).
       meltEHH
       pFst                                       pFst  is  a  probabilistic ap‐
                                                  proach for  detecting  differ‐
                                                  ences  in  allele  frequencies
                                                  between two populations.
       pVst                                       pVst calculates vst, a measure
                                                  of CNV stratification.
       permuteSmooth                              permuteSmooth is a method for
                                                  adding empirical p-values to
                                                  smoothed wcFst scores.
       plotHaps                                   plotHaps provides the  format‐
                                                  ted  output  that  can be used
                                                  with `bin/plotHaplotypes.R'.
       popStats                                   General   population   genetic
                                                  statistics for each SNP
       segmentFst                                 segmentFst   creates   genomic
                                                  segments (bed  file)  for  re‐
                                                  gions with high wcFst
       segmentIhs                                 Creates genomic segments (bed
                                                  file) for regions with high
                                                  iHS
       sequenceDiversity                          The sequenceDiversity program
                                                  calculates two popular metrics
                                                  of haplotype diversity: pi and
                                                  extended haplotype
                                                  homozygosity (eHH).  Pi is
                                                  calculated using the Nei and
                                                  Li 1979 formulation.  eHH is a
                                                  convenient way to think about
                                                  haplotype diversity.  When
                                                  eHH = 0 all haplotypes in the
                                                  window are unique and when
                                                  eHH = 1 all haplotypes in the
                                                  window are identical.
       vcfaltcount                                count the number of  alternate
                                                  alleles  in all records in the
                                                  vcf file
       vcfcountalleles                            Count alleles
       vcfgenosummarize                           Adds  summary  statistics   to
                                                  each record summarizing quali‐
                                                  ties  reported in called geno‐
                                                  types.   Uses:  RO  (reference
                                                  observation count), QR (quali‐
                                                  ty   sum   reference  observa‐
                                                  tions), AO (alternate observa‐
                                                  tion  count),  QA (quality sum
                                                  alternate observations)
       vcfgenotypecompare                         Adds statistics  to  the  INFO
                                                  field of the vcf file describ‐
                                                  ing  the amount of discrepancy
                                                  between the genotypes (GT)  in
                                                  the vcf file and the genotypes
                                                  reported  in  the  .  Use this
                                                  after vcfannotategenotypes  to
                                                  get  correspondence statistics
                                                  for two vcfs.
       vcfgenotypes                               Report the genotypes for  each
                                                  sample,  for  each  variant in
                                                  the VCF.  Convert the  numeri‐
                                                  cal  representation  of  geno‐
                                                  types provided by the GT field
                                                  to  a human-readable  genotype
                                                  format.
       vcfparsealts                               Alternate    allele    parsing
                                                  method.   This   method   uses
                                                  pairwise  alignment of REF and
                                                  ALTs  to  determine  component
                                                  allelic  primitives  for  each
                                                  alternate allele.
       vcfrandom                                  Generate a random VCF file
       vcfrandomsample                            Randomly sample sites from  an
                                                  input  VCF  file, which may be
                                                  provided as stdin.  Scale  the
                                                  sampling  probability  by  the
                                                  field specified in KEY.   This
                                                  may be used to provide uniform
                                                  sampling  across  allele  fre‐
                                                  quencies, for instance.
       vcfroc                                     Generates a  pseudo-ROC  curve
                                                  using  sensitivity  and speci‐
                                                  ficity estimated against a pu‐
                                                  tative truth set.   Threshold‐
                                                  ing  is provided by successive
                                                  QUAL cutoffs.
       vcfsitesummarize                           Summarize by site
       vcfstats                                   Prints statistics about  vari‐
                                                  ants in the input VCF file.
       wcFst                                      wcFst  is  Weir  & Cockerham’s
                                                  Fst for two populations.  Neg‐
                                                  ative  values  are VALID; such
                                                  sites  can  be treated as zero
                                                  Fst.  For more information see
                                                  Evolution,  Vol.  38,  No.  6,
                                                  Nov 1984.  Specifically, wcFst
                                                  uses equations 1, 2, 3 and 4.
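
       As  a  rough illustration of the conversion that the vcfgenotypes entry describes, the sketch below maps
       a numerical GT value onto allele strings.  It is written in Python and is not taken from vcflib; the
       function name decode_genotype and the output layout are assumptions made for this example and are not
       meant to reproduce the tool's actual output format.

```python
def decode_genotype(gt, ref, alts):
    """Translate a VCF GT string such as '0/1' into allele strings.

    Hypothetical helper for illustration only; not part of vcflib.
    """
    alleles = [ref] + list(alts)        # index 0 is REF, 1..n are the ALTs
    sep = "|" if "|" in gt else "/"     # '|' marks phased, '/' unphased calls
    decoded = []
    for idx in gt.split(sep):
        # '.' denotes a missing allele call and is passed through unchanged
        decoded.append("." if idx == "." else alleles[int(idx)])
    return sep.join(decoded)


print(decode_genotype("0/1", "A", ["G"]))        # A/G
print(decode_genotype("1|2", "A", ["G", "T"]))   # G|T
print(decode_genotype("./.", "A", ["G"]))        # ./.
```

       Index 0 selects REF and indices 1..n select the ALT alleles in order, following the VCF definition of GT.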

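       The two endpoints quoted for eHH in the sequenceDiversity entry (0 when all haplotypes in the window are
       unique, 1 when all are identical) are consistent with defining eHH as the fraction  of  haplotype  pairs
       that  are identical.  The Python sketch below uses that definition, together with the mean number of
       pairwise differences as an estimate of pi.  Both definitions and the function names are assumptions made
       for illustration; the formulas vcflib actually uses (including any per-site normalisation) may differ.

```python
from collections import Counter
from itertools import combinations
from math import comb


def ehh(haplotypes):
    """Fraction of haplotype pairs in the window that are identical.

    Assumed definition for illustration; needs at least two haplotypes.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)                      # size of each haplotype class
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)


def pairwise_pi(haplotypes):
    """Mean number of differing sites over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes")
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)


same = ["0101", "0101", "0101"]        # all haplotypes identical
unique = ["00", "01", "10", "11"]      # all haplotypes distinct
print(ehh(same), ehh(unique))          # 1.0 0.0
print(pairwise_pi(same))               # 0.0
```

       Haplotypes are given here as equal-length strings of allele codes, one string per chromosome copy.
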
SOURCE CODE

       See the source code repository at https://github.com/vcflib/vcflib

CREDIT

       Citations  are the bread and butter of Science.  If you are using this software in your research and want
       to support our future work, please cite the following publication:

       A spectrum of free software tools for processing the VCF variant call format:  vcflib,  bio-vcf,  cyvcf2,
       hts-nim   and   slivar  (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009123).
       Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P (2022),  PLoS  Comput  Biol  18(5):  e1009123.
       https://doi.org/10.1371/journal.pcbi.1009123

LICENSE

       Copyright 2011-2023 (C) Erik Garrison and vcflib contributors.  MIT licensed.

AUTHORS

       Erik Garrison and vcflib contributors.

vcflib                                                                                                 vcflib(1)