Provided by: libvcflib-tools_1.0.9+dfsg1-3build1_amd64 bug

NAME

       vcfwave - reduces complex alleles by pairwise alignment with BiWFA

SYNOPSIS

       vcfwave

DESCRIPTION

       vcfwave  reduces complex alleles into simpler primitive representation using pairwise alignment with BiW‐
       FA.

       Often variant callers are not perfect.  vcfwave with its companion tool vcfcreatemulti can take an exist‐
       ing VCF file that contains multiple complex overlapping and even nested alleles and, like Humpty  Dumpty,
       can  take them apart and put them together again in a more sane VCF output.  Thereby getting rid of false
       positives and often greatly simplifying the output.  We created these tools for the output from long-read
       pangenome genotypers - with 10K base pair realignments - and is used in  the  Human  Pangenome  Reference
       Consortium analyses (HPRC).

       vcfwave  realigns  reference  and  alternate alleles with the recently introduced super fast bi-wavefront
       aligner (WFA).  vcfwave parses out the original `primitive’ alleles into multiple VCF records and vcfcre‐
       atemulti puts them together again.  These tools can handle insertions, deletions, inversions  and  nested
       sequences.   In  both  tools information is tracked on original positions and genotypes are handled.  New
       records have IDs that reference the source record ID.  Deletion alleles will result in  haploid  (missing
       allele) genotypes overlapping the deleted region.

       A  typical  workflow will call vcfwave to realign all ALT alleles against the reference and spit out sim‐
       plified VCF records.  Next use a tool such as bcftools norm -m- to normalise the VCF  records  and  split
       out  multiple  ALT  alleles into separate VCF records.  Finally use vcfcreatemulti to create multi-allele
       VCF records again.

       PERFORMANCE:

       Unlike traditional aligners that run in quadratic time, the recently  introduced  wavefront  aligner  WFA
       runs  in  time  O(ns+s^2),  proportional to the sequence length n and the alignment score s, using O(s^2)
       memory (or O(s) using the ultralow/BiWFA mode).  Therefore WFA does not choke on longer alignments.

       Speed-wise vcfwave can still be faster.  See also the performance docs for some metrics and discussion.

       READING:

       See also the humpty dumpty companion tool vcfcreatemulti.

   Options
       -h, –help
              shows help message and exits.

       See more below.

EXIT VALUES

       0      Success

       not 0  Failure

EXAMPLES

       Current command line options:

              >>> head("vcfwave -h",26)
              >
              usage: vcfwave [options] [file]
              >
              Realign reference and alternate alleles with WFA, parsing out the
              'primitive' alleles into multiple VCF records. New records have IDs that
              reference the source record ID.  Genotypes/samples are handled
              correctly. Deletions generate haploid/missing genotypes at overlapping
              sites.
              >
              options:
                  -p, --wf-params PARAMS  use the given BiWFA params (default: 0,19,39,3,81,1)
                                          format=match,mismatch,gap1-open,gap1-ext,gap2-open,gap2-ext
                  -f, --tag-parsed FLAG   Annotate decomposed records with the source record position
                                          (default: ORIGIN).
                  -L, --max-length LEN    Do not manipulate records in which either the ALT or
                                          REF is longer than LEN (default: unlimited).
                  -K, --inv-kmer K        Length of k-mer to use for inversion detection sketching (default: 17).
                  -I, --inv-min LEN       Minimum allele length to consider for inverted alignment (default: 64).
                  -t, --threads N         Use this many threads for variant decomposition (default is 1).
                                          For most datasets threading may actually slow vcfwave down.
                  --quiet                 Do not display progress bar.
                  -d, --debug             Debug mode.
              >
              Note the -k,--keep-info switch is no longer in use and ignored.
              >
              Type: transformation

       vcfwave picks complex regions and simplifies nested alignments.  For example:

              >>> sh("grep 10158243 ../samples/10158243.vcf")
              grch38#chr4     10158243        >3655>3662      ACCCCCACCCCCACC ACC,AC,ACCCCCACCCCCAC,ACCCCCACC,ACA     60      .       AC=64,3,2,3,1;AF=0.719101,0.0337079,0.0224719,0.0337079,0.011236;AN=89;AT=>3655>3656>3657>3658>3659>3660>3662,>3655>3656>3660>3662,>3655>3660>3662,>3655>3656>3657>3658>3660>3662,>3655>3656>3657>3660>3662,>3655>3656>3661>3662;NS=45;LV=0     GT      0|0     1|1     1|1     1|0     5|1     0|4     0|1     0|1     1|1     1|1     1|1     1|1     1|1     1|1     1|1     4|3     1|1     1|1     1|1     1|0     1|0     1|0     1|0     1|1     1|1     1|4     1|1     1|1     3|0     1|0     1|1     0|1     1|1     1|1     2|1     1|2     1|1     1|1     0|1     1|1     1|1     1|0     1|2     1|1     0

       This aligns and adjusts the genotypes accordingly splitting into multiple records, one  for  each  unique
       allele found in the alignments:

              >>> sh("../build/vcfwave -L 1000 ../samples/10158243.vcf|grep -v ^\#")
              grch38#chr4     10158244        >3655>3662_1    CCCCCACCCCCAC   C       60      .       AC=1;AF=0.011236;AN=89;AT=>3655>3656>3657>3660>3662;NS=45;LV=0;ORIGIN=grch38#chr4:10158243;LEN=12;INV=0;TYPE=del        GT      0|0     0|0     0|0     0|0     1|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0
              grch38#chr4     10158244        >3655>3662_2    CCCCCACCCCCACC  C       60      .       AC=3;AF=0.033708;AN=89;AT=>3655>3656>3660>3662;NS=45;LV=0;ORIGIN=grch38#chr4:10158243;LEN=13;INV=0;TYPE=del     GT      0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     1|0     0|1     0|0     0|0     0|0     0|0     0|0     0|0     0|1     0|0     0
              grch38#chr4     10158245        >3655>3662_3    CCCCACCCCCACC   C       60      .       AC=64;AF=0.719101;AN=89;AT=>3655>3656>3657>3658>3659>3660>3662;NS=45;LV=0;ORIGIN=grch38#chr4:10158243;LEN=12;INV=0;TYPE=del     GT      0|0     1|1     1|1     1|0     0|1     0|0     0|1     0|1     1|1     1|1     1|1     1|1     1|1     1|1     1|1     0|0     1|1     1|1     1|1     1|0     1|0     1|0     1|0     1|1     1|1     1|0     1|1     1|1     0|0     1|0     1|1     0|1     1|1     1|1     0|1     1|0     1|1     1|1     0|1     1|1     1|1     1|0     1|0     1|1     0
              grch38#chr4     10158251        >3655>3662_4    CCCCACC C       60      .       AC=3;AF=0.033708;AN=89;AT=>3655>3656>3657>3658>3660>3662;NS=45;LV=0;ORIGIN=grch38#chr4:10158243;LEN=6;INV=0;TYPE=del    GT      0|0     0|0     0|0     0|0     0|0     0|1     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     1|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|1     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0
              grch38#chr4     10158256        >3655>3662_5    CC      C       60      .       AC=2;AF=0.022472;AN=89;AT=>3655>3660>3662;NS=45;LV=0;ORIGIN=grch38#chr4:10158243;LEN=1;INV=0;TYPE=del   GT      0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|1     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     1|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0
              grch38#chr4     10158257        >3655>3662_6    C       A       60      .       AC=1;AF=0.011236;AN=89;AT=>3655>3656>3657>3660>3662;NS=45;LV=0;ORIGIN=grch38#chr4:10158243;LEN=1;INV=0;TYPE=snp GT      0|0     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     .|.     0

./vcfwave -L 10000 ../samples/grch38#chr8_36353854-36453166.vcf > ../test/data/regression/vcfwave_4.vcf

                            run_stdout(“vcfwave    -L    10000    ../samples/grch38#chr8_36353854-36453166.vcf”,
                            ext=“vcf”) output in vcfwave_4.vcf

./vcfwave -L 10000 ../samples/grch38#chr4_10083863-10181258.vcf > ../test/data/regression/vcfwave_5.vcf

                            run_stdout(“vcfwave    -L    10000    ../samples/grch38#chr4_10083863-10181258.vcf”,
                            ext=“vcf”) output in vcfwave_5.vcf

              ## Inversions

              We can also handle inversions.
              This test case includes one that was introduced by building a variation graph with an inversion and then decomposing it into a VCF with `vg deconstruct` and finally "popping" the inversion variant with [`vcfbub`](https://github.com/pangenome/vcfbub).

              From

       a 281 >1>9 AGCCGGGGCAGAAAGTTCTTCCTTGAATGTGGTCATCTGCATTTCAGCTCAGGAATCCTGCAAAAGACAG CTGTCTTTTGCAGGATTCCTGT‐
       GCTGAAATGCAGATGACCGCATTCAAGGAAGAACTATCTGCCCCGGCT                           60                           .
       AC=1;AF=1;AN=1;AT=>1>2>3>4>5>6>7>8>9,>1<8>10<6>11<4>12<2>9;NS=1;LV=0 GT 1

              To

              ```python
              >>> sh("../build/vcfwave ../samples/inversion.vcf|grep -v ^\#|head -3")
              a       293     >1>9_1  A       T       60      .       AC=1;AF=1.000000;AN=1;AT=>1>2>3>4>5>6>7>8>9;NS=1;LV=0;ORIGIN=a:281;LEN=1;INV=1;TYPE=snp GT      1
              a       310     >1>9_2  T       C       60      .       AC=1;AF=1.000000;AN=1;AT=>1>2>3>4>5>6>7>8>9;NS=1;LV=0;ORIGIN=a:281;LEN=1;INV=1;TYPE=snp GT      1
              a       329     >1>9_3  T       A       60      .       AC=1;AF=1.000000;AN=1;AT=>1>2>3>4>5>6>7>8>9;NS=1;LV=0;ORIGIN=a:281;LEN=1;INV=1;TYPE=snp GT      1

LICENSE

       Copyright 2022-2023 (C) Erik Garrison, Pjotr Prins and vcflib contributors.  MIT licensed.

AUTHORS

       Erik Garrison, Pjotr Prins and other vcflib contributors.

vcfwave (vcflib)                                                                                      VCFWAVE(1)