Provided by: qtltools_1.3.1+dfsg-4build3_amd64

NAME

       QTLtools trans - trans QTL analysis

SYNOPSIS

       QTLtools  trans  --vcf  [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]  --bed  quantifications.bed.gz  [--nominal  |
       --permute | --sample integer | --adjust in.txt] --out output.txt [OPTIONS]

DESCRIPTION

       This mode maps trans (distal) quantitative trait loci (QTLs) that affect  the  phenotypes,  using  linear
       regression.   The  method is detailed in <https://www.nature.com/articles/ncomms15452>.  We first regress
       out the provided covariates from the phenotype data, followed by running the  linear  regression  between
       the  phenotype residuals and the genotype.  If --normal and --cov are provided at the same time, then the
       residuals after the covariate correction are rank normal transformed.  This mode also incorporates an
       efficient permutation scheme.  You can run a nominal pass (--nominal) listing all genotype-phenotype
       associations below a certain threshold, a permutation pass (--permute or --sample no_genes_to_sample) to
       empirically characterize the null distribution of associations, or a pass that adjusts the nominal
       p-values based on permutations (--adjust).

       In the full permutation scheme (--permute) we  permute  all  phenotypes  using  the  same  random  number
       sequence  to  preserve the correlation structure.  By doing so, the only association we actually break in
       the data is between the genotype and the phenotype data.  Then, we proceed with  a  standard  association
       scan  identical to the one used in the nominal pass.  In practice, we repeat this for 100 permutations of
       the phenotype data.  Subsequently, we can proceed with FDR correction by ranking all the nominal p-values
       in ascending order and by counting how many p-values  in  the  permuted  data  sets  are  smaller.   This
       provides  an  FDR  estimate:  if we have 500 p-values in the permuted data sets that are smaller than the
       100th smallest nominal p-value, we can then assume that the FDR for the 100 first associations is  around
       5% (=500/(100 × 100)).
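
       The counting step above can be sketched in shell as follows.  This is only an illustration, not part of
       QTLtools: the function name estimate_fdr and its two input files (plain text, one p-value per line) are
       assumptions made for this example.

```shell
# estimate_fdr NOMINAL PERMUTED N_PERM K   (hypothetical helper, not a QTLtools command)
# NOMINAL and PERMUTED are plain-text files with one p-value per line.
# Prints the estimated FDR for the K most significant nominal associations.
estimate_fdr() {
    nominal=$1 permuted=$2 n_perm=$3 k=$4
    # p-value of the K-th most significant nominal association
    cutoff=$(sort -g "$nominal" | sed -n "${k}p")
    # Count permuted p-values below the cutoff, then divide by
    # (number of permutations x number of nominal associations).
    awk -v c="$cutoff" -v n="$n_perm" -v k="$k" \
        '$1 + 0 < c + 0 { m++ } END { printf "%.4f\n", m / (n * k) }' "$permuted"
}

# With the numbers from the text (500 smaller permuted p-values, 100 permutations,
# 100 nominal associations) the same formula gives 500 / (100 * 100) = 0.05.
# Usage: estimate_fdr nominal.pvalues.txt permuted.pvalues.txt 100 100
```

       Because only associations below --threshold are written to the hits files, this count is meaningful only
       while the cutoff stays at or below the threshold used in both the nominal and the permutation runs.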

       To  enable  fast screening in trans, we also designed an approximation of the method described just above
       based on what we already do in cis.  To make it possible, we assume that the phenotypes  are  independent
       and  normally  distributed  (which can be enforced with --normal).  The idea is that since all phenotypes
       are normally distributed, effectively they are the same, and  also  the  cis  region  removed  from  each
       phenotype is so small compared to the rest of the genome that its phenotype-specific impact is negligible.
       Hence the number of and the correlation amongst variants for each phenotype is  approximately  the  same,
       and  each  phenotype  is  approximately  the  same;  thus  we can run permutations with a small number of
       phenotypes rather than all, which drastically decreases the computational burden, and the null
       distribution  generated  can  be  applied  to  all phenotypes.  The implementation draws from the null by
       permuting some randomly chosen phenotypes, testing for  associations  with  all  variants  in  trans  and
       storing the smallest p-value.  We repeat this many times (typically 1000), effectively building a
       null distribution of the strongest associations for a single phenotype.  We then make it continuous by
       fitting  a  beta  distribution as we do in cis and use it to adjust every nominal p-value coming from the
       initial pass for the number of variants being tested.  To correct for  the  number  of  phenotypes  being
       tested,  we  estimate FDR as we do in cis; that is from the best adjusted p-values per phenotype (one per
       phenotype).  This also gives an adjusted p-value threshold that we use to identify all  phenotype-variant
       pairs  that are whole-genome significant.  In our experiments, this approach gives similar results to the
       full permutation scheme both in terms of FDR estimates and number of discoveries, while running faster.

       Since linear regression assumes normally distributed data, we highly recommend using the --normal option
       to rank normal transform the phenotype quantifications in order to avoid false positive associations  due
       to  outliers.   If  you are using the approximate permutation scheme (--sample) you MUST use the --normal
       option or make sure that your phenotypes are normally distributed.

OPTIONS

       --vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
              Genotypes in VCF/BCF format, or another molecular phenotype in BED format.  If there is a DS field
              in  the  genotype  FORMAT  of  a  variant  (dosage  of  the  genotype  calculated  from   genotype
              probabilities, e.g. after imputation), then this is used as the genotype.  If there is only the GT
              field in the genotype FORMAT then this is used and it is converted to a dosage.  REQUIRED.

       --bed quantifications.bed.gz
              Molecular phenotype quantifications in BED format.  REQUIRED.

       --out output.txt
              Output file.  REQUIRED.

       --cov covariates.txt
              Covariates to correct the phenotype data with.

       --normal
              Rank  normal  transform  the  phenotype  data  so  that  each  phenotype  is normally distributed.
              RECOMMENDED.

       --window integer
              Size of the cis window to remove flanking each phenotype's start position.  DEFAULT=5000000.

       --threshold float
              P-value threshold below which hits are reported.  Give 1.0 to print everything, which may generate
              a huge file.  When --adjust  is  provided,  this  threshold  applies  to  the  adjusted  p-values.
              DEFAULT=1e-5.

       --bins integer
              Number of bins to use to categorize all p-values above --threshold.  DEFAULT=1000.

       --nominal
              Calculate  the nominal p-value for the genotype-phenotype associations and print out the ones that
              pass the provided threshold.  Mutually exclusive with --permute, --sample and --adjust.

       --permute
              Permute all phenotypes together, once.  For multiple permutations you need to  change  the  random
              seed using --seed for each permutation.  Mutually exclusive with --nominal, --sample and --adjust.

       --sample integer
              Permute  randomly  chosen phenotypes integer times.  Mutually exclusive with --nominal, --permute,
              --adjust, and --chunk.

       --adjust filename
              Test and adjust p-values using  the  null  distribution  in  filename.   Mutually  exclusive  with
              --nominal, --permute, and --sample.

       --chunk integer1 integer2
              For parallelization.  Divide the data into integer2 chunks and process chunk number integer1.
              The number of chunks must be at least the number of chromosomes in the --bed file.
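
              As a rough illustration of how --chunk might be used, the sketch below only prints one nominal-pass
              command per chunk and executes nothing.  The function name print_chunk_commands, the output
              prefix trans.nominal.chunk, and the assumption that chunks are numbered from 1 are choices made
              for this example; the input file names are the ones used in the examples of this page.

```shell
# print_chunk_commands N_CHUNKS   (hypothetical helper, not a QTLtools command)
# Prints one QTLtools trans command per chunk; nothing is executed.
# Assumption: chunk numbers run from 1 to N_CHUNKS.
print_chunk_commands() {
    n_chunks=$1
    for i in $(seq 1 "$n_chunks"); do
        echo "QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz" \
             "--nominal --normal --chunk $i $n_chunks --out trans.nominal.chunk$i"
    done
}

# Dry run: review the printed commands before submitting each line to a job scheduler.
# Usage: print_chunk_commands 30
```

              Note that --sample cannot be combined with --chunk.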

OUTPUT FILES

       .hits.txt.gz
        Space separated results output file detailing the variant-phenotype pairs that pass the  threshold  with
        the following columns:
        1   The phenotype ID
        2   The phenotype chromosome
        3   Start position of the phenotype
        4   The variant ID
        5   The variant chromosome
        6   The start position of the variant
        7   The nominal p-value of the association between the variant and the phenotype.
        8   The adjusted p-value of the association between the variant and the phenotype.  Requires --adjust
        9   Correlation coefficient
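
        Given the column layout above, the strongest associations can be listed with standard tools.  The
        helper below is a minimal sketch; the name top_hits is hypothetical, and it assumes the file has no
        header line.

```shell
# top_hits HITS_FILE [N]   (hypothetical helper, not a QTLtools command)
# Prints the N most significant rows (default 10) of a .hits.txt.gz file,
# sorted by the nominal p-value in column 7. Assumes there is no header line.
top_hits() {
    # sort -g compares values such as 1e-12 numerically
    gzip -dc "$1" | sort -g -k7,7 | head -n "${2:-10}"
}

# Usage: top_hits trans.nominal.hits.txt.gz 20
```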

       .best.txt.gz
        Space separated output file listing the most significant variant per phenotype.
        1   The phenotype ID
        2   The adjusted p-value of the association between the variant and the phenotype.  Requires --adjust
        3   The nominal p-value of the association between the variant and the phenotype.
        4   The variant ID

       .bins.txt.gz
        Space separated output file containing the binning of all associations with a p-value above the
        specified --threshold.
        1   The index of the bin
        2   The lower bound of the correlation coefficient for this bin
        3   The upper bound of the correlation coefficient for this bin
        4   The upper bound of the p-value for this bin
        5   The lower bound of the p-value for this bin

FULL PERMUTATION ANALYSIS EXAMPLE

       1 Run a nominal analysis, rank normal transforming the phenotypes and outputting all associations with  a
         p-value below 1e-5:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --nominal --normal \
             --out trans.nominal

       2 To run a full permutation analysis as 100 jobs on a compute cluster, run the following, making sure
         that you change the seed for each permutation (replace qsub with the job submission command of your
         system [bsub, psub, etc.]):

         for j in $(seq 1 100); do
             # The backslash keeps the submitted command on a single logical line.
             echo "QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --permute \
                 --normal --out trans.perm$j.txt --seed $j" | qsub
         done
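
         Once all permutation jobs have finished, their p-values need to be pooled before any counting against
         the nominal results.  The sketch below is one possible way to do that; the helper name
         collect_pvalues is hypothetical, and the file names in the usage line assume that the .hits.txt.gz
         suffix is appended to each --out value and that the files have no header line.

```shell
# collect_pvalues FILE...   (hypothetical helper, not a QTLtools command)
# Prints column 7 (the nominal p-value) of every gzip-compressed hits file given.
# Assumes the files have no header line.
collect_pvalues() {
    for f in "$@"; do
        gzip -dc "$f" | awk '{ print $7 }'
    done
}

# Usage (file names are an assumption based on the --out values above):
# collect_pvalues trans.perm*.txt.hits.txt.gz > permuted.pvalues.txt
```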

APPROXIMATE PERMUTATION ANALYSIS EXAMPLE

       1 Build the null distribution by randomly selecting 1000 phenotypes, rank normal transforming the
         phenotypes:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --sample 1000 --normal \
             --out trans.sample

       2 Run the nominal pass adjusting the p-values with the given null distribution, rank normal  transforming
         the phenotypes, and printing out associations with an adjusted p-value less than 0.1:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz \
             --adjust trans.sample.best.txt.gz --threshold 0.1 --normal --out trans.adjust

SEE ALSO

       QTLtools(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       o Versions up to and including 1.2 suffer from a bug in reading missing genotypes in VCF/BCF files.
         This bug affects variants that have a DS field in their genotype's FORMAT and a missing genotype (DS
         field is .) in one of the samples, in which case genotypes for all the samples are set to missing,
         effectively removing this variant from the analyses.

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Delaneau,  O.,  Ongen, H., Brown, A. et al. A complete tool set for molecular QTL discovery and analysis.
       Nat Commun 8, 15452 (2017).  <https://doi.org/10.1038/ncomms15452>

AUTHORS

       Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)

QTLtools-v1.3                                      06 May 2020                                 QTLtools-trans(1)