NAME

       scoary - pangenome-wide association studies

SYNOPSIS

       scoary  [-h]  [-t  TRAITS]  [-g  GENES]  [-n  NEWICKTREE]  [-s  START_COL]  [--delimiter  DELIMITER]  [-r
       RESTRICT_TO]  [-o  OUTDIR]  [-u]  [-p  P_VALUE_CUTOFF  [P_VALUE_CUTOFF  ...]]    [-c   [{I,B,BH,PW,EPW,P}
       [{I,B,BH,PW,EPW,P}  ...]]] [-m MAX_HITS] [--include_input_columns GRABCOLS] [-w] [--no-time] [-e PERMUTE]
       [--no_pairwise] [--collapse] [--threads THREADS] [--test] [--citation] [--version]

OPTIONS

optional arguments:
-h, --help
show this help message and exit

Input options:
-t TRAITS, --traits TRAITS
Input trait table (comma-separated-values). Trait presence is indicated by 1, trait absence by 0.
Assumes strain names in the first column and trait names in the first row
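
The layout described above can be sketched in Python as follows. This is only an illustration: the strain and trait names are invented, and read_traits is a hypothetical helper, not Scoary's own reader.

```python
import csv
import io

# Invented example data: strain names in the first column,
# trait names in the first row, 1 = trait present, 0 = trait absent.
EXAMPLE_TRAITS = """\
Name,Resistant,Motile
strain_A,1,0
strain_B,0,0
strain_C,1,1
"""


def read_traits(handle, delimiter=","):
    """Return {trait: {strain: 0 or 1}} from a trait table (hypothetical helper)."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    trait_names = header[1:]  # the first header cell labels the strain column
    traits = {name: {} for name in trait_names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        strain, values = row[0], row[1:]
        for name, value in zip(trait_names, values):
            traits[name][strain] = int(value)
    return traits


traits = read_traits(io.StringIO(EXAMPLE_TRAITS))
print(traits["Resistant"])
```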

-g GENES, --genes GENES
Input gene presence/absence table (comma-separated-values) from Roary. Strain names must be equal
to those in the trait table.

-n NEWICKTREE, --newicktree NEWICKTREE
Supply a custom tree (Newick format) for phylogenetic analyses instead of calculating it
internally.

-s START_COL, --start_col START_COL
The column in the gene presence/absence file at which individual strain information starts.
Default = 15 (1-based indexing).
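
Because the column number is 1-based, it has to be shifted by one before it can be used as a list index in most languages. A minimal Python sketch, assuming a row has already been split into cells (split_row and the cell values are hypothetical):

```python
def split_row(cells, start_col=15):
    """Split one row into (annotation cells, strain cells).

    start_col is 1-based, so column 15 is list index 14.
    """
    if start_col < 1:
        raise ValueError("start_col is 1-based and must be >= 1")
    first_strain = start_col - 1
    return cells[:first_strain], cells[first_strain:]


# 14 invented annotation columns followed by three strain columns.
row = ["col%d" % i for i in range(1, 15)] + ["geneA_1", "", "geneA_3"]
annotation, strains = split_row(row)
print(len(annotation), strains)
```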

--delimiter DELIMITER
The delimiter between cells in the gene presence/absence and trait files, as well as the output
file.

-r RESTRICT_TO, --restrict_to RESTRICT_TO
Use if you only want to analyze a subset of your strains. Scoary will read the provided
comma-separated table of strains and restrict analyses to these.
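
The effect of such a restriction can be illustrated with a small Python sketch. The function name, the data layout, and the choice to raise an error for unknown strains are all assumptions made for this example, not a description of Scoary's behaviour.

```python
def restrict_strains(strain_columns, restrict_line):
    """Keep only the strains named in a comma-separated line (hypothetical helper)."""
    wanted = [name.strip() for name in restrict_line.split(",") if name.strip()]
    unknown = [name for name in wanted if name not in strain_columns]
    if unknown:
        # Assumption: unknown names are treated as an error in this sketch.
        raise KeyError("strains not found in input: " + ", ".join(unknown))
    return {name: strain_columns[name] for name in wanted}


# Invented presence/absence vectors, one list per strain.
all_strains = {"strain_A": [1, 0], "strain_B": [0, 0], "strain_C": [1, 1]}
subset = restrict_strains(all_strains, "strain_A, strain_C")
print(sorted(subset))
```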

Output options:
-o OUTDIR, --outdir OUTDIR
Directory to place output files. Default = .

-u, --upgma_tree
This flag will cause Scoary to write the calculated UPGMA tree to a Newick file.

-p P_VALUE_CUTOFF [P_VALUE_CUTOFF ...], --p_value_cutoff P_VALUE_CUTOFF [P_VALUE_CUTOFF ...]
P-value cut-off / alpha level. For Fisher's, Bonferroni's, and Benjamini-Hochberg's tests, Scoary
will not report genes with higher p-values than this. For empirical p-values, this is treated as
an alpha level instead, i.e. 0.02 will filter all genes except the lower and upper percentile from
this test. Run with "-p 1.0" to report all genes. Accepts standard form (e.g. 1E-8). Provide a
single value (applied to all) or exactly as many values as correction criteria and in
corresponding order (see the example under --correction). Default = 0.05
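
The pairing rule (one value for all criteria, or one value per criterion in order) and the two-tailed reading of the empirical cut-off can be sketched in Python. Both functions are hypothetical helpers written for this illustration; in particular, treating the boundaries as inclusive is an assumption.

```python
def pair_cutoffs(corrections, cutoffs):
    """Pair each correction criterion with its cut-off (hypothetical helper)."""
    corrections = list(corrections)
    cutoffs = [float(value) for value in cutoffs]  # float() accepts "1E-8"
    if len(cutoffs) == 1:
        cutoffs = cutoffs * len(corrections)  # a single value applies to all
    if len(cutoffs) != len(corrections):
        raise ValueError("give one cut-off, or exactly one per correction")
    return dict(zip(corrections, cutoffs))


def passes_empirical(percentile, alpha):
    """Two-tailed check: keep only the lower and upper alpha/2 tails."""
    # Assumption: the tail boundaries are inclusive.
    return percentile <= alpha / 2 or percentile >= 1 - alpha / 2


print(pair_cutoffs(["I", "EPW"], ["0.1", "0.05"]))
print(pair_cutoffs(["I", "B", "BH"], ["1E-8"]))
print(passes_empirical(0.005, 0.02), passes_empirical(0.5, 0.02))
```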

-c [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]], --correction [{I,B,BH,PW,EPW,P} [{I,B,BH,PW,EPW,P} ...]]
Apply the indicated filtration measure. Allowed values are I, B, BH, PW, EPW, P. I=Individual
(naive) p-value. B=Bonferroni adjusted p-value. BH=Benjamini-Hochberg adjusted p-value. PW=Best (lowest)
pairwise comparison. EPW=Entire range of pairwise comparison p-values. P=Empirical p-value from
permutations. You can enter as many correction criteria as you would like. These will be
associated with the p_value_cutoffs you enter. For example "-c I EPW -p 0.1 0.05" will apply the
following cutoffs: Naive p-value must be lower than 0.1 AND the entire range of pairwise
comparison values must be below 0.05 for this gene. Note that the empirical p-values should be
interpreted at both tails. Therefore, running "-c P -p 0.05" will apply an alpha of 0.05 to the
empirical (permuted) p-values, i.e. it will filter everything except the upper and lower 2.5
percent of the distribution. Default = Individual p-value. (I)
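
For readers unfamiliar with the B and BH criteria, the textbook forms of the two adjustments are sketched below in Python. These are the standard Bonferroni and Benjamini-Hochberg procedures written from their usual definitions; they are not taken from Scoary's source, and the input p-values are invented.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjustment, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


naive = [0.01, 0.04, 0.03, 0.5]  # invented naive p-values
print(bonferroni(naive))
print(benjamini_hochberg(naive))
```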

-m MAX_HITS, --max_hits MAX_HITS
Maximum number of hits to report. Scoary will only report the top max_hits results per trait.

--include_input_columns GRABCOLS
Grab columns from the input Roary file and put them in the output. Handles commas and ranges,
e.g. --include_input_columns 4,6,8,16-23. The special keyword ALL will include all relevant input
columns in the output.
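
A column specification of this shape can be expanded as in the following Python sketch. parse_columns is a hypothetical helper; it treats the numbers simply as integers and does not handle the ALL keyword.

```python
def parse_columns(spec):
    """Expand a spec such as '4,6,8,16-23' into a sorted list of column numbers."""
    columns = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError("range start exceeds range end: " + part)
            columns.update(range(start, end + 1))  # ranges are inclusive
        else:
            columns.add(int(part))
    return sorted(columns)


print(parse_columns("4,6,8,16-23"))
```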

-w, --write_reduced
Use with -r if you want Scoary to create a new gene presence/absence file from your reduced set of
isolates. Note: Columns 1-14 (No. sequences, Avg group size nuc etc) in this file do not reflect
the reduced dataset. These are taken from the full dataset.

--no-time
Output file in the form TRAIT.results.csv, instead of TRAIT_TIMESTAMP.csv. When used with the -w
argument, outputs a reduced gene matrix in the form gene_presence_absence_reduced.csv rather
than gene_presence_absence_reduced_TIMESTAMP.csv

Analysis options:
-e PERMUTE, --permute PERMUTE
Perform N permutations of the significant results post-analysis. Each permutation will
do a label switching of the phenotype and a new p-value is calculated according to this new
dataset. After all N permutations are completed, the results are sorted in ascending order, and
the percentile of the original result in the permuted p-value distribution is reported.
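
The label-switching idea can be sketched in Python as below. This is an illustration under stated assumptions, not Scoary's implementation: it uses a two-sided Fisher's exact test as the per-permutation p-value, reports the fraction of permuted p-values at or below the observed one with the common add-one adjustment, and all function names and data are invented.

```python
import random
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    tail = sum(p for p in map(prob, range(low, high + 1))
               if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


def table(gene, trait):
    """Count the four cells of the gene-by-trait contingency table."""
    pairs = list(zip(gene, trait))
    a = sum(1 for g, t in pairs if g and t)
    b = sum(1 for g, t in pairs if g and not t)
    c = sum(1 for g, t in pairs if not g and t)
    d = sum(1 for g, t in pairs if not g and not t)
    return a, b, c, d


def empirical_p(gene, trait, n_permutations, seed=0):
    """Position of the observed p-value among label-switched p-values."""
    rng = random.Random(seed)
    observed = fisher_two_sided(*table(gene, trait))
    shuffled = list(trait)
    at_or_below = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # switch the phenotype labels
        if fisher_two_sided(*table(gene, shuffled)) <= observed:
            at_or_below += 1
    # Assumption: add-one adjustment so the result is never exactly zero.
    return (at_or_below + 1) / (n_permutations + 1)


gene = [1, 1, 1, 1, 0, 0, 0, 0]   # invented presence/absence per strain
trait = [1, 1, 1, 1, 0, 0, 0, 0]  # invented phenotype per strain
print(empirical_p(gene, trait, n_permutations=200))
```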

--no_pairwise
Do not perform pairwise comparisons. In this mode, Scoary will perform population structure-naive
calculations only (Fisher's test, odds ratios, etc.). Useful for summary operations and exploring
sets (genes unique in groups, intersections, etc.), but not for causal analyses.
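
As an illustration of such population structure-naive summaries, the Python sketch below builds a 2x2 gene-by-trait table, computes its odds ratio, and lists genes found only in trait-positive strains. The helper names, the data layout, and the handling of zero cells are assumptions made for this example.

```python
def contingency(gene_by_strain, trait_by_strain):
    """Return (a, b, c, d): gene+/trait+, gene+/trait-, gene-/trait+, gene-/trait-."""
    a = b = c = d = 0
    for strain, has_trait in trait_by_strain.items():
        has_gene = gene_by_strain[strain]
        if has_gene and has_trait:
            a += 1
        elif has_gene:
            b += 1
        elif has_trait:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); zero cells are handled by assumption."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def genes_unique_to_positive(presence, trait_by_strain):
    """Genes carried by at least one strain, all of them trait-positive."""
    positive = {s for s, value in trait_by_strain.items() if value}
    unique = []
    for gene, by_strain in presence.items():
        carriers = {s for s, value in by_strain.items() if value}
        if carriers and carriers <= positive:
            unique.append(gene)
    return unique


# Invented data for four strains and two genes.
trait = {"A": 1, "B": 1, "C": 0, "D": 0}
presence = {
    "geneX": {"A": 1, "B": 1, "C": 0, "D": 0},
    "geneY": {"A": 1, "B": 0, "C": 1, "D": 0},
}
for name, by_strain in presence.items():
    print(name, odds_ratio(*contingency(by_strain, trait)))
print(genes_unique_to_positive(presence, trait))
```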

--collapse
Add this to collapse correlated genes (genes that have identical distribution patterns in the
sample) into merged units.
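
Collapsing genes with identical distribution patterns amounts to grouping them by their presence/absence vector, as in this Python sketch. The function, the "--" separator used to name merged units, and the data are invented for the illustration.

```python
from collections import defaultdict


def collapse_genes(presence, strains):
    """Merge genes whose presence/absence pattern across strains is identical."""
    groups = defaultdict(list)
    for gene, by_strain in presence.items():
        pattern = tuple(by_strain[s] for s in strains)
        groups[pattern].append(gene)
    # Assumption: merged units are named by joining gene names with "--".
    return {"--".join(genes): pattern for pattern, genes in groups.items()}


strains = ["A", "B", "C"]
presence = {
    "geneX": {"A": 1, "B": 0, "C": 1},
    "geneY": {"A": 1, "B": 0, "C": 1},  # same pattern as geneX
    "geneZ": {"A": 0, "B": 1, "C": 1},
}
print(collapse_genes(presence, strains))
```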

Misc options:
--threads THREADS
Number of threads to use. Default = 1

--test
Run Scoary on the test set in exampledata, overriding all other parameters.

--citation
Show citation information, and exit.

--version
Display Scoary version, and exit.

by Ola Brynildsrud (olbb@fhi.no)

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
       of the program.

scoary 1.6.16                                     January 2019                                         SCOARY(1)