NAME

       BuildConsensus.py - Builds a consensus sequence for each set of input sequences

DESCRIPTION

       usage: BuildConsensus.py [--version] [-h] -s SEQ_FILES [SEQ_FILES ...]
              [-o OUT_FILES [OUT_FILES ...]] [--outdir OUT_DIR] [--outname OUT_NAME]
              [--log LOG_FILE] [--failed] [--fasta]
              [--delim DELIMITER DELIMITER DELIMITER] [--nproc NPROC] [-n MIN_COUNT]
              [--bf BARCODE_FIELD] [-q MIN_QUAL] [--freq MIN_FREQ] [--maxgap MAX_GAP]
              [--pf PRIMER_FIELD] [--prcons PRIMER_FREQ]
              [--cf COPY_FIELDS [COPY_FIELDS ...]]
              [--act {min,max,sum,set,majority} [{min,max,sum,set,majority} ...]]
              [--dep] [--maxdiv MAX_DIVERSITY | --maxerror MAX_ERROR]

       Builds a consensus sequence for each set of input sequences
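
       A hedged invocation sketch, written in Python so that the argument list is explicit. The file
       names reads.fastq and consensus.log and the numeric values are hypothetical; only options
       documented on this page are used.

```python
import shlex

# Hypothetical input and log file names; the values for -n and --maxerror
# are illustrative, not recommendations.
cmd = [
    "BuildConsensus.py",
    "-s", "reads.fastq",
    "--bf", "BARCODE",
    "-n", "2",
    "--maxerror", "0.1",
    "--log", "consensus.log",
]

# Render the list as a single shell-safe command line.
command_line = shlex.join(cmd)
print(command_line)

# To actually run the tool (requires it to be installed and on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

       Passing the arguments as a list avoids shell quoting problems when file names contain spaces.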

   help:
       --version
              show program's version number and exit

       -h, --help
              show this help message and exit

   standard arguments:
       -s SEQ_FILES [SEQ_FILES ...]
              A list of FASTA/FASTQ files containing sequences to process. (default: None)

       -o OUT_FILES [OUT_FILES ...]
              Explicit output file name(s). Note, this argument cannot be used with the --failed,  --outdir,  or
              --outname  arguments.  If  unspecified,  then  the  output  filename  will  be  based on the input
              filename(s).  (default: None)

       --outdir OUT_DIR
              Changes the output directory to the location specified. The input file directory is used if
              this is not specified. (default: None)

       --outname OUT_NAME
              Changes  the  prefix of the successfully processed output file to the string specified. May not be
              specified with multiple input files. (default: None)

       --log LOG_FILE
              Specify to write verbose logging to a file. May  not  be  specified  with  multiple  input  files.
              (default: None)

       --failed
              If specified, create files containing records that fail processing. (default: False)

       --fasta
              Specify to force output as FASTA rather than FASTQ.  (default: None)

       --delim DELIMITER DELIMITER DELIMITER
              A list of the three delimiters that separate annotation blocks, field names and values, and values
              within a field, respectively. (default: ('|', '=', ','))
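
              To illustrate how the three delimiters divide a description line, here is a minimal
              sketch. It assumes that the first block of the header is the sequence identifier; the
              function name parse_header and the example header are hypothetical and are not part of
              the tool.

```python
def parse_header(header, delim=("|", "=", ",")):
    """Split a description line into (sequence_id, fields).

    delim holds the block, field and value delimiters, in that order.
    Assumption: the first block is the sequence identifier.
    """
    block_sep, field_sep, value_sep = delim
    blocks = header.split(block_sep)
    seq_id = blocks[0]
    fields = {}
    for block in blocks[1:]:
        # Split "NAME=v1,v2" into the field name and its list of values.
        name, _, value = block.partition(field_sep)
        fields[name] = value.split(value_sep)
    return seq_id, fields


# Hypothetical header using the default delimiters.
seq_id, fields = parse_header("SEQ1|BARCODE=ACGTACGT|PRIMER=VH1,VH3")
print(seq_id, fields)
```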

       --nproc NPROC
              The number of simultaneous computational processes to execute (CPU cores to utilize).
              (default: 4)

   consensus generation arguments:
       -n MIN_COUNT
              The minimum number of sequences needed to define a valid consensus. (default: 1)

       --bf BARCODE_FIELD
              Name of the barcode annotation field in the sequence description to group sequences by.
              (default: BARCODE)
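
              The grouping that -n and --bf control can be pictured with the following sketch. It
              assumes headers that use the default delimiters and treats each record as a (header,
              sequence) pair; group_by_barcode and the sample records are hypothetical.

```python
from collections import defaultdict


def group_by_barcode(records, barcode_field="BARCODE", min_count=1):
    """Group sequences by a barcode annotation and drop small groups."""
    groups = defaultdict(list)
    for header, seq in records:
        # Assumption: default delimiters, "ID|NAME=value|NAME=value".
        fields = dict(block.split("=", 1) for block in header.split("|")[1:])
        groups[fields[barcode_field]].append(seq)
    # Keep only groups with at least min_count sequences.
    return {bc: seqs for bc, seqs in groups.items() if len(seqs) >= min_count}


# Hypothetical records.
records = [
    ("R1|BARCODE=AAAA", "ACGT"),
    ("R2|BARCODE=AAAA", "ACGA"),
    ("R3|BARCODE=CCCC", "ACGT"),
]
groups = group_by_barcode(records, min_count=2)
print(groups)
```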

       -q MIN_QUAL
              Consensus quality score cut-off under which an ambiguous character is  assigned;  does  not  apply
              when quality scores are unavailable. (default: 0)

       --freq MIN_FREQ
              Fraction of character occurrences under which an ambiguous character is assigned. (default: 0.6)
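
              A minimal sketch of a frequency cut-off of this kind, assuming aligned reads of equal
              length and "N" as the ambiguous character. It ignores quality scores and is not taken
              from the tool's implementation; consensus_by_frequency is a hypothetical name.

```python
from collections import Counter


def consensus_by_frequency(seqs, min_freq=0.6, ambiguous="N"):
    """Call the most frequent character per column, or `ambiguous`
    when its fraction falls under min_freq."""
    consensus = []
    for column in zip(*seqs):
        char, count = Counter(column).most_common(1)[0]
        if count / len(column) >= min_freq:
            consensus.append(char)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Hypothetical read group; the last column has no character at 60% or more.
reads = ["ACGT", "ACGA", "ACCA", "ATGC"]
result = consensus_by_frequency(reads, min_freq=0.6)
print(result)
```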

       --maxgap MAX_GAP
              If  specified,  this  defines a cut-off for the frequency of allowed gap values for each position.
              Positions exceeding the threshold are deleted from the consensus. If not  defined,  positions  are
              always retained. (default: None)
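
              The gap filter described above can be sketched as follows. The sketch assumes aligned
              reads of equal length and treats "-" and "." as gap characters, which is an assumption;
              positions_to_keep is a hypothetical name.

```python
def positions_to_keep(seqs, max_gap, gap_chars="-."):
    """Return the indices of columns whose gap frequency does not
    exceed max_gap."""
    keep = []
    for i, column in enumerate(zip(*seqs)):
        gap_freq = sum(c in gap_chars for c in column) / len(column)
        if gap_freq <= max_gap:
            keep.append(i)
    return keep


# Hypothetical read group: column 1 is 25% gaps, column 2 is 50% gaps.
reads = ["AC-T", "AC-T", "A-GT", "ACGT"]
kept = positions_to_keep(reads, max_gap=0.3)
print(kept)
```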

       --pf PRIMER_FIELD
              Specifies the field name of the primer annotations. (default: None)

       --prcons PRIMER_FREQ
              Specify to define a minimum primer frequency required to assign a consensus primer, and filter out
              sequences with minority primers from the consensus building step. (default: None)

       --cf COPY_FIELDS [COPY_FIELDS ...]
              Specifies  a  set of additional annotation fields to copy into the consensus sequence annotations.
              (default: None)

       --act {min,max,sum,set,majority} [{min,max,sum,set,majority} ...]
              List of actions to take for each copy field which defines how each  annotation  will  be  combined
              into  a  single  value.  The  actions  "min",  "max", "sum" perform the corresponding mathematical
              operation on numeric annotations. The action "set" combines annotations  into  a  comma  delimited
              list of unique values and adds an annotation named <FIELD>_COUNT specifying the count of each item
              in the set. The action "majority" assigns the most frequent annotation to the consensus annotation
              and  adds  an  annotation  named  <FIELD>_FREQ  specifying  the  frequency  of the majority value.
              (default: None)
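
              The five actions can be illustrated with a short sketch. The function
              combine_annotation, the sorted ordering of "set" values and the return format are
              assumptions made for illustration; only the action names and the <FIELD>_COUNT and
              <FIELD>_FREQ suffixes come from the description above.

```python
from collections import Counter


def combine_annotation(field, values, action):
    """Collapse the values of one copy field into consensus annotations."""
    if action in ("min", "max", "sum"):
        numbers = [float(v) for v in values]
        func = {"min": min, "max": max, "sum": sum}[action]
        return {field: func(numbers)}
    if action == "set":
        counts = Counter(values)
        items = sorted(counts)  # ordering is an assumption
        return {
            field: ",".join(items),
            f"{field}_COUNT": ",".join(str(counts[i]) for i in items),
        }
    if action == "majority":
        value, n = Counter(values).most_common(1)[0]
        return {field: value, f"{field}_FREQ": n / len(values)}
    raise ValueError(f"unknown action: {action}")


# Hypothetical SAMPLE annotations from three reads in one group.
as_set = combine_annotation("SAMPLE", ["A", "B", "A"], "set")
as_majority = combine_annotation("SAMPLE", ["A", "B", "A"], "majority")
print(as_set, as_majority)
```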

       --dep  Specify to calculate consensus quality with a non-independence assumption. (default: False)

       --maxdiv MAX_DIVERSITY
              Specify to calculate the nucleotide diversity of each read group (average pairwise error rate) and
              remove groups exceeding the given diversity threshold.  Diversity is calculated for all positions
              within  the  read  group,  ignoring any character filtering imposed by the -q, --freq and --maxgap
              arguments. Mutually exclusive with --maxerror. (default: None)
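
              As a rough illustration of "average pairwise error rate", the sketch below averages the
              mismatch fraction over all pairs of reads in a group. It assumes aligned reads of equal
              length and compares every character literally; how the tool treats gaps or ambiguous
              characters is not covered. nucleotide_diversity is a hypothetical name.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean fraction of mismatched positions over all read pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # a single read has no pairwise differences
    rates = [
        sum(a != b for a, b in zip(x, y)) / len(x)
        for x, y in pairs
    ]
    return sum(rates) / len(rates)


# Hypothetical read group: one read differs at one of four positions.
reads = ["ACGT", "ACGT", "ACGA"]
diversity = nucleotide_diversity(reads)
print(diversity)
```

              A group would then be removed when this value exceeds MAX_DIVERSITY.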

       --maxerror MAX_ERROR
              Specify to calculate the error rate of each read group (rate of  mismatches  from  consensus)  and
              remove  groups exceeding the given error threshold. The error rate is calculated against the final
              consensus sequence, which may include masked positions due to the -q and --freq arguments and  may
              have  deleted positions due to the --maxgap argument. Mutually exclusive with --maxdiv.  (default:
              None)
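
              A sketch of a mismatch rate measured against a consensus, assuming aligned reads of the
              same length as the consensus. Skipping positions where the consensus is "N" is an
              assumption of this sketch, not a statement about the tool; error_rate is a hypothetical
              name.

```python
def error_rate(seqs, consensus, ignore="N"):
    """Fraction of compared characters that differ from the consensus."""
    mismatches = 0
    compared = 0
    for seq in seqs:
        for read_char, cons_char in zip(seq, consensus):
            if cons_char in ignore:
                continue  # assumption: masked positions are not scored
            compared += 1
            mismatches += read_char != cons_char
    return mismatches / compared if compared else 0.0


# Hypothetical read group: 1 mismatch in 12 compared characters.
reads = ["ACGT", "ACGA", "ACGT"]
rate = error_rate(reads, "ACGT")
print(rate)
```

              A group would then be removed when this value exceeds MAX_ERROR.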

   output files:
       consensus-pass
              consensus reads.

       consensus-fail
              raw reads failing consensus filtering criteria.

   output annotation fields:
       PRIMER
              a comma delimited list of unique primer annotations found within the barcode read group.

       PRCOUNT
              a comma delimited list of the corresponding counts of unique primer annotations.

       PRCONS
              the majority primer within the barcode read group.

       PRFREQ
              the frequency of the majority primer.

       CONSCOUNT
              the count of reads within the barcode read group which contributed to the consensus sequence. This
              is the total size of the read group, minus sequences excluded due to user-defined filtering
              criteria.
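
       The primer-related fields listed above can be derived from the primer annotations of one read
       group, as in this sketch. The function primer_annotations, the sorted ordering of the lists and
       the sample primer names are assumptions made for illustration.

```python
from collections import Counter


def primer_annotations(primers):
    """Summarise the primer annotations of one barcode read group."""
    counts = Counter(primers)
    names = sorted(counts)  # ordering is an assumption
    majority, n = counts.most_common(1)[0]
    return {
        "PRIMER": ",".join(names),
        "PRCOUNT": ",".join(str(counts[p]) for p in names),
        "PRCONS": majority,
        "PRFREQ": n / len(primers),
    }


# Hypothetical primer annotations from four reads sharing a barcode.
summary = primer_annotations(["VH1", "VH1", "VH3", "VH1"])
print(summary)
```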

AUTHOR

        This manpage was written by Andreas Tille for the Debian distribution and
        can be used for any other usage of the program.

BuildConsensus.py 0.6.0                             May 2020                                BUILDCONSENSUS.PY(1)