Provided by: toppic_1.5.3+dfsg1-1build3_amd64 bug

NAME

       toppic - Top-down mass spectrometry based Proteoform Identification and Characterization

SYNOPSIS

          toppic [options] database-file-name spectrum-file-names

DESCRIPTION

       TopPIC  (Top-down  mass spectrometry based Proteoform Identification and Characterization) identifies and
       characterizes proteoforms at the proteome level by searching  top-down  tandem  mass  spectra  against  a
       protein  sequence  database.   TopPIC  is a successor to MS-Align+. It efficiently identifies proteoforms
       with unexpected alterations, such as mutations and post-translational  modifications  (PTMs),  accurately
       estimates  the  statistical  significance of identifications, and characterizes reported proteoforms with
       unknown mass shifts. It uses several techniques, such as indexes, spectral alignment, generation function
       methods, and the modification identification score (MIScore), to increase  the  speed,  sensitivity,  and
       accuracy.

       1. Input

             • A protein database file in the FASTA format

             • A mass spectrum data file in the msalign format

             • A text file containing LC/MS feature information (optional)

             • A text file of fixed PTMs (optional)

             • A text file of PTMs for the characterization of unexpected mass shifts (optional)

       2. Output

       TopPIC  outputs  two  comma  separated value (csv) files, an xml file, and a collection of html files for
       identified proteoforms. For example,  when  the  input  data  file  is  spectra_ms2.msalign,  the  output
       includes:

          • spectra_ms2_toppic_prsm.csv:  a  csv  file containing identified proteoform spectrum-matches (PrSMs)
            with an E-value or spectrum level FDR cutoff.

          • spectra_ms2_toppic_proteoform.csv: a csv file containing identified proteoforms with an  E-value  or
            proteoform level FDR cutoff.

          • spectra_ms2_toppic_proteoform.xml: an xml file containing identified proteoforms with the E-value or
            proteoform level FDR cutoff.

          • spectra_html/toppic_prsm_cutoff: a folder containing java script files of identified PrSMs using the
            E-value or spectrum level FDR cutoff.

          • spectra_html/toppic_proteoform_cutoff:  a  folder  containing  java script files of identified PrSMs
            using the E-value or proteoform level FDR cutoff.

          • spectra_html/topview: a folder containing html files for the visualization of identified PrSMs.

       To browse  identified  proteins,  proteoforms,  and  PrSMs,  use  a  chrome  browser  to  open  the  file
       spectrum_html/topview/index.html. Google Chrome is recommended (Firefox and IE are not recommended).

       When  the  input  contains  two  or  more  data  files,  TopPIC outputs two csv files, an xml file, and a
       collection of html files for each input file. When a file name is specified for combined identifications,
       it combines spectra and proteoforms identified from all the input  files,  removes  redundant  proteoform
       identifications,  and reports two csv files, an xml file, and a collection of html files for the combined
       results. For example, when the input is spectra1_ms2.msalign and spectra2_ms2.msalign  and  the  combined
       output file name is "combined," the output files are:

          • combined_ms2_toppic_prsm.csv:  a  csv file containing PrSMs identified from all the input files with
            an E-value or spectrum level FDR cutoff.

          • combined_ms2_toppic_proteoform.csv: a csv file containing proteoforms identified from all the  input
            files with an E-value or proteoform level FDR cutoff.

          • combined_ms2_toppic_proteoform.xml: an xml file containing proteoforms identified from all the input
            files with the E-value or proteoform level FDR cutoff.

          • combined_html/toppic_prsm_cutoff: a folder containing java script files of PrSMs identified from all
            the input files using the E-value or spectrum level FDR cutoff.

          • combined_html/toppic_proteoform_cutoff:  a  folder  containing java script files of PrSMs identified
            from all the input files using the E-value or proteoform level FDR cutoff.

          • combined_html/topview: a folder containing html files for the visualization of identified PrSMs.

OPTIONS

       -h [ --help ] Print the help message.

       -a [ --activation ] <CID|HCD|ETD|UVPD|FILE> Set the fragmentation method(s) of MS/MS spectra. When "FILE"
       is selected, the fragmentation methods of spectra are given in the  input  spectrum  data  file.  Default
       value: FILE.

       -f  [ --fixed-mod ] <C57|C58|a fixed modification file> Set fixed modifications. Three available options:
       C57, C58, or the name of a text file containing the information of fixed modifications  (see  an  example
       file). When C57 is selected, carbamidomethylation on cysteine is the only fixed modification. When C58 is
       selected, carboxymethylation on cysteine is the only fixed modification.

       -n  [  --n-terminal-form  ]  <a  list of allowed N-terminal forms> Set N-terminal forms of proteins. Four
       N-terminal forms can be selected: NONE, NME, NME_ACETYLATION,  and  M_ACETYLATION.  NONE  stands  for  no
       modifications,  NME  for N-terminal methionine excision, NME_ACETYLATION for N-terminal acetylation after
       the initiator methionine is removed,  and  M_ACETYLATION  for  N-terminal  methionine  acetylation.  When
       multiple     forms     are     allowed,    they    are    separated    by    commas.    Default    value:
       NONE,M_ACETYLATION,NME,NME_ACETYLATION.

       -d [ --decoy ] Use a shuffled decoy protein database to estimate spectrum and proteoform level FDRs. When
       -d is chosen, a shuffled decoy database is automatically generated and appended to  the  target  database
       before database search, and FDR rates are estimated using the target-decoy approach.

       -e  [  --mass-error-tolerance  ]  <a positive integer> Set the error tolerance for precursor and fragment
       masses in part-per-million (ppm). Default value: 15. When the lookup table  approach  (-l)  is  used  for
       E-value estimation, valid error tolerance values are 5, 10, and 15 ppm.

       -p  [  --proteoform-error-tolerance  ]  <a  positive number> Set the error tolerance for identifying PrSM
       clusters (in Dalton). Default value: 1.2 Dalton.

       -M [ --max-shift ] <a number> Set the maximum value for  unexpected  mass  shifts  (in  Dalton).  Default
       value: 500.

       -m [ --min-shift ] <a number> Se the minimum value for unexpected mass shifts (in Dalton). Default value:
       -500.

       -s [ --num-shift ] <0|1|2> Set the maximum number of unexpected mass shifts in a PrSM. Default value: 1.

       -t  [  --spectrum-cutoff-type  ]  <EVALUE|FDR>  Set  the  spectrum level cutoff type for filtering PrSMs.
       Default value: EVALUE.

       -v [ --spectrum-cutoff-value ] <a positive number> Set the spectrum  level  cutoff  value  for  filtering
       PrSMs. Default value: 0.01.

       -T  [  --proteoform-cutoff-type  ]  <EVALUE|FDR>  Set  the  proteoform  level  cutoff  type for filtering
       proteoforms and PrSMs. Default value: EVALUE.

       -V [ --proteoform-cutoff-value ] <a positive number> Set the proteoform level cutoff value for  filtering
       proteoforms and PrSMs. Default value: 0.01.

       -l  [  --lookup-table  ] Use a lookup table method for computing p-values and E-values. It is faster than
       the default generating function approach, but it may reduce the number of identifications.

       -r [ --num-combined-spectra ] <a positive integer> Set the number of combined spectra. The  parameter  is
       set  to  2  (or  3) for combining spectral pairs (or triplets) generated by the alternating fragmentation
       mode. Default value: 1.

       -i [ --mod-file-name ] <a common modification file> Specify a text file containing a list of common  PTMs
       for  proteoform  characterization. The PTMs are used to identify and localize PTMs in reported PrSMs with
       unknown mass shifts. See an example file.

       -H [ --miscore-threshold ] <a number between 0 and 1> Set the score  threshold  (MIScore)  for  filtering
       results of PTM characterization. Default value: 0.45.

       -u  [  --thread-number  ]  <a positive number> Set the number of threads used in the computation. Default
       value: 1. The maximum number of threads is determined by the CPU and memory  of  the  computer  used  for
       computation.  About  4  GB memory is required for each thread. If the computer has 16 GB memory and a CPU
       with 8 cores, the maximum number of threads is 4 because about 16 GB memory is needed for 4 threads.

       -x [ --no-topfd-feature ] Specify that there are no TopFD feature files for proteoform identification.

       -c [ --combined-file-name ] <a filename> Specify an output file name for  combined  identifications  when
       the input consists of multiple spectrum files.

       -k [ --keep ] Keep intermediate files generated by TopPIC.

EXAMPLES

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta with a feature file spectra.feature. The user does not need to specify the feature  file
         name.  TopPIC  will  automatically  obtain  the  names  of  feature  files  from the spectrum file name
         spectra_ms2.msalign.

         toppic proteins.fasta spectra_ms2.msalign

       • Search two deconvoluted MS/MS spectrum files spectra1_ms2.msalign and  spectra2_ms2.msalign  against  a
         protein  database file proteins.fasta with feature files. In addition, all identifications are combined
         and reported using a file name "combined."

         toppic -c combined proteins.fasta spectra1_ms2.msalign spectra2_ms2.msalign

       • Search all deconvoluted MS/MS spectrum files in the current folder  against  a  protein  database  file
         proteins.fasta with feature files.

         toppic proteins.fasta *_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta without feature files.

         toppic -x proteins.fasta spectra_ms2.msalign

       • Search a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database  file
         proteins.fasta with feature files and a fixed modification: carbamidomethylation on cysteine.

         toppic -f C57 proteins.fasta spectra_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta with feature files. In an identified proteoform, at most 2 mass shifts are  allowed  and
         the maximum allowed mass shift value is 10,000 Dalton.

         toppic -s 2 -M 10000 proteins.fasta spectra_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta with feature files. The error tolerance for precursor and fragment masses is 5 ppm.

         toppic -e 5 proteins.fasta spectra_ms2.msalign

       • Search a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database  file
         proteins.fasta  with  feature  files.  Use  the  target  decoy  approach  to compute spectrum level and
         proteoform level FDRs, filter identified proteoform spectrum-matches by a 5% spectrum  level  FDR,  and
         filter identified proteoforms by a 5% proteoform level FDR.

         toppic -d -t FDR -v 0.05 -T FDR -V 0.05 proteins.fasta spectra_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  with alternating CID, HCD, and ETD
         spectra against a protein database file proteins.fasta with feature  files.  Combine  alternating  CID,
         HCD, and ETD spectra to increase proteoform coverage.

         toppic -r 3 proteins.fasta spectra_ms2.msalign

       • Search  a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database file
         proteins.fasta with feature files. After proteoforms with unexpected mass shifts are identified, TopPIC
         matches the mass shifts to four common PTMs: acetylation, phosphorylation, oxidation  and  methylation,
         and  uses an MIScore cutoff 0.3 to filter reported PTM sites. The modification file common_mods.txt can
         be found here.

         toppic -i common_mods.txt -H 0.3 proteins.fasta spectra_ms2.msalign

       • Search a  deconvoluted  MS/MS  spectrum  file  spectra_ms2.msalign  against  a  protein  database  file
         proteins.fasta with feature files. Use 6 CPU threads to speed up the computation.

         toppic -u 6 proteins.fasta spectra_ms2.msalign

SEE ALSO

       • topfd (1)

       • topmg (1)

       • topdiff (1)

MAN PAGE PRODUCTION

       This   man   page   was  written  by  Filippo  Rusconi  <lopippo@debian.org>.  Material  was  taken  from
       http://proteomics.informatics.iupui.edu/software/toppic/manual.html.

AUTHOR

       Filippo Rusconi <lopippo@debian.org> and upstream authors (Dr. Xiaowen Liu's Lab at  Indiana  University-
       Purdue University Indianapolis and others)

COPYRIGHT

       Filippo Rusconi and Indiana University-Purdue University Indianapolis

1                                                   20200521                                           TOPPIC(1)