Provided by: pftools_3.2.12-1_amd64

NAME

       pfmake - generate a profile from a multiple sequence alignment

SYNOPSIS

       pfmake    [ -0123abcehlms ] [ -E gap_extend ] [ -F score_multiplier ] [ -G gap_open ] [ -H high_init/term
                 ] [ -I gap_increment ] [ -L low_init/term ] [ -M gap_multiplier ] [ -S matrix_multiplier ] [ -T
                 gap_region ] [ -X gap_excision ] [ ms_file | - ] score_matrix [ profile ] [ parameters ]

DESCRIPTION

       pfmake generates a PROSITE profile from a multiple sequence alignment using methods described by Gribskov
       et  al.   (1990),  Luethy  et al.  (1994), and Thompson et al.  (1994), with modifications to exploit the
       features of the new profile format.  The file containing the multiple sequence alignment  (ms_file)  must
       be  either  in  MSF  format  as generated by GCG programs or by readseq (checksums are ignored) or in MSA
       format as created by psa2msa(1).  If '-' is specified  instead  of  a  filename,  the  multiple  sequence
       alignment is read from the standard input. The score_matrix file must also be in GCG format.

       If an already existing profile is given as input via the third optional argument, the parameters of the
       DISJOINT, NORMALIZATION and CUT_OFF blocks will be read from that profile; all other profile parameters
       will be recalculated. Header and footer lines outside the matrix block will also be transferred from
       input to output.

       If no input profile is given, the disjointness definition will be set to  PROTECT  with  borders  leaving
       short  unprotected  tails  (maximum  5  positions)  at  the  beginning  and  at  the  end of the profile.
       Furthermore, one normalization mode (n_score = raw_score / F, where F is the output score multiplier, see
       below), and two cut-off values (level 0: 8.5, level -1: 6.5) will be defined.
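
       The default normalization and cut-off values described above can be illustrated with a minimal Python
       sketch. The names normalized_score and cutoff_level are hypothetical and not part of pftools. The
       sketch assumes that a match reaches a level when its normalized score is greater than or equal to the
       cut-off value of that level.

```python
from typing import Mapping, Optional

# Default cut-off values defined by pfmake when no input profile is given
# (level 0: 8.5, level -1: 6.5), as stated in the description.
DEFAULT_CUTOFFS = {0: 8.5, -1: 6.5}


def normalized_score(raw_score: int, score_multiplier: float = 100.0) -> float:
    """n_score = raw_score / F, where F is the output score multiplier (-F)."""
    return raw_score / score_multiplier


def cutoff_level(n_score: float,
                 cutoffs: Mapping[int, float] = DEFAULT_CUTOFFS) -> Optional[int]:
    """Return the highest level whose cut-off is reached, or None.

    Assumption: a level is reached when n_score >= its cut-off value.
    """
    reached = [level for level, value in cutoffs.items() if n_score >= value]
    return max(reached) if reached else None
```

       With the default multiplier of 100, a raw score of 850 corresponds to the level 0 cut-off and a raw
       score of 650 to the level -1 cut-off.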

OPTIONS

       ms_file
              Input multiple sequence alignment.
              The content of the file must be either in MSF or in MSA format.  If the filename is replaced by  a
              '-', pfmake will read the input alignment from stdin.

       score_matrix
              Residue score matrix file.
              Contains the substitution scores for all pairs of residues of the sequence alphabet. The file must
              be in GCG format.

       profile
              Optional profile file.
              If  a  filename  is  specified,  the  profile will be parsed and those parameters mentioned in the
              description section will be kept for the computation of the output profile.

       -0     Global alignment mode.
              Initiation (termination) at low cost is possible only if the alignment  starts  at  the  beginning
              (end) of the profile and at the beginning (end) of the sequence.

       -1     Domain global alignment mode.
              Initiation  (termination)  at  low cost is possible only at the beginning (end) of the profile; it
              may start and end at any position within the sequence.

       -2     Semi-global alignment mode.
              Initiation (termination) at low cost is possible if the alignment starts either at  the  beginning
              (end) of the profile or at the beginning (end) of the sequence.
              This is the default alignment mode.

       -3     Local alignment mode.
              Initiation  (termination)  at  low cost is possible anywhere. The high-cost initiation/termination
              score (parameter H) is meaningless.

       -a     Causes pfsearch to weight gaps asymmetrically, as in Gribskov et al.  (1990).

       -b     Block profile mode.
              By imposing additional constraints on  the  placement  of  insertions  and  deletions,  this  mode
              produces  profiles  that  favor  alignments with insertions and deletions positioned symmetrically
              around a few positions. For each gap region a gap center is defined which usually  corresponds  to
              the  place  where  gap  excision  has been applied (see parameter X).  If no gap excision has been
              applied, the position is chosen so as to maximize the sum of deletion opening events before, and
              deletion closing events after the gap center.  Within a given gap region reduced deletion  opening
              penalties  are  offered  only  before,  reduced deletion closing penalties only after, and reduced
              insertion penalties only at the center.
              This option is incompatible with options -a and -e and automatically disables them.

       -c     Circular profile.
              The topology of the profile is declared as circular. The first and the last insert  positions  are
              merged by retaining the higher value of each parameter type.

       -e     Enables  endgap-weighting  mode  as  implemented  in  the GCG program ProfileMake.  Endgaps in the
              multiple sequence alignment will be interpreted as deletions relative to the other  sequences  and
              thus  be  considered  for  the  delineation of gap regions.  The default is no endgap weighting as
              introduced by Thompson et al.  (1994) in the program ProfileWeight.

       -h     Display usage help text.

       -l     Remove output line length limit. Individual lines of the output profile can exceed a length of 132
              characters, removing the need to wrap them over several lines.

       -m     Input multiple sequence alignment is in MSA format.

       -s     Causes pfsearch to weight gaps symmetrically (default mode). The initial gap opening scores
              (MD, MI) computed from the maximal gap length and the command-line parameters E, G, I, and M will
              be divided by two, and the resulting value will be assigned to both gap opening and gap closing
              scores (MI, IM, MD, DM).

       -E gap_extend
              Gap extension penalty.  See Gribskov et al.  (1990).
              Default: 0.2 (appropriate for 1/3 bit-scaled blosum45 matrix)

       -F score_multiplier
              Output score multiplier.
              On output, all profile scores are multiplied by this factor and rounded to the nearest integer.
              Default: 100

       -G gap_open
              Gap opening penalty.  See Gribskov et al.  (1990).
              Default: 2.1 (appropriate for 1/3 bit-scaled blosum45 matrix)

       -H high_init/term
              High-cost initiation/termination score.
              This score will be applied  to  all  external  and  internal  initiation  and  termination  scores
              corresponding to path matrix positions where initiation or termination at low cost is not possible
              according to the alignment mode specified.
              Default: * (low-value)

       -I gap_increment
              Gap penalty multiplier increment.  See Gribskov et al.  (1990).
              Default: 0.1

       -L low_init/term
              Low-cost initiation/termination score.
              This  score  will  be  applied  to  all  external  and  internal initiation and termination scores
              corresponding to path matrix positions where initiation or termination at  low  cost  is  possible
              according to the alignment mode specified.
              Default: 0

       -M gap_multiplier
              Maximum gap penalty multiplier.  See Gribskov et al.  (1990).
              Default: 0.333

       -S matrix_multiplier
              Score matrix multiplier.
              On input, the numbers of the score matrix are multiplied by this factor.
              Default: 0.1

       -T gap_region
              Gap region threshold.
              This  is  the  minimal fraction of gap characters a column of the multiple sequence alignment must
              contain in order to be considered part of a gap region.
              Default: 0.01

       -X gap_excision
              Gap excision threshold.
              This is the minimal fraction of non-gap characters a column of  the  multiple  sequence  alignment
              must  contain  in  order to be converted into a match position. The IM and MI transition scores of
              insert positions corresponding to excised columns are set to zero;  the  other  parameters  remain
              unchanged.
              Default: 0.5
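
       The two column thresholds (-T and -X) can be illustrated with a minimal Python sketch. The function
       classify_columns and the set GAP_CHARS are hypothetical and not part of pftools. The sketch counts
       every sequence equally, whereas pfmake may take sequence weights into account, and the choice of gap
       characters is an assumption.

```python
# Assumption: these characters mark gaps in the alignment rows.
GAP_CHARS = frozenset(".-~")


def classify_columns(rows, gap_region=0.01, gap_excision=0.5):
    """Return one (in_gap_region, is_match) pair per alignment column.

    rows          equal-length aligned sequences (strings)
    gap_region    -T: minimal fraction of gap characters for a gap region
    gap_excision  -X: minimal fraction of non-gap characters for a match position
    """
    n = len(rows)
    result = []
    for column in zip(*rows):
        gaps = sum(ch in GAP_CHARS for ch in column)
        in_gap_region = gaps / n >= gap_region
        is_match = (n - gaps) / n >= gap_excision
        result.append((in_gap_region, is_match))
    return result
```

       With the default thresholds, a column in which half of the sequences carry a gap is part of a gap
       region and is still converted into a match position.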

PARAMETERS

       Note:  for backwards compatibility, release 2.3 of the pftools package will parse the version 2.2-style
              parameters, but these are deprecated and the corresponding options (refer to the options section)
              should be used instead.

       E=#    Gap extension penalty.
              Use option -E instead.

       F=#    Output score multiplier.
              Use option -F instead.

       G=#    Gap opening penalty.
              Use option -G instead.

       H=#    High-cost initiation/termination score.
              Use option -H instead.

       I=#    Gap penalty multiplier increment.
              Use option -I instead.

       L=#    Low-cost initiation/termination score.
              Use option -L instead.

       M=#    Maximum gap penalty multiplier.
              Use option -M instead.

       S=#    Score matrix multiplier.
              Use option -S instead.

       T=#    Gap region threshold.
              Use option -T instead.

       X=#    Gap excision threshold.
              Use option -X instead.
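
       Scripts that still pass the deprecated parameters can be migrated mechanically, since each parameter
       letter maps to the option of the same letter. The following Python sketch shows one way to do this. The
       helper translate_legacy is hypothetical and not part of pftools.

```python
import re

# Parameter letters listed above; each maps to the option with the same letter.
LEGACY_LETTERS = "EFGHILMSTX"
_LEGACY_RE = re.compile(rf"([{LEGACY_LETTERS}])=(.+)")


def translate_legacy(parameters):
    """Turn ['E=0.2', 'G=2.1'] into ['-E', '0.2', '-G', '2.1'].

    Raises ValueError for anything that is not a known X=value parameter.
    """
    options = []
    for parameter in parameters:
        match = _LEGACY_RE.fullmatch(parameter)
        if match is None:
            raise ValueError(f"not a legacy pfmake parameter: {parameter!r}")
        options.extend([f"-{match.group(1)}", match.group(2)])
    return options
```

       According to the synopsis, the resulting options belong before the ms_file argument, whereas the
       deprecated parameters were given at the end of the command line.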

EXAMPLES

       (1)    pfmake -b1 -H 0.6 sh3.msf blosum45.cmp > sh3_block.prf

              Generates a domain-global block profile from a multiple alignment of SH3 domains using the
              blosum45 matrix. The file 'sh3.msf' contains a multiple alignment of 20 SH3 domains from
              SWISS-PROT release 32, including sequence weights. The file 'blosum45.cmp' contains a 1/3
              bit-scaled blosum45 matrix in GCG format.
              Note that fragment matches (alignments to parts of the profile) are not prohibited  but  penalized
              by the option -H 0.6.

EXIT CODE

       On  successful  completion  of  its  task,  pfmake  will  return an exit code of 0. If an error occurs, a
       diagnostic message will be output on standard error and the exit code will  be  different  from  0.  When
       conflicting  options  were  passed to the program but the task could nevertheless be completed, warnings
       will be issued on standard error.
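
       A calling script can rely on this convention to separate failures from warnings. The following Python
       sketch shows one way to do so. The wrapper run_tool is hypothetical and not part of pftools, and it
       assumes that the generated profile is written to standard output, as the redirection in the example
       above suggests.

```python
import subprocess
import sys


def run_tool(command):
    """Run command and return (succeeded, stdout_text, stderr_text).

    succeeded is True only for exit code 0. Anything written to standard
    error (diagnostics on failure, warnings on success) is forwarded to
    this process's standard error so that it is not lost.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.stderr:
        sys.stderr.write(result.stderr)
    return result.returncode == 0, result.stdout, result.stderr


# Usage with the invocation from the EXAMPLES section (requires pfmake):
#   ok, profile_text, messages = run_tool(
#       ["pfmake", "-b1", "-H", "0.6", "sh3.msf", "blosum45.cmp"])
#   if ok and messages:
#       ...  # task completed, but warnings were issued
```

       A successful run with non-empty standard error corresponds to the warning case described above.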

REFERENCES

       Bucher P, Karplus K, Moeri N & Hofmann K (1996).  A flexible motif search technique based on
       generalized profiles.  Comput. Chem.  20:3-24.

       Gribskov M, Luethy R & Eisenberg D (1990).  Profile analysis.  Meth. Enzymol.  183:146-159.

       Luethy R, Xenarios I & Bucher P (1994).  Improving the sensitivity of the sequence profile method.  Prot.
       Sci.  3:139-146.

       Thompson JD, Higgins DG & Gibson TJ (1994).  Improved sensitivity of profile searches through the use of
       sequence weights and gap excision.  Comput. Appl. Biosci.  10:19-29.

SEE ALSO

       pfsearch(1), pfscan(1), psa2msa(1), psa(5), xpsa(5)

AUTHOR

       The pftools package was developed by Philipp Bucher.
       Any comments or suggestions should be addressed to <pftools@sib.swiss>.

pftools 2.3                                        August 2003                                         PFMAKE(1)