Provided by: pftools_3.2.12-1_amd64 

NAME
pfmake - generate a profile from a multiple sequence alignment
SYNOPSIS
pfmake [ -0123abcehlms ] [ -E gap_extend ] [ -F score_multiplier ] [ -G gap_open ] [ -H high_init/term ]
[ -I gap_increment ] [ -L low_init/term ] [ -M gap_multiplier ] [ -S matrix_multiplier ]
[ -T gap_region ] [ -X gap_excision ] [ ms_file | - ] score_matrix [ profile ] [ parameters ]
DESCRIPTION
pfmake generates a PROSITE profile from a multiple sequence alignment using methods described by Gribskov
et al. (1990), Luethy et al. (1994), and Thompson et al. (1994), with modifications to exploit the
features of the new profile format. The file containing the multiple sequence alignment (ms_file) must
be either in MSF format as generated by GCG programs or by readseq (checksums are ignored) or in MSA
format as created by psa2msa(1). If '-' is specified instead of a filename, the multiple sequence
alignment is read from the standard input. The score_matrix file must also be in GCG format.
If an existing profile is given as input via the optional third argument, the parameters of the
DISJOINT, NORMALIZATION and CUT_OFF blocks are read from it; all other profile parameters are
recalculated. Header and footer lines outside the matrix block are also transferred from input to
output.
If no input profile is given, the disjointness definition will be set to PROTECT with borders leaving
short unprotected tails (maximum 5 positions) at the beginning and at the end of the profile.
Furthermore, one normalization mode (n_score = raw_score / F, where F is the output score multiplier, see
below), and two cut-off values (level 0: 8.5, level -1: 6.5) will be defined.
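As a minimal sketch of the default normalization and cut-off values described above, the following Python snippet converts a raw score into a normalized score and reports the highest cut-off level it reaches. The function names are hypothetical and not part of pftools, and treating a score equal to a cut-off as reaching that level is an assumption.

```python
# Illustrative only: default normalization and cut-off levels that pfmake
# defines when no input profile is given. All names here are hypothetical.
DEFAULT_CUTOFFS = {0: 8.5, -1: 6.5}  # level -> normalized score cut-off


def normalized_score(raw_score, score_multiplier=100):
    """n_score = raw_score / F, with F the output score multiplier (-F)."""
    return raw_score / score_multiplier


def cutoff_level(raw_score, score_multiplier=100):
    """Return the highest level whose cut-off is reached, or None.

    Assumption: a score equal to the cut-off counts as reaching it.
    """
    n_score = normalized_score(raw_score, score_multiplier)
    for level in sorted(DEFAULT_CUTOFFS, reverse=True):
        if n_score >= DEFAULT_CUTOFFS[level]:
            return level
    return None
```

With the default F of 100, a normalized cut-off of 8.5 corresponds to a raw score of 850.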
OPTIONS
ms_file
Input multiple sequence alignment.
The content of the file must be either in MSF or in MSA format. If the filename is replaced by a
'-', pfmake will read the input alignment from stdin.
score_matrix
Residue score matrix file.
Contains the substitution scores for all pairs of residues of the sequence alphabet. The file must
be in GCG format.
profile
Optional profile file.
If a filename is specified, the profile will be parsed and those parameters mentioned in the
description section will be kept for the computation of the output profile.
-0 Global alignment mode.
Initiation (termination) at low cost is possible only if the alignment starts at the beginning
(end) of the profile and at the beginning (end) of the sequence.
-1 Domain global alignment mode.
Initiation (termination) at low cost is possible only at the beginning (end) of the profile; it
may start and end at any position within the sequence.
-2 Semi-global alignment mode.
Initiation (termination) at low cost is possible if the alignment starts either at the beginning
(end) of the profile or at the beginning (end) of the sequence.
This is the default alignment mode.
-3 Local alignment mode.
Initiation (termination) at low cost is possible anywhere. The high-cost initiation/termination
score (parameter H) is meaningless.
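The four alignment modes differ only in where low-cost initiation is allowed. The sketch below restates the descriptions of -0 to -3 as a single predicate; the function and its arguments are hypothetical and only illustrate the rules as worded here. Termination follows the same rules with "end" in place of "beginning".

```python
# Illustrative only: where low-cost initiation is possible for each
# alignment mode (-0 to -3), as read from the option descriptions.
def low_cost_initiation(mode, at_profile_start, at_sequence_start):
    if mode == 0:   # global: profile start and sequence start
        return at_profile_start and at_sequence_start
    if mode == 1:   # domain global: profile start, any sequence position
        return at_profile_start
    if mode == 2:   # semi-global (default): profile start or sequence start
        return at_profile_start or at_sequence_start
    if mode == 3:   # local: anywhere
        return True
    raise ValueError("mode must be 0, 1, 2 or 3")
```

Positions for which the predicate is false would receive the high-cost score (parameter H) instead of the low-cost score (parameter L).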
-a Weight gaps asymmetrically, as in Gribskov et al. (1990).
-b Block profile mode.
By imposing additional constraints on the placement of insertions and deletions, this mode
produces profiles that favor alignments with insertions and deletions positioned symmetrically
around a few positions. For each gap region a gap center is defined which usually corresponds to
the place where gap excision has been applied (see parameter X). If no gap excision has been
applied, the position is chosen so as to maximize the sum of deletion opening events before, and
deletion closing events after the gap center. Within a given gap region reduced deletion opening
penalties are offered only before, reduced deletion closing penalties only after, and reduced
insertion penalties only at the center.
This option is incompatible with options -a and -e and automatically disables them.
-c Circular profile.
The topology of the profile is declared as circular. The first and the last insert positions are
merged by retaining the higher value of each parameter type.
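The merge rule for circular profiles can be sketched as follows. This assumes that each insert position is represented as a mapping from parameter type to a numeric score and that both positions define the same parameter types; this representation and the function name are hypothetical.

```python
# Illustrative only: merge the first and last insert positions of a
# circular profile by keeping the higher value of each parameter type.
def merge_insert_positions(first, last):
    # Assumption: both mappings have the same keys and numeric values.
    return {name: max(first[name], last[name]) for name in first}
```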
-e Enables endgap-weighting mode as implemented in the GCG program ProfileMake. Endgaps in the
multiple sequence alignment will be interpreted as deletions relative to the other sequences and
thus be considered for the delineation of gap regions. The default is no endgap weighting as
introduced by Thompson et al. (1994) in the program ProfileWeight.
-h Display usage help text.
-l Remove output line length limit. Individual lines of the output profile can exceed a length of 132
characters, removing the need to wrap them over several lines.
-m Input multiple sequence alignment is in MSA format.
-s Weight gaps symmetrically (default mode). The initial gap opening scores
(MD, MI) computed from the maximal gap length and the command-line parameters E, G, I, and M, will
be divided by two and the resulting value will be assigned to both gap opening and gap closing
scores (MI, IM, MD, DM).
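A minimal sketch of the halving step in symmetric mode is shown below. How the initial MD and MI scores are derived from E, G, I, and M is not reproduced here; the inputs are simply assumed to be available, and the function name is hypothetical.

```python
# Illustrative only: split each initial gap opening score evenly between
# the opening and closing transitions, as described for symmetric mode.
def symmetric_gap_scores(md_initial, mi_initial):
    half_md = md_initial / 2
    half_mi = mi_initial / 2
    return {"MI": half_mi, "IM": half_mi, "MD": half_md, "DM": half_md}
```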
-E gap_extend
Gap extension penalty. See Gribskov et al. (1990).
Default: 0.2 (appropriate for 1/3 bit-scaled blosum45 matrix)
-F score_multiplier
Output score multiplier.
On output, all profile scores are multiplied by this factor and rounded to nearest integers.
Default: 100
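As a small illustration of the output scaling, the snippet below multiplies a score by F and rounds it to an integer. It uses Python's built-in round(), which rounds exact ties to the nearest even integer; how pfmake itself handles ties is not stated here, so that detail is an assumption. The function name is hypothetical.

```python
# Illustrative only: scale a profile score by the output score
# multiplier F and round to an integer.
def to_output_score(score, score_multiplier=100):
    # Assumption: tie-breaking follows Python's round(), not pfmake.
    return round(score * score_multiplier)
```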
-G gap_open
Gap opening penalty. See Gribskov et al. (1990).
Default: 2.1 (appropriate for 1/3 bit-scaled blosum45 matrix)
-H high_init/term
High-cost initiation/termination score.
This score will be applied to all external and internal initiation and termination scores
corresponding to path matrix positions where initiation or termination at low cost is not possible
according to the alignment mode specified.
Default: * (low-value)
-I gap_increment
Gap penalty multiplier increment. See Gribskov et al. (1990).
Default: 0.1
-L low_init/term
Low-cost initiation/termination score.
This score will be applied to all external and internal initiation and termination scores
corresponding to path matrix positions where initiation or termination at low cost is possible
according to the alignment mode specified.
Default: 0
-M gap_multiplier
Maximum gap penalty multiplier. See Gribskov et al. (1990).
Default: 0.333
-S matrix_multiplier
Score matrix multiplier.
On input, the numbers of the score matrix are multiplied by this factor.
Default: 0.1
-T gap_region
Gap region threshold.
This is the minimal fraction of gap characters a column of the multiple sequence alignment must
contain in order to be considered part of a gap region.
Default: 0.01
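The gap region threshold can be pictured with a toy alignment. The sketch below computes the unweighted fraction of gap characters per column and lists the columns that meet the threshold. The choice of gap characters and the omission of sequence weights are assumptions, and the function names are hypothetical.

```python
# Illustrative only: find alignment columns whose gap fraction reaches
# the gap region threshold (-T).
GAP_CHARS = ".-"  # assumption: characters treated as gaps


def gap_fraction(column):
    """Unweighted fraction of gap characters in one alignment column."""
    return sum(ch in GAP_CHARS for ch in column) / len(column)


def gap_region_columns(alignment, threshold=0.01):
    """Return 0-based indices of columns with gap fraction >= threshold."""
    columns = zip(*alignment)  # alignment: list of equal-length strings
    return [i for i, col in enumerate(columns) if gap_fraction(col) >= threshold]
```

For the three-sequence alignment `["AC-T", "A--T", "ACGT"]`, the second and third columns contain gaps and are reported at the default threshold.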
-X gap_excision
Gap excision threshold.
This is the minimal fraction of non-gap characters a column of the multiple sequence alignment
must contain in order to be converted into a match position. The IM and MI transition scores of
insert positions corresponding to excised columns are set to zero; the other parameters remain
unchanged.
Default: 0.5
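Gap excision can be sketched in the same spirit: columns whose fraction of non-gap characters reaches the threshold become match positions, and the rest are excised. The gap characters, the unweighted counting, and the function name are assumptions made for illustration.

```python
# Illustrative only: split alignment columns into match positions and
# excised columns according to the gap excision threshold (-X).
GAP_CHARS = ".-"  # assumption: characters treated as gaps


def apply_gap_excision(alignment, excision_threshold=0.5):
    """Return (match_columns, excised_columns) as 0-based index lists."""
    match_columns, excised_columns = [], []
    for i, col in enumerate(zip(*alignment)):
        non_gap = sum(ch not in GAP_CHARS for ch in col) / len(col)
        if non_gap >= excision_threshold:
            match_columns.append(i)
        else:
            excised_columns.append(i)
    return match_columns, excised_columns
```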
PARAMETERS
Note: for backwards compatibility, release 2.3 of the pftools package still parses the version 2.2-style
parameters listed below, but they are deprecated; use the corresponding options (see the OPTIONS
section) instead.
E=# Gap extension penalty.
Use option -E instead.
F=# Output score multiplier.
Use option -F instead.
G=# Gap opening penalty.
Use option -G instead.
H=# High-cost initiation/termination score.
Use option -H instead.
I=# Gap penalty multiplier increment.
Use option -I instead.
L=# Low-cost initiation/termination score.
Use option -L instead.
M=# Maximum gap penalty multiplier.
Use option -M instead.
S=# Score matrix multiplier.
Use option -S instead.
T=# Gap region threshold.
Use option -T instead.
X=# Gap excision threshold.
Use option -X instead.
EXAMPLES
(1) pfmake -b1 -H 0.6 sh3.msf blosum45.cmp > sh3_block.prf
Generates a domain-global block profile from a multiple alignment of SH3 domains using the
blosum45 matrix. The file 'sh3.msf' contains a multiple alignment of 20 SH3 domains from
SWISS-PROT release 32 including sequence weights. The file 'blosum45.cmp' contains a 1/3 bit-scaled
blosum45 matrix in GCG format.
Note that fragment matches (alignments to parts of the profile) are not prohibited but penalized
by the option -H 0.6.
EXIT CODE
On successful completion of its task, pfmake will return an exit code of 0. If an error occurs, a
diagnostic message will be output on standard error and the exit code will be different from 0. When
conflicting options were passed to the program but the task could nevertheless be completed, warnings
will be issued on standard error.
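A caller can rely on this exit code convention when scripting pfmake. The following is a minimal sketch of a generic wrapper (the name run_checked is hypothetical) that raises on a non-zero exit code and otherwise returns standard output together with any warnings written to standard error.

```python
# Illustrative only: run a command, fail on a non-zero exit code, and
# hand back stdout plus whatever was written to stderr (warnings).
import subprocess


def run_checked(argv):
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout, result.stderr
```

Assuming pfmake and the input files from the EXAMPLES section are available, a call such as `run_checked(["pfmake", "-b1", "-H", "0.6", "sh3.msf", "blosum45.cmp"])` would return the profile text and any warnings.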
REFERENCES
Bucher P, Karplus K, Moeri N & Hofmann K (1996). A flexible motif search technique based on
generalized profiles. Comput. Chem. 20:3-24.
Gribskov M, Luethy R & Eisenberg D (1990). Profile analysis. Meth. Enzymol. 183:146-159.
Luethy R, Xenarios I & Bucher P (1994). Improving the sensitivity of the sequence profile method. Prot.
Sci. 3:139-146.
Thompson JD, Higgins DG & Gibson TJ (1994). Improved sensitivity of profile searches through the use of
sequence weights and gap excision. Comput. Appl. Biosci. 10:19-29.
SEE ALSO
pfsearch(1), pfscan(1), psa2msa(1), psa(5), xpsa(5)
AUTHOR
The pftools package was developed by Philipp Bucher.
Any comments or suggestions should be addressed to <pftools@sib.swiss>.
pftools 2.3 August 2003 PFMAKE(1)