Provided by: pftools_3.2.12-1_amd64

NAME

       psa - biological sequence alignment file format

DESCRIPTION

       psa  is  an output format used by the pftools package to describe alignments between biological sequences
       (DNA or protein) and PROSITE profiles.

       psa is related to the widely used fasta format for biological sequences. However, it does more than
       describe a biological sequence: its main purpose is to record the alignment between a motif descriptor,
       such as a PROSITE profile, and a given sequence. This information is included in the header and
       reflected in the structure of the sequence data following the header line.

SYNTAX

       Each sequence in a psa alignment file or output must be preceded by a fasta header line.
       The general syntax of such a fasta header line is as follows:

              >seq_id [ free_text ]

       The header must start with a '>' character which is directly followed by the seq_id field. This field  is
       interpreted  by  most programs as the sequence's identifier and/or accession number. It ends at the first
       encountered whitespace character.
       The pftools programs use the free_text field to report the match score, the match position and a
       description of the sequence or motif. Please refer to the man page of the corresponding program for
       further information about its output formats.
       The header must fit on a single line. All following lines, up to the next line starting with a '>'
       character or the end of the file, are interpreted as sequence data.
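
       The header and record rules above can be sketched in Python as follows. This is only an
       illustrative reader, not part of pftools; the function name read_psa and the choice to strip
       whitespace from data lines are assumptions made for this sketch.

```python
import re
from typing import Iterable, Iterator, Tuple


def read_psa(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (seq_id, free_text, alignment) for each record (hypothetical helper)."""
    seq_id = None
    free_text = ""
    chunks = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, free_text, "".join(chunks)
            # seq_id directly follows '>' and ends at the first whitespace;
            # everything after that is the free_text field.
            match = re.match(r">(\S*)\s*(.*)", line)
            seq_id = match.group(1)
            free_text = match.group(2).strip()
            chunks = []
        elif seq_id is not None:
            # Data may span several lines of different length: join them.
            chunks.append(line.strip())
    if seq_id is not None:
        yield seq_id, free_text, "".join(chunks)


if __name__ == "__main__":
    sample = [
        ">YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO",
        "PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--",
    ]
    for record in read_psa(sample):
        print(record)
```

       Lines that appear before the first header are ignored in this sketch; a stricter reader might
       report them as an error instead.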

       The alignment data between a sequence and a PROSITE profile starts on the line following the header.
       This data can span several lines of differing lengths.
       The data consists of upper- or lower-case characters of the corresponding sequence alphabet (DNA or
       protein). The gap characters '.' and '-' are also supported.
       The alignment is always at least as long as the matching profile. Insertions or deletions detected
       during the motif/sequence alignment step change the length of the reported data, and can be identified
       using the following conventions:

              upper-case character
                     Any upper-case character of the sequence alphabet identifies a match position  between  the
                     sequence and the motif descriptor.

              lower-case character
                     A  lower-case  character  of the sequence alphabet is used to symbolize an insertion in the
                     sequence compared to the motif descriptor.

              '-' (dash) character
                     A '-' character in the output identifies  the  presence  of  a  deletion  in  the  sequence
                     compared to the motif descriptor.
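
       As a minimal sketch of these conventions, the Python function below counts how many columns of
       an alignment string fall into each class. The name classify_columns is hypothetical, and the
       sample string is made up for illustration. The '.' character is counted together with '-',
       following the behaviour described for psa2msa(1) in the NOTES section.

```python
def classify_columns(alignment: str) -> dict:
    """Count match, insertion and deletion columns in psa alignment data."""
    counts = {"match": 0, "insertion": 0, "deletion": 0}
    for ch in alignment:
        if ch in "-.":
            # Gap character: a deletion in the sequence relative to the motif.
            counts["deletion"] += 1
        elif ch.isupper():
            # Upper-case residue: match position between sequence and motif.
            counts["match"] += 1
        elif ch.islower():
            # Lower-case residue: insertion in the sequence.
            counts["insertion"] += 1
        else:
            raise ValueError(f"unexpected character in alignment: {ch!r}")
    return counts


if __name__ == "__main__":
    # Made-up alignment string, for illustration only.
    print(classify_columns("ACgt-T."))
    # {'match': 3, 'insertion': 2, 'deletion': 2}
```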

EXAMPLES

       (1)    >YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO

              PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--

              This is an example of the output produced by pfsearch(1) using the '-x' (i.e. psa output) option.
              The first line, starting with the '>' character, is the fasta header. It also contains the raw
              score of the alignment and the position of the match in the input sequence.
              The next line holds the alignment itself. Starting at position 6, the residues 'lns' are
              inserted in the sequence relative to the motif. The last two positions of the motif are not
              present in the sequence (i.e. they are deleted), as indicated by the two '-' (dash) characters
              at the end of the alignment.
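
              The observations in this example can be checked with a few lines of Python. This is a
              sketch only: the regular expression for the "pos. start - end" part of the header is
              inferred from this single example and is an assumption, since the free text depends on
              the program and options used.

```python
import re

header = ">YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO"
alignment = "PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--"

# Runs of lower-case characters are insertions; report 1-based start positions.
insertions = [(m.start() + 1, m.group()) for m in re.finditer(r"[a-z]+", alignment)]

# Dashes at the very end are motif positions missing from the sequence.
trailing_deletions = len(alignment) - len(alignment.rstrip("-"))

# Every letter, upper or lower case, is a residue of the input sequence.
residues = sum(ch.isalpha() for ch in alignment)

# Assumed header layout for this example: "pos. <start> - <end>".
pos = re.search(r"pos\.\s*(\d+)\s*-\s*(\d+)", header)
start, end = int(pos.group(1)), int(pos.group(2))

print(insertions)          # [(6, 'lns')]
print(trailing_deletions)  # 2
print(residues)            # 42
print(end - start + 1)     # 42
```

              The number of residues in the alignment (42) equals the length of the sequence range
              291 - 332 given in the header, as expected when gap characters are not counted.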

NOTES

       (1)    The xpsa(5) format defines a stricter syntax for the header line, allowing the exchange of
              information between different sequence analysis tools. It uses keyword=value pairs to annotate the
              current match between a sequence and a motif descriptor. This syntax can be easily parsed and
              extended according to the needs of bioinformatics tools.

       (2)    The current implementation of the pftools package does not use the '.'  (dot) character in the psa
              output. Nevertheless psa2msa(1) will read it and interpret it in the same manner as the '-' (dash)
              character.

SEE ALSO

       xpsa(5), pfsearch(1), pfscan(1), pfw(1), pfmake(1), psa2msa(1)

AUTHOR

       This manual page was originally written by Volker Flegel.
       The pftools package was developed by Philipp Bucher.
       Any comments or suggestions should be addressed to <pftools@sib.swiss>.

pftools 2.3                                        April 2003                                             PSA(5)