Provided by: vienna-rna_2.5.1+dfsg-1build3_amd64 bug

NAME

       AnalyseSeqs - Analyse a set of sequences of common length

SYNOPSIS

       AnalyseSeqs [-X[bswn]] [-Q] [-M{mask}[+|!]] [-D{H|A|G}] [-d{S|H|D|B}]

DESCRIPTION

       AnalyseSeqs  reads  a set of sequences from stdin and tries a variety of methods for sequence analysis on
       them. Currently available are:
       Statistical geometry for quadruples of sequences; THIS IS PRELIMINARY AND NOT WELL TESTED BY NOW.
       split decomposition; neighbour joining and Ward's variance method for  reconstructing  phylogenies  using
       various  distance  measures.   For  statistical  geometry  and  the  cluster methods PostScript output is
       available.
       The program continues reading until it encounters one of  the  separator  characters  '@'  or  '%'.  Only
       sequences  of  alphabetical  characters  or  of  a  specified alphabet are processed, all other lines are
       ignored. The program stops reading if it either encounters an EOF condition, or if  there  are  no  valid
       sequence data between two lines beginning with separator characters.
       A  list  of  taxa  names can be specified in the input stream. The list begins with a line beginning with
       '*'. Optionally, a file name prefix [fn] for the PostScript output can be specified in  this  line.   The
       entries  have  the  form 'x : Taxon', where x is the number of taxon, i.e., of the corresponding entry in
       the list of input sequences. The taxa list need not be complete.  It  must  end,  however,  with  a  line
       beginning with '*' or any of the separator characters. The taxa list is printed on top of the output. The
       specified taxa names are used as labels in the PostScript output.

OPTIONS

       -X[bswn]
              specifies the analysis methods to be used.

       [b]    Statistical  Geometry.  A PostScript file named '[fn_]box.ps' giving a graphical representation of
              the statistical geometry is created. The resulting box is a good measure of 'tree likeness' of the
              data set.  This is the default.

       [s]    Split decomposition.

       [w]    Cluster analysis  using  Ward's  method.  A  PostScript  file  named  '[fn_]wards.ps'  is  created
              containing a drawing of the tree.

       [n]    Cluster  analysis using Saitou's neighbour joining method. A PostScript file named '[fn_]nj.ps' is
              created containing a drawing of the tree.

       -Q     indicates that a statistical geometry analysis is to be performed comparing four  data  sets,  for
              instance  to  confirm  the  significance  of  a proposed phylogeny. This option is only useful for
              statistical geometry analysis and hence the -X option is ignored. Each of the four data sets  must
              be of the form
              * [filename_prefix]
              # number
              [list of taxa names]
              *
              list of sequences
              %
              where number is 1,2,3,4 for the four groups to be compared.

       -M{mask}[+|!]
              allows  one  to  specify  a  mask for the input file. '{mask}' can be one of the following letters
              indicating a predefined alphabet or the %-sign followed by all characters to be accepted. A + sign
              at the very end of the mask indicates that the input is to be handled case sensitive.  Default  is
              conversion  of the input to upper case. A ! sign can be used to convert the input data to RY code:
              GgAaXx -> R, UuCcKkTt -> Y, all other letters are converted to *.

       -Ma    all letters A-Z and a-z.

       -Mu    uppercase letters.

       -Ml    lowercase letters.

       -Mc    digits [0-9].

       -Mn    all alphanumeric characters.

       -MR    RNA alphabet (GCAUgcau).

       -MD    DNA alphabet (GCATgcat).

       -MA    Amino acids in one-letter code.

       -MS    Secondary strcutures coded as '^.()'

       -M%alphabet
              use the specified alphabet.

       -D     specifies the algorithm to be used for calculating the distance matrix  of  the  input  data  set.
              Available are

       -DH    Hamming Distance

       -DA[,cost]
              Simple  alignment distance according to Needleman and Wunsch.  A gap cost different from 1. can be
              specified after the comma.

       -DG[,cost1,cost2]
              Gotoh's distance with  gap  cost  function  g(k)  =  cost2+cost1*(k-1).  cost2<=cost1  has  to  be
              fulfilled.  Default values are cost1=1., cost2=1., yielding the same distance as option A.
              ONLY THE HAMMING DISTANCE IS WELL TESTED BY NOW !!!

       -d     specifies the edit cost matrix to be used. Available are

       -dS    simple  distance.  Indel  and substitution of different characters all have cost 1. The indel cost
              can be set by specifying the gap costs with the  algorithm  options  -DA  and  -DG.  This  is  the
              default.

       -dH    A  distance  matrix  for  RNA  secondary  structures.  Inspired  by  Hogeweg's  similarity measure
              (J.Mol.Biol 1988).  Gap-function is set automatically.

       -dD    Dayhoff's matrix for amino acid distances.

       -dB    Distinguish purines and pyrimidines only.  CAUTION this  option  of  course  influences  only  the
              calculation  of  distances.   It  does NOT affect computation of the statistical geometry. This is
              done directly on the sequences. If you want to do statistical geometry on RY sequences use  the  !
              sign with the -M option, for instance -MR!.

REFERENCES

       The  method  of  statistical  geometry  has been introduced by M. Eigen, R. Winkler-Oswatitsch and A.W.M.
       Dress (Proc Natl Acad Sci, 85:1988,5912).  The method of split decomposition was proposed by H.J. Bandelt
       and A.W.M. Dress (Adv Math, 92:1992,47).  The variance method for cluster analysis is due to H.J. Ward (J
       Amer Stat Ass, 58:1963,236).  The neighbour joining method was published by  Saitou  and  Nei  (Mol  Biol
       Evol, 4:1987,406).

       This program is part of the Vienna RNA Package

WARNING

       This is the beta test version. Some options or combinations of options may still produce nonsense. Please
       send bug reports to ivo@tbi.univie.ac.at.

VERSION

       This man page is part of the Vienna RNA Package version 1.2.

AUTHOR

       Peter F Stadler, Ivo L. Hofacker.

BUGS

       Comments should be sent to ivo@itc.univie.ac.at.

                                                                                                  ANALYSESEQS(1)