Provided by: pscan-tfbs_1.2.2-4build1_amd64

NAME

       pscan - detection of transcription factor binding sites in DNA sequences

SYNOPSIS

       pscan -q multifastafile -p multifastafile [options]
       pscan -p multifastafile [options]
       pscan -q multifastafile -M matrixfile [options]

DESCRIPTION

       Pscan inspects the upstream non-coding regions of many genes to derive subsequences that are characteris‐
       tic  for  the binding of proteins, i.e. transcription factors, that control the tissue- and situation-de‐
       pendent expression of a gene.  The tool is supported by the JASPAR database and other data that is  down‐
       loadable from the tool's home page.

       The command line tool pscan is meant for bulk submission.  The tool is also offered with a web interface
       that has all auxiliary data updated.

OPTIONS

       pscan options have only a single dash (`-') and are (with notable exceptions) followed by a single letter.
       Options are case-sensitive.  A summary of options is included below.

       -h     Show summary of options similar to this man page.

       -v     Show version of program.

       -q file
              Specify the multifasta file containing the foreground sequences.

       -p file
              Specify the multifasta file containing the background sequences.

       -m file
              Use this option if the background data are already available in a file (see the -g option).

       -M file
              Scan the foreground sequences using only the Jaspar/Transfac matrix contained in the specified
              file.

       -l file
              Use the matrices contained in that file (for the matrix file format see below).

       -N name
              Use only the matrix with that name (usable only in association with -l).

       -ss    Perform single strand only analysis.

       -rs    Perform single strand only analysis on the reverse strand.

       -split num1 num2
              Sequences are scanned only from position num1 and for num2 nucleotides.

       -trashn
              Discards sequences containing "N".

       -n     Oligos containing "N" will not be discarded.  Instead an "N" will obtain an "average" score.

       -g     If a background sequences file is used, then a file will be written containing the data calculated
              for those background sequences and the current set of matrices.  From then on one can use that
              file (-m option) instead of the sequences file for faster processing.

       -ui file
              An index of the background file will be used to avoid duplicated sequences.

       -bi    Build an index of the background sequences file (to be used later with the -ui option).   This  is
              useful  when  you  have  duplicated sequences in your background that may introduce a bias in your
              results.
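
       Before building an index it can help to see whether the background file contains duplicated sequences
       at all.  The following is a minimal sketch of such a pre-check; it is independent of pscan and does not
       reproduce how the index is built.  The function names and the case-insensitive comparison are
       assumptions made for illustration.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from multifasta text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_duplicate_sequences(text):
    """Map each sequence that occurs more than once to the headers carrying it."""
    headers_by_seq = defaultdict(list)
    for header, seq in parse_fasta(text):
        # Assumption: upper- and lower-case nucleotides count as the same sequence.
        headers_by_seq[seq.upper()].append(header)
    return {seq: hdrs for seq, hdrs in headers_by_seq.items() if len(hdrs) > 1}


# Hypothetical three-record background; geneA and geneC share one sequence.
example = """>geneA
ACGTACGT
>geneB
TTTTCCCC
>geneC
acgtacgt
"""
duplicates = find_duplicate_sequences(example)
print(duplicates)
```

       An empty result suggests that duplicated sequences are not a source of bias in that background file.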

NOTES

       The sequences to be used with Pscan have to be promoter sequences.  To obtain meaningful results it is
       critical that the background and the foreground sequences are consistent with each other both in size
       and in position (with respect to the transcription start site).  For optimal results the foreground set
       should be a subset of the background set.
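
       The subset and size requirements can be checked before running pscan.  The sketch below is one possible
       way to do so and is not part of pscan; the function names are hypothetical.  It cannot verify the
       position relative to the transcription start site, because that information is not recoverable from
       the sequences alone.

```python
def read_fasta(text):
    """Return a dict mapping each FASTA header to its sequence."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif line and header is not None:
            records[header] += line
    return records


def check_foreground(foreground, background):
    """Return a list of problems; an empty list means the two sets look consistent."""
    problems = []
    background_lengths = {len(seq) for seq in background.values()}
    for header, seq in foreground.items():
        if header not in background:
            problems.append(f"{header}: header missing from background")
        elif background[header].upper() != seq.upper():
            problems.append(f"{header}: sequence differs from background")
        if len(seq) not in background_lengths:
            problems.append(f"{header}: length {len(seq)} not seen in background")
    return problems


# Hypothetical records: g3 is absent from the background and has a different length.
background = read_fasta(">g1\nACGT\n>g2\nTTGA\n")
foreground = read_fasta(">g1\nACGT\n>g3\nACGTAA\n")
for problem in check_foreground(foreground, background):
    print(problem)
```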

       If the "-l" option is not used, Pscan will try to find Jaspar/Transfac matrix files in the current folder.
       Jaspar files have the ".pfm" extension while Transfac ones have the ".pro" extension.  If Jaspar matrix
       files are used, then a file called "matrix_list.txt" must be present in the same folder.  That file
       contains required info about the matrices in the ".pfm" files.
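
       The folder layout described above can be verified with a short script.  This is only a sketch of the
       stated rules (".pfm" and ".pro" files, plus "matrix_list.txt" when ".pfm" files are present); the
       function name and the shape of the report are assumptions, and the file contents are not inspected.

```python
import tempfile
from pathlib import Path


def check_matrix_folder(folder="."):
    """Report the matrix files found in a folder and any obvious layout problems."""
    folder = Path(folder)
    pfm = sorted(p.name for p in folder.glob("*.pfm"))
    pro = sorted(p.name for p in folder.glob("*.pro"))
    problems = []
    if not pfm and not pro:
        problems.append("no .pfm or .pro matrix files found")
    if pfm and not (folder / "matrix_list.txt").is_file():
        problems.append("matrix_list.txt is missing but .pfm files are present")
    return {"pfm": pfm, "pro": pro, "problems": problems}


# Demonstration in a throwaway folder holding one empty placeholder .pfm file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "MA0108.pfm").write_text("")
    report = check_matrix_folder(tmp)
print(report["problems"])
```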

EXAMPLES

       1) pscan -p human_450_50.fasta -bi

       This command will scan the file "human_450_50.fasta" using the matrices in the current folder.  It is
       handy to use this command the first time one uses a set of matrices with a given background sequences
       file.  A file called "human_450_50.short_matrix" will be written, and it can be used from then on every
       time you want to use the same background sequences with the same set of matrices.  A file called
       "human_450_50.index" will be written too, and it will be useful every time you use the same background
       file.
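
       When scripting repeated runs, the names of the two generated files can be derived from the background
       file name.  The sketch below assumes that the naming seen in this example (the ".fasta" extension
       replaced by ".short_matrix" and ".index") holds for other background files too; that generalisation
       and the helper names are assumptions, not documented behaviour.

```python
from pathlib import Path


def background_outputs(background_fasta):
    """Return the (short_matrix, index) paths expected next to a background file."""
    path = Path(background_fasta)
    # Assumption: pscan swaps the final extension, as in the example above.
    return path.with_suffix(".short_matrix"), path.with_suffix(".index")


def followup_command(query_fasta, background_fasta):
    """Build the argument list for a later run that reuses both generated files."""
    short_matrix, index = background_outputs(background_fasta)
    return ["pscan", "-q", str(query_fasta), "-m", str(short_matrix), "-ui", str(index)]


command = followup_command("human_nfy_targets.fasta", "human_450_50.fasta")
print(" ".join(command))
```

       The resulting argument list can be passed to a process launcher such as Python's subprocess.run once
       both files exist.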

       2) pscan -q human_nfy_targets.fasta -m human_450_50.short_matrix -ui human_450_50.index

       This command will scan the file "human_nfy_targets.fasta" searching for over-represented binding sites
       (with respect to the preprocessed background contained in the "human_450_50.short_matrix" file) using
       the matrices in the current folder.  Please note that the query file "human_nfy_targets.fasta" must be
       a subset of the sequences contained in the background file "human_450_50.fasta" in order to use the
       index file with the "-ui" option.  This means that both the sequences and their FASTA headers used in
       the query file must appear in the background file as well.  Using the "-ui" option when the sequences
       contained in the query file are not a subset of the background file will have undefined/unpredictable
       outcomes.  The output will be a file called "human_nfy_targets.fasta.res" where you will find all the
       matrices used, sorted by ascending P-value.  The lower the P-value obtained by a matrix, the higher the
       chances that the transcription factor associated with that matrix is a regulator of the input promoter
       sequences.  The fields of the output are the following: "Transcription Factor Name", "Matrix ID",
       "Z Score", "Pvalue", "Foreground Average", "Background Average".

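       The six output fields listed above can be loaded for further filtering.  The sketch below assumes that
       the ".res" file is tab-separated with one matrix per line; the separator is not stated in this page, so
       it may need adjusting.  The sample lines and all values in them are invented placeholders, not real
       pscan output, and the P-value threshold is left to the caller.

```python
def parse_res(text):
    """Parse six-field result lines into dicts, skipping lines that do not fit."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")  # assumption: tab-separated fields
        if len(fields) != 6:
            continue
        name, matrix_id, z_score, p_value, fg_avg, bg_avg = fields
        try:
            rows.append({
                "name": name,
                "matrix_id": matrix_id,
                "z_score": float(z_score),
                "p_value": float(p_value),
                "fg_avg": float(fg_avg),
                "bg_avg": float(bg_avg),
            })
        except ValueError:
            continue  # non-numeric columns, for example a header line
    return rows


def below_threshold(rows, threshold):
    """Keep the matrices whose P-value does not exceed the caller's threshold."""
    return [row for row in rows if row["p_value"] <= threshold]


# Invented placeholder lines, used only to exercise the parser.
sample = (
    "TF_NAME\tID\tZ_SCORE\tPVALUE\tFG_AVG\tBG_AVG\n"
    "TF_A\tM1\t5.0\t1e-6\t0.9\t0.8\n"
    "TF_B\tM2\t0.1\t0.4\t0.8\t0.8\n"
)
rows = parse_res(sample)
print([row["name"] for row in below_threshold(rows, 0.01)])
```
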
       3) pscan -q human_nfy_targets.fasta -M MA0108.pfm

       This command will scan the sequences file "human_nfy_targets.fasta" using the matrix contained in
       "MA0108.pfm".  The result will be written in a file called "human_nfy_targets.fasta.ris" where you will
       find the input sequences sorted by descending score (between 1 and 0).  The higher the score, the
       better the oligo found matches the matrix used.  The fields of the output are the following:
       "Sequence Header", "Score", "Position from the end of sequence", "Oligo that obtained the score",
       "Strand where the oligo was found".

       4) pscan -p human_450_50.fasta -bi -l matrixfile.wil

       This command is like Example #1 with the difference that the matrix set to be used is the one contained
       in the "matrixfile.wil" file.  Please look at the "example_matrix_file.wil" file included in this Pscan
       distribution to see the correct format for a matrix file.

       5) pscan -q human_nfy_targets.fasta -l matrixfile.wil -N MATRIX1

       This command is like  Example  #3  but  it  will  use  the  matrix  called  "MATRIX1"  contained  in  the
       "matrixfile.wil" file.

SEE ALSO

       For info on how Pscan works please refer to the paper.

                                                   May  3 2018                                          PSCAN(1)