Provided by: perm_0.4.0-8_amd64

NAME

       perm - Efficient mapping of short reads with periodic spaced seeds

       If you have any usage questions, please email "yanghoch at usc dot edu".

SYNOPSIS

       To use PerM from the command line, type perm followed by its arguments in the order shown below.

EXAMPLES

       For single-end reads:

       perm Ref Reads [options]

       Examples:

       perm Ref.fasta Reads.fasta -v 5 -o out.mapping -u unmappedReads.fa

       perm RefFilesList.txt ReadsSetFilesList.txt -v 5 -u unmappedReads.fa -E

       perm Ref.fasta Reads.csfasta -v 5 -m -s my.index --delimiter ',' --seed F3

       perm my.index SingleEndReads.csfasta -v 5 -o out.sam -k 10 -a ambiguous10.csfasta

       For paired-end reads:

       perm Ref -1 F3_Reads -2 R3_Reads [options]

       Examples:

       perm ref.fa -1 F3.fa -2 R3.fa -U 3000 -L 100 -v 5 -A -m -s -o out.sam

       perm ref.txt -1 F3.fq -2 R3.fq -v 5 -m -s my.index -o out.mapping --seed F3

       perm my.index -1 F3.fq -2 R3.fq -U 3000 -L 100 -v 5 -A -o out.sam

       To build an index only:

       perm Ref Read_Length --readFormat <.csfasta|.fasta> -m -s <index_path> --seed F3

       Example:

       perm hg18.txt 50 --readFormat .csfasta -m -s hg18_50_SOLiD.index

OPTIONS

       Required Arguments

       •   The reference file should be in FASTA format with the .fasta, .fna, or .fa file extension. For a
           transcriptome with multiple genes or isoforms as the reference, concatenate all FASTA sequences into a
           single FASTA file. Alternatively, if there are many files, for example one per chromosome (chr1.fa to
           chrY.fa), list the FASTA filenames one per line in a file with the .txt extension. The .txt extension
           is important because PerM examines the file extension to decide whether the input file is a list of
           filenames. The filenames must include the file path (relative or absolute) unless the FASTA files are
           all in the directory PerM is run from.

       •   The read file(s) should be in .fasta, .fastq, .csfasta or .csfastq format. PerM parses a file according
           to its extension, or according to the format given explicitly with the --readFormat <format> flag. If
           there are multiple read files, list the filenames one per line in a .txt file and pass that file as the
           reads argument; PerM can then map the read sets in parallel with OpenMP
           (http://en.wikipedia.org/wiki/OpenMP).
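       The list-file convention above can be scripted. The following Python sketch is not part of PerM: the
       helper names write_list_file and guess_input_kind, the genome/ directory, and the chromosome file names
       are hypothetical, and the extension check only restates the rules given above rather than reproducing
       PerM's own parser.

```python
import tempfile
from pathlib import Path

# Extensions named on this manual page (this is not PerM's own code).
REF_EXTENSIONS = {".fasta", ".fna", ".fa"}
READ_EXTENSIONS = {".fasta", ".fastq", ".csfasta", ".csfastq"}


def write_list_file(list_path, filenames):
    """Write one filename per line; PerM recognizes list files by the .txt extension."""
    if Path(list_path).suffix != ".txt":
        raise ValueError("list files must use the .txt extension")
    Path(list_path).write_text("".join(f"{name}\n" for name in filenames))


def guess_input_kind(path, allowed_extensions):
    """Classify an input by extension: 'list' for .txt, else the format name."""
    suffix = Path(path).suffix
    if suffix == ".txt":
        return "list"
    if suffix in allowed_extensions:
        return suffix.lstrip(".")
    raise ValueError(f"unrecognized extension: {suffix!r}")


if __name__ == "__main__":
    # One FASTA file per chromosome, chr1.fa to chrY.fa (hypothetical paths).
    chromosomes = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
    with tempfile.TemporaryDirectory() as tmp:
        list_path = Path(tmp) / "RefFilesList.txt"
        write_list_file(list_path, [f"genome/{c}.fa" for c in chromosomes])
        print(list_path.read_text().splitlines()[:2])  # ['genome/chr1.fa', 'genome/chr2.fa']
    print(guess_input_kind("RefFilesList.txt", REF_EXTENSIONS))  # list
    print(guess_input_kind("Reads.csfasta", READ_EXTENSIONS))    # csfasta
```

       The resulting list file is then given in the reference position, as in the RefFilesList.txt example
       in the EXAMPLES section.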

       Short Options (grouped by related functionality)

       -A     Output all alignments within the mismatch threshold (see -v option), end-to-end.

       -B     Output the best alignments, in terms of mismatches, within the threshold (see -v option). For
              example, if a read has no perfect-match alignment, two single-mismatch alignments, and additional
              alignments with more mismatches, only the two single-mismatch alignments will be output. -B is the
              default mode if neither -A nor -B is specified.

       -E     Output only uniquely mapped reads, i.e. reads that still have a single alignment after the best
              alignment selection (-B) has been applied, if applicable. When combined with the -A option, only
              reads with a single alignment within the mismatch threshold (see -v option) will be output.
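              The selection rules of -A, -B, and -E can be summarized in a short Python sketch. It assumes
              each candidate alignment is reduced to its mismatch count; Alignment and select_alignments are
              hypothetical names, not PerM internals, and only the rules stated above are modeled.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    position: int    # hypothetical reference coordinate
    mismatches: int  # number of mismatches in this alignment


def select_alignments(alignments, max_mismatches, mode="B", unique_only=False):
    """Apply -v (max_mismatches), -A or -B (mode) and -E (unique_only) to one read."""
    # Only alignments within the -v threshold are ever reported.
    hits = [a for a in alignments if a.mismatches <= max_mismatches]
    if mode == "B" and hits:
        # -B keeps only the alignments with the fewest mismatches.
        fewest = min(a.mismatches for a in hits)
        hits = [a for a in hits if a.mismatches == fewest]
    # -E keeps the read only if exactly one alignment remains.
    if unique_only and len(hits) != 1:
        return []
    return hits


if __name__ == "__main__":
    # The -B example above: no perfect match, two 1-mismatch hits, one 2-mismatch hit.
    hits = [Alignment(100, 1), Alignment(5000, 1), Alignment(9000, 2)]
    print(len(select_alignments(hits, max_mismatches=2)))                    # 2
    print(len(select_alignments(hits, max_mismatches=2, mode="A")))          # 3
    print(len(select_alignments(hits, max_mismatches=2, unique_only=True)))  # 0
```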

       -v     Maximum number of mismatches allowed (in each end, for paired-end reads). The default value is the
              number of mismatches to which the chosen seed is fully sensitive.

       -k     Maximum number of alignments to output. The default value is 200 if the -k flag is not given. No
              alignments are output for reads that map to more than this number of positions. Use the -a option
              to collect the reads that exceed the maximum.
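              The effect of -k and -a on a single read is sketched below in Python. route_read is a
              hypothetical helper, and writing the collected read as a plain FASTA record assumes FASTA input.

```python
def route_read(read_id, sequence, alignments, max_alignments=200):
    """Sketch of -k (default 200) and -a for one read; returns (destination, payload)."""
    if len(alignments) > max_alignments:
        # Too many positions: none of the alignments is output. With -a, the read
        # itself goes to the ambiguous-reads file (shown here as a FASTA record).
        return "ambiguous", f">{read_id}\n{sequence}\n"
    return "mapped", alignments


if __name__ == "__main__":
    positions = list(range(12))  # stand-ins for 12 alignment positions
    print(route_read("read_1", "ACGTACGT", positions, max_alignments=10)[0])      # ambiguous
    print(route_read("read_2", "ACGTACGT", positions[:3], max_alignments=10)[0])  # mapped
```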

       -t     Number of bases at the 5' end of each read to ignore. For example, if the first 5 bases are used as
              a barcode to index multiple samples together, use -t 5. If not specified, no initial bases are
              ignored.

       -T     Number of bases in each read to use, starting after any bases ignored by the -t option. Later bases
              at the 3' end of the read are ignored. For example, -T 30 means use only the first 30 bases
              (signals) after any bases ignored due to the -t option.
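              In Python slice terms, -t and -T select the part of each read that is mapped, as sketched below.
              effective_read is a hypothetical helper and the read is invented; the sketch only restates the
              rule above and is not PerM's trimming code.

```python
def effective_read(read, skip=0, use=None):
    """Return the part of a read that is mapped under -t skip and -T use."""
    kept = read[skip:]  # -t: ignore the first `skip` bases at the 5' end
    if use is not None:
        kept = kept[:use]  # -T: use only the next `use` bases; the 3' rest is ignored
    return kept


if __name__ == "__main__":
    # Hypothetical read: a 5-base barcode followed by 36 bases of insert.
    read = "GATCA" + "ACGTTGCA" * 4 + "ACGT"
    print(effective_read(read, skip=5, use=30))  # ACGTTGCAACGTTGCAACGTTGCAACGTTG
```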

       -m     Build a new reference index instead of reusing a saved index, even if one is available.

       -s path
              Save the reference index to accelerate the mapping in the future. If path is  not  specified,  the
              index  will  be  created  in the current working directory (i.e. where PerM is run from) using the
              default index name. If path is a directory, the index will be created in the  specified  directory
              using the default index name (directory must exist; it will not automatically be created). If path
              is a file path, the index will be created with the specified name.
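              The three cases for the -s path can be written as a small Python function. This is a sketch
              under assumptions: resolve_index_path is a hypothetical name, and DEFAULT_INDEX_NAME is a
              placeholder, because this page does not state PerM's actual default index name.

```python
import os

# Placeholder only: this page does not say what PerM's default index name is.
DEFAULT_INDEX_NAME = "default.index"


def resolve_index_path(path=None, default_name=DEFAULT_INDEX_NAME):
    """Return where the index would be saved for -s [path], per the three cases above."""
    if path is None:
        # -s without a path: current working directory, default index name.
        return os.path.join(os.getcwd(), default_name)
    if os.path.isdir(path):
        # -s with an existing directory: default index name inside that directory.
        return os.path.join(path, default_name)
    # Otherwise the path names the index file itself. PerM does not create missing
    # directories; treating a missing directory as a file name here is an assumption.
    return path


if __name__ == "__main__":
    print(resolve_index_path("hg18_50_SOLiD.index"))
    print(resolve_index_path("."))
```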

       -o filepath
              Name of the mapping output file when mapping a single read set. The output format, either the
              tab-delimited .mapping text format or the SAM format, is determined by the extension of the output
              filename. For example, -o out.sam will output in SAM format; -o /path/to/out.mapping will output in
              .mapping format. Use --outputFormat to override this behavior. The -o option does not apply when
              multiple read sets are mapped at once to take advantage of multiple CPUs (cores); see the -d option
              for that case.
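              The extension rule and its --outputFormat override can be sketched in Python as follows.
              output_format is a hypothetical helper; treating any extension other than .sam as .mapping is an
              assumption based on .mapping being the default output format.

```python
import os


def output_format(filepath, override=None):
    """Choose 'sam' or 'mapping' output from the -o extension or an --outputFormat override."""
    if override is not None:
        if override not in ("sam", "mapping"):
            raise ValueError("--outputFormat expects sam or mapping")
        return override
    extension = os.path.splitext(filepath)[1]
    if extension == ".sam":
        return "sam"
    # .mapping, and (by assumption) any other extension, gives the default .mapping format.
    return "mapping"


if __name__ == "__main__":
    print(output_format("out.sam"))                      # sam
    print(output_format("/path/to/out.mapping"))         # mapping
    print(output_format("results.txt", override="sam"))  # sam
```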

       -d dirpath
              Output directory for mapping output files when mapping multiple read sets (output  files  will  be
              named  automatically).  If  the  directory  specified does not exist, the output directory will be
              created provided the parent directory exists. If the -d switch is not  specified,  files  will  be
              written to the directory PerM is run from. Note: if -d dirpath is given together with -o filepath
              when mapping a single read set, dirpath will be prepended to filepath; however, this usage is not
              recommended.

       -a filepath
              Create  a FASTA (FASTQ) file for reads mapped to more positions than the threshold specified by -k
              or the default of 200.

       -b filepath
              Create a FASTA (FASTQ) file of reads that are shorter than the expected length or contain
              unexpected characters.

       -u filepath
              Create a FASTA (FASTQ) file of unmapped reads. When a single read set is mapped, filepath specifies
              the name of the output file. When multiple read sets are mapped, filepath is irrelevant and should
              be omitted; the files of unmapped sequences will automatically be named and created in the
              directory PerM is run from.

       Long Options

       --ambiguosReadOnly
              Output only ambiguous mappings, to find repeats (similar regions within the substitution
              threshold). When this option is specified, reads that map to more positions than the threshold set
              by -k will still be printed.

       --ambiguosReadInOneLine
              Output reads mapped to more than k places on one line. When this option is specified, reads that
              map to more positions than the threshold set by -k will still be printed, but on a single line.

       --noSamHeader
              Do not include a SAM header. This makes it easier to concatenate multiple SAM output files.

       --includeReadsWN
              Map reads with no more N or '.' calls than the specified threshold by encoding each N or '.' as A
              or 3, respectively. Reads with more N calls are discarded. By default, reads with any N are
              discarded.
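              The N handling can be sketched in Python as below, assuming N (base space) is encoded as A and
              '.' (color space) as 3. encode_unknown_calls is a hypothetical helper and max_unknown stands for
              the threshold given to --includeReadsWN; this is an illustration, not PerM's code.

```python
def encode_unknown_calls(read, max_unknown=0, color_space=False):
    """Sketch of --includeReadsWN; max_unknown=0 mimics the default (discard any N)."""
    unknown, replacement = (".", "3") if color_space else ("N", "A")
    if read.count(unknown) > max_unknown:
        return None  # too many unknown calls: the read is discarded
    return read.replace(unknown, replacement)


if __name__ == "__main__":
    print(encode_unknown_calls("ACNGT", max_unknown=1))                     # ACAGT
    print(encode_unknown_calls("ANNGT", max_unknown=1))                     # None
    print(encode_unknown_calls("T01.23", max_unknown=1, color_space=True))  # T01323
```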

       --statsOnly
              Output the mapping statistics to stdout only, without saving alignments to files.

       --ignoreQS
              Ignore the quality scores in FASTQ or QUAL files.

       --printNM
              When quality scores are available, use this flag to print the number of mismatches instead of
              mismatch scores in .mapping format.

       --seed {F0 | F1 | F2 | F3 | F4 | S11 | S20 | S12}
              Specify the seed pattern. The F0, F1, F2, F3, and F4 seeds are fully sensitive to 0, 1, 2, 3, and 4
              mismatches, respectively. The S11, S20, and S12 seeds are designed for the SOLiD sequencer. An Skj
              seed is fully sensitive to k adjacent mismatch pairs (the color-space signature of a SNP) and j
              isolated mismatches. See the algorithm page (http://code.google.com/p/perm/wiki/Algorithms) for
              more information about the seed patterns.

       --refFormat {fasta | list | index}
              Assume the reference sequence(s) are in the specified format, instead of guessing from the file's
              extension.

       --readFormat {fasta | fastq | csfasta | csfastq}
              Assume the reads are in the specified format, instead of guessing from the files' extensions.

       --outputFormat { sam | mapping }
              Override the default output mapping format option or specify it explicitly when  the  output  file
              extension is not .sam or .mapping.

       --delimiter char
              char is the character used as the delimiter separating the read ID from the additional information
              on the header line (the line starting with >) when reading a FASTA or CSFASTA file.
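              For illustration, splitting a header line on the delimiter looks like this in Python.
              split_header is a hypothetical helper and the header text is invented; in this sketch only the
              first occurrence of the delimiter separates the read ID from the additional information.

```python
def split_header(header_line, delimiter):
    """Split a FASTA/CSFASTA header line into (read_id, additional_info)."""
    if not header_line.startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    read_id, _, info = header_line[1:].rstrip("\n").partition(delimiter)
    return read_id, info


if __name__ == "__main__":
    # As with --delimiter ',' in the EXAMPLES section.
    print(split_header(">1_51_106_F3,extra_info\n", ","))  # ('1_51_106_F3', 'extra_info')
```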

       --log filepath
              filepath specifies the name of the log file which contains the mapping statistics that  will  also
              be printed on the screen.

       --forwardOnly
              Map reads to the forward strand only (for SOLiD strand-specific sequencing).

       --reverseOnly
              Map reads to the reverse strand only (for SOLiD strand-specific sequencing).

       Options for Paired-end Reads

       PerM maps mate-paired reads by mapping each end separately. All combinations of mates mapping to the same
       reference sequence are output if their separation is within the allowed range specified by the -L and -U
       flags.

       -e     Exclude ambiguously mapped pairs.

       -L / --lowerBound Int
              lower bound for mate-paired separation distance

       -U / --upperBound Int
              upper bound for mate-paired separation distance

       The upper and lower bounds can be negative, which may catch rearrangement variants. Use the -A option to
       avoid missing the correct pairs; however, this may greatly increase the running time if both ends are in
       repetitive regions.

       --fr   Map paired-end reads to opposite strands only.

       --ff   Map paired-end reads to the same strand only.

              By default, pairs mapped to either the same or opposite strands are output.

       --printRefSeq
              Print the mapped reference sequences of the pair as the last two columns in .mapping format.
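       The pairing rule described in this section can be sketched in Python. pair_hits and the
       (reference, position, strand) tuples are hypothetical; separation is taken here as the mate-2 position
       minus the mate-1 position, which is an assumption, since this page does not define how PerM measures it.
       The default bounds follow the 0-3000 bp default listed under DEFAULT SETTINGS.

```python
from itertools import product


def pair_hits(mate1_hits, mate2_hits, lower=0, upper=3000, strand_rule=None):
    """Pair every mate-1 hit with every mate-2 hit, keeping allowed combinations.

    Hits are hypothetical (reference, position, strand) tuples; strand_rule is
    None (both orientations), "fr" (opposite strands) or "ff" (same strand).
    """
    pairs = []
    for hit1, hit2 in product(mate1_hits, mate2_hits):
        (ref1, pos1, strand1), (ref2, pos2, strand2) = hit1, hit2
        if ref1 != ref2:
            continue  # mates must map to the same reference sequence
        if not lower <= pos2 - pos1 <= upper:
            continue  # outside the -L / -U range (the bounds may be negative)
        if strand_rule == "fr" and strand1 == strand2:
            continue  # --fr: opposite strands only
        if strand_rule == "ff" and strand1 != strand2:
            continue  # --ff: same strand only
        pairs.append((hit1, hit2))
    return pairs


if __name__ == "__main__":
    mate1 = [("chr1", 1000, "+"), ("chr2", 50, "+")]
    mate2 = [("chr1", 1200, "-"), ("chr1", 9000, "-"), ("chr2", 100, "+")]
    print(len(pair_hits(mate1, mate2)))                    # 2
    print(len(pair_hits(mate1, mate2, strand_rule="fr")))  # 1
```

       Enumerating all combinations mirrors the statement that every combination within range is output,
       which is also why -A can make pairing expensive when both ends fall in repeats.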

DEFAULT SETTINGS

       The  following  are  the  default  settings  when the corresponding command line option is not specified.
       Please specify the option to change the default settings.

       •   Allow at most two mismatches in each end, and use seed F2, S11, or F3, selected according to the read
           length and type.

       •   Print the best alignments for each read in terms of the number of mismatches.

       •   Output files in *.mapping format.

       •   Search for a saved index with the default file name before building a new index.

       •   Do not save the index to a file unless -s is specified.

       •   For paired end reads, the default allowed separation distance is 0-3000 bp. Change with the -L and -U
           options.

       Parallel Mapping

       PerM maps multiple read sets in a list simultaneously by querying the same index. It detects how many
       CPUs (cores) are available and assigns each of them a read set. When a read set is done, the next read set
       in the list is processed automatically. Each read set has its own mapping output file. To make better use
       of all CPUs on a node, split a large read set into many small read sets and put them in a list. When
       multiple nodes share the same file system, pre-build the index on one node first; the other nodes will
       then read the pre-built index instead of building it again. Without a pre-built index, each machine will
       try to build its own index, wasting CPU time and storage space.
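       Splitting a large read set and listing the pieces can be scripted; the Python sketch below handles FASTA
       or CSFASTA files only. split_reads is a hypothetical helper, the chunk naming scheme is invented, and the
       chunk size is a tuning choice this page does not prescribe. It keeps the original extension so that PerM
       can still detect the read format, and writes absolute paths into the .txt list.

```python
import tempfile
from pathlib import Path


def split_reads(fasta_path, reads_per_file, out_dir):
    """Split a FASTA/CSFASTA read file into chunks and write a .txt list of the chunks."""
    fasta_path, out_dir = Path(fasta_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths, chunk_lines, n_reads = [], [], 0

    def flush():
        # Write the current chunk (if any) under a name that keeps the original extension.
        if chunk_lines:
            path = out_dir / f"{fasta_path.stem}_part{len(chunk_paths) + 1}{fasta_path.suffix}"
            path.write_text("".join(chunk_lines))
            chunk_paths.append(path.resolve())
            chunk_lines.clear()

    with open(fasta_path) as handle:
        for line in handle:
            # A header line starts a new read; start a new chunk every reads_per_file reads.
            if line.startswith(">"):
                if n_reads and n_reads % reads_per_file == 0:
                    flush()
                n_reads += 1
            chunk_lines.append(line)
    flush()

    list_path = out_dir / f"{fasta_path.stem}_list.txt"
    list_path.write_text("".join(f"{p}\n" for p in chunk_paths))
    return list_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reads = Path(tmp) / "reads.fasta"
        reads.write_text(">r1\nACGT\n>r2\nCCGG\n>r3\nTTAA\n")
        list_path = split_reads(reads, reads_per_file=2, out_dir=Path(tmp) / "chunks")
        print(len(list_path.read_text().splitlines()))  # 2
```

       The list file can then be passed as the reads argument together with -d for the output directory, in
       the same way as ReadsSetFilesList.txt in the EXAMPLES section.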

Exit codes

       PerM sets the exit code to 0 upon successful completion, the normal Unix behavior. If the program is
       terminated via Ctrl-C (SIGINT), the exit code will be 2, the number for SIGINT (see man kill). If you
       invoke PerM from another language, you can check the return code and react accordingly. Here is a Perl
       example:

           use strict;
           use warnings;

           # Hypothetical read sets, mapped one after another.
           my @read_sets = ('reads_part1.fa', 'reads_part2.fa');

           foreach my $reads (@read_sets) {
               # List form of system() runs perm directly, without a shell.
               system('perm', 'Ref.fasta', $reads, '-v', '5', '-o', "$reads.mapping");
               my $signal    = $? & 127;   # signal that killed perm, if any
               my $exit_code = $? >> 8;    # exit status reported by perm itself
               if ($signal == 2 || $exit_code == 2) {
                   print STDERR "PerM terminated via Ctrl-C. Stopping run.\n\n";
                   # Maybe do some cleanup, such as deleting the small files the read
                   # file was split into for parallel processing.
                   exit(2);
               }
           }

Use PerM on Galaxy

       Thanks to Prof. Anton Nekrutenko and Kelly Vincent at PSU, you can now use PerM on Galaxy's test server
       (http://test.g2.bx.psu.edu/). Open that page and click NGS:Mapping in the tool menu, then choose Map with
       PerM for SOLiD and Illumina. You can upload your own reference or use the pre-built hg19 index in the
       system. Please email me if you encounter any difficulties. Once the system proves its stability, it will
       be moved to Galaxy's main server with more pre-built reference indexes.

Unit Test

       When PerM was developed, a CppUnit unit-test module was also prepared. If you are interested in the test
       code for PerM, please email me.

Yangho Chen                                        April 2014                                            PERM(1)