Provided by: segemehl_0.3.4-5build2_amd64 

NAME
segemehl - Heuristic mapping of short sequences
SYNOPSIS
segemehl [-besVOc] -d <file> [<file>] [-q <file>] [-p <file>] [-i <file>] [-j <file>] [-x <file>] [-y
<file>] [-G <file>] [-g <string>] [-t <n>] [-o <string>] [-u <file>] [-B <string>] [-F <n>] [-S
[<basename>]] [-A <n>] [-D <n>] [-E <double>] [-H] [-m <n>] [-Z <n>] [-W <n>] [-U <n>] [-l <f>] [-w
<double>] [-X <n>] [-J <n>] [-I <n>] [-M <n>] [-n <n>] [-r <n>] [--skipidcheck] [--showalign] [--nohead]
DESCRIPTION
Segemehl is a program for mapping short sequencer reads to reference genomes. It implements a matching
strategy based on enhanced suffix arrays (ESA) and accepts fasta and fastq queries, including gzip'ed and
bgzip'ed files. In addition to aligning reads from standard DNA- and RNA-seq protocols, it supports the
mapping of bisulfite-converted reads (Lister and Cokus protocols) and implements a split-read mapping
strategy. The output of segemehl is a SAM- or BAM-formatted alignment file. For split-read mapping,
additional BED files are written to disk. These BED files may be summarized with the postprocessing
tool haarz. For alignments of bisulfite-converted reads, raw methylation rates may also be called with
haarz.
In brief, for each suffix of a read, segemehl aims to find the best-scoring seed. Seeds may contain
insertions, deletions, and mismatches (differences). The number of differences allowed within a single
seed is user-controlled (-D) and is crucial for the runtime of the program. Subsequently, seeds that
undercut the user-defined E-value (-E) are passed on to an exact semi-global alignment procedure. Finally,
reads with a minimum accuracy of <n> percent (-A, default 90) are reported to the user.
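The two filters described above can be illustrated with a minimal Python sketch. It is not segemehl's
implementation: the Seed record, the function names, and the definition of accuracy as matching read
positions divided by read length are assumptions made for illustration only. The thresholds mirror the
documented defaults of -E (5.0) and -A (90).

```python
from dataclasses import dataclass


@dataclass
class Seed:
    # Hypothetical record; segemehl's internal seed representation differs.
    position: int   # reference position where the seed matches
    evalue: float   # E-value assigned to the seed


def seeds_to_align(seeds, max_evalue=5.0):
    """Keep seeds whose E-value undercuts the threshold (compare -E)."""
    return [s for s in seeds if s.evalue < max_evalue]


def passes_accuracy(matches, read_length, min_accuracy=90):
    """Assumed definition: percentage of matching read positions (compare -A)."""
    return 100.0 * matches / read_length >= min_accuracy


if __name__ == "__main__":
    candidates = [Seed(position=10, evalue=0.5), Seed(position=42, evalue=7.0)]
    print(seeds_to_align(candidates))   # only the first seed is kept
    print(passes_accuracy(95, 100))     # 95 percent meets the default of 90
```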
OPTIONS
INPUT
-d, --database <file> [<file>]
list of path/filename(s) of fasta database sequence(s)
-q, --query <file>
path/filename of query sequences (default:none)
-p, --mate <file>
path/filename of mate pair sequences (default:none)
-i, --index <file>
path/filename of db index (default:none)
-j, --index2 <file>
path/filename of second db index (default:none)
-x, --generate <file>
generate db index and store to disk (default:none)
-y, --generate2 <file>
generate second db index and store to disk (default:none)
-G, --readgroupfile <file>
filename to read @RG header (default:none)
-g, --readgroupid <string>
read group id (default:none)
-t, --threads <n>
start <n> threads (default:1)
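As an illustration of how the input options combine, the following Python sketch assembles two argument
lists: one that generates an index (-x) and one that maps reads against a stored index (-i). The helper
names and all file names are hypothetical; only the flags come from the list above. The sketch builds the
commands without running them.

```python
def index_command(reference, index):
    """Arguments for generating an index from a fasta reference and storing it."""
    return ["segemehl", "-x", index, "-d", reference]


def map_command(reference, index, reads, mates=None, threads=1):
    """Arguments for mapping reads (and optional mates) against a stored index."""
    cmd = ["segemehl", "-i", index, "-d", reference, "-q", reads]
    if mates is not None:
        cmd += ["-p", mates]          # second mate of a paired-end run
    cmd += ["-t", str(threads)]
    return cmd


if __name__ == "__main__":
    # File names below are placeholders.
    print(" ".join(index_command("genome.fa", "genome.idx")))
    print(" ".join(map_command("genome.fa", "genome.idx", "reads.fq", threads=4)))
```

The resulting lists can be passed to a process launcher of choice once the placeholder file names have
been replaced with real paths.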
OUTPUT
-o, --outfile <string>
outputfile (default:none)
-b, --bamabafixoida
generate BAM output (requires -o <filename>)
-u, --nomatchfilename <file>
filename for unmatched reads (default:none)
-e, --briefcigar
brief cigar string (M vs X and =)
-s, --progressbar
show a progress bar
-B, --filebins <string>
file bins with basename <string> for easier data handling (default:none)
-V, --MEOP
output MEOP field for easier variant calling in SAM (XE:Z:)
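The output options can be combined as in the following Python sketch, which also enforces the documented
constraint that -b needs an output file given with -o. The function name and file names are hypothetical;
the returned list is meant to be appended to a segemehl argument list.

```python
def output_options(outfile=None, bam=False, unmatched=None, brief_cigar=False):
    """Build the output-related arguments of a segemehl call."""
    if bam and outfile is None:
        # Documented constraint: BAM output requires -o <filename>.
        raise ValueError("-b requires an output file given with -o")
    opts = []
    if outfile is not None:
        opts += ["-o", outfile]
    if bam:
        opts.append("-b")
    if unmatched is not None:
        opts += ["-u", unmatched]     # file for unmatched reads
    if brief_cigar:
        opts.append("-e")             # brief cigar string
    return opts


if __name__ == "__main__":
    print(output_options("mapped.bam", bam=True, unmatched="unmatched.fq"))
```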
ALIGNMENT
-F, --bisulfite <n>
bisulfite aln with methylC-seq/Lister et al. (=1) or bs-seq/Cokus et al. protocol (=2) (default:0)
-S, --splits [<basename>]
detect split/spliced reads. (default:none)
-A, --accuracy <n>
min percentage of matches per read in semi-global alignment (default:90)
-D, --differences <n>
search seeds initially with <n> differences (default:1)
-E, --evalue <double>
max evalue (default:5.000000)
-H, --hitstrategy <n>
report only best scoring hits (=1) or all (=0) (default:1)
-m, --minsize <n>
minimum length of queries (default:12)
-Z, --minfraglen <n>
min length of a spliced fragment (default:20)
-W, --minsplicecover <n>
min coverage for spliced transcripts (default:80)
-U, --minfragscore <n>
min score of a spliced fragment (default:18)
-l, --splicescorescale <f>
report spliced alignment with score s only if <f>*s is larger than next best spliced alignment
(default:0.900000)
-w, --maxsplitevalue <double>
max evalue for splits (default:50.000000)
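The rule behind -l can be written out as a small Python sketch. It only restates the comparison given in
the option description; the function name is hypothetical, and how segemehl computes the scores themselves
is not covered here.

```python
def report_split(score, next_best_score, scale=0.9):
    """Report a spliced alignment only if scale * score exceeds the next best score."""
    return scale * score > next_best_score


if __name__ == "__main__":
    # With the default scale of 0.9, a score of 40 is compared as 36.
    print(report_split(40, 35))   # 36 > 35, so the alignment is reported
    print(report_split(40, 37))   # 36 < 37, so the alignment is not reported
```

A larger value of <f> therefore makes segemehl more willing to report the best spliced alignment when a
competing alignment scores almost as well.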
SPECIAL
-X, --dropoff <n>
dropoff parameter for extension (default:8)
-J, --jump <n>
search seeds with jump size <n> (0=automatic) (default:0)
-O, --order
sorts the output by chromosome and position (might take a while!)
-I, --maxpairinsertsize <n>
maximum size of the inserts (paired end) in case of multiple hits (default:200000)
-M, --maxinterval <n>
maximum width of a suffix array interval, i.e. a query seed will be omitted if it matches more
than <n> times (default:100)
-c, --checkidx
check index
-n, --extensionpenalty <n>
penalty for a mismatch during extension (default:4)
-r, --maxout <n>
maximum number of alignments that will be reported. If set to zero, all alignments will be
reported (default:0)
--skipidcheck
do not check whether the fastq ids of mates / paired ends match. Instead, only the id of the first
mate (-q) is used in the output.
--showalign
show alignments
--nohead
do not output header
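The thresholds set by -M and -r can be illustrated with the following Python sketch. The function names
are hypothetical, and the choice of which alignments remain when the output is capped (here simply the
first ones in the given order) is an assumption, not a statement about segemehl's behavior.

```python
def keep_seed(occurrences, max_interval=100):
    """A seed is omitted if it matches more than max_interval times (compare -M)."""
    return occurrences <= max_interval


def limit_alignments(alignments, maxout=0):
    """Cap the number of reported alignments; zero means report all (compare -r)."""
    if maxout == 0:
        return list(alignments)
    # Assumption: keep the first maxout alignments in the given order.
    return list(alignments)[:maxout]


if __name__ == "__main__":
    print(keep_seed(100), keep_seed(101))
    print(limit_alignments(["a1", "a2", "a3"], maxout=2))
```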
BUGS
Please report bugs to steve@bioinf.uni-leipzig.de
SEE ALSO
http://www.bioinf.uni-leipzig.de/Software/segemehl/
REFERENCES
2008 Bioinformatik Leipzig
2018 Leibniz Institute on Aging (FLI)
AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
of the program.
segemehl 0.3 October 2018 SEGEMEHL(1)