Provided by: gmap_2024-02-22+ds-1build1_amd64

NAME

       gmap_build - Tool for genome database creation for GMAP or GSNAP

SYNOPSIS

       gmap_build [options...] -d <genome> [-c <transcriptome> -T <transcript_fasta>] <genome_fasta_files>

DESCRIPTION

       gmap_build:  Builds  a  gmap  database  for  a genome to be used by GMAP or GSNAP.  Part of GMAP package,
       version 2024-02-22.

       You are free to name <genome> and <transcriptome> as  you  wish.   You  will  use  the  same  names  when
       performing alignments subsequently using GMAP or GSNAP.

       Note: When adding a transcriptome to an existing genome database, there is no need to specify the
       genome_fasta_files.
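
       As a rough illustration of the two usages described above, the following sketch prints the
       corresponding commands. The genome name mygenome, the transcriptome name mytranscripts, and all
       file names are hypothetical. DRY_RUN=echo makes the script print each command instead of running
       it; set DRY_RUN to an empty string to execute for real.

```shell
# Hypothetical names; substitute your own.
GENOME=mygenome
TRANSCRIPTOME=mytranscripts
DRY_RUN=echo   # set to "" to actually run gmap_build

# Build a genome database from one or more FASTA files.
$DRY_RUN gmap_build -d "$GENOME" chr1.fa chr2.fa

# Later, add a transcriptome to the existing genome database.
# No genome FASTA files are given for this step.
$DRY_RUN gmap_build -d "$GENOME" -c "$TRANSCRIPTOME" -T transcripts.fa
```

       The same names (here mygenome and mytranscripts) would then be passed to GMAP or GSNAP when
       aligning.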

OPTIONS

       -D, --dir=STRING
              Destination directory for installation (defaults to gmapdb directory specified at configure time)

       -d, --genomedb=STRING
              Genome name (required)

       -n, --names=STRING
              Substitute names for contigs, provided in a file.

              The file can have one of two formats:

       1.     A file with one column per line, with each line corresponding to a FASTA file, in the order  given
              to  gmap_build.   The  chromosome  name  for  each  FASTA  file  will be replaced with the desired
              chromosome name in the file.  Every chromosome in the FASTA must have a corresponding line in  the
              file.  This is useful if you want to rename chromosomes with a systematic numbering pattern.

       2.     A file with two columns per line, separated by white space.  In each line, the original FASTA
              chromosome name should be in column 1 and the desired chromosome name in column 2.

              The meaning of file format 2 depends on whether --limit-to-names is specified.  If so, the  genome
              build  will be limited to those chromosomes in this file.  Otherwise, all chromosomes in the FASTA
              file will be included, but only those chromosomes in this file will be re-named, which provides an
              easy way to change just a few chromosome names.

              This file can be combined with the --sort=names option, in which case the chromosomes are ordered
              as listed in the file.  Every chromosome must then be listed in the file; for chromosome names
              that should not be changed, column 2 can be blank (or the same as column 1).  A blank column 2 is
              allowed only with --sort=names, because otherwise the program cannot distinguish between a
              1-column and a 2-column names file.
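
              The two file formats might look like the following sketch. The contig names (contigA,
              contigB), file names, and genome name are made up for illustration. DRY_RUN=echo prints the
              gmap_build commands rather than running them.

```shell
DRY_RUN=echo   # set to "" to actually run gmap_build

# Format 1: one name per line, one line per FASTA file, in command-line order.
cat > names1.txt <<'EOF'
chr1
chr2
EOF

# Format 2: original name in column 1, desired name in column 2.
cat > names2.txt <<'EOF'
contigA chr1
contigB chr2
EOF

# Format 1: two FASTA files, so two lines in the names file.
$DRY_RUN gmap_build -d mygenome -n names1.txt first.fa second.fa

# Format 2 without --limit-to-names: only the listed chromosomes are renamed.
$DRY_RUN gmap_build -d mygenome -n names2.txt genome.fa
```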

       -L, --limit-to-names
              Limits the genome build to the chromosomes listed in the --names file.  To build only certain
              chromosomes, combine this option with a --names file that either renames the desired
              chromosomes or lists the same name in both columns for each of them.
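
              A minimal sketch of restricting a build to two chromosomes without renaming them. It assumes
              the FASTA file contains chromosomes named chr1 and chrM; the file and genome names are
              hypothetical, and DRY_RUN=echo prints the command instead of running it.

```shell
DRY_RUN=echo   # set to "" to actually run gmap_build

# List each desired chromosome with the same name in both columns.
printf '%s %s\n' chr1 chr1 chrM chrM > keep.txt

# -L restricts the build to the chromosomes named in keep.txt.
$DRY_RUN gmap_build -d mygenome_subset -n keep.txt -L genome.fa
```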

       -k, --kmer=INT
              k-mer value for genomic index (allowed: 15 or less, default is 15)

       -q INT sampling interval for genome (allowed: 1-3, default 3)

       -s, --sort=STRING
              Sort chromosomes using given method:

              none - use chromosomes as found in FASTA file(s) (default)
              alpha - sort chromosomes alphabetically (chr10 before chr 1)
              numeric-alpha - chr1, chr1U, chr2, chrM, chrU, chrX, chrY
              chrom - chr1, chr2, chrM, chrX, chrY, chr1U, chrU
              names - sort chromosomes based on file provided to --names flag
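
              As a sketch of the names method, the file below fixes the chromosome order and renames one
              entry. It assumes a genome whose FASTA names are MT, chr2, and chr1 (all hypothetical), and
              relies on the rule stated under --names that column 2 may be left blank when --sort=names is
              given. DRY_RUN=echo prints the command instead of running it.

```shell
DRY_RUN=echo   # set to "" to actually run gmap_build

# Every chromosome is listed; the line order is the desired chromosome order.
# MT is renamed to chrM; the other two keep their names (blank column 2).
cat > order.txt <<'EOF'
MT chrM
chr2
chr1
EOF

$DRY_RUN gmap_build -d mygenome -n order.txt --sort=names genome.fa
```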

       -g, --gunzip
              Input files are gzipped, so gunzip each file first

       -E, --fasta-pipe=STRING
              Interpret argument as a command, instead of a list of FASTA files

       -Q, --fastq
              Files are in FASTQ format

       -R, --revcomp
              Reverse complement all contigs

       -w INT Wait (sleep) this many seconds after each step (default 2)

       -o, --circular=STRING
              Circular  chromosomes (either a list of chromosomes separated by a comma, or a filename containing
              circular chromosomes, one per line).  If you use the --names feature,  then  you  should  use  the
              substitute  name  of the chromosome, not the original name, for this option.  (NOTE: This behavior
              is different from previous versions, and starts with version 2020-10-20.)
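
              The sketch below shows both ways of passing circular chromosomes, together with a --names
              file. The original name MT, the substitute name chrM, the second chromosome chrPt, and the
              file names are all hypothetical. Note that the substitute name is the one given to -o.
              DRY_RUN=echo prints the commands instead of running them.

```shell
DRY_RUN=echo   # set to "" to actually run gmap_build

# Rename MT to chrM, then refer to it by its substitute name in -o.
printf 'MT chrM\n' > rename.txt
$DRY_RUN gmap_build -d mygenome -n rename.txt -o chrM genome.fa

# Comma-separated list of circular chromosomes.
$DRY_RUN gmap_build -d mygenome -n rename.txt -o chrM,chrPt genome.fa

# Alternatively, a file with one circular chromosome per line.
printf '%s\n' chrM chrPt > circular.txt
$DRY_RUN gmap_build -d mygenome -n rename.txt -o circular.txt genome.fa
```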

       -2, --altscaffold=STRING
              File with alt scaffold info, listing alternate scaffolds, one per line,  tab-delimited,  with  the
              following  fields:  (1)  alt_scaf_acc,  (2)  parent_name, (3) orientation, (4) alt_scaf_start, (5)
              alt_scaf_stop, (6) parent_start, (7) parent_end.

       -e, --nmessages=INT
              Maximum number of messages (warnings, contig reports) to report (default 50)

       --sarray=INT
              Whether to build suffix array: 0=no (default), 1=yes

   Options for older genome formats:
       -M, --mdflag=STRING
              Use MD file from NCBI for mapping contigs to chromosomal coordinates

       -C, --contigs-are-mapped
              Find a chromosomal region in each FASTA header line.  Useful for contigs that have been mapped  to
              chromosomal coordinates.  Ignored if the --mdflag is provided.

   Options for transcriptome-guided alignment:
       -c, --transcriptomedb=STRING
              Transcriptome name, plus one of these four flags:

       --gtf=FILE
              GTF file containing transcripts

       --gff3=FILE
              GFF3 file containing transcripts

       -G, --genes=FILE
              Genes file containing transcripts

       -T, --transcripts=FILE
              FASTA file containing transcripts

       -t, --nthreads=INT
              Number  of  threads  for  GMAP  alignment  of  transcripts  to  genome  (default  8).   Applies if
              --transcripts option is given
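
              Two possible invocations, sketched with hypothetical genome, transcriptome, and file names:
              one takes transcripts from a GTF file, the other from a transcript FASTA file with a reduced
              thread count. DRY_RUN=echo prints the commands instead of running them.

```shell
DRY_RUN=echo   # set to "" to actually run gmap_build

# Transcripts described by a GTF annotation.
$DRY_RUN gmap_build -d mygenome -c mytranscripts --gtf=genes.gtf genome.fa

# Transcript sequences in FASTA; -t sets the number of GMAP alignment threads.
$DRY_RUN gmap_build -d mygenome -c mytranscripts -T transcripts.fa -t 4 genome.fa
```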

       Other tools of GMAP suite are located in /usr/lib/gmap

gmap_build 2024-02-22+ds-1build1                   March 2024                                      GMAP_BUILD(1)