Provided by: ncbi-tools-bin_6.1.20170106+dfsg2-2_amd64

NAME

       fa2htgs - formatter for high throughput genome sequencing project submissions

SYNOPSIS

       fa2htgs  [-]  [-6 str]  [-7 str]  [-A filename]  [-C str]  [-D] [-L filename] [-M str] [-N] [-O filename]
       [-P str] [-Q filename] [-S str] [-T filename] [-X] [-a str] [-b N] [-c str] [-d str]  [-e filename]  [-f]
       -g str  [-h str]  [-i filename]  [-k str]  [-l N] [-m] [-n str] [-o filename] [-p N] [-q] [-r str] -s str
       [-t filename] [-u] [-v] [-w] [-x str]

DESCRIPTION

       fa2htgs is a program used to generate Seq-submit files (ASN.1 sequence submission files) for high
       throughput genome sequencing projects.

       fa2htgs reads a FASTA file (or an Ace Contig file with Phrap sequence quality values), a Sequin
       submission template file (to get contact and citation information for the submission), and a series of
       command line arguments (see below).  The program then combines this information to make a
       submission suitable for GenBank.  Once you have generated your submission file, you need to follow the
       submission protocol (see the README present on your FTP account or mailed out to your Center).
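
       As a rough illustration of how these inputs fit together, the following Python sketch assembles an
       argument list from flags documented under OPTIONS.  The helper name, the file names, and the center and
       sequence names are hypothetical placeholders; the sketch only builds and prints the command and does
       not run fa2htgs.

```python
import shlex


def build_fa2htgs_command(center, seq_name, fasta, template="template.sub",
                          output="submission.asn", phase=1, length=0):
    """Assemble an fa2htgs argument list (hypothetical helper, not part of the tool)."""
    if phase not in (1, 2, 3):
        raise ValueError("HTGS phase must be 1, 2, or 3")
    return [
        "fa2htgs",
        "-g", center,        # Genome Center tag (required)
        "-s", seq_name,      # sequence name, unique within the center (required)
        "-i", fasta,         # FASTA input file
        "-t", template,      # Seq-submit template
        "-o", output,        # ASN.1 output file
        "-p", str(phase),    # HTGS phase
        "-l", str(length),   # declared sequence length in bp
    ]


# Placeholder values for illustration only.
cmd = build_fa2htgs_command("mycenter", "clone42", "clone42.fa",
                            phase=1, length=150000)
print(shlex.join(cmd))
```

       A script could pass the resulting list to subprocess.run() once the placeholder values are replaced
       with real ones.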

       fa2htgs is intended for scripted, automated bulk submission of unannotated genome sequence.  It
       can easily be extended from its current simple form to allow more complicated processing.  A submission
       prepared with fa2htgs can also be read into Psequin(1), and then annotated more extensively.

       Questions and concerns about this processing protocol, or about how to use this tool, should be
       forwarded to <htgs@ncbi.nlm.nih.gov>.

OPTIONS

       A summary of options is included below.

       -      Print usage message

       -6 str SP6 clone (e.g., Contig1,left)

       -7 str T7 clone (e.g., Contig2,right)

       -A filename
              Filename  for accession list input (mutually exclusive with -T and -i).  The input file contains a
              tab-delimited table with three to five columns, which are accession number, start  position,  stop
              position, and (optionally) length and strand.  If start > stop, the minus strand on the referenced
              accession  is used.  A gap is indicated by the word "gap" instead of an accession, 0 for the start
              and stop positions, and a number for the length.
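
              The rules above can be sketched as a small Python parser for one line of such a table.  This
              is an illustrative reading of the format, not code from fa2htgs: the function name and the
              returned dictionary layout are assumptions, and the optional strand column is ignored because
              its encoding is not described here.

```python
def parse_accession_line(line):
    """Parse one tab-delimited line of a -A accession list (illustrative only)."""
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 5:
        raise ValueError("expected three to five tab-delimited columns")
    name, start, stop = fields[0], int(fields[1]), int(fields[2])
    # The fourth column (length) is optional for accessions.
    length = int(fields[3]) if len(fields) > 3 and fields[3] else None
    if name == "gap":
        # A gap uses 0 for start and stop and must carry a length.
        if length is None:
            raise ValueError("a gap line needs a length")
        return {"kind": "gap", "length": length}
    # start > stop selects the minus strand of the referenced accession.
    strand = "minus" if start > stop else "plus"
    return {"kind": "accession", "accession": name,
            "start": start, "stop": stop, "strand": strand}


# Placeholder accession numbers for illustration.
for text in ("U10000\t1\t5000", "gap\t0\t0\t100", "L11000\t3000\t1"):
    print(parse_accession_line(text))
```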

       -C str Clone library name (will appear as /clone-lib="str" on the source feature)

       -D     HTGS_DRAFT sequence

       -L filename
              Read phrap contig order from filename.  This is a tab-delimited file that can be used to drive the
              order of contigs (normally specified by -P), as well as indicating the SP6 and T7  ends.   It  can
              also be used when contigs are known to be in opposite orientation.  For example:

                  Contig2     +       1       SP6     left
                  Contig3     +       1
                  Contig1     -               T7      right

              The  first  column  is  the  contig  name,  the  second  is  the  orientation,  the  third  is the
              fragment_group, the fourth indicates the SP6 or T7 end, and the fifth says which side of SP6 or T7
              end had vector removed.
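
              A script that generates this file might format each row as in the Python sketch below.  The
              helper is hypothetical; representing a missing fragment_group as an empty field and dropping
              trailing empty columns are assumptions inferred from the example above.

```python
def contig_order_line(name, orientation, group="", end="", side=""):
    """Format one tab-delimited row of a -L contig order file (illustrative)."""
    if orientation not in ("+", "-"):
        raise ValueError("orientation must be '+' or '-'")
    fields = [name, orientation, str(group), end, side]
    # Assumption: trailing empty columns may simply be omitted.
    while fields and fields[-1] == "":
        fields.pop()
    return "\t".join(fields)


rows = [
    contig_order_line("Contig2", "+", 1, "SP6", "left"),
    contig_order_line("Contig3", "+", 1),
    contig_order_line("Contig1", "-", "", "T7", "right"),
]
print("\n".join(rows))
```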

       -M str Map name (will appear as /map="str" on the source feature)

       -N     Annotate assembly_fragments

       -O filename
              Read comment from filename.  Lines may be at most 100 characters long; ~ is a line break and
              `~ is a literal ~.  You can check the format with Psequin(1).
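
              The escaping convention can be sketched in Python as follows.  The function name is
              hypothetical, and applying the 100-character limit to each logical comment line before
              escaping is an assumption; whether the limit refers to logical lines or to physical lines in
              the file is not stated here.

```python
def encode_comment(lines, max_len=100):
    """Join comment lines using ~ as the line break and `~ for a literal ~."""
    for line in lines:
        # Assumption: the limit is checked on each logical line.
        if len(line) > max_len:
            raise ValueError(f"comment line exceeds {max_len} characters")
    return "~".join(line.replace("~", "`~") for line in lines)


encoded = encode_comment(["Coverage is ~8x", "Vector trimmed"])
print(encoded)
```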

       -P str Contigs to use, separated by commas.  If -P is not given with the -T option, the fragments are
              used in the order in which they appear in the ace file (which is appropriate for a phase 1
              record, but not for a phase 2 or 3).  To set the order of the segments of the ace file, use
              the -P flag, like this: -P "Contig1,Contig4,Contig3,Contig2,Contig5"

       -Q filename
              Read quality scores from filename

       -S str Strain name

       -T filename
              Filename for phrap input (mutually exclusive with -A and -i)

       -X     The coordinates in the input file are on the resulting segmented sequence.  (Bases 1 through n  of
              each accession are used.)  Otherwise, the coordinates are on the individual accessions, which need
              not start at base 1 of the record.

       -a str GenBank accession; use if and only if updating a sequence.

       -b N   Gap length (default = 100; anything from 0 to 1000000000 is legal)

       -c str Clone name (will appear as /clone in the source feature; can be the same as -s)

       -d str Title for sequence (will appear in GenBank DEFINITION line)

       -e filename
              Log errors to filename

       -f     htgs_fulltop keyword

       -g str Genome Center tag (probably the same as your login name on the NCBI FTP server)

       -h str Chromosome (will appear as /chromosome in the source feature)

       -i filename
              Filename for fasta input (default is stdin; mutually exclusive with -A and -T)

       -k str Add the supplied string as a keyword.

       -l N   Length of sequence in bp (default = 0).  The length is checked against the actual number of bases
              read.  For phase 1 and 2 sequence it is also used to estimate gap lengths.  For phase 1 and 2
              records, it is important to use a number GREATER than the amount of provided nucleotide;
              otherwise this will generate false `gaps'.  It is assumed here that the putative full length of
              the BAC or cosmid will be used.  There should be at least 20 to 30 `n' between the segments
              (you can check for these in Sequin), as this ensures proper behavior when the sequence is used
              with BLAST.  Otherwise `artifactual' unrelated segment neighbors may be brought into proximity
              of each other.
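
              Before submitting, a script could sanity-check the declared length against the segments it is
              about to send, as in the Python sketch below.  How fa2htgs actually estimates gap lengths is
              not described here; the even split of the surplus across gaps is purely an illustrative
              assumption, and the function name is hypothetical.

```python
def surplus_per_gap(declared_bp, segment_lengths):
    """Bases left over per gap if the surplus were split evenly (assumption)."""
    total = sum(segment_lengths)
    gaps = len(segment_lengths) - 1
    if gaps < 1:
        raise ValueError("need at least two segments to have a gap")
    if declared_bp <= total:
        raise ValueError("declared length must exceed the bases provided")
    return (declared_bp - total) // gaps


# Example values for illustration: three segments, two gaps.
per_gap = surplus_per_gap(150000, [60000, 50000, 39900])
if per_gap < 20:
    print("warning: fewer than 20 bases available per gap")
print(per_gap)
```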

       -m     Take comment from template

       -n str Organism name (default = Homo sapiens)

       -o filename
              Filename for asn.1 output (default = stdout)

       -p N   HTGS phase:
              1      A  collection  of  unordered contigs with gaps of unknown length.  A Phase 1 record must at
                     the very least have two segments with one gap.  (default)
              2      A series of ordered contigs, possibly with known gap  lengths.   This  could  be  a  single
                     sequence without gaps, if the sequence has ambiguities to resolve.
              3      A single contiguous sequence.  This sequence is finished, but not necessarily annotated.

       -q     htgs_cancelled keyword

       -r str Remark for update (brief comment describing the nature of the update, such as "new sequence", "new
              citation", or "updated features")

       -s str Sequence  name.  The sequence must have a name that is unique within the genome center. We use the
              combination of the genome center name (-g argument) and the  sequence  name  (-s)  to  track  this
              sequence  and  to  talk  to  you about it.  The name can have any form you like but must be unique
              within your center.

       -t filename
              Filename for Seq-submit template (default = template.sub)

       -u     Take biosource from template

       -v     htgs_activefin keyword

       -w     Whole Genome Shotgun flag

       -x str Secondary accession numbers, separated by commas, e.g., U10000,L11000.

              In some cases a large segment will supersede one or more other accession numbers (records).
              These records, which are no longer wanted in GenBank, should be made secondary.  Using the -x
              argument you can list the accession numbers you want to make secondary.  This will instruct
              us to remove the accession number(s) from GenBank, and they will no longer be part of the
              GenBank release.  They will nonetheless be available from Entrez.

              GREAT  CARE  should  be  taken when using this argument!!!  Improper use of accession numbers here
              will result in the inappropriate withdrawal of GenBank records from GenBank, EMBL  and  DDBJ.   We
              provide  this parameter as a convenience to submitting centers, but this may need to be removed if
              it is not used carefully.

AUTHOR

       The National Center for Biotechnology Information.

SEE ALSO

       Psequin(1), /usr/share/doc/ncbi-tools-bin/README.fa2htgs.gz

NCBI                                               2006-05-29                                         FA2HTGS(1)