Provided by: dictfmt_1.13.1+dfsg-1build1_amd64 bug

NAME

       dictfmt - formats a DICT protocol dictionary database

SYNOPSIS

       dictfmt  -c5|-t|-e|-f|-h|-j|-p [options]  basename
       dictfmt  -i|-I [options]

DESCRIPTION

       dictfmt  takes  a  file,  FILE,  on  stdin,  and  creates a dictionary database named basename.dict, that
       conforms to the DICT protocol.  It also creates an index file  named  basename.index.   By  default,  the
       index  is  sorted  according  to  the  C  locale, and only alphanumeric characters and spaces are used in
       sorting, however this may be changed with the --locale and --allchars options.  (  basename  is  commonly
       chosen to correspond to the basename of FILE , but this is not mandatory.)

       Unless  the  database  is extremely small, it is highly recommended that basename.dict be compressed with
       /usr/bin/dictzip to create basename.dict.dz.  (dictzip is included in the dictd source package.)

       FILE may be in any of the several formats described by the format options -c5, -t, -e, -f, -h, -j, -p, -i
       or -I.  Exactly one of these options must be given.

       dictfmt prepends several headers are to the .dict file.  The 00-database-url header gives  the  value  of
       the  -u  option  as  the URL of the site from which the original database was obtained.  The 00-database-
       short header gives the value of the -s option as the short name of the dictionary.  (This "short name" is
       the identifying name given by the "dict- D" option.)  If the -u and/or  -s  options  are  omitted,  these
       values will be shown as "unknown", which is undesirable for a publicly distributed database.

       The  date of conversion (formatting) is given in the 00-database-info header.  All text in the input file
       prior to the first headword (as defined by the appropriate formatting option) is appended to this header.
       All text in the input file following a headword, up to the next headword,  is  copied  unchanged  to  the
       .dict file.

FORMATTING OPTIONS

       -c5    FILE is formatted with headwords preceded by 5 or more underscore characters (_) and a blank line.
              All  text  until  the  next headword is considered the definition.  Any leading `@' characters are
              stripped out, but the file is otherwise unchanged. This option was written to format the CIA WORLD
              FACTBOOK 1995.

       -t     -c5, --without-info and --without-headword options are implied.  Use  this  option,  if  an  input
              database comes from dictunformat utility.

       -e     FILE is in html format, with the headword tagged as bold.  (<B>headword - </B>)
              This option was written to format EASTON'S 1897 BIBLE DICTIONARY.  A typical entry from Easton is:

              <A NAME="T0000005">
              <B>Abagtha - </B>
              one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

              This is converted to:
              Abagtha
                 one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

              The heading "<A NAME="T0000005"> is omitted, and the headword `Abagtha' is indexed.

              NOTE:  This  option  should  be used with caution.  It removes several html tags (enough to format
              Easton properly), but not all.  The Makefile that was originally  written  to  format  dict-easton
              uses  sed  scripts  to modify certain cross reference tags.  It may be necessary to pipe the input
              file through a sed script, or hack the source of dictfmt in order to properly  format  other  html
              databases.

       -f     FILE  is  formatted with the headwords starting in column 0, with the definition indented at least
              one space (or tab character) on subsequent lines.  The third line starting in column 0 is taken as
              the first headword , and the first two lines starting in column 0  are  treated  as  part  of  the
              00-database-info header.  This option was written to format the F.O.L.D.O.C.

       -h     FILE  is  formatted  with  the  headwords  starting  in  column  0,  followed by a comma, with the
              definition continuing on the same line.  All text  before  the  first  single  character  line  is
              included  in 00-database-info header, and lines with only one character are omitted from the .dict
              file.  The first headword is on the line following the first single character line.  The  headword
              is  indexed;  the  text of the file is not changed.  This option was written to format HITCHCOCK'S
              BIBLE NAMES DICTIONARY.

       -j     FILE is formatted with  headwords  starting  in  col  0,  enclosed  in  colons,  followed  by  the
              definition.   The colons surrounding the headword are removed, and the headword is indexed.  Lines
              beginning with '*', '=', or '-' are also removed.  All text before the first headword is  included
              in the headers.  This option was written to format the JARGON FILE.
              NOTE:  Some recent versions of the JARGON FILE had three blanks inserted before the first colon at
              each headword.  These must be removed before processing with dictfmt.  (sed scripts have been used
              for this purpose. ed, awk, or perl scripts are also possible.)

       -p     FILE is formatted with '%h' in column 0, followed by a blank, followed by the headword, optionally
              followed by a line containing `%d' in column 0.  The definition starts on the following line.  The
              first line beginning '%h' and any lines beginning '%d' are stripped from the .dict file, and '%h '
              is stripped from in front of the headword.  All text before the first headword is included in  the
              headers.  The second line beginning '%h' is taken as the first headword.
              This option was written to format Jay Kominek's elements database.

       -i -I  These  two  options  are different from all other formatting options.  They are intended to resort
              (according to dictd requirement) an .index file given  on  stdin.   That  is  .dict  file  is  not
              generated  at  all.  Only resorting is made.  Three- or four-column .index like input is expected.
              -i expects decimal offset and length, while -I expects them in base64 format.

OPTIONS

       -u url Specifies the URL of the site from which the  raw  database  was  obtained.   If  this  option  is
              specified, 00-database-url headword and appropriate definition will be ignored.

       -s name
              Specifies  the  name  and,  optionally,  the version and date, of the database.  (If this contains
              spaces, it must  be  quoted.)   If  this  option  is  specified,  00-database-short  headword  and
              appropriate definition will be ignored.

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a help message

       --locale locale
              Specifies  the  locale  used  for sorting.  If no locale is specified, the "C" locale is used. For
              using UTF-8 mode, --utf8 is needed.

       --8bit generates database in 8-bit mode, see --locale option also.
              Note: This option is deprecated.  Use it for creating  8-bit  (non-UTF8)  dictionaries  only.   In
              order to create UTF-8 dictionary, use --utf8 option instead.

       --utf8 If specified, UTF-8 database is created.

       --allchars
              Specifies  that  all characters should be used for the search, by default only alphabetic, numeric
              characters and spaces are put to .index file and therefore are used in search. Creates the special
              entry 00-database-allchars.

       --case-sensitive
              makes the search case sensitive.  Creates the special entry 00-database-case-sensitive.

       --headword-separator sep
              sets the headword separator, which allows several words to have the same definition.  For example,
              if '--headword-separator %%%' is given, and the input file contains 'autumn%%%fall', both 'autumn'
              and 'fall' will be indexed as  headwords, with the same definition.

       --index-data-separator sep
              sets the index/data separator, which allows one to set the first and fourth columns of .index file
              independently. That is the first column can be treated as an index column (where the MATCH command
              searches) and the fourth column as a result column (where the MATCH gets things to  be  returned),
              and  they (1-st and 4-th columns) are completely independent of each other.  The default value for
              this separator is ASCII symbol " \034".

       --break-headwords
              multiple headwords  will  be  written  on  separate  lines  in  the  .dict  file.   For  use  with
              '--headword-separator.

       --index-keep-orig
              When  --utf-8  is  specified  headwords are lowercased and non-alphanumeric characters are removed
              from it before saving to .index file in order to  simplify  the  search.   When  --index-keep-orig
              option  is  used  fourth column is created (if necessary) in .index file, and contains an original
              headword which is returned by MATCH command.  This option may be useful to  prevent  converting  "
              AT&T" to " ATT" or to keep proper nouns with uppercased first letter.

       --without-headword
              headwords will not be included in .dict file

       --without-header
              header will not be copied to DB info entry

       --without-url
              URL will not be copied to DB info entry

       --without-time
              time of creation will not be copied to DB info entry

       --without-ver
              By default dictfmt creates a special entry 00-database-dictfmt-X.Y.Z that contains (in .dict file)
              dictfmt version in format dictfmt-X.Y.Z. This option suppresses this.

       --without-info
              DB  info  entry  will not be created.  This may be useful if 00-database-info headword is expected
              from stdin (dictunformat outputs it).

       --columns columns
              By default dictfmt wraps strings read from stdin to 72 columns.  This option changes this default.
              If it is set to zero or negative value, wrapping is off.

       --default-strategy strategy
              Sets the default search strategy for the database.  It will  be  used  instead  of  strategy  '.'.
              Special  entry  00-database-default-strategy  is  created  for  this  purpose.  This option may be
              useful, for example, for dictionaries containing mainly phrases but  the  single  words.   In  any
              case, use this option if you are absolutely sure what you are doing.

       --mime-header mime_header
              When  client  sends  OPTION  MIME  command  to  the dictd , definitions found in this database are
              prepended by the specified MIME header. Creates the special entry 00-database-mime-header.

CREDITS

       dictfmt was written by Rik Faith (faith@cs.unc.edu)  as  part  of  the  dict-misc  package.   dictfmt  is
       distributed  under  the  terms  of the GNU General Public License.  If you need to distribute under other
       terms, write to the author.

AUTHOR

       This manual page was written by Robert D. Hilliard <hilliard@debian.org> .

SEE ALSO

       dict(1), dictd(8), dictzip(1), dictunformat(1), http://www.dict.org, RFC 2229

                                                25 December 2000                                      DICTFMT(1)