Provided by: eoconv_1.5-5_amd64 

NAME
eoconv - Convert text files between various Esperanto encodings
USAGE
eoconv [-q] --from=encoding --to=encoding [file ...]
Options:
--from specify input encoding (see below)
--to specify output encoding (see below)
-q, --quiet suppress warnings
--help detailed help message
--man full documentation
--version display version information
Valid encodings:
post-h post-H post-x post-X post-caret pre-caret latex
html-hex html-dec iso-8859-3 utf-7 utf-8 utf-16 utf-32
DESCRIPTION
eoconv will read the given input files (or stdin if no files are specified) containing Esperanto text in
the encoding specified by --from, and then output it in the encoding specified by --to.
OPTIONS
--from=encoding Specify character encoding for input
--to=encoding Specify character encoding for output
-q --quiet Suppress non-essential warning messages
-? --help Print a brief help message and exit.
--man Print the manual page and exit.
--version Print version information and exit.
CHARACTER ENCODINGS
post-h ASCII postfix h notation
post-H ASCII postfix H notation
post-x ASCII postfix x notation
post-X ASCII postfix X notation
post-caret ASCII postfix caret (^) notation
pre-caret ASCII prefix caret (^) notation
latex, LaTeX ASCII LaTeX sequences
html-hex, HTML-hex
ASCII HTML hexadecimal entities
html-dec, HTML-dec
ASCII HTML decimal entities
iso-8859-3, ISO-8859-3, latin3, latin-3, Latin3, Latin-3
ISO-8859-3
utf-7, UTF-7, utf7, UTF7
Unicode UTF-7
utf-8, UTF-8, utf8, UTF8
Unicode UTF-8
utf-16, UTF-16, utf16, UTF16
Unicode UTF-16
utf-32, UTF-32, utf32, UTF32
Unicode UTF-32
ESPERANTO ORTHOGRAPHY
Esperanto is written in an alphabet of 28 letters. However, only 22 of these letters can be found in the
standard ASCII character set. The remaining six -- `c', `g', `h', `j', and `s' with circumflex, and `u'
with breve -- are not available in ASCII; neither are they among the characters available in the common
8-bit ISO-8859-1 character encoding. Therefore, while the six special Esperanto characters pose no
problem for handwritten texts, they were impossible to represent on standard typewriters, and are
somewhat problematic even on modern-day computers. Various encoding systems have been developed to
represent Esperanto text in printed and typed text.
POSTFIX-h NOTATION
This was the solution proposed by the creator of Esperanto, L. L. Zamenhof. He recommended using `u' for
`u-breve' and appending an `h' to a letter to indicate that it should have a circumflex. However, the
letters `u' and `h' are already part of the Esperanto alphabet, so using them for another purpose invites
ambiguity and mispronunciation. It also makes conversion of Esperanto text to postfix-h notation `lossy'
or one-way; it is generally not possible to convert from postfix-h notation via automated means. This
notation suffers from the additional drawback that the text cannot be sorted with standard rules for
ASCII text.
POSTFIX-H NOTATION
This is the same as postfix-h notation, except that `H' is used instead of `h' following a capital
letter.
POSTFIX-x NOTATION
This is the most common ASCII notation encountered today. It involves appending an `x' to a letter to
indicate that it should have an accent (be it circumflex or breve). Since `x' is not a letter in the
Esperanto alphabet, no ambiguity results. However, ASCII sorting algorithms still fail with postfix-x
text.
POSTFIX-X NOTATION
This is the same as postfix-x notation, except that `X' is used instead of `x' following a capital
letter.
PREFIX- AND POSTFIX-CARET NOTATION
Two slightly less popular ASCII encodings are to prepend or append a caret (`^') to a letter to indicate
that it should have an accent.
ISO-8859-3 (LATIN-3)
ISO 8859-3, also known as Latin-3 or South European, is an 8-bit character encoding for Esperanto. High-
bit characters are used to encode the accented Esperanto letters. ISO-8859-3 can also be used for
encoding English, Finnish, German, Italian, Latin, Maltese, Turkish, and Portuguese, making it useful for
texts which mix Esperanto with one or more of these languages.
UNICODE (ISO/IEC 10646)
Unicode is a standard for matching every character of every human language to a specific code. The
mapping methods are known as Unicode Transformation Formats (UTF). Among them are UTF-32, UTF-16, UTF-8
and UTF-7, where the numbers indicate the number of bits in one unit.
LaTeX SEQUENCES
The popular LaTeX typesetting package is capable of representing virtually any accented character. Note
that conversion from LaTeX sequences assumes that characters to be accented are enclosed in braces -- for
example, `\^{C}' will be recognized as `C' with circumflex, but `\^C' will not be.
HTML ENTITIES
Unicode codes for Esperanto characters can be escaped in HTML documents by using HTML entities. The
codes can be represented in either decimal (base-10) or hexadecimal (base-16) notation; the two are
functionally equivalent.
BUGS AND LIMITATIONS
Because the postfix-h and postfix-H notations are inherently ambiguous, conversion from postfix-h or -H
text is unlikely to result in coherent text. Use at your own risk, and carefully proofread the results.
Report bugs to <psychonaut@nothingisreal.com>.
AUTHOR
Tristan Miller <psychonaut@nothingisreal.com>
SEE ALSO
charsets(7), ascii(7), iso_8859-3(7), unicode(7), utf-8(7), latex(1)
LICENSE AND COPYRIGHT
Copyright (C) 2004-2016 Tristan Miller.
Permission is granted to make and distribute verbatim or modified copies of this manual provided the
copyright notice and this permission notice are preserved on all copies.
perl v5.40.1 2025-02-25 EOCONV(1)