Provided by: tcl9.0-doc_9.0.1+dfsg-1_all

NAME

       encoding - Manipulate encodings

SYNOPSIS

       encoding option ?arg arg ...?
________________________________________________________________________________________________________________

INTRODUCTION

       Strings in Tcl are logically a sequence of Unicode characters. These strings are represented in memory
       as a sequence of bytes that may be in one of several encodings: modified UTF-8 (which uses 1 to 4 bytes
       per character), or a custom encoding such as 8-bit binary data.

       Different  operating  system  interfaces  or applications may generate strings in other encodings such as
       Shift-JIS.  The encoding command helps to bridge the gap between Unicode and these other formats.

DESCRIPTION

       Performs one of several encoding related operations, depending on option.  The legal options are:

       encoding convertfrom ?encoding? data

       encoding convertfrom ?-profile profile? ?-failindex var? encoding data
              Converts data, which should be a binary string encoded as per encoding, to a Tcl string. If
              encoding is not specified, the current system encoding is used.

       The -profile option determines the command behavior in the presence of conversion errors. See the
       PROFILES section below for details. Any premature termination of processing due to errors is reported
       through an exception if the -failindex option is not specified.

       If the -failindex option is specified, instead of an exception being raised on premature termination,
       the result of the conversion up to the point of the error is returned as the result of the command. In
       addition, the index of the source byte triggering the error is stored in var. If no errors are
       encountered, the entire result of the conversion is returned and the value -1 is stored in var.
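
       The -failindex behavior described above could be wrapped as in the following sketch. The proc name
       tryDecode and the dictionary keys text and failindex are illustrative assumptions, not part of Tcl.

```tcl
# Hypothetical helper: decode as much of $data as possible.
# Returns a dict holding the decoded text and the index of the first
# offending byte (-1 when the whole input was converted).
proc tryDecode {enc data} {
    set text [encoding convertfrom -profile strict -failindex idx $enc $data]
    return [dict create text $text failindex $idx]
}

set r [tryDecode ascii AB\x80]
puts [dict get $r text]        ;# AB
puts [dict get $r failindex]   ;# 2
```

       Returning both values lets the caller decide whether a partial result is acceptable.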

       encoding convertto ?encoding? data

       encoding convertto ?-profile profile? ?-failindex var? encoding data
              Converts data, a Tcl string, to the specified encoding. The result is a Tcl binary string that
              contains the sequence of bytes representing the converted string in the specified encoding. If
              encoding is not specified, the current system encoding is used.

       The -profile and -failindex options have the same effect as described for the encoding convertfrom
       command.
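
       As a minimal sketch of a round trip through encoding convertto and encoding convertfrom, the
       following encodes a single character as UTF-8 and inspects the resulting bytes. The variable names
       are arbitrary.

```tcl
# Encode U+00E9 (LATIN SMALL LETTER E WITH ACUTE) as UTF-8 and show the bytes in hex.
set bytes [encoding convertto utf-8 \u00e9]
puts [binary encode hex $bytes]            ;# c3a9

# Decoding the same bytes yields the original one-character string.
set text [encoding convertfrom utf-8 $bytes]
puts [string length $text]                 ;# 1
```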

       encoding dirs ?directoryList?
              Tcl can load encoding data files from the file system that describe additional encodings for it to
              work  with.  This  command  sets  the  search  path  for  *.enc encoding data files to the list of
              directories directoryList. If directoryList is omitted then the command returns the  current  list
              of  directories  that  make up the search path. It is an error for directoryList to not be a valid
              list. If, when a search for an encoding data file is happening, an element in  directoryList  does
              not refer to a readable, searchable directory, that element is ignored.
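
              A sketch of extending the search path without discarding the existing entries. The directory
              /opt/myapp/encodings is a hypothetical location assumed to hold *.enc files.

```tcl
# Hypothetical directory; replace with a real location of *.enc files.
set extraDir /opt/myapp/encodings

# Append the directory only if it is not already on the search path.
set dirs [encoding dirs]
if {$extraDir ni $dirs} {
    lappend dirs $extraDir
    encoding dirs $dirs
}
puts [encoding dirs]
```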

       encoding names
              Returns  a  list  containing  the names of all of the encodings that are currently available.  The
              encodings “utf-8” and “iso8859-1” are guaranteed to be present in the list.
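
              One possible use is checking for an encoding before relying on it. The proc name haveEncoding
              is an assumption made for this sketch.

```tcl
# Hypothetical helper: report whether an encoding name is currently available.
proc haveEncoding {name} {
    expr {$name in [encoding names]}
}

puts [haveEncoding utf-8]          ;# 1
puts [haveEncoding no-such-enc]    ;# 0
```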

       encoding profiles
              Returns a list of the names of encoding profiles. See PROFILES below.

       encoding system ?encoding?
              Set the system encoding to encoding. If encoding is omitted then the command returns  the  current
              system encoding.  The system encoding is used whenever Tcl passes strings to system calls.
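
              The following sketch queries the system encoding without changing it and illustrates that
              omitting the encoding argument of encoding convertto selects the system encoding. It assumes
              the system encoding can represent plain ASCII text.

```tcl
# Query the current system encoding; nothing is modified.
set sysenc [encoding system]
puts "system encoding: $sysenc"

# Naming the system encoding explicitly is equivalent to omitting it.
set explicit [encoding convertto $sysenc hello]
set implicit [encoding convertto hello]
puts [expr {$explicit eq $implicit}]   ;# 1
```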

PROFILES

       Operations involving encoding transforms may encounter several types of errors such as invalid sequences
       in the source data, characters that cannot be encoded in the target encoding and so on. A profile
       prescribes the strategy for dealing with such errors in one of two ways:

       •      Terminating further processing of the source data. The profile does not determine how this
              premature termination is conveyed to the caller. By default, this is signalled by raising an
              exception. If the -failindex option is specified, errors are reported through that mechanism.

       •      Continuing further processing of the source data using a fallback strategy such as replacing or
              discarding the offending bytes in a profile-defined manner.

       The following profiles are currently implemented, with strict being the default if the -profile option
       is not specified.

       strict
              The strict profile always stops processing when a conversion error is encountered. The error is
              signalled via an exception or the -failindex option mechanism. The strict profile implements a
              Unicode standard conformant behavior.

       tcl8
              The tcl8 profile always follows the second strategy above (it continues processing) and
              corresponds to the behavior of encoding transforms in Tcl 8.6. When converting from an external
              encoding other than utf-8 to Tcl strings with the encoding convertfrom command, invalid bytes
              are mapped to their numerically equivalent code points. For example, the byte 0x80, which is
              invalid in ASCII, would be mapped to code point U+0080. When converting from utf-8, invalid
              bytes that are defined in CP1252 are mapped to their Unicode equivalents while those that are
              not fall back to the numerical equivalents. For example, byte 0x80 is defined by CP1252 and is
              therefore mapped to its Unicode equivalent U+20AC, while byte 0x81, which is not defined by
              CP1252, is mapped to U+0081. As an additional special case, the sequence 0xC0 0x80 is mapped to
              U+0000.

              When converting from Tcl strings to an external encoding format using encoding convertto,
              characters that cannot be represented in the target encoding are replaced by an
              encoding-dependent character, usually the question mark ?.
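
              The mappings described for the tcl8 profile can be tried out with a sketch like the one below.
              The helper cps is an assumption introduced here, and the outputs noted in the comments are
              those implied by the description above.

```tcl
# Hypothetical helper: list the code points of a string as U+XXXX.
proc cps {s} {
    lmap c [split $s {}] {format U+%04X [scan $c %c]}
}

# Source other than utf-8: an invalid byte maps to the same-numbered code point.
set fromAscii [cps [encoding convertfrom -profile tcl8 ascii A\x80]]
puts $fromAscii    ;# per the text above: U+0041 U+0080

# Source utf-8: 0x80 is defined in CP1252, 0x81 is not.
set fromUtf8 [cps [encoding convertfrom -profile tcl8 utf-8 \x80\x81]]
puts $fromUtf8     ;# per the text above: U+20AC U+0081

# Special case: the sequence 0xC0 0x80 maps to U+0000.
set fromNul [cps [encoding convertfrom -profile tcl8 utf-8 \xC0\x80]]
puts $fromNul      ;# per the text above: U+0000
```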

       replace
              Like the tcl8 profile, the replace profile always continues processing on conversion errors but
              follows a Unicode standard conformant method for substitution of invalid source data.

              When converting an encoded byte sequence to a Tcl string using encoding convertfrom, invalid
              bytes are replaced by the U+FFFD REPLACEMENT CHARACTER code point.

              When encoding a Tcl string with encoding convertto, code points that cannot be represented in
              the target encoding are transformed to an encoding-specific fallback character, U+FFFD
              REPLACEMENT CHARACTER for UTF targets and generally the question mark ? for other encodings.
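
              To compare the profiles side by side, one could loop over the list returned by encoding
              profiles and attempt the same conversion with each. This is only a sketch; the result
              dictionary named outcome is an assumption, and the order in which profiles are listed is not
              specified here.

```tcl
# Try one invalid input (byte 0x80 in ASCII) under every available profile.
set outcome [dict create]
foreach profile [encoding profiles] {
    if {[catch {encoding convertfrom -profile $profile ascii A\x80} result]} {
        # Profiles that stop on errors (strict) end up here.
        dict set outcome $profile error
        puts "$profile: error: $result"
    } else {
        # Profiles that continue (tcl8, replace) return a 2-character string.
        dict set outcome $profile ok
        puts "$profile: [string length $result] characters"
    }
}
```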

EXAMPLES

       These examples use the utility proc below that prints the Unicode code points comprising a Tcl string.

              proc codepoints s {
                  join [lmap c [split $s {}] {
                      string cat U+ [format %.6X [scan $c %c]]
                  }]
              }

       Example 1: Convert a byte sequence in Japanese euc-jp encoding to a Tcl string:

              % codepoints [encoding convertfrom euc-jp "\xA4\xCF"]
              U+00306F

       The result is the Unicode code point U+306F, which is the Hiragana letter HA.

       Example 2: Error handling based on profiles:

       The letter A is Unicode character U+0041 and the byte "\x80" is invalid in ASCII encoding.

              % codepoints [encoding convertfrom -profile tcl8 ascii A\x80]
              U+000041 U+000080
              % codepoints [encoding convertfrom -profile replace ascii A\x80]
              U+000041 U+00FFFD
              % codepoints [encoding convertfrom -profile strict ascii A\x80]
              unexpected byte sequence starting at index 1: '\x80'

       Example 3: Get partial data and the error location:

              % codepoints [encoding convertfrom -profile strict -failindex idx ascii AB\x80]
              U+000041 U+000042
              % set idx
              2

       Example 4: Encode a character that is not representable in ISO8859-1:

              % encoding convertto -profile tcl8 iso8859-1 A\u0141
              A?
              % encoding convertto -profile strict iso8859-1 A\u0141
              unexpected character at index 1: 'U+000141'
              % encoding convertto -profile strict -failindex idx iso8859-1 A\u0141
              A
              % set idx
              1
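
       A further sketch, not one of the numbered examples: the exception raised by the strict profile can be
       caught in a script and followed by a retry under the replace profile. The proc name decodeLenient is
       an assumption made for illustration.

```tcl
# Hypothetical helper: decode strictly, falling back to the replace
# profile (U+FFFD substitution) when the input contains invalid bytes.
proc decodeLenient {enc data} {
    try {
        return [encoding convertfrom -profile strict $enc $data]
    } on error {msg} {
        puts stderr "strict decoding failed ($msg); retrying with replace"
        return [encoding convertfrom -profile replace $enc $data]
    }
}

set clean [decodeLenient ascii AB]
set fixed [decodeLenient ascii A\x80]
puts [string length $fixed]   ;# 2
```

       Logging the strict failure before falling back keeps data problems visible instead of silently
       masking them.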

SEE ALSO

       Tcl_GetEncoding(3tcl), fconfigure(3tcl)

KEYWORDS

       encoding, unicode

Tcl                                                    8.1                                        encoding(3tcl)