Provided by: libmandoc-dev_1.14.6-3_amd64

NAME

       mandoc_escape — parse roff escape sequences

SYNOPSIS

       #include <sys/types.h>
       #include <mandoc.h>

       enum mandoc_esc
       mandoc_escape(const char **end, const char **start, int *sz);

DESCRIPTION

       This function scans a roff(7) escape sequence.

       An escape sequence consists of
       -   an initial backslash character (‘\’),
       -   a single ASCII character called the escape sequence identifier,
       -   and, with only a few exceptions, an argument.

       Arguments can be given in the following forms; some escape sequence identifiers accept only some of these
       forms, as specified below.  The first three forms are called the standard forms.

       In brackets: [argument]
           The  argument  starts  after the initial ‘[’, ends before the final ‘]’, and the escape sequence ends
           with the final ‘]’.

       Two-character argument short form: (ar
           This form can only be used for arguments consisting of exactly  two  characters.   It  has  the  same
           effect as [ar].

       One-character argument short form: a
           This form can only be used for arguments consisting of exactly one character.  It has the same effect
           as [a].

       Delimited form: CargumentC
           The argument starts after the initial delimiter character C, ends before the next occurrence of the
           delimiter character C, and the escape sequence ends with that second C.  Some escape sequences allow
           arbitrary characters C as delimiters, while others restrict the range of characters that can be used
           as delimiters.
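       The following C sketch illustrates the three standard forms and the delimited form.  The helpers
       scan_standard() and scan_delimited() are hypothetical and not part of the mandoc library; they ignore
       identifier-specific restrictions and nested escape sequences inside arguments, both of which the real
       parser has to handle.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helpers illustrating the argument forms; not part of
 * the mandoc library.  On entry, p points to the first character after
 * the escape sequence identifier.  On success, *arg and *len describe
 * the argument and the return value points to the character after the
 * end of the escape sequence.  NULL signals a malformed argument.
 */

/* Standard forms: [argument], (ar, and a. */
static const char *
scan_standard(const char *p, const char **arg, int *len)
{
	const char	*q;

	assert(p != NULL && arg != NULL && len != NULL);
	switch (*p) {
	case '\0':
		return NULL;		/* Argument missing. */
	case '[':
		*arg = p + 1;
		for (q = *arg; *q != ']'; q++)
			if (*q == '\0')
				return NULL;	/* No closing bracket. */
		*len = (int)(q - *arg);
		return q + 1;		/* Skip the final ']'. */
	case '(':
		if (p[1] == '\0' || p[2] == '\0')
			return NULL;	/* Fewer than two characters. */
		*arg = p + 1;
		*len = 2;
		return p + 3;
	default:
		*arg = p;		/* One-character short form. */
		*len = 1;
		return p + 1;
	}
}

/* Delimited form: CargumentC with an arbitrary delimiter C. */
static const char *
scan_delimited(const char *p, const char **arg, int *len)
{
	const char	*q;
	char		 delim;

	assert(p != NULL && arg != NULL && len != NULL);
	if ((delim = *p) == '\0')
		return NULL;
	*arg = p + 1;
	for (q = *arg; *q != delim; q++)
		if (*q == '\0')
			return NULL;	/* Line ended inside the argument. */
	*len = (int)(q - *arg);
	return q + 1;			/* Skip the closing delimiter. */
}
```

       For example, scan_standard("(emX", ...) yields the argument "em" of length 2 and returns a pointer to
       the "X", exactly as scan_standard("[em]X", ...) would.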

       Upon function entry, end is expected to point to the escape sequence identifier.  The values passed in as
       start and sz are ignored and overwritten.

       By design, this function cannot handle those roff(7) escape sequences that require in-place expansion, in
       particular user-defined strings \*, number registers \n, width measurements \w, and numerical expression
       control \B.  These are handled by roff_res(), a private preprocessor function called from roff_parseln();
       see the file roff.c.

       The function mandoc_escape() is used
       -   recursively  by  itself,  because  some  escape  sequence  arguments can in turn contain other escape
           sequences,
       -   for error detection internally by the roff(7) parser part of the  mandoc(3)  library,  see  the  file
           roff.c,
       -   above  all  externally  by  the  mandoc(1)  formatting modules, in particular -Tascii and -Thtml, for
           formatting purposes, see the files term.c and html.c,
       -   and rarely externally by high-level utilities using the mandoc library, for example makewhatis(8), to
           purge escape sequences from text.

RETURN VALUES

       Upon function return, the pointer end is set to the character after the end of the escape sequence,  such
       that the calling higher-level parser can easily continue.

       For escape sequences taking an argument, the pointer start is set to the beginning of the argument and sz
       is  set  to the length of the argument.  For escape sequences not taking an argument, start is set to the
       character after the end of the sequence and sz is set to 0.  Both start and sz may be NULL; in that case,
       the argument and the length are not returned.
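       The calling convention can be sketched with a hypothetical toy_escape() that has the same signature
       as mandoc_escape() but understands only the argumentless sequences \c, \d, \u, and \z and the standard
       argument forms.  Every other character is treated as an identifier followed by a standard-form
       argument, which is wrong for delimited forms and for the one-character special character form \a.
       The equally hypothetical purge_escapes() shows how a caller continues at end, in the way a utility
       purging escape sequences from text might.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, for this sketch only. */
enum toy_esc { TOY_ERROR, TOY_ARG, TOY_NOARG };

/*
 * Toy scanner using the calling convention of mandoc_escape():
 * on entry, *end points to the escape sequence identifier, and
 * start and sz may be NULL.
 */
static enum toy_esc
toy_escape(const char **end, const char **start, int *sz)
{
	const char	*local_start;
	int		 local_sz;
	const char	*p;

	assert(end != NULL && *end != NULL);
	if (start == NULL)		/* Caller doesn't want the argument. */
		start = &local_start;
	if (sz == NULL)			/* Caller doesn't want the length. */
		sz = &local_sz;

	p = *end;
	if (*p == '\0')
		return TOY_ERROR;
	if (strchr("cduz", *p) != NULL) {
		/* No argument: start points after the sequence, sz is 0. */
		*end = *start = p + 1;
		*sz = 0;
		return TOY_NOARG;
	}
	/* \[ and \( have no identifier; otherwise skip the identifier. */
	if (*p != '[' && *p != '(')
		p++;
	switch (*p) {
	case '\0':
		*end = p;
		return TOY_ERROR;
	case '[':
		*start = ++p;
		while (*p != ']' && *p != '\0')
			p++;
		if (*p == '\0') {
			*end = p;	/* Line ended inside the argument. */
			return TOY_ERROR;
		}
		*sz = (int)(p - *start);
		*end = p + 1;
		return TOY_ARG;
	case '(':
		if (p[1] == '\0' || p[2] == '\0') {
			*end = p + strlen(p);
			return TOY_ERROR;
		}
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		return TOY_ARG;
	default:
		*start = p;
		*sz = 1;
		*end = p + 1;
		return TOY_ARG;
	}
}

/* Copy text to buf without escape sequences; buf must be large enough. */
static void
purge_escapes(const char *text, char *buf)
{
	const char	*cp;

	assert(text != NULL && buf != NULL);
	cp = text;
	while (*cp != '\0') {
		if (*cp != '\\') {
			*buf++ = *cp++;
			continue;
		}
		cp++;			/* Point to the identifier. */
		if (toy_escape(&cp, NULL, NULL) == TOY_ERROR)
			break;		/* Malformed: drop the rest. */
	}
	*buf = '\0';
}
```

       Passing NULL for start and sz, as purge_escapes() does, is the natural choice when only the position
       after the sequence matters.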

       For sequences taking an argument, the function mandoc_escape() returns one of the following values:

       ESCAPE_FONT
           The escape sequence \f taking an argument in standard form: \f[, \f(, \fa.   Two-character  arguments
           starting  with  the  character  ‘C’ are reduced to one-character arguments by skipping the ‘C’.  More
           specific values are returned for the most commonly used arguments:

           argument    return value
           R or 1      ESCAPE_FONTROMAN
           I or 2      ESCAPE_FONTITALIC
           B or 3      ESCAPE_FONTBOLD
           P           ESCAPE_FONTPREV
           BI          ESCAPE_FONTBI
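           The table can be expressed as a small classifier.  The names classify_font() and enum font_kind are
           hypothetical; the enumerators only mirror the ESCAPE_FONT* values listed above, and the sketch
           follows nothing but the rules stated here, including the removal of a leading ‘C’.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names mirroring the ESCAPE_FONT* values in the table. */
enum font_kind {
	FONT_OTHER,	/* plain ESCAPE_FONT */
	FONT_ROMAN,
	FONT_ITALIC,
	FONT_BOLD,
	FONT_PREV,
	FONT_BI
};

/* Classify the argument of \f, given its start and length. */
static enum font_kind
classify_font(const char *arg, int len)
{
	assert(arg != NULL && len >= 0);

	/* Two-character arguments starting with 'C' lose the 'C'. */
	if (len == 2 && arg[0] == 'C') {
		arg++;
		len = 1;
	}
	if (len == 2 && strncmp(arg, "BI", 2) == 0)
		return FONT_BI;
	if (len != 1)
		return FONT_OTHER;
	switch (arg[0]) {
	case 'R':
	case '1':
		return FONT_ROMAN;
	case 'I':
	case '2':
		return FONT_ITALIC;
	case 'B':
	case '3':
		return FONT_BOLD;
	case 'P':
		return FONT_PREV;
	default:
		return FONT_OTHER;
	}
}
```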

       ESCAPE_SPECIAL
           The escape sequence \C taking an argument delimited with the single quote character and, as a special
           exception, the escape sequences not having an identifier, that  is,  those  where  the  argument,  in
           standard  form, directly follows the initial backslash: \C', \[, \(, \a.  Note that the one-character
           argument short form can only be used for argument characters that do not clash with  escape  sequence
           identifiers.

           If the argument matches one of the forms described below under ESCAPE_UNICODE, that value is returned
           instead.

           Special character escape sequences of this kind can be rendered using the functions mchars_spec2cp()
           and mchars_spec2str() described in the mchars_alloc(3) manual.
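           Recognising the three multi-character spellings of a special character might look like the
           following sketch.  special_name() is a hypothetical helper; it does not handle the one-character
           short form \a, which would require knowing the full set of escape sequence identifiers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: p points just after the backslash.  Finds the
 * name of a special character written as \C'name', \[name], or \(na.
 * Returns a pointer after the sequence, or NULL if p does not start
 * one of these forms or the name is unterminated.
 */
static const char *
special_name(const char *p, const char **name, int *len)
{
	const char	*q;
	char		 term;

	assert(p != NULL && name != NULL && len != NULL);
	switch (p[0]) {
	case '(':
		if (p[1] == '\0' || p[2] == '\0')
			return NULL;
		*name = p + 1;
		*len = 2;
		return p + 3;
	case '[':
		term = ']';
		*name = p + 1;
		break;
	case 'C':
		if (p[1] != '\'')
			return NULL;	/* \C requires the single quote. */
		term = '\'';
		*name = p + 2;
		break;
	default:
		return NULL;
	}
	for (q = *name; *q != term; q++)
		if (*q == '\0')
			return NULL;
	*len = (int)(q - *name);
	return q + 1;
}
```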

       ESCAPE_UNICODE
           Escape sequences of the same format as described above under ESCAPE_SPECIAL, but with an argument  of
           the  forms  uXXXX,  uYXXXX,  or u10XXXX where X and Y are hexadecimal digits and Y is not zero: \C'u,
           \[u.  As a special exception, start is set to the character after the u, and the sz return value does
           not include the u either.

           Such Unicode character escape sequences can be rendered using the function mchars_num2uc()  described
           in the mchars_alloc(3) manual.

       ESCAPE_NUMBERED
           The escape sequence \N followed by a delimited argument.  The delimiter character is arbitrary except
           that  digits  cannot be used.  If a digit is encountered instead of the opening delimiter, that digit
           is considered to be the argument and the end of the sequence, and ESCAPE_IGNORE is returned.

           Such ASCII character escape sequences can be rendered using the function mchars_num2char()  described
           in the mchars_alloc(3) manual.

       ESCAPE_OVERSTRIKE
           The escape sequence \o followed by an argument delimited by an arbitrary character.

       ESCAPE_IGNORE
           -   The escape sequence \s followed by an argument in standard form or by an argument delimited by
               the single quote character: \s', \s[, \s(, \sa.  As a special exception, an optional ‘+’ or ‘-’
               character is allowed after the ‘s’ for all forms.
           -   The escape sequences \F, \g, \k, \M, \m, \n, \V, and \Y followed by an argument in standard form.
           -   The escape sequences \A, \b, \D, \R, \X, and \Z followed by an argument delimited by an arbitrary
               character.
           -   The escape sequences \H, \h, \L, \l, \S, \v, and \x followed by an argument delimited by a
               character that cannot occur in numerical expressions.  However, if any character that can occur
               in numerical expressions is found instead of a delimiter, the sequence is considered to end with
               that character, and ESCAPE_ERROR is returned.
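           The delimiter rule for \H, \h, \L, \l, \S, \v, and \x can be sketched as follows.  The macro
           NUMERIC_CHARS is an assumed, illustrative set of characters that can occur in numerical expressions,
           not the set used by mandoc; scan_numeric_arg() and its result codes are likewise hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum num_delim { DELIM_ERROR, DELIM_IGNORE };

/* Assumed set of characters that can occur in numerical expressions. */
#define NUMERIC_CHARS	" %&()*+-./0123456789:<=>"

/* p points after the identifier, e.g. after the 'h' in \h'2n'. */
static enum num_delim
scan_numeric_arg(const char *p, const char **arg, int *len,
    const char **end)
{
	const char	*q;
	char		 delim;

	assert(p != NULL && arg != NULL && len != NULL && end != NULL);
	delim = *p;
	if (delim == '\0' || strchr(NUMERIC_CHARS, delim) != NULL) {
		/* Not a valid delimiter: the sequence ends right here. */
		*arg = p;
		*len = delim == '\0' ? 0 : 1;
		*end = delim == '\0' ? p : p + 1;
		return DELIM_ERROR;
	}
	*arg = p + 1;
	for (q = *arg; *q != delim; q++)
		if (*q == '\0') {
			*len = (int)(q - *arg);
			*end = q;	/* Line ended inside the argument. */
			return DELIM_ERROR;
		}
	*len = (int)(q - *arg);
	*end = q + 1;
	return DELIM_IGNORE;
}
```

           With this set, \h'2n' is accepted, while \h5'x ends after the ‘5’ and yields the error result.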

       ESCAPE_ERROR
           Escape sequences taking an argument but not matching any of the above patterns.  In particular,  that
           happens if the end of the logical input line is reached before the end of the argument.

       For  sequences  that  do  not take an argument, the function mandoc_escape() returns one of the following
       values:

       ESCAPE_SKIPCHAR
           The escape sequence "\z".

       ESCAPE_NOSPACE
           The escape sequence "\c".

       ESCAPE_IGNORE
           The escape sequences "\d" and "\u".

FILES

       This function is implemented in mandoc.c.

SEE ALSO

       mchars_alloc(3), mandoc_char(7), roff(7)

HISTORY

       This function has been available since mandoc 1.11.2.

AUTHORS

       Kristaps Dzonsons <kristaps@bsd.lv>
       Ingo Schwarze <schwarze@openbsd.org>

BUGS

       The function doesn't cleanly distinguish between sequences  that  are  valid  and  supported,  valid  and
       ignored,  valid  and unsupported, syntactically invalid, or undefined.  For sequences that are ignored or
       unsupported, it doesn't tell whether that deficiency is likely to cause major formatting problems  and/or
       loss  of  document  content.   The function is already rather complicated and still parses some sequences
       incorrectly.

Debian                                            July 4, 2017                                  MANDOC_ESCAPE(3)