Provided by: tcllib_1.21+dfsg-1_all

NAME

       string::token::shell - Parsing of shell command line

SYNOPSIS

       package require Tcl  8.5

       package require string::token::shell  ?1.2?

       package require string::token  ?1?

       package require fileutil

       ::string token shell ?-indices? ?-partial? ?--? string

________________________________________________________________________________________________________________

DESCRIPTION

       This package provides a command which parses a line of text using basic sh-syntax into a list of words.

       The complete set of procedures is described below.

       ::string token shell ?-indices? ?-partial? ?--? string
              This command parses the input string, assuming that it follows basic sh-syntax. The result of
              the command is a list of the words in the string. An error is thrown if the input does not
              follow the allowed syntax. The behaviour can be modified by specifying either or both of the
              options -indices and -partial.
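
              As a minimal sketch of the default mode, the following splits one line into words. The sample
              line and the variable names are made up for illustration; the only package command used is the
              one documented here.

```tcl
package require Tcl 8.5
package require string::token::shell

# Hypothetical input line; the double-quoted part is a single word.
set line {cp -r "My Documents" /tmp/backup}

# Split the line into words. "--" ends option parsing, which guards
# against an input string that starts with a dash.
set words [::string token shell -- $line]

foreach w $words {
    puts "<$w>"
}
```

              Under the rules given in this document the quoted part is returned as one word without its
              quotes, so four words should be printed.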

              --     When specified, option parsing stops at this point. This option is needed if the input
                     string may start with a dash. In other words, it is effectively required whenever string
                     is user input.

              -indices
                     When specified, the output is not a list of words, but a list of 4-tuples describing the
                     words. Each tuple contains the type of the word, its start- and end-indices in the input,
                     and the actual text of the word.

                     Note that the length of the word as given by the indices can differ from the length of the
                     word found in the last element of the tuple. The indices describe the word's extent in the
                     input, including delimiters, intra-word quoting, etc., whereas in the actual text of the
                     word the delimiters are stripped, intra-word quoting is decoded, etc.

                     The possible token types are

                     PLAIN  Plain word, not quoted.

                     D:QUOTED
                            Word is delimited by double-quotes.

                     S:QUOTED
                            Word is delimited by single-quotes.

                     D:QUOTED:PART

                     S:QUOTED:PART
                            Like the previous types, but the word has no closing quote, i.e. it is incomplete.
                            These token types can occur if and only if the option -partial was specified, and
                            only for the last word of the result. If the option -partial was not specified, such
                            incomplete words cause the command to throw an error instead.

              -partial
                     When specified, the parser will accept an incomplete quoted word (i.e. one without a
                     closing quote) at the end of the line as valid instead of throwing an error.
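
              The options can be combined. The sketch below is illustrative only: the input lines and
              variable names are hypothetical, and it assumes that the end index is inclusive, in the manner
              of string range.

```tcl
package require Tcl 8.5
package require string::token::shell

# Hypothetical input: one plain word followed by a single-quoted word.
set line {echo 'hello world'}

# -indices: each element is a 4-tuple {type start end text}.
foreach tok [::string token shell -indices -- $line] {
    lassign $tok type start end text
    # Assumption: 'end' is inclusive. The raw extent then still contains
    # the quotes, while 'text' has them stripped.
    set raw [::string range $line $start $end]
    puts "$type $start..$end raw=<$raw> text=<$text>"
}

# -partial: an unterminated quoted word at the end of the line is accepted.
set unfinished {echo "hello wor}
set last [lindex [::string token shell -indices -partial -- $unfinished] end]
puts "type of last word: [lindex $last 0]"

# Without -partial the same input is rejected with an error.
if {[catch {::string token shell -- $unfinished} msg]} {
    puts "error: $msg"
}
```

              Going by the token types listed above, the last word of the unfinished line should be reported
              as D:QUOTED:PART. This is the kind of information an interactive completer would need.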

       The basic shell syntax accepted here consists of unquoted, single-quoted, and double-quoted words,
       separated by whitespace. Leading and trailing whitespace is allowed and is stripped. Shell variables in
       their various forms are not recognized, nor are sub-shells. The recognized forms of words are specified
       in detail below.

              single-quoted word
                     A single-quoted word begins with a single-quote character, i.e. ' (ASCII 39), followed by
                     zero or more Unicode characters other than a single-quote, and is then closed by a
                     single-quote.

                     The word must be followed by either the end of the string or whitespace. Another word
                     cannot directly follow it.

              double-quoted word
                     A double-quoted word begins with a double-quote character, i.e. " (ASCII 34), followed by
                     zero or more Unicode characters other than a double-quote, and is then closed by a
                     double-quote.

                     Unlike in single-quoted words, a double-quote can be embedded into the word by escaping
                     it with a backslash character \ (ASCII 92). Similarly, a backslash character must be
                     escaped with itself to be inserted literally.

              unquoted word
                     Unquoted words are not delimited by quotes and thus cannot contain whitespace or
                     single-quote characters. Double-quote and backslash characters can be put into unquoted
                     words by quoting them as in double-quoted words.

              whitespace
                     Whitespace is any Unicode space character. This is equivalent to string is space, or the
                     regular expression \s.

                     Whitespace  may  occur before the first word, or after the last word. Whitespace must occur
                     between adjacent words.
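
       The word forms above can be exercised with a small loop. This is a sketch under the rules stated in
       this document, not a test suite; all sample inputs are hypothetical.

```tcl
package require Tcl 8.5
package require string::token::shell

# Each hypothetical input exercises one of the word forms described above.
set samples {
    {'single quoted, \ and " stay literal'}
    {"double quoted, \" and \\ are escapes"}
    {plain\"word back\\slash}
    {   leading and trailing whitespace   }
    {'adjacent''words'}
}

foreach input $samples {
    # catch stores either the list of words or the error message in 'res'.
    if {[catch {::string token shell -- $input} res]} {
        puts "rejected: $input ($res)"
    } else {
        puts "[llength $res] word(s): $res"
    }
}
```

       Going by the rules above, the first two inputs should each yield one word, the third two words, and the
       fourth four words. The last input should be rejected, because two quoted words touch without
       whitespace between them.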

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain bugs and  other  problems.   Please
       report  such  in  the  category  textutil  of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].
       Please also report any ideas for enhancements you may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note further that attachments are strongly preferred over inlined patches. Attachments  can  be  made  by
       going  to the Edit form of the ticket immediately after its creation, and then using the left-most button
       in the secondary navigation bar.

KEYWORDS

       bash, lexing, parsing, shell, string, tokenization

CATEGORY

       Text processing

tcllib                                                 1.2                            string::token::shell(3tcl)