Provided by: tcllib_1.21+dfsg-1_all

NAME

       pt - Parser Tools Application

SYNOPSIS

       package require Tcl  8.5

       pt generate resultformat ?options...? resultfile inputformat inputfile

________________________________________________________________________________________________________________

DESCRIPTION

       Are you lost? Do you have trouble understanding this document? In that case please read the overview
       provided by the Introduction to Parser Tools. That document is the entry point to the whole system the
       current package is a part of.

       This document describes pt, the main application of the module, a parser generator. Its intended audience
       is people who wish to create a parser for some language of theirs. Should you wish to modify the
       application instead, please see the section about the application's Internals for the basic references.

       It resides in the User Application Layer of Parser Tools.

       IMAGE: arch_user_app

COMMAND LINE

       pt generate resultformat ?options...? resultfile inputformat inputfile
              This sub-command of the application reads the parsing expression grammar stored in  the  inputfile
              in  the  format  inputformat,  converts it to the resultformat under the direction of the (format-
              specific) set of options specified by the user and stores the result in the resultfile.

              The inputfile has to exist, while the resultfile may  be  created,  overwriting  any  pre-existing
              content of the file. Any missing directory in the path to the resultfile will be created as well.

              The  exact  form  of the result for, and the set of options supported by the known result-formats,
              are explained in the upcoming sections of this document, with the list below  providing  an  index
              mapping between format name and its associated section. In alphabetical order:

              c      A resultformat. See section C Parser.

              container
                     A resultformat. See section Grammar Container.

              critcl A resultformat. See section C Parser Embedded In Tcl.

              json   An input- and resultformat. See section JSON Grammar Exchange.

              oo     A resultformat. See section TclOO Parser.

              peg    An input- and resultformat. See section PEG Specification Language.

              snit   A resultformat. See section Snit Parser.

       Of  the  seven  possible  results  four  are  parsers outright (c, critcl, oo, and snit), one (container)
       provides code which can be  used  in  conjunction  with  a  generic  parser  (also  known  as  a  grammar
       interpreter),  and  the  last  two  (json  and  peg) are doing double-duty as input formats, allowing the
       transformation of grammars for exchange, reformatting, and the like.
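
       The roles of the formats listed above can be summarized in a small Tcl sketch. It only encodes which
       names the index marks as result formats and which as input formats. The proc name checkFormats and
       the two list variables are hypothetical and not part of pt.

```tcl
# Illustrative only: format names and roles are taken from the index above.
# checkFormats, resultFormats and inputFormats are hypothetical names.
set resultFormats {c container critcl json oo peg snit}
set inputFormats  {json peg}

proc checkFormats {resultformat inputformat} {
    global resultFormats inputFormats
    if {$resultformat ni $resultFormats} {
        return -code error "unknown result format \"$resultformat\""
    }
    if {$inputformat ni $inputFormats} {
        return -code error "unknown input format \"$inputformat\""
    }
    return ok
}

# A snit parser generated from a peg grammar is a valid combination.
puts [checkFormats snit peg]
# snit is a result format only, so using it as input format is rejected.
puts [catch {checkFormats peg snit} msg]
```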

       The created parsers fall into three categories:

                             + --- C ---> critcl, c
                             |
          + --- specialized -+
          |                  |
       ---+                  + --- Tcl -> snit, oo
          |
          + --- interpreted (Tcl) ------> container

       Specialized parsers implemented in C
              The fastest parsers are created when using the result formats c and critcl. The first returns  the
              raw C code for the parser, while the latter wraps it into a Tcl package using CriTcl.

              This  makes  the  latter  much easier to use than the former. On the other hand, the former can be
              adapted to the users' requirements through a multitude of options, allowing for things like  usage
              of  the  parser outside of a Tcl environment, something the critcl format doesn't support. As such
              the c format is meant for more advanced users, or users with special needs.

              A disadvantage of all the parsers in this section is the need to run them through a C compiler  to
              make  them  actually  executable.  This is not something everyone has the necessary tools for. The
              parsers in the next section are for people under such restrictions.

       Specialized parsers implemented in Tcl
              As the parsers in this section are implemented in Tcl they are quite a bit  slower  than  anything
              from the previous section. On the other hand this allows them to be used in pure-Tcl environments,
              or  in  environments which allow only a limited set of binary packages. In the latter case it will
              be advantageous to lobby for the inclusion of the C-based runtime support (notes below)  into  the
              environment to reduce the impact of Tcl on the speed of these parsers.

              The  relevant  formats  are  snit  and oo. Both place their result into a Tcl package containing a
              snit::type, or TclOO class respectively.

              Of the supporting runtime, which is the package pt::rde, the user has to know nothing but that  it
              does  exist and that the parsers are dependent on it. Knowledge of the API exported by the runtime
              for the parsers' consumption is not required by the parsers' users.

       Interpreted parsing implemented in Tcl
              The last category is grammar interpretation. This means that an interpreter for parsing expression
              grammars takes the description of the grammar to parse input for, and uses it to guide the parsing
              process. This is the slowest of the available options, as the interpreter has to continually run
              through the configured grammar, whereas the specialized parsers of the previous sections have the
              relevant knowledge about the grammar baked into them.

              The only place where using interpretation makes sense is where the grammar for some input may be
              changed  interactively  by  the user, as the interpretation allows for quick turnaround after each
              change, whereas the previous methods require the generation of a whole new parser, which is not as
              fast.  On the other hand, wherever the grammar to use is fixed, the previous methods are much more
              advantageous as the time to generate the parser is minuscule compared to the time the parser  code
              is in use.

              The  relevant result format is container.  It (quickly) generates grammar descriptions (instead of
              a full parser) which match the API expected by ParserTools' grammar interpreter.   The  latter  is
              provided by the package pt::peg::interp.

       All  the  parsers  generated  by critcl, snit, and oo, and the grammar interpreter share a common API for
       access to the actual parsing functionality, making them all plug-compatible.   It  is  described  in  the
       Parser API specification document.

PEG SPECIFICATION LANGUAGE

       peg, a language for the specification of parsing expression grammars, is meant to be human readable, and
       writable as well, yet, like any computer language, strict enough to allow its processing by machine. It
       was defined to make writing the specification of a grammar easy, something the other formats found in the
       Parser Tools do not lend themselves to.

       For  either  an  introduction  to or the formal specification of the language, please go and read the PEG
       Language Tutorial.

       When used as a result-format this format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -template string
              The  value  of  this option is a string into which to put the generated text and the values of the
              other options. The various  locations  for  user-data  are  expected  to  be  specified  with  the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant PEG.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated text.
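
       How such a template is filled in can be pictured with a small Tcl sketch. This is an illustration of
       the placeholder rules described above, not the implementation used by pt. The helper expandTemplate
       and the sample template text are hypothetical.

```tcl
# Sketch of placeholder expansion for the peg result format.
# expandTemplate is a hypothetical helper, not part of pt.
proc expandTemplate {template code args} {
    # Defaults as documented for -user, -file and -name.
    array set opt {-user unknown -file unknown -name a_pe_grammar}
    array set opt $args
    # string map replaces every placeholder in a single pass.
    return [string map [list \
        @user@   $opt(-user) \
        @format@ PEG \
        @file@   $opt(-file) \
        @name@   $opt(-name) \
        @code@   $code] $template]
}

set template "# Grammar @name@ (@format@), from @file@, for @user@\n@code@"
puts [expandTemplate $template "<generated text>" \
          -name calculator -file calculator.peg]
```

       With the default template "@code@" the result is just the generated text, since no other placeholder
       occurs in it.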

JSON GRAMMAR EXCHANGE

       The  json  format for parsing expression grammars was written as a data exchange format not bound to Tcl.
       It was defined to allow the exchange of grammars with  PackRat/PEG  based  parser  generators  for  other
       languages.

       For  the  formal  specification  of the JSON grammar exchange format, please go and read The JSON Grammar
       Exchange Format.

       When used as a result-format this format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -indented boolean
              If  this  option  is  set  the  system  will  break  the generated JSON across lines and indent it
              according to its inner structure, with each key of a dictionary on a separate line.

              If the option is not set (the default), the whole JSON object will be written on  a  single  line,
              with minimum spacing between all elements.

       -aligned boolean
              If  this  option  is  set  the system will ensure that the values for the keys in a dictionary are
              vertically aligned with each other, for a nice table effect.  To make this work this also  implies
              that -indented is set.

              If the option is not set (the default), the output is formatted as per the value of -indented,
              without trying to align the values for dictionary keys.

C PARSER EMBEDDED IN TCL

       The critcl format is executable code, a parser for the grammar. It is  a  Tcl  package  with  the  actual
       parser implementation written in C and embedded in Tcl via the critcl package.

       This result-format supports the following options:

       -file string
              The  value of this option is the name of the file or other entity from which the grammar came, for
              which the command is run. The default value is unknown.

       -name string
              The value of this option is the name of the grammar we  are  processing.   The  default  value  is
              a_pe_grammar.

       -user string
              The  value  of this option is the name of the user for which the command is run. The default value
              is unknown.

       -class string
              The value of this option is the name of the  class  to  generate,  without  leading  colons.   The
              default value is CLASS.

              For  a  simple  value X without colons, like CLASS, the parser command will be X::X. Whereas for a
              namespaced value X::Y the parser command will be X::Y.

       -package string
              The value of this option is the name of the package to generate.  The default value is PACKAGE.

       -version string
              The value of this option is the version of the package to generate.  The default value is 1.
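
       The naming rule given for -class above can be sketched in Tcl as follows. The proc parserCommand is a
       hypothetical helper written only to illustrate the rule as stated. It is not taken from the generator.

```tcl
# Illustration of the documented rule for the critcl format:
#   simple value X      -> parser command X::X
#   namespaced value X::Y -> parser command X::Y
# parserCommand is a hypothetical name, not part of pt.
proc parserCommand {class} {
    if {[string first "::" $class] < 0} {
        return ${class}::${class}
    }
    return $class
}

puts [parserCommand CLASS]          ;# simple value
puts [parserCommand calc::Parser]   ;# namespaced value, used unchanged
```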

C PARSER

       The c format is executable code, a parser for the grammar. The parser implementation is written in C  and
       can be tweaked to the users' needs through a multitude of options.

       The  critcl  format, for example, is implemented as a canned configuration of these options on top of the
       generator for c.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -template string
              The  value  of  this  option  is  a  string  into  which  to  put the generated text and the other
              configuration settings. The various locations for user-data are expected to be specified with  the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant C/PARAM.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @code@ To be replaced with the generated C code.

              The following placeholders are special, in that they will occur within the generated code, and
              are replaced there as well.

              @statedecl@
                     To be replaced with the value of the option -state-decl.

              @stateref@
                     To be replaced with the value of the option -state-ref.

              @strings@
                     To be replaced with the value of the option -string-varname.

              @self@ To be replaced with the value of the option -self-command.

              @def@  To be replaced with the value of the option -fun-qualifier.

              @ns@   To be replaced with the value of the option -namespace.

              @main@ To be replaced with the value of the option -main.

              @prelude@
                     To be replaced with the value of the option -prelude.

       -state-decl string
              A C string representing the argument declaration to use in  the  generated  parsing  functions  to
              refer  to  the  parsing state. In essence type and argument name.  The default value is the string
              RDE_PARAM p.

       -state-ref string
              A C string representing the argument name used in the generated parsing functions to refer to the
              parsing state.  The default value is the string p.

       -self-command string
              A C string representing the reference needed to call the generated parser function (methods ...)
              from another parser function, per the chosen framework (template).  The default value is the empty
              string.

       -fun-qualifier string
              A  C  string  containing  the attributes to give to the generated functions (methods ...), per the
              chosen framework (template).  The default value is static.

       -namespace string
              The name of the C namespace the parser functions (methods, ...) shall  reside  in,  or  a  general
              prefix to add to the function names.  The default value is the empty string.

       -main string
              The  name  of  the  main function (method, ...) to be called by the chosen framework (template) to
              start parsing input.  The default value is __main.

       -string-varname string
              The name of the variable used for the table of strings used by the generated  parser,  i.e.  error
              messages, symbol names, etc.  The default value is p_string.

       -prelude string
              A  snippet  of  code  to  be inserted at the head of each generated parsing function.  The default
              value is the empty string.

       -indent integer
              The number of characters to indent each line of the generated code by.  The default value is 0.

       -comments boolean
              A flag controlling the generation of code comments containing the original  parsing  expression  a
              parsing function is for.  The default value is on.

SNIT PARSER

       The  snit  format is executable code, a parser for the grammar. It is a Tcl package holding a snit::type,
       i.e. a class, whose instances are parsers for the input grammar.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -class string
              The  value  of  this option is the name of the class to generate, without leading colons. Note, it
              serves double-duty as the name of  the  package  to  generate  too,  if  option  -package  is  not
              specified,  see below.  The default value is CLASS, applying if neither option -class nor -package
              were specified.

       -package string
              The value of this option is the name of the package to generate, without leading colons. Note,  it
              serves  double-duty  as  the name of the class to generate too, if option -class is not specified,
              see above.  The default value is PACKAGE, applying if neither  option  -package  nor  -class  were
              specified.

       -version string
              The value of this option is the version of the package to generate.  The default value is 1.

TCLOO PARSER

       The  oo  format  is executable code, a parser for the grammar. It is a Tcl package holding a TclOO class,
       whose instances are parsers for the input grammar.

       This result-format supports the following options:

       -file string
              The value of this option is the name of the file or other entity from which the grammar came,  for
              which the command is run. The default value is unknown.

       -name string
              The  value  of  this  option  is  the name of the grammar we are processing.  The default value is
              a_pe_grammar.

       -user string
              The value of this option is the name of the user for which the command is run. The  default  value
              is unknown.

       -class string
              The  value  of  this option is the name of the class to generate, without leading colons. Note, it
              serves double-duty as the name of  the  package  to  generate  too,  if  option  -package  is  not
              specified,  see below.  The default value is CLASS, applying if neither option -class nor -package
              were specified.

       -package string
              The value of this option is the name of the package to generate, without leading colons. Note,  it
              serves  double-duty  as  the name of the class to generate too, if option -class is not specified,
              see above.  The default value is PACKAGE, applying if neither  option  -package  nor  -class  were
              specified.

       -version string
              The value of this option is the version of the package to generate.  The default value is 1.

GRAMMAR CONTAINER

       The container format is another form of describing parsing expression grammars. While data in this format
       is  executable  it  does not constitute a parser for the grammar. It always has to be used in conjunction
       with the package pt::peg::interp, a grammar interpreter.

       The format represents grammars by a snit::type, i.e. class, whose instances  are  API-compatible  to  the
       instances of the pt::peg::container package, and which are preloaded with the grammar in question.

       This result-format supports the following options:

       -file string
              The  value of this option is the name of the file or other entity from which the grammar came, for
              which the command is run. The default value is unknown.

       -name string
              The value of this option is the name of the grammar we  are  processing.   The  default  value  is
              a_pe_grammar.

       -user string
              The  value  of this option is the name of the user for which the command is run. The default value
              is unknown.

       -mode bulk|incremental
              The value of this option controls which  methods  of  pt::peg::container  instances  are  used  to
              specify  the  grammar,  i.e.  preload it into the container. There are two legal values, as listed
              below. The default is bulk.

              bulk   In this mode the methods start, add, modes, and rules are used to specify the grammar in  a
                     bulk  manner,  i.e.  as a set of nonterminal symbols, and two dictionaries mapping from the
                     symbols to their semantic modes and parsing expressions.

                     This mode is the default.

              incremental
                     In this mode the methods start, add, mode,  and  rule  are  used  to  specify  the  grammar
                     piecemeal, with each nonterminal having its own block of defining commands.

       -template string
              The  value  of  this  option  is  a  string  into  which  to  put the generated code and the other
              configuration settings. The various locations for user-data are expected to be specified with  the
              placeholders listed below. The default value is "@code@".

              @user@ To be replaced with the value of the option -user.

              @format@
                     To be replaced with the constant CONTAINER.

              @file@ To be replaced with the value of the option -file.

              @name@ To be replaced with the value of the option -name.

              @mode@ To be replaced with the value of the option -mode.

              @code@ To be replaced with the generated code.

EXAMPLE

       In this section we work through a complete example, starting with a PEG grammar and ending with running
       the parser generated from it over some input, following the outline shown in the figure below:

       IMAGE: flow

       Our grammar, assumed to be stored in the file "calculator.peg", is

              PEG calculator (Expression)
                  Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
                  Sign       <- '-' / '+'                                     ;
                  Number     <- Sign? Digit+                                  ;
                  Expression <- Term (AddOp Term)*                            ;
                  MulOp      <- '*' / '/'                                     ;
                  Term       <- Factor (MulOp Factor)*                        ;
                  AddOp      <- '+'/'-'                                       ;
                  Factor     <- '(' Expression ')' / Number                   ;
              END;

       From this we create a snit-based parser via

              pt generate snit calculator.tcl -class calculator -name calculator peg calculator.peg

       which leaves us with the parser package and class written to the file  "calculator.tcl".   Assuming  that
       this  package is then properly installed in a place where Tcl can find it we can now use this class via a
       script like

                  package require calculator

                  lassign $argv input
                  set channel [open $input r]

                  set parser [calculator]
                  set ast [$parser parse $channel]
                  $parser destroy
                  close $channel

                  # ... now process the returned abstract syntax tree ...

       where the abstract syntax tree stored in the variable ast will look like

              set ast {Expression 0 4
                  {Factor 0 4
                      {Term 0 2
                          {Number 0 2
                              {Digit 0 0}
                              {Digit 1 1}
                              {Digit 2 2}
                          }
                      }
                      {AddOp 3 3}
                      {Term 4 4
                          {Number 4 4
                              {Digit 4 4}
                          }
                      }
                  }
              }

       assuming that the input file and channel contained the text

              120+5

       A more graphical representation of the tree would be

                                                                  +- Digit 0 0 | 1
                                                                  |            |
                                      +- Term 0 2 --- Number 0 2 -+- Digit 1 1 | 2
                                      |                           |            |
                                      |                           +- Digit 2 2 | 0
                                      |                                        |
       Expression 0 4 --- Factor 0 4 -+----------------------------- AddOp 3 3 | +
                                      |                                        |
                                      +- Term 4 4 --- Number 4 4 --- Digit 4 4 | 5

       Regardless, at this point it is the user's responsibility to work with the tree to reach whatever goal
       she desires, i.e. analyze it, transform it, etc. The package pt::ast should be of help here, providing
       commands to walk such AST structures in various ways.
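
       As a minimal illustration of working with such a tree, the following plain-Tcl sketch walks the AST
       shown above and pairs each node with the input text it covers. It assumes only the node layout
       visible in the example (symbol, start offset, end offset, followed by the child nodes). It
       deliberately does not use pt::ast, and the proc name describe is hypothetical.

```tcl
# Return one line per AST node: indentation by depth, symbol, range, and
# the slice of the input the node spans. Assumes nodes of the form
# {symbol start end child...} as in the example above.
proc describe {ast input {depth 0}} {
    # lassign returns the list elements it did not assign, i.e. the children.
    set children [lassign $ast symbol start end]
    set lines [list [format "%s%s %d-%d '%s'" \
        [string repeat "  " $depth] $symbol $start $end \
        [string range $input $start $end]]]
    foreach child $children {
        lappend lines {*}[describe $child $input [expr {$depth + 1}]]
    }
    return $lines
}

set input "120+5"
set ast {Expression 0 4
    {Factor 0 4
        {Term 0 2
            {Number 0 2
                {Digit 0 0}
                {Digit 1 1}
                {Digit 2 2}
            }
        }
        {AddOp 3 3}
        {Term 4 4
            {Number 4 4
                {Digit 4 4}
            }
        }
    }
}
puts [join [describe $ast $input] \n]
```

       The same recursive shape can serve as a starting point for evaluation or transformation, by replacing
       the formatting step with whatever processing the application needs.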

       One  important  thing  to  note  is  that  the parsers used here return a data structure representing the
       structure of the input per the grammar underlying the parser. There are no callbacks during  the  parsing
       process, i.e. no parsing actions, as most other parsers will have.

       Going  back  to the last snippet of code, the execution of the parser for some input, note how the parser
       instance follows the specified Parser API.

INTERNALS

       This section is intended for users of the application who wish to modify or extend it. Users only
       interested in the generation of parsers can ignore it.

       The main functionality of the application is encapsulated in the package pt::pgen. Please read its
       documentation for more information.

BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will undoubtedly contain bugs and  other  problems.   Please
       report  such  in  the  category pt of the Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].  Please
       also report any ideas for enhancements you may have for either package and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

       Note further that attachments are strongly preferred over inlined patches. Attachments  can  be  made  by
       going  to the Edit form of the ticket immediately after its creation, and then using the left-most button
       in the secondary navigation bar.

KEYWORDS

       EBNF,  LL(k),  PEG,  TDPL,  context-free  languages,  expression,  grammar,  matching,  parser,   parsing
       expression,  parsing  expression grammar, push down automaton, recursive descent, state, top-down parsing
       languages, transducer

CATEGORY

       Parsing and Grammars

COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>

tcllib                                                  1                                               pt(3tcl)