Ubuntu Manpage: PPR - Pattern-based Perl Recognizer

NAME

       PPR - Pattern-based Perl Recognizer

VERSION

       This document describes PPR version 0.001010

SYNOPSIS

           use PPR;

           # Define a regex that will match an entire Perl document...
           my $perl_document = qr{

               # What to match            # Install the (?&PerlEntireDocument) rule
               (?&PerlEntireDocument)     $PPR::GRAMMAR

           }x;

           # Define a regex that will match a single Perl block...
           my $perl_block = qr{

               # What to match...         # Install the (?&PerlBlock) rule...
               (?&PerlBlock)              $PPR::GRAMMAR
           }x;

           # Define a regex that will match a simple Perl extension...
           my $perl_coroutine = qr{

               # What to match...
               coro                                           (?&PerlOWS)
               (?<coro_name>  (?&PerlQualifiedIdentifier)  )  (?&PerlOWS)
               (?<coro_code>  (?&PerlBlock)                )

               # Install the necessary subrules...
               $PPR::GRAMMAR
           }x;

           # Define a regex that will match an integrated Perl extension...
           my $perl_with_classes = qr{

               # What to match...
               \A
                   (?&PerlOWS)       # Optional whitespace (including comments)
                   (?&PerlDocument)  # A full Perl document
                   (?&PerlOWS)       # More optional whitespace
               \Z

               # Add a 'class' keyword into the syntax that PPR understands...
               (?(DEFINE)
                   (?<PerlKeyword>

                           class                              (?&PerlOWS)
                           (?&PerlQualifiedIdentifier)        (?&PerlOWS)
                       (?: is (?&PerlNWS) (?&PerlIdentifier)  (?&PerlOWS) )*+
                           (?&PerlBlock)
                   )

                   (?<kw_balanced_parens>
                       \( (?: [^()]++ | (?&kw_balanced_parens) )*+ \)
                   )
               )

               # Install the necessary standard subrules...
               $PPR::GRAMMAR
           }x;

DESCRIPTION

       The PPR module provides a single regular expression that defines a set of independent subpatterns
       suitable for matching entire Perl documents, as well as a wide range of individual syntactic components
       of Perl (e.g. statements, expressions, control blocks, variables, etc.).

       The regex does not "parse" Perl (that is, it does not build a syntax tree, like the PPI module does).
       Instead it simply "recognizes" standard Perl constructs, or new syntaxes composed from Perl constructs.

       Its features and capabilities therefore complement those of the PPI module, rather than replacing them.
       See "Comparison with PPI".

INTERFACE

   Importing and using the Perl grammar regex
       The PPR module exports no subroutines or variables, and provides no methods. Instead, it defines a single
       package variable, $PPR::GRAMMAR, which can be interpolated into regexes to add rules that permit Perl
       constructs to be matched:

           $source_code =~ m{ (?&PerlEntireDocument)  $PPR::GRAMMAR }x;

       Note that all the examples shown so far have interpolated this "grammar variable" at the end of the
       regular expression. This placement is desirable, but not necessary. Both of the following work
       identically:

           $source_code =~ m{ (?&PerlEntireDocument)   $PPR::GRAMMAR }x;

           $source_code =~ m{ $PPR::GRAMMAR   (?&PerlEntireDocument) }x;

       However, if the grammar is to be extended, then the extensions must be specified before the base grammar
       (i.e. before the interpolation of $PPR::GRAMMAR). Placing the grammar variable at the end of a regex
       ensures that will be the case, and has the added advantage of "front-loading" the regex with the most
       important information: what is actually going to be matched.
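
       The ordering requirement follows from how Perl resolves duplicate rule names: "(?&NAME)" recurses to
       the leftmost group with that name. The following is a minimal sketch of that behaviour using core Perl
       only. The $MINI_GRAMMAR variable and its "Word" and "Keyword" rules are hypothetical stand-ins for
       $PPR::GRAMMAR and its rules; they are not part of PPR.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $PPR::GRAMMAR: two rules, one of which never matches.
my $MINI_GRAMMAR = qr{
    (?(DEFINE)
        (?<Word>    [a-z]++ )
        (?<Keyword> (?!)    )    # placeholder: fails until overridden
    )
}x;

# Extension placed BEFORE the base grammar: its Keyword is the leftmost definition.
my $extension_first = qr{
    \A (?: (?&Keyword) | (?&Word) ) \z
    (?(DEFINE) (?<Keyword> [A-Z]++ ) )
    $MINI_GRAMMAR
}x;

# Extension placed AFTER the base grammar: the placeholder Keyword is leftmost.
my $grammar_first = qr{
    \A (?: (?&Keyword) | (?&Word) ) \z
    $MINI_GRAMMAR
    (?(DEFINE) (?<Keyword> [A-Z]++ ) )
}x;

for my $input ('class', 'CLASS') {
    printf "%-5s  extension first: %-3s  grammar first: %s\n",
        $input,
        ($input =~ $extension_first ? 'yes' : 'no'),
        ($input =~ $grammar_first   ? 'yes' : 'no');
}
```

       In this sketch the upper-case input should be accepted only when the extension precedes the grammar,
       because only then is the overriding "Keyword" rule the one that gets called.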

       Note too that, because the PPR grammar internally uses capture groups, placing $PPR::GRAMMAR anywhere
       other than the very end of your regex may change the numbering of any explicit capture groups in your
       regex.  For complete safety, regexes that use the PPR grammar should probably use named captures, instead
       of numbered captures.
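
       The effect on capture numbering can be illustrated without PPR itself. The sketch below again uses a
       hypothetical $MINI_GRAMMAR (with made-up "Word" and "Number" rules) in place of $PPR::GRAMMAR. Each named
       rule in the grammar is also a numbered group, so an explicit capture that follows the grammar is no longer
       group 1, whereas a named capture can be retrieved the same way wherever the grammar is placed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $PPR::GRAMMAR; each named rule is also a numbered group.
my $MINI_GRAMMAR = qr{
    (?(DEFINE)
        (?<Word>   [a-z]++ )
        (?<Number> [0-9]++ )
    )
}x;

my $input = 'width 42';
my ($first_group, $third_group, $named_group);

# Grammar first: the explicit capture becomes group 3, not group 1.
if ($input =~ m{ $MINI_GRAMMAR (?&Word) \s+ ( (?&Number) ) }x) {
    ($first_group, $third_group) = ($1, $3);
}

# A named capture is unaffected by where the grammar is interpolated.
if ($input =~ m{ $MINI_GRAMMAR (?&Word) \s+ (?<value> (?&Number) ) }x) {
    $named_group = $+{value};
}

printf "group 1: %s, group 3: %s, named: %s\n",
    $first_group // 'undef', $third_group, $named_group;
```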

   Error reporting
       Regex-based parsing is all-or-nothing: either your regex matches (and returns any captures you
       requested), or it fails to match (and returns nothing).

       This can make it difficult to detect why a PPR-based match failed: that is, to work out which "bad source
       code" prevented your regex from matching.

       So the module provides a special variable that attempts to detect the source code that prevented any call
       to the "(?&PerlStatement)" subpattern from matching. That variable is: $PPR::ERROR

       $PPR::ERROR is only set if it is undefined at the point where an error is detected, and will only be set
       to the first such error that is encountered during parsing.

       Note that errors are only detected when matching context-sensitive components (for example in the middle
       of a "(?&PerlStatement)", as part of a "(?&PerlContextualRegex)", or at the end of a
       "(?&PerlEntireDocument)").  Errors, especially errors at the end of otherwise valid code, will often not
       be detected in context-free components (for example, at the end of a "(?&PerlStatementSequence)", as part
       of a "(?&PerlRegex)", or at the end of a "(?&PerlDocument)").

       A common mistake in this area is to attempt to match an entire Perl document using:

           m{ \A (?&PerlDocument) \Z   $PPR::GRAMMAR }x

       instead of:

           m{ (?&PerlEntireDocument)   $PPR::GRAMMAR }x

       Only the second approach will be able to successfully detect an unclosed curly bracket at the end of the
       document.

       "PPR::ERROR" interface

       If it is set, $PPR::ERROR will contain an object of type PPR::ERROR, with the following methods:

       "$PPR::ERROR->origin($line, $file)"
           Returns a clone of the PPR::ERROR object that now believes that the source code parsing failure it is
           reporting occurred in a code fragment starting at the specified line and file. If the second argument
           is omitted, the file name is not reported in any diagnostic.

       "$PPR::ERROR->source()"
           Returns a string containing the specific source code that could not be parsed as a Perl statement.

       "$PPR::ERROR->prefix()"
           Returns a string containing all the source code preceding the code that could not be parsed. That is:
           the valid code that is the preceding context of the unparsable code.

       "$PPR::ERROR->line( $opt_offset )"
           Returns  an  integer  which  is  the line number at which the unparsable code was encountered. If the
           optional "offset" argument is provided, it will be added to the line number returned. Note  that  the
           offset  is  ignored  if  the  PPR::ERROR object originates from a prior call to "$PPR::ERROR->origin"
           (because in that case you will have already specified the correct offset).

       "$PPR::ERROR->diagnostic()"
           Returns a string containing the diagnostic that would be returned by "perl -c"  if  the  source  code
           were compiled.

           Warning:  The  diagnostic is obtained by partially eval'ing the source code. This means that run-time
           code will not be executed, but "BEGIN" and "CHECK" blocks will run. Do not call this  method  if  the
           source code that created this error might also have non-trivial compile-time side-effects.

       A typical use might therefore be:

           # Make sure it's undefined, and will only be locally modified...
           local $PPR::ERROR;

           # Process the matched block...
           if ($source_code =~ m{ (?<Block> (?&PerlBlock) )  $PPR::GRAMMAR }x) {
               process( $+{Block} );
           }

           # Or report the offending code that stopped it being a valid block...
           else {
               die "Invalid Perl block: " . $PPR::ERROR->source . "\n",
                   $PPR::ERROR->origin($linenum, $filename)->diagnostic . "\n";
           }

   Decommenting code with PPR::decomment()
       The  module  provides  (but does not export) a decomment() subroutine that can remove any comments and/or
       POD from source code.

       It takes a single argument: a string containing the source code.  It returns a single value: a string
       containing the decommented source code.

       For example:

           $decommented_code = PPR::decomment( $commented_code );

       The  subroutine  will  fail  if the argument wasn't valid Perl code, in which case it returns "undef" and
       sets $PPR::ERROR to indicate where the invalid source code was encountered.

       Note that, due to separate bugs in the regex engine in Perl 5.14 and 5.20, the decomment() subroutine  is
       not available when running under these releases.

   Examples
       Note: In each of the following examples, the subroutine slurp() is used to acquire the source code from a
       file whose name is passed as its argument. The slurp() subroutine is just:

           sub slurp { local (*ARGV, $/); @ARGV = shift; readline; }

       or, for the less twisty-minded:

           sub slurp {
               my ($filename) = @_;
               open my $filehandle, '<', $filename or die $!;
               local $/;
               return readline($filehandle);
           }

       Validating source code

         # "Valid" if source code matches a Perl document under the Perl grammar
         printf(
             "$filename %s a valid Perl file\n",
             slurp($filename) =~ m{ (?&PerlEntireDocument)  $PPR::GRAMMAR }x
                 ? "is"
                 : "is not"
         );

       Counting statements

         printf(                                        # Output
             "$filename contains %d statements\n",      # a report of
             scalar                                     # the count of
                 grep {defined}                         # defined matches
                     slurp($filename)                   # from the source code,
                         =~ m{
                               \G (?&PerlOWS)           # skipping whitespace
                                  ((?&PerlStatement))   # and keeping statements,
                               $PPR::GRAMMAR            # using the Perl grammar
                             }gcx                       # incrementally
         );

       Stripping comments and POD from source code

         my $source = slurp($filename);                    # Get the source
         $source =~ s{ (?&PerlNWS)  $PPR::GRAMMAR }{ }gx;  # Compact whitespace
         print $source;                                    # Print the result

       Stripping comments and POD from source code (in Perl v5.14 or later)

         # Print the source code, having compacted whitespace...
           print  slurp($filename)  =~ s{ (?&PerlNWS)  $PPR::GRAMMAR }{ }gxr;

       Stripping everything "except" comments and POD from source code

         say                                         # Output
             grep {defined}                          # defined matches
                 slurp($filename)                    # from the source code,
                     =~ m{ \G ((?&PerlOWS))          # keeping whitespace,
                              (?&PerlStatement)?     # skipping statements,
                           $PPR::GRAMMAR             # using the Perl grammar
                         }gcx;                       # incrementally

   Available rules
       Interpolating $PPR::GRAMMAR in a regex makes all of the following rules available within that regex.

       Note  that  other rules not listed here may also be added, but these are all considered strictly internal
       to the PPR module and are not guaranteed to continue to exist in future releases. All such "internal-use-
       only" rules have names that start with "PPR_"...

       "(?&PerlDocument)"

       Matches a valid Perl document,  including  leading  or  trailing  whitespace,  comments,  and  any  final
       "__DATA__" or "__END__" section.

       This  rule  is  context-free, so it can be embedded in a larger regex.  For example, to match an embedded
       chunk of Perl code, delimited by "<<<"...">>>":

           $src =~ m{ <<< (?&PerlDocument) >>>   $PPR::GRAMMAR }x;

       "(?&PerlEntireDocument)"

       Matches an entire valid Perl document, including leading or trailing whitespace, comments, and any  final
       "__DATA__" or "__END__" section.

       This  rule  is not context-free. It has an internal "\A" at the beginning and "\Z" at the end, so a regex
       containing "(?&PerlEntireDocument)" will only match if:

       (a) the "(?&PerlEntireDocument)" is the sole top-level element of  the  regex  (or,  at  least  the  sole
           element of a single top-level "|"-branch of the regex),

       and
       (b) the entire string being matched contains only a single valid Perl document.

       In general, if you want to check that a string consists entirely of a single valid sequence of Perl code,
       use:

           $str =~ m{ (?&PerlEntireDocument)  $PPR::GRAMMAR }x

       If  you  want  to  check  that  a string contains at least one valid sequence of Perl code at some point,
       possibly embedded in other text, use:

           $str =~ m{ (?&PerlDocument)  $PPR::GRAMMAR }x

       "(?&PerlStatementSequence)"

       Matches zero-or-more valid Perl statements, separated by optional POD sequences.

       "(?&PerlStatement)"

       Matches a single valid Perl statement, including:  control  structures;  "BEGIN",  "CHECK",  "UNITCHECK",
       "INIT", "END", "DESTROY", or "AUTOLOAD" blocks; variable declarations, "use" statements, etc.

       "(?&PerlExpression)"

       Matches  a  single valid Perl expression involving operators of any precedence, but not any kind of block
       (i.e. not control structures, "BEGIN" blocks, etc.) nor any trailing  statement  modifier  (e.g.   not  a
       postfix "if", "while", or "for").

       "(?&PerlLowPrecedenceNotExpression)"

       Matches  an  expression at the precedence of the "not" operator.  That is, a single valid Perl expression
       that involves operators above the precedence of "and".

       "(?&PerlAssignment)"

       Matches an assignment expression.  That is, a single valid Perl expression involving operators above  the
       precedence of comma ("," or "=>").

       "(?&PerlConditionalExpression)" or "(?&PerlScalarExpression)"

       Matches  a conditional expression that uses the "?"...":" ternary operator.  That is, a single valid Perl
       expression involving operators above the precedence of assignment.

       The alternative name comes from the fact that anything matching this rule is what most people think of as
       a single element of a comma-separated list.

       "(?&PerlBinaryExpression)"

       Matches  an  expression  that  uses  any  high-precedence binary operators.  That is, a single valid Perl
       expression involving operators above the precedence of the ternary operator.

       "(?&PerlPrefixPostfixTerm)"

       Matches a term with optional prefix and/or postfix unary operators and/or a  trailing  sequence  of  "->"
       dereferences.   That  is,  a  single  valid  Perl  expression involving operators above the precedence of
       exponentiation ("**").

       "(?&PerlTerm)"

       Matches a simple high-precedence term within a  Perl  expression.   That  is:  a  subroutine  or  builtin
       function  call;  a  variable  declaration;  a  variable  or typeglob lookup; an anonymous array, hash, or
       subroutine constructor; a quotelike or numeric literal; a regex match; a substitution; a transliteration;
       a "do" or "eval" block; or any other expression in surrounding parentheses.

       "(?&PerlTermPostfixDereference)"

       Matches a sequence of array- or hash-lookup brackets,  or  subroutine  call  parentheses,  or  a  postfix
       dereferencer  (e.g.  "->$*"),  with  explicit  or implicit intervening "->", such as might appear after a
       term.

       "(?&PerlLvalue)"

       Matches any variable or parenthesized list of variables that could be assigned to.

       "(?&PerlPackageDeclaration)"

       Matches the declaration of any package (with or without a defining block).

       "(?&PerlSubroutineDeclaration)"

       Matches the declaration of any named subroutine (with or without a defining block).

       "(?&PerlUseStatement)"

       Matches a "use <module name> ...;" or "use <version number>;" statement.

       "(?&PerlReturnStatement)"

       Matches a "return <expression>;" or "return;" statement.

       "(?&PerlReturnExpression)"

       Matches a "return <expression>" as an expression without trailing end-of-statement markers.

       "(?&PerlControlBlock)"

       Matches an "if", "unless", "while", "until", "for", or "foreach" statement, including its block.

       "(?&PerlDoBlock)"

       Matches a "do"-block expression.

       "(?&PerlEvalBlock)"

       Matches an "eval"-block expression.

       "(?&PerlTryCatchFinallyBlock)"

       Matches a "try" block, followed by an optional "catch" block, followed by an optional "finally" block,
       using the built-in syntax introduced in Perl v5.34 and v5.36.

       Note  that  if  your  code  uses  one  of  the many CPAN modules (such as "Try::Tiny" or "TryCatch") that
       provided try/catch behaviours prior to Perl v5.34, then you  will  most  likely  need  to  override  this
       subrule to match the alternate "try"/"catch" syntax provided by your preferred module.

       For  example,  if  your  code  uses  the  "TryCatch"  module,  you  would need to alter the PPR parser by
       explicitly redefining the subrule for "try" blocks, with something like:

           my $MATCH_A_PERL_DOCUMENT = qr{

               \A (?&PerlEntireDocument) \Z

               (?(DEFINE)
                   # Redefine this subrule to match TryCatch syntax...
                   (?<PerlTryCatchFinallyBlock>
                           try                                  (?>(?&PerlOWS))
                           (?>(?&PerlBlock))
                       (?:                                      (?>(?&PerlOWS))
                           catch                                (?>(?&PerlOWS))
                       (?: \( (?>(?&PPR_balanced_parens)) \)    (?>(?&PerlOWS))  )?+
                           (?>(?&PerlBlock))
                       )*+
                   )
               )

               $PPR::GRAMMAR
           }xms;

       Note that the popular "Try::Tiny" module actually implements "try"/"catch"  as  a  normally  parsed  Perl
       subroutine  call  expression,  rather  than  a statement.  This means that the unmodified PPR grammar can
       successfully parse all the module's constructs.

       However, the unmodified PPR grammar may misclassify some "Try::Tiny" usages as being built-in Perl  v5.36
       "try"  blocks  followed by an unrelated call to the "catch" subroutine, rather than identifying the "try"
       and "catch" as a single expression containing two subroutine calls.

       If that difference in interpretation  matters  to  you,  you  can  deactivate  the  built-in  Perl  v5.36
       "try"/"catch" syntax entirely, like so:

           my $MATCH_A_PERL_DOCUMENT = qr{
               \A (?&PerlEntireDocument) \Z

               (?(DEFINE)
                   # Turn off built-in try/catch syntax...
                   (?<PerlTryCatchFinallyBlock>   (?!)  )

                   # Decanonize 'try' and 'catch' as reserved words ineligible for sub names...
                   (?<PPR_X_non_reserved_identifier>
                       (?! (?> for(?:each)?+ | while   | if    | unless | until | given | when   | default
                           |   sub | format  | use     | no    | my     | our   | state  | defer | finally
                           # Note: Removed 'try' and 'catch' which appear here in the original subrule
                           |   (?&PPR_X_named_op)
                           |   [msy] | q[wrxq]?+ | tr
                           |   __ (?> END | DATA ) __
                           )
                           \b
                       )
                       (?>(?&PerlQualifiedIdentifier))
                       (?! :: )
                   )

               )

               $PPR::GRAMMAR
           }xms;

       For  more  details  and options for modifying PPR grammars in this way, see also the documentation of the
       "PPR::X" module.

       "(?&PerlStatementModifier)"

       Matches an "if", "unless", "while", "until", "for", or "foreach"  modifier  that  could  appear  after  a
       statement. Only matches the modifier, not the preceding statement.

       "(?&PerlFormat)"

       Matches a "format" declaration, including its terminating "dot".

       "(?&PerlBlock)"

       Matches a "{"..."}"-delimited block containing zero-or-more statements.

       "(?&PerlCall)"

       Matches a call to a subroutine or built-in function.  Accepts all valid call syntaxes, either via a
       literal name or a reference, with or without a leading "&", with or without arguments, with or without
       parentheses on any argument list.

       "(?&PerlAttributes)"

       Matches  a  list  of  colon-preceded  attributes,  such  as  might  be  specified on the declaration of a
       subroutine or a variable.

       "(?&PerlCommaList)"

       Matches a list of zero-or-more comma-separated subexpressions.  That is, a single valid  Perl  expression
       that involves operators above the precedence of "not".

       "(?&PerlParenthesesList)"

       Matches a list of zero-or-more comma-separated subexpressions inside a set of parentheses.

       "(?&PerlList)"

       Matches  either  a  parenthesized  or  unparenthesized  list  of comma-separated subexpressions. That is,
       matches anything that either of the two preceding rules would match.

       "(?&PerlAnonymousArray)"

       Matches an anonymous array constructor.  That is: a list of  zero-or-more  subexpressions  inside  square
       brackets.

       "(?&PerlAnonymousHash)"

       Matches  an  anonymous  hash  constructor.   That  is: a list of zero-or-more subexpressions inside curly
       brackets.

       "(?&PerlArrayIndexer)"

       Matches a valid indexer that could be applied to look up elements of an array.  That is: a list of
       one-or-more subexpressions inside square brackets.

       "(?&PerlHashIndexer)"

       Matches a valid indexer that could be applied to look up entries of a hash.  That is: a list of
       one-or-more subexpressions inside curly brackets, or a simple bareword identifier inside curly brackets.

       "(?&PerlDiamondOperator)"

       Matches anything in angle brackets.  That is: any "diamond" readline (e.g. "<$filehandle>") or file-glob
       operation (e.g. "<*.pl>").

       "(?&PerlComma)"

       Matches a short (",") or long ("=>") comma.

       "(?&PerlPrefixUnaryOperator)"

       Matches any high-precedence prefix unary operator.

       "(?&PerlPostfixUnaryOperator)"

       Matches any high-precedence postfix unary operator.

       "(?&PerlInfixBinaryOperator)"

       Matches any infix binary operator whose precedence is between ".." and "**".

       "(?&PerlAssignmentOperator)"

       Matches any assignment operator, including all "op=" variants.

       "(?&PerlLowPrecedenceInfixOperator)"

       Matches "and", "or", or "xor".

       "(?&PerlAnonymousSubroutine)"

       Matches an anonymous subroutine.

       "(?&PerlVariable)"

       Matches any type of access on any scalar, array, or hash variable.

       "(?&PerlVariableScalar)"

       Matches any scalar variable, including fully qualified package variables, punctuation  variables,  scalar
       dereferences, and the $#array syntax.

       "(?&PerlVariableArray)"

       Matches any array variable, including fully qualified package variables, punctuation variables, and array
       dereferences.

       "(?&PerlVariableHash)"

       Matches  any  hash variable, including fully qualified package variables, punctuation variables, and hash
       dereferences.

       "(?&PerlTypeglob)"

       Matches a typeglob.

       "(?&PerlScalarAccess)"

       Matches any kind of variable access beginning with a "$", including fully  qualified  package  variables,
       punctuation variables, scalar dereferences, the $#array syntax, and single-value array or hash look-ups.

       "(?&PerlScalarAccessNoSpace)"

       Matches  any  kind  of variable access beginning with a "$", including fully qualified package variables,
       punctuation variables, scalar dereferences, the $#array syntax, and single-value array or hash  look-ups.
       But does not allow spaces between the components of the variable access (i.e. imposes the same constraint
       as within an interpolating quotelike).

       "(?&PerlScalarAccessNoSpaceNoArrow)"

       Matches  any  kind  of variable access beginning with a "$", including fully qualified package variables,
       punctuation variables, scalar dereferences, the $#array syntax, and single-value array or hash  look-ups.
       But  does not allow spaces or arrows between the components of the variable access (i.e. imposes the same
       constraint as within a "<...>"-delimited interpolating quotelike).

       "(?&PerlArrayAccess)"

       Matches any kind of variable access beginning with a "@", including arrays, array dereferences, and  list
       slices of arrays or hashes.

       "(?&PerlArrayAccessNoSpace)"

       Matches  any kind of variable access beginning with a "@", including arrays, array dereferences, and list
       slices of arrays or hashes.  But does not allow spaces between the  components  of  the  variable  access
       (i.e. imposes the same constraint as within an interpolating quotelike).

       "(?&PerlArrayAccessNoSpaceNoArrow)"

       Matches  any kind of variable access beginning with a "@", including arrays, array dereferences, and list
       slices of arrays or hashes.  But does not allow spaces or arrows between the components of  the  variable
       access (i.e. imposes the same constraint as within a "<...>"-delimited interpolating quotelike).

       "(?&PerlHashAccess)"

       Matches  any  kind  of variable access beginning with a "%", including hashes, hash dereferences, and kv-
       slices of hashes or arrays.

       "(?&PerlLabel)"

       Matches a colon-terminated label.

       "(?&PerlLiteral)"

       Matches a literal value.  That is: a number, a "qr" or "qw" quotelike, a string, or a bareword.

       "(?&PerlString)"

       Matches a string literal.  That is: a single- or double-quoted string, a "q" or "qq" string,  a  heredoc,
       or a version string.

       "(?&PerlQuotelike)"

       Matches  any  form  of  quotelike  operator.   That  is: a single- or double-quoted string, a "q" or "qq"
       string, a heredoc, a version string, a "qr", a "qw", a "qx", a "/.../" or "m/.../" regex, a substitution,
       or a transliteration.

       "(?&PerlHeredoc)"

       Matches a heredoc specifier.  That is: just the initial "<<TERMINATOR" component, not the actual
       contents of the heredoc on the subsequent lines.

       This  rule  only  matches a heredoc specifier if that specifier is correctly followed on the next line by
       any heredoc contents and then the correct terminator.

       However, if the heredoc specifier is correctly matched, subsequent calls to  either  of  the  whitespace-
       matching  rules  ("(?&PerlOWS)" or "(?&PerlNWS)") will also consume the trailing heredoc contents and the
       terminator.

       So, for example, to correctly match a heredoc plus its contents you could use something like:

           m/ (?&PerlHeredoc) (?&PerlOWS)  $PPR::GRAMMAR /x

       or, if there may be trailing items on the same line as the heredoc specifier:

           m/ (?&PerlHeredoc)
              (?<trailing_items> [^\n]* )
              (?&PerlOWS)

              $PPR::GRAMMAR
           /x

       Note that the same limitations apply to other constructs that match heredocs, such as "(?&PerlQuotelike)"
       or "(?&PerlString)".

       "(?&PerlQuotelikeQ)"

       Matches a single-quoted string, either a '...' or a "q/.../" (with any valid delimiters).

       "(?&PerlQuotelikeQQ)"

       Matches a double-quoted string, either a "..."  or a "qq/.../" (with any valid delimiters).

       "(?&PerlQuotelikeQW)"

       Matches a "quotewords" list.  That is a "qw/ list of words /" (with any valid delimiters).

       "(?&PerlQuotelikeQX)"

       Matches a "qx" system call, either a `...` or a "qx/.../" (with any valid delimiters).

       "(?&PerlQuotelikeS)" or "(?&PerlSubstitution)"

       Matches a substitution operation.  That is:  "s/.../.../"  (with  any  valid  delimiters  and  any  valid
       trailing modifiers).

       "(?&PerlQuotelikeTR)" or "(?&PerlTransliteration)"

       Matches  a  transliteration operation.  That is: "tr/.../.../" or "y/.../.../" (with any valid delimiters
       and any valid trailing modifiers).

       "(?&PerlContextualQuotelikeM)" or "(?&PerlContextualMatch)"

       Matches a regex-match operation in any context where it would be allowed in valid Perl.  That is: "/.../"
       or "m/.../" (with any valid delimiters and any valid trailing modifiers).

       "(?&PerlQuotelikeM)" or "(?&PerlMatch)"

       Matches a regex-match operation.  That is: "/.../" or "m/.../" (with any valid delimiters and  any  valid
       trailing  modifiers)  in any context (i.e. even in places where it would not normally be allowed within a
       valid piece of Perl code).

       "(?&PerlQuotelikeQR)"

       Matches a "qr" regex constructor (with any valid delimiters and any valid trailing modifiers).

       "(?&PerlContextualRegex)"

       Matches a "qr" regex constructor  or  a  "/.../"  or  "m/.../"  regex-match  operation  (with  any  valid
       delimiters and any valid trailing modifiers) anywhere where either would be allowed in valid Perl.

       In other words: anything capable of matching within valid Perl code.

       "(?&PerlRegex)"

       Matches a "qr" regex constructor or a "/.../" or "m/.../" regex-match operation in any context (i.e. even
       in places where it would not normally be allowed within a valid piece of Perl code).

       In other words: anything capable of matching.

       "(?&PerlBuiltinFunction)"

       Matches the name of any builtin function.

       To match an actual call to a built-in function, use:

           m/
               (?= (?&PerlBuiltinFunction) )
               (?&PerlCall)
           /x

       "(?&PerlNullaryBuiltinFunction)"

       Matches the name of any builtin function that never takes arguments.

       To match an actual call to a built-in function that never takes arguments, use:

           m/
               (?= (?&PerlNullaryBuiltinFunction) )
               (?&PerlCall)
           /x

       "(?&PerlVersionNumber)"

       Matches  any  number  or  version-string  that  can  be used as a version number within a "use", "no", or
       "package" statement.

       "(?&PerlVString)"

       Matches a version-string (a.k.a. v-string).

       "(?&PerlNumber)"

       Matches a valid number, including binary, octal, decimal and  hexadecimal  integers,  and  floating-point
       numbers with or without an exponent.

       "(?&PerlIdentifier)"

       Matches a simple, unqualified identifier.

       "(?&PerlQualifiedIdentifier)"

       Matches  a  qualified or unqualified identifier, which may use either "::" or "'" as internal separators,
       but only "::" as initial or terminal separators.

       "(?&PerlOldQualifiedIdentifier)"

       Matches a qualified or unqualified identifier, which may use either "::" or  "'"  as  both  internal  and
       external separators.

       "(?&PerlBareword)"

       Matches a valid bareword.

       Note that this is not the same as a simple identifier, nor the same as a qualified identifier.

       "(?&PerlPod)"

       Matches  a  single POD section containing any contiguous set of POD directives, up to the first "=cut" or
       end-of-file.

       "(?&PerlPodSequence)"

       Matches any sequence of POD sections, separated and/or surrounded by optional whitespace.

       "(?&PerlNWS)"

       Matches one-or-more characters of necessary whitespace, including spaces, tabs, newlines, comments, and
       POD.

       "(?&PerlOWS)"

       Matches zero-or-more characters of optional whitespace, including spaces, tabs, newlines, comments, and
       POD.

       "(?&PerlOWSOrEND)"

       Matches zero-or-more characters of optional whitespace, including spaces, tabs, newlines, comments, POD,
       and any trailing "__END__" or "__DATA__" section.

       "(?&PerlEndOfLine)"

       Matches a single newline ("\n") character.

       This is provided mainly to allow newlines to be "hooked" by redefining "(?<PerlEndOfLine>)" (for example,
       to count lines during a parse).

       "(?&PerlKeyword)"

       Matches a pluggable keyword.

       Note  that  there are no pluggable keywords in the default PPR regex; they must be added by the end-user.
       See the following section for details.

   Extending the Perl syntax with keywords
       In Perl 5.12 and later, it's possible to add new types of statements to the language  using  a  mechanism
       called "pluggable keywords".

       This mechanism (best accessed via CPAN modules such as "Keyword::Simple" or "Keyword::Declare") acts like
       a  limited  macro  facility. It detects when a statement begins with a particular, pre-specified keyword,
       passes the trailing text to an associated keyword handler, and replaces the  trailing  source  code  with
       whatever the keyword handler produces.

       For  example, the Dios module uses this mechanism to add keywords such as "class", "method", and "has" to
       Perl 5, providing a declarative OO syntax. And the Object::Result module uses pluggable keywords to add a
       "result" statement that simplifies returning an ad hoc object from a subroutine.

       Unfortunately, because such modules effectively extend the standard Perl syntax, by default  PPR  has  no
       way of successfully parsing them.

       However,  when  setting up a regex using $PPR::GRAMMAR it is possible to extend that grammar to deal with
       new keywords...by defining a rule named "(?<PerlKeyword>...)".

       This rule is always tested as the first option within  the  standard  "(?&PerlStatement)"  rule,  so  any
       syntax declared within effectively becomes a new kind of statement. Note that each alternative within the
       rule must begin with a valid "keyword" (that is: a simple identifier of some kind).

       For example, to support the three keywords from Dios:

           $Dios::GRAMMAR = qr{

               # Add a keyword rule to support Dios...
               (?(DEFINE)
                   (?<PerlKeyword>

                           class                              (?&PerlOWS)
                           (?&PerlQualifiedIdentifier)        (?&PerlOWS)
                       (?: is (?&PerlNWS) (?&PerlIdentifier)  (?&PerlOWS) )*+
                           (?&PerlBlock)
                   |
                           method                             (?&PerlOWS)
                           (?&PerlIdentifier)                 (?&PerlOWS)
                       (?: (?&kw_balanced_parens)             (?&PerlOWS) )?+
                       (?: (?&PerlAttributes)                 (?&PerlOWS) )?+
                           (?&PerlBlock)
                   |
                           has                                (?&PerlOWS)
                       (?: (?&PerlQualifiedIdentifier)        (?&PerlOWS) )?+
                           [\@\$%][.!]?(?&PerlIdentifier)     (?&PerlOWS)
                       (?: (?&PerlAttributes)                 (?&PerlOWS) )?+
                       (?: (?: // )?+ =                       (?&PerlOWS)
                           (?&PerlExpression)                 (?&PerlOWS) )?+
                       (?> ; | (?= \} ) | \z )
                   )

                   (?<kw_balanced_parens>
                       \( (?: [^()]++ | (?&kw_balanced_parens) )*+ \)
                   )
               )

               # Add all the standard PPR rules...
               $PPR::GRAMMAR
           }x;

           # Then parse with it...

           $source_code =~ m{ \A (?&PerlDocument) \Z  $Dios::GRAMMAR }x;

       Or, to support the "result" statement from "Object::Result":

           my $ORK_GRAMMAR = qr{

               # Add a keyword rule to support Object::Result...
               (?(DEFINE)
                   (?<PerlKeyword>
                       result                        (?&PerlOWS)
                       \{                            (?&PerlOWS)
                       (?: (?> (?&PerlIdentifier)
                           |   < [[:upper:]]++ >
                           )                         (?&PerlOWS)
                           (?&PerlParenthesesList)?+      (?&PerlOWS)
                           (?&PerlBlock)             (?&PerlOWS)
                       )*+
                       \}
                   )
               )

               # Add all the standard PPR rules...
               $PPR::GRAMMAR
           }x;

           # Then parse with it...

           $source_code =~ m{ \A (?&PerlDocument) \Z  $ORK_GRAMMAR }x;

       Note that, although pluggable keywords are only available from Perl 5.12 onwards, PPR will still accept
       "(?&PerlKeyword)" extensions under Perl 5.10.

   Extending the Perl syntax in other ways
       Other modules (such as "Devel::Declare" and "Filter::Simple") make it possible to extend Perl  syntax  in
       even  more  flexible ways.  The PPR::X module provides support for syntactic extensions more general than
       pluggable keywords.

   Comparison with PPI
       The PPI and PPR modules can both identify valid Perl code, but they do so in very different ways, and are
       optimal for different purposes.

       PPI scans an entire Perl document and builds a hierarchical representation of the various components.  It
       is  therefore  suitable  for  recognition, validation, partial extraction, and in-place transformation of
       Perl code.

       PPR matches only as much of a Perl document as specified by the regex you create, and does not build  any
       hierarchical  representation  of  the  various  components  it  matches.  It  is  therefore  suitable for
       recognition and validation of Perl code. However, unless great care is taken, PPR is not as  reliable  as
       PPI for extractions or transformations of components smaller than a single statement.

       On the other hand, PPI always has to parse its entire input, and build a complete non-trivial nested data
       structure  for  it,  before it can be used to recognize or validate any component. So it is almost always
       significantly slower and more complicated than PPR for those kinds of tasks.

       For example, to determine whether an input string begins with a valid Perl block, PPI requires  something
       like:

           if (my $document = PPI::Document->new(\$input_string) ) {
               my $block = $document->schild(0)->schild(0);
               if ($block->isa('PPI::Structure::Block')) {
                   $block->remove;
                   process_block($block);
                   process_extra($document);
               }
           }

       whereas PPR needs just:

           if ($input_string =~ m{ \A (?&PerlOWS) ((?&PerlBlock)) (.*) $PPR::GRAMMAR }xs) {
               process_block($1);
               process_extra($2);
           }

       Moreover,  the  PPR version will be at least twice as fast at recognizing that leading block (and usually
       four to seven times faster)...mainly because it doesn't have to parse the trailing code at all, nor build
       any representation of its hierarchical structure.

       As a simple rule of thumb, when you only need to quickly detect, identify, or confirm valid Perl (or just
       a single valid Perl component), use PPR.  When you need to examine, traverse, or manipulate the  internal
       structure or component relationships within an entire Perl document, use PPI.

DIAGNOSTICS

       "Warning: This program is running under Perl 5.20..."
           Due to an unsolved issue with that particular release of Perl, the single regex in the PPR module
           takes a ridiculously long time to compile under Perl 5.20 (i.e. minutes, not milliseconds).

           The code will work correctly when it eventually does compile, but the start-up delay is so extreme
           that the module issues this warning, to reassure users that something is actually happening, and
           to explain why it's happening so slowly.

           The only remedy at present is to use an older or newer version of Perl.

           For all the gory details, see:
           <https://rt.perl.org/Public/Bug/Display.html?id=122283>
           <https://rt.perl.org/Public/Bug/Display.html?id=122890>

       "PPR::decomment() does not work under Perl 5.14"
           There is a separate bug in the Perl 5.14 regex engine that prevents the decomment() subroutine from
           correctly detecting the location of comments.

           The subroutine throws an exception if you attempt to call it when running under Perl 5.14
           specifically.

       The module has no other diagnostics, apart from those Perl provides for all regular expressions.

       The commonest error is to forget to add $PPR::GRAMMAR to a regex, in which case you will get a standard
       Perl error message such as:

           Reference to nonexistent named group in regex;
           marked by <-- HERE in m/

               (?&PerlDocument <-- HERE )

           / at example.pl line 42.

       Adding $PPR::GRAMMAR at the end of the regex solves the problem.
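       The failure and its fix can be reproduced in miniature without PPR installed. In this sketch,
       compile_checked() is a hypothetical helper, and the one-rule "(?(DEFINE)...)" block is only a stand-in
       for $PPR::GRAMMAR. The pattern is held in a string so that it is compiled at run time, where "eval"
       can trap the error.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern held in a string at run time.
# Returns (regex, undef) on success, or (undef, error message) on failure.
sub compile_checked {
    my ($pattern_text) = @_;
    my $regex = eval { qr{$pattern_text}x };
    return defined $regex ? ($regex, undef) : (undef, $@);
}

# A tiny stand-in for $PPR::GRAMMAR (an assumption, so the sketch runs without PPR).
my $STAND_IN_GRAMMAR = '(?(DEFINE) (?<PerlDocument> .* ) )';

# Without the rule definitions, compilation fails...
my (undef, $error) = compile_checked('\A (?&PerlDocument) \z');
print "Failed: $error" if defined $error;

# ...and appending the definitions fixes it.
my ($regex) = compile_checked('\A (?&PerlDocument) \z ' . $STAND_IN_GRAMMAR);
print "Compiled OK\n" if defined $regex;
```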

CONFIGURATION AND ENVIRONMENT

       PPR requires no configuration files or environment variables.

DEPENDENCIES

       Requires Perl 5.10 or later.

INCOMPATIBILITIES

       None reported.

LIMITATIONS

       This module works under all versions of Perl from 5.10 onwards.

       However, the latest release of Perl 5.20 seems to have significant difficulties compiling large regular
       expressions, and typically requires over a minute to build any regex that incorporates the
       $PPR::GRAMMAR rule definitions.

       The problem does not occur in Perl 5.10 to 5.18, nor in Perl 5.22 or later, though the parser is still
       measurably slower in all Perl versions greater than 5.20 (presumably because most regexes are
       measurably slower in more modern versions of Perl; such is the price of full re-entrancy and safe
       lexical scoping).

       The decomment() subroutine trips a separate regex engine bug in Perl 5.14 only and will not run under
       that version.

       There was a lingering bug in regex re-interpolation between Perl 5.18 and 5.28, which means that
       interpolating a PPR grammar (or any other precompiled regex that uses the "(??{...})" construct) into
       another regex sometimes does not work. In these cases, the spurious error message generated is usually:
       "Sequence (?_...) not recognized". This problem is unlikely ever to be resolved, as those versions of
       Perl are no longer being maintained. The only known workaround is to upgrade to Perl 5.30 or later.

       There are also constructs in Perl 5 which cannot be parsed without actually executing some code...which
       the regex does not attempt to do, for obvious reasons.
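       The version-specific caveats above could be checked at start-up with something like the following
       sketch. The helper name ppr_caveats_for() is hypothetical and not part of PPR; the version bounds
       are an interpretation of the ranges described above (in particular, "between Perl 5.18 and 5.28" is
       read as inclusive, and development releases such as 5.21 are assumed to be unaffected).

```perl
use strict;
use warnings;

# Hypothetical helper: list the documented PPR caveats for a given Perl
# version, expressed in the numeric form used by $] (e.g. 5.020003).
sub ppr_caveats_for {
    my ($version) = @_;
    my @caveats;

    # Perl 5.20: very slow compilation of the grammar.
    push @caveats, 'grammar may take over a minute to compile'
        if $version >= 5.020 && $version < 5.021;

    # Perl 5.14 only: decomment() throws an exception.
    push @caveats, 'PPR::decomment() will not run'
        if $version >= 5.014 && $version < 5.015;

    # Assumption: 5.18 through 5.28 inclusive; fixed by upgrading to 5.30.
    push @caveats, 'interpolating the grammar into another regex may fail'
        if $version >= 5.018 && $version < 5.030;

    return @caveats;
}

# Report any caveats that apply to the running interpreter.
print "Perl $]: $_\n" for ppr_caveats_for($]);
```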

BUGS

       No bugs have been reported.

       Please report any bugs or feature requests to "bug-ppr@rt.cpan.org", or  through  the  web  interface  at
       <http://rt.cpan.org>.

AUTHOR

       Damian Conway  "<DCONWAY@CPAN.org>"

LICENCE AND COPYRIGHT

       Copyright (c) 2017, Damian Conway "<DCONWAY@CPAN.org>". All rights reserved.

       This  module  is  free  software;  you  can redistribute it and/or modify it under the same terms as Perl
       itself. See perlartistic.

DISCLAIMER OF WARRANTY

       BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE,  TO  THE  EXTENT
       PERMITTED  BY  APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER
       PARTIES PROVIDE THE SOFTWARE "AS  IS"  WITHOUT  WARRANTY  OF  ANY  KIND,  EITHER  EXPRESSED  OR  IMPLIED,
       INCLUDING,  BUT  NOT  LIMITED  TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
       PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF  THE  SOFTWARE  IS  WITH  YOU.  SHOULD  THE
       SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

       IN  NO  EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY
       OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE  LIABLE
       TO  YOU  FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF
       THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT  LIMITED  TO  LOSS  OF  DATA  OR  DATA  BEING
       RENDERED  INACCURATE  OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE
       WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF  SUCH
       DAMAGES.

perl v5.40.0                                       2024-10-11                                           PPR(3pm)