Ubuntu Manpage: Tree::XPathEngine - a re-usable XPath engine

Provided by: libtree-xpathengine-perl_0.05-3_all

NAME

       Tree::XPathEngine - a re-usable XPath engine

DESCRIPTION

       This module provides an XPath engine that can be re-used by other modules/classes that implement trees.

       It is designed to be compatible with Class::XPath, i.e. it passes the Class::XPath tests if you replace
       Class::XPath by Tree::XPathEngine.

       This code is a more or less direct copy of the XML::XPath module by Matt Sergeant. I only removed the XML
       processing part (that parses an XML document and loads it as a tree in memory) to remove the dependency on
       XML::Parser, applied a couple of patches, removed a whole bunch of XML specific things (comments,
       processing instructions, namespaces...), renamed a whole lot of methods to make Pod::Coverage happy, and
       changed the docs.

       The article eXtending XML XPath, http://www.xmltwig.com/article/extending_xml_xpath/ should give authors
       who want to use this module enough background to do so.

       Otherwise, my email is below ;--)

       WARNING: while the underlying code is rather solid, this module most likely lacks docs.

       As they say, "patches welcome"... but I am also interested in any experience using this module, what were
       the tricky parts, and how could the code or the docs be improved.

SYNOPSIS

           use Tree::XPathEngine;

           my $tree= my_tree->new( ...);
           my $xp = Tree::XPathEngine->new();

           my @nodeset = $xp->findnodes('/root/kid/grandkid[1]', $tree); # find all first grandkids

           package my_tree;

           # needs to provide these methods
           sub xpath_get_name              { ... }
           sub xpath_get_next_sibling      { ... }
           sub xpath_get_previous_sibling  { ... }
           sub xpath_get_root_node         { ... }
           sub xpath_get_parent_node       { ... }
           sub xpath_get_child_nodes       { ... }
           sub xpath_is_element_node       { return 1; }
           sub xpath_cmp                   { ... }
           sub xpath_get_attributes        { ... } # only if attributes are used
           sub xpath_to_literal            { ... } # only if you want to use findnodes_as_string or findvalue
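       A minimal, self-contained sketch of a node class providing the navigation methods listed above, for a
       tree kept as plain hashes. The class names My::Node and My::Document and the add_child and _sibling
       helpers are hypothetical; xpath_cmp is left out (see "Ordering nodesets" below), and the sketch has not
       been run against Tree::XPathEngine itself. It only checks that each method returns what the engine
       expects.

```perl
use strict;
use warnings;

# Hypothetical element class: each node is a hash holding a name,
# a link to its parent and an ordered list of children.
# (Parent links create reference cycles; a real class might weaken them.)
package My::Node {
    sub new {
        my ($class, $name) = @_;
        return bless { name => $name, parent => undef, children => [] }, $class;
    }

    # Hypothetical helper: attach $child as the last child of $self.
    sub add_child {
        my ($self, $child) = @_;
        $child->{parent} = $self;
        push @{ $self->{children} }, $child;
        return $child;
    }

    sub xpath_get_name          { $_[0]{name} }
    sub xpath_get_parent_node   { $_[0]{parent} }
    sub xpath_get_child_nodes   { @{ $_[0]{children} } }
    sub xpath_is_element_node   { 1 }
    sub xpath_is_document_node  { 0 }
    sub xpath_is_attribute_node { 0 }

    # Climb the parent links up to the document node
    # (assumes the node is attached to a document).
    sub xpath_get_root_node {
        my $node = shift;
        $node = $node->xpath_get_parent_node until $node->xpath_is_document_node;
        return $node;
    }

    # Hypothetical helper: the sibling at $offset from $self in the parent's child list.
    sub _sibling {
        my ($self, $offset) = @_;
        my $parent = $self->{parent} or return undef;
        my @kids = $parent->xpath_get_child_nodes;
        my ($i) = grep { $kids[$_] == $self } 0 .. $#kids;
        my $j = $i + $offset;
        return ($j >= 0 && $j <= $#kids) ? $kids[$j] : undef;
    }
    sub xpath_get_next_sibling     { $_[0]->_sibling(+1) }
    sub xpath_get_previous_sibling { $_[0]->_sibling(-1) }
}

# Hypothetical document class: the whole tree, whose only child is the root element.
package My::Document {
    our @ISA = ('My::Node');
    sub xpath_is_element_node  { 0 }
    sub xpath_is_document_node { 1 }
    sub xpath_get_root_node    { $_[0] }
}

# Build document -> root -> (kid, kid).
my $doc  = My::Document->new('#document');
my $root = $doc->add_child(My::Node->new('root'));
my $kid1 = $root->add_child(My::Node->new('kid'));
my $kid2 = $root->add_child(My::Node->new('kid'));
```

       In this sketch the document is an explicit object, so it is the natural candidate for the context
       argument of findnodes once an xpath_cmp method has been added.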

DETAILS

API

       The API of Tree::XPathEngine itself is extremely simple to allow you to get going almost immediately. The
       deeper APIs are more complex, but you shouldn't have to touch most of them.

   new %options
       options

       xpath_name_re
           a regular expression used to match names (node names or attribute names). By default it is
           qr/[A-Za-z_][\w.-]*/ in order to work under perl 5.6.n, but you might want to use something like
           qr/\p{L}[\w.-]*/ in 5.8.n, to accommodate letters outside of the ASCII range.
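       To see why the default can be too restrictive, the sketch below tests both patterns quoted above against
       a few names, anchored so that the whole name has to match. It does not load Tree::XPathEngine; going by
       the option list above, the Unicode-aware pattern would presumably be passed as
       Tree::XPathEngine->new( xpath_name_re => qr/\p{L}[\w.-]*/ ). The is_name helper is hypothetical.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # apply Unicode rules to \w in the patterns below

# The default name pattern and the Unicode-aware alternative quoted above.
my $default_re = qr/[A-Za-z_][\w.-]*/;
my $unicode_re = qr/\p{L}[\w.-]*/;

# Hypothetical helper: 1 if the pattern matches the whole name, 0 otherwise.
sub is_name {
    my ($re, $name) = @_;
    return $name =~ /\A$re\z/ ? 1 : 0;
}

# "\x{e9}l\x{e9}ment" is the French word "element" spelled with e-acute.
my @names = ('kid', 'grand-kid.2', '2kid', "\x{e9}l\x{e9}ment");

binmode STDOUT, ':encoding(UTF-8)';
my %result;
for my $name (@names) {
    $result{$name} = [ is_name($default_re, $name), is_name($unicode_re, $name) ];
    print "$name: default=$result{$name}[0] unicode=$result{$name}[1]\n";
}
```

       Only the accented name separates the two patterns: the default rejects it because its first character is
       not in [A-Za-z_], while neither pattern accepts a name starting with a digit.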

   findnodes ($path, $context)
       Returns  a  list  of  nodes  found  by  $path,  in  context  $context.   In  scalar  context  returns  an
       "Tree::XPathEngine::NodeSet" object.

   findnodes_as_string ($path, $context)
       Returns the text values of the nodes

   findvalue ($path, $context)
       Returns    either    a    "Tree::XPathEngine::Literal",    a    "Tree::XPathEngine::Boolean"     or     a
       "Tree::XPathEngine::Number"  object.  If the path returns a NodeSet, $nodeset->xpath_to_literal is called
       automatically for you (and thus a "Tree::XPathEngine::Literal" is returned). Note that for  each  of  the
       objects  stringification  is  overloaded,  so you can just print the value found, or manipulate it in the
       ways you would a normal perl value (e.g. using regular expressions).

   exists ($path, $context)
       Returns true if the given path exists.

   matches($node, $path, $context)
       Returns true if the node matches the path.

   find ($path, $context)
       The find function takes an XPath expression (a string) and returns  either  a  Tree::XPathEngine::NodeSet
       object   containing   the  nodes  it  found  (or  empty  if  no  nodes  matched  the  path),  or  one  of
       Tree::XPathEngine::Literal  (a  string),  Tree::XPathEngine::Number,  or  Tree::XPathEngine::Boolean.  It
       should  always  return  something  - and you can use ->isa() to find out what it returned. If you need to
       check how many nodes it found you should check $nodeset->size.  See Tree::XPathEngine::NodeSet.

   XPath variables
       XPath lets you use variables in expressions (see the XPath spec: <http://www.w3.org/TR/xpath>).

       set_var ($var_name, $val)
           sets the variable $var_name to $val

       get_var ($var_name)
           get the value of the variable (there should be no need to use this method from  outside  the  module,
           but it looked silly to have "set_var" and "_get_var").

How to use this module

       The purpose of this module is to add XPath support to generic tree modules.

       It  works  by letting you create a Tree::XPathEngine object, that will be called to resolve XPath queries
       on a context. The context is a node (or a list of nodes) in a tree.

       The tree should share some characteristics with an XML tree: it is made of nodes, and there are 3 kinds of
       nodes: the document (the whole tree; the root of the tree is a child of this node), elements (regular
       nodes in the tree) and attributes.

       Nodes in the tree are expected to provide methods that will be called by the XPath engine to resolve the
       query. Not all of the possible methods need to be available, depending on the type of XPath queries that
       need to be supported: for example, if the nodes do not have a text value, then there is no need for an
       "xpath_string_value" method, and XPath queries cannot include the "string()" function (using it will
       trigger a runtime error).

       Most of the expected methods are usual methods for a tree module, so it should not be  too  difficult  to
       implement them, by aliasing existing methods to the required ones.

       Just in case, here is a fast way to alias, for example, your own "parent" method to the
       "xpath_get_parent_node" method needed by Tree::XPathEngine:

         *xpath_get_parent_node= \&parent; # in the node package
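       A runnable version of that alias, using a hypothetical Legacy::Node class that already has "parent" and
       "name" accessors. Assigning a code reference (\&parent) rather than a whole glob (*parent) aliases only
       the subroutine, not $parent, @parent and so on.

```perl
use strict;
use warnings;

# Hypothetical tree class that already has its own accessors.
package Legacy::Node {
    sub new    { my ($class, %args) = @_; return bless {%args}, $class }
    sub parent { $_[0]{parent} }
    sub name   { $_[0]{name} }

    # Install the names Tree::XPathEngine calls as aliases of the existing subs.
    no warnings 'once';   # each glob is mentioned only once
    *xpath_get_parent_node = \&parent;
    *xpath_get_name        = \&name;
}

my $root = Legacy::Node->new(name => 'root');
my $kid  = Legacy::Node->new(name => 'kid', parent => $root);
printf "%s\n", $kid->xpath_get_parent_node->xpath_get_name;   # prints "root"
```

       A glob alias is bound to the subroutine that exists when it is made: if a subclass overrides "parent",
       the alias still calls the base class version. A one-line wrapper such as
       sub xpath_get_parent_node { $_[0]->parent } dispatches dynamically instead.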

       The XPath engine expects the whole tree and attributes to be full blown objects, which provide a  set  of
       methods similar to nodes. If they are not, see below for ways to "fake" it.

   Methods to be provided by the nodes
       xpath_get_name
           returns the name of the node.

           Not used for the document.

       xpath_string_value
           The   text   corresponding   to  the  node,  used  by  the  "string()"  function  (for  queries  like
           "//foo[string()="bar"]")

       xpath_get_next_sibling
       xpath_get_previous_sibling
       xpath_get_root_node
           returns the document object. See "Document object" below for more details.

       xpath_get_parent_node
           The parent of the root of the tree is the document node.

           The parent of an attribute is its element.

       xpath_get_child_nodes
           returns a list of children.

           Note that the attributes are not children of an element.

       xpath_is_element_node
       xpath_is_document_node
       xpath_is_attribute_node
       xpath_is_text_node
           only if the tree includes textual nodes

       xpath_to_string
           returns the node as a string

       xpath_to_number
           returns the node value as a number object

             use Tree::XPathEngine::Number;

             sub xpath_to_number
               { return Tree::XPathEngine::Number->new( $_[0]->xpath_string_value); }

       xpath_cmp ($node_a, $node_b)
           compares 2 nodes and returns -1, 0 or 1 depending on whether $node_a is before, equal to or after
           $node_b in the tree.

           This is needed in order to return sorted results and to remove duplicates.

           See "Ordering nodesets" below for a ready-to-use sorting method if your tree does not have a "cmp"
           method.

   Element specific methods
       xpath_get_attributes
           returns the list of attributes. Attributes should be objects that support a set of methods similar
           to those of nodes (see "Attribute objects" below for an example).

Tricky bits

   Document object
       The original XPath works on XML, and is roughly speaking based on the DOM model of an  XML  document.  As
       far as the XPath engine is concerned, it still deals with a DOM tree.

       One of the possibly annoying consequences is that in the DOM the document itself is a node that has a
       single element child, the root of the document tree. If the tree you want to use this module on doesn't
       follow that model, that is, if its root element is the tree itself, then you will have to fake it.

       This is how I did it in Tree::DAG_Node::XPath:

         # in package Tree::DAG_Node::XPath
         sub xpath_get_root_node
         { my $node= shift;
           # The parent of root is a Tree::DAG_Node::XPath::Root
           # that helps getting the tree to mimic a DOM tree
           return $node->root->xpath_get_parent_node;
         }

         sub xpath_get_parent_node
           { my $node= shift;

             return    $node->mother # normal case, any node but the root
                       # the root parent is a Tree::DAG_Node::XPath::Root object
                       # which contains the reference of the (real) root node
                    || bless { root => $node }, 'Tree::DAG_Node::XPath::Root';
           }

         # class for the fake root for a tree
         package Tree::DAG_Node::XPath::Root;

         sub xpath_get_child_nodes   { return ( $_[0]->{root}); }
         sub address                 { return -1; } # the root is before all other nodes
         sub xpath_get_attributes    { return []  }
         sub xpath_is_document_node  { return 1   }
         sub xpath_is_element_node   { return 0   }
         sub xpath_is_attribute_node { return 0   }
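       Note that the xpath_get_parent_node above blesses a new Tree::DAG_Node::XPath::Root object on every call,
       so two calls return two different document objects. The self-contained sketch below, for a hypothetical
       Plain::Node class whose root element is the tree itself, caches the fake document (a hypothetical
       Plain::Document class) on the root instead, so that code comparing nodes with == (as elt_cmp in
       "Ordering nodesets" does) sees a single document node. This is a design choice of the sketch, not
       something the engine is documented to require.

```perl
use strict;
use warnings;

# Hypothetical tree class in which the root element is the tree itself:
# a node only knows its mother and its daughters, there is no document node.
package Plain::Node {
    sub new {
        my ($class, $name, $mother) = @_;
        my $self = bless { name => $name, mother => $mother, daughters => [] }, $class;
        push @{ $mother->{daughters} }, $self if $mother;
        return $self;
    }

    sub root {
        my $node = shift;
        $node = $node->{mother} while $node->{mother};
        return $node;
    }

    sub xpath_get_name          { $_[0]{name} }
    sub xpath_get_child_nodes   { @{ $_[0]{daughters} } }
    sub xpath_is_document_node  { 0 }
    sub xpath_is_element_node   { 1 }
    sub xpath_is_attribute_node { 0 }

    # Normal case: the mother. For the root: a fake document node, created
    # once and cached on the root so that every call returns the same object.
    sub xpath_get_parent_node {
        my $node = shift;
        return $node->{mother} if $node->{mother};
        return $node->{fake_document} //= Plain::Document->new($node);
    }

    sub xpath_get_root_node { $_[0]->root->xpath_get_parent_node }
}

# Fake document node: its only child is the real root element.
package Plain::Document {
    use Scalar::Util qw(weaken);

    sub new {
        my ($class, $root) = @_;
        my $self = bless { root => $root }, $class;
        weaken($self->{root});   # the root holds the document, avoid a cycle
        return $self;
    }

    sub xpath_get_child_nodes   { ( $_[0]{root} ) }
    sub xpath_get_parent_node   { return undef }
    sub xpath_get_root_node     { $_[0] }
    sub xpath_is_document_node  { 1 }
    sub xpath_is_element_node   { 0 }
    sub xpath_is_attribute_node { 0 }
}

my $root = Plain::Node->new('root');
my $kid  = Plain::Node->new('kid', $root);
```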

   Attribute objects
       If  the  attributes  in  the original tree are not objects, but simple fields in a hash, you can generate
       objects on the fly:

         # in the element package
         sub xpath_get_attributes
           { my $elt= shift;
             my $atts= $elt->attributes; # returns a reference to a hash of attributes
             my $rank=-1;                # used for sorting
             my @atts= map { bless( { name => $_, value => $atts->{$_}, elt => $elt, rank => $rank -- },
                                    'Tree::DAG_Node::XPath::Attribute')
                           }
                            sort keys %$atts;
             return @atts;
           }

         # the attribute package
         package Tree::DAG_Node::XPath::Attribute;
         use Tree::XPathEngine::Number;

         # not used, instead get_attributes in Tree::DAG_Node::XPath directly returns an
         # object blessed in this class
         #sub new
         #  { my( $class, $elt, $att)= @_;
         #    return bless { name => $att, value => $elt->att( $att), elt => $elt }, $class;
         #  }

         sub xpath_get_value         { return $_[0]->{value}; }
         sub xpath_get_name          { return $_[0]->{name} ; }
         sub xpath_string_value      { return $_[0]->{value}; }
         sub xpath_to_number         { return Tree::XPathEngine::Number->new( $_[0]->{value}); }
         sub xpath_is_document_node  { 0 }
         sub xpath_is_element_node   { 0 }
         sub xpath_is_attribute_node { 1 }
         sub xpath_to_string         { return qq{$_[0]->{name}="$_[0]->{value}"}; }

         # Tree::DAG_Node uses the address field to sort nodes, which simplifies things quite a bit
         sub xpath_cmp { $_[0]->address cmp $_[1]->address }
         sub address
           { my $att= shift;
             my $elt= $att->{elt};
             return $elt->address . ':' . $att->{rank};
           }
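       The same approach in a self-contained form, with hypothetical Hash::Elt and Hash::Att classes. To keep the
       sketch runnable without the module installed, xpath_to_number (which needs Tree::XPathEngine::Number) and
       xpath_cmp are left out. The attributes are returned sorted by name so that their order is stable, and
       each one records its element so that xpath_get_parent_node can return it.

```perl
use strict;
use warnings;

# Hypothetical element class that keeps its attributes in a plain hash.
package Hash::Elt {
    sub new {
        my ($class, $name, %atts) = @_;
        return bless { name => $name, atts => {%atts} }, $class;
    }
    sub xpath_get_name { $_[0]{name} }

    # Build one attribute object per hash entry, sorted by name so that
    # the order (recorded in "rank") is stable from one call to the next.
    sub xpath_get_attributes {
        my $elt  = shift;
        my $rank = 0;
        return map { Hash::Att->new($elt, $_, $elt->{atts}{$_}, $rank++) }
               sort keys %{ $elt->{atts} };
    }
}

# Hypothetical attribute class, instantiated on the fly.
package Hash::Att {
    sub new {
        my ($class, $elt, $name, $value, $rank) = @_;
        return bless { elt => $elt, name => $name, value => $value, rank => $rank }, $class;
    }
    sub xpath_get_name          { $_[0]{name} }
    sub xpath_get_value         { $_[0]{value} }
    sub xpath_string_value      { $_[0]{value} }
    sub xpath_get_parent_node   { $_[0]{elt} }    # the parent of an attribute is its element
    sub xpath_is_document_node  { 0 }
    sub xpath_is_element_node   { 0 }
    sub xpath_is_attribute_node { 1 }
    sub xpath_to_string         { qq{$_[0]{name}="$_[0]{value}"} }
}

my $elt  = Hash::Elt->new('kid', id => 'k1', class => 'v1');
my @atts = $elt->xpath_get_attributes;
print join(' ', map { $_->xpath_to_string } @atts), "\n";   # prints: class="v1" id="k1"
```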

   Ordering nodesets
       XPath query results must be sorted, and duplicates removed, so the XPath engine needs to be able to  sort
       nodes.

       It does so by calling the "xpath_cmp" method on nodes.

       One of the easiest ways to write such a method, for static trees, is to have a method of the object return
       its position in the tree as a number.
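       A minimal sketch of that approach, assuming a static tree built from a hypothetical Pos::Node class: the
       hypothetical number_nodes method gives every node its position in a pre-order walk (which is document
       order), and xpath_cmp then just compares the numbers. The numbering has to be redone whenever the tree
       changes. The usage part sorts a node list containing a duplicate into document order and removes the
       duplicate, which is the job the engine needs xpath_cmp for.

```perl
use strict;
use warnings;

# Hypothetical node class for a static tree.
package Pos::Node {
    sub new {
        my ($class, $name, @children) = @_;
        return bless { name => $name, children => \@children }, $class;
    }
    sub xpath_get_name { $_[0]{name} }

    # Give every node its position in a depth-first, pre-order walk,
    # which is document order. Returns the first unused position.
    # Must be called again after any change to the tree.
    sub number_nodes {
        my ($node, $next) = @_;
        $next //= 0;
        $node->{pos} = $next++;
        $next = number_nodes($_, $next) for @{ $node->{children} };
        return $next;
    }

    # Document order is then a plain numeric comparison.
    sub xpath_cmp { $_[0]{pos} <=> $_[1]{pos} }
}

# Build  root( a( a1, a2 ), b )  and number it.
my ($a1, $a2) = map { Pos::Node->new($_) } qw(a1 a2);
my $node_a = Pos::Node->new('a', $a1, $a2);
my $node_b = Pos::Node->new('b');
my $root   = Pos::Node->new('root', $node_a, $node_b);
$root->number_nodes;

# Sort a list into document order and drop duplicates.
my %seen;
my @sorted = grep { !$seen{$_}++ }
             sort { $a->xpath_cmp($b) } ($node_b, $a2, $root, $node_b, $a1);
print join(',', map { $_->xpath_get_name } @sorted), "\n";   # prints: root,a1,a2,b
```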

       If that is not possible, here is a method that should work (note that it only compares elements):

        # in the tree element package

         sub xpath_cmp($$)
           { my( $a, $b)= @_;
             if( UNIVERSAL::isa( $b, $ELEMENT))       # $ELEMENT is the tree element class
               { # 2 elts, compare them
                 return $a->elt_cmp( $b);
               }
             elsif( UNIVERSAL::isa( $b, $ATTRIBUTE))  # $ATTRIBUTE is the attribute class
               { # elt <=> att, compare the elt to the att->{elt}
                 # if the elt is the att->{elt} (cmp returns 0) then -1, elt is before att
                 return ($a->elt_cmp( $b->{elt}) ) || -1 ;
               }
             elsif( UNIVERSAL::isa( $b, $TREE))        # $TREE is the tree class
               { # elt <=> document, elt is after document
                 return 1;
               }
             else
               { die "unknown node type ", ref( $b); }
           }

         sub elt_cmp
           { my( $a, $b)=@_;

             # easy cases
             return  0 if( $a == $b);
             return  1 if( $a->in($b)); # a starts after b
             return -1 if( $b->in($a)); # a starts before b

             # ancestors does not include the element itself
             my @a_pile= ($a, $a->ancestors);
             my @b_pile= ($b, $b->ancestors);

             # the 2 elements are not in the same tree
             return undef unless( $a_pile[-1] == $b_pile[-1]);

             # find the first non common ancestors (they are siblings)
             my $a_anc= pop @a_pile;
             my $b_anc= pop @b_pile;

             while( $a_anc == $b_anc)
               { $a_anc= pop @a_pile;
                 $b_anc= pop @b_pile;
               }

             # from there move left and right and figure out the order
             my( $a_prev, $a_next, $b_prev, $b_next)= ($a_anc, $a_anc, $b_anc, $b_anc);
             while( 1)
               { $a_prev= $a_prev->xpath_get_previous_sibling || return( -1);
                 return 1 if( $a_prev == $b_next);
                 $a_next= $a_next->xpath_get_next_sibling || return( 1);
                 return -1 if( $a_next == $b_prev);
                 $b_prev= $b_prev->xpath_get_previous_sibling || return( 1);
                 return -1 if( $b_prev == $a_next);
                 $b_next= $b_next->xpath_get_next_sibling || return( -1);
                 return 1 if( $b_next == $a_prev);
               }
           }

         # true if $ancestor is an ancestor of $self
         sub in
           { my ($self, $ancestor)= @_;
             while( $self= $self->xpath_get_parent_node) { return $self if( $self == $ancestor); }
             return;
           }

         sub ancestors
           { my( $self)= @_;
             my @ancestors;
             while( $self= $self->xpath_get_parent_node) { push @ancestors, $self; }
             return @ancestors;
           }

         # in the attribute package
         sub xpath_cmp($$)
           { my( $a, $b)= @_;
             if( UNIVERSAL::isa( $b, $ATTRIBUTE))
               { # 2 attributes, compare their elements, then their name
                 return ($a->{elt}->elt_cmp( $b->{elt}) ) || ($a->{name} cmp $b->{name});
               }
             elsif( UNIVERSAL::isa( $b, $ELEMENT))
               { # att <=> elt : compare the att->elt and the elt
                 # if att->elt is the elt (cmp returns 0) then 1 (elt is before att)
                 return ($a->{elt}->elt_cmp( $b) ) || 1 ;
               }
             elsif( UNIVERSAL::isa( $b, $TREE))
               { # att <=> document, att is after document
                 return 1;
               }
             else
               { die "unknown node type ", ref( $b); }
           }
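       If the sibling walk in elt_cmp is hard to follow, a simpler (if slower) alternative that also needs no
       precomputed positions is to compare the paths of child indexes leading from the top of the tree to each
       node. This is a different technique from elt_cmp, sketched here for elements only with a hypothetical
       Path::Node class and a hypothetical index_path method; attributes would still need the
       element-then-attribute rules shown above.

```perl
use strict;
use warnings;

# Hypothetical element class with parent links and ordered children.
package Path::Node {
    sub new {
        my ($class, $name, $parent) = @_;
        my $self = bless { name => $name, parent => $parent, children => [] }, $class;
        push @{ $parent->{children} }, $self if $parent;
        return $self;
    }
    sub xpath_get_name        { $_[0]{name} }
    sub xpath_get_parent_node { $_[0]{parent} }
    sub xpath_get_child_nodes { @{ $_[0]{children} } }

    # Indexes of the children to follow from the top of the tree down to $node.
    sub index_path {
        my $node = shift;
        my @path;
        while (my $parent = $node->xpath_get_parent_node) {
            my @kids = $parent->xpath_get_child_nodes;
            my ($i) = grep { $kids[$_] == $node } 0 .. $#kids;
            unshift @path, $i;
            $node = $parent;
        }
        return @path;
    }

    # Compare the paths index by index (assumes both nodes are in the same
    # tree); if one path is a prefix of the other, the shorter one belongs
    # to an ancestor, which comes first in document order.
    sub xpath_cmp {
        my ($x, $y) = @_;
        my @px = $x->index_path;
        my @py = $y->index_path;
        while (@px && @py) {
            my $c = shift(@px) <=> shift(@py);
            return $c if $c;
        }
        return @px <=> @py;
    }
}

my $root = Path::Node->new('root');
my $x    = Path::Node->new('x', $root);
my $x1   = Path::Node->new('x1', $x);
my $y    = Path::Node->new('y', $root);
```

       Each comparison walks up to the top of the tree and scans the child list at every level, so this trades
       speed for code that is easy to check by hand.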

XPath extension

       The module supports the XPath recommendation to the same extent as XML::XPath (that is, rather
       completely).

       It includes a perl-specific extension: direct support for regular expressions.

       You can use the usual (in Perl!) "=~" and "!~" operators. Regular expressions are delimited by / (no other
       delimiter is accepted, and \ inside the regexp must be backslashed); the "imsx" modifiers can be used.

         $xp->findnodes( '//@att[.=~ /^v.$/]'); # returns the list of attributes att
                                                # whose value matches ^v.$

TODO

       provide inheritable node and attribute classes for typical cases, starting with nodes where the root IS
       the tree, and where attributes are a simple hash (similar to what I did in Tree::DAG_Node::XPath).

       better docs (patches welcome).

AUTHOR

       Michel Rodriguez, "<mirod@cpan.org>"

       This code is heavily based on the code for XML::XPath by Matt Sergeant copyright 2000 Axkit.com Ltd

BUGS

       Please  report  any  bugs  or  feature requests to "bug-tree-xpathengine@rt.cpan.org", or through the web
       interface at <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Tree-XPathEngine>.  I will be notified,  and
       then you'll automatically be notified of progress on your bug as I make changes.

ACKNOWLEDGEMENTS

COPYRIGHT & LICENSE

       XML::XPath Copyright 2000-2004 AxKit.com Ltd.  Copyright 2006 Michel Rodriguez, All Rights Reserved.

       This  program  is  free  software;  you can redistribute it and/or modify it under the same terms as Perl
       itself.

perl v5.36.0                                       2022-11-20                             Tree::XPathEngine(3pm)
