Provided by: libxml-dom-perl_1.46-2_all

NAME

       XML::DOM - A Perl module for building DOM Level 1 compliant document structures

SYNOPSIS

        use XML::DOM;

        my $parser = new XML::DOM::Parser;
        my $doc = $parser->parsefile ("file.xml");

        # print all HREF attributes of all CODEBASE elements
        my $nodes = $doc->getElementsByTagName ("CODEBASE");
        my $n = $nodes->getLength;

        for (my $i = 0; $i < $n; $i++)
        {
            my $node = $nodes->item ($i);
            my $href = $node->getAttributeNode ("HREF");
            print $href->getValue . "\n";
        }

        # Write the document to a file
        $doc->printToFile ("out.xml");

        # Print to string
        print $doc->toString;

        # Avoid memory leaks - cleanup circular references for garbage collection
        $doc->dispose;

DESCRIPTION

       This module extends the XML::Parser module by Clark Cooper.  The XML::Parser module is built on top of
       XML::Parser::Expat, which is a lower level interface to James Clark's expat library.

       XML::DOM::Parser is derived from XML::Parser. It parses XML strings or files and builds a data structure
       that conforms to the API of the Document Object Model as described at
       http://www.w3.org/TR/REC-DOM-Level-1.  See the XML::Parser manpage for other available features of the
       XML::DOM::Parser class.  Do not set the 'Style' property: XML::DOM::Parser sets it internally.

       The XML::Parser NoExpand option is partially supported: it generates EntityReference objects whenever
       an entity reference is encountered in character data. I'm not sure how useful this is.  Any comments
       are welcome.

       As described in the synopsis, when you create an XML::DOM::Parser object, the parse and parsefile methods
       create an XML::DOM::Document object from the specified input. This Document object can then be examined,
       modified and written back out to a file or converted to a string.
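
       The parse, examine, modify and write-back cycle described above can be sketched as follows. This is
       a minimal illustration, not part of the original manual: the XML string and the "updated" attribute
       are made-up sample data, and only methods named in this manual or in the DOM Level 1 API are used.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical input; any well-formed XML string works here.
my $xml = '<catalog><item id="a1">First</item><item id="b2">Second</item></catalog>';

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse($xml);          # builds an XML::DOM::Document

# Examine: collect the id attribute of every <item> element.
# In list context getElementsByTagName returns a plain Perl list.
my @ids;
for my $item ($doc->getElementsByTagName('item')) {
    push @ids, $item->getAttribute('id');
}
print "ids: @ids\n";

# Modify: set an attribute on the root element.
$doc->getDocumentElement->setAttribute(updated => 'yes');

# Convert back to a string, then break circular references.
my $out = $doc->toString;
print $out;
$doc->dispose;
```

       Use parsefile instead of parse to read from a file, and printToFile instead of toString to write one.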

       When using XML::DOM with XML::Parser version 2.19 and up, setting the XML::DOM::Parser option KeepCDATA
       to 1 will store CDATA sections in CDATASection nodes, instead of converting them to Text nodes.
       Adjacent CDATASection nodes will be merged into one. Let me know if this is a problem.
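
       A small sketch of the difference, assuming XML::Parser 2.19 or later is installed. The sample
       document is invented for illustration; the node type constants are the ones listed under "Constant
       definitions" below.

```perl
use strict;
use warnings;
use XML::DOM;

my $xml = '<note><![CDATA[a < b]]></note>';

# Default parser: the CDATA section is converted to a Text node.
my $plain_doc  = XML::DOM::Parser->new->parse($xml);
my $plain_type = $plain_doc->getDocumentElement->getFirstChild->getNodeType;

# KeepCDATA => 1: the section is kept as a CDATASection node.
my $cdata_doc  = XML::DOM::Parser->new(KeepCDATA => 1)->parse($xml);
my $cdata_type = $cdata_doc->getDocumentElement->getFirstChild->getNodeType;

print "default:   ", ($plain_type == TEXT_NODE          ? "Text"         : "other"), "\n";
print "KeepCDATA: ", ($cdata_type == CDATA_SECTION_NODE ? "CDATASection" : "other"), "\n";

$_->dispose for $plain_doc, $cdata_doc;
```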

       When using XML::Parser 2.27 and above, you can suppress expansion of parameter entity references (e.g.
       %pent;) in the DTD, by setting ParseParamEnt to 1 and ExpandParamEnt to 0. See Hidden Nodes for details.

       A Document has a tree structure consisting of Node objects. A Node may contain other nodes, depending on
       its type.  A Document may have Element, Text, Comment, and CDATASection nodes.  Element nodes may have
       Attr, Element, Text, Comment, and CDATASection nodes.  The other nodes may not have any child nodes.
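
       The tree structure can be explored with getChildNodes and getNodeName. The sketch below prints an
       indented outline of a small, made-up document; the outline helper is hypothetical and not part of
       XML::DOM. Attr nodes are not child nodes, so they do not show up in such a walk.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: collect one indented line per node, depth first.
sub outline {
    my ($node, $depth, $lines) = @_;
    push @$lines, ('  ' x $depth) . $node->getNodeName;
    # In list context getChildNodes returns a plain Perl list
    # (empty for nodes that cannot have children).
    outline($_, $depth + 1, $lines) for $node->getChildNodes;
    return $lines;
}

my $doc   = XML::DOM::Parser->new->parse('<a><b>text</b><!-- note --><c/></a>');
my $lines = outline($doc, 0, []);
print "$_\n" for @$lines;
$doc->dispose;
```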

       This module adds several node types that are not part of the DOM spec (yet.)  These are: ElementDecl (for
       <!ELEMENT ...> declarations), AttlistDecl (for <!ATTLIST ...> declarations), XMLDecl (for <?xml ...?>
       declarations) and AttDef (for attribute definitions in an AttlistDecl.)

XML::DOM Classes

       The XML::DOM module stores XML documents in a tree structure with a root node of type XML::DOM::Document.
       Different nodes in the tree represent different parts of the XML file. The DOM Level 1 Specification defines
       the following node types:

       •   XML::DOM::Node - Super class of all node types

       •   XML::DOM::Document - The root of the XML document

       •   XML::DOM::DocumentType - Describes the document structure: <!DOCTYPE root [ ... ]>

       •   XML::DOM::Element - An XML element: <elem attr="val"> ... </elem>

       •   XML::DOM::Attr - An XML element attribute: name="value"

       •   XML::DOM::CharacterData - Super class of Text, Comment and CDATASection

       •   XML::DOM::Text - Text in an XML element

       •   XML::DOM::CDATASection - Escaped block of text: <![CDATA[ text ]]>

       •   XML::DOM::Comment - An XML comment: <!-- comment -->

       •   XML::DOM::EntityReference - Refers to an ENTITY: &ent; or %ent;

       •   XML::DOM::Entity - An ENTITY definition: <!ENTITY ...>

       •   XML::DOM::ProcessingInstruction - A processing instruction: <?target data?>

       •   XML::DOM::DocumentFragment - Lightweight node for cut & paste

       •   XML::DOM::Notation - A NOTATION definition: <!NOTATION ...>

       In addition, the XML::DOM module contains the following nodes that are not part of the DOM Level 1
       Specification:

       •   XML::DOM::ElementDecl - Defines an element: <!ELEMENT ...>

       •   XML::DOM::AttlistDecl - Defines one or more attributes in an <!ATTLIST ...>

       •   XML::DOM::AttDef - Defines one attribute in an <!ATTLIST ...>

       •   XML::DOM::XMLDecl - An XML declaration: <?xml version="1.0" ...>

       Other classes that are part of the DOM Level 1 Spec:

       •   XML::DOM::Implementation - Provides information about this implementation. Currently it doesn't do
           much.

       •   XML::DOM::NodeList - Used internally to store a node's child nodes. Also returned by
           getElementsByTagName.

       •   XML::DOM::NamedNodeMap - Used internally to store an element's attributes.

       Other classes that are not part of the DOM Level 1 Spec:

       •   XML::DOM::Parser - A non-validating XML parser that creates XML::DOM::Documents

       •   XML::DOM::ValParser - A validating XML parser that creates XML::DOM::Documents. It uses XML::Checker
           to check against the DocumentType (DTD)

       •   XML::Handler::BuildDOM - A PerlSAX handler that creates XML::DOM::Documents.

XML::DOM package

       Constant definitions
           The following predefined constants identify the type of a node.

        UNKNOWN_NODE (0)                The node type is unknown (not part of DOM)

        ELEMENT_NODE (1)                The node is an Element.
        ATTRIBUTE_NODE (2)              The node is an Attr.
        TEXT_NODE (3)                   The node is a Text node.
        CDATA_SECTION_NODE (4)          The node is a CDATASection.
        ENTITY_REFERENCE_NODE (5)       The node is an EntityReference.
        ENTITY_NODE (6)                 The node is an Entity.
        PROCESSING_INSTRUCTION_NODE (7) The node is a ProcessingInstruction.
        COMMENT_NODE (8)                The node is a Comment.
        DOCUMENT_NODE (9)               The node is a Document.
        DOCUMENT_TYPE_NODE (10)         The node is a DocumentType.
        DOCUMENT_FRAGMENT_NODE (11)     The node is a DocumentFragment.
        NOTATION_NODE (12)              The node is a Notation.

        ELEMENT_DECL_NODE (13)          The node is an ElementDecl (not part of DOM)
        ATT_DEF_NODE (14)               The node is an AttDef (not part of DOM)
        XML_DECL_NODE (15)              The node is an XMLDecl (not part of DOM)
        ATTLIST_DECL_NODE (16)          The node is an AttlistDecl (not part of DOM)

        Usage:

          if ($node->getNodeType == ELEMENT_NODE)
          {
              print "It's an Element";
          }

       Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite frankly, you should never
       encounter it. The last 4 node types (13 through 16) were added to support the 4 added node classes.

   Global Variables
       $VERSION
           The variable $XML::DOM::VERSION contains the version number of this implementation, e.g. "1.43".

   METHODS
       These methods are not part of the DOM Level 1 Specification.

       getIgnoreReadOnly and ignoreReadOnly (readOnly)
           The DOM Level 1 Spec does not allow you to edit certain sections of the document, e.g. the
           DocumentType, so by default this implementation throws DOMExceptions (i.e.
           NO_MODIFICATION_ALLOWED_ERR) when you try to edit a readonly node. These readonly checks can be
           disabled by (temporarily) setting the global IgnoreReadOnly flag.

           The ignoreReadOnly method sets the global IgnoreReadOnly flag and returns its previous value. The
           getIgnoreReadOnly method simply returns its current value.

            my $oldIgnore = XML::DOM::ignoreReadOnly (1);
            eval {
                # ... edit readonly nodes here; eval catches any other exceptions ...
            };
            XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value

           Another way to do it, using a local variable:

            { # start new scope
               local $XML::DOM::IgnoreReadOnly = 1;
               # ... edit readonly nodes here; no cleanup is needed if an exception is thrown ...
            } # end of scope ($IgnoreReadOnly is set back to its previous value)

       isValidName (name)
           Whether the specified name is a valid "Name" as specified in the XML spec. Characters with Unicode
           values > 127 are now also supported.
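
           A quick way to see which strings qualify is to call the function directly. The candidate names
           below are arbitrary examples chosen for illustration, not taken from the original manual.

```perl
use strict;
use warnings;
use XML::DOM;

# Record and print the verdict for each (hypothetical) candidate name.
my %valid;
for my $name ('title', 'my-tag', '_x', '1st', 'has space') {
    $valid{$name} = XML::DOM::isValidName($name) ? 1 : 0;
    printf "%-12s %s\n", "'$name'", $valid{$name} ? 'valid' : 'not valid';
}
```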

       getAllowReservedNames and allowReservedNames (boolean)
           The first method returns whether reserved names are allowed.  The second takes a boolean argument and
           sets whether reserved names are allowed.  The initial value is 1 (i.e. allow reserved names.)

           The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are reserved for future use.
           (Amusingly enough, the XML version of the XML spec (REC-xml-19980210.xml) breaks that very rule by
           defining an ENTITY with the name 'xmlpio'.)  A "Name" in this context means the Name token as found
           in the BNF rules in the XML spec.

           XML::DOM only checks for errors when you modify the DOM tree, not when the DOM tree is built by the
           XML::DOM::Parser.
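
           The following sketch tries to create an element with a reserved name while reserved names are
           disallowed. The element name 'xmlNote' is made up, and the outcome is recorded rather than
           assumed; going by the description above, the expectation is that the call is rejected because
           it modifies the DOM tree.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<root/>');

XML::DOM::allowReservedNames(0);     # disallow names starting with "xml"

# eval catches the exception that a rejected name is expected to raise.
my $elem    = eval { $doc->createElement('xmlNote') };
my $outcome = defined $elem ? 'created' : 'rejected';

XML::DOM::allowReservedNames(1);     # back to the documented initial value

print "createElement('xmlNote') with reserved names disallowed: $outcome\n";
$doc->dispose;
```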

       setTagCompression (funcref)
           There are 3 possible styles for printing empty Element tags:

           Style 0
                <empty/> or <empty attr="val"/>

               XML::DOM uses this style by default for all Elements.

           Style 1
                 <empty></empty> or <empty attr="val"></empty>

           Style 2
                 <empty /> or <empty attr="val" />

               This style is sometimes desired when using XHTML.  (Note the extra space before the slash "/")
               See <http://www.w3.org/TR/xhtml1> Appendix C for more details.

           By default XML::DOM compresses all empty Element tags (style 0.)  You can control which style is used
           for a particular Element by calling XML::DOM::setTagCompression with a reference to a function that
           takes 2 arguments. The first is the tag name of the Element, the second is the XML::DOM::Element that
           is being printed. The function should return 0, 1 or 2 to indicate which style should be used to
           print the empty tag. E.g.

            XML::DOM::setTagCompression (\&my_tag_compression);

            sub my_tag_compression
            {
               my ($tag, $elem) = @_;

               # Print empty br, hr and img tags like this: <br />
               return 2 if $tag =~ /^(br|hr|img)$/;

               # Print other empty tags like this: <empty></empty>
               return 1;
            }

IMPLEMENTATION DETAILS

       •   Perl Mappings

           The value undef was used when the DOM Spec said null.

           The DOM Spec says: Applications must encode DOMString using UTF-16 (defined in Appendix C.3 of
           [UNICODE] and Amendment 1 of [ISO-10646]).  In this implementation we use plain old Perl strings
           encoded in UTF-8 instead of UTF-16.

       •   Text and CDATASection nodes

           The Expat parser expands entity references and CDATA sections to raw strings and does not
           indicate where they were found. By default, this implementation therefore converts both to Text
           nodes at parse time (see the NoExpand and KeepCDATA options in DESCRIPTION for the exceptions).
           CDATASection and EntityReference nodes that the user adds to an existing Document are preserved.

           Also, adjacent Text nodes are always merged at parse time. Text nodes that are added later can be
           merged with the normalize method. Consider using the addText method when adding Text nodes.

       •   Printing and toString

           When printing (and converting an XML Document to a string) the strings have to be encoded
           differently depending on where they occur. E.g. in a CDATASection all substrings are allowed
           except for "]]>". In regular text, certain characters are not allowed, e.g. ">" has to be
           converted to "&gt;".  These routines should be verified by someone who knows the details.

       •   Quotes

           Certain sections in XML are quoted, like attribute values in an Element. XML::Parser strips these
           quotes and the print methods in this implementation always use double quotes, so when parsing and
           printing a document, single quotes may be converted to double quotes. The default value of an
           attribute definition (AttDef) in an AttlistDecl, however, will maintain its quotes.

       •   AttlistDecl

           Attribute declarations for a certain Element are always merged into a single AttlistDecl object.

       •   Comments

           Comments in the DOCTYPE section are not kept in the right place. They will become child nodes of the
           Document.

       •   Hidden Nodes

           Previous versions of XML::DOM would expand parameter entity references (like %pent;), so when
           printing the DTD, it would print the contents of the external entity, instead of the parameter entity
           reference. With this release (1.27), you can prevent this by setting the XML::DOM::Parser options
           ParseParamEnt => 1 and ExpandParamEnt => 0.

           When it is parsing the contents of the external entities, it *DOES* still add the nodes to the
           DocumentType, but it marks these nodes by setting the 'Hidden' property. In addition, it adds an
           EntityReference node to the DocumentType node.

           When printing the DocumentType node (or when using to_expat() or to_sax()), the 'Hidden' nodes are
           suppressed, so you will see the parameter entity reference instead of the contents of the external
           entities. See test case t/dom_extent.t for an example.

           The reason for adding the 'Hidden' nodes to the DocumentType node, is that the nodes may contain
           <!ENTITY> definitions that are referenced further in the document. (Simply not adding the nodes to
           the DocumentType could cause such entity references to be expanded incorrectly.)

           Note that you need XML::Parser 2.27 or higher for this to work correctly.
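
       Three of the points in the list above (merging of Text nodes, encoding on output, and quote
       conversion) can be checked with a short script. This is a sketch with made-up sample data; the
       variable names are illustrative only and the script records what it observes instead of relying
       on a fixed output format.

```perl
use strict;
use warnings;
use XML::DOM;

my $parser = XML::DOM::Parser->new;

# Text nodes: separately appended Text nodes stay separate until normalize.
my $doc = $parser->parse('<p/>');
my $p   = $doc->getDocumentElement;
$p->appendChild($doc->createTextNode('1 < 2'));
$p->appendChild($doc->createTextNode(' & 3'));
my @before = $p->getChildNodes;
$p->normalize;                      # merge adjacent Text nodes
my @after  = $p->getChildNodes;
$p->addText('!');                   # extends the last Text node instead of adding one
my @final  = $p->getChildNodes;
printf "Text children: %d, then %d after normalize, %d after addText\n",
    scalar @before, scalar @after, scalar @final;

# Printing: special characters are encoded on output and decoded again on re-parsing.
$p->setAttribute(note => 'a<b & "c"');
my $out  = $p->toString;
my $copy = $parser->parse($out);
my $root = $copy->getDocumentElement;
my $same_text = $root->getFirstChild->getData eq '1 < 2 & 3!';
my $same_attr = $root->getAttribute('note')   eq 'a<b & "c"';
print "serialized: $out\n";
print "round trip preserved text: ", ($same_text ? 'yes' : 'no'),
      ", attribute: ", ($same_attr ? 'yes' : 'no'), "\n";

# Quotes: a single-quoted attribute value is printed with double quotes.
my $quoted = $parser->parse("<a href='x.html'/>");
my $a_out  = $quoted->getDocumentElement->toString;
print "quotes: $a_out\n";

$_->dispose for $doc, $copy, $quoted;
```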

SEE ALSO

       XML::DOM::XPath

       The Japanese version of this document by Takanori Kawai (Hippo2000) at
       <http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm>

       The DOM Level 1 specification at <http://www.w3.org/TR/REC-DOM-Level-1>

       The XML spec (Extensible Markup Language 1.0) at <http://www.w3.org/TR/REC-xml>

       The XML::Parser and XML::Parser::Expat manual pages.

       XML::LibXML also provides a DOM parser. It is significantly faster than XML::DOM and is under active
       development. It requires the Gnome libxml library.

       XML::GDOME will provide the DOM Level 2 Core API, and should be as fast as XML::LibXML, but more robust,
       since it uses the memory management functions of libgdome. For more details see
       <http://tjmather.com/xml-gdome/>

CAVEATS

       The method getElementsByTagName() does not return a "live" NodeList.  Whether this is an actual caveat is
       debatable, but a few people on the www-dom mailing list seemed to think so. I haven't decided yet. It's a
       pain to implement, it slows things down and the benefits seem marginal.  Let me know what you think.
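
       What "not live" means in practice is sketched below: a NodeList obtained before a change does not
       see elements added afterwards, so the query has to be repeated. The document and variable names
       are invented for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<list><item/></list>');
my $root = $doc->getDocumentElement;

# In scalar context getElementsByTagName returns a NodeList object.
my $items    = $doc->getElementsByTagName('item');
my $n_before = $items->getLength;

# Add a second <item> after the list was obtained.
$root->appendChild($doc->createElement('item'));

my $n_stale = $items->getLength;            # the old list is a snapshot
my $fresh   = $doc->getElementsByTagName('item');
my $n_fresh = $fresh->getLength;            # a new query sees the new element

print "before: $n_before, old list after append: $n_stale, new query: $n_fresh\n";
$doc->dispose;
```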

AUTHOR

       Enno Derksen is the original author.

       Send patches to T.J. Mather at <tjmather@maxmind.com>.

       Paid support is available directly from the maintainers of this package. Please see
       <http://www.maxmind.com/app/opensourceservices> for more details.

       Thanks to Clark Cooper for his help with the initial version.

perl v5.36.0                                       2022-10-14                                      XML::DOM(3pm)