Provided by: erlang-manpages_25.3.2.8+dfsg-1ubuntu4.4_all

NAME

       xmerl_sax_parser - XML SAX parser API

DESCRIPTION

       A SAX parser for XML that delivers parse events through a callback interface. SAX, the Simple API for
       XML, was originally a Java-only API. It was the first widely adopted API for XML in Java and is now a de
       facto standard, with versions available for several other programming language environments.

DATA TYPES

         option():
           Options used to customize the behaviour of the parser. Possible options are:

           {continuation_fun, ContinuationFun}:
             ContinuationFun is a callback function that decides what to do if the parser runs into EOF before
             the document is complete.

           {continuation_state, term()}:
              State that is accessible in the continuation callback function.

           {event_fun, EventFun}:
             EventFun is the callback function for parser events.

           {event_state, term()}:
              State that is accessible in the event callback function.

           {file_type, FileType}:
              Flag that tells the parser if it's parsing a DTD or a normal XML file (default normal).

             * FileType = normal | dtd

           {encoding, Encoding}:
              Sets the default character set (default UTF-8). This character set is used only if none is
             explicitly given by the XML document.

             * Encoding = utf8 | {utf16,big} | {utf16,little} | latin1 | list

           skip_external_dtd:
              Skips the external DTD during parsing. This option is the same as  {external_entities,  none}  and
             {fail_undeclared_ref, false} but just for the DTD.

           disallow_entities:
              Implies that parsing fails if an ENTITY declaration is found.

           {entity_recurse_limit, N}:
              Sets how many levels of recursion are allowed for entities. Default is 3 levels.

           {external_entities, AllowedType}:
              Sets which types of external entities are allowed. An entity of a type that is not allowed is skipped.

             * AllowedType = all | file | none

           {fail_undeclared_ref, Boolean}:
              Decides how the parser should behave when an undeclared reference is found. Can be useful if one
             has turned off external entities so that an external DTD is not parsed. Default is true.
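
           The options above are passed as one list. The following is a minimal sketch, not a recommended
           configuration, of a restrictive option list for input that is not trusted. The module name
           sax_options_demo and the function element_names/1 are hypothetical names chosen for this example;
           only the option names and the result tuples come from this page.

```erlang
-module(sax_options_demo).
-export([element_names/1]).

%% Parse Xml with a restrictive option list and return the local names of
%% all elements in document order. Module and function names are
%% illustrative assumptions, not part of xmerl.
element_names(Xml) ->
    EventFun = fun({startElement, _Uri, LocalName, _QName, _Attrs}, _Loc, Names) ->
                       [LocalName | Names];
                  (_Event, _Loc, Names) ->
                       Names
               end,
    Options = [{event_fun, EventFun},
               {event_state, []},
               {encoding, utf8},           % used only if the document names no encoding
               disallow_entities,          % fail if an ENTITY declaration is found
               {external_entities, none}], % skip all external entities
    case xmerl_sax_parser:stream(Xml, Options) of
        {ok, Names, _Rest} ->
            {ok, lists:reverse(Names)};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Reason}}
    end.
```

           Bare atoms such as disallow_entities and tagged tuples such as {external_entities, none} can be mixed
           freely in the same list.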

         event():
           The SAX events that are sent to the user via the callback.

           startDocument:
              Receive notification of the beginning of a document. The SAX parser will send this event only once
             before any other event callbacks.

           endDocument:
              Receive notification of the end of a document. The SAX parser will send this event only once,  and
             it will be the last event during the parse.

           {startPrefixMapping, Prefix, Uri}:
              Begin the scope of a prefix-URI Namespace mapping. Note that start/endPrefixMapping events are not
             guaranteed  to  be properly nested relative to each other: all startPrefixMapping events will occur
             immediately before the corresponding startElement event, and all endPrefixMapping events will occur
             immediately after the corresponding endElement event, but their order is not otherwise  guaranteed.
             There  will  not be start/endPrefixMapping events for the "xml" prefix, since it is predeclared and
             immutable.

             * Prefix = string()

             * Uri = string()

           {endPrefixMapping, Prefix}:
              End the scope of a prefix-URI mapping.

             * Prefix = string()

           {startElement, Uri, LocalName, QualifiedName, Attributes}:
              Receive notification of the beginning of an element. The  Parser  will  send  this  event  at  the
             beginning  of every element in the XML document; there will be a corresponding endElement event for
             every startElement event (even when the element is empty). All of the  element's  content  will  be
             reported, in order, before the corresponding endElement event.

             * Uri = string()

             * LocalName = string()

             * QualifiedName = {Prefix, LocalName}

             * Prefix = string()

             * Attributes = [{Uri, Prefix, AttributeName, Value}]

             * AttributeName = string()

             * Value = string()

           {endElement, Uri, LocalName, QualifiedName}:
              Receive  notification  of the end of an element. The SAX parser will send this event at the end of
             every element in the XML document; there will be  a  corresponding  startElement  event  for  every
             endElement event (even when the element is empty).

             * Uri = string()

             * LocalName = string()

             * QualifiedName = {Prefix, LocalName}

             * Prefix = string()

           {characters, string()}:
              Receive notification of character data.

           {ignorableWhitespace, string()}:
              Receive notification of ignorable whitespace in element content.

           {processingInstruction, Target, Data}:
              Receive  notification  of  a processing instruction. The Parser will send this event once for each
             processing instruction found: note that processing instructions may occur before or after the  main
             document element.

             * Target = string()

             * Data = string()

           {comment, string()}:
              Report an XML comment anywhere in the document (both inside and outside of the document element).

           startCDATA:
              Report  the  start  of a CDATA section. The contents of the CDATA section will be reported through
             the regular characters event.

           endCDATA:
              Report the end of a CDATA section.

           {startDTD, Name, PublicId, SystemId}:
              Report the start of DTD declarations, that is, the start of the DOCTYPE declaration. If the
             document has no DOCTYPE declaration, this event will not be sent.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

           endDTD:
              Report the end of DTD declarations, that is, the end of the DOCTYPE declaration.

           {startEntity, SysId}:
              Report the beginning of some internal and external XML entities. ???

           {endEntity, SysId}:
              Report the end of an entity. ???

           {elementDecl, Name, Model}:
              Report  an  element  type  declaration.  The content model will consist of the string "EMPTY", the
             string "ANY", or a parenthesised group, optionally followed by an occurrence indicator.  The  model
             will  be  normalized  so  that  all  parameter  entities  are  fully resolved and all whitespace is
             removed, and will include the enclosing parentheses. Other normalization (such as removing redundant
             parentheses or simplifying occurrence indicators) is at the discretion of the parser.

             * Name = string()

             * Model = string()

           {attributeDecl, ElementName, AttributeName, Type, Mode, Value}:
              Report an attribute type declaration.

             * ElementName = string()

             * AttributeName = string()

             * Type = string()

             * Mode = string()

             * Value = string()

           {internalEntityDecl, Name, Value}:
              Report an internal entity declaration.

             * Name = string()

             * Value = string()

           {externalEntityDecl, Name, PublicId, SystemId}:
              Report a parsed external entity declaration.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

           {unparsedEntityDecl, Name, PublicId, SystemId, Ndata}:
              Receive notification of an unparsed entity declaration event.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

             * Ndata = string()

           {notationDecl, Name, PublicId, SystemId}:
              Receive notification of a notation declaration event.

             * Name = string()

             * PublicId = string()

             * SystemId = string()
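
           As an illustration of how the events above fit together, the following sketch collects the
           character data of each element by pairing startElement, characters and endElement events. The
           module name sax_text_demo and its functions are hypothetical. The sketch accumulates text across
           several characters events on the assumption that the parser may deliver the character data of one
           element in more than one event.

```erlang
-module(sax_text_demo).
-export([element_texts/1]).

%% Return {ok, [{LocalName, Text}]} with one entry per element, ordered by
%% the position of the element's end tag. Names here are illustrative only.
element_texts(Xml) ->
    Options = [{event_fun, fun handle_event/3},
               {event_state, {[], []}}],   % {stack of open elements, results}
    case xmerl_sax_parser:stream(Xml, Options) of
        {ok, {_Stack, Acc}, _Rest} ->
            {ok, lists:reverse(Acc)};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Reason}}
    end.

%% Push a new entry when an element opens.
handle_event({startElement, _Uri, LocalName, _QName, _Attrs}, _Loc, {Stack, Acc}) ->
    {[{LocalName, []} | Stack], Acc};
%% Append character data to the innermost open element.
handle_event({characters, Chars}, _Loc, {[{Name, Text} | Stack], Acc}) ->
    {[{Name, [Text, Chars]} | Stack], Acc};
%% Pop the entry when the element closes and record its flattened text.
handle_event({endElement, _Uri, _LocalName, _QName}, _Loc, {[{Name, Text} | Stack], Acc}) ->
    {Stack, [{Name, lists:flatten(Text)} | Acc]};
%% Every other event leaves the state unchanged.
handle_event(_Event, _Loc, State) ->
    State.
```

           Because an element is recorded when its end tag is seen, child elements appear in the result before
           their parent.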

         unicode_char():
            Integer representing a valid Unicode code point.

         unicode_binary():
            Binary with characters encoded in UTF-8 or UTF-16.

         latin1_binary():
            Binary with characters encoded in ISO Latin-1 (ISO-8859-1).

EXPORTS

       file(Filename, Options) -> Result

              Types:

                 Filename = string()
                 Options = [option()]
                 Result = {ok, EventState, Rest} |
                  {Tag, Location, Reason, EndTags, EventState}
                 Rest = unicode_binary() | latin1_binary()
                 Tag = atom() (fatal_error, or user defined tag)
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 EventState = term()
                 Reason = term()

              Parses a file containing an XML document. This function uses a default continuation function to
              read the file in blocks.
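
              A minimal sketch of calling file/2, assuming the caller only wants to count elements. The module
              name sax_file_demo and the function count_elements/1 are hypothetical. The final case clause is a
              defensive assumption for return values that are not described on this page.

```erlang
-module(sax_file_demo).
-export([count_elements/1]).

%% Count the elements in the XML file Filename (a string).
%% Module and function names are illustrative assumptions.
count_elements(Filename) ->
    EventFun = fun({startElement, _Uri, _LocalName, _QName, _Attrs}, _Loc, N) ->
                       N + 1;
                  (_Event, _Loc, N) ->
                       N
               end,
    Options = [{event_fun, EventFun}, {event_state, 0}],
    case xmerl_sax_parser:file(Filename, Options) of
        {ok, Count, _Rest} ->
            {ok, Count};
        {Tag, {_CurrentLocation, _EntityName, LineNo}, Reason, _EndTags, _EventState} ->
            %% Parse failure: report the tag, the line number and the reason.
            {error, {Tag, LineNo, Reason}};
        Other ->
            %% Any return shape not documented above is passed on unchanged.
            {error, Other}
    end.
```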

       stream(Xml, Options) -> Result

              Types:

                 Xml = unicode_binary() | latin1_binary() | [unicode_char()]
                 Options = [option()]
                 Result = {ok, EventState, Rest} |
                  {Tag, Location, Reason, EndTags, EventState}
                 Rest = unicode_binary() | latin1_binary() | [unicode_char()]
                 Tag = atom() (fatal_error or user defined tag)
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 EventState = term()
                 Reason = term()

              Parse a stream containing an XML document.
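
              The two result shapes can be handled with a single case expression. The sketch below, with the
              hypothetical module sax_attr_demo, parses a binary, collects every attribute from the startElement
              events, and maps a failure tuple to {error, {Tag, Reason}}.

```erlang
-module(sax_attr_demo).
-export([attributes/1]).

%% Return {ok, [{ElementName, AttributeName, Value}]} for every attribute in
%% Xml, or {error, {Tag, Reason}} if the parser reports a failure.
%% Module and function names are illustrative assumptions.
attributes(Xml) ->
    EventFun =
        fun({startElement, _Uri, LocalName, _QName, Attrs}, _Loc, Acc) ->
                %% Each attribute is a {Uri, Prefix, AttributeName, Value} tuple.
                New = [{LocalName, Name, Value}
                       || {_AttrUri, _Prefix, Name, Value} <- Attrs],
                %% Acc is kept in reverse order and reversed once at the end.
                lists:reverse(New, Acc);
           (_Event, _Loc, Acc) ->
                Acc
        end,
    case xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}, {event_state, []}]) of
        {ok, Acc, _Rest} ->
            {ok, lists:reverse(Acc)};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Reason}}
    end.
```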

CALLBACK FUNCTIONS

       The callback interface works by the user passing a fun with the correct signature to the parser.

EXPORTS

       Module:ContinuationFun(State) -> {NewBytes, NewState}

              Types:

                 State = NewState = term()
                 NewBytes = binary() | list() (should be same as start input in stream/2)

              This function is called whenever the parser runs out of input data. If the function cannot get
              hold of more input, it returns an empty list or binary (depending on the start input in stream/2).
              Other types of errors are handled through exceptions. Use throw/1 to send the tuple {Tag =
              atom(), Reason = string()} if the continuation function encounters a fatal error. Tag is an atom
              that identifies the functional entity that sends the exception and Reason is a string that
              describes the problem.
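
              A minimal sketch of a continuation function, assuming the document arrives as a list of binary
              chunks. The first chunk is the start input to stream/2 and the remaining chunks are kept in the
              continuation state. The module name sax_chunk_demo and the function count_elements/1 are
              hypothetical.

```erlang
-module(sax_chunk_demo).
-export([count_elements/1]).

%% Count the elements in a document supplied as a non-empty list of binaries.
%% Module and function names are illustrative assumptions.
count_elements([First | More]) when is_binary(First) ->
    %% Hand out one chunk per call; return an empty binary when none are left,
    %% matching the binary start input given to stream/2.
    ContinuationFun = fun([Chunk | Rest]) -> {Chunk, Rest};
                         ([]) -> {<<>>, []}
                      end,
    EventFun = fun({startElement, _Uri, _LocalName, _QName, _Attrs}, _Loc, N) ->
                       N + 1;
                  (_Event, _Loc, N) ->
                       N
               end,
    Options = [{continuation_fun, ContinuationFun},
               {continuation_state, More},
               {event_fun, EventFun},
               {event_state, 0}],
    case xmerl_sax_parser:stream(First, Options) of
        {ok, Count, _Rest} ->
            {ok, Count};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Reason}}
    end.
```

              The same pattern applies to other sources, such as a socket, by keeping the handle in the
              continuation state instead of a list.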

       Module:EventFun(Event, Location, State) -> NewState

              Types:

                 Event = event()
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 State = NewState = term()

              This function is called for every event sent by the parser. The error  handling  is  done  through
              exceptions.  Use  throw/1  to  send  the  following tuple {Tag = atom(), Reason = string()} if the
              application encounters a fatal error. Tag is an atom that identifies the  functional  entity  that
              sends the exception and Reason is a string that describes the problem.
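
              A minimal sketch of aborting a parse from the event function with throw/1. The tag
              limit_exceeded, the module name sax_limit_demo and the function parse_with_limit/2 are
              hypothetical. The thrown tag is expected to appear as the first element of the result tuple
              described under stream/2.

```erlang
-module(sax_limit_demo).
-export([parse_with_limit/2]).

%% Parse Xml but abort with a user-defined tag as soon as the document is
%% found to contain more than Max elements.
%% Module, function and tag names are illustrative assumptions.
parse_with_limit(Xml, Max) ->
    EventFun = fun({startElement, _Uri, _LocalName, _QName, _Attrs}, _Loc, N)
                     when N >= Max ->
                       %% {Tag, Reason}: Tag is an atom, Reason is a string.
                       throw({limit_exceeded, "too many elements"});
                  ({startElement, _Uri, _LocalName, _QName, _Attrs}, _Loc, N) ->
                       N + 1;
                  (_Event, _Loc, N) ->
                       N
               end,
    xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}, {event_state, 0}]).
```

              The caller can then match on {limit_exceeded, Location, Reason, EndTags, EventState} to tell this
              application-level failure apart from a fatal_error reported by the parser itself.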

Ericsson AB                                      xmerl 1.3.31.1                           xmerl_sax_parser(3erl)