Provided by: libmediawiki-dumpfile-perl_0.2.2-2_all

NAME

       MediaWiki::DumpFile::Pages - Process an XML dump file of pages from a MediaWiki instance

SYNOPSIS

         use MediaWiki::DumpFile::Pages;

         #dump files up to version 0.5 are tested
         $input = 'file-name.xml';
         #many supported compression formats
         $input = 'file-name.xml.bz2';
         $input = 'file-name.xml.gz';
         $input = \*FH;

         $pages = MediaWiki::DumpFile::Pages->new($input);

         #default values
         %opts = (
           input => $input,
           fast_mode => 0,
           version_ignore => 1
         );

         #override configuration options passed to constructor
         $ENV{MEDIAWIKI_DUMPFILE_VERSION_IGNORE} = 0;
         $ENV{MEDIAWIKI_DUMPFILE_FAST_MODE} = 1;

         $pages = MediaWiki::DumpFile::Pages->new(%opts);
         $version = $pages->version;

         #version 0.3 and later dump files only
         $sitename = $pages->sitename;
         $base = $pages->base;
         $generator = $pages->generator;
         $case = $pages->case;
         %namespaces = $pages->namespaces;

         #all versions
         while(defined($page = $pages->next)) {
           print 'Title: ', $page->title, "\n";
         }

         $title = $page->title;
         $id = $page->id;
         $revision = $page->revision;
         @revisions = $page->revision;

         $text = $revision->text;
         $id = $revision->id;
         $timestamp = $revision->timestamp;
         $comment = $revision->comment;
         $contributor = $revision->contributor;
         #version 0.4 and later dump files only
         $bool = $revision->redirect;

         $username = $contributor->username;
         $id = $contributor->id;
         $ip = $contributor->ip;
         $username_or_ip = $contributor->astext;
         $username_or_ip = "$contributor";

METHODS

   new
       This is the constructor for this package. If it is called with a single parameter it must be the input to
       use for parsing. The input is specified as either the location of a MediaWiki pages dump file or a
       reference to an already open file handle.

       If more than one argument is passed to new it must be a hash of options. The keys are named

       input
           This is the input to parse as documented earlier.

       fast_mode
           Have the iterator run in fast mode by default; defaults to false. See the section on fast mode below.

       version_ignore
           Do not enforce parsing of only tested schemas in the XML document; defaults to true.

   version
       Returns the version of the dump file.

   sitename
       Returns the sitename from the MediaWiki instance. Requires a dump file of at least version 0.3.

   base
       Returns the URL used to access the MediaWiki instance. Requires a dump file of at least version 0.3.

   generator
       Returns the version of MediaWiki that generated the dump file. Requires a dump file of at least version
       0.3.

   case
       Returns the case sensitivity configuration of the MediaWiki instance. Requires a dump file of at least
       version 0.3.

   namespaces
       Returns a hash where the key is the numerical namespace id and the value is the plain text namespace
       name. The main namespace has an id of 0 and an empty string value. Requires a dump file of at least
       version 0.3.
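
       As a minimal sketch of working with the hash returned by namespaces, the helper below inverts it so
       that a namespace name can be mapped back to its numerical id. The name namespace_ids_by_name is
       hypothetical and not part of the module; the sketch assumes only that $pages responds to namespaces
       as described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MediaWiki::DumpFile::Pages.
# Inverts the id => name hash returned by namespaces() into name => id.
sub namespace_ids_by_name {
    my ($pages) = @_;

    my %name_for = $pages->namespaces;   # keys are ids, values are names
    my %id_for   = reverse %name_for;    # the main namespace maps '' => 0

    return %id_for;
}

# Possible usage, given a $pages object built as in the SYNOPSIS:
#   my %id_for = namespace_ids_by_name($pages);
#   print "Talk pages live in namespace $id_for{Talk}\n" if exists $id_for{Talk};
```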

   next
       Accepts an optional boolean argument to control fast mode. If the argument is specified it forces fast
       mode on or off. Otherwise the mode is controlled by the fast_mode configuration option. See the section
       below on fast mode for more information.

       It is safe to intermix calls between fast and normal mode in one parsing session.

       In all modes undef is returned if there is no more data to parse.

       In normal mode an instance of MediaWiki::DumpFile::Pages::Page is returned and the full API is available.

       In fast mode an instance of MediaWiki::DumpFile::Pages::FastPage is returned; the only methods supported
       are title, text, and revision. This class can act as a stand-in for MediaWiki::DumpFile::Pages::Page
       except it will throw an error if any attempt is made to access any other part of the API.
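
       The iteration protocol described above can be sketched as follows. The helper collect_titles is
       hypothetical; it always passes an explicit 1 or 0 to next, so it overrides the fast_mode option on
       every call, and it touches only title, which is documented for both return types.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: collect every remaining page
# title, forcing fast mode on or off through the argument to next().
sub collect_titles {
    my ($pages, $fast) = @_;

    my @titles;
    # next() returns undef once there is no more data, in either mode.
    while (defined(my $page = $pages->next($fast ? 1 : 0))) {
        push @titles, $page->title;   # title is available in both modes
    }
    return @titles;
}

# Possible usage, given a $pages object built as in the SYNOPSIS:
#   my @titles = collect_titles($pages, 1);   # fast mode for every page
```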

   size
       Returns the size of the input file in bytes, or undef if the input specified is a reference to a file
       handle.

   current_byte
       Returns the number of bytes of XML that have been successfully parsed.
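
       One possible use of size and current_byte together is a rough progress estimate. The helper
       progress_percent below is hypothetical. How the two numbers relate for compressed input is not
       stated here, so the result should be treated as an estimate rather than an exact figure.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of the input parsed so far, or undef when
# size() cannot report a total (for example when reading from a file handle).
sub progress_percent {
    my ($pages) = @_;

    my $size = $pages->size;
    return unless defined $size && $size > 0;

    return 100 * $pages->current_byte / $size;
}

# Possible usage inside a parsing loop:
#   my $percent = progress_percent($pages);
#   printf "%.1f%% done\n", $percent if defined $percent;
```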

FAST MODE

       Fast mode is a way to get increased parsing performance while dropping some of the features available in
       the parser. If you only require the titles and text from a page then fast mode will decrease the amount
       of time required just to parse the XML file, sometimes drastically.

       When fast mode is used on a dump file that has more than one revision of a single article in it, only the
       text of the first revision of that article in the dump file will be returned; the other revisions of the
       article will be silently skipped over.
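
       A minimal sketch of the titles-and-text use case follows. The helper text_lengths is hypothetical; it
       forces fast mode on each call to next and uses only title and text, the methods fast mode is
       documented to support.

```perl
use strict;
use warnings;

# Hypothetical helper: map each page title to the length of its text,
# reading every page in fast mode.
sub text_lengths {
    my ($pages) = @_;

    my %length_for;
    while (defined(my $page = $pages->next(1))) {   # 1 forces fast mode
        $length_for{ $page->title } = length($page->text);
    }
    return %length_for;
}

# Possible usage, given a $pages object built as in the SYNOPSIS:
#   my %length_for = text_lengths($pages);
```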

MediaWiki::DumpFile::Pages::Page

       This object represents a distinct MediaWiki page and is used to access the page data and metadata. The
       following methods are available:

       title
           Returns a string of the page title

       id  Returns a numerical page identification

       revision
           In scalar context returns the last revision in the dump for this page; in list context returns a
           list of all revisions made available for the page in the same order as the dump file. All returned
           data is an instance of MediaWiki::DumpFile::Pages::Page::Revision.
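
       Because revision behaves differently depending on calling context, a short sketch may help. Both
       helper names below are hypothetical; they rely only on the revision and id methods described in this
       document.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the context-sensitive revision() method.

# Scalar context: only the last revision in the dump for this page.
sub latest_revision_id {
    my ($page) = @_;
    my $revision = $page->revision;
    return $revision->id;
}

# List context: every revision, in the same order as the dump file.
sub all_revision_ids {
    my ($page) = @_;
    my @revisions = $page->revision;
    return map { $_->id } @revisions;
}
```

       Assigning the result of revision to a scalar or to an array is what selects the behaviour; no extra
       argument is involved.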

MediaWiki::DumpFile::Pages::Page::Revision

       This object represents a distinct revision of a page from the MediaWiki dump file. The standard dump
       files contain only the most recent revision of each page and the comprehensive dump files contain all
       revisions for each page. The following methods are available:

       text
           Returns the page text for this specific revision of the page.

       id  Returns the numerical revision id for this specific revision - this is independent of the page id.

       timestamp
           Returns a string value representing the time the revision was created. The string is in the format of
           "2008-07-09T18:41:10Z".

       comment
           Returns the comment made about the revision when it was created.

       contributor
           Returns an instance of MediaWiki::DumpFile::Pages::Page::Revision::Contributor

       minor
           Returns true if the edit was marked as being minor or false otherwise.

       redirect
           Returns true if the page is a redirect to another page or false otherwise. Requires a dump file of at
           least version 0.4.
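
       The timestamp string can be turned into a date object with the core Time::Piece module. The helper
       revision_time below is hypothetical, and the format string is an assumption derived from the
       "2008-07-09T18:41:10Z" layout shown above.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: parse the timestamp of a revision into a Time::Piece
# object. The format mirrors the documented "2008-07-09T18:41:10Z" layout.
sub revision_time {
    my ($revision) = @_;
    return Time::Piece->strptime($revision->timestamp, '%Y-%m-%dT%H:%M:%SZ');
}

# Possible usage, given a $revision from $page->revision:
#   my $time = revision_time($revision);
#   print 'Edited in ', $time->year, "\n";
```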

MediaWiki::DumpFile::Pages::Page::Revision::Contributor

       This object provides access to the contributor of a specific revision of a page. When used in a string
       context it will return the username of the editor if the editor was logged in or the IP address of the
       editor if the edit was anonymous.

       username
           Returns the username of the editor if the editor was logged in when the edit was made or undef
           otherwise.

       id  Returns the numerical id of the editor if the editor was logged in or undef otherwise.

       ip  Returns the IP address of the editor if the editor was anonymous or undef otherwise.

       astext
           Returns the username of the editor if they were logged in or the IP address if the editor was
           anonymous.
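
       When the distinction between logged-in and anonymous editors matters, the undef return values of
       username and ip can be tested directly instead of using astext. The helper contributor_label below
       is a hypothetical sketch of that approach.

```perl
use strict;
use warnings;

# Hypothetical helper: label a contributor as a registered user or an
# anonymous IP, relying on username() being undef for anonymous edits.
sub contributor_label {
    my ($contributor) = @_;

    my $username = $contributor->username;
    return defined $username
        ? "user:$username"
        : 'ip:' . $contributor->ip;
}

# Possible usage, given a $revision from $page->revision:
#   print contributor_label($revision->contributor), "\n";
```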

ERRORS

   E_XML_CREATE_FAILED Error creating XML parser object
       While trying to build the XML::TreePuller object a fatal error occurred; the error message from the
       parser was included in the generated error output you saw. At the time of writing this document the error
       messages are not very helpful but for some reason the XML parser rejected the document; here's a list of
       things to check:

       Make sure the file exists and is readable
       Make sure the file is actually an XML file and is not compressed

   E_XML_PARSE_FAILED XML parser failed during parsing
       Something went wrong with the XML parser - the error from the parser was included in the generated error
       message. This happens when there is a severe error parsing the document such as a syntax error.

   E_UNTESTED_DUMP_VERSION Untested dump file versions
       The dump files created by MediaWiki include a versioned XML schema. This software is tested with the most
       recent known schema versions and can be configured to enforce a specific tested schema.
       MediaWiki::DumpFile::Pages no longer enforces the versions by default but the software author using this
       library has indicated that it should. When this happens it dies with an error like the following:

       E_UNTESTED_DUMP_VERSION Version 0.4 dump file "t/simpleenglish-wikipedia.xml" has not been tested with
       MediaWiki::DumpFile::Pages version 0.1.9; see the ERRORS section of the MediaWiki::DumpFile::Pages Perl
       module documentation for what to do at lib/MediaWiki/DumpFile/Pages.pm line 148.

       If you encounter this condition you can do the following:

       Check your module version
           The error message should have the version number of this module in it. Check CPAN and see if there is
           a newer version with official support. The web page

             http://search.cpan.org/dist/MediaWiki-DumpFile/lib/MediaWiki/DumpFile/Pages.pm

           will show the highest supported version dump files near the top of the SYNOPSIS.

       Check the bug database
           It is possible the issue has been resolved already but the update has not made it onto CPAN yet.  See
           this web page

             http://rt.cpan.org/Public/Dist/Display.html?Name=mediawiki-dumpfile

           and check for an open bug report relating to the version number changing.

       Be adventurous
           If you just want to have the software run anyway and see what happens you can set the environment
           variable MEDIAWIKI_DUMPFILE_VERSION_IGNORE to a true value which will cause the module to silently
           ignore the case and continue parsing the document. You can set the environment variable and run your
           program at the same time with a command like this:

             MEDIAWIKI_DUMPFILE_VERSION_IGNORE=1 ./wikiscript.pl

           This may work fine or it may fail in subtle ways silently - there is no way to know for sure without
           studying the schema to see if the changes are backwards compatible.

       Open a bug report
           You can use the same URL for rt.cpan.org above to create a new ticket in MediaWiki-DumpFile or just
           send an email to "bug-mediawiki-dumpfile at rt.cpan.org". Be sure to use a title for the bug that
           others will be able to use to find this case as well and to include the full text from the error
           message. Please also specify if you were adventurous or not and if it was successful for you.
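
       A program that wants to react differently to each of the errors listed in this section could inspect
       the text of the exception. The helper error_code below is a hypothetical sketch; it assumes that
       every error message begins with its E_ code, as the sample E_UNTESTED_DUMP_VERSION message does.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the leading E_* code from an error message
# raised by the module, or return undef if the message does not start with one.
sub error_code {
    my ($message) = @_;
    return unless defined $message;
    return $message =~ /^(E_[A-Z_]+)\b/ ? $1 : undef;
}

# Possible usage around any code that parses a dump file:
#   my $ok = eval { ...; 1 };
#   if (!$ok) {
#       my $code = error_code($@);
#       warn "untested dump version\n"
#           if defined $code && $code eq 'E_UNTESTED_DUMP_VERSION';
#   }
```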

AUTHOR

       Tyler Riddle, "<triddle at gmail.com>"

BUGS

       Please see MediaWiki::DumpFile for information on how to report bugs in this software.

COPYRIGHT & LICENSE

       Copyright 2009 "Tyler Riddle".

       This program is free software; you can redistribute it and/or modify it under the terms of either: the
       GNU General Public License as published by the Free Software Foundation; or the Artistic License.

       See http://dev.perl.org/licenses/ for more information.

perl v5.34.0                                       2022-06-15                    MediaWiki::DumpFile::Pages(3pm)