Provided by: libebook-tools-perl_0.5.4-1.3_amd64

NAME

       EBook::Tools - Object class for manipulating and generating E-books

DESCRIPTION

       This module provides an object interface and a number of related procedures intended to create or modify
       documents centered around the International Digital Publishing Forum (IDPF) standards, currently both
       OEBPS v1.2 and OPS/OPF v2.0.

SYNOPSIS

        use EBook::Tools qw(split_metadata system_tidy_xml);
        $EBook::Tools::tidysafety = 2;

        my $opffile = split_metadata('ebook.html');
        my $otheropffile = 'alternate.opf';
        my $retval = system_tidy_xml($opffile,'tidy-backup.xml');
        my $ebook = EBook::Tools->new($opffile);
        $ebook->fix_opf20;
        $ebook->fix_misc;
        $ebook->print;
        $ebook->save;

        $ebook->init($otheropffile);
        $ebook->fix_oeb12;
        $ebook->gen_epub;

DEPENDENCIES

   Perl Modules
       Archive::Zip
       Data::UUID (or OSSP::UUID)
       Date::Manip
           Note that Date::Manip will die on MS Windows systems unless the TZ environment variable is set in a
           specific manner. See:

           http://search.cpan.org/perldoc?Date::Manip#TIME_ZONES

       File::MimeInfo
       HTML::Parser
       Lingua::EN::NameParse
       Tie::IxHash
       Time::Local
       URI::Escape
       XML::Twig

   Other Programs
       Tidy
           The command "tidy" needs to be available, and ideally on the path.  If it isn't on the path,  package
           variable  "$tidycmd"  can  be set to its absolute path.  If tidy cannot be found, "system_tidy_xml()"
           and "system_tidy_xhtml()" will be nonfunctional.

CONFIGURABLE PACKAGE VARIABLES

       %bisacsubjects
           A mapping of lowercase BISAC codes and text descriptions to standard capitalized  text  descriptions.
           As  BISG claims copyright on this and does not allow the lists to be redistributed, this list must be
           downloaded and cached locally via "ebook dlbisac" before it is available.

           Running the unit tests will cause this to happen automatically.

       %bisactolc
           An extremely incomplete mapping of lowercase BISAC codes and text descriptions to Library of Congress
           standard subjects.

       %booktypes
           A hash mapping all-lowercase terms to a standard vocabulary to be used in <dc:type> elements.

       %dcelements12
           A tied  IxHash  mapping  an  all-lowercase  list  of  Dublin  Core  metadata  element  names  to  the
           capitalization  dictated by the OEB 1.2 specification, used by the fix_oeb12() and fix_oeb12_dcmeta()
           methods.  Changing the tags in this list will change  the  tags  recognized  and  placed  inside  the
           <dc-metadata> element.

           Order is preserved and significant -- fix_oeb12 will output DC metadata elements in the same order as
           in this hash, though order for tags of the same name is preserved from the input file.

       %dcelements20
           A  tied  IxHash  mapping  an  all-lowercase  list  of  Dublin Core metadata element names to the all-
           lowercase form dictated by the OPF 2.0 specification (which means it maps the all-lowercase  tags  to
           themselves).   It  is  used  by the fix_opf20() and fix_opf20_dcmeta() methods.  Changing the tags in
           this list will change the tags recognized and placed directly under the <metadata> element.

           Order is preserved and significant -- fix_opf20 will output DC metadata elements in the same order as
           in this hash, though order for tags of the same name is preserved from the input file.

       %lcsubjects
           An extremely incomplete mapping of lowercase terms to Library of  Congress  subject  classifications.
           This  is  used  for  automatic  normalization  of  subject  elements.   Every value in the hash has a
           lowercase representation of itself as a key in addition to any aliases.

           This MUST NOT contain mappings from BISAC codes or descriptors to Library of Congress subjects.   Use
           %bisactolc for that.
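       As a rough illustration of the invariant described above, the sketch below checks that every canonical value in a subject map is also reachable through its own lowercased form. The sample entries and the helper name missing_self_keys are hypothetical; this is not code from EBook::Tools.

```perl
use strict;
use warnings;

# Hypothetical sample in the documented shape: lowercase aliases on the left,
# canonical subject text on the right.
my %lcsubjects = (
    'science fiction' => 'Science fiction',
    'sf'              => 'Science fiction',
    'sci-fi'          => 'Science fiction',
);

# Return each canonical value (once) that lacks its lowercase form as a key.
sub missing_self_keys {
    my ($map) = @_;
    my %seen;
    return grep { !$seen{$_}++ && !exists $map->{ lc $_ } } values %$map;
}

my @missing = missing_self_keys(\%lcsubjects);
print @missing ? "missing: @missing\n" : "invariant holds\n";
```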

       %nonxmlentity2char
           This is the %entity2char conversion map from HTML::Entities with the 5 predefined XML entities (amp,
           gt, lt, quot, apos) removed.  It is used by "init()" to sanitize the OPF file data before parsing.
           This hash can be modified to allow and convert other non-standard entities to Unicode characters.  See
           HTML::Entities for details.
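       The filtering described above amounts to copying the entity map and deleting the five XML entities. The sketch below uses a small hand-written stand-in for HTML::Entities' %entity2char (an assumption; the real map is far larger) and a hypothetical helper decode_nonxml_entities. It illustrates the idea and is not the module's own sanitizing code.

```perl
use strict;
use warnings;

# Tiny stand-in for HTML::Entities' %entity2char: entity name => character.
my %entity2char = (
    amp  => '&', gt => '>', lt => '<', quot => '"', apos => "'",
    nbsp => "\x{A0}", eacute => "\x{E9}", copy => "\x{A9}",
);

# Drop the five entities that XML already predefines.
my %nonxmlentity2char = %entity2char;
delete @nonxmlentity2char{qw(amp gt lt quot apos)};

# Replace only the remaining (non-XML) named entities; the five XML entities
# and unknown names are left in place for the XML parser.
sub decode_nonxml_entities {
    my ($text) = @_;
    $text =~ s/&(\w+);/exists $nonxmlentity2char{$1} ? $nonxmlentity2char{$1} : "&$1;"/ge;
    return $text;
}
```

       Adding a key to the copied hash is all it takes to accept another non-standard entity.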

       %publishermap
           A  hash mapping known variants of publisher names to a canonical form, used by "fix_publisher()", and
           thus also indirectly by "fix_misc()".

           Keys should be entered in lowercase.  The hash can also be set empty to prevent fix_publisher()  from
           taking any action at all.
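       A minimal sketch of the lookup described above, assuming keys are stored lowercase and unknown names pass through unchanged. The map entries and the helper canonical_publisher are hypothetical, not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical entries: lowercase variant => canonical publisher name.
my %publishermap = (
    'example press'     => 'Example Press, Inc.',
    'example press inc' => 'Example Press, Inc.',
);

# Look up a name case-insensitively and return its canonical form,
# or the input unchanged if the name is not in the map.
sub canonical_publisher {
    my ($name) = @_;
    my $key = lc $name;
    $key =~ s/^\s+|\s+$//g;    # trim surrounding whitespace (assumption)
    return exists $publishermap{$key} ? $publishermap{$key} : $name;
}
```

       With an empty %publishermap every name falls through unchanged, which matches the documented way of disabling fix_publisher().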

       %referencetypes
           A  hash  mapping  valid OPF 2.0 reference types to themselves, along with common variants to standard
           types.

       %relatorcodes
           A hash mapping the MARC Relator Codes (see: http://www.loc.gov/marc/relators/relacode.html) to  their
           descriptive names.

       %sexcodes
           A  hash normalizing subject tags for erotic fiction, as used by StoriesOnline, ASSTR, and Literotica,
           among others.

           Unlike the other mappings, this one is *not* lowercased, and the keys are case-sensitive.   The  hash
           maps only to the canonical base code, but subject normalization will also add a prefix, defaulting to
           the BISAC format of 'FICTION / Erotica / '.

       %strangenames
       %strangefileas
           Hashes mapping known incorrect outputs of name normalization to the correct format.  The first
           handles the main name display, the second the file-as output.

       $tidycmd
           The tidy executable name.  This has to be a fully qualified pathname  if  tidy  isn't  on  the  path.
           Defaults to 'tidy'.

       $tidyxhtmlerrors
           The name of the error output file from system_tidy_xhtml().  Defaults to 'tidyxhtml-errors.txt'.

       $tidyxmlerrors
           The name of the error output file from system_tidy_xml().  Defaults to 'tidyxml-errors.txt'.

       $tidysafety
           The safety level to use when running tidy (default is 1).  Potential values are:

           "$tidysafety < 1":
               No checks performed, no error files kept; works like a clean "tidy -m".

               This setting is DANGEROUS!

           "$tidysafety == 1":
               Overwrites original file if there were no errors, even if there were warnings.  Keeps a log of
               errors, but not warnings.

           "$tidysafety == 2":
               Overwrites original file if there were no errors, even if there were warnings.  Keeps a log of
               both errors and warnings.

           "$tidysafety == 3":
               Overwrites  original  file  only if there were no errors or warnings.  Keeps a log of both errors
               and warnings.

           "$tidysafety >= 4":
               Never overwrites original file.  Keeps a log of both errors and warnings.
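       The levels above can be read as a small decision table. The sketch below restates it in Perl; the helper tidy_policy, its error/warning-count arguments, integer levels, and the assumption that a level below 1 always overwrites (leaving any checking to tidy itself) are illustrative, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: given a $tidysafety level and the number of errors and
# warnings tidy reported, return what the documented policy says to do.
sub tidy_policy {
    my ($level, $errors, $warnings) = @_;
    if ($level < 1) {
        # No checks and no error file (assumed: always write back, like tidy -m)
        return { overwrite => 1, log_errors => 0, log_warnings => 0 };
    }
    if ($level == 1) {
        return { overwrite => ($errors == 0 ? 1 : 0),
                 log_errors => 1, log_warnings => 0 };
    }
    if ($level == 2) {
        return { overwrite => ($errors == 0 ? 1 : 0),
                 log_errors => 1, log_warnings => 1 };
    }
    if ($level == 3) {
        return { overwrite => ($errors == 0 && $warnings == 0 ? 1 : 0),
                 log_errors => 1, log_warnings => 1 };
    }
    # Level 4 and above: never replace the original file.
    return { overwrite => 0, log_errors => 1, log_warnings => 1 };
}
```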

       %validspecs
           A hash mapping valid specification strings to themselves, primarily  used  to  undefine  unrecognized
           values.  Default valid values are 'OEB12' and 'OPF20'.

CONSTRUCTORS AND INITIALIZATION

   "new($filename)"
       Instantiates  a  new EBook::Tools object.  If $filename is specified, it will also immediately initialize
       itself via the "init" method.

   "init($filename)"
       Initializes the object from an existing OPF file.  If $filename is specified and exists, the  OEB  object
       will  be  set  to  read and write to that file before attempting to initialize.  Otherwise, if the object
       currently points to an OPF file it will use that name.  If there is no OPF filename data,  and  $filename
       was  not  specified,  it  will  make  a  last-ditch  attempt  to  find  an  OPF  file first by looking in
       META-INF/container.xml, and if nothing is found there, by looking in the current directory for  a  single
       OPF file.

       If no such files are found (or more than one is found), the initialization croaks.
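       The lookup order described for init() can be sketched as a standalone function. This is a hedged approximation, not the module's code: the name locate_opf is hypothetical, container.xml is matched with a regular expression instead of an XML parser, and a specified-but-missing $filename with no stored name is treated as a failure because the description leaves that case open.

```perl
use strict;
use warnings;
use File::Spec;

# $filename: name passed to init(); $current: name the object already holds;
# $dir: top-level directory to search (defaults to the current directory).
sub locate_opf {
    my ($filename, $current, $dir) = @_;
    $dir //= '.';
    return $filename if defined $filename && -f $filename;
    return $current  if defined $current;
    die "OPF file '$filename' not found\n" if defined $filename;

    # Last-ditch attempt 1: the rootfile listed in META-INF/container.xml.
    my $container = File::Spec->catfile($dir, 'META-INF', 'container.xml');
    if (open my $fh, '<', $container) {
        local $/;                        # slurp the whole file
        my $xml = <$fh>;
        close $fh;
        return $1 if $xml =~ /<rootfile\b[^>]*\bfull-path="([^"]+)"/;
    }

    # Last-ditch attempt 2: exactly one *.opf file in the directory.
    opendir my $dh, $dir or die "Cannot read $dir: $!\n";
    my @opf = grep { /\.opf\z/i && -f File::Spec->catfile($dir, $_) } readdir $dh;
    closedir $dh;
    die "Expected exactly one OPF file in $dir, found " . scalar(@opf) . "\n"
        unless @opf == 1;
    return $opf[0];
}
```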

   "init_blank(%args)"
       Initializes  an  object  containing  nothing  more  than the basic OPF framework, suitable for adding new
       documents when creating an e-book from scratch.

       Arguments

       "init_blank" takes up to three optional named arguments:

       "opffile"
           This specifies the OPF filename to use.  If not specified, defaults to 'content.opf'.

       "author"
           This specifies the content of the initial dc:creator element.  If not specified, defaults to "Unknown
           Author".

       "title"
           This specifies the content of the initial dc:title element. If not specified,  defaults  to  "Unknown
           Title".

       Example

        my $ebook = EBook::Tools->new();
        $ebook->init_blank('opffile' => 'newfile.opf',
                           'title' => 'The Great Unknown');

ACCESSOR METHODS

       The  following  methods  return  data  deeper  in the structure than the auto-accessors, but still do not
       modify any object data or files.

   "adult()"
       Returns the text of the Mobipocket-specific <Adult> element, if it exists.  Expected values are 'yes' and
       undef.

   "contributor_list()"
       Returns a list containing the text of all dc:contributor elements (case-insensitive) or undef if none are
       found.

       In scalar context, returns the first contributor, not the last.
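       The "first, not the last" behavior is the usual wantarray idiom. Without it, returning an array in scalar context would yield its element count, and returning a literal list would yield its last element. The helper first_or_all below is hypothetical and only illustrates the calling convention; the module's own code may differ.

```perl
use strict;
use warnings;

# Full list in list context, first element in scalar context.
# A bare return gives undef in scalar context and () in list context.
sub first_or_all {
    my @found = @_;
    return unless @found;
    return wantarray ? @found : $found[0];
}

my @all   = first_or_all('Alice', 'Bob');    # ('Alice', 'Bob')
my $first = first_or_all('Alice', 'Bob');    # 'Alice'
print "$first; @all\n";
```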

   "coverimage()"
       Returns the href to the cover image, or undef if none is found.

       Checks the following in order:

       <reference type="other.ms-coverimage-standard">
       <EmbeddedCover>
       <meta name="cover"> (as href)
       <meta name="cover"> (as item id)

   "date_list(%args)"
       Returns the text of all dc:date elements (case-insensitive) matching the specified attributes.

       In scalar context, returns the first match, not the last.

       Returns undef if no matches are found.

       Arguments

       •   "id" - 'id' attribute that must be matched exactly for the result to be added to the list

       •   "event" - 'opf:event' or 'event' attribute that must be matched exactly for the result to be added to
           the list

       If both arguments are specified, a value is added to the list if it matches either one (i.e. the logic is
       OR).
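       The OR rule can be illustrated with plain hashes standing in for XML::Twig elements, an assumption made to keep the sketch self-contained (the 'opf:event'/'event' pair is folded into a single 'event' key). The helper filter_dates is hypothetical, and returning every date when no criteria are given is an assumption. The same OR logic is described for element_list() and isbn_list() below.

```perl
use strict;
use warnings;

# Stand-in records for <dc:date> elements.
my @dates = (
    { text => '2001-05-01', id => 'pubdate', event => 'publication' },
    { text => '2008-11-20', id => undef,     event => 'modification' },
    { text => '1999',       id => 'orig',    event => 'original-publication' },
);

# Keep a date if it matches ANY supplied criterion (exact string match).
sub filter_dates {
    my ($items, %args) = @_;
    my @crit = grep { defined $args{$_} } qw(id event);
    return map { $_->{text} } @$items unless @crit;
    return map  { $_->{text} }
           grep {
               my $item = $_;
               my $hits = grep {
                   defined $item->{$_} && $item->{$_} eq $args{$_}
               } @crit;
               $hits;
           } @$items;
}

my @matches = filter_dates(\@dates, id => 'orig', event => 'publication');
```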

   "description()"
       Returns the description of the e-book, if set, or undef otherwise.

   "element_list(%args)"
       Returns a list containing the text values of all elements matching the specified criteria.

       Arguments

       •   "cond"

           The XML::Twig search condition used to find the elements.  Typically this is just the GI (tag) of the
           element you wish to find, but it can also be a "qr//" expression,  coderef,  or  anything  else  that
           XML::Twig can work with.  See the XML::Twig documentation for details.

           If this is not specified, an error is added and the method returns undef.

       •   "id" (optional)

           'id' attribute that must be matched exactly for the result to be added to the list

       •   "scheme" (optional)

           'opf:scheme'  or  'scheme'  attribute  that must be matched exactly for the result to be added to the
           list

       •   "event" (optional)

           'opf:event' or 'event' attribute that must be matched exactly for the result to be added to the list

       If more than one of the arguments "id", "scheme", or "event" is specified, a value is added to the list
       if it matches any one of them (i.e. the logic is OR).

   "errors()"
       Returns an arrayref containing any generated error messages.

   "identifier()"
       Returns the text of the dc:identifier element pointed to by the 'unique-identifier' attribute of the root
       'package' element, or undef if it could not be located.

   "isbn_list(%args)"
       Returns  a  list  of all ISBNs matching the specified attributes.  See "twigelt_is_isbn()" for a detailed
       description of how the ISBN elements are found.

       Returns undef if no matches are found.

       In scalar context returns the first match, not the last.

       See also "isbns(%args)".

       Arguments

       •   "id" (optional)

           'id' attribute that must be matched exactly for the result to be added to the list

       •   "scheme" (optional)

           'opf:scheme' or 'scheme' attribute that must be matched exactly for the result to  be  added  to  the
           list

       If both arguments are specified, a value is added to the list if it matches either one (i.e. the logic is
       OR).

   "isbns(%args)"
       Returns all of the ISBN identifiers matching the specified attributes as a list of hashrefs, with one
       hash per ISBN identifier presented in the order that the identifiers are found.  The hash keys are 'id'
       (containing the value of the 'id' attribute), 'scheme' (containing the value of either the 'opf:scheme'
       or 'scheme' attribute, whichever is found first), and 'isbn' (containing the text of the element).

       If no entries are found, returns undef.

       In scalar context returns the first match, not the last.

       See also "isbn_list(%args)".

       Arguments

       "isbns()" takes two optional named arguments:

       •   "id" - 'id' attribute that must be matched exactly for the result to be added to the list

       •   "scheme" - 'opf:scheme' or 'scheme' attribute that must be matched exactly for the result to be added
           to the list

       If both arguments are specified, a value is added to the list if it matches either one (i.e. the logic is
       OR).

   "languages()"
       Returns a list containing the text of all dc:language (case-insensitive) entries, or undef  if  none  are
       found.

       In scalar context returns the first match, not the last.

   "manifest(%args)"
       Returns  all  of  the items in the manifest as a list of hashrefs, with one hash per manifest item in the
       order that they appear, where the hash keys are  'id',  'href',  and  'media-type',  each  returning  the
       appropriate attribute, if any.

       In scalar context, returns the first match, not the last.

       Arguments

       "manifest()" takes four optional named arguments:

       •   "id" - 'id' attribute to match

       •   "href" - 'href' attribute to match

       •   "mtype" - 'media-type' attribute to match

       •   "logic" - logic to use (valid values are 'and' or 'or', default: 'and')

       If  any  of the named arguments are specified, "manifest()" will return only items matching the specified
       criteria.  This is an exact case-sensitive match, but it can (especially in  the  case  of  mtype)  still
       return multiple elements.

       Return values

       Returns  undef if there is no <manifest> element directly underneath <package>, or if <manifest> contains
       no items.

       See also

       "manifest_hrefs()", "spine()"

       Example

        my @manifest = $ebook->manifest(id => 'ncx',
                                        mtype => 'text/xml',
                                        logic => 'or');
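       A hedged sketch of the 'and'/'or' filtering described for manifest(), using plain hashrefs shaped like its documented return values. The sample items and the helper filter_manifest are hypothetical; matching is exact and case-sensitive, as documented.

```perl
use strict;
use warnings;

my @items = (
    { id => 'ncx',   href => 'toc.ncx',   'media-type' => 'application/x-dtbncx+xml' },
    { id => 'ch1',   href => 'ch1.html',  'media-type' => 'application/xhtml+xml' },
    { id => 'style', href => 'style.css', 'media-type' => 'text/css' },
);

# Documented argument names => manifest item attributes.
my %attr_for = (id => 'id', href => 'href', mtype => 'media-type');

sub filter_manifest {
    my ($items, %args) = @_;
    my $logic = lc($args{logic} // 'and');
    my @crit  = grep { defined $args{$_} } qw(id href mtype);
    return @$items unless @crit;
    return grep {
        my $item = $_;
        my $hits = grep {
            defined $item->{ $attr_for{$_} }
              && $item->{ $attr_for{$_} } eq $args{$_}    # exact, case-sensitive
        } @crit;
        $logic eq 'or' ? $hits > 0 : $hits == @crit;
    } @$items;
}
```

       With the default 'and' logic, combining id => 'ncx' with a different media type selects nothing; with 'or', each criterion can select items on its own.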

   "manifest_hrefs()"
       Returns a list of all of the hrefs in the current OPF manifest, or the empty list if none are found.

       In scalar context returns the first href, not the last.

       See also: "manifest()", "spine_idrefs()"

   "opf_namespace()"
       Some OPF generators explicitly prefix the gi of OPF elements with 'opf:'.  This makes later parsing more
       complex and is unnecessary, so the prefix is stripped before any parsing takes place.

   "opfdir()"
       Returns the full filesystem path to the directory where the OPF metadata file will be stored, or undef if
       either the top-level directory or the OPF subdirectory is not found.

   "opffile()"
       Returns the name of the file where the OPF metadata will be stored, or undef if no value is found.

   "opfpath()"
       Returns the full filesystem path to the file where the OPF metadata will be stored or undef if either the
       top level directory or the OPF filename is not found.

   "primary_author()"
       Finds  the  primary  author of the book, defined as the first 'dc:creator' entry (case-insensitive) where
       either the attribute opf:role="aut" or role="aut", or the first 'dc:creator' entry  if  no  entries  with
       either attribute can be found.  Entries must actually have text to be considered.

       In  list context, returns a two-item list, the first of which is the text of the entry (the author name),
       and the second element of which  is  the  value  of  the  'opf:file-as'  or  'file-as'  attribute  (where
       'opf:file-as' is given precedence if both are present).

       In scalar context, returns the text of the entry (the author name).

       If no entries are found, returns undef.

       Uses "twigelt_is_author()" in the first half of the search.

       Example

        my ($author, $fileas) = $ebook->primary_author;   # list context
        my $name = $ebook->primary_author;                # scalar context
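       The selection rule can be sketched over plain hashes standing in for dc:creator elements (an assumption; the role and file-as attribute pairs are each reduced to a single key). The helper pick_primary_author is hypothetical.

```perl
use strict;
use warnings;

# Stand-ins for dc:creator elements, in document order.
my @creators = (
    { text => 'Jane Editor', role => 'edt', fileas => 'Editor, Jane' },
    { text => '',            role => 'aut', fileas => undef },
    { text => 'John Writer', role => 'aut', fileas => 'Writer, John' },
);

# First creator with role 'aut' and non-empty text; otherwise the first
# creator with any text.  List context: (name, file-as); scalar: name.
sub pick_primary_author {
    my @with_text = grep { defined $_->{text} && length $_->{text} } @_;
    my ($found) = grep { ($_->{role} // '') eq 'aut' } @with_text;
    $found //= $with_text[0];
    return unless $found;
    return wantarray ? ($found->{text}, $found->{fileas}) : $found->{text};
}
```

       The empty 'aut' entry is skipped because entries must have text to be considered.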

   "print_errors()"
       Prints the current list of errors to STDERR.

   "print_warnings()"
       Prints the current list of warnings to STDERR.

   "print_opf()"
       Prints the OPF file to the default filehandle.

   "publishers()"
       Returns  a  list containing the text of all dc:publisher (case-insensitive) entries, or undef if none are
       found.

       In scalar context returns the first match, not the last.

   "retailprice()"
       Returns a two-scalar list, the first scalar being the text of the Mobipocket-specific <SRP>  element,  if
       it exists, and the second being the 'Currency' attribute of that element, if it exists.

       In scalar context, returns just the text (price).

       Returns undef if the SRP element is not found.

   "review()"
       Returns  the text of the Mobipocket-specific <Review> element, if it exists.  Returns undef if one is not
       found.

   "rights('id' => 'identifier')"
       Returns a list containing the text of all dc:rights or dc:copyrights (case-insensitive) entries in the
       e-book, or undef if none are found.

       In scalar context returns the first match, not the last.

       If the optional named argument 'id' is specified, it will only return  entries  where  the  id  attribute
       matches the specified identifier.  Although this still returns a list, if more than one entry is found, a
       warning is logged.

       Note  that  dc:copyrights  is  not a valid Dublin Core element -- it is included only because some broken
       Mobipocket books use it.

   "search_knownuids()"
       Searches the OPF twig for the first dc:identifier (case-insensitive) element with an  ID  matching  known
       UID IDs.

       Returns the ID if a match is found, undef otherwise.

   "search_knownuidschemes()"
       Searches descendants of the OPF twig element for the first <dc:identifier> or <dc:Identifier> subelement
       with the attribute 'scheme' or 'opf:scheme' matching a known list of schemes for unique IDs.

       NOTE: this is NOT a case-insensitive search!  If you have to deal with really bizarre  input,  make  sure
       that you run "fix_oeb12()" or "fix_opf20()" before calling "fix_packageid()" or "fix_misc()".

       Returns the ID if a match is found, undef otherwise.

   "spec()"
       Returns  the  version  of  the OEB specification currently in use.  Valid values are "OEB12" and "OPF20".
       This value will default to undef until "fix_oeb12" or "fix_opf20" is called, as there is no way  for  the
       object to know what specification is being conformed to (if any) until it attempts to enforce it.

   "spine()"
       Returns  all  of  the  manifest  items  referenced  in the spine as a list of hashrefs, with one hash per
       manifest item in the order that they appear, where the hash keys are 'id', 'href', and 'media-type', each
       returning the appropriate attribute, if any.

       In scalar context, returns the first item, not the last.

       Returns undef if there is no <spine> element directly underneath <package>, or  if  <spine>  contains  no
       itemrefs.  If <spine> exists, but <manifest> does not, or a spine itemref exists but points to an ID not
       found in the manifest, spine() logs an error and returns undef.

       See also: "spine_idrefs()", "manifest()"
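       A minimal sketch of how spine idrefs resolve against the manifest, and why a dangling idref yields an error and undef. Plain hashrefs stand in for the twig; resolve_spine and its error-array argument are hypothetical.

```perl
use strict;
use warnings;

# $manifest: arrayref of {id, href, ...}; $idrefs: arrayref of spine idrefs;
# $errors: arrayref that collects error messages.
sub resolve_spine {
    my ($manifest, $idrefs, $errors) = @_;
    return unless @$idrefs;                        # no itemrefs in the spine
    my %by_id = map { $_->{id} => $_ } @$manifest;
    my @spine;
    for my $idref (@$idrefs) {
        # An empty manifest makes every idref dangle, which also errors out.
        unless ($by_id{$idref}) {
            push @$errors, "spine itemref '$idref' not found in manifest";
            return;
        }
        push @spine, $by_id{$idref};
    }
    return wantarray ? @spine : $spine[0];
}
```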

   "spine_idrefs()"
       Returns a list of all of the idrefs in the current OPF spine, or the empty list if none are found.

       In scalar context, returns the first idref, not the last.

       See also: "spine()", "manifest_hrefs()"

   "subject_list()"
       Returns a list containing the text of all dc:subject elements or undef if none are found.

       In scalar context, returns the first subject, not the last.

   "title()"
       Returns the title of the e-book, or undef  if  no  dc:title  element  (case-insensitive)  exists.   If  a
       dc:title element exists, but contains no text, returns an empty string.

   "twig()"
       Returns the raw XML::Twig object used to store the OPF metadata.

       Although  this  twig can be manipulated via the standard XML::Twig methods, doing so requires caution and
       is not recommended.  In particular, changing the root element  from  here  will  cause  the  EBook::Tools
       internal  twig  and twigroot attributes to become unlinked and the result of any subsequent action is not
       defined.

   "twigcheck()"
       Croaks showing the calling location unless $self has both a twig and a  twigroot,  and  the  twigroot  is
       <package>.  Used as a sanity check for methods that use twig or twigroot.

   "twigroot()"
       Returns the raw XML::Twig root element used to store the OPF metadata.

       This  twig  element  can be manipulated via the standard XML::Twig::Elt methods, but care should be taken
       not to attempt to cut this element from its twig as doing so will cause the  EBook::Tools  internal  twig
       and twigroot attributes to become unlinked and the result of any subsequent action is not defined.

   "warnings()"
       Returns an arrayref containing any generated warning messages.

MODIFIER METHODS

       Unless  otherwise  specified,  all modifier methods return undef if an error was added to the error list,
       and true otherwise (even if a warning was added to the warning list).

   "add_document($href,$id,$mediatype)"
       Adds a document to the OPF manifest and spine, creating <manifest> and <spine> if necessary.  To  add  an
       item only to the OPF manifest, see add_item().

       Arguments

       $href
           The  href  to  the  document  in  question.   Usually,  this is just a filename (or relative path and
           filename) of a file in the current working directory.  If you are planning to eventually  generate  a
           .epub book, all hrefs MUST be in or below the current working directory.

           The method returns undef if $href is not defined or empty.

       $id The XML ID to use.  If not specified, defaults to the href with invalid characters removed.

           This  must  be  unique  not  only  to  the manifest list, but to every element in the OPF file.  If a
           duplicate ID exists, the method sets an error and returns undef.

       $mediatype (optional)
           The mime type of the document.  If not specified, will attempt to autodetect the mime  type,  and  if
           that fails, will default to 'application/xhtml+xml'.

   "add_error(@errors)"
       Adds  @errors  to  the  list  of object errors.  Each member of @errors should be a string containing the
       entire text of the error, with no ending newline.

       SEE ALSO: "add_warning()", "clear_errors()", "clear_warnerr()"

   "add_identifier(%args)"
       Creates a new dc:identifier element containing the specified text, id, and scheme.

       If a <dc-metadata> element exists underneath <metadata>, the identifier element will be created
       underneath the <dc-metadata> in OEB 1.2 format, otherwise the identifier element is created underneath
       <metadata> in OPF 2.0 format.

       Returns the twig element containing the new identifier.

       Arguments

       "add_identifier()" takes three named arguments, one mandatory, two optional.

       •   "text" - the text of the identifier.  This is mandatory, and the method croaks if it is not present.

       •   "scheme" - 'opf:scheme' or 'scheme' attribute to be added (optional)

       •   "id" - 'id' attribute to be added.  If this is specified, and the id is already  in  use,  a  warning
           will  be  added  but  the  method  will  continue,  removing  the  id attribute from the element that
           previously contained it.

   "add_item($href,$id,$mediatype)"
       Adds a document to the OPF manifest (but not spine), creating <manifest> if necessary.  To add an item
       to both the OPF manifest and spine, see add_document().

       Arguments

       $href
           The  href  to  the  document  in  question.   Usually,  this is just a filename (or relative path and
           filename) of a file in the current working directory.  If you are planning to eventually  generate  a
           .epub book, all hrefs MUST be in or below the current working directory.

       $id The XML ID to use.  If not specified, defaults to the href with all nonword characters removed.

           This  must  be  unique  not  only  to  the manifest list, but to every element in the OPF file.  If a
           duplicate ID exists, the method sets an error and returns undef.

       $mediatype (optional)
           The mime type of the document.  If not specified, will attempt to autodetect the mime  type,  and  if
           that fails, will set an error and return undef.
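       The default-ID rule for add_item() can be sketched with a single substitution. The helper default_id and the hash of IDs already in use are hypothetical. Note that a result beginning with a digit would not be a valid XML ID; the description does not say how the module handles that case.

```perl
use strict;
use warnings;

# Derive a default ID by removing all non-word characters from the href,
# refusing empty or duplicate results.  $used: hashref of IDs already in the OPF.
sub default_id {
    my ($href, $used) = @_;
    (my $id = $href) =~ s/\W//g;
    return if $id eq '' or $used->{$id};    # mirrors "sets an error and returns undef"
    $used->{$id} = 1;
    return $id;
}
```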

   "add_metadata(%args)"
       Creates a metadata element with the specified text, attributes, and parent.

       If a <dc-metadata> element exists underneath <metadata>, the new element will be created underneath the
       <dc-metadata> and any standard attributes will be created in OEB 1.2 format, otherwise the element is
       created underneath <metadata> in OPF 2.0 format.

       Returns 1 on success, returns undef if no gi or if no text was specified.

       Arguments

       "gi"
           The generic identifier (tag) of the metadata element to alter  or  create.   If  not  specified,  the
           method sets an error and returns undef.

       "parent"
           The  generic  identifier (tag) of the parent to use for any newly created element.  If not specified,
           defaults to 'dc-metadata' if 'dc-metadata' exists underneath 'metadata', and 'metadata' otherwise.

           A newly created element will be created under the first element  found  with  this  gi.   A  modified
           element will be moved under the first element found with this gi.

           Newly  created  elements  will  use  OPF  2.0 attribute names if the parent is 'metadata' and OEB 1.2
           attribute names otherwise.

       "text"
           This specifies the element text to set.  If not specified, the  method  sets  an  error  and  returns
           undef.

       "id" (optional)
           This  specifies  the  ID  to  set  on the element.  If set and the ID is already in use, a warning is
           logged and the ID is removed from the other location and assigned to the element.

       "fileas" (optional)
           This specifies the file-as attribute to set on the element.

       "role" (optional)
           This specifies the role attribute to set on the element.

       "scheme" (optional)
           This specifies the scheme attribute to set on the element.

       Example

        my $retval = $ebook->add_metadata(gi => 'AuthorNonstandard',
                                          text => 'Element Text',
                                          id => 'customid',
                                          fileas => 'Text, Element',
                                          role => 'xxx',
                                          scheme => 'code');

   "add_subject(%args)"
       Creates a new dc:subject element containing the specified text, code, and id.

       If a <dc-metadata> element exists underneath <metadata>, the subject element will be created underneath
       the <dc-metadata> in OEB 1.2 format, otherwise the subject element is created underneath <metadata> in
       OPF 2.0 format.

       Returns the twig element containing the new subject.

       Arguments

       "add_subject()" takes four named arguments, one mandatory, three optional.

       •   "text" - the text of the subject.  This is mandatory, and the method croaks if it is not present.

       •   "scheme" (optional) - 'opf:scheme' or 'scheme' attribute to be added.  Be warned that neither the OEB
           1.2 nor the OPF 2.0 specifications allow a scheme to  be  added  to  this  element,  so  if  this  is
           specified, the resulting OPF file will fail to validate against either standard.

       •   "basiccode" (optional) - 'BASICCode' attribute to be added.  Be warned that this is a
           Mobipocket-specific attribute that does not exist in either the OEB 1.2 or the OPF 2.0
           specification, so if this is specified, the resulting OPF file will fail to validate against either
           standard.

       •   "id"  (optional) - 'id' attribute to be added.  If this is specified, and the id is already in use, a
           warning will be added but the method will continue, removing the id attribute from the  element  that
           previously contained it.

   "add_warning(@newwarning)"
       Joins @newwarning into a single string and adds it to the list of object warnings.  The warning should
       not end with a newline.

       SEE ALSO: "add_error()", "clear_warnings()", "clear_warnerr()"

   "clear_errors()"
       Clear the current list of errors.

   "clear_warnerr()"
       Clear both the error and warning lists.

   "clear_warnings()"
       Clear the current list of warnings.

   "delete_meta_filepos()"
       Deletes metadata elements with the attribute 'filepos' underneath the given parent element.

       These are secondary metadata elements that may be included in the output from mobi2html but are not
       used.

   "delete_subject(%args)"
       Deletes dc:subject and dc:Subject elements based  on  text  content  or  the  id,  scheme,  or  basiccode
       attributes.  Matches are case-sensitive.

       Specifying multiple arguments will delete subjects matching any of them.

       This has the same potential arguments as add_subject.

       Returns the count of elements deleted.

   "fix_creators()"
       Normalizes creator and contributor names and file-as attributes.

       Names  are  normalized  to  'First Last' format, while file-as attributes are normalized to 'Last, First'
       format.

       This can damage some unusual names that do not match standard capitalization formats, so it is  not  made
       part of "fix_misc()".
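       To see why this normalization can damage unusual names, consider a deliberately naive sketch of the two target formats. This is not the module's implementation; display_name and file_as are hypothetical helpers that only handle the simple 'Last, First' and 'First Last' shapes.

```perl
use strict;
use warnings;

# 'Last, First' => 'First Last'; anything without a comma is returned as-is.
sub display_name {
    my ($name) = @_;
    return $name =~ /^\s*([^,]+?)\s*,\s*(.+?)\s*$/ ? "$2 $1" : $name;
}

# 'First Middle Last' => 'Last, First Middle' by moving the final word.
sub file_as {
    my ($name) = @_;
    return $name if $name =~ /,/;    # already 'Last, First'
    my @parts = split ' ', $name;
    return $name if @parts < 2;
    my $last = pop @parts;
    return "$last, @parts";
}
```

       A multi-word surname such as "Ursula K. Le Guin" is mis-filed as "Guin, Ursula K. Le" by this simple rule, the kind of damage that keeps fix_creators() out of fix_misc().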

   "fix_dates()"
       Standardizes  all  <dc:date>  elements  via fix_datestring().  Adds a warning to the object for each date
       that could not be fixed.

       Called from "fix_misc()".

   "fix_guide()"
       Fixes problems related to the OPF guide elements, specifically:

       •   Ensures the guide element exists

       •   Moves all reference elements directly underneath the guide element

       •   Finds nonstandard reference types and either converts them to standard types or prefixes them with
           'other.'

       •   Finds reference elements whose href consists only of an anchor and assigns them to the first spine
           href.  This only works if the spine is in working condition, so it may be wise to run "fix_spine()"
           before "fix_guide()" if the input is expected to be very badly broken.

       Logs a warning if a reference href is found that does not appear in the manifest.

   "fix_languages(%args)"
       Checks through  the  <dc:language>  elements  (case-insensitive)  and  removes  any  duplicates.   If  no
       <dc:language> elements are found, one is created.

       TODO: Also convert language names to IANA language and region codes.

       Arguments

       •   "default"

           The  default language string to use when creating a new language element.  If not specified, defaults
           to 'en'.
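       The de-duplication and default rules can be sketched on a plain list of language strings rather than on
       <dc:language> elements.  Comparing the codes case-insensitively is an assumption of this sketch, and
       "dedupe_languages()" is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: drop duplicate language codes (compared
# case-insensitively, first occurrence wins) and fall back to a
# default when the list is empty, mirroring the "default" argument.
sub dedupe_languages {
    my ($langs, %args) = @_;
    my $default = defined $args{default} ? $args{default} : 'en';
    my %seen;
    my @unique = grep { !$seen{ lc $_ }++ } @$langs;
    return @unique ? @unique : ($default);
}

print join(',', dedupe_languages(['en', 'EN', 'fr'])), "\n";   # en,fr
print join(',', dedupe_languages([], default => 'de')), "\n";  # de
```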

   "fix_links()"
       Scans each file listed in the manifest for links to other files, and adds any linked file that is missing
       from the manifest.

       A warning is added for every manifest item missing a href.

       If no <manifest> element exists directly underneath the <package> root, or <manifest> contains no  items,
       the method logs a warning and returns undef.  Otherwise, it returns 1.

   "fix_manifest()"
       Finds all <item> elements and moves them underneath <manifest>, creating <manifest> if necessary.

       Logs a warning but continues if it finds an <item> with a missing id or href attribute.  If both the id
       and href attributes are missing, it logs a warning and skips moving the item (unless the item is already
       underneath <manifest>, in which case it is moved to preserve its sort order relative to all other items
       under <manifest>), but otherwise continues.

   "fix_metastructure_basic()"
       Verifies that <metadata> exists (creating it if necessary), and  moves  it  to  be  the  first  child  of
       <package>.   If  additional  <metadata> elements exist, their children are moved into the first one found
       and then the extras are deleted.

       Used in "fix_metastructure_oeb12()", "fix_packageid()", and "set_primary_author(%args)".

   "fix_metastructure_oeb12()"
       Verifies the existence of <metadata>, <dc-metadata>, and  <x-metadata>,  creating  them  as  needed,  and
       making sure that <metadata> is a child of <package>, while <dc-metadata> and <x-metadata> are children of
       <metadata>.

       Used in "fix_oeb12()" and "fix_mobi()".

   "fix_misc()"
       Fixes  miscellaneous  potential  problems  in  OPF  data.   Specifically,  this  is a shortcut to calling
       "delete_meta_filepos()",   "fix_packageid()",   "fix_dates()",   "fix_languages()",    "fix_publisher()",
       "fix_manifest()", "fix_spine()", "fix_subjects()", "fix_type()", "fix_guide()", and "fix_links()".

       "fix_creators()"  is  not  run  from  this,  as  it carries a risk of taking a correct name and making it
       incorrect.

       The objective here is that you can run "fix_misc()" and either "fix_oeb12()" or "fix_opf20()", and a
       perfectly valid OPF file will result from only two calls.

   "fix_mobi()"
       Manipulates the twig to fix Mobipocket-specific issues:

       •   Forces the OEB 1.2 structure (although not the namespace, DTD, or capitalization), so that
           <dc-metadata> and <x-metadata> are guaranteed to exist.

       •   Finds and moves all Mobi-specific elements to <x-metadata>.

       •   If no <output> element exists, creates one for a UTF-8 ebook.

       Note that the forced creation of <output> will cause the  OPF  file  to  become  noncompliant  with  IDPF
       specifications.

   "fix_oeb12()"
       Modifies the OPF data to conform to the OEB 1.2 standard

       Specifically, this involves:

       •   adding the OEB 1.2 doctype

       •   removing OPF 2.0 version and namespace attributes

       •   setting the OEB 1.2 namespace on <package>

       •   moving  all of the dc-metadata elements underneath an element with tag <dc-metadata>, which itself is
           forced to be underneath <metadata>, which is created if it doesn't exist.

       •   moving any remaining tags underneath <x-metadata>, again forced to be under <metadata>

       •   making the dc-metadata tags conform to the OEB v1.2 capitalization

   "fix_oeb12_dcmetatags()"
       Makes a case-insensitive search for tags matching a known list of DC metadata elements and corrects the
       capitalization to the OEB 1.2 standard.  Also corrects 'dc:Copyrights' to 'dc:Rights'.  See package
       variable %dcelements12.

       The "fix_oeb12()" method does this also, but fix_oeb12_dcmetatags() is usable  separately  for  the  case
       where  you  want  DC metadata elements with consistent tag names, but don't want them moved from wherever
       they are.

   "fix_opf20()"
       Modifies the OPF data to conform to the OPF 2.0 standard

       Specifically, this involves:

       •   moving all of the dc-metadata and x-metadata elements directly underneath <metadata>

       •   removing the <dc-metadata> and <x-metadata> elements themselves

       •   lowercasing the dc-metadata tags (and fixing dc:copyrights to dc:rights)

       •   setting namespaces on dc-metadata OPF attributes

       •   setting version and xmlns attributes on <package>

       •   setting xmlns:dc and xmlns:opf on <metadata>

   "fix_opf20_dcmetatags()"
       Makes a case-insensitive search for tags matching a known list of DC metadata elements and  corrects  the
       capitalization  to  the  OPF  2.0  standard.   Also corrects 'dc:copyrights' to 'dc:rights'.  See package
       variable %dcelements20.

       The "fix_opf20()" method does this also, but "fix_opf20_dcmetatags()" is usable separately for  the  case
       where  you  want  DC metadata elements with consistent tag names, but don't want them moved from wherever
       they are.
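       Both "fix_oeb12_dcmetatags()" and "fix_opf20_dcmetatags()" amount to a case-insensitive lookup followed
       by a respelling.  The sketch below shows that idea on bare tag names; "normalize_dc_tag()" is
       hypothetical, uses its own stand-in list of the fifteen Dublin Core elements, and operates on strings
       rather than on the twig elements the module works with.

```perl
use strict;
use warnings;

# Stand-in list of the fifteen Dublin Core elements in OEB 1.2
# capitalization (not the module's own mapping tables).
my @dc_names = qw(Title Creator Subject Description Publisher Contributor
                  Date Type Format Identifier Source Language Relation
                  Coverage Rights);

# Hypothetical helper: return the canonical tag for $spec ('OEB12' or
# 'OPF20'), or the tag unchanged if it is not a known dc: element.
sub normalize_dc_tag {
    my ($tag, $spec) = @_;
    my $name = lc $tag;
    return $tag unless $name =~ s/^dc://;         # only dc:* tags are touched
    $name = 'rights' if $name eq 'copyrights';    # dc:Copyrights -> rights
    my ($canon) = grep { lc($_) eq $name } @dc_names;
    return $tag unless defined $canon;
    return $spec eq 'OEB12' ? "dc:$canon" : 'dc:' . lc($canon);
}

print normalize_dc_tag('DC:TITLE', 'OEB12'), "\n";       # dc:Title
print normalize_dc_tag('dc:Copyrights', 'OPF20'), "\n";  # dc:rights
```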

   "fix_packageid()"
       Checks the <package> element for the attribute 'unique-identifier', makes sure that it  is  mapped  to  a
       valid  dc:identifier  subelement,  and if not, searches those subelements for an identifier to assign, or
       creates one if nothing can be found.

       Requires that <metadata> exist.  Croaks if it doesn't.  Run "fix_oeb12()" or "fix_opf20()" before calling
       this if the input might be very broken.

   "fix_publisher()"
       Standardizes publisher names in all dc:publisher elements, mapping known variants of a publisher's name
       to a canonical form via package variable %publishermap.

       Publisher entries with no text are deleted.

   "fix_spine()"
       Fixes problems with the OPF spine, specifically:

       •   Moves all <itemref> elements underneath <spine>, creating <spine> if necessary.

   "fix_subjects()"
       Deletes  empty  and  duplicate  subject  elements  and normalizes existing subject text against the known
       Library of Congress mappings.

       If $self->{erotic} is true, then the book will be treated as a work of erotic fiction  and  the  subjects
       will  go through preprocessing against the %sexcodes package variable, normalizing matches and prepending
       'FICTION / Erotica / ' (with a trailing space).

       This method is called as a component of "fix_misc()".

   "fix_type()"
       Normalizes <dc:type> elements against a limited list based on book types listed in Wikipedia.

   "gen_epub(%args)"
       Creates  a  .epub  format  e-book.   This  will  create  (or  overwrite)   the   files   'mimetype'   and
       'META-INF/container.xml' in the current directory, creating the subdirectory META-INF as needed.

       An NCX file will also be created if missing.

       Arguments

       This method can take two optional named arguments.

       "filename"
           The  filename  of  the  .epub output file.  If not specified, takes the base name of the opf file and
           adds a .epub extension.

       "dir"
           The directory in which to write the .epub file.  If not specified, uses the current working directory.
           If the specified directory does not exist, the method attempts to create it and croaks if that fails.

       Example

        $ebook->gen_epub(filename => 'mybook.epub',
                         dir      => '../epub_books');
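       The defaults described for "filename" and "dir" amount to a small path computation, sketched below.
       "default_epub_path()" is hypothetical; unlike gen_epub(), it neither creates the directory nor croaks.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper: filename defaults to the OPF base name plus
# '.epub'; dir defaults to the current working directory.
sub default_epub_path {
    my ($opffile, %args) = @_;
    my $filename = $args{filename};
    unless (defined $filename) {
        # Strip the directory and the last extension ('.opf')
        my ($base) = fileparse($opffile, qr/\.[^.]*/);
        $filename = "$base.epub";
    }
    my $dir = defined $args{dir} ? $args{dir} : getcwd();
    return File::Spec->catfile($dir, $filename);
}

print default_epub_path('books/mybook.opf', dir => '../epub_books'), "\n";
```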

   "gen_epub_files()"
       Generates  the  "mimetype" and "META-INF/container.xml" files expected by a .epub container, but does not
       actually generate the .epub file itself.  This will be called automatically by "gen_epub".

       The OPF will be normalized to the OPF 2.0 format.

       If no NCX element exists, it will also be created.

   "gen_ncx($filename)"
       Creates an NCX-format table of contents from the package unique-identifier, the dc:title, dc:creator, and
       spine elements, and then adds the NCX entry to the manifest if it is not already referenced.

       Adds an error and fails if any of those cannot be found.  The first available dc:title is used; among
       dc:creator elements, those with opf:role="aut" are preferred over those with no role attribute (see
       twigelt_is_author() for details).

       WARNING: This method REQUIRES that the  e-book  be  in  OPF  2.0  format  to  function  correctly.   Call
       fix_opf20() before calling gen_ncx().  gen_ncx() will log an error and fail if $self->{spec} is not set
       to OPF20.

       Arguments

       $filename : The filename to save to.  If not specified, will use 'toc.ncx'.

       This method will overwrite any existing file.

       Returns a twig containing the NCX XML, or undef on failure.

   "save()"
       Saves the OPF file to disk.  Existing files are backed up to filename.backup.

   "set_adult($bool)"
       Sets the Mobipocket-specific <Adult> element, creating or deleting it as necessary.  If  $bool  is  true,
       the  text  is  set  to  'yes'.   If it is defined but false, any existing elements are deleted.  If it is
       undefined, the method immediately returns.

       If a new element has to be created, "fix_metastructure_oeb12"  is  called  to  ensure  that  <x-metadata>
       exists  and  the  element  is  created  under  <x-metadata>, as Mobipocket elements are not recognized by
       Mobipocket's software when placed directly under <metadata>

   "set_cover(%args)"
       Sets a cover image

       In OPF 2.0, this is  done  by  setting  both  a  <meta  name="cover">  element  and  a  guide  <reference
       type="other.ms-coverimage-standard"> element (though some readers will also extract the first image found
       in the HTML of the <reference type="cover"> element, which this method will not handle).

       In OEB 1.2, this is done by setting the <EmbeddedCover> tag.

       If the filename is not currently listed as an item in the manifest, it is added.

       Arguments

       "href"
           The filename of the image file to use.  This is mandatory.

       "id"
           The id attribute to assign to its item element

       "spec"
           The  specification  to use, either OEB12 or OPF20.  If this is left undefined, the current spec state
           will be checked, and if that is undefined, it will default to OPF20.

   "set_date(%args)"
       Sets the date metadata for a given event.  If more than one dc:date or dc:Date element  is  present  with
       the specified event attribute, sets the first.  If no dc:date element is present with the specified event
       attribute, a new element is created.

       If a <dc-metadata> element exists underneath <metadata>, the date element will be created underneath the
       <dc-metadata> in OEB 1.2 format, otherwise the date element is created underneath <metadata> in OPF 2.0
       format.

       Returns 1 on success, logs an error and returns undef if no text or event was specified.

       Arguments

       "text"
           This specifies the date to use as the text of the element.  If not specified, the method sets an
           error and returns undef.

       "event"
           This optionally specifies the event attribute for the date.  This attribute is not valid in  OPF  3.0
           (which only allows publication date in this element) and should no longer be used.

       "id" (optional)
           This  specifies  the  ID  to  set  on the element.  If set and the ID is already in use, a warning is
           logged and the ID is removed from the other location and assigned to the element.

   set_description(%args)
       Sets the text and optionally the ID of the first dc:description element found (case-insensitive).
       Creates the element if one did not exist.  If a <dc-metadata> element exists underneath <metadata>, the
       description element will be created underneath the <dc-metadata> in OEB 1.2 format, otherwise the
       description element is created underneath <metadata> in OPF 2.0 format.

       Returns 1 on success, returns undef if no description text was specified.

       Arguments

       "set_description()" takes one required and one optional named argument

       "text"
           This  specifies  the  description  to  use  as the text of the element.  If not specified, the method
           returns undef.

       "id" (optional)
           This specifies the ID to set on the element.  If set and the ID is  already  in  use,  a  warning  is
           logged and the ID is removed from the other location and assigned to the element.

       Example

        $retval = $ebook->set_description('text' => 'A really good book',
                                          'id' => 'mydescid');

   "set_erotic($bool)"
       If $bool is true, $self->{erotic} is set to 1; otherwise it is set to 0.

       This will enable or disable special handling for erotic books, most notably in subject normalization.

       This is not related in any way to "set_adult()", which is a Mobipocket-specific flag.

       Returns 1 if no argument is given, 0 otherwise.

   "set_language(%args)"
       Sets  the  text and optionally the ID of the first dc:language element found (case-insensitive).  Creates
       the element if one did not exist.  If a <dc-metadata> element exists underneath <metadata>, the  language
       element will be created underneath the <dc-metadata> in OEB 1.2 format, otherwise the language element is
       created underneath <metadata> in OPF 2.0 format.

       Returns 1 on success, returns undef if no text was specified.

       Arguments

       "text"
           This specifies the language to set as the text of the element.  If not specified, the method sets an
           error and returns undef.  This should be an IANA language code, and it will be lowercased before it
           is set.

       "id" (optional)
           This specifies the ID to set on the element.  If set and the ID is  already  in  use,  a  warning  is
           logged and the ID is removed from the other location and assigned to the element.

       Example

        $retval = $ebook->set_language('text' => 'en-us',
                                       'id' => 'langid');

   set_meta(%args)
       Sets a <meta> element in the <metadata> element area.

       Arguments

       "name"
           The  name  attribute  to  use  when  finding or creating OPF 2.0 <meta> elements.  Either this or the
           property attribute (below) must be specified, but specifying both is an error.

       "content"
           The value of the content attribute to set on OPF 2.0 elements.  If this value is empty or  undefined,
           but "name" is provided and matches an existing element, that element will be deleted.

       "property"
           The  property  attribute  to  use  when  finding or creating OPF 3.0 <meta> elements.  Either this or
           "name" (above) must be specified, but specifying both is an error.

       "refines"
           The refines attribute to use when finding or creating OPF 3.0 <meta> elements.

       "scheme"
           The scheme attribute to use when creating or updating OPF 3.0 <meta> elements.

       "text"
           The text set on OPF 3.0 <meta> elements.  If this value is empty  or  undefined,  but  "property"  is
           provided  and  the  combination of "property" and "refines" matches an existing element, that element
           will be deleted.

       "lang"
           The xml:lang attribute to set.  This is valid on both OPF 2 and OPF 3 <meta> elements.

   set_metadata(%args)
       Sets the text and optionally the ID  of  the  first  specified  element  type  found  (case-insensitive).
       Creates the element if one did not exist (with the exact capitalization specified).

       If a <dc-metadata> element exists underneath <metadata>, the element will be created underneath
       the <dc-metadata> and any standard attributes will be created in OEB 1.2 format, otherwise the element is
       created underneath <metadata> in OPF 2.0 format.

       Returns 1 on success, returns undef if no gi or if no text was specified.

       Arguments

       "gi"
           The generic identifier (tag) of the metadata element to alter  or  create.   If  not  specified,  the
           method sets an error and returns undef.

       "parent"
           The  generic  identifier (tag) of the parent to use for any newly created element.  If not specified,
           defaults to 'dc-metadata' if 'dc-metadata' exists underneath 'metadata', and 'metadata' otherwise.

           A newly created element will be created under the first element  found  with  this  gi.   A  modified
           element will be moved under the first element found with this gi.

           Newly  created  elements  will  use  OPF  2.0 attribute names if the parent is 'metadata' and OEB 1.2
           attribute names otherwise.

       "text"
           This specifies the element text to set.  If not specified, the  method  sets  an  error  and  returns
           undef.

       "id" (optional)
           This  specifies  the  ID  to  set  on the element.  If set and the ID is already in use, a warning is
           logged and the ID is removed from the other location and assigned to the element.

       "fileas" (optional)
           This specifies the file-as attribute to set on the element.

       "role" (optional)
           This specifies the role attribute to set on the element.

       "scheme" (optional)
           This specifies the scheme attribute to set on the element.

       Example

        $retval = $ebook->set_metadata(gi => 'AuthorNonstandard',
                                       text => 'Element Text',
                                       id => 'customid',
                                       fileas => 'Text, Element',
                                       role => 'xxx',
                                       scheme => 'code');

   set_opffile($filename)
       Sets the filename used to store the OPF metadata.

       Returns 1 on success; sets an error message and returns undef if no filename was specified.

   set_retailprice(%args)
       Sets the Mobipocket-specific  <SRP>  element  (Suggested  Retail  Price),  creating  or  deleting  it  as
       necessary.

       If  a  new  element  has  to  be created, "fix_metastructure_oeb12" is called to ensure that <x-metadata>
       exists and the element is created under <x-metadata>,  as  Mobipocket  elements  are  not  recognized  by
       Mobipocket's software when placed directly under <metadata>

       Arguments

       •   "text"

           The  price  to  set  as  the text of the element.  If this is undefined, the method sets an error and
           returns undef.  If it is set but false, any existing <SRP> element is deleted.

       •   "currency" (optional)

           The value to set on the 'Currency' attribute.  If not provided, defaults to 'USD' (US Dollars)

   set_primary_author(%args)
       Sets the text, id, file-as, and role attributes of the primary author element (see "primary_author()" for
       details on how this is found), or if no primary author exists,  creates  a  new  element  containing  the
       information.

       This  method  calls  "fix_metastructure_basic()" to enforce the presence of the <metadata> element.  When
       creating a new element, the method will use the OEB 1.2 element name and create  the  element  underneath
       <dc-metadata>  if  an  existing  <dc-metadata>  element  is  found underneath <metadata>.  If no existing
       <dc-metadata> element is found, the new element will be created with the OPF 2.0  element  name  directly
       underneath  <metadata>.   Regardless,  it  is probably a good idea to call "fix_oeb12()" or "fix_opf20()"
       after calling this method to ensure a consistent scheme.

       Arguments

       Three optional named arguments can be passed:

       •   "text"

           Specifies the author text to set.  If omitted and a primary author element exists, the text  will  be
           left  as  is;  if  omitted  and  a  primary  author element cannot be found, an error message will be
           generated and the method will return undef.

       •   "fileas"

           Specifies the 'file-as' attribute to set.  If omitted  and  a  primary  author  element  exists,  any
           existing  attribute  will be left untouched; if omitted and a primary author element cannot be found,
           the newly created element will not have this attribute.

       •   "id"

           Specifies the 'id' attribute to set.  If this is specified, and the id is already in use,  a  warning
           will  be  added  but  the  method  will  continue,  removing  the  id attribute from the element that
           previously contained it.

           If this is omitted and a primary author element exists, any existing id will be  left  untouched;  if
           omitted  and  a primary author element cannot be found, the newly created element will not have an id
           set.

       If called with no arguments, the only effect this method has is to enforce that either an  'opf:role'  or
       'role' attribute is set to 'aut' on the primary author element.

       Return values

       Returns 1 if successful; returns undef and sets an error message if the "text" argument is missing and no
       primary author element was found.

   "set_publisher(%args)"
       Sets  the text and optionally the ID of the first dc:publisher element found (case-insensitive).  Creates
       the element if one did not exist.  If a <dc-metadata> element exists underneath <metadata>, the publisher
       element will be created underneath the <dc-metadata> in OEB 1.2 format, otherwise the publisher element is
       created underneath <metadata> in OPF 2.0 format.

       Returns 1 on success, returns undef if no publisher was specified.

       Arguments

       "set_publisher()" takes one required and one optional named argument

       "text"
           This  specifies  the  publisher name to set as the text of the element.  If not specified, the method
           returns undef.

       "id" (optional)
           This specifies the ID to set on the element.  If set and the ID is  already  in  use,  a  warning  is
           logged and the ID is removed from the other location and assigned to the element.

       Example

        $retval = $ebook->set_publisher('text' => 'My Publishing House',
                                        'id' => 'mypubid');

   set_review(%args)
       Sets  the  text  and  optionally  ID of the first <Review> element found (case-insensitive), creating the
       element if one did not exist.

       This is a Mobipocket-specific element and if it needs to be created  it  will  always  be  created  under
       <x-metadata> with "fix_metastructure_oeb12()" called to ensure that <x-metadata> exists.

       Returns 1 on success, returns undef if no review text was specified

       Arguments

       "text"
           This specifies the review to use as the text of the element.  If not specified, the method
           returns undef.

       "id" (optional)
           This specifies the ID to set on the element.  If set and the ID is  already  in  use,  a  warning  is
           logged and the ID is removed from the other location and assigned to the element.

       Example

        $retval = $ebook->set_review('text' => 'This book is perfect!',
                                     'id' => 'revid');

   "set_rights(%args)"
       Sets  the  text of the first dc:rights or dc:copyrights element found (case-insensitive).  If the element
       found has the gi of dc:copyrights, it  will  be  changed  to  dc:rights.   This  is  to  correct  certain
       noncompliant Mobipocket files.

       Creates the element if one did not exist.  If a <dc-metadata> element exists underneath <metadata>, the
       rights element will be created underneath the <dc-metadata> in OEB 1.2 format, otherwise the rights
       element is created underneath <metadata> in OPF 2.0 format.

       Returns 1 on success, returns undef if no rights string was specified.

       Arguments

       •   "text"

           This specifies the text of the element.  If not specified, the method returns undef.

       •   "id" (optional)

           This specifies the ID to set on the element.  If set and the ID is  already  in  use,  a  warning  is
           logged but the method continues anyway.

   "set_spec($spec)"
       Sets  the OEB specification to match when modifying OPF data.  Allowable values are 'OEB12', 'OPF20', and
       'MOBI12'.

       Returns 1 if successful; returns undef and sets an error message if an unknown specification was set.

   "set_timestamp()"
       Sets the <meta property="dcterms:modified"> element to the current timestamp  and  removes  duplicate  or
       nonstandard timestamps.
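       The EPUB 3 specification requires the dcterms:modified value to be a UTC timestamp of the form
       CCYY-MM-DDThh:mm:ssZ.  A minimal way to produce that string in Perl is sketched below;
       "modified_timestamp()" is a hypothetical helper, and the module's own formatting code may differ.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time (default: now) as the UTC
# CCYY-MM-DDThh:mm:ssZ value used for dcterms:modified.
sub modified_timestamp {
    my ($epoch) = @_;
    $epoch = time() unless defined $epoch;
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($epoch));
}

my $ts = modified_timestamp();
print qq{<meta property="dcterms:modified">$ts</meta>\n};
```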

   "set_title(%args)"
       Sets the text and/or ID of the first dc:title element found (case-insensitive).  Creates the element if one
       did not exist.  If a <dc-metadata> element exists  underneath  <metadata>,  the  title  element  will  be
       created underneath the <dc-metadata> in OEB 1.2 format, otherwise the title element is created underneath
       <metadata> in OPF 2.0 format.

       Arguments

       set_title() takes two optional named arguments.  If neither is specified, the method will do nothing.

       •   "text"

           This  specifies  the  text of the element.  If not specified, and no title element is found, an error
           will be set and the method will return undef -- set_title() will refuse to create a dc:title  element
           with no text.

       •   "id"

           This  specifies  the  ID  to  set  on the element.  If set and the ID is already in use, a warning is
           logged but the method continues anyway.

   "set_type(%args)"
       Sets the text and optionally the ID of the first dc:type element found (case-insensitive).   Creates  the
       element if one did not exist.  If a <dc-metadata> element exists underneath <metadata>, the type
       element will be created underneath the <dc-metadata> in OEB 1.2 format, otherwise the type element is
       created underneath <metadata> in OPF 2.0 format.

       Returns 1 on success, returns undef if no type was specified.

       Arguments

       "set_type()" takes one required and one optional named argument

       "text"
           This specifies the type to set as the text of the element.  If not specified, the method
           returns undef.

       "id" (optional)
           This specifies the ID to set on the element.  If set and the ID is  already  in  use,  a  warning  is
           logged and the ID is removed from the other location and assigned to the element.

       Example

        $retval = $ebook->set_type('text' => 'Short Story',
                                   'id' => 'mytypeid');

PROCEDURES

       All procedures are exportable, but none are exported by default.  All procedures can be exported by using
       the ":all" tag.

   "_lc"
       Wrapper  for CORE::lc to get around the fact that builtins can't be used in dispatch tables prior to Perl
       5.16.

       WARNING: this procedure may disappear once Perl 5.16 is standard on all systems in common use!  For  that
       reason, this is not exportable.

   "_uc"
       Wrapper  for CORE::uc to get around the fact that builtins can't be used in dispatch tables prior to Perl
       5.16.

       WARNING: this procedure may disappear once Perl 5.16 is standard on all systems in common use!  For  that
       reason, this is not exportable.

   "capitalize($string)"
       Capitalizes the first letter of each word in $string.

       Returns the corrected string.
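       A minimal sketch, assuming that words are separated by whitespace and that the remaining letters of each
       word are left untouched (the description above does not say whether they are lowercased).
       "capitalize_sketch()" is a hypothetical name.

```perl
use strict;
use warnings;

# Uppercase the first character of every whitespace-separated word;
# the rest of each word is left as-is.
sub capitalize_sketch {
    my ($string) = @_;
    return unless defined $string;
    (my $result = $string) =~ s/(^|\s)(\S)/$1\u$2/g;
    return $result;
}

print capitalize_sketch('the lord of the rings'), "\n";  # The Lord Of The Rings
```

       Anchoring on whitespace rather than on \b keeps contractions such as "don't" from becoming "Don'T".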

   "clean_filename($string)"
       Takes an input string and cleans out any characters that would not be valid in a filename.

       Returns the cleaned string.
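       The description does not list which characters are removed.  The sketch below assumes the set that
       Windows rejects in file names (\ / : * ? " < > |) plus ASCII control characters; both that set and the
       name "clean_filename_sketch()" are assumptions, not the module's actual rule.

```perl
use strict;
use warnings;

# Assumed rule: strip Windows-reserved filename characters and ASCII
# control characters (0x00-0x1F); everything else is kept.
sub clean_filename_sketch {
    my ($string) = @_;
    (my $clean = $string) =~ s{[\\/:*?"<>|\x00-\x1f]}{}g;
    return $clean;
}

print clean_filename_sketch('Title: A "Story"?'), "\n";  # Title A Story
```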

   "create_epub_container($opffile)"
       Creates the XML file META-INF/container.xml pointing to the specified OPF file.

       Creates the META-INF directory if necessary.  Will destroy any non-directory file named 'META-INF' in the
       current   directory.    If   META-INF/container.xml   already   exists,  it  will  rename  that  file  to
       META-INF/container.xml.backup.

       Arguments

       $opffile
           The OPF filename (and path, if necessary) to use in the container.  If not  specified,  looks  for  a
           sole OPF file in the current working directory.  Fails if more than one is found.

       Return values

       Returns a twig representing the container data if successful, undef otherwise
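       For reference, the container file follows the OCF layout: a single <rootfile> whose full-path attribute
       names the OPF file and whose media-type is application/oebps-package+xml.  The sketch below only builds
       that text as a string, whereas the documented procedure builds a twig and writes it to disk;
       "container_xml()" is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of META-INF/container.xml
# pointing at $opffile (a path relative to the container root).
sub container_xml {
    my ($opffile) = @_;
    die "No OPF file specified\n" unless defined $opffile && length $opffile;

    # Minimal escaping for a double-quoted XML attribute value
    (my $path = $opffile) =~ s/&/&amp;/g;
    $path =~ s/</&lt;/g;
    $path =~ s/"/&quot;/g;

    return <<"END_XML";
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="$path" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
END_XML
}

print container_xml('OEBPS/content.opf');
```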

   "create_epub_mimetype()"
       Creates  a  file  named 'mimetype' in the current working directory containing 'application/epub+zip' (no
       trailing newline)

       Destroys and overwrites that file if it exists.

       Returns the mimetype string if successful, undef otherwise.
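       A minimal sketch of the write itself.  The optional $path argument exists only for testability (the
       documented procedure always writes 'mimetype' in the current directory), and "write_epub_mimetype()" is
       a hypothetical name.  When the .epub archive is later assembled, the OCF specification requires this
       file to be the first entry in the ZIP and to be stored uncompressed.

```perl
use strict;
use warnings;

# Hypothetical helper: write 'application/epub+zip' with no trailing
# newline, replacing any existing file. Returns the string on success,
# undef on failure.
sub write_epub_mimetype {
    my ($path) = @_;
    $path = 'mimetype' unless defined $path;
    my $mimetype = 'application/epub+zip';
    open(my $fh, '>', $path) or return;
    binmode $fh;                  # no newline translation on any platform
    print {$fh} $mimetype;        # deliberately no "\n"
    close($fh) or return;
    return $mimetype;
}
```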

   "debug($level,@message)"
       Prints a debugging message to "STDERR" if package variable $debug is greater than or equal to $level.   A
       trailing newline is appended, and should not be part of @message.

       Returns true or dies.

   "excerpt_line($text)"
       Takes  as  an  argument a list of text pieces that will be joined.  If the joined length is less than 70,
       all of the joined text is returned.

       If the joined length is greater than 70, the return string is the first 30 characters followed by
       ' [...] ' followed by the last 30 characters.
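       The truncation rule fits in a few lines.  This sketch assumes the pieces are joined with no separator and
       that a joined length of exactly 70 is returned unchanged; neither detail is stated above, and
       "excerpt_line_sketch()" is a hypothetical name.

```perl
use strict;
use warnings;

# Join the pieces; if the result is longer than 70 characters, keep the
# first 30 and last 30 characters around a ' [...] ' marker.
sub excerpt_line_sketch {
    my $text = join('', @_);
    return $text if length($text) <= 70;
    return substr($text, 0, 30) . ' [...] ' . substr($text, -30);
}

print excerpt_line_sketch(('a' x 40) . ('b' x 40)), "\n";
```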

   "find_in_path($pattern,@extradirs)"
       Searches through $ENV{PATH} (and optionally any additional directories specified in @extradirs)  for  the
       first  regular  file  matching  $pattern.  $pattern itself can take two forms: if passed a "qr//" regular
       expression, that expression is used directly.  If passed any other string, that string will be used for a
       case-insensitive exact match where the extension '.bat', '.com', or '.exe' is optional  (i.e.  the  final
       pattern will be "qr/^ $pattern (\.bat|\.com|\.exe)? $/ix").

       Returns the first match found, or undef if there were no matches or if no pattern was specified.
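       The search can be sketched as a walk over $ENV{PATH} followed by the extra directories.  In this sketch,
       quoting a plain-string pattern with \Q...\E and returning the full path of the match are assumptions, and
       "find_in_path_sketch()" is a hypothetical name.

```perl
use strict;
use warnings;
use File::Spec;

# Walk PATH plus any extra directories and return the full path of the
# first regular file whose name matches.
sub find_in_path_sketch {
    my ($pattern, @extradirs) = @_;
    return unless defined $pattern;
    my $regex = ref($pattern) eq 'Regexp'
        ? $pattern                                       # use qr// as-is
        : qr/^ \Q$pattern\E (\.bat|\.com|\.exe)? $/ix;   # optional extension
    for my $dir (File::Spec->path(), @extradirs) {
        opendir(my $dh, $dir) or next;                   # skip unreadable dirs
        my @names = sort { $a cmp $b } readdir($dh);
        closedir($dh);
        for my $name (@names) {
            my $full = File::Spec->catfile($dir, $name);
            return $full if $name =~ $regex && -f $full;
        }
    }
    return;
}
```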

   "find_links($filename)"
       Searches  through  a  file for href and src attributes, and returns a list of unique links with any named
       anchors removed (e.g. 'myfile.html#part7' returns as just 'myfile.html').  If no links are found, or  the
       file does not exist, returns undef.

       Does not distinguish between local files and remote links.  Only double-quoted attribute values are
       recognized; single-quoted or unquoted values are skipped.  It also assumes that no link is broken across
       multiple lines, so it will (for example) fail to find:

        <img src=
        "myfile.jpg">

       though it can find:

        <img
         src="myfile.jpg">
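
       The documented limitations fall out naturally from a line-by-line regular expression scan, sketched
       here; this is not the module's implementation.  find_links_in_lines and find_links_sketch are
       hypothetical names, and skipping links that consist only of an anchor (such as "#top") is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: scan each line separately for href="..." or
# src="...", so values split across lines or not double-quoted are missed.
sub find_links_in_lines {
    my @lines = @_;
    my (%seen, @links);
    foreach my $line (@lines) {
        while ($line =~ /\b(?:href|src)\s*=\s*"([^"]*)"/gi) {
            (my $link = $1) =~ s/#.*//s;    # drop any named anchor
            next if $link eq '' or $seen{$link}++;
            push @links, $link;
        }
    }
    return @links;
}

# File wrapper: undef when the file is missing or no links are found.
sub find_links_sketch {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or return undef;
    my @links = find_links_in_lines(<$fh>);
    close($fh);
    return @links ? @links : undef;
}

my @links = find_links_in_lines('<a href="myfile.html#part7">7</a>');
print "@links\n";
```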

   "find_opffile()"
       Attempts  to  locate  an  OPF  file, first by calling "get_container_rootfile()" to check the contents of
       "META-INF/container.xml", and then by looking for a single file with the extension ".opf" in the  current
       working directory.

       Returns the filename of the OPF file, or undef if nothing was found.

   "fix_datestring($datestring)"
       Takes  a  date  string  and  attempts  to  convert it to the limited subset of ISO8601 allowed by the OPF
       standard (YYYY, YYYY-MM, or YYYY-MM-DD).

       In the special case of MM/DD/YYYY, it assumes that the date was mangled by Mobipocket and, besides
       converting it, strips the day if the day is '01', and both the month and day if both are '01'.  This is
       because Mobipocket Creator enforces a complete MM/DD/YYYY even if the month and day aren't known, and it
       is common practice to use 01 for an unknown value.

       Arguments

       $datestring
           A date string in a format recognizable by Date::Manip

       Returns $fixeddate

       $fixeddate : the corrected string, or undef on failure
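
       A sketch of the Mobipocket special case only, assuming a strict two-digit MM/DD/YYYY input; the general
       conversion, which relies on Date::Manip, is not reproduced.  fix_mobi_date_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch of the MM/DD/YYYY special case only.
sub fix_mobi_date_sketch {
    my ($datestring) = @_;
    my ($month, $day, $year) = $datestring =~ m{^(\d{2})/(\d{2})/(\d{4})$}
        or return undef;                          # not MM/DD/YYYY
    return $year          if $month eq '01' && $day eq '01';   # both unknown
    return "$year-$month" if $day eq '01';                     # day unknown
    return "$year-$month-$day";
}

foreach my $date (qw(01/01/2005 07/01/2005 07/14/2005)) {
    my $fixed = fix_mobi_date_sketch($date);
    print "$date -> $fixed\n";
}
```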

   "get_container_rootfile($container)"
       Opens and parses an OPS/epub container, extracting the 'full-path' attribute of element 'rootfile'.

       Arguments

       $container
           The OPS container to parse.  Defaults to 'META-INF/container.xml'

       Return values

       Returns a string containing the rootfile on success, undef on failure.

   "hashvalue_key_self(\%hash, $modifier)"
       Takes  as  an argument a hash reference and an optional modifier and inserts a new key for every value in
       that hash if no such key already exists.

       If the modifier is set to 'lc' or 'uc', the value is lowercased or uppercased, respectively, before it is
       used as a key.

       Croaks if the first argument is not a hashref, or if an invalid modifier string is used.
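
       A minimal sketch of the described behavior.  hashvalue_key_self_sketch is a hypothetical name, and having
       each new key point back at the original (unmodified) value, as well as skipping undefined values, are
       assumptions.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for hashvalue_key_self().
sub hashvalue_key_self_sketch {
    my ($hashref, $modifier) = @_;
    croak('first argument must be a hash reference') unless ref($hashref) eq 'HASH';
    croak("invalid modifier '$modifier'")
        if defined $modifier && $modifier ne 'lc' && $modifier ne 'uc';

    # Copy the values first so the hash is not modified while iterating it.
    my @values = grep { defined } values %$hashref;
    foreach my $value (@values) {
        my $key = !defined $modifier ? $value
                : $modifier eq 'lc'  ? lc($value)
                :                      uc($value);
        # Existing keys are never overwritten.
        $hashref->{$key} = $value unless exists $hashref->{$key};
    }
    return;
}

my %languages = (ENG => 'English');
hashvalue_key_self_sketch(\%languages, 'lc');
print join(', ', sort keys %languages), "\n";
```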

   "hexstring($bindata)"
       Takes  as  an  argument a scalar containing a sequence of binary bytes.  Returns a string converting each
       octet of the data to its two-digit hexadecimal equivalent.  There is no leading "0x" on the string.
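
       One way to produce such a string in core Perl is unpack with the 'H*' template, which yields lowercase
       hex digits.  Whether the module's output is lowercase is not stated, so treat this as a sketch;
       hexstring_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: unpack 'H*' turns each byte into two hex digits
# (high nybble first), with no "0x" prefix.
sub hexstring_sketch {
    my ($bindata) = @_;
    return unpack('H*', $bindata);
}

my $hex = hexstring_sketch("\x00\xffAB");
print "$hex\n";    # prints 00ff4142
```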

   "print_memory($label)"
       Checks /proc/$PID/statm and prints a line to STDERR showing the current memory usage.  This is a
       debugging tool that will likely fail to do anything useful on a system without a Linux-compatible /proc
       filesystem.

       Arguments

       $label
           If defined, will be output along with the memory usage.

       Returns 1 on success, undef otherwise.
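
       A Linux-only sketch of reading the same file.  The first three fields of statm are total program size,
       resident set size, and shared pages, all counted in pages (see proc(5)); the output format and the name
       print_memory_sketch are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns 1 after printing, undef if statm is unreadable.
sub print_memory_sketch {
    my ($label) = @_;
    open(my $fh, '<', "/proc/$$/statm") or return undef;
    my $line = <$fh>;
    close($fh);
    return undef unless defined $line;
    my ($size, $resident, $shared) = split(' ', $line);
    my $prefix = defined $label ? "$label: " : '';
    print STDERR "${prefix}size=$size resident=$resident shared=$shared (pages)\n";
    return 1;
}

print_memory_sketch('after load');
```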

   "split_metadata($metahtmlfile, $metafile)"
       Takes a pseudo-HTML file containing one or more <metadata>...</metadata> blocks and splits out the
       metadata blocks into an XML file ready to be used as an OPF document.  The input HTML file is rewritten
       without the metadata.

       If $metafile (or the temporary HTML-only file created during the split) already exists, it will be  moved
       to filename.backup.

       Arguments

       $metahtmlfile
           The filename of the pseudo-HTML file

       $metafile (optional)
           The  filename to write out any extracted metadata.  If not specified, will default to the basename of
           $metahtmlfile with '.opf' appended.

       Returns the filename the metadata was written to, or undef if no metadata was found.
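
       An in-memory sketch of the extraction step, assuming the input has already been read into a string;
       file handling, the .backup renaming, and building the OPF document are omitted.  split_metadata_string
       is a hypothetical name, and a non-greedy regular expression is only one possible approach.

```perl
use strict;
use warnings;

# Hypothetical sketch: remove every <metadata>...</metadata> block and
# return (arrayref of blocks, remaining HTML).
sub split_metadata_string {
    my ($html) = @_;
    my @blocks;
    while ($html =~ s{(<metadata\b.*?</metadata\s*>)}{}is) {
        push @blocks, $1;
    }
    return (\@blocks, $html);
}

my ($blocks, $rest) = split_metadata_string(
    '<html><metadata><dc:title>T</dc:title></metadata><p>x</p></html>');
print scalar(@$blocks), " block(s); remaining: $rest\n";
```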

   "split_pre($htmlfile,$outfilebase)"
       Splits <pre>...</pre> blocks out of a source HTML file into their own separate HTML files, including
       required headers.  Each block will be written to its own file following the naming format
       "$outfilebase-###.html", where ### is a three-digit number beginning at 001 and incrementing for each
       block found.  If $outfilebase is not specified, it defaults to the basename of $htmlfile with
       "-pre-###.html" appended.

       Returns a list containing all filenames created.
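
       A sketch of the naming scheme alone; pre_filenames is a hypothetical helper.  It assumes the default base
       is the input filename with its extension removed plus "-pre", which yields names of the documented
       "-pre-###.html" form.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: the filenames split_pre() would use for $count blocks.
sub pre_filenames {
    my ($htmlfile, $count, $outfilebase) = @_;
    unless (defined $outfilebase) {
        # Assumption: strip directory and extension, then append "-pre".
        my ($name) = fileparse($htmlfile, qr/\.[^.]*/);
        $outfilebase = "$name-pre";
    }
    return map { sprintf('%s-%03d.html', $outfilebase, $_) } 1 .. $count;
}

print join("\n", pre_filenames('chapter.html', 3)), "\n";
```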

   "strip_script(%args)"
       Strips any <script>...</script> blocks out of an HTML file.

       Arguments

       "infile"
           Specifies the input file.  If not specified, the sub croaks.

       "outfile"
           Specifies the output file.  If not specified, it  defaults  to  "infile"  (i.e.  the  input  file  is
           overwritten).

       "noscript"
           If set to true, the sub will strip <noscript>...</noscript> blocks as well.
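
       A regular-expression sketch of the stripping rule on a string in memory.  strip_script_string is a
       hypothetical name; the module lists HTML::Parser among its dependencies and may work differently, and a
       regular expression can be fooled by unusual markup such as "</script>" inside a JavaScript string.

```perl
use strict;
use warnings;

# Hypothetical regex-based sketch; see the caveats above.
sub strip_script_string {
    my ($html, $noscript) = @_;
    $html =~ s{<script\b.*?</script\s*>}{}gis;
    $html =~ s{<noscript\b.*?</noscript\s*>}{}gis if $noscript;
    return $html;
}

my $kept = strip_script_string('<p>a</p><script>x()</script><noscript>n</noscript>', 0);
print "$kept\n";
```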

   "system_result($caller,$retval,@syscmd)"
       Checks the result of a system call and croaks with an appropriate message on failure.  For this to work,
       it MUST be used on the line immediately following the system command.

       Arguments

       $caller
           The calling function (used in output message)

       $retval
           The return value of the system command

       @syscmd
           The array passed to the system call

       Return Values

       Returns 0 on success

       Croaks on failure.
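
       A sketch of such a check, assuming $retval is the value returned by system(), which is the same wait
       status found in $?.  system_result_sketch and its messages are hypothetical; one likely reason for the
       "immediately following" rule is that $! and $? are overwritten by later system calls.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for system_result().
sub system_result_sketch {
    my ($caller, $retval, @syscmd) = @_;
    return 0 if $retval == 0;
    my $cmd = join(' ', @syscmd);
    # -1 means the command could not be started; $! holds the reason.
    croak("$caller: unable to run '$cmd': $!") if $retval == -1;
    # The low 7 bits of the wait status hold the signal number, if any.
    croak(sprintf("%s: '%s' died with signal %d", $caller, $cmd, $retval & 127))
        if $retval & 127;
    croak(sprintf("%s: '%s' exited with value %d", $caller, $cmd, $retval >> 8));
}

# Usage: check on the line right after system().
my @syscmd = ($^X, '-e', 'exit 0');
my $retval = system(@syscmd);
system_result_sketch('demo', $retval, @syscmd);
```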

   "system_tidy_xhtml($infile,$outfile)"
       Runs tidy on an XHTML file semi-safely (using a secondary file).

       Converts HTML to XHTML if necessary.

       Arguments

       $infile
           The filename to tidy

       $outfile
           The filename to use for tidy output if the safety condition to overwrite the input file isn't met.

           Defaults to "infile-tidy.ext" if not specified.

       Global variables used

       $tidycmd
           the location of the tidy executable

       $tidyxhtmlerrors
           the filename to use to output errors

       $tidysafety
           the safety factor to use (see CONFIGURABLE PACKAGE VARIABLES, above)

       Return Values

       Returns the return value from tidy

       0 - no errors
       1 - warnings only
       2 - errors
       Dies horribly if the return value is unexpected

   "system_tidy_xml($infile,$outfile)"
       Runs tidy on an XML file semi-safely (using a secondary file)

       Arguments

       $infile
           The filename to tidy

       $outfile (optional)
           The filename to use for tidy output if the safety condition to overwrite the input file isn't met.

           Defaults to "infile-tidy.ext" if not specified.

       Global variables used

       $tidycmd
           the name of the tidy executable

       $tidyxmlerrors
           the filename to use to output errors

       $tidysafety
           the safety factor to use (see CONFIGURABLE PACKAGE VARIABLES, above)

       Return values

       Returns the return value from tidy

       0 - no errors
       1 - warnings only
       2 - errors
       Dies horribly if the return value is unexpected
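
       A sketch of decoding tidy's exit status according to the 0/1/2 list above.  tidy_exit_code is a
       hypothetical helper, and the commented tidy command line uses standard HTML Tidy options but is an
       assumption, not necessarily what the module runs.

```perl
use strict;
use warnings;

# Meanings taken from the return-value list above.
my %tidy_meaning = (0 => 'no errors', 1 => 'warnings only', 2 => 'errors');

# Hypothetical helper: turn the wait status ($?) into tidy's exit code,
# dying on anything unexpected.
sub tidy_exit_code {
    my ($status) = @_;
    die "tidy could not be run\n" if $status == -1;
    my $code = $status >> 8;
    die "unexpected tidy exit status $status\n"
        if ($status & 127) || !exists $tidy_meaning{$code};
    return $code;
}

# Assumed invocation (-q quiet, -xml XML input, -f error file, -o output):
# system('tidy', '-q', '-xml', '-f', 'tidy-errors.txt', '-o', 'out.xml', 'in.xml');
# my $code = tidy_exit_code($?);
# print "tidy: $tidy_meaning{$code}\n";
```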

   "trim"
       Removes any whitespace characters from the beginning or end of every  string  in  @list  (also  works  on
       scalars).

        trim;               # trims $_ inplace
        $new = trim;        # trims (and returns) a copy of $_
        trim $str;          # trims $str inplace
        $new = trim $str;   # trims (and returns) a copy of $str
        trim @list;         # trims @list inplace
        @new = trim @list;  # trims (and returns) a copy of @list

       This was shamelessly copied from japhy's example at perlmonks.org:

       http://www.perlmonks.org/?node_id=36684

       If needed for large lists, it would probably be better to use String::Strip.
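
       The in-place versus copy behavior relies on Perl's wantarray and the aliasing of @_.  The following is a
       sketch along the lines of that technique (trim_sketch is a hypothetical name), not a verbatim copy of
       either the module's code or the perlmonks post.

```perl
use strict;
use warnings;

# Hypothetical sketch of the wantarray technique.
sub trim_sketch {
    @_ = ($_) if !@_ and defined wantarray;  # no args, value wanted: copy $_
    @_ = @_ if defined wantarray;            # value wanted: break aliasing
    # In void context @_ still aliases the caller's variables (or, with
    # no arguments, the loop aliases $_ itself), so they change in place.
    for (@_ ? @_ : $_) {
        s/^\s+//;
        s/\s+$//;
    }
    return unless defined wantarray;
    return wantarray ? @_ : $_[0];
}

my $str  = "  padded  ";
my $copy = trim_sketch($str);    # $copy is "padded", $str unchanged
trim_sketch($str);               # $str is now "padded"
```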

   "twigelt_create_uuid($gi)"
       Creates an unlinked element with the specified gi (tag), and then assigns it the id and scheme attributes
       'UUID'.

       Arguments

       $gi : The gi (tag) to use for the element
           Default: 'dc:identifier'

       Returns the element.

   "twigelt_detect_duplicate($element1, $element2)"
       Takes  two  twig elements and returns 1 if they have the same GI (tag), text, and attributes, but are not
       actually the same element.  The GI comparison is case-insensitive.  The others are case-sensitive.

       Returns 0 otherwise.

       Croaks if passed anything but twig elements.

   "twigelt_fix_oeb12_atts($element)"
       Checks the attributes in a twig element to see if they match OPF names with an opf: namespace, and if so,
       removes the namespace.  Used by the fix_oeb12() method.

       Takes as a sole argument a twig element.

       Returns that element with the modified attributes, or undef if the  element  didn't  exist.   Returns  an
       unmodified element if both att and opf:att exist.

   "twigelt_fix_opf20_atts($element)"
       Checks  the  attributes  in  a  twig  element to see if they match OPF names, and if so, prepends the OPF
       namespace.  Used by the fix_opf20() method.

       Takes as a sole argument a twig element.

       Returns that element with the modified attributes, or undef if the element didn't exist.

   "twigelt_is_author($element)"
       Takes as an argument a twig element.  Returns true if the element is a dc:creator (case-insensitive) with
       either an opf:role="aut" or a role="aut" attribute defined.  Returns undef otherwise, and also if the
       element has no text.

       Croaks if fed no argument, or fed an argument that isn't a twig element.

       Intended to be used as a twig search condition.

       Example

        my @elements = $ebook->twigroot->descendants(\&twigelt_is_author);

   "twigelt_is_isbn($element)"
       Takes  as  an argument a twig element.  Returns true if the element is a dc:identifier (case-insensitive)
       where any of the id, opf:scheme, or scheme attributes start with 'isbn', '-isbn',  'eisbn',  or  'e-isbn'
       (again case-insensitive).

       Returns undef otherwise, and also if the element has no text.

       Croaks if fed no argument, or fed an argument that isn't a twig element.

       Intended to be used as a twig search condition.

       Example

        my @elements = $ebook->twigroot->descendants(\&twigelt_is_isbn);
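
       The ISBN test can be restated on plain data for illustration.  isbn_condition is a hypothetical predicate
       that takes a tag name, the element text, and an attribute hash instead of an XML::Twig element; the
       regular expression is one way to express the four documented prefixes.

```perl
use strict;
use warnings;

# Hypothetical predicate mirroring the twigelt_is_isbn() condition.
sub isbn_condition {
    my ($gi, $text, $atts) = @_;
    return undef unless lc($gi) eq 'dc:identifier';
    return undef unless defined $text && length $text;
    foreach my $att ('id', 'opf:scheme', 'scheme') {
        my $value = $atts->{$att};
        # Starts with isbn, -isbn, eisbn or e-isbn, in any case.
        return 1 if defined $value && $value =~ /^(?:e-?|-)?isbn/i;
    }
    return undef;
}

my $ok = isbn_condition('dc:identifier', '9780000000000', { 'opf:scheme' => 'ISBN' });
print $ok ? "ISBN identifier\n" : "not an ISBN identifier\n";
```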

   "twigelt_is_knownuid($element)"
       Takes  as  an argument a twig element.  Returns true if the element is a dc:identifier (case-insensitive)
       element with an "id" attribute matching the known  IDs  of  proper  unique  identifiers  suitable  for  a
       package-id (also case-insensitive).  Returns undef otherwise.

       Croaks if fed no argument, or fed an argument that isn't a twig element.

       Intended to be used as a twig search condition.

       Example

        my @elements = $ebook->twigroot->descendants(\&twigelt_is_knownuid);

   "usedir($dir)"
       Changes the current working directory to the one specified, creating it if necessary.

       Returns  the  current  working  directory  before  the change.  If no directory is specified, returns the
       current working directory without changing anything.

       Croaks on any failure.
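
       A minimal sketch using the core modules Cwd and File::Path.  usedir_sketch is a hypothetical name, and
       the error messages are assumptions.

```perl
use strict;
use warnings;
use Carp;
use Cwd qw(getcwd);
use File::Path qw(make_path);

# Hypothetical stand-in for usedir(): returns the previous working directory.
sub usedir_sketch {
    my ($dir) = @_;
    my $cwd = getcwd();
    croak('unable to determine the current directory') unless defined $cwd;
    return $cwd unless defined $dir;    # no argument: report only

    make_path($dir) unless -d $dir;     # create intermediate directories too
    croak("unable to create directory '$dir'") unless -d $dir;
    chdir($dir) or croak("unable to change to directory '$dir': $!");
    return $cwd;
}

# Typical pattern (paths are hypothetical):
# my $previous = usedir_sketch('build/epub');
# ... write files ...
# usedir_sketch($previous);
```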

   "userconfigdir()"
       Returns the directory in which user configuration files and helper programs are expected to be found,
       creating that directory if it does not exist.  Typically, this directory is "$ENV{HOME}/.ebooktools", but
       on MSWin32 systems, if that directory does not already exist, "$ENV{USERPROFILE}/ApplicationData/EBook-Tools"
       is returned (and potentially created) instead.

       If $ENV{HOME} (and $ENV{USERPROFILE} on MSWin32) are either not set or do not point to a  directory,  the
       sub returns undef.

   "ymd_validate($year,$month,$day)"
       Checks that month and day have valid values.  Returns the passed values if they are valid, or three
       undefs if not.  Testing of month or day can be skipped by passing undef in that position.
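
       A sketch of the validation.  ymd_validate_sketch is a hypothetical name, and checking the day against the
       actual length of the month (Gregorian leap years, with February 29 allowed when the year is unknown) is
       an assumption about what "valid" means here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ymd_validate().
sub ymd_validate_sketch {
    my ($year, $month, $day) = @_;
    my @invalid = (undef, undef, undef);

    if (defined $month) {
        return @invalid if $month < 1 || $month > 12;
    }
    if (defined $day) {
        my @mdays = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
        my $max = defined $month ? $mdays[$month - 1] : 31;
        # Allow Feb 29 in leap years, or when the year is unknown.
        if (defined $month && $month == 2) {
            $max = 29 if !defined $year
                || ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
        }
        return @invalid if $day < 1 || $day > $max;
    }
    return ($year, $month, $day);
}

my @checked = ymd_validate_sketch(2000, 2, 29);
print join('-', @checked), "\n";
```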

BUGS AND LIMITATIONS

       •   fix_links() could be improved to download remote URIs instead of ignoring them.

       •   fix_links() needs to check the <reference> links under <guide>

       •   fix_links()  needs  to  be  redone  with  HTML::TreeBuilder  or  Mojo::DOM to avoid the weakness with
           newlines between attribute names and values

       •   Need to implement fix_tours() that should collect the related elements and delete the parent if  none
           are found.  Empty <tours> elements aren't allowed.

       •   fix_languages() needs to convert language names into IANA language codes.

       •   set_language() should add a warning if the text isn't a valid IANA language code.

       •   NCX  generation  only  generates  from  the  spine.  It should be possible to use a TOC html file for
           generation instead.  In the long term, it should be possible to generate one  from  the  headers  and
           anchors in arbitrary HTML files.

       •   It  might be better to use sysread / index / substr / syswrite in &split_metadata to handle the split
           in 10k chunks, to avoid massive memory usage on large files.

           This may not be worth the effort, since the average size for most books is less than  500k,  and  the
           largest books are rarely over 10M.

       •   The  only  generator is currently for .epub books.  PDF, PalmDoc, Mobipocket, Plucker, and iSiloX are
           eventually planned.

       •   Although I like keeping warnings associated with  the  ebook  object,  it  may  be  better  to  throw
           exceptions on errors and catch them later.  This probably won't be implemented until it bites someone
           who complains, though.

       •   Unit tests are incomplete

AUTHOR

       Zed Pobre <zed@debian.org>

LICENSE AND COPYRIGHT

       Copyright 2008-2013 Zed Pobre

       Licensed to the public under the terms of the GNU GPL, version 2

perl v5.24.1                                       2017-05-22                                  EBook::Tools(3pm)