Ubuntu Manpage: EBook::Tools::Mobipocket - Palm::PDB handler for manipulating the Mobipocket format.

Provided by: libebook-tools-perl_0.5.4-1.3_amd64

NAME

       EBook::Tools::Mobipocket - Palm::PDB handler for manipulating the Mobipocket format.

SYNOPSIS

        use EBook::Tools::Mobipocket qw(:all);
        my $mobi = EBook::Tools::Mobipocket->new();
        $mobi->Load('filename.prc');
        print "Title: ",$mobi->{title},"\n";
        print "Author: ",$mobi->{header}{exth}{author},"\n";
        print "Language: ",$mobi->{header}{mobi}{language},"\n";

        my $mobigen = find_mobigen();
        system_mobigen('myfile.opf');

DEPENDENCIES

       •   "Bit::Vector"

       •   "Compress::Zlib"

       •   "HTML::Tree"

       •   "Image::Size"

       •   "List::MoreUtils"

       •   "P5-Palm"

       •   "String::CRC32"

CONSTRUCTOR

   "new()"
       Instantiates a new EBook::Tools::Mobipocket object.

ACCESSOR METHODS

   "drm()"
       Returns  1  if the "drmoffset" header value is neither 0 nor 0xffffffff.  Returns undef if "drmoffset" is
       undefined. Returns 0 otherwise.
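
       The return rule above can be sketched as a standalone function.  This is an illustration only, not the
       module's source: the name drm_status() is hypothetical, and it takes the "drmoffset" value as an argument
       instead of reading it from the object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EBook::Tools) that mirrors the
# documented drm() return values for a given "drmoffset" header value.
sub drm_status {
    my ($drmoffset) = @_;
    return undef unless defined $drmoffset;   # header value missing
    return 0 if $drmoffset == 0;              # documented as 0
    return 0 if $drmoffset == 0xffffffff;     # documented as 0
    return 1;                                 # any other offset
}

for my $offset (undef, 0, 0xffffffff, 0x1234) {
    my $status = drm_status($offset);
    printf "%-10s => %s\n",
        defined $offset ? sprintf('0x%x', $offset) : 'undef',
        defined $status ? $status : 'undef';
}
```

       Callers that only test for truth should note that both 0 and undef are false, so "defined" is needed to
       tell "no DRM" apart from "no header value".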

   "text()"
       Returns the text of the file.

   "write_images()"
       Writes each image record to the disk.

       Returns the number of images written.

   "write_text($filename)"
       Writes the book text to disk with the given filename.  This filename must match  the  filename  given  to
       "fix_html()" for the internal links to be consistent.

       Croaks if $filename is not specified.

       Returns 1 on success, or undef if there was no text to write.

   "write_unknown_records()"
       Writes each unidentified record to disk with a filename in the format of 'raw-record-####', where #### is
       the record number (not the record ID).

       Returns the number of records written.

MODIFIER METHODS

       These  methods  have  two naming/capitalization schemes -- methods directly related to the subclassing of
       Palm::PDB use its MethodName capitalization style.  Any other methods are lowercase_with_underscores  for
       consistency with the rest of EBook::Tools.

   "Load($filename)"
       Sets   "$self->{filename}"   and  then  loads  and  parses  the  file  specified  by  $filename,  calling
       "ParseRecord(%record)" on every record found.

       If DictionaryHuffman compression is detected, text records will be left untouched during the  ParseRecord
       pass,  and  "uncompress_dictionaryhuffman_records()"  will  be  called  after the initial parsing pass is
       complete.

   "ParseRecord(%record)"
       Parses PDB records, updating the object  attributes.   This  method  is  called  automatically  on  every
       database record during "Load()".

   "ParseRecord0($data)"
       Parses  the  header  record  and  places  the parsed values into the hashref "$self->{header}{palm}", the
       hashref  "$self->{header}{mobi}",  and  "$self->{header}{exth}"  by   calling   "parse_palmdoc_header()",
       "parse_mobi_header()", and "parse_mobi_exth()" respectively.

   "ParseRecordCDIC(\$data)"
       Parses a CDIC record.  Takes as a sole argument a reference to the data of the record.

       Record format

       •   Offset 0: Record identifier

           4 bytes, always 'CDIC'

       •   Offset 4: Header length

           4 bytes, big-endian long int, always = 16

       •   Offset 8: Index count

           4  bytes,  big-endian  long  int, marks the number of big-endian short ints immediately following the
           header used as index points into the dictionary data

       •   Offset 12: Codelength

           4 bytes, big-endian long int, number of code bits

       •   Offset 16: Indexes

           A number of big-endian short ints used as index points into the dictionary data

       •   Offset ??: Dictionary data

           Dictionary result strings immediately following the indexes
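
       The layout above maps directly onto Perl's unpack().  The following is a minimal sketch of that mapping,
       not the module's own parser: parse_cdic_sketch() and its hash keys are hypothetical names, and the sample
       record is synthetic with arbitrary index values.

```perl
use strict;
use warnings;

# Hypothetical helper: split a CDIC record into the documented fields.
sub parse_cdic_sketch {
    my ($dataref) = @_;
    my ($id, $headerlen, $indexcount, $codelength)
        = unpack('a4 N N N', $$dataref);
    die "Not a CDIC record\n" unless $id eq 'CDIC';

    # Index points: $indexcount big-endian shorts right after the header
    my @indexes = unpack("x${headerlen} n${indexcount}", $$dataref);

    # Dictionary data: everything after the last index
    my $dictionary = substr($$dataref, $headerlen + 2 * $indexcount);

    return {
        headerlength => $headerlen,
        indexcount   => $indexcount,
        codelength   => $codelength,
        indexes      => \@indexes,
        dictionary   => $dictionary,
    };
}

# Synthetic record: header, two arbitrary index values, then filler data.
my $cdic = pack('a4 N N N n2', 'CDIC', 16, 2, 3, 4, 9) . 'dictdata';
my $parsed = parse_cdic_sketch(\$cdic);
printf "indexes: %s, %d bytes of dictionary data\n",
    join(',', @{ $parsed->{indexes} }), length $parsed->{dictionary};
```

       The sketch skips the header using the stored header length instead of a fixed 16, so it would still line
       up if a record ever carried a longer header.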

   "ParseRecordHUFF(\$data)"
       Parses a HUFF record.  Takes as a sole argument a reference to the data of the record.

       Record format

       •   Offset 0: Record identifier

           4 bytes, always 'HUFF'

       •   Offset 4: Header length

           4 bytes, big-endian long int, always = 24

       •   Offset 8: Cache table (big-endian) offset

           4 bytes, big-endian long int, always = 24

       •   Offset 12: Base table (big-endian) offset

           4 bytes, big-endian long int, always = 1048

       •   Offset 16: Cache table (little-endian) offset

           4 bytes, big-endian long int, always = 1304

       •   Offset 20: Base table (little-endian) offset

           4 bytes, big-endian long int, always = 2328

       •   Offset 24: Cache table (big-endian)

           1024 bytes, 256 big-endian long ints

           This is a look up table for the length and decoding of short codewords.  If the codeword  represented
           by  the 8 bits is unique, then bit 7 (0x80) will be set, and the low 5 bits are the length in bits of
           the code.  The high three bytes partially represent the final symbol.

           If bit 7 is clear, then the code is looked up in the base table.

       •   Offset 1048: Base table (big-endian)

           256 bytes, 64 big-endian long ints

           This is where the codeword is looked up if it isn't found in the cache table.

       •   Offset 1304: Cache table (little-endian)

           1024 bytes, 256 little-endian long ints.

           This contains exactly the same data as in the cache table at offset 24, except that all of the values
           are stored in little-endian format instead of big-endian.

           Presumably this is for a speed advantage on slow little-endian processors.  This module uses only the
           big-endian tables.

       •   Offset 2328: Base table (little-endian)

           256 bytes, 64 little-endian long ints

           This contains exactly the same data as in the base table at offset  1048,  except  that  all  of  the
           values are stored in little-endian format instead of big-endian.

           Presumably this is for a speed advantage on slow little-endian processors.  This module uses only the
           big-endian tables.
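
       As an illustration of the layout above, the header and the big-endian tables can be read with unpack(),
       and a cache table entry can be split into its documented bit fields.  This is a sketch under the stated
       layout only: parse_huff_sketch() and decode_cache_entry() are hypothetical names, the table contents are
       made up, and the full codeword decoding via the base table is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: read the 24-byte HUFF header and the two
# big-endian tables described above.
sub parse_huff_sketch {
    my ($dataref) = @_;
    my ($id, $headerlen, $cache_be, $base_be, $cache_le, $base_le)
        = unpack('a4 N5', $$dataref);
    die "Not a HUFF record\n" unless $id eq 'HUFF';

    my @cache = unpack("x${cache_be} N256", $$dataref);   # 1024 bytes
    my @base  = unpack("x${base_be} N64",  $$dataref);    # 256 bytes
    return { headerlength => $headerlen, cache => \@cache, base => \@base };
}

# Split one cache table entry into the documented bit fields.
sub decode_cache_entry {
    my ($entry) = @_;
    return {
        unique     => ($entry & 0x80) ? 1 : 0,   # bit 7
        codelength => $entry & 0x1f,             # low 5 bits
        value      => $entry >> 8,               # high three bytes
    };
}

# Synthetic record with made-up table contents.
my @cache = (0) x 256;
$cache[0x41] = (0x123456 << 8) | 0x80 | 5;
my @base = (0) x 64;
my $huff = pack('a4 N5', 'HUFF', 24, 24, 1048, 1304, 2328)
         . pack('N256', @cache) . pack('N64', @base)
         . pack('V256', @cache) . pack('V64', @base);

my $tables = parse_huff_sketch(\$huff);
my $entry  = decode_cache_entry($tables->{cache}[0x41]);
printf "unique=%d codelength=%d value=0x%06x\n",
    @{$entry}{qw(unique codelength value)};
```

       Like the module, the sketch reads only the big-endian tables; the little-endian copies are packed into
       the sample record solely so that the offsets match the documented values.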

   "ParseRecordImage(\$dataref)"
       Parses  image  records,  updating  object  attributes,  most  notably  adding  the image data to the hash
       "$self->{imagedata}",  adding  the  image  filename   to   "$self->{recindexlinks}",   and   incrementing
       "$self->{recindex}".

       Takes as an argument a reference to the record data.  Croaks if it isn't provided, or isn't a reference.

       This is called automatically by "ParseRecord()" and "ParseResource()" as needed.

   "ParseRecordText(\$dataref)"
       Parses  text  records, updating object attributes, most notably appending text to "$self->{text}".  Takes
       as an argument a reference to the record data.

       This is called automatically by "ParseRecord()" and "ParseResource()" as needed.

   "fix_html(%args)"
       Takes raw Mobipocket text and replaces the custom tags and file position anchors.

       Arguments

       •   "filename"

           The name of the output HTML file (used in generating hrefs).  The procedure croaks  if  this  is  not
           supplied.

       •   "nonewlines" (optional)

           If this is set to true, the procedure will not attempt to insert newlines for readability.  This will
           leave  the output in a single unreadable line, but has the advantage of reducing the processing time,
           especially useful if tidy is going to be run on the output anyway.

   "fix_html_filepos()"
       Takes the raw HTML text of the object and replaces the filepos anchors.  This has to be called before any
       other action that modifies the text, or the filepos positions will not be valid.

       Returns 1 if successful, undef if there was no text to fix.

       This is called automatically by "fix_html()".

   "uncompress_dictionaryhuffman_records()"
       Uncompresses all text records using "uncompress_dictionaryhuffman()".  This destroys the existing
       contents of "$self->{text}", if any.

       This method is called automatically at the end of "Load()" if DictionaryHuffman encoding is detected.

PROCEDURES

       All procedures are exportable, but none are exported by default.  All procedures can be exported by using
       the ":all" tag.

   "find_mobidedrm()"
       Attempts to locate a copy of the MobiDeDrm script by searching PATH and looking in the EBook::Tools user
       configuration directory (see "userconfigdir()" in EBook::Tools).

       Returns the complete path to the script, or undef if nothing was found.

       This will use package variable $mobidedrm_cmd as its first guess, and set that variable to the return
       value as well.

   "find_mobigen()"
       Attempts to locate the mobigen executable by making a test execution on predicted locations (including
       just checking PATH) and looking in the EBook::Tools user configuration directory (see "userconfigdir()"
       in EBook::Tools).

       Returns the system command used for a successful invocation, or undef if nothing worked.

       This will use package variable $mobigen_cmd as its first guess, and set that variable to the return value
       as well.

   "parse_mobi_exth($headerdata)"
       Takes as an argument a scalar containing the variable-length Mobipocket EXTH data from the first record.
       Returns an array of hashes, each hash containing the data from one EXTH record with values from that data
       keyed to recognizable names.

       If $headerdata doesn't appear to be an EXTH header, carps a warning and returns an empty list.

       See:

       http://wiki.mobileread.com/wiki/MOBI

       Hash keys

       •   "type"

           A numeric value indicating the type of EXTH data in the record.  See package variable %exthtypes.

       •   "length"

           The length of the "data" value in bytes.

       •   "data"

           The data of the record.
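
       The shape of the returned list can be illustrated with a small standalone parser.  This is a sketch, not
       the module's implementation: parse_exth_sketch() is a hypothetical name, and the on-disk layout it assumes
       (a 12-byte header of 'EXTH', header length, and record count, followed by records whose stored length
       includes their own 8-byte type/length prefix) is taken from the MobileRead wiki page cited above.  Unlike
       the real procedure, it does not map type codes through %exthtypes.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper: walk the EXTH records and return a list of
# hashes with the documented "type", "length", and "data" keys.
sub parse_exth_sketch {
    my ($headerdata) = @_;
    unless (length($headerdata) >= 12 && substr($headerdata, 0, 4) eq 'EXTH') {
        carp "Data does not look like an EXTH header";
        return;
    }
    my (undef, $headerlen, $count) = unpack('a4 N N', $headerdata);

    my @records;
    my $offset = 12;    # first record follows the fixed header
    for (1 .. $count) {
        last if $offset + 8 > length $headerdata;
        my ($type, $reclen) = unpack("x${offset} N N", $headerdata);
        last if $reclen < 8;    # malformed record; stop walking
        my $data = substr($headerdata, $offset + 8, $reclen - 8);
        push @records,
            { type => $type, length => length($data), data => $data };
        $offset += $reclen;
    }
    return @records;
}

# Synthetic EXTH block with two records and arbitrary type codes.
my $body = pack('N N a*', 100, 8 + 8, 'Jane Doe')
         . pack('N N a*', 101, 8 + 4, 'ACME');
my $exth = pack('a4 N N', 'EXTH', 12 + length($body), 2) . $body;

printf "type %d (%d bytes): %s\n", @{$_}{qw(type length data)}
    for parse_exth_sketch($exth);
```

       The "length" key here is the length of the "data" value alone, matching the description above, not the
       stored record length that includes the 8-byte prefix.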

   "parse_mobi_header($headerdata)"
       Takes as an argument a scalar containing the variable-length Mobipocket-specific header data from the
       first record.  Returns a hash containing values from that data keyed to recognizable names.

       See:

       http://wiki.mobileread.com/wiki/MOBI

       keys

       The returned hash will have the following keys (documented in the order in which they are encountered in
       the header):

       "identifier"
           This should always be the string 'MOBI'.  If it isn't, the procedure croaks.

       "headerlength"
           This is the size of the complete header.  If this value is different from the length of the argument,
           the procedure croaks.

       "type"
           A numeric code indicating what category of Mobipocket file this is.

       "encoding"
           A numeric code representing the encoding.  Expected values are '1252' (for Windows-1252) and '65001'
           (for UTF-8).

           The procedure carps a warning if an unexpected value is encountered.

       "uniqueid"
           This is thought to be a unique ID for the book, but its actual use is unknown.