Ubuntu Manpage: CAM::PDF - PDF manipulation library

NAME

       CAM::PDF - PDF manipulation library

LICENSE

       Copyright 2002-2006 Clotho Advanced Media, Inc., <http://www.clotho.com/>

       Copyright 2007-2008 Chris Dolan

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl
       itself.

SYNOPSIS

           use CAM::PDF;

           my $pdf = CAM::PDF->new('test1.pdf');

           my $page1 = $pdf->getPageContent(1);
           [ ... mess with page ... ]
           $pdf->setPageContent(1, $page1);
           [ ... create some new content ... ]
           $pdf->appendPageContent(1, $newcontent);

           my $anotherpdf = CAM::PDF->new('test2.pdf');
           $pdf->appendPDF($anotherpdf);

           my @prefs = $pdf->getPrefs();
           $prefs[$CAM::PDF::PREF_OPASS] = 'mypassword';
           $prefs[$CAM::PDF::PREF_UPASS] = 'mypassword';
           $pdf->setPrefs(@prefs);

           $pdf->cleanoutput('out1.pdf');
           print $pdf->toPDF();

       Many example programs are included in this distribution to do useful tasks.  See the "bin" subdirectory.

DESCRIPTION

       This package reads and writes any document that conforms to the PDF specification generously provided by
       Adobe at <http://partners.adobe.com/public/developer/pdf/index_reference.html> (link last checked Oct
       2005).

       The file format through PDF 1.5 is well-supported, with the exception of the "linearized" or "optimized"
       output format, which this module can read but not write.  Many specific aspects of the document model are
       not manipulable with this package (like fonts), but if the input document is correctly written, then this
       module will preserve the model integrity.

       The PDF writing feature saves as PDF 1.4-compatible.  That means that we cannot write compressed object
       streams.  The consequence is that reading and then writing a PDF 1.5+ document may enlarge the resulting
       file by a fair margin.

       This library grants you some power over the PDF security model.  Note that applications editing PDF
       documents via this library MUST respect the security preferences of the document.  Any violation of this
       respect is contrary to Adobe's intellectual property position, as stated in the reference manual at the
       above URL.

       Technical detail regarding corrupt PDFs: This library adheres strictly to the PDF specification.  Adobe's
       Acrobat Reader is more lenient, allowing some corrupted PDFs to be viewable.  Therefore, it is possible
       that some PDFs may be readable by Acrobat that are illegible to this library.  In particular, files which
       have had line endings converted to or from DOS/Windows style (i.e. CR-LF) may be rendered unusable even
       though Acrobat does not complain.  Future library versions may relax the parser, but not yet.

API

   Functions intended to be used externally
        $self = CAM::PDF->new(content | filename | '-')
        $self->toPDF()
        $self->needsSave()
        $self->save()
        $self->cleansave()
        $self->output(filename | '-')
        $self->cleanoutput(filename | '-')
        $self->previousRevision()
        $self->allRevisions()
        $self->preserveOrder()
        $self->appendObject(olddoc, oldnum, [follow=(1|0)])
        $self->replaceObject(newnum, olddoc, oldnum, [follow=(1|0)])
           (olddoc can be undef in the above for adding new objects)
        $self->numPages()
        $self->getPageText(pagenum)
        $self->getPageDimensions(pagenum)
        $self->getPageContent(pagenum)
        $self->setPageContent(pagenum, content)
        $self->appendPageContent(pagenum, content)
        $self->deletePage(pagenum)
        $self->deletePages(pagenum, pagenum, ...)
        $self->extractPages(pagenum, pagenum, ...)
        $self->appendPDF(CAM::PDF object)
        $self->prependPDF(CAM::PDF object)
        $self->wrapString(string, width, fontsize, page, fontlabel)
        $self->getFontNames(pagenum)
        $self->addFont(page, fontname, fontlabel, [fontmetrics])
        $self->deEmbedFont(page, fontname, [newfontname])
        $self->deEmbedFontByBaseName(page, basename, [newfont])
        $self->getPrefs()
        $self->setPrefs()
        $self->canPrint()
        $self->canModify()
        $self->canCopy()
        $self->canAdd()
        $self->getFormFieldList()
        $self->fillFormFields(fieldname, value, [fieldname, value, ...])
          or $self->fillFormFields(%values)
        $self->clearFormFieldTriggers(fieldname, fieldname, ...)

       Note: 'clean' as in cleansave() and cleanoutput() means write a fresh PDF document.  The alternative
       (e.g. save()) reuses the existing doc and just appends to it.  Also note that 'clean' functions sort the
       objects numerically.  If you prefer that the new PDF docs more closely resemble the old ones, call
       preserveOrder() before cleansave() or cleanoutput().

   Slightly less external, but useful, functions
        $self->toString()
        $self->getPage(pagenum)
        $self->getFont(pagenum, fontname)
        $self->getFonts(pagenum)
        $self->getStringWidth(fontdict, string)
        $self->getFormField(fieldname)
        $self->getFormFieldDict(object)
        $self->isLinearized()
        $self->decodeObject(objectnum)
        $self->decodeAll(any-node)
        $self->decodeOne(dict-node)
        $self->encodeObject(objectnum, filter)
        $self->encodeOne(any-node, filter)
        $self->changeString(obj-node, hashref)

   Deeper utilities
        $self->pageAddName(pagenum, name, objectnum)
        $self->getPageObjnum(pagenum)
        $self->getPropertyNames(pagenum)
        $self->getProperty(pagenum, propname)
        $self->getValue(any-node)
        $self->dereference(objectnum)  or $self->dereference(name,pagenum)
        $self->deleteObject(objectnum)
        $self->copyObject(obj-node)
        $self->cacheObjects()
        $self->setObjNum(obj-node, num)
        $self->getRefList(obj-node)
        $self->changeRefKeys(obj-node, hashref)

   More rarely needed utilities
        $self->getObjValue(objectnum)

   Routines that should not be called
        $self->_startdoc()
        $self->delinearize()
        $self->build*()
        $self->parse*()
        $self->write*()
        $self->*CB()
        $self->traverse()
        $self->fixDecode()
        $self->abbrevInlineImage()
        $self->unabbrevInlineImage()
        $self->cleanse()
        $self->clean()
        $self->createID()

FUNCTIONS

   Object creation/manipulation
       $doc->new($package, $content)
       $doc->new($package, $content, $ownerpass, $userpass)
       $doc->new($package, $content, $ownerpass, $userpass, $prompt)
       $doc->new($package, $content, $ownerpass, $userpass, $options)
           Instantiate  a new CAM::PDF object.  $content can be a document in a string, a filename, or '-'.  The
           latter indicates that the document should be read from standard input.  If the document  is  password
           protected,  the passwords should be passed as additional arguments.  If they are not known, a boolean
           $prompt argument allows the programmer to  suggest  that  the  constructor  prompt  the  user  for  a
           password.  This is rudimentary prompting: passwords are in the clear on the console.

           This  constructor  takes an optional final argument which is a hash reference.  This hash can contain
           any of the following optional parameters:

           prompt_for_password => $boolean
               This is the same as the $prompt argument described above.

           fault_tolerant => $boolean
               This flag causes the instance to be more lenient when reading the  input  PDF.   Currently,  this
               only affects PDFs which cannot be successfully decrypted.
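
           As a minimal sketch of passing the options hash, the helper below opens a possibly encrypted
           document without console prompting.  The helper name open_pdf and the file name in the usage
           note are hypothetical, and the sketch assumes that new() returns a false value when the document
           cannot be opened.

```perl
use strict;
use warnings;

# Hypothetical helper: open a possibly encrypted PDF without prompting.
# $class is normally 'CAM::PDF'; it is a parameter so that the sketch
# can be exercised with a stand-in class.
sub open_pdf {
    my ($class, $source, $ownerpass, $userpass) = @_;

    my %options = (
        prompt_for_password => 0,   # never prompt on the console
        fault_tolerant      => 1,   # be lenient about undecryptable input
    );

    # Assumption: new() returns a false value on failure.
    my $doc = $class->new($source, $ownerpass, $userpass, \%options)
        or die "Could not open PDF\n";
    return $doc;
}
```

           With the module loaded, this might be called as open_pdf('CAM::PDF', 'test1.pdf', $ownerpass,
           $userpass).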

       $doc->toPDF()
           Serializes the data structure as a PDF document stream and returns it as a scalar.

       $doc->toString()
           Returns a serialized representation of the data structure.  Implemented via Data::Dumper.

   Document reading
       (all of these functions are intended for internal use only)

       $doc->getRootDict()
           Returns the Root dictionary for the PDF.

       $doc->getPagesDict()
           Returns the root Pages dictionary for the PDF.

       $doc->parseObj($string)
           Use parseAny() instead of this, if possible.

           Given  a  fragment  of PDF page content, parse it and return an object Node.  This can be called as a
           class method in most circumstances, but is intended as an instance method.

       $doc->parseInlineImage($string)
       $doc->parseInlineImage($string, $objnum)
       $doc->parseInlineImage($string, $objnum, $gennum)
           Given a fragment of PDF page content, parse it and return an object Node.  This can be  called  as  a
           class method in some cases, but is intended as an instance method.

       $doc->writeInlineImage($objectnode)
           This is the inverse of parseInlineImage(), intended for use only in the CAM::PDF::Content class.

       $doc->parseStream($string, $objnum, $gennum, $dictnode)
           This should only be used by parseObj(), or other specialized cases.

           Given  a  fragment  of  PDF page content, parse it and return a stream Node.  This can be called as a
           class method in most circumstances, but is intended as an instance method.

           The dictionary Node argument is typically the body of the object Node that precedes this stream.

       $doc->parseDict($string)
       $doc->parseDict($string, $objnum)
       $doc->parseDict($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given a fragment of PDF page content, parse it and return a dictionary Node.  This can be called as
           a class method in most circumstances, but is intended as an instance method.

       $doc->parseArray($string)
       $doc->parseArray($string, $objnum)
       $doc->parseArray($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given  a  fragment  of  PDF page content, parse it and return an array Node.  This can be called as a
           class or instance method.

       $doc->parseLabel($string)
       $doc->parseLabel($string, $objnum)
       $doc->parseLabel($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given a fragment of PDF page content, parse it and return a label Node.  This  can  be  called  as  a
           class or instance method.

       $doc->parseRef($string)
       $doc->parseRef($string, $objnum)
       $doc->parseRef($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given  a fragment of PDF page content, parse it and return a reference Node.  This can be called as a
           class or instance method.

       $doc->parseNum($string)
       $doc->parseNum($string, $objnum)
       $doc->parseNum($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given a fragment of PDF page content, parse it and return a number Node.  This can  be  called  as  a
           class or instance method.

       $doc->parseString($string)
       $doc->parseString($string, $objnum)
       $doc->parseString($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given  a  fragment  of  PDF page content, parse it and return a string Node.  This can be called as a
           class or instance method.

       $doc->parseHexString($string)
       $doc->parseHexString($string, $objnum)
       $doc->parseHexString($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given a fragment of PDF page content, parse it and return a hex string Node.  This can be called as a
           class or instance method.

       $doc->parseBoolean($string)
       $doc->parseBoolean($string, $objnum)
       $doc->parseBoolean($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given a fragment of PDF page content, parse it and return a boolean Node.  This can be  called  as  a
           class or instance method.

       $doc->parseNull($string)
       $doc->parseNull($string, $objnum)
       $doc->parseNull($string, $objnum, $gennum)
           Use parseAny() instead of this, if possible.

           Given a fragment of PDF page content, parse it and return a null Node.  This can be called as a class
           or instance method.

       $doc->parseAny($string)
       $doc->parseAny($string, $objnum)
       $doc->parseAny($string, $objnum, $gennum)
           Given  a  fragment of PDF page content, parse it and return a Node of the appropriate type.  This can
           be called as a class or instance method.

   Data Accessors
       $doc->getValue($object)
           For INTERNAL use

           Dereference a data object and return a value.  Given a node object of any kind, returns a raw scalar
           object: hashref, arrayref, string, number.  This function follows all references, and descends into
           all objects.

       $doc->getObjValue($objectnum)
           For INTERNAL use

           Dereference a data object, and return a value.  Behaves just like the getValue() function,  but  used
           when all you know is the object number.

       $doc->dereference($objectnum)
       $doc->dereference($name, $pagenum)
           For INTERNAL use

           Dereference  a  data  object,  return  a  PDF object as a node.  This function makes heavy use of the
           internal object cache.  Most (if not all) object requests should go through this function.

           $name should look something like '/R12'.

       $doc->getPropertyNames($pagenum)
       $doc->getProperty($pagenum, $propertyname)
           Each PDF page contains a list of resources that it uses  (images,  fonts,  etc).   getPropertyNames()
           returns  an array of the names of those resources.  getProperty() returns a node representing a named
           property (most likely a reference node).

       $doc->getFont($pagenum, $fontname)
           For INTERNAL use

           Returns a dictionary for a given font identified by its label, referenced by page.

       $doc->getFontNames($pagenum)
           For INTERNAL use

           Returns a list of fonts for a given page.

       $doc->getFonts($pagenum)
           For INTERNAL use

           Returns an array of font objects for a given page.

       $doc->getFontByBaseName($pagenum, $fontname)
           For INTERNAL use

           Returns a dictionary for a given font, referenced by page and the name of the base font.

       $doc->getFontMetrics($properties, $fontname)
           For INTERNAL use

           Returns a data structure representing the font metrics for the named font.  The property list is the
           result of something like the following:

             $self->_buildNameTable($pagenum);
             my $properties = $self->{Names}->{$pagenum};

           Alternatively, if you know the page number, it might be easier to do:

             my $font = $self->dereference($fontlabel, $pagenum);
             my $fontmetrics = $font->{value}->{value};

           where  the  $fontlabel is something like '/Helv'.  The getFontMetrics() method is useful in the cases
           where you've forgotten which page number you are working  on  (e.g.  in  CAM::PDF::GS),  or  if  your
           property list isn't part of any page (e.g. working with form field annotation objects).

       $doc->addFont($pagenum, $fontname, $fontlabel)
       $doc->addFont($pagenum, $fontname, $fontlabel, $fontmetrics)
           Adds a reference to the specified font to the page.

           If  a font metrics hash is supplied (it is required for a font other than the 14 core fonts), then it
           is cloned and inserted into the new  font  structure.   Note  that  if  those  font  metrics  contain
           references  (e.g.  to  the "FontDescriptor"), the referred objects are not copied -- you must do that
           part yourself.

           For  Type1  fonts,  the  font  metrics  must  minimally  contain  the  following  fields:  "Subtype",
           "FirstChar", "LastChar", "Widths", "FontDescriptor".

       $doc->deEmbedFont($pagenum, $fontname)
       $doc->deEmbedFont($pagenum, $fontname, $basefont)
           Removes  embedded  font  data, leaving font reference intact.  Returns true if the font exists and 1)
           font is not embedded or 2) embedded data was successfully discarded.  Returns false if the font  does
           not exist, or the embedded data could not be discarded.

           The  optional  $basefont  parameter  allows  you  to  change  the  font.   This  is  useful when some
           applications embed a standard font (see below) and give it a funny name, like "SYLXNP+Helvetica".  In
           this example, it's important to change the  basename  back  to  the  standard  "Helvetica"  when  de-
           embedding.

           De-embedding the font does NOT remove it from the PDF document; it just removes references to it.  To
           get  a  size  reduction by throwing away unused font data, you should use the following code sometime
           after this method.

             $self->cleanse();

           For reference, the standard fonts are "Times-Roman", "Helvetica",  and  "Courier"  (and  their  bold,
           italic and bold-italic forms) plus "Symbol" and "Zapfdingbats". (Adobe PDF Reference v1.4, p.319)

       $doc->deEmbedFontByBaseName($pagenum, $fontname)
       $doc->deEmbedFontByBaseName($pagenum, $fontname, $basefont)
           Just  like  deEmbedFont(), except that the font name parameter refers to the name of the current base
           font instead of the PDF label for the font.

       $doc->wrapString($string, $width, $fontsize, $fontmetrics)
       $doc->wrapString($string, $width, $fontsize, $pagenum, $fontlabel)
           Returns an array of strings wrapped to the specified width.

       $doc->getStringWidth($fontmetrics, $string)
           For INTERNAL use

           Returns the width of the string, using the font metrics if possible.

       $doc->numPages()
           Returns the number of pages in the PDF document.

       $doc->getPage($pagenum)
           For INTERNAL use

           Returns a dictionary for a given numbered page.

       $doc->getPageObjnum($pagenum)
           For INTERNAL use

           Return the number of the PDF object in which the specified page occurs.

       $doc->getPageText($pagenum)
           Extracts the text from a PDF page as a string.

       $doc->getPageContentTree($pagenum)
           Retrieves a parsed page content data structure, or undef if there is a syntax error or  if  the  page
           does not exist.

       $doc->getPageContent($pagenum)
           Return a string with the layout contents of one page.

       $doc->getPageDimensions($pagenum)
           Returns  an  array  of  "x",  "y",  "width"  and  "height"  numbers that define the dimensions of the
           specified page in points (1/72 inches).   Technically,  this  is  the  "MediaBox"  dimensions,  which
           explains why it's possible for "x" and "y" to be non-zero, but that's a rare case.

           For example, given a simple 8.5 by 11 inch page, this method will return "(0,0,612,792)".

           This method will die() if the specified page number does not exist.
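
           Because the values are in points, a small conversion is often needed.  The sketch below uses a
           hypothetical helper, dimensions_in_inches, that is not part of this module; it simply divides each
           value by 72.

```perl
use strict;
use warnings;

# Hypothetical helper: convert an (x, y, width, height) list from
# points to inches (1 point = 1/72 inch).
sub dimensions_in_inches {
    my ($x, $y, $width, $height) = @_;
    return map { $_ / 72 } ($x, $y, $width, $height);
}

# With a CAM::PDF instance this would be called as
#   my @inches = dimensions_in_inches($doc->getPageDimensions(1));
my @inches = dimensions_in_inches(0, 0, 612, 792);
printf "%.1f x %.1f inches\n", $inches[2], $inches[3];   # prints "8.5 x 11.0 inches"
```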

       $doc->getName($object)
           For INTERNAL use

           Given a PDF object reference, return its name, if it has one.  This is useful for indirect
           references to images in particular.

       $doc->getPrefs()
           Return an array of security information for the document:

             owner password
             user password
             print boolean
             modify boolean
             copy boolean
             add boolean

           See the PDF reference for the intended use of the latter four booleans.

           This module publishes the array indices of these values for your convenience:

             $CAM::PDF::PREF_OPASS
             $CAM::PDF::PREF_UPASS
             $CAM::PDF::PREF_PRINT
             $CAM::PDF::PREF_MODIFY
             $CAM::PDF::PREF_COPY
             $CAM::PDF::PREF_ADD

           So, you can retrieve the value of the Copy boolean via:

             my ($canCopy) = ($self->getPrefs())[$CAM::PDF::PREF_COPY];

       $doc->canPrint()
           Return a boolean indicating whether the Print permission is enabled on the PDF.

       $doc->canModify()
           Return a boolean indicating whether the Modify permission is enabled on the PDF.

       $doc->canCopy()
           Return a boolean indicating whether the Copy permission is enabled on the PDF.

       $doc->canAdd()
           Return a boolean indicating whether the Add permission is enabled on the PDF.

       $doc->getFormFieldList()
           Return an array of the names of all of the PDF form fields.  The  names  are  the  full  hierarchical
           names  constructed  as  explained  in  the  PDF  reference  manual.   These  names are useful for the
           fillFormFields() function.

       $doc->getFormField($name)
           For INTERNAL use

           Return the object containing the form field definition for the specified field name.   $name  can  be
           either the full name or the "short/alternate" name.

       $doc->getFormFieldDict($formfieldobject)
           For INTERNAL use

           Return a hash reference representing the accumulated property list for a form field, including all of
           its inherited properties.  This should be treated as a read-only hash!  It ONLY retrieves the
           properties it knows about.

   Data/Object Manipulation
       $doc->setPrefs($ownerpass, $userpass, $print?, $modify?, $copy?, $add?)
           Alter the document's security information.   Note  that  modifying  these  parameters  must  be  done
           respecting  the  intellectual  property  of  the  original  document.   See  Adobe's statement in the
           introduction of the reference manual.

           Important Note: Most PDF readers (Acrobat, Preview.app) only offer one  password  field  for  opening
           documents.   So,  if  the  $ownerpass and $userpass are different, those applications cannot read the
           documents.  (Perhaps this is a bug in CAM::PDF?)

           Note: any omitted booleans default to false.  So, these two are equivalent:

               $doc->setPrefs('password', 'password');
               $doc->setPrefs('password', 'password', 0, 0, 0, 0);

       $doc->setName($object, $name)
           For INTERNAL use

           Change the name of a PDF object structure.

       $doc->removeName($object)
           For INTERNAL use

           Delete the name of a PDF object structure.

       $doc->pageAddName($pagenum, $name, $objectnum)
           For INTERNAL use

           Append a named object to the metadata for a given page.

       $doc->setPageContent($pagenum, $content)
       $doc->setPageContent($pagenum, $tree->toString)
           Replace the content of the specified page with a new version.  This function is often used after  the
           getPageContent() function and some manipulation of the returned string from that function.

           If your content is a parsed tree (i.e. the result of getPageContentTree) then you should serialize it
           via toString first.
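
           The round trip through a parsed tree can be sketched as follows.  The helper name
           rewrite_page_via_tree is hypothetical; it relies only on getPageContentTree(), toString and
           setPageContent() as described in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a page, optionally modify the tree, and
# write the serialized result back.  Returns 1 on success, or 0 if the
# page could not be parsed or does not exist.
sub rewrite_page_via_tree {
    my ($doc, $pagenum) = @_;

    my $tree = $doc->getPageContentTree($pagenum);
    return 0 if !defined $tree;   # syntax error or missing page

    # ... inspect or modify $tree here ...

    # setPageContent() expects a string, so serialize the tree first.
    $doc->setPageContent($pagenum, $tree->toString());
    return 1;
}
```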

       $doc->appendPageContent($pagenum, $content)
           Add  more  content to the specified page.  Note that this function does NOT do any page metadata work
           for you (like creating font objects for any newly defined fonts).

       $doc->extractPages($pages...)
           Remove all pages from the PDF except the specified  ones.   Like  deletePages(),  the  pages  can  be
           multiple arguments, comma separated lists, ranges (open or closed).

       $doc->deletePages($pages...)
           Remove the specified pages from the PDF.  The pages can be multiple arguments, comma separated lists,
           ranges (open or closed).
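
           To illustrate what such page specifications might expand to, here is a stand-alone sketch.  The
           helper expand_page_ranges is hypothetical and is not the module's own parser.  It assumes that an
           open range such as '9-' runs to the last page and that '-3' starts at page 1.

```perl
use strict;
use warnings;

# Hypothetical helper: expand page specifications (single numbers,
# comma separated lists, closed ranges like '1-3' and open ranges like
# '9-' or '-3') into a flat list of page numbers within 1..$numpages.
sub expand_page_ranges {
    my ($numpages, @specs) = @_;
    my @pages;
    for my $item (map { split /,/ } @specs) {
        $item =~ s/\s+//g;
        if ($item =~ /\A(\d*)-(\d*)\z/) {
            # Assumption: a missing endpoint means "first" or "last" page.
            my $first = length($1) ? $1 : 1;
            my $last  = length($2) ? $2 : $numpages;
            push @pages, $first .. $last;
        }
        elsif ($item =~ /\A\d+\z/) {
            push @pages, $item;
        }
    }
    # Silently drop anything outside the document.
    return grep { $_ >= 1 && $_ <= $numpages } @pages;
}

my @pages = expand_page_ranges(10, '1-3', '5,7', '9-');
print "@pages\n";   # prints "1 2 3 5 7 9 10"
```

           Under the same assumptions, a call such as $doc->deletePages('1-3', '5,7', '9-') on a ten-page
           document would address those seven pages.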

       $doc->deletePage($pagenum)
           Remove the specified page from the PDF.  If the PDF has only one page, this method will fail.

       $doc->decachePages($pagenum, $pagenum, ...)
           Clears  cached copies of the specified page data structures.  This is useful if an operation has been
           performed that changes a page.

       $doc->addPageResources($pagenum, $resourcehash)
           Add the resources from the given object to the page resource dictionary.  If the page does not have a
           resource dictionary, create one.  This function avoids duplicating resources where feasible.

       $doc->appendPDF($pdf)
           Append pages from another PDF document to this one.  No optimization is done -- the pieces  are  just
           appended and the internal table of contents is updated.

           Note that this can break documents with annotations.  See the appendpdf.pl script for a workaround.

       $doc->prependPDF($pdf)
           Just like appendPDF() except the new document is inserted on page 1 instead of at the end.

       $doc->duplicatePage($pagenum)
       $doc->duplicatePage($pagenum, $leaveblank)
           Inserts  an  identical  copy  of the specified page into the document.  The new page's number will be
           "$pagenum + 1".

           If $leaveblank is true, the new page does not get any content.  Thus, the document  is  broken  until
           you subsequently call setPageContent().

       $doc->createStreamObject($content)
       $doc->createStreamObject($content, $filter ...)
           For INTERNAL use

           Create  a  new  Stream  object.   This  object  is NOT added to the document.  Use the appendObject()
           function to do that after calling this function.

       $doc->uninlineImages()
       $doc->uninlineImages($pagenum)
           Search the content of the specified page (or all pages if the page number is  omitted)  for  embedded
           images.   If  there  are  any, replace them with indirect objects.  This procedure uses heuristics to
           detect in-line images, and is subject to confusion in extremely rare cases of text that uses "BI" and
           "ID" a lot.

       $doc->appendObject($doc, $objectnum, $recurse?)
       $doc->appendObject($undef, $object, $recurse?)
           Duplicate an object from another PDF document and add it to this document, optionally descending into
           the object and copying any other objects it references.

           Like replaceObject(), the second form allows you to append a newly-created block to the PDF.

       $doc->replaceObject($objectnum, $doc, $objectnum, $recurse?)
       $doc->replaceObject($objectnum, $undef, $object)
           Duplicate an object from another PDF document and insert it into this document, replacing an existing
           object.  Optionally descend into the original object and copy any other objects it references.

           If the other document is undefined, then the object to copy is taken to be an anonymous  object  that
           is not part of any other document.  This is useful when you've just created that anonymous object.

       $doc->deleteObject($objectnum)
           Remove an object from the document.  This function does NOT take care of dependencies on this object.

       $doc->cleanse()
           Remove  unused  objects.  WARNING: this function breaks some PDF documents because it removes objects
           that are strictly part of the page model hierarchy, but which are required  anyway  (like  some  font
           definition objects).

       $doc->createID()
           For INTERNAL use

           Generate a new document ID.  Contrary to the Adobe recommendation, this is a random number.

       $doc->fillFormFields($name => $value, ...)
       $doc->fillFormFields($opts_hash, $name => $value, ...)
           Set  the  default  values  of  PDF form fields.  The name should be the full hierarchical name of the
           field as output by the getFormFieldList() function.  The argument list can be a hash if you like.   A
           simple way to use this function is something like this:

               my %fields = (fname => 'John', lname => 'Smith', state => 'WI');
               $fields{zip} = 53703;
               $self->fillFormFields(%fields);

           If  the first argument is a hash reference, it is interpreted as options for how to render the filled
           data:

           background_color => 'none' | $gray | [$r, $g, $b]
               Specify the background color for the text field.

           max_autoscale_fontsize => $size
           min_autoscale_fontsize => $size
               If the form field is set to auto-size the text  to  fit,  then  you  may  use  these  options  to
               constrain  the limits of that autoscaling. Otherwise, for example, a very long string will become
               arbitrarily small to fit in the box.
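
           A minimal sketch of combining the options hash with field values follows.  The helper name
           fill_with_limits and the particular font sizes are illustrative assumptions, not recommendations
           from this module.

```perl
use strict;
use warnings;

# Hypothetical helper: fill form fields with no background color and
# with autoscaled text constrained to an arbitrary 6-12 point range.
sub fill_with_limits {
    my ($doc, %fields) = @_;

    my %options = (
        background_color       => 'none',
        max_autoscale_fontsize => 12,
        min_autoscale_fontsize => 6,
    );

    # A hash reference as the first argument is interpreted as options.
    $doc->fillFormFields(\%options, %fields);
    return;
}
```

           For example: fill_with_limits($doc, fname => 'John', lname => 'Smith').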

       $doc->clearFormFieldTriggers($name, $name, ...)
           Disable any triggers set on data entry for the specified form field names.  This  is  useful  in  the
           case where, for example, the data entry Javascript forbids punctuation and you want to prefill with a
           hyphenated word.  If you don't clear the trigger, the prefill may not happen.

       $doc->clearAnnotations()
           Remove all annotations from the document.  If form fields are encountered, their text is added to the
           appropriate page.

       $doc->previousRevision()
           If this PDF was previously saved in append mode (that is, if "clean()" was not invoked on it), return
           a  new  instance representing that previous version.  Otherwise return void.  If this is an encrypted
           PDF, this method assumes that previous revisions were encrypted with the same password, which may  be
           an incorrect assumption.

       $doc->allRevisions()
           Accumulate  CAM::PDF  instances  returned  by  "previousRevision"  until  there  are no more previous
           revisions.  Returns a list of instances from newest to oldest including this instance as the newest.
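
           As a sketch of walking the revision history, the hypothetical helper below reports the page count
           of every revision, newest first.  It uses only allRevisions() and numPages() as described in this
           document.

```perl
use strict;
use warnings;

# Hypothetical helper: return the page count of each revision, in the
# newest-to-oldest order that allRevisions() returns them.
sub page_counts_by_revision {
    my ($doc) = @_;
    return map { $_->numPages() } $doc->allRevisions();
}
```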

   Document Writing
       $doc->preserveOrder()
           Try to recreate the original document as much as possible.  This may  help  in  recreating  documents
           which use undocumented tricks of saving font information in adjacent objects.

       $doc->isLinearized()
           Returns a boolean indicating whether this PDF is linearized (aka "optimized").

       $doc->delinearize()
           For INTERNAL use

           Undo  the  tweaks  used  to  make the document 'optimized'.  This function is automatically called on
           every save or output since this library does not yet support linearized documents.

       $doc->clean()
           Cache all parts of the document and throw away its old structure.  This is useful for writing PDFs
           anew,  instead  of simply appending changes to the existing documents.  This is called by cleansave()
           and cleanoutput().

       $doc->needsSave()
           Returns a boolean indicating whether the save() method needs to be called.   Like  save(),  this  has
           nothing  to  do  with  whether  the  document  has  been  saved  to  disk,  but whether the in-memory
           representation of the document has been serialized.

       $doc->save()
           Serialize the document into a single string.  All changed document elements are normalized, and a new
           index and an updated trailer are created.

           This function operates solely in memory.  It DOES NOT write the document to a file.  See the output()
           function for that.

       $doc->cleansave()
           Call the clean() function, then call the save() function.

       $doc->output($filename)
       $doc->output()
           Save the document to a file.  The save() function is called first to serialize  the  data  structure.
           If no filename is specified, or if the filename is '-', the document is written to standard output.

           Note: it is the responsibility of the application to ensure that the PDF document grants either the
           Modify or Add permission before writing it out.  For example:

              if ($doc->canModify()) {
                 $doc->output($outfile);
              } else {
                 die "The PDF file denies permission to make modifications\n";
              }

       $doc->cleanoutput($file)
       $doc->cleanoutput()
           Call the clean() function, then call the output() function to write a fresh copy of the document to a
           file.

       $doc->writeObject($objnum)
           Return the serialization of the specified object.

       $doc->writeString($string)
           Return the serialization of the specified string.  Works on normal or hex strings.  If encryption  is
           desired, the string should be encrypted before being passed here.

       $doc->writeAny($node)
           Returns  the  serialization  of  the  specified  node.  This handles all Node types, including object
           Nodes.

   Document Traversing
       $doc->traverse($dereference, $node, $callbackfunc, $callbackdata)
           Recursive traversal of a PDF data structure.

           In many cases, it's useful to apply one action to every node in an object tree.  The  routines  below
           all  use  this  traverse()  function.   One  of  the  most  important  parameters  is  the first: the
           $dereference boolean.  If true, the traversal follows reference Nodes.  If false, it does not descend
           into reference Nodes.

           Optionally, you can pass in a hashref as a final  argument  to  reduce  redundant  traversing  across
           multiple calls.  Just pass in an empty hashref the first time and pass in the same hashref each time.
           See "changeRefKeys()" for an example.
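           The effect of the $dereference flag and of the shared hashref can be illustrated without the
           library.  The following is a minimal standalone sketch, not the CAM::PDF implementation: the walk()
           function, the %objects table, the callback signature and the idea of keying the hashref by object
           number are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical object table: object number => root Node of that object.
my %objects = (
    1 => { type => 'dictionary', value => {
              Kids => { type => 'array', value => [
                  { type => 'reference', value => 2 },
                  { type => 'reference', value => 2 },   # same target twice
              ] },
          } },
    2 => { type => 'string', value => 'leaf' },
);

# Visit every Node below $node, calling $callback->($node, $data) on each.
# $seen is a hashref of object numbers already followed (an assumption
# about how redundant traversal could be avoided).
sub walk {
    my ($dereference, $node, $callback, $data, $seen) = @_;
    $seen ||= {};
    my @stack = ($node);
    while (@stack) {
        my $n = shift @stack;
        $callback->($n, $data);
        my $type = $n->{type};
        if ($type eq 'dictionary') {
            push @stack, map { $n->{value}{$_} } sort keys %{ $n->{value} };
        }
        elsif ($type eq 'array') {
            push @stack, @{ $n->{value} };
        }
        elsif ($type eq 'reference' && $dereference) {
            my $num = $n->{value};
            next if $seen->{$num}++;           # already followed this object
            push @stack, $objects{$num} if $objects{$num};
        }
    }
    return;
}

my $collect = sub { push @{ $_[1] }, $_[0]{type} };

my @shallow;
walk(0, $objects{1}, $collect, \@shallow);     # references are not followed

my @deep;
walk(1, $objects{1}, $collect, \@deep);        # references are followed once

print "shallow: @shallow\n";
print "deep:    @deep\n";
```

           With the flag false, the two reference Nodes are visited but their target is not.  With the flag
           true, object 2 is entered once, because the second reference finds it already recorded in the
           hashref.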

       $doc->decodeObject($objectnum)
           For INTERNAL use

           Remove any filters (like compression, etc) from a data stream indicated by the object number.

       $doc->decodeAll($object)
           For INTERNAL use

           Remove any filters from any data stream in this object or any object referenced by it.

       $doc->decodeOne($object)
       $doc->decodeOne($object, $save?)
           For INTERNAL use

           Remove any filters from an object.  The boolean flag $save (defaults to false) indicates whether this
           removal should be permanent or just this once.  If true, the function returns success or failure.  If
           false, the function returns the defiltered content.

       $doc->fixDecode($streamdata, $filter, $params)
           This is a utility method to do any tweaking after removing the filter from a data stream.

       $doc->encodeObject($objectnum, $filter)
           Apply the specified filter to the object.

       $doc->encodeOne($object, $filter)
           Apply the specified filter to the object.

       $doc->setObjNum($object, $objectnum, $gennum)
           Descend  into  an object and change all of the INTERNAL object number flags to a new number.  This is
           just for consistency of internal accounting.

       $doc->getRefList($object)
           For INTERNAL use

           Return an array of all objects referred to in this object.

       $doc->changeRefKeys($object, $hashref)
           For INTERNAL use

           Renumber all references in an object.

       $doc->abbrevInlineImage($object)
           Contract all image keywords to inline abbreviations.

       $doc->unabbrevInlineImage($object)
           Expand all inline image abbreviations.

       $doc->changeString($object, $hashref)
           Alter all instances of a given string.  The hashref is a dictionary of from-string and to-string.  If
           the from-string looks like "regex(...)"  then it is interpreted as a Perl regular expression  and  is
           eval'ed.  Otherwise the search-and-replace is literal.
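           To make the two kinds of keys concrete, here is a minimal standalone sketch that applies such a
           dictionary to one plain string.  It is not the library's code: apply_changes() is a hypothetical
           helper, it compiles the pattern with qr// rather than a string eval, and the alphabetical order in
           which it applies the keys is an assumption.

```perl
use strict;
use warnings;

# Apply a from => to dictionary to one plain string.  Keys of the form
# "regex(...)" are treated as Perl patterns; all other keys are literal.
sub apply_changes {
    my ($text, $changes) = @_;
    for my $from (sort keys %{$changes}) {
        my $to = $changes->{$from};
        if ($from =~ m/\A regex\( (.*) \) \z/xms) {
            my $pattern = qr/$1/;          # compile the text inside regex(...)
            $text =~ s/$pattern/$to/g;
        }
        else {
            $text =~ s/\Q$from\E/$to/g;    # \Q...\E disables metacharacters
        }
    }
    return $text;
}

my %changes = (
    'Draft'        => 'Final',
    'regex(\d{4})' => 'YYYY',
    'v1.0'         => 'v2.0',
);

my $result = apply_changes('Draft v1.0 from 2008, v1x0', \%changes);
print "$result\n";
```

           The literal key 'v1.0' leaves 'v1x0' alone because the dot is not treated as a wildcard, whereas
           the "regex(...)" key matches any four digits.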

   Utility functions
       $doc->rangeToArray($min, $max, $list...)
           Converts string lists of numbers to an array.  For example,

               CAM::PDF->rangeToArray(1, 15, '1,3-5,12,9', '14-', '8 - 6, -2');

           becomes

               (1,3,4,5,12,9,14,15,8,7,6,1,2)
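
           The parsing rules implied by that example (open-ended ranges, descending ranges, embedded
           whitespace) can be sketched as follows.  This is an independent illustration, not the library's
           source: range_to_array() is a hypothetical name, and silently dropping values outside $min..$max is
           an assumption.

```perl
use strict;
use warnings;

sub range_to_array {
    my ($min, $max, @lists) = @_;
    my @out;
    for my $item (map { split /,/, $_ } @lists) {
        $item =~ s/\s+//g;                       # "8 - 6" becomes "8-6"
        if ($item =~ m/\A (\d*) - (\d*) \z/xms) {
            my $from = $1 eq q{} ? $min : $1;    # "-2" starts at $min
            my $to   = $2 eq q{} ? $max : $2;    # "14-" runs to $max
            push @out, $from <= $to ? ($from .. $to) : reverse($to .. $from);
        }
        elsif ($item =~ m/\A \d+ \z/xms) {
            push @out, $item;                    # a single page number
        }
    }
    # Assumption: anything outside the allowed range is discarded.
    return grep { $_ >= $min && $_ <= $max } @out;
}

my @pages = range_to_array(1, 15, '1,3-5,12,9', '14-', '8 - 6, -2');
print join(q{,}, @pages), "\n";
```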

       $doc->trimstr($string)
           Used  solely  for  debugging.   Trims a string to a max of 40 characters, handling nulls and non-Unix
           line endings.

       $doc->copyObject($node)
           Clones a node via Data::Dumper and eval().
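
           The general technique looks roughly like the sketch below.  It shows the Data::Dumper round trip
           on an arbitrary Node-shaped structure and is not copied from the library; clone_node() is a
           hypothetical name, and setting $Data::Dumper::Purity is a choice made for this example.

```perl
use strict;
use warnings;
use Data::Dumper;

# Deep-copy a data structure by dumping it to Perl source and evaluating
# that source again.
sub clone_node {
    my ($node) = @_;
    local $Data::Dumper::Purity = 1;   # emit extra statements for shared refs
    my $VAR1;                          # Dumper's output assigns to $VAR1
    eval Dumper($node);                # string eval rebuilds the structure
    die "clone failed: $@" if $@;
    return $VAR1;
}

my $orig = { type => 'array', value => [ { type => 'number', value => 7 } ] };
my $copy = clone_node($orig);

# Changing the copy must not affect the original.
$copy->{value}[0]{value} = 8;
print "original: $orig->{value}[0]{value}, copy: $copy->{value}[0]{value}\n";
```

           A string eval is only reasonable here because the text being evaluated was produced by
           Data::Dumper from data already in memory, never from untrusted input.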

       $doc->cacheObjects()
           Parses all object Nodes and stores them in the cache.  This is useful for cases where you  intend  to
           do some global manipulation and want all of the data conveniently in RAM.

       $doc->asciify($string)
           Helper class/instance method to massage a string, cleaning up some non-ASCII problems.  Only a few
           cases are handled so far, specifically:

           f-i ligatures
           (R) symbol

COMPATIBILITY

       This library was primarily developed against the 3rd edition of the reference  (PDF  v1.4)  with  several
       important  updates  from  4th edition (PDF v1.5).  This library focuses most deeply on PDF v1.2 features.
       Nonetheless, it should be forward and backward compatible in the majority of cases.

PERFORMANCE

       This module is written with good  speed  and  flexibility  in  mind,  often  at  the  expense  of  memory
       consumption.   Entire  PDF  documents  are  typically  slurped  into  RAM.  As an example, simply calling
       "new('PDFReference15_v15.pdf')" (the 13.5 MB Adobe PDF Reference V1.5 document) pushes Perl to consume 89
       MB of RAM on my development machine.

INTERNALS

       The data structure used to represent the PDF document is composed primarily of a hierarchy of Node
       objects.  Every node in the document tree has this structure:

          type => <type>
          value => <value>
          objnum => <object number>
          gennum => <generation number>

       where the <value> depends on the <type>, and <type> is one of

            Type        Value
            ----        -----
            object      Node
            stream      byte string
            string      byte string
            hexstring   byte string
            number      number
            reference   integer (object number)
            boolean     "true" | "false"
            label       string
            array       arrayref of Nodes
            dictionary  hashref of (string => Node)
            null        undef

       All of these except "stream" are directly related to the PDF data types of the same name.  Streams are
       treated as special cases in this library since they have a non-general syntax and placement in the
       document body.  Internally, streams are very much like strings, except that they have filters applied
       to them.

       All objects are referenced indirectly by their numbers, as defined in the PDF document.  In all cases,
       the dereference() function should be used to deserialize objects into their internal representation.
       This function is also useful for looking up named objects in the page model metadata.  Every node in
       the hierarchy contains its object and generation number.  You can think of this as a sort of a pointer
       back to the root of each node tree.  This serves in place of a "parent" link for every node, which
       would be harder to maintain.
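
       As a rough illustration of the Node layout described above, a small dictionary Node could be written
       as a plain Perl hash like this.  The object number, the keys and the values are made up for the
       example, and storing the label without a leading slash is an assumption.

```perl
use strict;
use warnings;

# A dictionary Node as it might appear inside object 12, generation 0.
my $node = {
    type   => 'dictionary',
    objnum => 12,
    gennum => 0,
    value  => {
        Type   => { type => 'label',     value => 'Page', objnum => 12, gennum => 0 },
        Count  => { type => 'number',    value => 3,      objnum => 12, gennum => 0 },
        Parent => { type => 'reference', value => 4,      objnum => 12, gennum => 0 },
    },
};

# Each child repeats the object and generation number of the object that
# contains it, which is what stands in for a "parent" link.
my @types;
for my $key (sort keys %{ $node->{value} }) {
    my $child = $node->{value}{$key};
    push @types, $child->{type};
    printf "%-6s %-9s %-4s (object %d %d)\n",
        $key, $child->{type}, $child->{value}, $child->{objnum}, $child->{gennum};
}
```

       Note that the reference Node's value (4) is the number of the object it points to, while its objnum
       (12) is the number of the object it lives in.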

       The PDF document itself is represented internally as a hash reference with many components, including
       the document content, the document metadata (index, trailer and root node), the object cache, and
       several other caches, in addition to a few assorted bookkeeping structures.

       The core of the document is represented in the object cache, which is only populated as needed, thus
       avoiding the overhead of parsing the whole document at read time.

AUTHOR

       Chris Dolan

       This module was originally developed by me at Clotho Advanced Media Inc.  Now I maintain it in  my  spare
       time.

ACKNOWLEDGMENTS

       Thanks to all the people who have submitted bug reports over the years!  I've belatedly started crediting
       people in the CHANGES file.  Apologies to contributors I've overlooked...

perl v5.36.0                                       2022-12-08                                      CAM::PDF(3pm)