Provided by: libiterator-perl_0.03+ds1-2_all

NAME

       Iterator - A general-purpose iterator class.

VERSION

       This documentation describes version 0.03 of Iterator.pm, October 10, 2005.

SYNOPSIS

        use Iterator;

        # Making your own iterators from scratch:
        $iterator = Iterator->new ( sub { code } );

        # Accessing an iterator's values in turn:
        $next_value = $iterator->value();

        # Is the iterator out of values?
        $boolean = $iterator->is_exhausted();
        $boolean = $iterator->isnt_exhausted();

        # Within {code}, above:
        Iterator::is_done();    # to signal end of sequence.

DESCRIPTION

       This module is meant to be the definitive implementation of iterators, as popularized by Mark Jason
       Dominus's lectures and recent book (Higher-Order Perl, Morgan Kaufmann, 2005).

       An "iterator" is an object, represented as a code block that generates the "next value" of a sequence,
       and generally implemented as a closure.  When you need a value to operate on, you pull it from the
       iterator.  If it depends on other iterators, it pulls values from them when it needs to.  Iterators can
       be chained together (see Iterator::Util for functions that help you do just that), queueing up work to be
       done but not actually doing it until a value is needed at the front end of the chain.  At that time, one
       data value is pulled through the chain.

       Contrast this with ordinary array processing, where you load or compute all of the input values at once,
       then loop over them in memory.  It's analogous to the difference between looping over a file one line at
       a time, and reading the entire file into an array of lines before operating on it.

       Iterator.pm provides a class that simplifies creation and use of these iterator objects.  Other
       "Iterator::" modules (see "SEE ALSO") provide many general-purpose and special-purpose iterator
       functions.

       Some iterators are infinite (that is, they generate infinite sequences), and some are finite.  When the
       end of a finite sequence is reached, the iterator code block should throw an exception of the type
       "Iterator::X::Am_Now_Exhausted"; this is usually done via the "is_done" function.  This will signal the
       Iterator class to mark the object as exhausted.  The "is_exhausted" method will then return true, and the
       "isnt_exhausted" method will return false.  Any further calls to the "value" method will throw an
       exception of the type "Iterator::X::Exhausted".  See "DIAGNOSTICS".

       Note that in many, many cases, you will not need to explicitly create an iterator; there are plenty of
       iterator generation and manipulation functions in the other associated modules.  You can just plug them
       together like building blocks.
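
       The exhaustion protocol described above can be sketched as follows.  This is only an illustration:
       "countdown" is a hypothetical helper, not part of Iterator.pm; everything else uses the methods
       documented in this page.

```perl
use strict;
use warnings;
use Iterator;

# Hypothetical helper: yields $n, $n-1, ..., 1, then signals end of sequence.
sub countdown
{
    my ($n) = @_;

    return Iterator->new( sub
    {
        Iterator::is_done() if $n < 1;   # throws Iterator::X::Am_Now_Exhausted
        return $n--;
    });
}

my $iter = countdown(3);
my @seen;
push @seen, $iter->value() while $iter->isnt_exhausted();

print "@seen\n";                                   # 3 2 1
print "exhausted\n" if $iter->is_exhausted();

# A further call to value() throws Iterator::X::Exhausted.
my $ok = eval { $iter->value(); 1 };
print "caught Exhausted\n"
    if !$ok && Iterator::X::Exhausted->caught();
```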

METHODS

       new
            $iter = Iterator->new( sub { code } );

           Creates a new iterator object.  The code block that you provide will be invoked by the "value"
           method.  The code block should have some way of maintaining state, so that it knows how to return the
           next value of the sequence each time it is called.

           If the code is called after it has generated the last value in its sequence, it should throw an
           exception:

               Iterator::X::Am_Now_Exhausted->throw ();

           This very commonly needs to be done, so there is a convenience function for it:

               Iterator::is_done ();

       value
            $next_value = $iter->value ();

           Returns the next value in the iterator's sequence.  If "value" is called on an exhausted iterator, an
           "Iterator::X::Exhausted" exception is thrown.

           Note that these iterators can only return scalar values.  If you need your iterator to return a list
           or hash, it will have to return an arrayref or hashref.
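
           As a minimal sketch of that advice, the iterator below returns one arrayref per call, which the
           caller dereferences.  "pair_iterator" is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;
use Iterator;

# Hypothetical helper: walks a hash and yields one [key, value] pair per call.
sub pair_iterator
{
    my %h    = @_;
    my @keys = sort keys %h;

    return Iterator->new( sub
    {
        Iterator::is_done() unless @keys;
        my $k = shift @keys;
        return [ $k, $h{$k} ];      # a single scalar: a reference to a 2-element array
    });
}

my $pairs = pair_iterator(apple => 1, pear => 2);
my @lines;
while ($pairs->isnt_exhausted())
{
    my ($key, $val) = @{ $pairs->value() };
    push @lines, "$key=$val";
}
print "$_\n" for @lines;            # apple=1, then pear=2
```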

       is_exhausted
            $bool = $iter->is_exhausted ();

           Returns true if the iterator is exhausted.  In this state, any call to the iterator's "value" method
           will throw an exception.

       isnt_exhausted
            $bool = $iter->isnt_exhausted ();

           Returns true if the iterator is not yet exhausted.

FUNCTION

       is_done
            Iterator::is_done();

           You call this function after your iterator code has generated its last value.  See "TUTORIAL".  This
           is simply a convenience wrapper for

            Iterator::X::Am_Now_Exhausted->throw();

THINKING IN ITERATORS

       Typically, when people approach a problem that involves manipulating a bunch of data, their first thought
       is to load it all into memory, into an array, and work with it in-place.  If you're only dealing with one
       element at a time, this approach usually wastes memory needlessly.

       For example, one might get a list of files to operate on, and loop over it:

           my @files = fetch_file_list(....);
           foreach my $file (@files)
               ...

       If "fetch_file_list" were modified to return an iterator instead of an array, the same code could look
       like this:

           my $file_iterator = fetch_file_list(...);
           while ($file_iterator->isnt_exhausted)
               ...

       The advantage here is that the whole list does not take up memory while each individual element is being
       worked on.  For a list of files, that's probably not a lot of overhead.  For the contents of a file, on
       the other hand, it could be huge.
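
       To make the file case concrete, here is a rough sketch of an iterator that hands out one line at a
       time instead of slurping the whole file.  "line_iterator" is a hypothetical helper written for this
       illustration (it is not the "ifile" function used later in this page), and an in-memory filehandle
       stands in for a real file.

```perl
use strict;
use warnings;
use Iterator;

# Hypothetical helper: yields one chomped line per call to value().
sub line_iterator
{
    my ($fh) = @_;

    return Iterator->new( sub
    {
        my $line = <$fh>;
        Iterator::is_done() unless defined $line;   # end of file reached
        chomp $line;
        return $line;
    });
}

# An in-memory filehandle stands in for a real file here.
my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $lines = line_iterator($fh);
my @numbered;
while ($lines->isnt_exhausted())
{
    my $line = $lines->value();
    push @numbered, scalar(@numbered) + 1 . ": $line";
}
print "$_\n" for @numbered;
```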

       If a function requires a list of items as its input, the overhead is tripled:

           sub myfunc
           {
               my @things = @_;
               ...

       Now in addition to the array in the calling code, Perl must copy that array to @_, and then copy it again
       to @things.  If you need to massage the input from somewhere, it gets even worse:

           my @data = get_things_from_somewhere();
           my @filtered_data = grep {code} @data;
           my @transformed_data = map {code} @filtered_data;
           myfunc (@transformed_data);

       If "myfunc" is rewritten to use an Iterator instead of an array, things become much simpler:

           my $data = ilist (get_things_from_somewhere());
           my $filtered_data = igrep {code} $data;
           my $transformed_data = imap {code} $filtered_data;
           myfunc ($transformed_data);

       (This example assumes that the "get_things_from_somewhere" function cannot be modified to return an
       Iterator.  If it can, so much the better!)  Now the original list is still in memory, inside the $data
       Iterator, but everywhere else, there is only one data element in memory at a time.
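
       The "one element at a time" behaviour can be illustrated with a hand-rolled transformation.  This is
       a sketch under stated assumptions: "counter" and "my_imap" are hypothetical helpers built only on
       Iterator->new, and are not the real "imap" from Iterator::Util.  Because values are pulled on demand,
       the source can even be unbounded.

```perl
use strict;
use warnings;
use Iterator;

# Hypothetical source: an unbounded counter 1, 2, 3, ...
sub counter
{
    my $n = 1;
    return Iterator->new( sub { return $n++ } );
}

# Hypothetical stand-in for an imap-like function: applies $code to each
# value pulled from $source, one value at a time.
sub my_imap
{
    my ($code, $source) = @_;

    return Iterator->new( sub
    {
        Iterator::is_done() if $source->is_exhausted();
        return $code->( $source->value() );
    });
}

my $squares = my_imap( sub { $_[0] ** 2 }, counter() );
my @first   = map { $squares->value() } 1 .. 4;
print "@first\n";     # 1 4 9 16
```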

       Another advantage of Iterators is that they're homogeneous.  This is useful for uncoupling library code
       from application code.  Suppose you have a library function that grabs data from a filehandle:

           sub my_lib_func
           {
               my $fh = shift;
               ...

       If you need "my_lib_func" to get its data from a different source, you must either modify it, or make a
       new copy of it that gets its input differently, or you must jump through hoops to make the new input
       stream look like a Perl filehandle.

       On the other hand, if "my_lib_func" accepts an iterator, then you can pass it data from a filehandle:

           my $data = ifile "my_input.txt";
           $result = my_lib_func($data);

       Or a database handle:

           my $data = imap {$_->{IMPORTANT_COLUMN}}
                      idb_rows($dbh, 'select IMPORTANT_COLUMN from foo');
           $result = my_lib_func($data);

       If you later decide you need to transform the data, or process only every 10th data row, or whatever:

           $result = my_lib_func(imap {magic($_)} $data);
           $result = my_lib_func(inth 10, $data);

       The library function doesn't care.  All it needs is an iterator.

       Chapter 4 of Dominus's book (see "SEE ALSO") covers this topic in some detail.

   Word of Warning
       When you use an iterator in separate parts of your program, or as an argument to the various iterator
       functions, you do not get a copy of the iterator's stream of values.

       In other words, if you grab a value from an iterator, then some other part of the program grabs a value
       from the same iterator, you will be getting different values.

       This can be confusing if you're not expecting it.  For example:

           my $it_one = Iterator->new (sub {something});
           my $it_two = some_iterator_transformation $it_one;
           my $value  = $it_two->value();
           my $whoops = $it_one->value;

       Here, "some_iterator_transformation" takes an iterator as an argument, and returns an iterator as a
       result.  When a value is fetched from $it_two, it internally grabs a value from $it_one (and presumably
       transforms it somehow).  If you then grab a value from $it_one, you'll get its second value (or third, or
       whatever, depending on how many values $it_two grabbed), not the first.
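
       A runnable version of this pitfall might look like the sketch below.  The "times ten" transformation
       is a hypothetical stand-in for "some_iterator_transformation"; exactly which value $whoops receives
       depends on how many values $it_two has already pulled, so the code only shows that it is not the
       first one.

```perl
use strict;
use warnings;
use Iterator;

my $n = 1;
my $it_one = Iterator->new( sub { return $n++ } );          # 1, 2, 3, ...

# Hypothetical transformation: multiplies each value of $it_one by 10.
my $it_two = Iterator->new( sub { return 10 * $it_one->value() } );

my $value  = $it_two->value();    # 10, computed from $it_one's first value
my $whoops = $it_one->value();    # not 1: $it_two has already consumed that value

print "value=$value whoops=$whoops\n";
```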

TUTORIAL

       Let's create a date iterator.  It'll take a DateTime object as a starting date, and return successive
       days -- that is, it'll add 1 day each iteration.  It would be used as follows:

        use DateTime;

        $iter = (...something...);
        $day1 = $iter->value;           # Initial date
        $day2 = $iter->value;           # One day later
        $day3 = $iter->value;           # Two days later

       The easiest way to create such an iterator is by using a closure.  If you're not familiar with the
       concept, it's fairly simple: In Perl, the code within an anonymous block has access to all the lexical
       variables that were in scope at the time the block was created.  After the program then leaves that
       lexical scope, those lexical variables remain accessible by that code block for as long as it exists.

       This makes it very easy to create iterators that maintain their own state.  Here we'll create a lexical
       scope by using a pair of braces:

        my $iter;
        {
            my $dt = DateTime->now();
            $iter = Iterator->new( sub
            {
                my $return_value = $dt->clone;
                $dt->add(days => 1);
                return $return_value;
            });
        }

       Because $dt is lexically scoped to the outer block, it is not addressable from any code elsewhere in
       the program.  But the anonymous block within the "new" method's parentheses can see $dt.  So $dt does not
       get garbage-collected as long as $iter holds the code block that refers to it.

       The code within the anonymous block is simple.  A copy of the current $dt is made, one day is added to
       $dt, and the copy is returned.

       You'll probably want to encapsulate the above block in a subroutine, so that you could call it from
       anywhere in your program:

        sub date_iterator
        {
            my $dt = DateTime->now();
            return Iterator->new( sub
            {
                my $return_value = $dt->clone;
                $dt->add(days => 1);
                return $return_value;
            });
        }

       If you look at the source code in Iterator::Util, you'll see that just about all of the functions that
       create iterators look very similar to the above "date_iterator" function.

       Of course, you'd probably want to be able to pass arguments to "date_iterator", say a starting date,
       maybe an increment other than "1 day".  But the basic idea is the same.
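
       One possible shape for such a parameterized version is sketched below.  The argument order and the
       $step_days name are assumptions made for this illustration; the closure itself is the same as in
       "date_iterator" above.

```perl
use strict;
use warnings;
use DateTime;
use Iterator;

# Sketch: $start is a DateTime object; $step_days defaults to 1.
sub date_iterator
{
    my ($start, $step_days) = @_;
    $step_days = 1 unless defined $step_days;

    my $dt = $start->clone;      # leave the caller's object untouched

    return Iterator->new( sub
    {
        my $return_value = $dt->clone;
        $dt->add(days => $step_days);
        return $return_value;
    });
}

my $start  = DateTime->new(year => 2005, month => 10, day => 10);
my $weekly = date_iterator($start, 7);

my $first  = $weekly->value->ymd;
my $second = $weekly->value->ymd;
print "$first\n";     # 2005-10-10
print "$second\n";    # 2005-10-17
```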

       The above date iterator is an infinite (well, unbounded) iterator.  Let's look at how to indicate that
       your iterator has reached the end of its sequence of values.  Let's write a scaled-down version of irange
       from the Iterator::Util module -- one that takes a start value and an end value and always increments by
       1.

        sub irange_limited
        {
            my ($start, $end) = @_;

            return Iterator->new (sub
            {
                Iterator::is_done
                    if $start > $end;

                return $start++;
            });
        }

       The iterator itself is very simple (this sort of thing gets to be easy once you get the hang of it).  The
       new element here is the signalling that the sequence has ended, and the iterator's work is done.
       "is_done" is how your code indicates this to the Iterator object.

       You may also want to throw an exception if the user specified bad input parameters.  There are a couple
       of ways you can do this.

            ...
            die "Too few parameters to irange_limited"  if @_ < 2;
            die "Too many parameters to irange_limited" if @_ > 2;
            my ($start, $end) = @_;
            ...

       This is the simplest way; you just use "die" (or "croak").  You may choose to throw an Iterator parameter
       error, though; this will make your function work more like one of Iterator.pm's built-in functions:

            ...
            Iterator::X::Parameter_Error->throw(
                "Too few parameters to irange_limited")
                if @_ < 2;
            Iterator::X::Parameter_Error->throw(
                "Too many parameters to irange_limited")
                if @_ > 2;
            my ($start, $end) = @_;
            ...
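
       Putting the fragments above together, a complete "irange_limited" with parameter checking might look
       like the following sketch.  The calling code at the bottom is added purely for illustration.

```perl
use strict;
use warnings;
use Iterator;

sub irange_limited
{
    Iterator::X::Parameter_Error->throw(
        "Too few parameters to irange_limited")
        if @_ < 2;
    Iterator::X::Parameter_Error->throw(
        "Too many parameters to irange_limited")
        if @_ > 2;
    my ($start, $end) = @_;

    return Iterator->new (sub
    {
        Iterator::is_done()
            if $start > $end;

        return $start++;
    });
}

my $range = irange_limited(3, 6);
my @got;
push @got, $range->value() while $range->isnt_exhausted();
print "@got\n";                         # 3 4 5 6

# A bad call is reported as an Iterator::X::Parameter_Error.
my $complaint;
my $bad = eval { irange_limited(1) };
if (my $ex = Iterator::X::Parameter_Error->caught())
{
    $complaint = $ex->message;
    print "Parameter error: $complaint\n";
}
```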

EXPORTS

       No symbols are exported to the caller's namespace.

DIAGNOSTICS

       Iterator uses Exception::Class objects for throwing exceptions.  If you're not familiar with
       Exception::Class, don't worry; these exception objects work just like $@ does with "die" and "croak", but
       they are easier to work with if you are trapping errors.

       All exceptions thrown by Iterator have a base class of Iterator::X.  You can trap errors with an eval
       block:

        eval { $foo = $iterator->value(); };

       and then check for errors as follows:

        if (Iterator::X->caught())  {...

       You can look for more specific errors by looking at a more specific class:

        if (Iterator::X::Exhausted->caught())  {...

       Some exceptions may provide further information, which may be useful for your exception handling:

        if (my $ex = Iterator::X::User_Code_Error->caught())
        {
            my $exception = $ex->eval_error();
            ...

       If you choose not to (or cannot) handle a particular type of exception (for example, there's not much to
       be done about a parameter error), you should rethrow the error:

        if (my $ex = Iterator::X->caught())
        {
            if ($ex->isa('Iterator::X::Something_Useful'))
            {
                ...
            }
            else
            {
                $ex->rethrow();
            }
        }
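
       The trapping pattern above can be exercised end to end with a generator that simply dies.  This is a
       sketch only: the "sensor offline" message is invented for the illustration, and both the constructor
       and the first fetch are placed inside the eval so that the error is trapped wherever the generator
       code happens to be invoked.

```perl
use strict;
use warnings;
use Iterator;

my $reason;

# Construction and the first fetch both sit inside the eval block.
my $value = eval
{
    my $iter = Iterator->new( sub { die "sensor offline\n" } );
    $iter->value();
};

if (my $ex = Iterator::X::User_Code_Error->caught())
{
    # eval_error() returns the original $@ from the generator code.
    $reason = $ex->eval_error();
}
elsif ($ex = Iterator::X->caught())
{
    # Some other Iterator exception: not handled here, so pass it on.
    $ex->rethrow();
}

print "generator failed: $reason" if defined $reason;
```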

       •   Parameter Errors

           Class: "Iterator::X::Parameter_Error"

           You called an Iterator method with one or more bad parameters.  Since this is almost certainly a
           coding error, there is probably not much use in handling this sort of exception.

           As a string, this exception provides a human-readable message about what the problem was.

       •   Exhausted Iterators

           Class: "Iterator::X::Exhausted"

           You called "value" on an iterator that is exhausted; that is, there are no more values in the
           sequence to return.

           As a string, this exception is "Iterator is exhausted."

       •   End of Sequence

           Class: "Iterator::X::Am_Now_Exhausted"

           This exception is not thrown directly by any Iterator.pm methods, but is to be thrown by iterator
           sequence generation code; that is, the code that you pass to the "new" constructor.  Your code won't
           catch an "Am_Now_Exhausted" exception, because the Iterator object will catch it internally and set
           its "is_exhausted" flag.

           The simplest way to throw this exception is to use the "is_done" function:

            Iterator::is_done() if $something;

       •   User Code Exceptions

           Class: "Iterator::X::User_Code_Error"

           This exception is thrown when the sequence generation code throws any sort of error besides
           "Am_Now_Exhausted".  This could be because your code explicitly threw an error (that is, "die"d), or
           because it otherwise encountered an exception (any runtime error).

           This exception has one method, "eval_error", which returns the original $@ that was trapped by the
           Iterator object.  This may be a string or an object, depending on how "die" was invoked.

           As a string, this exception evaluates to the stringified $@.

       •   I/O Errors

           Class: "Iterator::X::IO_Error"

           This exception is thrown when any sort of I/O error occurs; this only happens with the filesystem
           iterators.

           This exception has one method, "os_error", which returns the original $! that was trapped by the
           Iterator object.

           As a string, this exception provides some human-readable information along with $!.

       •   Internal Errors

           Class: "Iterator::X::Internal_Error"

           Something happened that I thought couldn't possibly happen.  I would appreciate it if you could send
           me an email message detailing the circumstances of the error.

REQUIREMENTS

       Requires the following additional module:

       Exception::Class, v1.21 or later.

THANKS

       Much thanks to Will Coleda and Paul Lalli (and the RPI lily crowd in general) for suggestions for the
       pre-release version.

AUTHOR / COPYRIGHT

       Eric J. Roode, roode@cpan.org

       Copyright (c) 2005 by Eric J. Roode.  All Rights Reserved.  This module is free software; you can
       redistribute it and/or modify it under the same terms as Perl itself.

       To avoid my spam filter, please include "Perl", "module", or this module's name in the message's subject
       line, and/or GPG-sign your message.

perl v5.34.0                                       2022-06-15                                      Iterator(3pm)