NAME

       PDL::Parallel::threads - sharing PDL data between Perl threads

SYNOPSIS

        use PDL;
        use PDL::Parallel::threads qw(retrieve_pdls share_pdls);

        # Technically, this is pulled in for you by PDL::Parallel::threads,
        # but using it in your code imports functions such as async.
        use threads;

        # Also, technically, you can use PDL::Parallel::threads with
        # single-threaded programs, and even with perls not compiled
        # with thread support.

        # Create some shared PDL data
        zeroes(1_000_000)->share_as('My::shared::data');

        # Create an ndarray and share its data
        my $test_data = sequence(100);
        share_pdls(some_name => $test_data);  # allows multiple at a time
        # $test_data->share_as('some_name');  # or, equivalently, the PDL method

        # Kick off some processing in the background
        async {
            my ($shallow_copy)
                = retrieve_pdls('some_name');

            # thread-local memory
            my $other_ndarray = sequence(20);

            # Modify the shared data:
            $shallow_copy++;
        };

        # ... do some other stuff ...

        # Rejoin all threads
        for my $thr (threads->list) {
            $thr->join;
        }

        use PDL::NiceSlice;
        print "First ten elements of test_data are ",
            $test_data(0:9), "\n";

DESCRIPTION

       This module provides a means to share PDL data between different Perl threads. In contrast to PDL's POSIX
       thread support (see PDL::ParallelCPU), this module lets you work with Perl's built-in threading model. In
       contrast to Perl's threads::shared, this module focuses on sharing data, not variables.

       Because this module focuses on sharing data, not variables, it does not use attributes to mark shared
       variables. Instead, you must explicitly share your data by using the "share_pdls" function or the
       "share_as" PDL method that this module introduces. Both associate a name with your data, which you then
       use in other threads to retrieve the data with the "retrieve_pdls" function. Once a thread has access to
       the ndarray data, any modifications operate directly on the shared memory, which is exactly what shared
       data is supposed to do. When you are completely done using a piece of data, you need to explicitly
       remove it from the shared pool with the "free_pdls" function. Otherwise your data will continue to
       consume memory until the originating thread terminates; put differently, you will have a memory leak.

       This module lets you share two sorts of ndarray data. You can share data for an ndarray that is based on
       actual physical memory, such as the result of "zeroes" in PDL::Core. You can also share data using
       memory-mapped files. (Note: PDL v2.4.11 and higher support memory-mapped ndarrays on all major platforms,
       including Windows.) There are other sorts of ndarrays whose data you cannot share. You cannot directly
       share ndarrays that have not been physicalised, though a simple "make_physical" in PDL::Core, "sever" in
       PDL::Core, or "copy" in PDL::Core will give you an ndarray based on physical memory that you can share.
       Also, certain functions wrap external data into ndarrays so you can manipulate them with PDL methods.
       For example, see "plmap" in PDL::Graphics::PLplot and "plmeridians" in PDL::Graphics::PLplot. These you
       cannot share directly, but making a physical copy with "copy" in PDL::Core will give you something that
       you can safely share.

   Physical Memory
       The mechanism by which this module achieves data sharing of physical memory is remarkably cheap. It's
       even cheaper than a simple affine transformation. The sharing works by creating a new shell of an
       ndarray for each call to "retrieve_pdls" and setting that ndarray's memory structure to point back to the
       same locations of the original (shared) ndarray. This means that you can share ndarrays that are created
       with standard constructors like "zeroes" in PDL::Core, "pdl" in PDL::Core, and "sequence" in PDL::Basic,
       or which are the result of operations and function evaluations for which there is no data flow, such as
       "cat" in PDL::Core (but not "dog" in PDL::Core), arithmetic, "copy" in PDL::Core, and "sever" in
       PDL::Core. When in doubt, "sever" your ndarray before sharing and everything should work.

       There is an important nuance to sharing physical memory: the memory will always be freed when the
       originating thread terminates, even if it terminates cleanly. This can lead to segmentation faults when
       one thread exits and frees its memory before another thread has had a chance to finish calculations on
       the shared data. It is best to use barrier synchronization to avoid this (via
       PDL::Parallel::threads::SIMD), or to share data solely from your main thread.

   Memory Mapped Data
       As of version 0.07, memory-mapped ndarrays are shared exactly like any other ndarray. Sharing has not
       been tested with ndarrays mapped via PDL::IO::FlexRaw.

   Package and Name Munging
       "PDL::Parallel::threads" lets you associate your data with a specific text name. Put differently, it
       provides a global namespace for data. Users of the "C" programming language will immediately notice that
       this means there is plenty of room for developers using this module to choose the same name for their
       data. Without some combination of discipline and help, it would be easy for shared memory names to clash.
       One solution to this would be to require users (i.e. you) to choose names that include their current
       package, such as "My-Module-workspace" or, following perlpragma, "My::Module/workspace" instead of just
       "workspace". This is sometimes called name mangling. Well, I decided that this is such a good idea that
       "PDL::Parallel::threads" does the second form of name mangling for you automatically! Of course, you can
       opt out, if you wish.

       The basic rule is that the current package name and a slash are prepended to the name of the shared
       memory, as long as the name is composed only of word characters, i.e. names matching "/^\w+$/". Here's
       an example demonstrating how this works:

        package Some::Package;
        use PDL;
        use PDL::Parallel::threads 'retrieve_pdls';

        # Stored under '??foo'
        sequence(20)->share_as('??foo');

        # Shared as 'Some::Package/foo'
        zeroes(100)->share_as('foo');

        sub do_something {
          # Retrieve 'Some::Package/foo'
          my $copy_of_foo = retrieve_pdls('foo');

          # Retrieve '??foo':
          my $copy_of_weird_foo = retrieve_pdls('??foo');

          # ...
        }

        # Move to a different package:
        package Other::Package;
        use PDL::Parallel::threads 'retrieve_pdls';

        sub something_else {
          # Retrieve 'Some::Package/foo'
          my $copy_of_foo = retrieve_pdls('Some::Package/foo');

          # Retrieve '??foo':
          my $copy_of_weird_foo = retrieve_pdls('??foo');

          # ...
        }

       The upshot of all of this is that if you use some module that also uses "PDL::Parallel::threads",
       namespace clashes are highly unlikely to occur as long as you (and the author of that other module) use
       simple names, like the sort of thing that works for variable names.
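
       The rule above is small enough to model in a few lines of plain Perl. The sketch below is a
       hypothetical illustration, not the module's own code: "mangle_name" is an invented helper, and it takes
       the package name as an explicit argument instead of inspecting the caller. It reproduces the three
       cases from the example.

```perl
use strict;
use warnings;

# Hypothetical helper modelling the documented rule; not the module's code.
# Names made only of word characters get "<package>/" prepended,
# every other name is used exactly as given.
sub mangle_name {
    my ($name, $package) = @_;
    return $name =~ /^\w+$/ ? "$package/$name" : $name;
}

# The three names used in the example above, looked up from Some::Package.
for my $name ('foo', '??foo', 'Some::Package/foo') {
    print "$name -> ", mangle_name($name, 'Some::Package'), "\n";
}
```

       Names that already contain a slash or "::", such as "Some::Package/foo" or "My::shared::data" from
       the SYNOPSIS, fail the "/^\w+$/" test and are therefore used verbatim.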

FUNCTIONS

       This module provides three stand-alone functions and adds one new PDL method.

   share_pdls
       Shares ndarray data across threads using the given names.

         share_pdls (name => ndarray, name => ndarray, ...)

       This function takes key/value pairs where the value is the ndarray to store, and the key is the name
       under which to store the ndarray. You can later retrieve the memory with the "retrieve_pdls" function.

       Sharing an ndarray with physical memory (or that is memory-mapped) increments the data's reference count;
       you can decrement the reference count by calling "free_pdls" on the given "name". In general this ends up
       doing what you mean, and freeing memory only when you are really done using it.

        my $data1 = zeroes(20);
        my $data2 = ones(30);
        share_pdls(foo => $data1, bar => $data2);

       This can be combined with constructors and fat commas to allocate a collection of shared memory that you
       may need to use for your algorithm:

        share_pdls(
            main_data => zeroes(1000, 1000),
            workspace => zeroes(1000),
            reduction => zeroes(100),
        );

       "share_pdls" preserves the badflag and badvalue on ndarrays.

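       The key/value contract of "share_pdls", together with the two related messages listed under
       DIAGNOSTICS, can be sketched in plain Perl. This is a hypothetical model, not the module's
       implementation: "share_pairs" and %pool are invented stand-ins for the shared pool, and the sketch
       ignores name mangling, threads, and PDL entirely. It checks every name before storing any of them;
       whether the real function behaves that way is an assumption.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for the shared pool: name => data.
my %pool;

# Hypothetical model of the argument checks documented for share_pdls.
sub share_pairs {
    # An odd argument count cannot be a list of name => data pairs.
    croak('share_pdls: expected key/value pairs') if @_ % 2;
    my %to_store = @_;

    # Refuse any name that is already in use (checked before storing anything).
    for my $name (sort keys %to_store) {
        croak("share_pdls: you already have data associated with '$name'")
            if exists $pool{$name};
    }
    @pool{ keys %to_store } = values %to_store;
    return;
}

share_pairs(foo => [ (0) x 20 ], bar => [ (1) x 30 ]);   # like zeroes(20), ones(30)

# Both of these calls are rejected, and nothing is stored.
for my $bad_call (sub { share_pairs(foo => [0]) }, sub { share_pairs('baz') }) {
    eval { $bad_call->(); 1 } or print "Caught: $@";
}
```
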
   share_as
       Method to share an ndarray's data across threads under the given name.

         $pdl->share_as(name)

       This PDL method lets you directly share an ndarray. It does exactly the same thing as "share_pdls", but
       its invocation is a little different:

        # Directly share some constructed memory
        sequence(20)->share_as('baz');

        # Share individual ndarrays:
        my $data1 = zeroes(20);
        my $data2 = ones(30);
        $data1->share_as('foo');
        $data2->share_as('bar');

       Like many other PDL methods, this method returns the just-shared ndarray. This can lead to some amusing
       ways of storing intermediate results partway through a long chain:

        my $results = $input->sumover->share_as('pre_offset') + $offset;

        # Now you can get the result of the sumover operation
        # before that offset was added, by calling:
        my $pre_offset = retrieve_pdls('pre_offset');

       This method achieves the same end as "share_pdls". There's More Than One Way To Do It, because having
       both can make for easier-to-read code. In general I recommend using the "share_as" method when you only
       need to share a single ndarray's memory.

       "share_as" preserves the badflag and badvalue on ndarrays.

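       The chaining trick works because "share_as" hands back its invocant. The toy class below is a
       hypothetical pure-Perl stand-in (not PDL) that shows the same return-the-invocant pattern;
       "Toy::Vector", its "sumover", and %pool are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in for an ndarray class; this is not PDL.
package Toy::Vector;

my %pool;    # hypothetical shared pool: name => object

sub new {
    my ($class, @values) = @_;
    return bless [@values], $class;
}

# Reduce to a one-element vector holding the sum (a loose analogue of sumover).
sub sumover {
    my $self  = shift;
    my $total = 0;
    $total += $_ for @$self;
    return Toy::Vector->new($total);
}

# Store the invocant under a name, then return that same invocant so the
# call can sit in the middle of a chain.
sub share_as {
    my ($self, $name) = @_;
    $pool{$name} = $self;
    return $self;
}

sub retrieve {
    my ($name) = @_;
    return $pool{$name};
}

package main;

my $input      = Toy::Vector->new(1 .. 4);
my $offset     = 100;
my $pre_offset = $input->sumover->share_as('pre_offset');
my $results    = $pre_offset->[0] + $offset;           # 10 + 100
my $retrieved  = Toy::Vector::retrieve('pre_offset');  # same object as $pre_offset
```
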
   retrieve_pdls
       Obtain ndarrays providing access to the data shared under the given names.

         my ($copy1, $copy2, ...) = retrieve_pdls (name, name, ...)

       This function takes a list of names and returns a list of ndarrays that provide access to the data shared
       under those names. In scalar context the function returns the ndarray corresponding to the first named
       data set, which is usually what you mean when you use a single name. If you specify multiple names but
       call it in scalar context, you will get a warning indicating that you probably meant to say something
       different.

        my $local_copy = retrieve_pdls('foo');
        my @both_ndarrays = retrieve_pdls('foo', 'bar');
        my ($foo, $bar) = retrieve_pdls('foo', 'bar');

       "retrieve_pdls" preserves the badflag and badvalue on ndarrays.

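       The context rules described above follow the usual Perl "wantarray" pattern. The sketch below is a
       hypothetical pure-Perl model, not the module's code: "retrieve_items" and %pool are invented names,
       plain strings stand in for ndarrays, and the error and warning texts are borrowed from DIAGNOSTICS.

```perl
use strict;
use warnings;
use Carp qw(carp croak);

# Hypothetical stand-in for the shared pool; strings play the ndarrays.
my %pool = (foo => 'data for foo', bar => 'data for bar');

# Hypothetical model of the documented context rules of retrieve_pdls.
sub retrieve_items {
    my @names = @_;
    my @found;
    for my $name (@names) {
        croak("retrieve_pdls could not find data associated with '$name'")
            unless exists $pool{$name};
        push @found, $pool{$name};
    }
    # List context: one result per requested name.
    return @found if wantarray;
    # Scalar context: first result, with a warning if several were requested.
    carp('retrieve_pdls: requested many ndarrays... in scalar context?') if @names > 1;
    return $found[0];
}

my $local_copy   = retrieve_items('foo');          # scalar context, one name
my ($foo, $bar)  = retrieve_items('foo', 'bar');   # list context
my $only_the_foo = retrieve_items('foo', 'bar');   # scalar context: warns
```
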
   free_pdls
       Frees the shared memory (if any) associated with the named shared data.

         free_pdls(name, name, ...)

       This function marks the memory associated with the given names as no longer being shared, handling all
       reference counting and other low-level details. You generally won't need to worry about the return
       value. But if you care, you get a list of values, one for each name, where a successful removal gets the
       name and an unsuccessful removal gets an empty string.

       So, if you say "free_pdls('name1', 'name2')" and both removals were successful, you will get "('name1',
       'name2')" as the return values. If there was trouble removing "name1" (because there is no memory
       associated with that name), you will get "('', 'name2')" instead. This means you can handle trouble with
       Perl's "grep" and other conditionals:

        my @to_remove = qw(name1 name2 name3 name4);
        my @results = free_pdls(@to_remove);
        if (not grep {$_ eq 'name2'} @results) {
            print "That's weird; did you remove name2 already?\n";
        }
        if (not $results[2]) {
            print "Couldn't remove name3 for some reason\n";
        }

       This function simply removes an ndarray's memory from the shared pool. It does not interact with bad
       values in any way. But then again, it does not interfere with or screw up bad values, either.
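
       Because the list returned by "free_pdls" lines up position by position with the names you passed in,
       you can recover exactly which names failed by pairing the two lists by index. In the sketch below,
       "free_items" and %pool are hypothetical stand-ins that follow the documented return convention (the
       name on success, an empty string on failure); with the real module you would call "free_pdls" in their
       place.

```perl
use strict;
use warnings;

# Hypothetical pool and stand-in for free_pdls, following the documented
# convention: one entry per name, the name on success, '' on failure.
my %pool = (name1 => 1, name2 => 1, name4 => 1);
sub free_items {
    return map { my $name = $_; delete $pool{$name} ? $name : '' } @_;
}

my @to_remove = qw(name1 name2 name3 name4);
my @results   = free_items(@to_remove);

# Pair each result with the name at the same index to find the failures.
my @failed = map { $to_remove[$_] } grep { $results[$_] eq '' } 0 .. $#results;
print "Could not free: @failed\n" if @failed;
```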

DIAGNOSTICS

       "share_pdls: expected key/value pairs"
           You called "share_pdls" with an odd number of arguments, which means that you could not have
           supplied key/value pairs. Double-check that every ndarray (or filename) that you supply is preceded
           by its shared name.

       "share_pdls: you already have data associated with '$name'"
           You tried to share some data under $name, but some data is already associated with that name. Typo?
           You can avoid namespace clashes with other modules by using simple names and letting
           "PDL::Parallel::threads" mangle the name internally for you.

       "share_pdls: Could not share an ndarray under name '$name' because ..."
           "... the ndarray does not have any allocated memory."
               You tried to share an ndarray that does not have any memory associated with it.

           "... the ndarray's data does not come from the datasv."
               You tried to share an ndarray that has a funny internal structure, in which the data does not
               point to the buffer portion of the datasv. I'm not sure how that could happen without
               triggering a more specific error, so I hope you know what's going on if you get this. :-)

       "share_pdls passed data under '$name' that it does not know how to store"
           "share_pdls" only knows how to store raw data ndarrays. It'll croak if you try to share other kinds
           of ndarrays, and it'll throw this error if you try to share anything else, like a hashref.

       "retrieve_pdls: '$name' was created in a thread that has ended or is detached"
           In some other thread, you added some data to the shared pool. If that thread ended without you
           freeing that data (or the thread has become a detached thread), then we cannot know if the data is
           available. You should always free your data from the data pool when you're done with it, to avoid
           this error.

       "retrieve_pdls could not find data associated with '$name'"
           Pretty simple: either data has never been added under this name, or data under this name has been
           removed.

       "retrieve_pdls: requested many ndarrays... in scalar context?"
           This is just a warning. You requested multiple ndarrays (sent multiple names) but you called the
           function in scalar context. Why do such a thing?

LIMITATIONS

       You cannot share memory-mapped files that require features of PDL::IO::FlexRaw. That is a cool module
       that lets you pack multiple ndarrays into a single file, but simple cross-thread sharing is not trivial
       and is not (yet) supported.

       If you are dealing with a physical ndarray, you have to be a bit careful about how the memory gets freed.
       If you don't call "free_pdls" on the data, it will persist in memory until the end of the originating
       thread, which means you have a classic memory leak. If another thread creates a thread-local copy of the
       data before the originating thread ends, but then tries to access the data after the originating thread
       ends, this will be fine as the reference count of the "datasv" will have been increased.

BUGS

       None known at this point.

AUTHOR, COPYRIGHT, LICENSE

       This module was written by David Mertens. The documentation is copyright (C) David Mertens, 2012. The
       source code is copyright (C) Northwestern University, 2012. All rights reserved.

       This module is distributed under the same terms as Perl itself.

DISCLAIMER OF WARRANTY

       Parallel computing is hard to get right, and it can be exacerbated by errors in the underlying software.
       Please do not use this software in anything that is mission-critical unless you have tested and verified
       it yourself. I cannot guarantee that it will perform perfectly under all loads. I hope this is useful and
       I wish you well in your usage thereof, but BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO
       WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN
       WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY
       KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE
       OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
       SERVICING, REPAIR, OR CORRECTION.

       IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY
       OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE
       TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF
       THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
       RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE
       WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
       DAMAGES.

perl v5.40.0                                       2025-02-04                        PDL::Parallel::threads(3pm)