Ubuntu Manpage: Statistics::OnLine - Pure Perl implementation of the on-line algorithm to produce statistics

Provided by: libstatistics-online-perl_0.02-4_all

NAME

       Statistics::OnLine - Pure Perl implementation of the on-line algorithm to produce statistics

SYNOPSIS

        use Statistics::OnLine;
        my $s = Statistics::OnLine->new;

        my @data = (1, 2, 3, 4, 5);
        $s->add_data( @data );
        $s->add_data( 6, 7 );
        $s->add_data( 8 );

        print "count = ",$s->count,"\tmean = ",$s->mean,"\tvariance = ",$s->variance,"\tvariance_n = ",
              $s->variance_n,"\tskewness = ",$s->skewness,"\tkurtosis = ",$s->kurtosis,"\n";

        $s->add_data( ); # does nothing!
        print "count = ",$s->count,"\tmean = ",$s->mean,"\tvariance = ",$s->variance,"\tvariance_n = ",
              $s->variance_n,"\tskewness = ",$s->skewness,"\tkurtosis = ",$s->kurtosis,"\n";

        $s->add_data( 9, 10 );
        print "count = ",$s->count,"\tmean = ",$s->mean,"\tvariance = ",$s->variance,"\tvariance_n = ",
              $s->variance_n,"\tskewness = ",$s->skewness,"\tkurtosis = ",$s->kurtosis,"\n";

DESCRIPTION

       This module computes statistics over large datasets that typically do not fit in the memory of the
       machine, e.g. a stream of data arriving from the network.

       Once instantiated, an object of the class provides an "add_data" method to add data to the dataset. When
       a statistic is required at some point in the stream, the appropriate method can be called. After a
       statistic has been computed, new data can still be added; the object keeps updating its internal state
       so that later calls return statistics over all the data seen so far.

METHODS

       new()
           Creates a new "Statistics::OnLine" object and returns it.

       add_data(@)
           Adds new data to the object and updates the internal state of the statistics.

           The method returns the object itself, so that calls can be chained:

            my $v = $s->add_data( 1, 2, 3, 4 )->variance;

       clean()
           Cleans the internal state of the object and resets all the internal statistics.

           Returns the object itself, so that calls can be chained:

            my $v = $s->clean->add_data( 1, 2, 3, 4 )->variance;

       count()
           Returns the number of elements inserted and processed by the object.

       mean()
           Returns the average of the elements inserted into the system:

            \frac{ \sum_{i=1}^n x_i }{ n }

       variance()
           Returns the sample variance (divisor n - 1) of the elements inserted into the system:

            \frac{ \sum_{i=1}^n (x_i - avg)^2 }{ n - 1 }

       variance_n()
           Returns the population variance (divisor n) of the elements inserted into the system:

            \frac{ \sum_{i=1}^n (x_i - avg)^2 }{ n }

       skewness()
           Returns the skewness (third standardized moment) of the elements inserted into the system. See
           <http://en.wikipedia.org/wiki/Skewness>.

       kurtosis()
           Returns the kurtosis (fourth standardized moment) of the elements inserted into the system. See
           <http://en.wikipedia.org/wiki/Kurtosis>.
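
       The "variance" and "variance_n" methods differ only in their divisor (n - 1 versus n). Assuming both
       are derived from the same sum of squared deviations from the mean, they are related as follows for
       n > 1:

```latex
\mathrm{variance} = \frac{n}{n - 1} \cdot \mathrm{variance\_n}
```

       For example, with n = 8 and a "variance_n" of 5.25, the corresponding "variance" is 8/7 * 5.25 = 6.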

ERROR MESSAGES

       The module reports errors by calling "die". The conditions that raise an error are:

       too few elements to compute function
           Some functions need a minimum number of elements to be computed: "mean", "variance_n" and "skewness"
           need at least one element, "variance" needs at least two and "kurtosis" needs at least four.

       variance is zero: cannot compute kurtosis|skewness
           Both kurtosis and skewness require the variance to be greater than zero.
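
       Because errors are raised with "die", a caller that queries a statistic before enough data has arrived
       may want to trap the exception. The following is a minimal sketch, assuming the module is installed
       and behaves as documented above; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Statistics::OnLine;

my $s = Statistics::OnLine->new;
$s->add_data( 42 );    # one element only: too few for "variance"

# The block form of eval traps the die. The trailing 1 makes the eval
# return true on success, so a false result means an error was raised
# and the message is available in $@.
my $variance;
my $ok = eval { $variance = $s->variance; 1 };

if ( $ok ) {
    print "variance = $variance\n";
}
else {
    print "variance not available yet: $@";
}
```

       The same pattern applies to "skewness" and "kurtosis", which can also fail when the variance is zero.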

THEORY

       On-line statistics rest on solid mathematical foundations: the standard computations are rewritten as a
       sequence of operations that incrementally update the current values with each new element.

       There are several references on the web. A good starting point is
       <http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Higher-order_statistics>. The linked
       page provides other useful references on the foundations of the method.
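
       To illustrate the idea of an incremental update, the sketch below keeps a running count, mean and sum
       of squared deviations using Welford's recurrence, one common formulation described on the page linked
       above. It is not the module's source code: this documentation does not state which recurrence the
       module uses, and "make_running_stats" is a hypothetical helper introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::OnLine.
# Returns a set of closures sharing a running count, mean and
# sum of squared deviations from the mean (m2).
sub make_running_stats {
    my ( $n, $mean, $m2 ) = ( 0, 0, 0 );

    my %api = (
        add => sub {
            for my $x (@_) {
                $n++;
                my $delta = $x - $mean;            # deviation from the old mean
                $mean += $delta / $n;              # update the mean
                $m2   += $delta * ( $x - $mean );  # uses the updated mean
            }
            return;
        },
        count      => sub { $n },
        mean       => sub { $n > 0 ? $mean : undef },
        variance   => sub { $n > 1 ? $m2 / ( $n - 1 ) : undef },
        variance_n => sub { $n > 0 ? $m2 / $n : undef },
    );
    return \%api;
}

my $rs = make_running_stats();
$rs->{add}->( 1 .. 5 );      # data can arrive in several batches
$rs->{add}->( 6, 7, 8 );

printf "count = %d\tmean = %g\tvariance = %g\tvariance_n = %g\n",
    $rs->{count}->(), $rs->{mean}->(), $rs->{variance}->(), $rs->{variance_n}->();
```

       No element is stored: each new value only adjusts the three running quantities. For the values 1
       through 8 the sketch reports a mean of 4.5, a variance of 6 and a variance_n of 5.25.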

CAVEAT

       The module is intended for situations in which (1) the number of data elements could be too large for
       the memory of the system, or (2) the elements arrive at different times and intermediate results are
       needed.

       If the length of the stream is fixed, all the data elements are available in a single place and there is
       no need for intermediate results, it could be better to use a different module, for instance
       Statistics::Lite, to make the computations.

       The reason is that the module uses a stable approximation that is well suited to streams (effectively
       an on-line algorithm). Using it on fixed datasets could introduce a (small) approximation error.

HISTORY

       0.02
           Corrected typos in documentation

       0.01
           Initial version of the module

AUTHOR

       Francesco Nidito

COPYRIGHT

       Copyright 2009 Francesco Nidito. All rights reserved.

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl
       itself.
