Provided by: libur-perl_0.470+ds-2_all

NAME

       UR::DataSource::Filesystem - Get and save objects to delimited text files

SYNOPSIS

         # Create an object for the data file
         my $people_data = UR::DataSource::Filesystem->create(
             columns => ['person_id','name','age','street_address'],
             sorted_columns => ['age','person_id'],
             path => '/var/lib/people/$state/$city/people.txt',
             delimiter        => "\t", # between columns in the file
             record_separator => "\n", # between lines in the file
         );

         # Define an entity class for the people in the file
         class MyProgram::Person {
             id_by => 'person_id',
             has => [
                 name           => { is => 'String' },
                 age            => { is => 'Number' },
                 street_address => { is => 'String' },
                 city           => { is => 'String' },
                 state          => { is => 'String' },
             ],
             data_source_id => $people_data->id,
         };

         # Get all people older than 40 who live in any city named Springfield
         my @springfielders = MyProgram::Person->get(city => 'Springfield', 'age >' => 40);

DESCRIPTION

       A Filesystem data source object represents one or more files on the filesystem.  In the simplest case,
       the object's 'path' property names a file that stores the data.

   Properties
       These properties determine the configuration for the data source.

       path <string>
           path is a string giving the location of the data files.  Besides being a simple pathname to one file,
           the string can also be a specification of many similar files, or a directory containing multiple
           files.  See "Path specification" below for more information.

       record_separator <string>
           The separator between records (usually lines) in the file.  Its value is assigned to $/ before
           getline() is called to read data.  The default record_separator is "\n".

       delimiter <string>
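           The separator between columns in the file.  Its value is compiled into a regex with qr() and passed
           to split() to break a record into a list of values, so regex metacharacters such as '|' or '.' must
           be escaped to match literally.  The default delimiter is '\s*,\s*', meaning that columns are
           separated by commas, with any whitespace around each comma ignored.  Another common value is "\t"
           for tabs.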
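           A minimal sketch in plain Perl, not UR's own code, of how the two properties above can drive
           reading: the record_separator is assigned to $/ and the delimiter is compiled with qr() and handed
           to split().  The helper names split_record and read_records are hypothetical, and passing a limit
           of -1 to split() (so trailing empty columns are kept) is a choice made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating record_separator and delimiter;
# they are not part of UR's API.

# Split one record into column values, treating $delimiter as a regex.
# A limit of -1 keeps trailing empty fields instead of dropping them.
sub split_record {
    my ($line, $delimiter) = @_;
    my $re = qr($delimiter);
    return split $re, $line, -1;
}

# Read every record from a string, using $record_separator as $/.
sub read_records {
    my ($text, $record_separator, $delimiter) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = $record_separator;    # controls what one readline returns
    my @rows;
    while (defined(my $line = <$fh>)) {
        chomp $line;                 # chomp strips the current value of $/
        push @rows, [ split_record($line, $delimiter) ];
    }
    close $fh;
    return @rows;
}

# Default delimiter: commas, ignoring whitespace around them.
my @fields = split_record("1, Bob ,42", '\s*,\s*');    # ('1', 'Bob', '42')
```

           For example, read_records("1;Bob;42|2;Ann;37|", '|', ';') returns two rows, because '|' is used
           literally as $/ while ';' is used as the split() regex.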

       columns <ARRAY>
           A list reference of column names in the file.  Just as SQL tables have columns, Filesystem files also
           have named columns, so the system knows how to map the file's data onto object properties.  A
           Filesystem data source does not need to name its columns if the 'columns_from_header' property is
           true.

           Classes that use the Filesystem data source attach their properties to the data source's columns via
           the 'column_name' metadata.  Besides the columns named in the 'columns' list, two additional
           column-like tokens may be used as a column_name: '__FILE__' and '$.'.  __FILE__ means the object's
           property will hold the name of the file the data was read from; $. means the value will be the input
           line number within that file.  These are useful when iterating over the contents of a file.  Because
           these two pseudo-columns are always considered sorted, queries on them can sometimes be answered
           without reading the whole file.  See the 'sorted_columns' discussion below for more information.
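           A minimal sketch, not UR's implementation, of what the two pseudo-columns hold for each parsed row.
           It also shows why they count as sorted: line numbers only increase while a file is read, so rows
           come out already ordered by ('__FILE__', '$.') within a single file.  The helper name
           rows_with_pseudo_columns and the in-memory file are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: attach the '__FILE__' and '$.' pseudo-columns to
# each parsed row. Not UR's own code.
sub rows_with_pseudo_columns {
    my ($file_name, $text, $columns, $delimiter) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @rows;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my %row;
        @row{@$columns} = split qr($delimiter), $line, -1;
        $row{'__FILE__'} = $file_name;    # which file the row came from
        $row{'$.'}       = $.;            # input line number of this row
        push @rows, \%row;
    }
    close $fh;
    return @rows;
}
```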

       sorted_columns <ARRAY>
           A list reference of column names that the file is sorted by, in the order of the sorting.  If a column
           is sorted in descending order, put a minus (-) in front of its name.  If the file is sorted by multiple
           columns, say first by last_name and then by first_name, include them both:

             sorted_columns => ['last_name','first_name']

           The system uses this information to know when it can stop reading during a query on a sorted column.
           It is also used to determine whether the order a query requests matches the sort order of the file.
           If not, the data must be gathered in two passes: the first pass finds the records in the file that
           match the filter, and the matching records are then sorted in the order the query requests before
           being returned to the Context.
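           A minimal sketch of both ideas in plain Perl, not UR's code: a comparator built from a
           sorted_columns-style list (a leading '-' reverses a column), and a filter that stops reading at the
           first non-matching row when the data is sorted on the filtered column, but otherwise scans every row
           and sorts the matches afterwards.  The helper names are hypothetical, and comparing values
           numerically is an assumption that suits age and person_id but not text columns.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating sorted_columns; not part of UR.

# Build a comparator from a spec such as ('-age', 'person_id').
# A leading '-' means that column is sorted in descending order.
# Values are compared numerically (an assumption of this sketch).
sub comparator_for {
    my @spec = @_;
    return sub {
        my ($x, $y) = @_;
        for my $col (@spec) {
            my ($desc, $name) = $col =~ /^(-?)(.+)$/;
            my $cmp = $x->{$name} <=> $y->{$name};
            $cmp = -$cmp if $desc;
            return $cmp if $cmp;
        }
        return 0;
    };
}

# Return rows whose $col is below $limit. If the rows are sorted ascending
# on $col, stop at the first row that fails the test; otherwise scan all
# rows and sort the matches afterwards (the two-pass approach).
sub rows_below {
    my ($rows, $col, $limit, $sorted_asc) = @_;
    my @found;
    for my $row (@$rows) {
        if ($row->{$col} < $limit) {
            push @found, $row;
        } elsif ($sorted_asc) {
            last;    # every later row is >= $limit, so reading can stop
        }
    }
    my $by_col = comparator_for($col);
    return $sorted_asc ? @found : sort { $by_col->($a, $b) } @found;
}
```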

           The Context expects incoming data to always be sorted by at least the class's ID properties.  If the
           file is unsorted and the caller wants to iterate over the data, it is common to define the class's ID
           properties like this:

             id_by => [
                 file => { is => 'String', column_name => '__FILE__' },
                 line => { is => 'Integer', column_name => '$.' },
             ]

           Otherwise, the data source must read the whole file and sort its contents before its iterator can
           return the first row of data.

       columns_from_header <boolean>
           If true, the system will read the first line of the file to determine what the column names are.

       header_lines <integer>
           The number of lines at the top of the file that do not contain entity data.  When the file is opened,
           this many lines are skipped before reading data.  If the columns_from_header flag is true, the
           header_lines value should be at least 1.
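           A minimal sketch, not UR's implementation, of how columns_from_header and header_lines can work
           together.  It assumes, as the advice above suggests, that the header line is itself one of the
           header_lines lines; the helper name read_with_header is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch of columns_from_header and header_lines handling.
# Assumption: when $columns_from_header is true, the header line counts
# as one of the $header_lines lines at the top of the file.
sub read_with_header {
    my ($text, $delimiter, $columns_from_header, $header_lines, $columns) = @_;
    $header_lines //= 0;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    if ($columns_from_header) {
        my $header = <$fh>;
        die "file has no header line\n" unless defined $header;
        chomp $header;
        $columns = [ split qr($delimiter), $header, -1 ];
        $header_lines--;    # the header line has already been consumed
    }
    for (1 .. $header_lines) {    # skip the remaining non-data lines
        my $skipped = <$fh>;
        last unless defined $skipped;
    }
    my @rows;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my %row;
        @row{@$columns} = split qr($delimiter), $line, -1;
        push @rows, \%row;
    }
    close $fh;
    return ($columns, @rows);
}
```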

       handle_class <string>
           The class used to read from and write to the file.  The default is IO::File.  Any other value must
           name a class with the same interface as IO::File, in particular the methods new, input_line_number,
           getline, tell, seek and print.
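           One straightforward way to satisfy that interface is to subclass IO::File, which inherits all six
           methods.  The sketch below is a hypothetical handle class that only counts print() calls; the
           package name My::CountingHandle is illustrative, and it would be selected with
           handle_class => 'My::CountingHandle'.  It exercises the class directly rather than through UR.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempdir);

# Hypothetical handle class: inherits new, input_line_number, getline,
# tell, seek and print from IO::File, and counts calls to print().
package My::CountingHandle;
use parent 'IO::File';

our $PRINT_CALLS = 0;

sub print {
    my $self = shift;
    $PRINT_CALLS++;
    return $self->SUPER::print(@_);
}

package main;

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/people.txt";

# Write two tab-delimited records through the custom class.
my $out = My::CountingHandle->new($path, 'w') or die "open $path: $!";
$out->print("1\tBob\n");
$out->print("2\tAnn\n");
$out->close;

# Read one record back and ask for its line number.
my $in = My::CountingHandle->new($path, 'r') or die "open $path: $!";
my $first   = $in->getline;
my $line_no = $in->input_line_number;
$in->close;
```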

   Path specification
       Besides naming a single file on the filesystem, the path spec can be a recipe for finding files in a
       directory tree.  If a class using a Filesystem data source does not have 'table_name' metadata, the path
       specification must resolve to file names.  Alternatively, a class may specify its 'table_name', which is
       interpreted as the name of a file within the directory indicated by the path specification.

       Three kinds of special tokens can also appear in a path spec:

       $property
           When querying, the system extracts the value (or values, for an in-clause) of $property from the
           BoolExpr when constructing the pathname.  If the BoolExpr does not have a value for that property,
           the system does a shell glob to find the possible values.  For example, given this path spec and
           query:

             path => '/var/people/$state/$city/people.txt'
             my @people = MyProgram::Person->get(city => 'Springfield', 'age >' => 40);

           it would find the data files using the glob expression

             /var/people/*/Springfield/people.txt

           It also knows that any objects coming from the file

             /var/people/CA/Springfield/people.txt

           must have the value 'CA' for their 'state' property, even though  that  information  is  not  in  the
           contents of the file.

           When committing changes back to the file, the object's property values determine which file it is
           saved to.

           The property name can also be wrapped in braces:

             /var/people/${state}_US/city_${city}/people.txt
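           A minimal sketch, not UR's API, of the two directions described above: turning a path spec plus the
           query's known values into a glob pattern, and recovering property values from a concrete file name.
           In this sketch the braced form matters because a bare $state_US would be read as one property
           named state_US.  The helpers path_to_glob and path_to_regex are hypothetical, and in-clauses (which
           would need one glob per value) are not handled.

```perl
use strict;
use warnings;

# Hypothetical helpers for $property path tokens; not part of UR.
# A token is $name or ${name}.
my $TOKEN = qr/\$(?:\{(\w+)\}|(\w+))/;

# Replace each token with the query's value for that property, or with
# '*' when the query does not constrain it.
sub path_to_glob {
    my ($spec, $values) = @_;
    (my $glob = $spec) =~ s{$TOKEN}{
        my $name = defined $1 ? $1 : $2;
        exists $values->{$name} ? $values->{$name} : '*';
    }ge;
    return $glob;
}

# Build a regex with one named capture per property, so values can be
# read back out of a concrete file name.
sub path_to_regex {
    my ($spec) = @_;
    my ($re, $pos) = ('', 0);
    while ($spec =~ /$TOKEN/g) {
        my $name = defined $1 ? $1 : $2;
        $re .= quotemeta(substr($spec, $pos, $-[0] - $pos));
        $re .= "(?<$name>[^/]*?)";
        $pos = $+[0];
    }
    $re .= quotemeta(substr($spec, $pos));
    return qr/\A$re\z/;
}
```

           With the example spec and query above, path_to_glob yields /var/people/*/Springfield/people.txt,
           and matching /var/people/CA/Springfield/people.txt against path_to_regex leaves 'CA' in $+{state}.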

       &method
           The replacement value is resolved by calling the named method on the subject class of the query.  The
           method is called like this:

             $replacement = $subject_class->$method($boolexpr_or_object);

           During a query, the method is passed a BoolExpr; during a commit, the method is passed an object.  It
           must return a string.

           The method name can also be wrapped in braces:

             /&{resolve_prefix}.dir/people.txt
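           A minimal sketch, not UR's code, of &method resolution: each &name or &{name} is replaced by the
           string returned from calling that method on the subject class.  The class My::Person, its
           resolve_prefix method, and the hash reference standing in for a BoolExpr or object are all
           hypothetical simplifications.

```perl
use strict;
use warnings;

# Hypothetical sketch of &method token resolution: each &name or &{name}
# becomes $class->name($arg), where $arg stands in for the BoolExpr
# (during a query) or the object (during a commit).
sub resolve_methods {
    my ($spec, $class, $arg) = @_;
    (my $path = $spec) =~ s{&(?:\{(\w+)\}|(\w+))}{
        my $method = defined $1 ? $1 : $2;
        $class->$method($arg);
    }ge;
    return $path;
}

# Stand-in subject class; the method must return a string.
package My::Person;

sub resolve_prefix {
    my ($class, $arg) = @_;
    return 'group_' . $arg->{group};
}

package main;
```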

       *, ?
           Literal shell glob wildcards are honored when finding files, but the text they match is not used to
           supply property values to objects.
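           To show which files such wildcards select, the sketch below builds a small made-up directory tree
           and searches it with File::Glob's bsd_glob.  Whether UR itself calls bsd_glob is not stated here;
           the directory names are illustrative only.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a small directory tree of empty data files.
my $root = tempdir(CLEANUP => 1);
for my $dir ('CA/Springfield', 'IL/Springfield', 'NY/Albany') {
    make_path("$root/$dir");
    open my $fh, '>', "$root/$dir/people.txt" or die "create: $!";
    close $fh;
}

# '??' matches exactly two characters and '*' matches any run of
# characters within one path component; the matched text is not used
# as a property value.
my @files = bsd_glob("$root/??/Springfield/people.txt");
my @all   = bsd_glob("$root/*/*/people.txt");
```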

   Environment Variables
       If the environment variable $UR_DBI_MONITOR_SQL is set to a true value, the Filesystem data source prints
       information about the queries it runs.

INHERITANCE

         UR::DataSource

SEE ALSO

       UR, UR::DataSource

perl v5.32.1                                       2022-01-17                    UR::DataSource::Filesystem(3pm)