Provided by: libgetdata-doc_0.11.0-13_all

NAME

       gd_getdata — retrieve data from a Dirfile database

SYNOPSIS

       #include <getdata.h>

       size_t gd_getdata(DIRFILE *dirfile, const char *field_code, off_t first_frame, off_t first_sample,
              size_t num_frames, size_t num_samples, gd_type_t return_type, void *data_out);

DESCRIPTION

       The  gd_getdata()  function  queries a dirfile(5) database specified by dirfile for the field field_code.
       It fetches num_frames frames plus num_samples samples from this field, starting first_sample samples past
       frame first_frame.  The data is converted to the data type specified by return_type, and  stored  in  the
       user-supplied buffer data_out.

       The  field_code  may contain one of the representation suffixes listed in dirfile-format(5).  If it does,
       gd_getdata() will compute the appropriate complex norm before returning the data.

       The dirfile argument must point to a valid DIRFILE object previously created by a call to gd_open(3).
       The argument data_out must point to a valid memory location of sufficient size to hold all data
       requested.

       Unless using GD_HERE (see below), the first sample returned will be

              first_frame * samples_per_frame + first_sample

       as measured from the start of the dirfile, where samples_per_frame is the number of samples per frame  as
       returned by gd_spf(3).  The number of samples fetched is, similarly,

              num_frames * samples_per_frame + num_samples.
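       The two formulas above can be sketched in C as follows. The helper names first_sample_abs() and
       sample_count() are hypothetical and not part of the GetData API; spf stands for the value gd_spf(3)
       would report for the field.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of GetData) mirroring the formulas above.
 * spf is the field's samples-per-frame value, as gd_spf(3) would report it. */

/* Absolute index of the first sample returned, counted from the start of
 * the dirfile. */
int64_t first_sample_abs(int64_t first_frame, int64_t first_sample,
                         uint64_t spf)
{
    return first_frame * (int64_t)spf + first_sample;
}

/* Total number of samples requested. */
uint64_t sample_count(uint64_t num_frames, uint64_t num_samples,
                      uint64_t spf)
{
    return num_frames * spf + num_samples;
}
```

       With spf = 8, a request with first_frame = 10 and first_sample = 3 therefore starts at absolute sample
       83, and num_frames = 2 with num_samples = 0 asks for 16 samples.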

       Although calling gd_getdata() using both samples and frames is possible, the function is typically called
       with either num_samples and first_sample, or num_frames and first_frame, equal to zero.

       Instead  of  explicitly specifying the origin of the read, the caller may pass the special symbol GD_HERE
       as first_frame.  This will result in the read occurring at the current position of the  I/O  pointer  for
       the  field  (see  GetData  I/O Pointers below for a discussion of field I/O pointers).  In this case, the
       value of first_sample is ignored.

       When reading a SINDIR field, return_type must be GD_STRING.  For all other field types,  the  return_type
       argument should be one of the following symbols, which indicates the desired return type of the data:

              GD_UINT8
                      unsigned 8-bit integer

              GD_INT8 signed (two's complement) 8-bit integer

              GD_UINT16
                      unsigned 16-bit integer

              GD_INT16
                      signed (two's complement) 16-bit integer

              GD_UINT32
                      unsigned 32-bit integer

              GD_INT32
                      signed (two's complement) 32-bit integer

              GD_UINT64
                      unsigned 64-bit integer

              GD_INT64
                      signed (two's complement) 64-bit integer

              GD_FLOAT32
                      IEEE-754 standard 32-bit single precision floating point number

              GD_FLOAT64
                      IEEE-754 standard 64-bit double precision floating point number

              GD_COMPLEX64
                      C99-conformant 64-bit single precision complex number

              GD_COMPLEX128
                      C99-conformant 128-bit double precision complex number

              GD_NULL the  null  type: the database is queried as usual, but no data is returned.  In this case,
                      data_out is ignored and may be NULL.
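       Because data_out must be large enough for every sample requested, a caller usually sizes it as the
       number of samples times the width of one sample of return_type; GetData's GD_SIZE(3) reports that
       width. The sketch below is a stand-alone model that compiles without getdata.h: demo_type, demo_size()
       and demo_alloc() are hypothetical stand-ins whose widths are taken from the type names listed above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for gd_type_t; real code would use the GD_*
 * symbols from <getdata.h> together with GD_SIZE(3). */
enum demo_type {
    DEMO_UINT8, DEMO_INT8, DEMO_UINT16, DEMO_INT16,
    DEMO_UINT32, DEMO_INT32, DEMO_UINT64, DEMO_INT64,
    DEMO_FLOAT32, DEMO_FLOAT64, DEMO_COMPLEX64, DEMO_COMPLEX128,
    DEMO_NULL
};

/* Width in bytes of one sample of the given type. */
size_t demo_size(enum demo_type t)
{
    switch (t) {
    case DEMO_UINT8:  case DEMO_INT8:  return 1;
    case DEMO_UINT16: case DEMO_INT16: return 2;
    case DEMO_UINT32: case DEMO_INT32: case DEMO_FLOAT32: return 4;
    case DEMO_UINT64: case DEMO_INT64: case DEMO_FLOAT64:
    case DEMO_COMPLEX64: return 8;
    case DEMO_COMPLEX128: return 16;
    case DEMO_NULL: default: return 0; /* GD_NULL: no data is returned */
    }
}

/* Allocate room for n samples of type t. Returns NULL when nothing needs
 * storing (e.g. the GD_NULL case, where data_out may be NULL), when
 * n * width would overflow size_t, or when malloc() fails. */
void *demo_alloc(size_t n, enum demo_type t)
{
    size_t width = demo_size(t);
    if (width == 0 || n == 0)
        return NULL;
    if (n > (size_t)-1 / width)
        return NULL;
    return malloc(n * width);
}
```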

       The return type of the data need not be the same as the type of the data stored in the database. Type
       conversion will be performed as necessary to return the requested type. If the field_code does not
       indicate a representation, but conversion from a complex value to a purely real one is required, only
       the real portion of the requested vector will be returned.
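       A minimal model of the default conversion just described, assuming C99 complex arithmetic: when no
       representation is requested, each complex sample is reduced to its real part. complex_to_real() is a
       hypothetical illustration, not GetData's internal code.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical model of the complex-to-real conversion described above:
 * without a representation suffix, only the real part of each complex
 * sample is kept and the imaginary part is discarded. */
void complex_to_real(const double complex *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = creal(in[i]);
}
```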

       Upon  successful completion, the I/O pointer of the field will be on the sample immediately following the
       last sample returned, if possible.  On error, the position of the I/O pointer is not specified,  and  may
       not even be well defined.

   Behaviour While Reading Specific Field Types
       MPLEX: Reading an MPLEX field typically requires GetData to read data before the range returned in order
              to determine the value of the first sample returned. This can become expensive if the encoding of
              the underlying RAW data does not support seeking backwards (which is true of most compression
              encodings). How much preceding data GetData searches for the initial value of the returned data
              can be adjusted, or the lookback disabled completely, using gd_mplex_lookback(3). If the initial
              value of the field is not found in the data searched, GetData will fill the returned vector, up to
              the next available sample of the multiplexed field, with zero for integer return types, or
              IEEE-754-conforming NaN (not-a-number) for floating point return types, as it does when providing
              data before the beginning-of-field.

              GetData caches the value of the last sample from every MPLEX it reads so that a subsequent read of
              the field starting from the following sample (either through an explicit starting sample given  by
              the  caller  or  else  implicitly  using GD_HERE) will not need to scan the field backwards.  This
              cache is invalidated if a different return type is used, or if an intervening operation moves  the
              field's I/O pointer.
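              The lookback can be modelled as below. This is a hypothetical sketch, not GetData's
              implementation, and it assumes the MPLEX semantics defined in dirfile-format(5): the field takes
              the value of its input on samples where the count field equals count_val, and holds that value
              elsewhere. Only the floating point (NaN) fill is shown.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical model of an MPLEX read (not GetData's code). Samples
 * [first, first + n) are returned in out. Up to `lookback` samples before
 * `first` are scanned backwards for the initial value; if none is found,
 * leading samples are NaN until the first sample that sets the value. */
void mplex_read(const double *input, const int *count, int count_val,
                size_t first, size_t n, size_t lookback, double *out)
{
    double current = NAN;

    /* Backward scan over indices first-1 down to stop. */
    size_t stop = first > lookback ? first - lookback : 0;
    for (size_t i = first; i > stop; i--) {
        if (count[i - 1] == count_val) {
            current = input[i - 1];
            break;
        }
    }

    /* Forward pass over the requested range. */
    for (size_t i = 0; i < n; i++) {
        if (count[first + i] == count_val)
            current = input[first + i];
        out[i] = current;
    }
}
```

              In this model, a lookback of zero stands in for a disabled lookback: the returned vector then
              starts with NaN until the first sample on which the count field matches.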

       SINDIR:
              The only allowed return_type when reading SINDIR data is GD_STRING. The data_out argument should
              be of type const char **, and be large enough to hold one pointer for each sample requested. It
              will be filled with pointers to read-only string data. The caller should not free the returned
              string pointers. For convenience when allocating buffers, the GD_STRING constant has the property
              GD_SIZE(GD_STRING) == sizeof(const char *). On samples where the index vector is out of range of
              the SARRAY, and also on samples before the index vector's frame offset, the value stored in
              data_out will be the NULL pointer.
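              A minimal model of the SINDIR lookup, with a hypothetical sindir_read() that is not part of
              GetData: each output pointer is taken from a read-only string array according to the matching
              index sample, and out-of-range indices produce NULL. A caller would size the output array as
              num_samples * sizeof(const char *), which per the text above equals
              num_samples * GD_SIZE(GD_STRING). The frame-offset case is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a SINDIR read (not GetData's code): out[i] points
 * at sarray[index[i]], or is NULL when the index is out of range. The
 * pointers refer to read-only data and must not be freed by the caller. */
void sindir_read(const int64_t *index, size_t n, const char *const *sarray,
                 size_t sarray_len, const char **out)
{
    for (size_t i = 0; i < n; i++) {
        if (index[i] >= 0 && (uint64_t)index[i] < sarray_len)
            out[i] = sarray[index[i]];
        else
            out[i] = NULL;
    }
}
```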

       PHASE: A forward-shifted PHASE field will always encounter the end-of-field marker before its input field
              does.   This  has  ramifications  when  reading  streaming  data  with  gd_getdata()   and   using
              gd_nframes(3)  to gauge field lengths (that is: a forward-shifted PHASE field always has less data
              in it than gd_nframes(3) implies that it does).  As with any other field, gd_getdata() will return
              a short count whenever a read from a PHASE field encounters the end-of-field marker.

              Backward-shifted PHASE fields do not suffer from this problem, since gd_getdata() pads reads past
              the beginning-of-field marker with NaN or zero as appropriate. Database creators who wish to use
              the PHASE field type with streaming data are encouraged to work around this limitation by using
              only backward-shifted PHASE fields: write RAW data at the maximal frame lag, and then back-shift
              all data which should have been written earlier. Another possible work-around is to write
              systematically less data to the reference RAW field in proportion to the maximal forward phase
              shift. This method will work with applications which respect the database size reported by
              gd_nframes(3), resulting in these applications effectively ignoring all frames past the frame
              containing the maximally forward-shifted PHASE field's end-of-field marker.
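              The short count can be reproduced with a small model. phase_read() is hypothetical and not
              GetData's code; it assumes that sample i of the PHASE field is sample i + shift of its input,
              pads positions before the input's first sample with NaN, and stops at the input's last sample.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of reading a PHASE field (not GetData's code).
 * raw holds raw_len samples of the input field. Returns the number of
 * samples stored in out, which is short if the end of raw is reached. */
size_t phase_read(const double *raw, int64_t raw_len, int64_t shift,
                  int64_t first, size_t n, double *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t src = first + (int64_t)i + shift;
        if (src >= raw_len)
            break;                          /* end-of-field: short count */
        out[count++] = (src < 0) ? NAN : raw[src]; /* pad before start */
    }
    return count;
}
```

              For a four-sample input, a forward shift of 1 yields only three samples, although the input
              holds four; a shift of -1 yields all four, the first being NaN padding.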

       WINDOW:
              The  samples  of a WINDOW for which the field conditional is false will be filled with either zero
              for integer return types, or IEEE-754-conforming NaN  (not-a-number)  for  floating  point  return
              types.
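              The fill rule can be modelled as follows. Both functions are hypothetical, and cond[i] stands for
              the already evaluated field conditional on sample i; how GetData evaluates the conditional is not
              shown.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the WINDOW fill rule (not GetData's code). */

/* Floating point return type: samples failing the condition become NaN. */
void window_read_f64(const double *in, const bool *cond, size_t n,
                     double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = cond[i] ? in[i] : NAN;
}

/* Integer return type: samples failing the condition become zero. */
void window_read_i32(const int32_t *in, const bool *cond, size_t n,
                     int32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = cond[i] ? in[i] : 0;
}
```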

RETURN VALUE

       In all cases, gd_getdata() returns the number of samples (not bytes) successfully read from the database.
       If the end-of-field is encountered before the requested number of samples have been read, a short count
       will result. This is not an error.

       Requests for data before the beginning-of-field marker, which may have been shifted from frame zero by a
       PHASE field or /FRAMEOFFSET directive, will result in the data being padded at the front by NaN or zero,
       depending on whether the return type is of floating point or integral type.
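       The combined effect of front padding and short counts can be sketched with simple arithmetic.
       split_read() is a hypothetical helper; it assumes the field's data occupy absolute samples from bof up
       to, but excluding, eof, and that padding samples count toward the value gd_getdata() returns.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GetData): splits a request for n
 * samples starting at absolute sample `first` into leading padding and
 * the total count returned, for a field whose data lie in [bof, eof). */
struct read_split {
    uint64_t pad;       /* samples before the beginning-of-field */
    uint64_t returned;  /* total samples returned, padding included */
};

struct read_split split_read(int64_t bof, int64_t eof, int64_t first,
                             uint64_t n)
{
    struct read_split r = {0, 0};
    int64_t end = first + (int64_t)n;   /* one past the last requested */
    if (end > eof)
        end = eof;                      /* short count at end-of-field */
    if (end <= first)
        return r;                       /* nothing available */
    r.returned = (uint64_t)(end - first);
    if (first < bof) {
        int64_t pad = bof - first;
        r.pad = (uint64_t)(pad < end - first ? pad : end - first);
    }
    return r;
}
```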

       On error, this function returns zero and stores a negative-valued error code in the DIRFILE object  which
       may be retrieved by a subsequent call to gd_error(3).  Possible error codes are:

       GD_E_ALLOC
               The library was unable to allocate memory.

       GD_E_BAD_CODE
               The  field  specified by field_code, or one of the fields it uses for input, was not found in the
               database.

       GD_E_BAD_DIRFILE
               An invalid dirfile was supplied.

       GD_E_BAD_SCALAR
               A scalar field used in the definition of the field was not found, or was not of scalar type.

       GD_E_BAD_TYPE
               An invalid return_type was specified.

       GD_E_DIMENSION
               The supplied field_code referred to a CONST, CARRAY, or STRING field. The caller should use
               gd_get_constant(3), gd_get_carray(3), or gd_get_string(3) instead. Alternatively, a scalar field
               was found where a vector field was expected in the definition of field_code or one of its inputs.

       GD_E_DOMAIN
               An immediate read was attempted using GD_HERE, but the I/O pointer of the field was not well
               defined because two or more of the field's inputs did not agree as to the location of the I/O
               pointer.

       GD_E_INTERNAL_ERROR
               An internal error occurred in the library while trying to perform the task.  This indicates a bug
               in the library.  Please report the incident to the maintainer.

       GD_E_IO An error occurred while trying to open or read from a file on disk  containing  a  raw  field  or
               LINTERP table.

       GD_E_LUT
               A LINTERP table was malformed.

       GD_E_RANGE
               An  attempt  was made to read data outside the addressable Dirfile range (more than 2**63 samples
               past the start of the dirfile).

       GD_E_RECURSE_LEVEL
               Too many levels of recursion were encountered while trying to resolve field_code.   This  usually
               indicates a circular dependency in field specification in the dirfile.

       GD_E_UNKNOWN_ENCODING
               The encoding scheme of a RAW field could not be determined. This may also indicate that the
               binary file associated with the RAW field could not be found.

       GD_E_UNSUPPORTED
               Reading from dirfiles with the encoding scheme of the specified dirfile is not supported  by  the
               library.  See dirfile-encoding(5) for details on dirfile encoding schemes.

       A descriptive error string for the error may be obtained by calling gd_error_string(3).

NOTES

       To save memory, gd_getdata() uses the memory pointed to by data_out as scratch space while computing
       derived fields. As a result, if an error is encountered during the computation, the contents of this
       memory buffer are unspecified, and may have been modified by this call, even though gd_getdata() will
       report zero samples returned on error.

       Reading slim-compressed data (see dirfile-encoding(5)) may cause unexpected memory usage. This is because
       slimlib internally caches open decompressed files as they are read, and GetData doesn't close data files
       between gd_getdata() calls for efficiency's sake. Memory used by this internal slimlib buffer can be
       reclaimed by calling gd_raw_close(3) on fields when finished reading them.

       When operating on a platform whose size_t is N-bytes wide, a single call of gd_getdata() will never
       return more than (2**(N-1) - 1) samples. The request will be truncated at (2**(N-M) - 1) samples, where
       M is the size, in bytes, of the largest data type used to calculate the returned field. If a larger
       request is specified, less data than requested will be returned, without raising an error. This limit is
       imposed even when return_type is GD_NULL or when reading from the INDEX field (i.e., even when no actual
       I/O or calculation occurs). In all cases, the number of samples actually read is returned.

GETDATA I/O POINTERS

       This  is  a general discussion of field I/O pointers in the GetData library, and contains information not
       directly applicable to gd_getdata().

       Every RAW field in an open Dirfile has an I/O pointer which indicates the library's current read and
       write position in the field. These I/O pointers are useful when performing sequential reads or writes
       on Dirfile fields (see GD_HERE in the description above). The value of the I/O pointer of a field is
       reported by gd_tell(3).

       Derived fields have virtual I/O pointers arising from the I/O pointers of their input fields. These
       virtual I/O pointers may be valid (when all input fields agree on their position in the dirfile) or
       invalid (when the input fields are not in agreement). The I/O pointer of some derived fields is always
       invalid. The usual reason for this is the derived field simultaneously reading from two different places
       in the same RAW field. For example, given the following Dirfile metadata specification:

              a RAW UINT8 1
              b PHASE a 1
              c LINCOM 2 a 1 0 b 1 0

       the  derived  field c never has a valid I/O pointer, since any particular sample of c ultimately involves
       reading from more than one place in the RAW field a.  Attempting to perform sequential  reads  or  writes
       (with  GD_HERE) on a derived field when its I/O pointer is invalid will result in an error (specifically,
       GD_E_DOMAIN).

       The implicit INDEX field has an effective I/O pointer that mostly behaves like a true RAW field I/O
       pointer, except that it permits simultaneous reads from multiple locations. So, given the following
       metadata specification:

              d PHASE INDEX 1
              e LINCOM 2 INDEX 1 0 d 1 0

       the I/O pointer of the derived field e will always be valid, unlike the similarly defined c above. The
       virtual I/O pointer of a derived field will change in response to movement of the RAW I/O pointers
       underlying the derived field's inputs, and vice versa: moving the I/O pointer of a derived field will
       move the I/O pointer of the RAW fields from which it ultimately derives. As a result, the I/O pointer of
       any particular field may move in unexpected ways if multiple fields are manipulated at the same time.
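       The agreement rule for virtual I/O pointers can be modelled as below. virtual_pointer() is hypothetical
       and not part of GetData: each input reaches a RAW field (or INDEX) whose pointer sits at raw_pos[i]
       through a net PHASE shift of shift[i], and so places the derived field at raw_pos[i] - shift[i].

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not GetData's code) of a derived field's virtual
 * I/O pointer. Returns true and stores the position in *pos_out only if
 * every input implies the same position for the derived field. */
bool virtual_pointer(const int64_t *raw_pos, const int64_t *shift,
                     size_t n_inputs, int64_t *pos_out)
{
    if (n_inputs == 0)
        return false;
    int64_t pos = raw_pos[0] - shift[0];
    for (size_t i = 1; i < n_inputs; i++)
        if (raw_pos[i] - shift[i] != pos)
            return false;               /* inputs disagree: invalid */
    *pos_out = pos;
    return true;
}
```

       For field c, both inputs share the single I/O pointer of a, so the two raw_pos entries are always equal
       and the shifts 0 and 1 can never agree. INDEX may be read at two places at once, so for field e the
       entries may differ by exactly the shift, and the pointer is valid.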

       When a Dirfile is first opened, the I/O pointer of every RAW field is set to the beginning-of-field (the
       value returned by gd_bof(3)), as is the I/O pointer of any newly-created RAW field.

       The following library calls cause I/O pointers to move:

       gd_getdata() and gd_putdata(3)
              These functions move the I/O pointer of affected fields to the sample immediately following the
              last sample read or written, both when performed at an absolutely specified position and when
              called for a sequential read or write using GD_HERE. When reading a derived field which
              simultaneously reads from more than one place in a RAW field (such as c above), the position of
              that RAW field's I/O pointer is unspecified (that is: it is not specified which input field is
              read first).

       gd_seek(3)
              This function is used to manipulate I/O pointers directly.

       gd_flush(3) and gd_raw_close(3)
              These functions set the I/O pointer of any RAW field which is closed back to the
              beginning-of-field.

       calls which result in modifications to raw data files:
              this   may   happen   when   calling   any   of:   gd_alter_encoding(3),   gd_alter_endianness(3),
              gd_alter_frameoffset(3),  gd_alter_entry(3), gd_alter_raw(3), gd_alter_spec(3), gd_malter_spec(3),
              gd_move(3), or gd_rename(3); these functions close affected RAW fields before  making  changes  to
              the raw data files, and so reset the corresponding I/O pointers to the beginning-of-field.

       In general, when these calls fail, the I/O pointers of affected fields may be anything, even
       out-of-bounds or invalid. After an error, the caller should issue an explicit gd_seek(3) to reposition
       I/O pointers before attempting further sequential operations.
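       The pointer rules listed above can be summarized by a small model of one RAW field's I/O pointer. The
       structure and functions are hypothetical, are not GetData's API, and ignore error handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a RAW field's I/O pointer (not GetData's API). */
struct raw_pointer {
    int64_t pos;   /* next sample to be read or written */
    int64_t bof;   /* beginning-of-field, as gd_bof(3) would report */
};

/* A sequential (GD_HERE) read starts at pos; afterwards the pointer sits
 * on the sample following the last one actually returned. Returns the
 * starting sample of the read. */
int64_t read_here(struct raw_pointer *p, size_t returned)
{
    int64_t start = p->pos;
    p->pos += (int64_t)returned;
    return start;
}

/* Explicit repositioning, in the role of gd_seek(3). */
void seek_to(struct raw_pointer *p, int64_t sample)
{
    p->pos = sample;
}

/* Closing the field (gd_flush(3), gd_raw_close(3), or an alteration that
 * rewrites the data file) resets the pointer to the beginning-of-field. */
void close_field(struct raw_pointer *p)
{
    p->pos = p->bof;
}
```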

HISTORY

       The function getdata() appeared in GetData-0.3.0.

       The GD_COMPLEX64 and GD_COMPLEX128 data types appeared in GetData-0.6.0.

       In GetData-0.7.0, this function was renamed to gd_getdata().

       The GD_HERE symbol used for sequential reads appeared in GetData-0.8.0.

       The GD_STRING data type appeared in GetData-0.10.0.

SEE ALSO

       GD_SIZE(3), gd_error(3), gd_error_string(3), gd_get_constant(3), gd_get_string(3), gd_mplex_lookback(3),
       gd_nframes(3), gd_open(3), gd_raw_close(3), gd_seek(3), gd_spf(3), gd_putdata(3), dirfile(5),
       dirfile-encoding(5)

Version 0.10.0                                  25 December 2016                                   gd_getdata(3)