Ubuntu Manpage: FunColumnSelect - select Funtools columns

Provided by: libfuntools-dev_1.4.8-1.1build2_amd64

NAME

       FunColumnSelect - select Funtools columns

SYNOPSIS

         #include <funtools.h>

         int FunColumnSelect(Fun fun, int size, char *plist,
                             char *name1, char *type1, char *mode1, int offset1,
                             char *name2, char *type2, char *mode2, int offset2,
                             ...,
                             NULL)

         int FunColumnSelectArr(Fun fun, int size, char *plist,
                                char **names, char **types, char **modes,
                                int *offsets, int nargs);

DESCRIPTION

       The FunColumnSelect() routine is used to select the columns from a Funtools binary table extension or raw
       event file for processing. This routine allows you to specify how columns in a file are to be read into a
       user record structure or written from a user record structure to an output FITS file.

       The  first  argument is the Fun handle associated with this set of columns. The second argument specifies
       the size of the user record structure into which columns will be read.  Typically, the sizeof() macro  is
       used  to  specify  the  size  of  a  record  structure.  The third argument allows you to specify keyword
       directives for the selection and is described in more detail below.

       Following the first three required arguments is a variable length list of  column  specifications.   Each
       column specification will consist of four arguments:

       •   name: the name of the column

       •   type:  the  data type of the column as it will be stored in the user record struct (not the data type
           of the input file). The following basic data types are recognized:

           •   A: ASCII characters

           •   B: unsigned 8-bit char

           •   I: signed 16-bit int

           •   U: unsigned 16-bit int (not standard FITS)

           •   J: signed 32-bit int

           •   V: unsigned 32-bit int (not standard FITS)

           •   E: 32-bit float

           •   D: 64-bit float

           The syntax used is similar to that which defines the TFORM parameter in FITS binary tables. That  is,
           a numeric repeat value can precede the type character, so that "10I" means a vector of 10 short ints,
           "E"  means  a  single  precision  float, etc.  Note that the column value from the input file will be
           converted to the specified data type as the data is read by FunTableRowGet().

           [ A short digression regarding bit-fields: Special attention is required when reading or writing  the
           FITS bit-field type ("X"). Bit-fields almost always have a numeric repeat character preceding the 'X'
           specification.  Usually  this value is a multiple of 8 so that bit-fields fit into an integral number
           of bytes. For all cases, the byte size of the bit-field B is (N+7)/8, where N is the  numeric  repeat
           character.

           A  bit-field is most easily declared in the user struct as an array of type char of size B as defined
           above. In this case, bytes are simply moved from the file to the user space.  If, instead, a short or
           int scalar or array is used, then the algorithm for reading the bit-field into the user space depends
           on the size of the data type used along with the value of the repeat character.  That is, if the user
           data size is equal to the byte size of the bit-field, then the data is simply  moved  (possibly  with
           endian-based  byte-swapping) from one to the other. If, on the other hand, the data storage is larger
           than the bit-field size, then a data type cast conversion is performed to move parts of the bit-field
           into elements of the array.  Examples will help make this clear:

           •   If the file contains a 16X bit-field and user space specifies a 2B char array[2], then  the  bit-
               field is moved directly into the char array.

           •   If  the  file  contains  a 16X bit-field and user space specifies a 1I scalar short int, then the
               bit-field is moved directly into the short int.

           •   If the file contains a 16X bit-field and user space specifies a 1J scalar int, then the bit-field
               is type-cast to unsigned int before being moved (use of unsigned avoids possible sign extension).

           •   If the file contains a 16X bit-field and user space specifies a 2J int array[2],  then  the  bit-
               field  is handled as 2 chars, each of which are type-cast to unsigned int before being moved (use
               of unsigned avoids possible sign extension).

           •   If the file contains a 16X bit-field and user space specifies a 1B char, then  the  bit-field  is
               treated as a char, i.e., truncation will occur.

           •   If the file contains a 16X bit-field and user space specifies a 4J int array[4], then the results
               are undetermined.

           For  all  user  data types larger than char, the bit-field is byte-swapped as necessary to convert to
           native format, so that bits in the resulting data in user space can be tested, masked,  etc.  in  the
           same way regardless of platform.]

           In  addition  to setting data type and size, the type specification allows a few ancillary parameters
           to be set, using the full syntax for type:

            [@][n]<type>[[['B']poff]][:[tlmin[:tlmax[:binsiz]]]]

           The special character "@" can be prepended to this specification to indicated that the  data  element
           is a pointer in the user record, rather than an array stored within the record.

           The [n] value is an integer that specifies the number of elements that are in this column (default is
           1).  TLMIN,  TLMAX, and BINSIZ values also can be specified for this column after the type, separated
           by colons. If only one such number is specified, it is assumed to be TLMAX, and TLMIN  and BINSIZ are
           set to 1.

           The [poff] value can be used to specify the offset into an array. By default, this  offset  value  is
           set  to  zero  and  the  data  specified  starts at the beginning of the array. The offset usually is
           specified in terms of the data type of the column. Thus  an  offset  specification  of  [5]  means  a
           20-byte  offset  if the data type is a 32-bit integer, and a 40-byte offset for a double. If you want
           to specify a byte offset instead of an offset tied to the column data type, precede the offset  value
           with 'B', e.g. [B6] means a 6-bye offset, regardless of the column data type.

           The  [poff] is especially useful in conjunction with the pointer @ specification, since it allows the
           data element to anywhere stored anywhere in the allocated array.  For example, a  specification  such
           as "@I[2]" specifies the third (i.e., starting from 0) element in the array pointed to by the pointer
           value.  A  value of "@2I[4]" specifies the fifth and sixth values in the array. For example, consider
           the following specification:

             typedef struct EvStruct{
               short x[4], *atp;
             } *Event, EventRec;
             /* set up the (hardwired) columns */
             FunColumnSelect( fun, sizeof(EventRec), NULL,
                              "2i",    "2I  ",    "w", FUN_OFFSET(Event, x),
                              "2i2",   "2I[2]",   "w", FUN_OFFSET(Event, x),
                              "at2p",  "@2I",     "w", FUN_OFFSET(Event, atp),
                              "at2p4", "@2I[4]",  "w", FUN_OFFSET(Event, atp),
                              "atp9",  "@I[9]",   "w", FUN_OFFSET(Event, atp),
                              "atb20", "@I[B20]", "w", FUN_OFFSET(Event, atb),
                              NULL);

           Here we have specified the following columns:

           •   2i: two short ints in an array which is stored as part the record

           •   2i2: the 3rd and 4th elements of an array which is stored as part of the record

           •   an array of at least 10 elements, not stored in the record but allocated elsewhere, and  used  by
               three different columns:

               •   at2p: 2 short ints which are the first 2 elements of the allocated array

               •   at2p4: 2 short ints which are the 5th and 6th elements of the allocated array

               •   atp9: a short int which is the 10th element of the allocated array

           •   atb20: a short int which is at byte offset 20 of another allocated array

           In  this  way,  several  columns  can be specified, all of which are in a single array. NB: it is the
           programmer's responsibility to ensure that specification of a positive value for poff does not  point
           past the end of valid data.

       •   read/write  mode:  "r"  means  that  the  column  is  read  from  an  input  file  into user space by
           FunTableRowGet(), "w" means that the column is written to an output file. Both can specified  at  the
           same time.

       •   offset:  the offset into the user data to store this column. Typically, the macro FUN_OFFSET(recname,
           colname) is used to define the offset into a record structure.

       When all column arguments have been specified, a final NULL argument must  added  to  signal  the  column
       selection list.

       As   an   alternative   to   the   varargs   FunColumnSelect()  routine,  a  non-varargs  routine  called
       FunColumnSelectArr() also is available. The first three arguments (fun, size, plist) of this routine  are
       the  same  as  in  FunColumnSelect().  Instead of a variable argument list, however, FunColumnSelectArr()
       takes 5 additional arguments. The first 4 arrays arguments contain the names, types, modes, and  offsets,
       respectively,  of  the  columns  being  selected.  The  final  argument is the number of columns that are
       contained in these arrays. It is the user's responsibility  to  free  string  space  allocated  in  these
       arrays.

       Consider the following example:

         typedef struct evstruct{
           int status;
           float pi, pha, *phas;
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "r",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "r",  FUN_OFFSET(Ev, pi),
           "pha",     "E",     "r",  FUN_OFFSET(Ev, pha),
           "phas",    "@9E",   "r",  FUN_OFFSET(Ev, phas),
           NULL);

       Each  time  a  row  is  read  into  the  Ev  struct, the "status" column is converted to an int data type
       (regardless of its data type in the file) and stored in the status value of the struct.  Similarly,  "pi"
       and "pha", and the phas vector are all stored as floats. Note that the "@" sign indicates that the "phas"
       vector  is  a  pointer to a 9 element array, rather than an array allocated in the struct itself. The row
       record can then be processed as required:

         /* get rows -- let routine allocate the row array */
         while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){
           /* process all rows */
           for(i=0; i<got; i++){
             /* point to the i'th row */
             ev = ebuf+i;
             ev->pi = (ev->pi+.5);
             ev->pha = (ev->pi-.5);
           }

       FunColumnSelect() can also be called to define "writable" columns in order  to  generate  a  FITS  Binary
       Table,  without reference to any input columns.  For example, the following will generate a 4-column FITS
       binary table when FunTableRowPut() is used to write Ev records:

         typedef struct evstruct{
           int status;
           float pi, pha
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "w",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "w",   FUN_OFFSET(Ev, pi),
           "pha",     "E",     "w",   FUN_OFFSET(Ev, pha),
           "energy",  "D",       "w",   FUN_OFFSET(Ev, energy),
           NULL);

       All columns are declared to be write-only, so presumably the column data is being generated or read  from
       some other source.

       In  addition,  FunColumnSelect() can be called to define both "readable" and "writable" columns.  In this
       case, the "read" columns are associated with an input file, while the "write" columns are associated with
       the output file. Of course, columns can be specified as both "readable" and  "writable",  in  which  case
       they  are  read  from  input  and  (possibly  modified  data  values  are)  written  to  the output.  The
       FunColumnSelect() call itself is made by passing the input Funtools handle, and it is  assumed  that  the
       output file has been opened using this input handle as its Funtools reference handle.

       Consider the following example:

         typedef struct evstruct{
           int status;
           float pi, pha, *phas;
           double energy;
         } *Ev, EvRec;

         FunColumnSelect(fun, sizeof(EvRec), NULL,
           "status",  "J",     "r",   FUN_OFFSET(Ev, status),
           "pi",      "E",     "rw",  FUN_OFFSET(Ev, pi),
           "pha",     "E",     "rw",  FUN_OFFSET(Ev, pha),
           "phas",    "@9E",   "rw",  FUN_OFFSET(Ev, phas),
           "energy",  "D",     "w",   FUN_OFFSET(Ev, energy),
           NULL);

       As  in  the  "read"  example  above,  each time an row is read into the Ev struct, the "status" column is
       converted to an int data type (regardless of its data type in the file) and stored in the status value of
       the struct.  Similarly, "pi" and "pha", and the phas vector are all stored as floats.   Since  the  "pi",
       "pha",  and  "phas" variables are declared as "writable" as well as "readable", they also will be written
       to the output file.  Note, however, that the "status" variable is declared as "readable" only, and  hence
       it  will  not be written to an output file.  Finally, the "energy" column is declared as "writable" only,
       meaning it will not be read from the input file. In this case, it can be assumed that  "energy"  will  be
       calculated in the program before being output along with the other values.

       In  these  simple  cases, only the columns specified as "writable" will be output using FunTableRowPut().
       However, it often is the case that you want to merge the user columns back in  with  the  input  columns,
       even  in  cases  where  not  all  of  the  input column names are explicitly read or even known. For this
       important case, the merge=[type] keyword is provided in the plist string.

       The merge=[type] keyword tells Funtools to merge the columns from the input file  with  user  columns  on
       output.   It  is  normally  used when an input and output file are opened and the input file provides the
       Funtools reference handle for the output file. In this case, each time FunTableRowGet()  is  called,  the
       raw  input rows are saved in a special buffer. If FunTableRowPut() then is called (before another call to
       FunTableRowGet()), the contents of the raw input rows are merged with the  user  rows  according  to  the
       value of type as follows:

       •   update: add new user columns, and update value of existing ones (maintaining the input data type)

       •   replace:  add  new user columns, and replace the data type and value of existing ones.  (Note that if
           tlmin/tlmax values are not specified in the replacing column,  but  are  specified  in  the  original
           column being replaced, then the original tlmin/tlmax values are used in the replacing column.)

       •   append: only add new columns, do not "replace" or "update" existing ones

       Consider the example above. If merge=update is specified in the plist string, then "energy" will be added
       to  the input columns, and the values of "pi", "pha", and "phas" will be taken from the user space (i.e.,
       the values will be updated from the original values, if they were changed by the program).  The data type
       for "pi", "pha", and "phas" will be the same as in the original file.   If  merge=replace  is  specified,
       both  the  data type and value of these three input columns will be changed to the data type and value in
       the user structure.  If merge=append is specified, none of these three columns will be updated, and  only
       the  "energy" column will be added. Note that in all cases, "status" will be written from the input data,
       not from the user record, since it was specified as read-only.

       Standard applications will call FunColumnSelect() to define user columns. However, if this routine is not
       called, the default behavior is to transfer all input columns into user space. For this purpose a default
       record structure is defined such that each data  element  is  properly  aligned  on  a  valid  data  type
       boundary.   This  mechanism  is  used by programs such as fundisp and funtable to process columns without
       needing to know the specific names of those columns.  It is not anticipated that  users  will  need  such
       capabilities (contact us if you do!)

       By default, FunColumnSelect() reads/writes rows to/from an "array of structs", where each struct contains
       the  column  values for a single row of the table. This means that the returned values for a given column
       are not contiguous. You can set up the IO to return a "struct of arrays" so that  each  of  the  returned
       columns  are  contiguous  by  specifying  org=structofarrays  (abbreviation: org=soa) in the plist.  (The
       default case is org=arrayofstructs or org=aos.)

       For example, the default setup to retrieve rows from a table would be to define a record structure for  a
       single event and then call
        FunColumnSelect() as follows:

         typedef struct evstruct{
           short region;
           double x, y;
           int pi, pha;
           double time;
         } *Ev, EvRec;

         got = FunColumnSelect(fun, sizeof(EvRec), NULL,
                               "x",       "D:10:10", mode, FUN_OFFSET(Ev, x),
                               "y",       "D:10:10", mode, FUN_OFFSET(Ev, y),
                               "pi",      "J",       mode, FUN_OFFSET(Ev, pi),
                               "pha",     "J",       mode, FUN_OFFSET(Ev, pha),
                               "time",    "1D",      mode, FUN_OFFSET(Ev, time),
                               NULL);

       Subsequently,  each  call to FunTableRowGet() will return an array of structs, one for each returned row.
       If instead you wanted to read columns into contiguous arrays, you specify org=soa:

         typedef struct aevstruct{
           short region[MAXROW];
           double x[MAXROW], y[MAXROW];
           int pi[MAXROW], pha[MAXROW];
           double time[MAXROW];
         } *AEv, AEvRec;

         got = FunColumnSelect(fun, sizeof(AEvRec), "org=soa",
                             "x",       "D:10:10", mode, FUN_OFFSET(AEv, x),
                             "y",       "D:10:10", mode, FUN_OFFSET(AEv, y),
                             "pi",      "J",       mode, FUN_OFFSET(AEv, pi),
                             "pha",     "J",       mode, FUN_OFFSET(AEv, pha),
                             "time",    "1D",      mode, FUN_OFFSET(AEv, time),
                             NULL);

       Note that the only modification to the call is in the plist string.

       Of course, instead of using statically allocated arrays,  you  also  can  specify  dynamically  allocated
       pointers:

         /* pointers to arrays of columns (used in struct of arrays) */
         typedef struct pevstruct{
           short *region;
           double *x, *y;
           int *pi, *pha;
           double *time;
         } *PEv, PEvRec;

         got = FunColumnSelect(fun, sizeof(PEvRec), "org=structofarrays",
                             "$region", "@I",       mode, FUN_OFFSET(PEv, region),
                             "x",       "@D:10:10", mode, FUN_OFFSET(PEv, x),
                             "y",       "@D:10:10", mode, FUN_OFFSET(PEv, y),
                             "pi",      "@J",       mode, FUN_OFFSET(PEv, pi),
                             "pha",     "@J",       mode, FUN_OFFSET(PEv, pha),
                             "time",    "@1D",      mode, FUN_OFFSET(PEv, time),
                             NULL);

       Here, the actual storage space is either allocated by the user or by the FunColumnSelect() call).

       In all of the above cases, the same call is made to retrieve rows, e.g.:

           buf = (void *)FunTableRowGet(fun, NULL, MAXROW, NULL, &got);

       However,  the  individual  data  elements are accessed differently.  For the default case of an "array of
       structs", the individual row records are accessed using:

         for(i=0; i<got; i++){
           ev = (Ev)buf+i;
           fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%.4f\t%.4f\t%21.8f\n",
                   ev->x, ev->y, ev->pi, ev->pha, ev->dx, ev->dy, ev->time);
         }

       For a struct of arrays or a struct of array pointers, we have a single struct  through  which  we  access
       individual columns and rows using:

         aev = (AEv)buf;
         for(i=0; i<got; i++){
           fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%.4f\t%.4f\t%21.8f\n",
                   aev->x[i], aev->y[i], aev->pi[i], aev->pha[i],
                   aev->dx[i], aev->dy[i], aev->time[i]);
         }

       Support for struct of arrays in the FunTableRowPut() call is handled analogously.

       See  the  evread  example  code and evmerge example code for working examples of how FunColumnSelect() is
       used.

NAME

SYNOPSIS

DESCRIPTION

SEE ALSO