Provided by: libbtparse-dev_0.89-1build3_amd64

NAME

       btparse - C library for parsing and processing BibTeX data files

SYNOPSIS

          #include <btparse.h>

          /* Basic library initialization / cleanup */
          void bt_initialize (void);
          void bt_free_ast (AST *ast);
          void bt_cleanup (void);

          /* Input / interface to parser */
          void   bt_set_stringopts (bt_metatype_t metatype, btshort options);
          AST * bt_parse_entry_s (char *    entry_text,
                                  char *    filename,
                                  int       line,
                                  btshort    options,
                                  boolean * status);
          AST * bt_parse_entry   (FILE *    infile,
                                  char *    filename,
                                  btshort    options,
                                  boolean * status);
          AST * bt_parse_file    (char *    filename,
                                  btshort    options,
                                  boolean * overall_status);

          /* AST traversal/query */
          AST * bt_next_entry (AST * entry_list,
                               AST * prev_entry);
          AST * bt_next_field (AST *entry, AST *prev, char **name);
          AST * bt_next_value (AST *head,
                               AST *prev,
                               bt_nodetype_t *nodetype,
                               char **text);

          bt_metatype_t bt_entry_metatype (AST *entry);
          char *bt_entry_type (AST *entry);
          char *bt_entry_key (AST *entry);
          char *bt_get_text (AST *node);

          /* Splitting names and lists of names */
          bt_stringlist * bt_split_list (char *   string,
                                         char *   delim,
                                         char *   filename,
                                         int      line,
                                         char *   description);
          void bt_free_list (bt_stringlist *list);
          bt_name * bt_split_name (char *  name,
                                   char *  filename,
                                   int     line,
                                   int     name_num);
          void bt_free_name (bt_name * name);

          /* Formatting names */
          bt_name_format * bt_create_name_format (char * parts, boolean abbrev_first);
          void bt_free_name_format (bt_name_format * format);
          void bt_set_format_text (bt_name_format * format,
                                   bt_namepart part,
                                   char * pre_part,
                                   char * post_part,
                                   char * pre_token,
                                   char * post_token);
          void bt_set_format_options (bt_name_format * format,
                                      bt_namepart part,
                                      boolean abbrev,
                                      bt_joinmethod join_tokens,
                                      bt_joinmethod join_part);
          char * bt_format_name (bt_name * name, bt_name_format * format);

          /* Construct tree from TeX groups */
          bt_tex_tree * bt_build_tex_tree (char * string);
          void          bt_free_tex_tree (bt_tex_tree **top);
          void          bt_dump_tex_tree (bt_tex_tree *node, int depth, FILE *stream);
          char *        bt_flatten_tex_tree (bt_tex_tree *top);

          /* Miscellaneous string utilities */
          void bt_purify_string (char * string, btshort options);
          void bt_change_case (char transform, char * string, btshort options);

DESCRIPTION

       btparse is a C library for parsing and processing BibTeX files.  It provides a lexical scanner and LR
       parser (constructed by PCCTS), both of which are efficient and offer good error detection and recovery; a
       set of functions for traversing the AST (abstract syntax tree) generated by the parser; and utility
       functions for manipulating strings according to BibTeX conventions.  (Note that nothing in the library
       assumes that you're using BibTeX files for their original purpose of bibliographic data for scholarly
       publications; you could use the file format for any conceivable purpose that fits it.  However, there is
       some code in the library that is really only appropriate for use with strings meant to be processed in
       the same way that BibTeX itself does.  This is all entirely optional, though.)

       Note that the interface provided by btparse, while complete, is fairly low-level.  If you have more
       sophisticated needs, you might be interested in my "Text::BibTeX" module for Perl 5 (available on CPAN).

CONCEPTS AND TERMINOLOGY

       To understand this document and use btparse, you should already be familiar with the BibTeX
       language---more specifically, the BibTeX data description language.  (BibTeX being the complex beast that
       it is, one can conceive of the term applying to the program, the data language, the particular database
       structure described in the original BibTeX documentation, the ".bst" formatting language, and the set of
       conventions embodied in the standard styles included with the BibTeX distribution.  In this document,
       I'll stick to the first two meanings---the data language because that's what btparse deals with, and the
       program because it's occasionally necessary to explain differences between my parser and BibTeX's.)

       In particular, you should have a good idea what's going on in the following:

          @string{and = { and },
                  joe = "Blow, Joe",
                  john = "John Smith"}

          @book(ourbook,
                author = joe # and # john,
                title = {Our Little Book})

       If this looks like something you want to parse, but don't want to have to write your own parser for,
       you've come to the right place.
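
       To make the example above concrete, the following is a minimal sketch of what expanding the three
       macros and pasting them together yields for the "author" field.  It does not use btparse at all: the
       macro table and the names "lookup_macro" and "expand_and_paste" are hypothetical, chosen only for
       illustration, and whitespace collapsing is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro table mirroring the @string entry above. */
struct macro { const char *name; const char *text; };

static const struct macro macros[] = {
    { "and",  " and " },
    { "joe",  "Blow, Joe" },
    { "john", "John Smith" },
};

/* Look up a macro by name; returns NULL if it is not defined. */
const char *lookup_macro(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof macros / sizeof macros[0]; i++)
        if (strcmp(macros[i].name, name) == 0)
            return macros[i].text;
    return NULL;
}

/* Expand each macro name in turn and paste the expansions into out.
   Returns 0 on success, -1 on an undefined macro or if out is too small. */
int expand_and_paste(const char *const *names, size_t count,
                     char *out, size_t out_size)
{
    size_t used = 0;
    assert(names != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        const char *text = lookup_macro(names[i]);
        if (text == NULL)
            return -1;
        size_t len = strlen(text);
        if (used + len >= out_size)
            return -1;
        memcpy(out + used, text, len + 1);   /* copy including the NUL */
        used += len;
    }
    return 0;
}
```

       Passing the names "joe", "and", "john" (the right-hand side of the "author" field) fills the buffer
       with the single string "Blow, Joe and John Smith".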

       Before going much further, though, you're going to have to learn some of the terminology I use for
       describing BibTeX data.  Most of it's the same as you'll find in any BibTeX documentation, but it's
       important to be sure that we're talking about the same things here.  So, some definitions:

       top-level
           All  text  in  a  BibTeX file from the start of the file to the start of the first entry, and between
           entries thereafter.

       name
           A string of letters, digits, and the following characters:

              ! $ & * + - . / : ; < > ? [ ] ^ _ ` |

           A "name" is a catch-all used for entry types, entry keys, and field  and  macro  names.   For  BibTeX
           compatibility,  there  are slightly different rules for these four entities; currently, the only such
           rule actually implemented is that field and macro names may not begin with a digit.   Some  names  in
           the above example: "string", "and".

       entry
           A  chunk  of text starting with an "at" sign ("@") at top-level, followed by a name (the entry type),
           an entry delimiter ("{" or "("), and proceeding to the matching closing delimiter.   Also,  the  data
           structure that results from parsing this chunk of text.  There are two entries in the above example.

       entry type
           The name that comes right after an "@" at top-level.  Examples from above: "string", "book".

       entry metatype
           A  classification  of  entry  types  that  allows  us to group one or more entry types under the same
           heading.  With the standard BibTeX database structure, "article", "book",  "inbook",  etc.  all  fall
           under the "regular entry" metatype.  Other metatypes are "macro definition" (for "string" entries),
           "preamble" (for "preamble" entries), and "comment" (for "comment" entries).  In fact, any entry whose
           type is not one of "string", "preamble", or "comment" is called a "regular" entry.

       entry delimiters
           "{"  and  "}",  or "(" and ")": the pair of characters that (almost) mark the boundaries of an entry.
           "Almost" because the start of an entry is marked by an "@", not by the "entry open" delimiter.

       entry key
           (Or just key when it's clear what we're speaking of.)  The name immediately following the entry  open
           delimiter  in  a  regular entry, which uniquely identifies the entry.  Example from above: "ourbook".
           Only regular entries have keys.

       field
           A name to the left of an equals sign in a regular or macro-definition entry.  In the latter  context,
           might also be called a macro name.  Examples from above: "joe", "author".

       field list
           In  a  regular  entry,  everything between the entry delimiters except for the entry key.  In a macro
           definition entry, everything between the entry delimiters (possibly also called a macro list).

       compound value
           (Usually just "value".)  The text that follows an equals sign ("=") in a regular or macro  definition
           entry, up to a comma or the entry close delimiter; a list of one or more simple values joined by hash
           signs ("#").

       simple value
           A string, macro, or number.

       string
           (Or,  sometimes,  "quoted  string.")   A  chunk of text between quotes (""") or braces ("{" and "}").
           Braces must balance: "{this is a {string}" is not a BibTeX string, but "{this  is  a  {string}}"  is.
           ("this  is  a  {string"  is  also  illegal,  mainly  to avoid the possibility of generating bogus TeX
           code--which BibTeX will do in certain cases.)

       macro
           A name that appears on the right-hand side of an equals sign (i.e. as one simple value in a  compound
           value).  Implies that this name was defined as a macro in an earlier macro definition entry, but this
           is only checked if btparse is being asked to expand macros to their full definitions.

       number
           An unquoted string of digits.
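
       The definitions of "name", "string", "macro", and "number" above can be read as simple character-level
       checks.  The sketch below is one way to write them down; it is independent of btparse, all of its
       identifiers ("is_valid_name", "braces_balanced", "classify_simple_value", and the "KIND_*" constants)
       are hypothetical, and it makes no claim to match the library's scanner in every corner case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Kinds of simple value; these names are local to this sketch. */
enum simple_kind { KIND_STRING, KIND_NUMBER, KIND_MACRO, KIND_INVALID };

/* Non-zero if c may appear in a "name" as defined above. */
int is_name_char(int c)
{
    return c != '\0' &&
           (isalnum((unsigned char)c) ||
            strchr("!$&*+-./:;<>?[]^_`|", c) != NULL);
}

/* Non-zero if s is a non-empty name.  Field and macro names
   additionally may not begin with a digit. */
int is_valid_name(const char *s, int is_field_or_macro)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;
    if (is_field_or_macro && isdigit((unsigned char)*s))
        return 0;
    for (; *s != '\0'; s++)
        if (!is_name_char(*s))
            return 0;
    return 1;
}

/* Non-zero if every '{' in s has a matching '}' and vice versa. */
int braces_balanced(const char *s)
{
    int depth = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == '{')
            depth++;
        else if (*s == '}' && --depth < 0)
            return 0;
    }
    return depth == 0;
}

/* Classify the source text of one simple value. */
enum simple_kind classify_simple_value(const char *s)
{
    size_t len;
    assert(s != NULL);
    len = strlen(s);
    if (len >= 2 &&
        ((s[0] == '"' && s[len - 1] == '"') ||
         (s[0] == '{' && s[len - 1] == '}')))
        return braces_balanced(s) ? KIND_STRING : KIND_INVALID;
    if (len > 0 && strspn(s, "0123456789") == len)
        return KIND_NUMBER;
    return is_valid_name(s, 1) ? KIND_MACRO : KIND_INVALID;
}
```

       This is a rough check only: it does not treat backslashes specially, and it accepts brace-delimited
       input such as "{a}{b}" that is not a single string.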

       Working with btparse generally consists of passing the library some BibTeX data (or a source for some
       BibTeX data, such as a filename or a file pointer), which it then lexically scans and parses,
       constructing an abstract syntax tree (AST) from the result.  It returns this AST to you, and you call
       other btparse functions to traverse and query the tree.

       The  contents  of AST nodes are the private domain of the library, and you shouldn't go poking into them.
       This being C, though, there's nothing  to  prevent  you  from  doing  so  except  good  manners  and  the
       possibility  that  I  might change the AST structure in future releases, breaking any badly-behaved code.
       Also, it's not necessary to know the structural relationships between nodes  in  the  AST---that's  taken
       care of by the query/traversal functions.

       However, it's useful to know some of the things that btparse deposits in the AST and returns to you
       through those query/traversal functions.  First off, each node has a "node type," which records the
       syntactic element corresponding to that node.  For instance, the entry

          @book{mybook, author = "Joe Blow", title = "My Little Book"}

       is  rooted  by  an  "entry" node; under this would be found a "key" node (for the entry key), two "field"
       nodes (for the "author" and "title" fields); and associated with each field  node  would  be  a  "string"
       node.   The  only  time this concerns you is when you ask the library for a simple value; just looking at
       the text is not enough to distinguish quoted strings, numbers, and macro names, so  btparse  returns  the
       nodetype as well.

       In addition to the nodetype, btparse records the metatype of each "entry" node.  This allows you (and the
       library) to distinguish, say, regular entries from comment entries.  Not only do they have very different
       structures  and  must  therefore be traversed differently by the library, but certain traversal functions
       make no sense on certain entry metatypes---thus it's necessary for you to be able to make the distinction
       as well.

       That said, everything you need to know to work with the AST is explained in bt_traversal.

DATA TYPES AND MACROS

       btparse defines several types required for  the  external  interface.   First,  it  trivially  defines  a
       "boolean"  type  (along  with  "TRUE"  and  "FALSE"  macros).   This  might affect you when including the
       btparse.h header in your own code---since it's not possible for the code to detect if there is already  a
       "boolean"  type  defined,  you  might have to define the "HAVE_BOOLEAN" pre-processor token to deactivate
       btparse.h's "typedef" of "boolean".

       Next, two enumeration types are defined:  "bt_metatype"  and  "bt_nodetype".   Both  of  these  are  used
       extensively  in  the  library  itself, and are made available to users of the library because they can be
       found in nodes of the "btparse" AST (abstract syntax  tree).   (I.e.,  querying  the  AST  can  give  you
       "bt_metatype" and "bt_nodetype" values, so the "typedef"s must be available to your code.)

   Entry metatype enum
       "bt_metatype_t" has the following values:

       •   "BTE_UNKNOWN"

       •   "BTE_REGULAR"

       •   "BTE_COMMENT"

       •   "BTE_PREAMBLE"

       •   "BTE_MACRODEF"

       which  are  determined  by  the  "entry  type" token.  (@string entries have the "BTE_MACRODEF" metatype;
       @comment and @preamble correspond to "BTE_COMMENT" and "BTE_PREAMBLE"; and any other entry type  has  the
       "BTE_REGULAR" metatype.)

   AST nodetype enum
       "bt_nodetype" has the following values:

       •   "BTAST_UNKNOWN"

       •   "BTAST_ENTRY"

       •   "BTAST_KEY"

       •   "BTAST_FIELD"

       •   "BTAST_STRING"

       •   "BTAST_NUMBER"

       •   "BTAST_MACRO"

       Of  these,  you'll  only  ever  deal with the last three.  They are returned when you query the AST for a
       simple value---just seeing the text isn't enough to distinguish between a quoted string, a number, and  a
       macro, so the AST nodetype is supplied along with the text.

   String processing option macros
       Since  BibTeX  is  essentially  a  system  for  glueing  strings  together in a wide variety of ways, the
       processing done to its strings is fairly important.  Most of the string transformations are done  outside
       of   the  lexer/parser;  this  reduces  their  complexity,  and  makes  it  easier  to  switch  different
       transformations on and off.  This switching is done with an "options" bitmap which can be specified on  a
       per-entry-metatype  basis.   (That is, you can have one set of transformations done to the strings in all
       regular entries, another set done to the strings in all macro definition entries, and  so  on.)   If  you
       need  finer  control than that, it's currently unavailable outside of the library (but it's just a matter
       of making a couple functions available and documenting them---so bug me if you need this feature).

       There are four basic macros for constructing this bitmap:

       "BTO_CONVERT"
           Convert "number" values to strings.  (The conversion is trivial, involving changing the type  of  the
           AST  node  representing the number from "BTAST_NUMBER" to "BTAST_STRING".  "Number" values are stored
           as strings of digits, just as they are in the input data.)

       "BTO_EXPAND"
           Expand macro invocations to the full macro text.

       "BTO_PASTE"
           Paste simple values together.

       "BTO_COLLAPSE"
           Collapse whitespace according to the BibTeX rules.

       For instance, supplying "BTO_CONVERT | BTO_EXPAND" as the string options  bitmap  for  the  "BTE_REGULAR"
       metatype  means  that  all  simple values in "regular" entries will be converted to strings: numbers will
       simply have their "nodetype" changed, and macros will be expanded.  Nothing else  will  be  done  to  the
       simple  values,  though---they  will  not  be  concatenated,  nor  will whitespace be collapsed.  See the
       bt_set_stringopts() and "bt_parse_*()" functions in bt_input for more information on the various  options
       for parsing; see bt_postprocess for details on the post-processing.
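
       The options bitmap is built with bitwise OR and tested with bitwise AND.  The sketch below shows that
       pattern in isolation.  The "OPT_*" values are stand-ins invented for this illustration, not the values
       of the real "BTO_*" macros (which should always be taken from btparse.h), and "has_options" is a
       hypothetical helper.

```c
#include <assert.h>

/* Stand-in bit values, for illustration only.  Real programs should use
   the BTO_* macros from btparse.h; their values are not assumed here. */
enum {
    OPT_CONVERT  = 1 << 0,   /* numbers become strings        */
    OPT_EXPAND   = 1 << 1,   /* macros replaced by their text */
    OPT_PASTE    = 1 << 2,   /* simple values concatenated    */
    OPT_COLLAPSE = 1 << 3    /* whitespace collapsed          */
};

/* Non-zero if every bit in wanted is also set in options.
   wanted must name at least one option. */
int has_options(unsigned options, unsigned wanted)
{
    assert(wanted != 0);
    return (options & wanted) == wanted;
}
```

       With "OPT_CONVERT | OPT_EXPAND" as the bitmap, has_options() reports that conversion and expansion are
       switched on while pasting and collapsing are not, which matches the behaviour described above.  With
       the real library, the corresponding call would be bt_set_stringopts (BTE_REGULAR, BTO_CONVERT |
       BTO_EXPAND), using the signature shown in the SYNOPSIS.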

USING THE LIBRARY

       The following code is a skeletal example of using the btparse library:

           #include <stdlib.h>   /* for exit() */
           #include <btparse.h>

           int main (void)
           {
              bt_initialize ();

              /* process some data */

              bt_cleanup ();
              exit (0);
           }

       Please  note  the  call to bt_initialize(); this is very important!  Without it, the library may crash or
       fail  mysteriously.   You  must  call  bt_initialize()  before  calling  any  other  btparse   functions.
       bt_cleanup()  just  frees  the  memory allocated by bt_initialize(); if you are careful to call it before
       exiting, and bt_free_ast() on any abstract syntax trees generated by btparse when you are done with them,
       then your program shouldn't have any memory leaks.  (Unless they're due to your own code, of course!)

BUGS AND LIMITATIONS

       btparse has several inherent limitations that are due to the lexical  scanner  and  parser  generated  by
       PCCTS 1.x.  In short, the scanner and parser are both heavily dependent on global variables, meaning that
       thread safety -- or even the ability to have two files open and being parsed at the same time -- is well-
       nigh  impossible.   This will not change until I get with the times and adopt ANTLR 2.0, the successor to
       PCCTS -- presuming of course that it can generate more modular C scanners and parsers.

       Another limitation that is due to PCCTS: entries with a large number of fields (more than  about  90,  if
       each field value is just a single string) will cause the parser to crash.  This is unavoidable due to the
       parser using statically-allocated stacks for attributes and abstract-syntax tree nodes.  I could increase
       the  static allocation, but that would just decrease the likelihood of encountering the problem, not make
       it go away.  Again, the chances of this changing as long as I'm using PCCTS 1.x are nil.

       Apart from those inherent limitations, there are no known bugs in btparse.  Any  segmentation  faults  or
       bus  errors  from  the  library  should  be considered bugs.  They probably result from using the library
       incorrectly (e.g., attempting to interleave the parsing of two files), but I do make an attempt to catch
       all such mistakes, and if I've missed any I'd like to know about it.

       Any  memory leaks from the library are also a concern; as long as you are conscientious about calling the
       cleanup functions (bt_free_ast() and bt_cleanup()), then the library shouldn't leak.

SEE ALSO

       To read and parse BibTeX data files, see bt_input.

       To traverse the syntax tree that results, see bt_traversal.

       To learn what is done to values in parsed entries, and how to customize that munging, see bt_postprocess.

       To learn how btparse deals with strings, see bt_strings (oops, I haven't written this one yet!).

       To manipulate and access the btparse macro table, see bt_macros.

       For splitting author names and lists "the BibTeX way" using btparse, see bt_split_names.

       To put author names back together again, see bt_format_names.

       Miscellaneous functions for processing strings "the BibTeX way": bt_misc.

       A semi-formal language definition is in bt_language.

AUTHOR

       Greg Ward <gward@python.net>

COPYRIGHT

       Copyright (c) 1996-97 by Gregory P. Ward.

       This library is free software; you can redistribute it and/or modify  it  under  the  terms  of  the  GNU
       Library  General  Public  License  as  published by the Free Software Foundation; either version 2 of the
       License, or (at your option) any later version.

       This library is distributed in the hope that it will be useful, but WITHOUT ANY  WARRANTY;  without  even
       the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Library General
       Public License for more details.

       You  should  have  received  a copy of the GNU Library General Public License along with this library; if
       not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

AVAILABILITY

       The btOOL home page, where you can get up-to-date information about  btparse  (and  download  the  latest
       version) is

          http://starship.python.net/~gward/btOOL/

       You will also find the latest version of Text::BibTeX, the Perl library that provides a high-level front-
       end to btparse, there.  btparse is needed to build "Text::BibTeX", and must be downloaded separately.

       Both    libraries    are   also   available   on   CTAN   (the   Comprehensive   TeX   Archive   Network,
       "http://www.ctan.org/tex-archive/")   and    CPAN    (the    Comprehensive    Perl    Archive    Network,
       "http://www.cpan.org/").   Look  in  biblio/bibtex/utils/btOOL/  on CTAN, and authors/Greg_Ward/ on CPAN.
       For example,

          http://www.ctan.org/tex-archive/biblio/bibtex/utils/btOOL/
          http://www.cpan.org/authors/Greg_Ward

       will both get you to the latest version of "Text::BibTeX" and btparse -- but of course, you should always
       access busy sites like CTAN and CPAN through a mirror.

btparse, version 0.89                              2024-03-31                           btparse::doc::btparse(3)