Provided by: librulexdb-dev_3.8.6-1_amd64

NAME

       rulexdb_open - open or create a rulex database

SYNOPSIS

       #include <rulexdb.h>

       RULEXDB *rulexdb_open(const char *path, int mode);

DESCRIPTION

       The  rulexdb_open()  function opens the rulex database in the file whose name is the string pointed to by
       path and allocates and initializes all necessary internal data structures associated with it.

       The mode argument specifies the database access mode. It must be one of the following values:

       RULEXDB_SEARCH
              Open the database for searching only (read-only mode).

       RULEXDB_UPDATE
              Open an existing database for searching and updating (read and write mode).

       RULEXDB_CREATE
              Create a new database and open it for updating and searching.

DATABASE STRUCTURE

       The rulex database consists of two dictionaries and four sets of rules. The Explicit dictionary contains
       words that are described individually and imply nothing about other forms. This dictionary is consulted
       first, if the search includes this stage. The Implicit dictionary contains words in a basic form and is
       used to construct the pronunciation string for the various forms of these words. The basic form of a
       word is guessed according to the rules from the Classifiers and Prefix detectors rulesets. This is the
       second stage of the search process. If these stages yield no result, or are not performed, the rules
       from the General ruleset are used to guess the stress position in the word. If none of these rules can
       be applied, no guess is made and the search fails.

       Externally, all data are represented as text. Russian letters are encoded in the KOI8-R character set,
       and only lowercase letters are allowed.

       Each dictionary record consists of two fields. The first field contains the Russian word that serves as
       the search key. Only lowercase Russian letters are allowed here. The second field provides the
       pronunciation string for this word, that is, the word itself written the way it should be pronounced.
       Three additional symbols are allowed in the pronunciation string along with the lowercase Russian
       letters. The "+" sign marks the stressed letter and should be placed immediately after that letter. The
       "=" sign is used in the same manner in some cases to mark the so-called weak stress. The "-" sign can
       serve as a separator in some complex words. All other symbols are illegal.
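
       The constraints on a pronunciation string can be sketched as a small validator. This is an
       illustration only, not part of librulexdb: the function names are hypothetical, the byte ranges are
       the standard KOI8-R codes for lowercase Cyrillic (0xC0 through 0xDF, plus 0xA3 for the letter yo),
       and the requirement that "+" and "=" directly follow a letter is a strict reading of the text above
       that the library itself may or may not enforce.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if c is a lowercase Russian letter in KOI8-R. */
bool is_koi8r_lower(unsigned char c)
{
    return (c >= 0xC0 && c <= 0xDF) || c == 0xA3;
}

/*
 * Hypothetical helper (not a librulexdb function): checks that s contains
 * only lowercase KOI8-R letters and the marks '+', '=' and '-', and that
 * every stress mark comes immediately after a letter (an assumption).
 */
bool pronunciation_is_valid(const char *s)
{
    bool after_letter = false;

    if (s == NULL || *s == '\0')
        return false;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        if (is_koi8r_lower(c)) {
            after_letter = true;
        } else if (c == '+' || c == '=') {
            if (!after_letter)
                return false;       /* stress mark without a letter before it */
            after_letter = false;
        } else if (c == '-') {
            after_letter = false;   /* separator */
        } else {
            return false;           /* any other symbol is illegal */
        }
    }
    return true;
}
```

       For instance, the KOI8-R bytes "\xD7\xCF\xC4\xC1+" (the word "voda" with the stress on its last
       letter) pass this check, while any string containing Latin or uppercase letters is rejected.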

       There are four rulesets in the database: General rules, Classifiers, Prefix detectors and Correctors.
       Externally, all these rules are represented by records consisting of one or two fields. The first field
       always contains a regular expression that is matched against the word to decide whether the rule can
       be applied.

       The only task of the General rules is to guess the stress in a word when dictionary lookup fails. The
       rules are tried sequentially until one matches or the list is exhausted. If a rule matches, the "+"
       sign is inserted into the word right after the first subexpression match to mark the stress position.
       These rules do not contain a second field.
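
       The effect of a single General rule can be sketched with the POSIX regex API. This is not the
       library's implementation: apply_general_rule() is a hypothetical name, extended regular expression
       syntax is assumed, and the example uses Latin transliteration in place of KOI8-R bytes for
       readability.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not a librulexdb function): applies one General rule.
 * On a match, copies word into out with '+' inserted right after the end of
 * the first subexpression match.
 * Returns 0 on success, 1 if the rule does not match, -1 on error.
 */
int apply_general_rule(const char *pattern, const char *word,
                       char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];
    size_t len = strlen(word);
    size_t pos;
    int rc;

    /* Assumption: rules use POSIX extended syntax. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, word, 2, m, 0);
    regfree(&re);

    if (rc != 0 || m[1].rm_so < 0)
        return 1;                   /* no match, or no first subexpression */
    if (len + 2 > outsize)
        return -1;                  /* word, '+' and terminator do not fit */

    pos = (size_t)m[1].rm_eo;       /* end of the first subexpression match */
    memcpy(out, word, pos);
    out[pos] = '+';
    memcpy(out + pos + 1, word + pos, len - pos + 1);   /* tail and '\0' */
    return 0;
}
```

       With the illustrative rule "^(.*a)nie$", the word "zadanie" becomes "zada+nie". A caller would try
       each rule in order and stop at the first one that returns 0.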

       For the Classifiers ruleset, the rules are checked one by one until a match occurs. Then the part of
       the word from its beginning through the end of the first subexpression match is extracted and, if a
       second field is present, it is appended to the extracted part as a suffix. The resulting string is
       treated as the basic form of the word and is looked up in the Implicit dictionary. If nothing is found,
       the process continues until the ruleset is exhausted.
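
       How a Classifiers rule derives a basic form can be sketched as follows. The function name
       derive_basic_form() is hypothetical, POSIX extended regular expression syntax is assumed, and Latin
       transliteration stands in for KOI8-R text. The subsequent lookup in the Implicit dictionary is left
       out.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not a librulexdb function): applies one Classifiers
 * rule. On a match, out receives the word from its beginning through the end
 * of the first subexpression match, followed by suffix (the optional second
 * field; pass NULL when the rule has none).
 * Returns 0 on success, 1 if the rule does not match, -1 on error.
 */
int derive_basic_form(const char *pattern, const char *suffix,
                      const char *word, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];
    size_t stem_len, suffix_len;
    int rc;

    /* Assumption: rules use POSIX extended syntax. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, word, 2, m, 0);
    regfree(&re);

    if (rc != 0 || m[1].rm_so < 0)
        return 1;

    stem_len = (size_t)m[1].rm_eo;
    suffix_len = suffix != NULL ? strlen(suffix) : 0;
    if (stem_len + suffix_len + 1 > outsize)
        return -1;

    memcpy(out, word, stem_len);
    if (suffix_len > 0)
        memcpy(out + stem_len, suffix, suffix_len);
    out[stem_len + suffix_len] = '\0';
    return 0;
}
```

       With the illustrative rule "^(.+)ami$" and the suffix "a", the word "knigami" yields the basic form
       "kniga". Without a suffix the same rule yields "knig".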

       When nothing is found in the database for a word in its original form, the Prefix detectors rules are
       applied to it sequentially until a match occurs. The matched prefix is stripped and replaced by the
       replacement string, if any. The resulting word is then looked up in the Implicit dictionary. On
       success, the original prefix is restored in the pronunciation string.

       The rules from the Correctors ruleset are applied to pronunciation strings rather than to the original
       words. The second field in these rules specifies a replacement string in which digits serve as
       subexpression numbers.
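
       A Correctors rule can be sketched as a regex substitution. Several details here are assumptions, not
       documented behaviour: apply_corrector() is a hypothetical name, POSIX extended syntax is used, only
       the first match is replaced, a bare digit N in the replacement expands to subexpression N (with 0
       standing for the whole match), and Latin transliteration replaces KOI8-R text.

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 10   /* whole match plus subexpressions 1..9 */

/* Appends n bytes of src to out, keeping it terminated; -1 if it does not fit. */
static int append(char *out, size_t outsize, size_t *used,
                  const char *src, size_t n)
{
    if (*used + n + 1 > outsize)
        return -1;
    memcpy(out + *used, src, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/*
 * Hypothetical helper (not a librulexdb function): replaces the first match
 * of pattern in text with replacement, expanding each digit to the text of
 * the corresponding subexpression.
 * Returns 0 on success, 1 if the rule does not match, -1 on error.
 */
int apply_corrector(const char *pattern, const char *replacement,
                    const char *text, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    size_t used = 0;
    const char *r;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, MAX_GROUPS, m, 0);
    regfree(&re);
    if (rc != 0)
        return 1;

    /* Text before the match is copied unchanged. */
    if (append(out, outsize, &used, text, (size_t)m[0].rm_so) != 0)
        return -1;

    for (r = replacement; *r != '\0'; r++) {
        if (*r >= '0' && *r <= '9') {
            const regmatch_t *g = &m[*r - '0'];
            /* Subexpressions that did not take part expand to nothing. */
            if (g->rm_so >= 0 &&
                append(out, outsize, &used, text + g->rm_so,
                       (size_t)(g->rm_eo - g->rm_so)) != 0)
                return -1;
        } else if (append(out, outsize, &used, r, 1) != 0) {
            return -1;
        }
    }

    /* Text after the match is copied unchanged. */
    return append(out, outsize, &used, text + m[0].rm_eo,
                  strlen(text + m[0].rm_eo));
}
```

       With the illustrative rule "^(e)g(o\+)$" and the replacement "1v2", the pronunciation string "ego+"
       becomes "evo+".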

RETURN VALUE

       Upon successful completion, rulexdb_open() returns a RULEXDB pointer that should be used in other
       database access functions to reference the database. Otherwise, NULL is returned.

SEE ALSO

       rulexdb_classify(3),     rulexdb_close(3),     rulexdb_dataset_name(3),    rulexdb_discard_dictionary(3),
       rulexdb_discard_ruleset(3),    rulexdb_fetch_rule(3),    rulexdb_lexbase(3),     rulexdb_load_ruleset(3),
       rulexdb_remove_item(3),  rulexdb_remove_rule(3),  rulexdb_remove_this_item(3),  rulexdb_retrieve_item(3),
       rulexdb_search(3), rulexdb_seq(3), rulexdb_subscribe_item(3), rulexdb_subscribe_rule(3)

AUTHOR

       Igor B. Poretsky <poretsky@mlbox.ru>.

                                                February 19, 2012                                RULEXDB_OPEN(3)