Provided by: recollcmd_1.36.1-1build2_amd64

NAME

       recollindex - indexing command for the Recoll full text search system

SYNOPSIS

       recollindex -h
       recollindex [ -z|-Z ] [ -k ] [ --nopurge ] [ -P ] [ --diagsfile <diagpath> ]
       recollindex -m [ -w <secs>] [ -D ] [ -x ] [ -C ] [ -n|-k ]
       recollindex -i [ -Z -k -f -P ] [<path [path ...]>]
       recollindex -r [ -Z -K -e -f ] [ -p pattern ] <dirpath>
       recollindex -e [<path [path ...]>]
       recollindex -l|-S|-E
       recollindex -s <lang>
       recollindex --webcache-compact
       recollindex --webcache-burst <destdir>
       recollindex --notindexed [path [path ...]]

DESCRIPTION

       Create or update a Recoll index.

       There are several modes of operation. All modes support an optional -c <cfgdir> option to specify the
       configuration directory, overriding $RECOLL_CONFDIR or, if that is not set, the default ($HOME/.recoll).
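
       With -c, keeping several independent indexes only requires one configuration directory per index. The
       function below is a minimal sketch that updates each index in turn and stops at the first failure. The
       function name is hypothetical, and so are the directory names in the usage line.

```shell
# Update several independent Recoll indexes, one per configuration directory.
# Returns non-zero as soon as one recollindex run fails.
index_configs() {
    for cfgdir in "$@"; do
        recollindex -c "$cfgdir" || return 1
    done
}
```

       Example: index_configs "$HOME/.recoll" "$HOME/.recoll-work". Setting RECOLL_CONFDIR in the
       environment of a single run selects the configuration for that run in the same way.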

       The normal mode indexes the set of files described in the configuration, incrementally updating the
       index with the files that changed since the last run. If option -z is given, the index is erased before
       starting. If option -Z is given, the index is not reset, but all files are considered as needing
       reindexing (in-place reset).
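
       As an illustration, a periodic job could run the cheap incremental pass most of the time and force an
       in-place reset with -Z once a week, without discarding the index as -z would. This is only a sketch:
       the function name and the Sunday schedule are arbitrary choices, and the optional argument exists only
       to make the choice easy to test.

```shell
# Incremental update on most days, in-place full reset (-Z) on Sundays.
# The optional argument is a day of week as printed by "date +%u"
# (1 = Monday ... 7 = Sunday); it defaults to today.
scheduled_index() {
    dow=${1:-$(date +%u)}
    if [ "$dow" = 7 ]; then
        recollindex -Z
    else
        recollindex
    fi
}
```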

       recollindex does not retry files which previously failed to index (for example because of a missing
       helper program). If option -k is given, recollindex will try again to process all failed files. Note
       that recollindex may also decide to retry failed files if the auxiliary checking script defined by the
       "checkneedretryindexscript" configuration variable indicates that this should happen.

       The --nopurge option will disable the normal erasure of deleted documents from the  index.  This  can  be
       useful in special cases (when it is known that part of the document set is temporarily not accessible).

       The  -P  option  will force the purge pass. This is useful only if the idxnoautopurge parameter is set in
       the configuration file.
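
       A sketch of the --nopurge use case, assuming a hypothetical removable volume listed in topdirs, and
       assuming that an unmounted mount point shows up as a missing or empty directory. The function name is
       also hypothetical.

```shell
# Update the index, but skip the purge pass (--nopurge) when the given volume
# (hypothetical, part of topdirs) is missing or empty, e.g. not mounted.
# Purging in that state would erase all of its documents from the index.
index_with_volume_check() {
    volume=$1
    if [ -d "$volume" ] && [ -n "$(ls -A "$volume" 2>/dev/null)" ]; then
        recollindex
    else
        recollindex --nopurge
    fi
}
```

       Example: index_with_volume_check /media/archive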

       If the option --diagsfile is  given,  the  path  given  as  parameter  will  be  truncated  and  indexing
       diagnostics will be written to it. Each line in the file will have a diagnostic type (reason for the file
       not  to be indexed), the file path, and a possible additional piece of information, which can be the MIME
       type or the archive internal path depending on the issue. The following diagnostic  types  are  currently
       defined:

              Skipped : the path matches an element of skippedPaths or skippedNames.

              NoContentSuffix : the file name suffix is found in the noContentSuffixes list.

              MissingHelper : a helper program is missing.

              Error : general error (see the log).

              NoHandler : no handler is defined for the MIME type.

              ExcludedMime : the MIME type is part of the excludedmimetypes list.

              NotIncludedMime : the onlymimetypes list is not empty and the MIME type is not in it.
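
       The diagnostics file lends itself to quick summaries. The sketch below counts entries per diagnostic
       type. It assumes, which this page does not state, that the type is the first whitespace-separated
       field of each line; a trailing colon after the type is tolerated. The function name is hypothetical.

```shell
# Print "<type> <count>" for each diagnostic type found in a diagnostics file,
# sorted by type name. Blank lines are ignored.
summarize_diags() {
    awk 'NF { t = $1; sub(/:$/, "", t); n[t]++ }
         END { for (t in n) print t, n[t] }' "$1" | LC_ALL=C sort
}
```

       Example: run recollindex --diagsfile /tmp/recoll-diags.txt, then summarize_diags
       /tmp/recoll-diags.txt (the file name is arbitrary).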

       If option -m is given, recollindex is started for real time monitoring, using the file system monitoring
       package it was configured for (either fam, gamin, or inotify). This mode must have been explicitly
       configured when building the package; it is not available by default. The program will normally detach
       from the controlling terminal and become a daemon. If option -D is given, it will stay in the foreground.
       Option -w <seconds> makes the program sleep for the specified time before indexing begins. The default
       value is 60. The daemon normally monitors the X11 session and exits when it is reset. Option -x disables
       this X11 session monitoring (the daemon will stay alive even if it cannot connect to the X11 server). This
       option is also needed when running the daemon without an X11 context. Option -n skips the initial
       incremental pass which is normally performed before monitoring starts. Once monitoring is started, the
       daemon normally monitors the configuration and restarts from scratch if a change is made. You can disable
       this with option -C.
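
       When the monitor is started by a service manager or in a session without X11, the options above can be
       combined as in this minimal sketch. The function name is hypothetical, and -m only works if real time
       monitoring was enabled at build time.

```shell
# Start the real time indexer in the foreground (-D), without X11 session
# monitoring (-x), sleeping the given number of seconds (default 60) before
# indexing begins (-w).
start_monitor() {
    recollindex -m -D -x -w "${1:-60}"
}
```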

       recollindex -i will index individual files into the index. The stem expansion and aspell databases will
       not be updated. The skippedPaths and skippedNames configuration variables will be used, so that some
       files may be skipped. You can tell recollindex to ignore skippedPaths and skippedNames by setting the -f
       option. This allows fully custom file selection for a given subtree: add the top directory to
       skippedPaths, and use any custom tool to generate the file list (e.g. a tool from a source code control
       system). When run this way, the indexer normally does not perform the deleted files purge pass, because
       it cannot be sure to have seen all the existing files. You can force a purge pass with -P.

       recollindex -e will erase data for individual files from the index. The stem expansion databases will not
       be updated.

       Options -i and -e can be combined. This will first perform the purge, then the indexing.

       With options -i or -e, if no file names are given on the command line, they will be read from stdin, so
       that you could for example run:

       find /path/to/dir -print | recollindex -e -i

       to  force  the  reindexing of a directory tree (which has to exist inside the file system area defined by
       topdirs in recoll.conf). You could mostly accomplish the same thing with

       find /path/to/dir -print | recollindex -Z -i

       The latter will perform a less thorough job of purging stale sub-documents though.
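
       The same stdin interface can restrict reindexing to recently changed files. A sketch, with a
       hypothetical function name, assuming the directory lies inside topdirs as above. Because it uses -i,
       no purge pass is performed, so documents for deleted files stay in the index.

```shell
# Reindex the regular files under a directory that were modified during the
# last N days (default 1), using standard find(1) -mtime syntax.
reindex_recent() {
    find "$1" -type f -mtime "-${2:-1}" -print | recollindex -i
}
```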

       recollindex -r mostly works like -i, but the parameter is a single directory, which will be recursively
       updated. This does little more than find topdir | recollindex -i, but it may be more convenient to use
       when started from another program. This mode retries failed files by default; use option -K to disable
       this. One or multiple -p options can be used to set shell-type selection patterns (e.g.: *.pdf).
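
       Shell-type patterns should be quoted so that the calling shell does not expand them before recollindex
       sees them. A sketch (hypothetical function name) that turns a list of patterns into one -p option each:

```shell
# Recursively update one directory, restricted to the given shell-type name
# patterns. Usage: index_dir_patterns <dirpath> <pattern>...
index_dir_patterns() {
    dir=$1
    shift
    # Rebuild the positional parameters as "-p pat1 -p pat2 ...": each pass
    # appends "-p <pattern>" and shifts one original pattern off the front.
    for pat in "$@"; do
        set -- "$@" -p "$pat"
        shift
    done
    recollindex -r "$@" "$dir"
}
```

       Example: index_dir_patterns "$HOME/docs" '*.pdf' '*.odt'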

       recollindex -l will list the names of available language stemmers.

       recollindex  -s will build the stem expansion database for a given language, which may or may not be part
       of the list in the configuration file. If the language  is  not  part  of  the  configuration,  the  stem
       expansion  database  will  be deleted at the end of the next normal indexing run. You can get the list of
       stemmer names from the recollindex -l command. Note that this is mostly for experimental use; the normal
       way to add a stemming language is to set it in the configuration, either by editing "recoll.conf" or by
       using the GUI indexing configuration dialog.
       At the time of this writing, the following languages are recognized (from Xapian's stem.h):

       •      danish

       •      dutch

       •      english: Martin Porter's 2002 revision of his stemmer

       •      english_lovins: Lovins' stemmer

       •      english_porter: Porter's stemmer as described in his 1980 paper

       •      finnish

       •      french

       •      german

       •      italian

       •      norwegian

       •      portuguese

       •      russian

       •      spanish

       •      swedish
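
       Since recollindex -s takes a name from the recollindex -l list, a script can check the name first. A
       sketch with a hypothetical function name, assuming only that the -l output separates names with spaces,
       tabs or newlines:

```shell
# Build the stem expansion database for one language, but only if the name
# appears, as a whole word, in the stemmer list printed by "recollindex -l".
build_stem_db() {
    if recollindex -l | tr -s ' \t' '\n\n' | grep -qx -- "$1"; then
        recollindex -s "$1"
    else
        echo "unknown stemmer: $1" >&2
        return 1
    fi
}
```

       Example: build_stem_db french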

       recollindex -S will rebuild the phonetic/orthographic index. This feature uses the aspell package,  which
       must be installed on the system.
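
       A sketch (hypothetical function name) that runs -S only when an aspell command is found in PATH. This
       is used here as a rough proxy for the aspell package being installed; it is an assumption of the
       sketch, not a check performed by recollindex itself.

```shell
# Rebuild the phonetic/orthographic index only if an aspell command exists
# in PATH; otherwise report the problem and return non-zero.
rebuild_spelling() {
    if command -v aspell >/dev/null 2>&1; then
        recollindex -S
    else
        echo "aspell not found; skipping recollindex -S" >&2
        return 1
    fi
}
```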

       recollindex -E will check that topdirs and the other relevant paths named in the configuration file
       exist (to help catch typos).
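
       In a scripted indexing job, -E can serve as a preflight check. This sketch (hypothetical function name)
       assumes that recollindex -E exits with a non-zero status when it finds a problem. This page does not
       document the exit status, so verify it before relying on the check.

```shell
# Run the configuration check first; if it fails (ASSUMPTION: non-zero exit
# status on missing paths), do not index. Extra arguments go to the
# indexing run.
checked_index() {
    recollindex -E || { echo "configuration check failed" >&2; return 1; }
    recollindex "$@"
}
```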

       recollindex --webcache-compact will recover the space wasted by erased  page  instances  inside  the  Web
       cache. It may temporarily need to use twice the disk space used by the Web cache.

       recollindex  --webcache-burst  <destdir>  will  extract  all  entries from the Web cache to files created
       inside <destdir>. Each cache entry is extracted as two files, for the data and metadata.
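
       A maintenance sketch (hypothetical function name) that extracts the Web cache into a dated directory and
       then compacts the cache. The naming scheme is arbitrary. This page does not say whether <destdir> must
       already exist, so the sketch creates it first.

```shell
# Extract all Web cache entries into <basedir>/webcache-YYYYMMDD, then
# recover the space used by erased entries if the extraction succeeded.
archive_webcache() {
    dest="$1/webcache-$(date +%Y%m%d)"
    mkdir -p "$dest" || return 1
    recollindex --webcache-burst "$dest" && recollindex --webcache-compact
}
```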

       recollindex --notindexed [path [path ...]]  will check each path and print out  those  which  are  absent
       from  the  index  (with  an "ABSENT" prefix), or caused an indexing error (with an "ERROR" prefix). If no
       paths are given on the command line, the command will read them, one per line, from stdin.
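
       Combined with -i, this gives a way to index only what is missing under a directory (which, as for -i,
       has to be inside topdirs). A sketch with a hypothetical function name, assuming that each output line
       is the prefix, a single space, and the path. Paths containing newlines are not handled.

```shell
# Feed every regular file under a directory to --notindexed, keep only the
# paths reported as ABSENT, and index those with -i.
index_missing() {
    find "$1" -type f -print | recollindex --notindexed |
        sed -n 's/^ABSENT //p' | recollindex -i
}
```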

       Interrupting the command: as indexing can sometimes take a long time, the command can be interrupted by
       sending an interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may elapse before the
       process exits, because it needs to properly flush and close the index. This can also be done from the
       recoll GUI (menu entry: File/Stop_Indexing). After such an interruption, the index will be somewhat
       inconsistent because some operations which are normally performed at the end of the indexing pass will
       have been skipped (for example, the stemming and spelling databases will be nonexistent or out of date).
       You just need to restart indexing at a later time to restore consistency. The indexing will restart at
       the interruption point (the full file tree will be traversed, but files that were indexed up to the
       interruption and for which the index is still up to date will not need to be reindexed).
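
       Because SIGTERM triggers the orderly shutdown described above, a time budget can be enforced with the
       GNU coreutils timeout(1) command. It sends SIGTERM by default and exits with status 124 when the limit
       was reached. A minimal sketch with a hypothetical function name:

```shell
# Run one indexing pass for at most the given duration (timeout(1) syntax,
# default 1h). On expiry recollindex receives SIGTERM and closes the index
# cleanly; a later run resumes from the interruption point.
index_with_budget() {
    timeout "${1:-1h}" recollindex
    rc=$?
    # GNU timeout exits with 124 when the time limit was reached.
    if [ "$rc" -eq 124 ]; then
        echo "indexing interrupted after ${1:-1h}; rerun to finish" >&2
    fi
    return "$rc"
}
```

       Example: index_with_budget 2h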

SEE ALSO

       recoll(1) recoll.conf(5)

                                                 8 January 2006                                   RECOLLINDEX(1)