Provided by: public-inbox_1.9.0-1_all

NAME

       public-inbox-index - create and update search indices

SYNOPSIS

       public-inbox-index [OPTIONS] INBOX_DIR...

       public-inbox-index [OPTIONS] --all

DESCRIPTION

       public-inbox-index creates and updates the search, overview and NNTP article number databases used by the
       read-only public-inbox HTTP and NNTP interfaces.  Currently, this requires the DBD::SQLite and DBI Perl
       modules.  Search::Xapian is optional, and is only needed to support the PSGI search interface.

       Once the initial indices are created by public-inbox-index, public-inbox-mda(1) and public-inbox-watch(1)
       will automatically maintain them.

       Running this manually to update indices is only required if relying on git-fetch(1) to mirror an existing
       public-inbox, or when upgrading to a new version of public-inbox, in which case "--reindex" is used.
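
       A mirror update along these lines could be scripted as follows.  This is a minimal sketch, assuming a
       v1 inbox whose INBOX_DIR is itself a bare git repository with a configured remote; update_mirror is a
       hypothetical helper name, not part of public-inbox.

```shell
# update_mirror INBOX_DIR
# Hypothetical helper: fetch new messages into a v1 mirror, then index them.
update_mirror() {
    inbox_dir=$1
    # Only index if the fetch succeeded.
    git --git-dir="$inbox_dir" fetch && public-inbox-index "$inbox_dir"
}
```

       It would be called as "update_mirror /path/to/inbox", for instance from a cron job.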

       Having the overview and article number database is essential to running the NNTP interface, and strongly
       recommended for the HTTP interface as it provides thread grouping in addition to normal search
       functionality.

OPTIONS

       -j JOBS
       --jobs=JOBS
           Influences the number of Xapian indexing shards in a (public-inbox-v2-format(5)) inbox.

           See "--jobs" in public-inbox-init(1) for a full description of sharding.

           "--jobs=0" is accepted as of public-inbox 1.6.0 to disable parallel indexing regardless of the number
           of pre-existing shards.

           If  the  inbox  has  not  been  indexed or initialized, "JOBS - 1" shards will be created (one job is
           always needed for indexing the overview and article number mapping).

           Default: the number of existing Xapian shards

       -c
       --compact
           Compacts the Xapian DBs after indexing.  This is recommended when using "--reindex" to avoid  running
           out of disk space while indexing multiple inboxes.

           While this option takes a negligible amount of time compared to "--reindex", it requires temporarily
           duplicating the entire contents of the Xapian DB.

           This switch may be specified twice, in which case compaction happens both before and  after  indexing
           to minimize the temporal footprint of the (re)indexing operation.

           Available since public-inbox 1.4.0.

       --reindex
           Forces  a re-index of all messages in the inbox.  This can be used for in-place upgrades and bugfixes
           while NNTP/HTTP server processes are utilizing the index.  Keep in mind this roughly doubles the size
           of the already-large Xapian database.  Using this with "--compact" or running public-inbox-compact(1)
           afterwards is recommended to release free space.

           public-inbox protects writes to various indices  with  flock(2),  so  it  is  safe  to  reindex  (and
           rethread) while public-inbox-watch(1), public-inbox-mda(1) or public-inbox-learn(1) run.

           This  does  not  touch  the  NNTP  article  number  database.   It  does  not affect threading unless
           "--rethread" is used.
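
           As an illustration of the recommendation above, a full reindex followed by compaction might be
           wrapped like this.  It is only a sketch: reindex_and_compact is a hypothetical helper name, and
           the inbox path is supplied by the caller.

```shell
# reindex_and_compact INBOX_DIR
# Hypothetical helper: re-index every message, then compact the Xapian DBs
# to release the extra space used during the reindex.
reindex_and_compact() {
    public-inbox-index --reindex --compact "$1"
}
```

           Adding "--rethread" to the same invocation would also regenerate thread associations.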

       --all
           Index all inboxes configured in ~/.public-inbox/config.  This is an alternative to specifying
           individual inbox directories on the command-line.

       --rethread
           Regenerate internal THREADID and message thread associations when reindexing.

           This  fixes  some  bugs  in older versions of public-inbox.  While it is possible to use this without
           "--reindex", it makes little sense to do so.

           Available in public-inbox 1.6.0+.

       --prune
           Run git-gc(1) to prune and expire reflogs if discontiguous history is detected.  This is intended  to
           be  used  in  mirrors  after  running public-inbox-edit(1) or public-inbox-purge(1) to ensure data is
           expunged from mirrors.

           Available since public-inbox 1.2.0.

       --max-size SIZE
           Sets    or    overrides    "publicinbox.indexMaxSize"    on    a    per-invocation    basis.      See
           "publicinbox.indexMaxSize" below.

           Available since public-inbox 1.5.0.

       --batch-size SIZE
           Sets    or    overrides    "publicinbox.indexBatchSize"    on    a    per-invocation    basis.    See
           "publicinbox.indexBatchSize" below.

           When  using  rotational  storage  but  abundant  RAM,  using  a  large  value  (e.g.   "500m")   with
           "--sequential-shard" can significantly speed up and reduce fragmentation during the initial index and
           full "--reindex" invocations (but not incremental updates).

           Available in public-inbox 1.6.0+.
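
           For a machine with rotational storage and plenty of RAM, the combination described above could
           look like the following sketch.  The helper name initial_index_rotational is hypothetical, and
           "500m" is simply the example value from the text rather than a tuned recommendation.

```shell
# initial_index_rotational INBOX_DIR
# Hypothetical helper: initial index with a large batch size, processing
# Xapian shards one after another instead of in parallel.
initial_index_rotational() {
    public-inbox-index --batch-size=500m --sequential-shard "$1"
}
```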

       --no-fsync
           Disables  fsync(2)  and  fdatasync(2)  operations  on SQLite and Xapian.  This is only effective with
           Xapian 1.4+.  This  is  primarily  intended  for  systems  with  low  RAM  and  the  small  (default)
           "--batch-size=1m".  Users of large "--batch-size" may even find disabling fdatasync(2) causes too
           much dirty data to accumulate, resulting in latency spikes from writeback.

           Available in public-inbox 1.6.0+.

       --dangerous
           Speed up initial index by using in-place updates and denying support for concurrent readers.  This is
           only effective with Xapian 1.4+.

           Available in public-inbox 1.8.0+.

       --sequential-shard
           Sets   or   overrides   "publicinbox.indexSequentialShard"   on   a   per-invocation   basis.     See
           "publicinbox.indexSequentialShard" below.

           Available in public-inbox 1.6.0+.

       --skip-docdata
           Stop storing document data in Xapian on an existing inbox.

           See "--skip-docdata" in public-inbox-init(1) for description and caveats.

           Available in public-inbox 1.6.0+.

       -E EXTINDEX
       --update-extindex=EXTINDEX
           Update the given external index (public-inbox-extindex-format(5)).  Either the configured section
           name (e.g. "all") or a directory name may be specified.

           Defaults to "all" if "[extindex "all"]" is configured, otherwise no external indices are updated.

           May be specified multiple times in rare cases where multiple external indices are configured.

       --no-update-extindex
           Do   not   update   the  "all"  external  index  by  default.   This  negates  all  uses  of  "-E"  /
           "--update-extindex=" on the command-line.

       --since=DATESTRING
       --after=DATESTRING
       --until=DATESTRING
       --before=DATESTRING
           Passed directly to git-log(1) to limit changes for "--reindex".
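
           A date-limited reindex might be wrapped as in the sketch below.  reindex_since is a hypothetical
           helper name; the date string is whatever git-log(1) accepts for its own "--since" option.

```shell
# reindex_since DATESTRING INBOX_DIR
# Hypothetical helper: re-index only changes newer than DATESTRING.
reindex_since() {
    public-inbox-index --reindex --since="$1" "$2"
}
```

           For example, "reindex_since 2020-01-01 /path/to/inbox" would limit the reindex to changes after
           that date.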

FILES

       For v1 (ssoma) repositories described in public-inbox-v1-format(5), all public-inbox-specific files are
       contained within the "$GIT_DIR/public-inbox/" directory.

       v2 inboxes are described in public-inbox-v2-format(5).

CONFIGURATION

       publicinbox.indexMaxSize
               Prevents indexing of messages larger than the specified size value.  A single suffix modifier of
               "k", "m" or "g" is supported; thus a value of "1m" prevents indexing of messages larger than
               one megabyte.

               This is useful for  avoiding  memory  exhaustion  in  mirrors  via  git.   It  does  not  prevent
               public-inbox-mda(1) or public-inbox-watch(1) from importing (and indexing) a message.

               This option is only available in public-inbox 1.5 or later.

               Default: none
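
               One way to set this value is sketched below.  It assumes the public-inbox config file uses
               git-config(1) syntax and can therefore be edited with "git config --file";
               set_index_max_size is a hypothetical helper name.

```shell
# set_index_max_size CONFIG_FILE SIZE
# Hypothetical helper: record publicinbox.indexMaxSize in the given config file.
# Assumes the file is in git-config(1) format.
set_index_max_size() {
    config_file=$1
    size=$2
    git config --file "$config_file" publicinbox.indexMaxSize "$size"
}
```

               For example, "set_index_max_size ~/.public-inbox/config 1m" would skip indexing of messages
               larger than one megabyte.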

       publicinbox.indexBatchSize
               Flushes  changes  to  the filesystem and releases locks after indexing the given number of bytes.
               The default value of "1m" (one megabyte) is low to minimize memory use and reduce contention with
               parallel invocations of public-inbox-mda(1), public-inbox-learn(1), and public-inbox-watch(1).

               Increase this value on powerful systems to improve throughput at the expense of memory use.   The
               reduction  of  lock  granularity  may not be noticeable on fast systems.  With SSDs, values above
               "4m" have little benefit.

               For public-inbox-v2-format(5) inboxes, this value is multiplied by the number of  Xapian  shards.
               Thus  a typical v2 inbox with 3 shards will flush every 3 megabytes by default unless parallelism
               is disabled via "--sequential-shard" or "--jobs=0".

               This influences memory usage of Xapian, but it is not exact.  The actual memory  used  by  Xapian
               and Perl has been observed in excess of 10x this value.

               This  option  is  available  in public-inbox 1.6 or later.  public-inbox 1.5 and earlier used the
               current default, "1m".

               Default: 1m (one megabyte)
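
               The shard multiplication described above can be checked with a little shell arithmetic.  This
               is a sketch only: size_to_bytes and flush_bytes are hypothetical helpers, and the use of
               binary multiples (1k = 1024 bytes) is an assumption, not something stated here.

```shell
# size_to_bytes SIZE
# Hypothetical helper: expand a single "k", "m" or "g" suffix into bytes.
# Assumes binary multiples (1k = 1024 bytes).
size_to_bytes() {
    case $1 in
        *k) echo $(( ${1%k} * 1024 )) ;;
        *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
        *g) echo $(( ${1%g} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# flush_bytes BATCH_SIZE SHARDS
# Bytes indexed between flushes when shards are indexed in parallel.
flush_bytes() {
    echo $(( $(size_to_bytes "$1") * $2 ))
}

flush_bytes 1m 3   # prints 3145728 (3 megabytes)
```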

       publicinbox.indexSequentialShard
               For public-inbox-v2-format(5) inboxes, setting this to "true" allows indexing  Xapian  shards  in
               multiple  passes.   This  speeds  up  indexing  on  rotational  storage with high seek latency by
               allowing individual shards to fit into the kernel page cache.

               Using a higher-than-normal number of "--jobs" with public-inbox-init(1) may be required to ensure
               individual shards are small enough to fit into cache.

               Warning: interrupting "public-inbox-index(1)" while this option is in use may  leave  the  search
               indices  out-of-date  with respect to SQLite databases.  WWW and IMAP users may notice incomplete
               search results, but it is otherwise non-fatal.  Using "--reindex" will bring everything back  up-
               to-date.

               Available in public-inbox 1.6.0+.

               This is ignored on public-inbox-v1-format(5) inboxes.

               Default: false, shards are indexed in parallel

       publicinbox.<name>.indexSequentialShard
               Identical to "publicinbox.indexSequentialShard", but only affects the inbox matching <name>.

ENVIRONMENT

       PI_CONFIG
               Used to override the default "~/.public-inbox/config" value.

       XAPIAN_FLUSH_THRESHOLD
               The number of documents to update before committing changes to disk.  This environment variable
               is handled directly by Xapian; refer to the Xapian API documentation for more details.

               For public-inbox 1.6 and later, use "publicinbox.indexBatchSize" instead.

               Setting  "XAPIAN_FLUSH_THRESHOLD"  or  "publicinbox.indexBatchSize"  for  a large "--reindex" may
               cause public-inbox-mda(1), public-inbox-learn(1) and public-inbox-watch(1) tasks to wait long and
               unpredictable periods of time during "--reindex".

               Default: none, uses "publicinbox.indexBatchSize"

UPGRADING

       Occasionally, public-inbox will update its schema version and require a full index by running this
       command.

CONTACT

       Feedback welcome via plain-text mail to <mailto:meta@public-inbox.org>

       The       mail       archives      are      hosted      at      <https://public-inbox.org/meta/>      and
       <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

COPYRIGHT

       Copyright all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>

SEE ALSO

       Search::Xapian, DBD::SQLite, public-inbox-extindex-format(5)

public-inbox.git                                   1993-10-02                              PUBLIC-INBOX-INDEX(1)