Provided by: public-inbox_1.9.0-1_all

NAME

       public-inbox-tuning - tuning public-inbox

DESCRIPTION

       public-inbox intends to support a wide variety of hardware.  While we strive to provide the best out-of-
       the-box performance possible, tuning knobs are an unfortunate necessity in some cases.

       1.  New inboxes: public-inbox-init -V2

       2.  Optional Inline::C use

       3.  Performance on rotational hard disk drives

       4.  Btrfs (and possibly other copy-on-write filesystems)

       5.  Performance on solid state drives

       6.  Read-only daemons

       7.  Other OS tuning knobs

       8.  Scalability to many inboxes

   New inboxes: public-inbox-init -V2
       If you're starting a new inbox (and not mirroring an existing one), the "-V2" format requires
       DBD::SQLite, but is orders of magnitude more scalable than the original "-V1" format.
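
       As a minimal sketch, a new "-V2" inbox might be created as follows.  The inbox name, directory, URL
       and address are hypothetical placeholders; the argument order follows public-inbox-init(1).  The guard
       only skips the call on systems where public-inbox is not installed.

```shell
# Hypothetical values for illustration; substitute your own.
NAME=example
INBOX_DIR=$HOME/inboxes/example
HTTP_URL=https://example.com/example
ADDRESS=example@example.com

# public-inbox-init NAME INBOX_DIR HTTP_URL ADDRESS, with -V2 selecting the v2 format
if command -v public-inbox-init >/dev/null 2>&1; then
    public-inbox-init -V2 "$NAME" "$INBOX_DIR" "$HTTP_URL" "$ADDRESS"
else
    echo "public-inbox-init not found; install public-inbox first" >&2
fi
```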

   Optional Inline::C use
       Our optional use of Inline::C speeds up subprocess spawning from large daemon processes.

       To enable Inline::C, either set the "PERL_INLINE_DIRECTORY" environment variable to point to  a  writable
       directory, or create "~/.cache/public-inbox/inline-c" for any user(s) running public-inbox processes.
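
       For example, the per-user directory can be created like this (run as each user that runs public-inbox
       processes); the commented alternative uses a placeholder path:

```shell
# Per-user cache directory named above; its existence enables Inline::C.
inline_dir=$HOME/.cache/public-inbox/inline-c
mkdir -p "$inline_dir"

# Alternative: point PERL_INLINE_DIRECTORY at any writable directory instead.
# export PERL_INLINE_DIRECTORY=/path/to/writable/dir
```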

       If libgit2 development files are installed and Inline::C is enabled (described above), per-inbox "git
       cat-file --batch" processes are replaced with a single perl(1) process running "PublicInbox::Gcf2::loop"
       in read-only daemons.  libgit2 use is available in public-inbox 1.7.0+.

       More  (optional)  Inline::C  use  will  be  introduced  in  the  future  to  lower memory use and improve
       scalability.

       Note: Inline::C is required for lei(1), but not for the public-inbox-* commands.

   Performance on rotational hard disk drives
       Random I/O performance is poor on rotational HDDs.  Xapian indexing performance degrades significantly as
       DBs grow larger than available RAM.  Attempts to parallelize random I/O on HDDs lead to pathological
       slowdowns as inboxes grow.

       While "-V2" introduced Xapian shards as a parallelization mechanism for SSDs, enabling
       "publicInbox.indexSequentialShard" repurposes sharding as a mechanism to reduce the kernel page cache
       footprint when indexing on HDDs.

       Initializing  a mirror with a high "--jobs" count to create more shards (in "-V2" inboxes) will keep each
       shard smaller and reduce its kernel page cache footprint.  Keep in  mind  excessive  sharding  imposes  a
       performance penalty for read-only queries.

       Users with large amounts of RAM are advised to set a large value for "publicInbox.indexBatchSize" as
       documented in public-inbox-index(1).

       "dm-crypt" users on Linux 4.0+ are advised to try the "--perf-same_cpu_crypt" and
       "--perf-submit_from_crypt_cpus" switches of cryptsetup(8) to reduce I/O contention from kernel workqueue
       threads.
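
       The two indexing settings named in this section live in the public-inbox config file, which uses
       git-config(1) syntax, so they can be set with git itself.  This is a sketch only: the "512m" batch size
       is an illustrative value, not a recommendation, and the config path assumes the default location unless
       "PI_CONFIG" is set.

```shell
# Default public-inbox config location, unless PI_CONFIG overrides it.
PI_CONFIG=${PI_CONFIG:-$HOME/.public-inbox/config}
mkdir -p "$(dirname "$PI_CONFIG")"

# Index Xapian shards one at a time to shrink the page cache footprint on HDDs.
git config --file "$PI_CONFIG" publicInbox.indexSequentialShard true

# Illustrative batch size for a machine with plenty of RAM; tune to taste.
git config --file "$PI_CONFIG" publicInbox.indexBatchSize 512m
```

       Section and variable names in git-config(1) files are case-insensitive, so "publicinbox.indexBatchSize"
       and "publicInbox.indexBatchSize" refer to the same setting.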

   Btrfs (and possibly other copy-on-write filesystems)
       btrfs(5) performance degrades from fragmentation when using  large  databases  and  random  writes.   The
       Xapian + SQLite indices used by public-inbox are no exception to that.

       public-inbox  1.6.0+  disables  copy-on-write  (CoW)  on  Xapian  and  SQLite indices on btrfs to achieve
       acceptable performance (even on SSD).  Disabling copy-on-write also disables checksumming,  thus  "raid1"
       (or higher) configurations may be corrupt after unsafe shutdowns.

       Fortunately, these SQLite and Xapian indices are designed to be recoverable from git if missing.

       Disabling  CoW  does  not  prevent  all fragmentation.  Large values of "publicInbox.indexBatchSize" also
       limit fragmentation during the initial index.

       Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.  Snapshots  use  CoW  despite  our
       efforts to disable it, resulting in fragmentation.

       filefrag(8) can be used to monitor fragmentation, and "btrfs filesystem defragment -fr $INBOX_DIR" may be
       necessary.

       Large filesystems benefit significantly from the "space_cache=v2" mount option documented in btrfs(5).

       Older, non-CoW filesystems generally work well out-of-the-box for our Xapian and SQLite indices.

   Performance on solid state drives
       While  SSD  read  performance  is generally good, SSD write performance degrades as the drive ages and/or
       gets full.  Issuing "TRIM" commands via fstrim(8) or similar is required to sustain write performance.

       Users of the Flash-Friendly  File  System  F2FS  <https://en.wikipedia.org/wiki/F2FS>  may  benefit  from
       optimizations found in SQLite 3.21.0+.  Benchmarks are greatly appreciated.

   Read-only daemons
       public-inbox-httpd(1),  public-inbox-imapd(1),  and  public-inbox-nntpd(1)  are all designed for C10K (or
       higher) levels of concurrency from a single process.  SMP systems  may  use  "--worker-processes=NUM"  as
       documented in public-inbox-daemon(8) for parallelism.

       The open file descriptor limit ("RLIMIT_NOFILE", "ulimit -n" in sh(1), "LimitNOFILE=" in systemd.exec(5))
       may need to be raised to accommodate many concurrent clients.
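
       A minimal sh(1) sketch for inspecting and raising the limit before starting a daemon from the same
       shell.  The target of 65536 is a hypothetical figure; size it to the number of clients you expect.

```shell
# Show the current soft and hard limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "RLIMIT_NOFILE soft=$soft hard=$hard"

# Hypothetical target, chosen only for illustration.
want=65536

# Raise the soft limit for this shell (and daemons started from it),
# but only when it is currently lower and the hard limit allows it.
if [ "$soft" != unlimited ] && [ "$soft" -lt "$want" ]; then
    if [ "$hard" = unlimited ] || [ "$hard" -ge "$want" ]; then
        ulimit -Sn "$want"
    fi
fi
```

       Daemons managed by systemd do not inherit a shell's limits; use "LimitNOFILE=" in the unit instead.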

       Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly increases memory use of client
       sockets; be sure to account for that in capacity planning.

   Other OS tuning knobs
       Linux users: the "vm.max_map_count" sysctl may need to be increased if handling thousands of inboxes
       (with public-inbox-extindex(1)) to avoid out-of-memory errors from git.
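
       On Linux the kernel exposes this limit as /proc/sys/vm/max_map_count.  The sketch below only reads the
       current value; the commented command for raising it needs root, and 262144 is an illustrative value
       rather than a recommendation.

```shell
# Linux only: read the current per-process limit on memory mappings.
if [ -r /proc/sys/vm/max_map_count ]; then
    max_map_count=$(cat /proc/sys/vm/max_map_count)
    echo "vm.max_map_count=$max_map_count"
fi

# Raising the limit requires root (value is illustrative):
#   sysctl -w vm.max_map_count=262144
```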

       Other OSes may have similar tuning knobs (patches appreciated).

   Scalability to many inboxes
       public-inbox-extindex(1) allows any number of public-inboxes to share the same Xapian indices.

       git  2.33+ startup time is orders-of-magnitude faster and uses less memory when dealing with thousands of
       alternates required for thousands of inboxes with public-inbox-extindex(1).

       Frequent packing (via git-gc(1)) both improves performance and reduces the need to increase
       "vm.max_map_count".

CONTACT

       Feedback encouraged via plain-text mail to <mailto:meta@public-inbox.org>.

       Information for *BSDs and non-traditional filesystems especially welcome.

       Our archives are hosted at <https://public-inbox.org/meta/>,
       <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>, and other places.

COPYRIGHT

       Copyright all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>

public-inbox.git                                   1993-10-02                             PUBLIC-INBOX-TUNING(7)