Provided by: manpages_6.9.1-1_all

NAME

       /proc/sys/vm/ - virtual memory subsystem

DESCRIPTION

       /proc/sys/vm/
              This directory contains files for tuning memory management, buffer management, and cache management.
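
              As a rough illustration, the files in this directory can be listed together with their current
              values.  The function name vm_dump and its optional directory argument are assumptions made for
              this sketch; they are not part of any standard tool.

```shell
#!/bin/sh
# vm_dump [DIR]: print "name = value" for each readable regular file in DIR.
# DIR defaults to /proc/sys/vm; the argument exists only so that the sketch
# can be tried against an ordinary directory.
vm_dump() {
    dir=${1:-/proc/sys/vm}
    for f in "$dir"/*; do
        # Skip subdirectories and anything that is not a regular file.
        [ -f "$f" ] || continue
        # Some files (for example, write-only triggers) cannot be read; skip them.
        value=$(cat "$f" 2>/dev/null) || continue
        printf '%s = %s\n' "${f##*/}" "$value"
    done
}
```

              Calling vm_dump with no argument reads /proc/sys/vm/ itself.  Which files appear depends on the
              kernel version and configuration.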

       /proc/sys/vm/admin_reserve_kbytes (since Linux 3.10)
              This  file  defines  the  amount of free memory (in KiB) on the system that should be reserved for
              users with the capability CAP_SYS_ADMIN.

              The default value in this file is the minimum of [3% of free pages, 8MiB] expressed as  KiB.   The
              default  is  intended  to  provide  enough  for  the  superuser  to  log in and kill a process, if
              necessary, under the default overcommit 'guess' mode (i.e., 0 in /proc/sys/vm/overcommit_memory).

              Systems running in "overcommit never" mode  (i.e.,  2  in  /proc/sys/vm/overcommit_memory)  should
              increase  the  value in this file to account for the full virtual memory size of the programs used
              to recover (e.g., login(1), ssh(1), and top(1)).  Otherwise, the superuser may not be able to log in
              to recover the system.  For example, on x86-64 a suitable value is 131072 (128MiB reserved).

              Changing the value in this file takes effect whenever an application requests memory.
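
              The default rule stated above can be sketched as a small calculation.  The function name
              admin_reserve_default_kb is hypothetical, and the "3%" figure is taken from this page at face
              value; the kernel's own rounding may differ slightly.

```shell
#!/bin/sh
# admin_reserve_default_kb FREE_KB
# Print min(3% of FREE_KB, 8 MiB) in KiB, following the rule stated above.
# FREE_KB is the amount of free memory in KiB, supplied by the caller.
admin_reserve_default_kb() {
    three_percent=$(( $1 * 3 / 100 ))
    cap=8192                       # 8 MiB expressed in KiB
    if [ "$three_percent" -lt "$cap" ]; then
        echo "$three_percent"
    else
        echo "$cap"
    fi
}

admin_reserve_default_kb 100000    # prints 3000 (3% is below the cap)
admin_reserve_default_kb 4000000   # prints 8192 (capped at 8 MiB)
```

              On a system in "overcommit never" mode, a privileged user could raise the reserve to the value
              suggested above with: echo 131072 > /proc/sys/vm/admin_reserve_kbytes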

       /proc/sys/vm/compact_memory (since Linux 2.6.35)
              When  1  is  written  to  this file, all zones are compacted such that free memory is available in
              contiguous  blocks  where  possible.   The  effect  of  this  action  can  be  seen  by  examining
              /proc/buddyinfo.

              Present only if the kernel was configured with CONFIG_COMPACTION.
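
              To compare /proc/buddyinfo before and after compaction, it can help to condense each zone to two
              numbers.  The sketch below assumes the usual /proc/buddyinfo layout, in which the columns after
              the zone name count free blocks of order 0, 1, 2, and so on, and a block of order n spans 2^n
              pages.  The function name buddy_summary is hypothetical.

```shell
#!/bin/sh
# buddy_summary: read /proc/buddyinfo-style lines on standard input and print,
# for each zone: node, zone name, total free pages, and the highest order
# that still has at least one free block (-1 if the zone has none).
buddy_summary() {
    awk '$1 == "Node" && $3 == "zone" {
        node = $2; sub(/,$/, "", node)     # "0," -> "0"
        pages = 0; top = -1
        for (i = 5; i <= NF; i++) {
            order = i - 5
            pages += $i * 2 ^ order        # blocks of this order, in pages
            if ($i > 0) top = order
        }
        printf "%s %s %d %d\n", node, $4, pages, top
    }'
}
```

              A possible sequence, run as a privileged user, is buddy_summary < /proc/buddyinfo, then
              echo 1 > /proc/sys/vm/compact_memory, then buddy_summary < /proc/buddyinfo again.  After a
              successful compaction, the highest populated order would be expected to rise or stay the same.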

       /proc/sys/vm/drop_caches (since Linux 2.6.16)
              Writing  to  this  file  causes the kernel to drop clean caches, dentries, and inodes from memory,
              causing that memory to become free.   This  can  be  useful  for  memory  management  testing  and
              performing  reproducible  filesystem benchmarks.  Because writing to this file causes the benefits
              of caching to be lost, it can degrade overall system performance.

              To free pagecache, use:

                  echo 1 > /proc/sys/vm/drop_caches

              To free dentries and inodes, use:

                  echo 2 > /proc/sys/vm/drop_caches

              To free pagecache, dentries, and inodes, use:

                  echo 3 > /proc/sys/vm/drop_caches

              Writing to this file is a nondestructive operation: dirty objects are not freeable and are left in
              place.  To maximize the amount of memory that can be freed, the user should run sync(1) first.
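
              The sync-then-write sequence can be wrapped in a helper.  This is only a sketch: the function
              name drop_caches and its optional TARGET argument are assumptions, and writing to the real file
              requires appropriate privileges.

```shell
#!/bin/sh
# drop_caches MODE [TARGET]
# MODE is 1, 2, or 3 as described above.  TARGET defaults to
# /proc/sys/vm/drop_caches; it is a parameter only so that the function can
# be exercised against a scratch file without special privileges.
drop_caches() {
    mode=$1
    target=${2:-/proc/sys/vm/drop_caches}
    case $mode in
        1|2|3) ;;
        *) echo "drop_caches: mode must be 1, 2, or 3" >&2; return 2 ;;
    esac
    sync                           # write back dirty data so more objects are freeable
    echo "$mode" > "$target"
}
```

              For example, drop_caches 3 frees pagecache, dentries, and inodes after flushing dirty data.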

       /proc/sys/vm/hugetlb_shm_group (since Linux 2.6.7)
              This writable file contains a group ID that is allowed to allocate memory using huge pages.  If  a
              process  has  a filesystem group ID or any supplementary group ID that matches this group ID, then
              it  can  make  huge-page  allocations   without   holding   the   CAP_IPC_LOCK   capability;   see
              memfd_create(2), mmap(2), and shmget(2).

       /proc/sys/vm/legacy_va_layout (since Linux 2.6.9)
              If  nonzero,  this  disables  the new 32-bit memory-mapping layout; the kernel will use the legacy
              (2.4) layout for all processes.

       /proc/sys/vm/memory_failure_early_kill (since Linux 2.6.32)
              Controls how processes are killed when an uncorrected memory error (typically a 2-bit error in a
              memory module) that cannot be handled by the kernel is detected in the background by hardware.  In
              some cases (such as when the page still has a valid copy on disk), the kernel will handle the
              failure transparently without affecting any applications.  But if there is no other up-to-date
              copy of the data, it will kill processes to prevent data corruption from propagating.

              The file has one of the following values:

              1      Kill all processes that have the corrupted-and-not-reloadable page mapped as  soon  as  the
                     corruption  is detected.  Note that this is not supported for a few types of pages, such as
                     kernel internally allocated data or the swap cache, but works  for  the  majority  of  user
                     pages.

              0      Unmap  the  corrupted page from all processes and kill a process only if it tries to access
                     the page.

              The kill is performed using a SIGBUS signal with si_code  set  to  BUS_MCEERR_AO.   Processes  can
              handle this if they want to; see sigaction(2) for more details.

              This  feature  is  active only on architectures/platforms with advanced machine check handling and
              depends on the hardware capabilities.

              Applications can override the memory_failure_early_kill setting  individually  with  the  prctl(2)
              PR_MCE_KILL operation.

              Present only if the kernel was configured with CONFIG_MEMORY_FAILURE.

       /proc/sys/vm/memory_failure_recovery (since Linux 2.6.32)
              Enable memory failure recovery (when supported by the platform).

              1      Attempt recovery.

              0      Always panic on a memory failure.

              Present only if the kernel was configured with CONFIG_MEMORY_FAILURE.

       /proc/sys/vm/oom_dump_tasks (since Linux 2.6.25)
              Enables a system-wide task dump (excluding kernel threads) to be produced when the kernel performs
              an OOM kill.  The dump includes the following information for each task (thread, process):
              thread ID, real user ID, thread group ID (process ID), virtual memory size, resident set size, the
              CPU that the task is scheduled on, oom_adj score (see the description of /proc/pid/oom_adj), and
              command name.  This is helpful for determining why the OOM-killer was invoked and for identifying
              the rogue task that caused it.

              If this contains the value zero, this information is  suppressed.   On  very  large  systems  with
              thousands  of  tasks,  it  may  not be feasible to dump the memory state information for each one.
              Such systems should not be forced to incur a  performance  penalty  in  OOM  situations  when  the
              information may not be desired.

              If  this  is  set  to  nonzero, this information is shown whenever the OOM-killer actually kills a
              memory-hogging task.

              The default value is 0.

       /proc/sys/vm/oom_kill_allocating_task (since Linux 2.6.24)
              This enables or disables killing the OOM-triggering task in out-of-memory situations.

              If this is set to zero, the OOM-killer will scan through the entire tasklist  and  select  a  task
              based  on  heuristics  to kill.  This normally selects a rogue memory-hogging task that frees up a
              large amount of memory when killed.

              If this is set to nonzero, the OOM-killer simply kills the task that triggered  the  out-of-memory
              condition.  This avoids a possibly expensive tasklist scan.

              If  /proc/sys/vm/panic_on_oom  is  nonzero,  it  takes  precedence  over whatever value is used in
              /proc/sys/vm/oom_kill_allocating_task.

              The default value is 0.

       /proc/sys/vm/overcommit_kbytes (since Linux 3.14)
              This writable file provides an alternative to /proc/sys/vm/overcommit_ratio  for  controlling  the
              CommitLimit  when  /proc/sys/vm/overcommit_memory has the value 2.  It allows the amount of memory
              overcommitting to be specified as an absolute value (in kB), rather than as a  percentage,  as  is
              done  with overcommit_ratio.  This allows for finer-grained control of CommitLimit on systems with
              extremely large memory sizes.

              Only one of overcommit_kbytes or overcommit_ratio can have an effect: if overcommit_kbytes  has  a
              nonzero  value,  then  it  is  used  to calculate CommitLimit, otherwise overcommit_ratio is used.
              Writing a value to either of these files causes the value in the other file to be set to zero.

       /proc/sys/vm/overcommit_memory
              This file contains the kernel virtual memory accounting mode.  Values are:

                     0: heuristic overcommit (this is the default)
                     1: always overcommit, never check
                     2: always check, never overcommit

              In mode 0, calls of mmap(2) with MAP_NORESERVE are not checked, and  the  default  check  is  very
              weak, leading to the risk of getting a process "OOM-killed".

              In mode 1, the kernel pretends there is always enough memory, until memory actually runs out.  One
              use  case  for  this  mode  is  scientific computing applications that employ large sparse arrays.
              Before Linux 2.6.0, any nonzero value implies mode 1.

              In mode 2 (available since Linux 2.6), the total virtual  address  space  that  can  be  allocated
              (CommitLimit in /proc/meminfo) is calculated as

                  CommitLimit = (total_RAM - total_huge_TLB) *
                             overcommit_ratio / 100 + total_swap

              where:

              •  total_RAM is the total amount of RAM on the system;

              •  total_huge_TLB is the amount of memory set aside for huge pages;

              •  overcommit_ratio is the value in /proc/sys/vm/overcommit_ratio; and

              •  total_swap is the amount of swap space.

              For  example,  on  a  system with 16 GB of physical RAM, 16 GB of swap, no space dedicated to huge
              pages, and an overcommit_ratio of 50, this formula yields a CommitLimit of 24 GB.

              Since Linux 3.14, if the value in /proc/sys/vm/overcommit_kbytes is nonzero, then  CommitLimit  is
              instead calculated as:

                  CommitLimit = overcommit_kbytes + total_swap
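
              Both formulas can be checked with shell arithmetic.  This is a sketch under simplifying
              assumptions: all inputs are supplied by the caller in kB, the function name commit_limit_kb is
              hypothetical, and the kernel's internal page-based rounding is ignored.

```shell
#!/bin/sh
# commit_limit_kb TOTAL_RAM_KB HUGE_TLB_KB RATIO SWAP_KB [OVERCOMMIT_KBYTES]
# Print CommitLimit in kB.  If OVERCOMMIT_KBYTES is given and nonzero, it
# replaces the ratio-based term, as described for Linux 3.14 and later.
commit_limit_kb() {
    ram=$1 huge=$2 ratio=$3 swap=$4 kbytes=${5:-0}
    if [ "$kbytes" -ne 0 ]; then
        echo $(( kbytes + swap ))
    else
        echo $(( (ram - huge) * ratio / 100 + swap ))
    fi
}

gb=$(( 1024 * 1024 ))              # kB per GB, treating 1 GB as 2^30 bytes
commit_limit_kb $(( 16 * gb )) 0 50 $(( 16 * gb ))    # prints 25165824 (24 GB)
```

              The result can be compared with the CommitLimit line in /proc/meminfo on a system running in
              mode 2.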

              See       also       the       description      of      /proc/sys/vm/admin_reserve_kbytes      and
              /proc/sys/vm/user_reserve_kbytes.

       /proc/sys/vm/overcommit_ratio (since Linux 2.6.0)
              This writable file defines a percentage by which memory can be overcommitted.  The  default  value
              in the file is 50.  See the description of /proc/sys/vm/overcommit_memory.

       /proc/sys/vm/panic_on_oom (since Linux 2.6.18)
              This enables or disables a kernel panic in an out-of-memory situation.

              If  this  file  is  set  to  the  value  0,  the kernel's OOM-killer will kill some rogue process.
              Usually, the OOM-killer is able to kill a rogue process and the system will survive.

              If this file is set to the value 1, then the kernel normally panics  when  out-of-memory  happens.
              However,  if  a  process  limits  allocations  to  certain  nodes  using memory policies (mbind(2)
              MPOL_BIND) or cpusets (cpuset(7)) and those nodes reach memory exhaustion status, one process  may
              be  killed  by  the  OOM-killer.  No panic occurs in this case: because other nodes' memory may be
              free, this means the system as a whole may not have reached an out-of-memory situation yet.

              If this file is set to the value 2, the kernel  always  panics  when  an  out-of-memory  condition
              occurs.

              The default value is 0.  The values 1 and 2 are intended for cluster failover; choose between them
              according to your failover policy.
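
              The interaction between this file and /proc/sys/vm/oom_kill_allocating_task can be summarized as
              a lookup.  The sketch below only restates the descriptions on this page; the function name
              oom_action and its output strings are invented for illustration, and it neither queries nor
              changes the running kernel.

```shell
#!/bin/sh
# oom_action PANIC_ON_OOM OOM_KILL_ALLOCATING_TASK
# Print a one-line summary of the documented behavior for the given pair of
# settings.  A nonzero panic_on_oom takes precedence over the second value.
oom_action() {
    case $1 in
        2) echo "panic always" ;;
        1) echo "panic unless the OOM is confined to a mempolicy or cpuset" ;;
        0) if [ "$2" -ne 0 ]; then
               echo "kill the task that triggered the OOM"
           else
               echo "kill a task chosen by heuristics"
           fi ;;
        *) echo "oom_action: panic_on_oom must be 0, 1, or 2" >&2; return 2 ;;
    esac
}
```

              On a live system, the two arguments could be taken from the files themselves, for example:
              oom_action "$(cat /proc/sys/vm/panic_on_oom)" "$(cat /proc/sys/vm/oom_kill_allocating_task)"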

       /proc/sys/vm/swappiness
              The value in this file controls how aggressively the kernel will swap memory pages.  Higher values
              increase aggressiveness, lower values decrease aggressiveness.  The default value is 60.

       /proc/sys/vm/user_reserve_kbytes (since Linux 3.10)
              Specifies an amount of memory (in KiB) to reserve for user processes.  This is intended to prevent
              a user from starting a single memory-hogging process that leaves them unable to recover (that is,
              to kill the hog).  The value in this file has an effect only when /proc/sys/vm/overcommit_memory
              is set to 2 ("overcommit never" mode).  In this case, the system reserves an amount of memory that
              is the minimum of [3% of current process size, user_reserve_kbytes].

              The default value in this file is the minimum of [3% of free pages, 128MiB] expressed as KiB.

              If  the value in this file is set to zero, then a user will be allowed to allocate all free memory
              with a single process (minus  the  amount  reserved  by  /proc/sys/vm/admin_reserve_kbytes).   Any
              subsequent attempts to execute a command will result in "fork: Cannot allocate memory".

              Changing the value in this file takes effect whenever an application requests memory.
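
              The reserve applied in "overcommit never" mode can be worked through numerically.  This sketch
              takes the "3% of current process size" rule from this page at face value; the function name
              user_reserve_effective_kb is hypothetical, and both inputs are supplied by the caller in KiB.

```shell
#!/bin/sh
# user_reserve_effective_kb PROCESS_SIZE_KB USER_RESERVE_KBYTES
# Print min(3% of PROCESS_SIZE_KB, USER_RESERVE_KBYTES), the amount that the
# description above says is held back in "overcommit never" mode.
user_reserve_effective_kb() {
    three_percent=$(( $1 * 3 / 100 ))
    if [ "$three_percent" -lt "$2" ]; then
        echo "$three_percent"
    else
        echo "$2"
    fi
}

user_reserve_effective_kb 1000000 131072     # prints 30000
user_reserve_effective_kb 8000000 131072     # prints 131072
user_reserve_effective_kb 8000000 0          # prints 0 (no reserve)
```

              The last call corresponds to the case described above in which the file is set to zero and a
              single process may consume all free memory apart from the administrator's reserve.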

       /proc/sys/vm/unprivileged_userfaultfd (since Linux 5.2)
              This  (writable)  file  exposes a flag that controls whether unprivileged processes are allowed to
              employ userfaultfd(2).  If this file  has  the  value  1,  then  unprivileged  processes  may  use
              userfaultfd(2).   If  this  file has the value 0, then only processes that have the CAP_SYS_PTRACE
              capability may employ userfaultfd(2).  The default value in this file is 1.

SEE ALSO

       proc(5), proc_sys(5)

Linux man-pages 6.9.1                              2024-05-02                                     proc_sys_vm(5)