Tuesday, May 10, 2011

Linux's Overcommit Allocator and OOM Killer

This is a topic I have run into personally more than once, but I never found a good description of it, so I decided to write about Linux's OOM Killer.

By default, memory allocation by applications in modern Linux kernels uses a scheme called overcommitting. When an application requests memory, the request virtually always succeeds, but no physical page frames are actually allocated until that memory is 'touched' and a page has to be faulted in. Because of this, it is possible, and often likely, that the kernel has promised more memory to processes than is actually available. In the less likely case that they all need to use this memory at the same time, some process has to be sacrificed to free up memory.
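To see how much memory the kernel has currently promised, one option is to compare the Committed_AS field of /proc/meminfo with the sum of MemTotal and SwapTotal. The following is a minimal sketch of that check; the function names meminfo_kb and report_commit are made up for this illustration, and the optional file argument exists only so the functions can be tried against a saved copy of meminfo.

```shell
#!/bin/sh
# Print the value (in kB) of one field from a meminfo-style file.
# Usage: meminfo_kb FIELD [FILE]   (FILE defaults to /proc/meminfo)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}

# Compare what the kernel has promised (Committed_AS) with RAM plus swap.
# Usage: report_commit [FILE]
report_commit() {
    file="${1:-/proc/meminfo}"
    ram=$(meminfo_kb MemTotal "$file")
    swap=$(meminfo_kb SwapTotal "$file")
    committed=$(meminfo_kb Committed_AS "$file")
    echo "RAM + swap:   $((ram + swap)) kB"
    echo "Committed_AS: $committed kB"
    if [ "$committed" -gt $((ram + swap)) ]; then
        echo "overcommitted"
    else
        echo "not overcommitted"
    fi
}
```

Calling report_commit with no arguments reads the live /proc/meminfo. On a busy desktop it is not unusual for Committed_AS to exceed physical RAM even though the machine is running normally.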

A heuristic algorithm decides which process to sacrifice. This algorithm is called the out-of-memory killer, or OOM killer for short. There are some tunable parameters and ways to avoid this seemingly non-deterministic behavior.

The end result is that snippets such as the following might not fail or terminate until well after they have requested more pages than are actually free.

for (;;) {
  /* Request one page per iteration; the pointer is deliberately leaked. */
  void *ptr = malloc(PAGE_SIZE);
}

If the memory were actually used, rather than leaked as in this trivial example, some process would have to be sacrificed in order to free some physical page frames (or ultimately, swap space). The victim isn't always the process using the most memory, or the process trying to allocate memory at that moment. It could be a well-behaved process that is simply unfortunate enough to be selected by the OOM Killer.

Disabling OOM-Killer per-process
Any particular process leader may be immunized against the OOM killer if the value of its /proc/<pid>/oom_adj is set to the constant OOM_DISABLE (currently defined as -17).
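As a rough sketch, the per-process setting could be wrapped in a small shell function. The name oom_protect and the PROC_ROOT override are hypothetical and exist only for this illustration. The function prefers /proc/<pid>/oom_score_adj, the interface that replaced oom_adj starting with kernel 2.6.36 (where -1000 disables OOM killing for the process), and falls back to writing -17 to oom_adj on older kernels. Lowering either value normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: exempt one process from the OOM killer.
# PROC_ROOT is an illustration-only override so the function can be tried
# against a scratch directory; it defaults to the real /proc.
# Usage: oom_protect PID
oom_protect() {
    pid="$1"
    dir="${PROC_ROOT:-/proc}/$pid"
    if [ -w "$dir/oom_score_adj" ]; then
        # Newer interface (2.6.36 and later): -1000 disables OOM killing.
        printf '%s\n' -1000 > "$dir/oom_score_adj"
    elif [ -w "$dir/oom_adj" ]; then
        # Older interface: -17 is OOM_DISABLE.
        printf '%s\n' -17 > "$dir/oom_adj"
    else
        echo "cannot adjust OOM setting for pid $pid" >&2
        return 1
    fi
}
```

For example, running oom_protect "$(pidof sshd)" as root would be one way to keep a single sshd process from being selected, assuming pidof returns exactly one PID.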

Disabling overcommit globally at runtime using the proc filesystem:
echo 100 > /proc/sys/vm/overcommit_ratio
echo 2 > /proc/sys/vm/overcommit_memory

Disabling overcommit globally at runtime using sysctl:
This approach is a bit cleaner than writing to the proc files directly.

sysctl -w vm.overcommit_ratio=100
sysctl -w vm.overcommit_memory=2

Disabling overcommit globally at boot (persistent):
Add the following lines to /etc/sysctl.conf:

vm.overcommit_ratio=100
vm.overcommit_memory=2

After you modify /etc/sysctl.conf, the changes will be applied at the next reboot, or you can run the following to apply the settings immediately:

sysctl -p
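To confirm which mode is active after applying the settings, the current value can be read back from /proc/sys/vm/overcommit_memory. The helper below is a small sketch (the function name overcommit_mode_name is made up here) that turns the three values the kernel accepts into a readable description.

```shell
#!/bin/sh
# Translate a vm.overcommit_memory value into a short description.
# Usage: overcommit_mode_name VALUE
overcommit_mode_name() {
    case "$1" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting, never overcommit" ;;
        *) echo "unknown mode: $1"; return 1 ;;
    esac
}
```

A typical use would be overcommit_mode_name "$(cat /proc/sys/vm/overcommit_memory)", which should report strict accounting once mode 2 is in effect.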

Notes
  • All of these approaches set the overcommit ratio to 100% (so that all of physical RAM counts toward the commit limit) and switch the memory allocator to mode '2', which checks whether enough memory is available and never overcommits.
  • With these settings, the amount of memory available to the allocator is the total physical RAM plus the total swap space, not just the RAM.
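The arithmetic behind these notes can be sketched as follows. In mode 2 the kernel's commit limit is swap plus RAM multiplied by overcommit_ratio / 100; the function name commit_limit_kb is invented for this example, and the sketch ignores huge pages, which the kernel subtracts from RAM before applying the ratio.

```shell
#!/bin/sh
# Sketch of the mode-2 commit limit: swap + RAM * overcommit_ratio / 100.
# Huge pages, which the kernel subtracts from RAM first, are ignored here.
# Usage: commit_limit_kb RAM_KB SWAP_KB RATIO
commit_limit_kb() {
    ram_kb="$1"
    swap_kb="$2"
    ratio="$3"
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}
```

With 8 GiB of RAM and 2 GiB of swap, commit_limit_kb 8388608 2097152 100 gives 10485760 kB (RAM plus swap), while the kernel's default ratio of 50 would give 6291456 kB. On a running system the result can be compared with the CommitLimit line in /proc/meminfo.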