
How does the NT kernel handle OOM situations compared to Linux? It feels a lot smoother, almost like a non-problem (the system slows down for a few seconds and gets back to normal), but I wonder what goes on behind the scenes and why (if?) Linux takes a different approach.


I don't know the full answer, but on Windows the problem is less significant because of core memory-management decisions made early on.

In Linux you get a ton of copy-on-write memory: every fork() (the most basic multiprocessing primitive) creates a new process that shares all of its memory with its parent. Only when something is written does the child process actually get "its own" memory pages.

To put that into perspective, imagine you have only one process in your system, and it has a big 4 GB buffer of read-write memory allocated. So far so good. Then you fork() three times - your overall system memory usage is still roughly 4 GB. Now all four processes (the parent and 3 children) overwrite that 4 GB buffer with random values. Only at this point does your system RAM usage spike to 16 GB.

This means that the thing that actually OOMs may be just "buffer[i] = 1". It's very hard to recover from this gracefully, because it's an exceptional situation, and handling exceptional situations may itself require allocations, which are by then impossible. Compare that to Windows, where memory is committed at predictable moments, like when malloc() is called, and failures can be safely handled at that point.

So, in the ideal situation, Windows running out of memory will just stop giving new memory to processes and every malloc will fail. On Linux that's not an option, since any write to a memory location can suddenly cause an allocation due to copy-on-write.


Which can lead to dozens of unrelated applications dying on Windows when they assume infallible allocators, while Linux keeps going (sluggishly) until it has to kill just the biggest one.


I've worked on memory-constrained Windows VMs. The problem shows up as the application you're using dying, because guess what, you're trying to allocate memory that isn't there.

The rest of the system is still usable.

It's fine.

For the longest time I also ran with no swap on Windows (and just an excessive amount of memory). I'd notice I'd run out of memory when a particularly hungry application like Affinity Photo died while I had a zillion browser tabs open, but again, the system stayed perfectly responsive and fine.

The Windows behavior seems much more deterministic and much saner than Linux's OOM killer.


I've had important background processes die on Windows when the offender didn't die and the OOM situation persisted for some time - I assume because the offender was handling allocation failures while the other processes weren't.


The Linux swapper used to be very aggressive about evicting the file cache, to the point that pages (shared libraries, executables) it threw out would be needed again within the next second. That is the main reason for the slowdowns.

Fortunately we now have the MGLRU patchset, which "freezes" the active file cache for a configurable number of milliseconds, and in general is a much smarter algorithm.
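For reference, this is a config fragment showing the sysfs knobs involved, assuming a kernel built with CONFIG_LRU_GEN (MGLRU is in mainline since 6.1) and root access:

```shell
# Enable the multi-gen LRU if it isn't already:
echo y > /sys/kernel/mm/lru_gen/enabled

# Don't evict working-set pages used within the last 1000 ms; under
# sustained pressure the kernel invokes the OOM killer instead of
# thrashing the file cache (min_ttl_ms defaults to 0, i.e. disabled).
echo 1000 > /sys/kernel/mm/lru_gen/min_ttl_ms
```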


This may be applicable for desktops, but not for servers.

In a low-memory situation, the admin wants to ssh into the server and fix the problem that led to memory exhaustion in the first place. Whoops, MGLRU protects only the active file cache, which includes the memory hog but not sshd, bash, PAM, and the other files that are normally unused when nobody is logged in, yet become essential during an admin intervention. So, de facto, the admin still cannot log in, and the server is effectively inaccessible. The only difference is that the production application is still responding, which is not much help for restarting it.



