I think a very serious issue with GC is that:

- The number of edges in an object graph tends to scale superlinearly with heap size, since the number of possible edges is quadratic in the number of objects.

- Memory bandwidth hasn't scaled much over the past decade and a half, even compared to memory size. It's also not something people think about, nor is it easy to display in any performance monitoring tool.

Consider: if you had a machine 15 years ago with 4GB of RAM that could be read at 15GB/s, and now you have one with 32GB that can be read at 60GB/s, your bandwidth relative to heap size has halved (3.75 vs 1.875 GB/s per GB of RAM). Given the quadratic nature of references, the 'amplification factor' (the number of times you have to revisit an already visited block of memory) is higher as well.

This is in addition to the cache thrashing issues mentioned in the post.

If you need to read the whole heap, this sets a lower bound on how long the GC will take: ~0.25s on the old machine, ~0.5s on the new one.
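
For anyone who wants to plug in their own numbers, here's a back-of-the-envelope sketch of that arithmetic. The function name is just something I made up. It assumes the heap fills all of RAM, gets read exactly once, and that the GC actually hits peak sequential bandwidth, so the real number will only be worse.

```python
def full_scan_seconds(heap_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on the time to read the whole heap once at peak bandwidth."""
    return heap_gb / bandwidth_gb_per_s


# The two hypothetical machines from above: (heap size in GB, bandwidth in GB/s).
old = full_scan_seconds(4, 15)
new = full_scan_seconds(32, 60)

print(f"old machine: {old:.2f}s per full heap scan")
print(f"new machine: {new:.2f}s per full heap scan")
print(f"slowdown:    {new / old:.1f}x")
```

Any amplification from revisiting memory multiplies these figures; it never reduces them.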

Suppose your GC triggers a memory bandwidth issue - how do you even profile for that? This is kind of an invisible resource that just gets used up.