Hacker News

8 dual-port 100G NICs (available for a while) or 2 dual-port 400 gigabit NICs (not yet available) would out-bandwidth the memory controller on an Epyc 7742; how are NICs such a bottleneck that they need to be increased 100x to keep up, when DDR5 only doubles the bandwidth?

I forget what the bandwidth of CPU cache is but I'm guessing it's not 10 terabit/second either.
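The arithmetic behind that comparison, as a quick sketch (the memory figures are assumptions: eight channels of DDR4-3200 on an Epyc 7742, 8 bytes per transfer):

```python
# Aggregate NIC bandwidth: 8 dual-port 100G NICs.
nic_tbps = 8 * 2 * 100 / 1000            # -> 1.6 Tbps

# Memory controller bandwidth: 8 channels x 3200 MT/s x 8 bytes/transfer.
mem_gbytes_per_s = 8 * 3200e6 * 8 / 1e9  # -> 204.8 GB/s
mem_tbps = mem_gbytes_per_s * 8 / 1000   # -> ~1.64 Tbps

print(f"NIC aggregate:    {nic_tbps:.2f} Tbps")
print(f"Memory bandwidth: {mem_tbps:.2f} Tbps")
```

So the two are already roughly at parity, which is the point of the question.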



HPC needs HBM or TCI memory, not DDR5. Systems using HBM[1] and TCI[2] can already push an aggregate bandwidth of 8 Tbps.

1. https://en.wikipedia.org/wiki/High_Bandwidth_Memory

2. https://www.hotchips.org/wp-content/uploads/hc_archives/hc26...
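A rough sketch of where an ~8 Tbps aggregate figure can come from, assuming a four-stack HBM2 package at 256 GB/s per stack (stack count and per-stack rate are assumptions, not taken from the linked slides):

```python
# Aggregate bandwidth of an assumed 4-stack HBM2 package.
stacks = 4
gbytes_per_stack = 256                       # GB/s per HBM2 stack (assumption)
total_tbps = stacks * gbytes_per_stack * 8 / 1000
print(f"{total_tbps:.3f} Tbps aggregate")    # ~8.2 Tbps
```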


It's a good point that HBM has high aggregate bandwidth, but I still don't think it makes sense to call it a 100x server/network-boundary bottleneck when the already-available NIC is faster than NVLink in the first place.

What "server/network boundary" is in this case might not be the classical boundary though so maybe they also mean the same thing I'm saying just from a different perspective.


«how are NICs such a bottleneck»

Simple: when the packet data does not need to be processed by the CPU. For example a router forwarding network packets at 10 Tbit/s. The data can stay in the NIC cache as it is being forwarded. No PCIe/CPU/RAM bottleneck here.

Also, EPYC Rome has 1.64 Tbps of RAM bandwidth today (eight DDR4-3200 channels). 10 Tbps is less than three doublings away. It's conceivable server CPUs can reach this bandwidth in 4-6 years.
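The "less than three doublings" claim checks out; a quick sketch using the figures from the comment above:

```python
import math

# How many bandwidth doublings separate today's ~1.64 Tbps
# (eight DDR4-3200 channels) from 10 Tbps.
today_tbps = 1.64
target_tbps = 10
doublings = math.log2(target_tbps / today_tbps)
print(f"{doublings:.2f} doublings")  # ~2.6, i.e. fewer than three
```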


32-port 400G (12.8 terabit/s) 1U routers already exist today; the conversation is about the NIC at the server <-> network boundary, not switching ASICs. The only reason you don't see 400G NICs in servers is the server's inability to push that much bandwidth over PCIe (the real bottleneck location).
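A rough sketch of the PCIe budget behind that claim, using approximate per-lane figures (assumptions: ~8 Gbps usable per Gen3 lane after 128b/130b encoding, with Gen4 doubling the per-lane rate):

```python
# Usable bandwidth per x16 slot, approximately.
pcie3_x16_gbps = 16 * 8              # PCIe 3.0 x16: ~128 Gbps
pcie4_x16_gbps = pcie3_x16_gbps * 2  # PCIe 4.0 x16: ~256 Gbps

print(f"Gen3 x16: ~{pcie3_x16_gbps} Gbps, Gen4 x16: ~{pcie4_x16_gbps} Gbps")

# A single 100G port fits comfortably in a Gen3 x16 slot, but a
# dual-port 400G NIC would need ~800 Gbps -- beyond even Gen4 x16,
# which is why the slot, not the NIC silicon, is the bottleneck.
```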


The ASR-9000 series can handle up to 3.2 Tbps of L3 traffic per linecard, but this is only achievable because of the dedicated routing ASICs.


You still need to account for NIC-to-NIC packet transfers in scenarios where a packet arrives on physical NIC A and needs to egress via physical NIC B. Obviously there are better options than plain PCIe transport these days.


This is (sort of) a forget everything you know about ______ scenario. But I think the short answer is multi-die interconnects.

https://www.anandtech.com/show/14211/intels-interconnected-f...

https://semiengineering.com/using-silicon-bridges-in-package...


> I forget what the bandwidth of CPU cache is but I'm guessing it's not 10 terabit/second either.

L2/L3 caches have bandwidths around 1-1.5 TB/s these days, which is pretty much 10 Tbit/s ;)
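The unit conversion behind that reply (bytes vs bits is where these comparisons usually go wrong):

```python
# Cache bandwidth in TB/s (bytes) converted to Tbit/s (bits).
cache_tbytes_per_s = 1.25  # mid-range of the 1-1.5 TB/s figure
cache_tbit_per_s = cache_tbytes_per_s * 8
print(f"{cache_tbit_per_s:.0f} Tbit/s")  # 10 Tbit/s
```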


Surely you don't want to stream through cache, though.


In today's processors, all data goes through the cache. There isn't really any alternative on the horizon.


The Radeon VII already has 1 TB/s of memory bandwidth using HBM2, and HBM2E offers almost double that.

Also, looking a bit further forward: Intel recently demoed, with Ayar Labs, a 2.5D chip with a photonic chiplet that can do optical I/O at 1 Tbps/mm².


This is what RDMA solves. You are only limited by the number of PCIe switches you stack up in your topology, and not by the processor at all. All of your data is either handled directly by the NIC or offloaded to an accelerator card. Modern systems can support about 1.6 Tbps (100G NICs). When PCIe 4 comes out, this should double.


You forget we need network performance to be wasted by badly designed software.


In principle you can use P2P-DMA to shunt the bulk of the data to a specialized device (e.g. GPU, FPGA, storage) without it ever touching main memory or the CPU.



