But memory bandwidth (bottleneck for LLM inference) is only marginally improved,... | Hacker News


ivankra 26 days ago | on: MacBook Pro with M5 Pro and M5 Max

But memory bandwidth (the bottleneck for LLM inference) is only marginally improved: 614 GB/s on the M5 Max vs 546 GB/s on the M4 Max. Where is this 4x improvement coming from? I think I'll pass on upgrading.

singhrac 26 days ago

The 4x is for prompt processing, i.e. prefill, which is compute-bound, not memory-bound.
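
A rough back-of-envelope sketch of why the two phases scale differently, not a benchmark. It assumes decode speed is capped by streaming the full weights once per token, and that prefill costs about 2 FLOPs per parameter per prompt token. The 40 GB model size, the 70B parameter count, and the TFLOPS figures are made-up placeholders, not Apple specs; only the 614 and 546 GB/s numbers come from the thread.

```python
# Back-of-envelope model: decode is bandwidth-bound, prefill is compute-bound.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed: every token reads all weights once."""
    return bandwidth_gb_s / model_gb

def prefill_seconds(prompt_tokens: int, params_billion: float, tflops: float) -> float:
    """Approximate prefill time at ~2 FLOPs per parameter per prompt token."""
    flops = 2 * params_billion * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)

M4_MAX_BW = 546.0  # GB/s, from the thread
M5_MAX_BW = 614.0  # GB/s, from the thread
MODEL_GB = 40.0    # hypothetical quantized model size

# Decode speedup tracks the bandwidth ratio, about 1.12x.
bandwidth_ratio = M5_MAX_BW / M4_MAX_BW
decode_speedup = (decode_tokens_per_sec(M5_MAX_BW, MODEL_GB)
                  / decode_tokens_per_sec(M4_MAX_BW, MODEL_GB))

# Prefill speedup tracks the compute ratio. The 10 and 40 TFLOPS values are
# placeholders chosen only to illustrate a 4x compute gap.
prefill_speedup = (prefill_seconds(1000, 70, tflops=10)
                   / prefill_seconds(1000, 70, tflops=40))

print(f"decode speedup  ~{decode_speedup:.2f}x")
print(f"prefill speedup ~{prefill_speedup:.2f}x")
```

Under these assumptions, token generation only gets the ~12% bandwidth bump, while time to first token can improve by whatever the compute gain is.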

0x457 26 days ago

The 4x is for time to first token (TTFT); it's on the graph.
