Hacker News

This is assuming you’re using a really big LLM behind a paid service. There are plenty of smaller open-source models. Not sure at what point a model stops counting as “large,” but when fine-tuned they can match the largest LLMs on narrow tasks.

Some of these open-source models can even run on your local machine, so it’d be very inexpensive to push thousands of pages through one.

https://llm-leaderboard.streamlit.app/



The "smaller" open source models with adequate capabilities are still rather large and thus compute-intensive. Running thousands of pages through one on a typical CPU takes days, not minutes, and extracting emails from only thousands of pages is not very valuable.


Thousands of pages is pretty good, and it’s what I’m coming to expect on the low side for cheap (single consumer GPU or NPU) throughput with the 5–8 GB models now. Heck, with some of the optimizations Llama.cpp has made, plus SafeTensors and GGUF, you can reduce actual memory usage considerably.
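The 5–8 GB figure falls out of simple arithmetic: weight memory is roughly parameters × bits-per-weight / 8. A sketch, noting that the bits-per-weight values are approximate and real GGUF files add some overhead for quantization scales and metadata:

```python
# Approximate weight memory for a model at a given quantization.
# Real GGUF files are slightly larger (scales, metadata, KV cache
# comes on top of this at runtime).
def weight_gb(params_billions, bits_per_weight):
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Q8_0 and Q4_0 are actual GGUF quantization types in llama.cpp.
for name, bits in [("f16", 16), ("Q8_0", 8), ("Q4_0", 4)]:
    print(f"7B @ {name}: ~{weight_gb(7, bits):.1f} GB")
```

So a 7B model drops from ~14 GB at f16 to ~3.5 GB at 4-bit, which is exactly the range that fits a cheap consumer GPU or a base Mac mini.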

A cheap Mac mini with Apple’s Neural Engine is good enough to roleplay smut with a human at human speed. We’re going to see a rapid increase in throughput per dollar. We’ve already got small LLMs that run on mobile phones.


Scraping is about hundreds of millions or billions of pages, not thousands.




