The people that lose are the ones left with bandwidth charges and overloaded servers.

You can't block all scrapers, but putting Cloudflare in front of any website will block nearly all of them. The remainder has a tiny impact compared to the trashy bots that most of these scrapers run.

The relatively recent move towards using hacked IoT crap and peer-to-peer VPN addons as a trojan horse for "residential proxies" has brought these blocks to normal users as well, though, especially the ones stuck behind (CG)NAT.

I used to ward off scrapers by putting an invisible link in the HTML, in robots.txt (under a Disallow rule, of course), and in the sitemap; requesting that link would block the requester's entire /24 on my firewall. I removed it at some point because it meant a PHP script running a sudo command, and that was probably Not Good. It still worked pretty well, though, and these days I'd probably widen the block to /20 (and /40 for IPv6).
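A rough sketch of the trap-link idea, assuming a Python backend instead of PHP. The trap path `/do-not-follow/`, the function names and the in-memory `blocklist` set are all hypothetical. The masking uses the standard `ipaddress` module with the /20 (IPv4) and /40 (IPv6) ranges mentioned above. Rather than having the web app call sudo, a separate privileged job could read the recorded ranges and apply the actual firewall rules.

```python
import ipaddress

# Hypothetical honeypot URL: linked invisibly in the HTML, listed under
# Disallow in robots.txt, and included in the sitemap.
TRAP_PATH = "/do-not-follow/"


def block_range(client_ip, v4_prefix=20, v6_prefix=40):
    """Return the network containing client_ip at the chosen prefix length."""
    addr = ipaddress.ip_address(client_ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False masks off the host bits, e.g. 203.0.113.77 -> 203.0.112.0/20
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


def handle_request(path, client_ip, blocklist):
    """Return an HTTP status code; record the requester's range if it hit the trap."""
    if path.startswith(TRAP_PATH):
        # Only record the range here; applying firewall rules is left to a
        # separate privileged process instead of running sudo from the web app.
        blocklist.add(block_range(client_ip))
        return 403
    addr = ipaddress.ip_address(client_ip)
    # Membership checks across IPv4/IPv6 simply return False, so mixing is safe.
    if any(addr in net for net in blocklist):
        return 403
    return 200


if __name__ == "__main__":
    blocked = set()
    print(handle_request("/do-not-follow/", "203.0.113.77", blocked))  # 403
    print(handle_request("/", "203.0.115.1", blocked))  # 403, same /20
    print(handle_request("/", "203.0.128.1", blocked))  # 200, outside the /20
    print(sorted(str(n) for n in blocked))
```

Going from /24 to /20 blocks 16 times as many IPv4 addresses per trap hit. That catches more of a scraper's neighbouring addresses, but it also makes collateral blocks of normal users behind CGNAT more likely.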
