> We’re going to use WireGuard – we have the Intel AES-NI crypto instruction set
WireGuard doesn’t actually use AES, as far as I know – it uses ChaCha20-Poly1305.
In general, it seems like the author somewhat overestimates the CPU requirements for TLS encryption (or equivalently underestimates modern single-board computers):
> The CPU requirements to decrypt and re-encrypt HTTPS traffic greatly exceed those available to Raspberry Pis.
I'd be really surprised if MITMing TLS on an RPi 4 was actually infeasible, even when using RSA cryptography purely in software.
There are Android phones still in use with weaker CPUs than that of the RPi 4, and these use TLS too.
I think you're underestimating the CPU requirements. If a weak Android phone is only able to decode 50Mb/s of TLS traffic, that's not a big problem in practice. It's a slow phone, usually connected to slow networks. On the other hand, if you have a gigabit internet connection at home and it is being bottlenecked to 50Mb/s by that weak device sitting between all of your computers and the internet, then that is a big problem.
The CPU requirements for TLS are extremely dependent on the desired bandwidth. At even higher bandwidth, offloading onto accelerators becomes important to be able to do it at all. The cost of handshakes is also nontrivial, and can limit the number of connections per second. For a single device, rarely a big deal. For an entire network of devices, it can be a bigger problem.
> If a weak Android phone is only able to decode 50Mb/s of TLS traffic
That's a lower, not an upper bound.
An RPi 4 can encrypt/decrypt AES-256-GCM at more than 300 Mbit/s, according to my rough measurements. That's per core, of which it has four.
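If anyone wants to sanity-check that figure on their own Pi, here's a rough sketch of how I'd measure it (this assumes Python with the `cryptography` package, which wraps OpenSSL; `openssl speed -evp aes-256-gcm` gives a similar single-core number with less ceremony):

    # Rough single-core AES-256-GCM throughput estimate.
    # Nonce reuse is fine for a benchmark, never for real traffic.
    import os
    import time
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    aead = AESGCM(AESGCM.generate_key(bit_length=256))
    nonce = os.urandom(12)
    chunk = os.urandom(16 * 1024)   # roughly one TLS record's worth of payload

    total, start = 0, time.perf_counter()
    while time.perf_counter() - start < 3.0:   # run for ~3 seconds
        aead.encrypt(nonce, chunk, None)
        total += len(chunk)
    elapsed = time.perf_counter() - start
    print(f"{total * 8 / elapsed / 1e6:.0f} Mbit/s on one core")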
RSA can be much more expensive, but that's beside the point – the author was claiming that AES-NI makes a meaningful difference here, which I'd really doubt even in the case of TLS. (As mentioned above, it can't help at all for WireGuard.)
Which only matters for multiple concurrent connections... a single download would still be a sequential task on a single core at 300Mb/s, which I would find to be an unacceptable bottleneck on my gigabit connection.
In reality, it would probably only be 300Mb/s for up to 2 connections, since each connection needs to be both decrypted and re-encrypted, which could be parallelized onto 2 cores; otherwise it's 150Mbps each for 4 connections if every connection is handled on a single core.
Either way, with the numbers you provided, it would not be possible to MitM 1Gbps of traffic on a Raspberry Pi 4: only 600Mbps total, and only across multiple connections. It would be an extremely noticeable bottleneck.
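To spell out the arithmetic behind that (taking the ~300 Mbit/s-per-core figure above as given, which is of course an assumption about this exact workload):

    # Back-of-the-envelope MitM throughput on a 4-core Pi 4,
    # assuming ~300 Mbit/s of AES-256-GCM per core.
    per_core_mbps = 300
    cores = 4

    total_crypto_mbps = per_core_mbps * cores   # 1200 Mbit/s of raw crypto work
    mitm_total_mbps = total_crypto_mbps / 2     # every byte is decrypted AND re-encrypted
    per_connection_single_core = per_core_mbps / 2

    print(mitm_total_mbps)             # 600 Mbit/s aggregate, across multiple connections
    print(per_connection_single_core)  # 150 Mbit/s if one connection stays on one core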
If you've got a Raspberry Pi 4 as your proxy, aren't you already struggling to pump more than 600Mbps over your network? Even if so, are you really pulling down more than 300Mb/s over a single TLS connection?
> If you've got a Raspberry Pi 4 as your proxy, aren't you already struggling to pump more than 600Mbps over your network?
The Pi 4 is capable of a full gigabit connection, unlike previous Raspberry Pis. So, no, not fundamentally.
> To me, it seems like a pretty narrow set of scenarios where you'd not have the processing power to decrypt/encrypt at the speed of your network.
The whole scenario was set up by the comment at the top of this thread: "I'd be really surprised if MITMing TLS on an RPi 4 was actually infeasible, even when using RSA cryptography purely in software."[0]
I consider it "infeasible" if it is a significant bottleneck on the network. It could be infeasible for multiple reasons, as you're alluding to, but that only strengthens my argument.
> Even in that scenario, AES encryption/decryption can be parallelized
An AES implementation that no one uses is not a very compelling argument, except as a hypothetical. Do trusted AES implementations do the encryption in parallel? That's all that matters, IMO.
> An AES implementation that no one uses is not a very compelling argument, except as a hypothetical.
The entire scenario is hypothetical!!!
Yes, there aren't a lot of AES implementations that use the CPU & GPU for decryption, but if you're setting up a multicore network device, a CPU-parallel AES implementation isn't unreasonable.
> I consider it "infeasible" if it is a significant bottleneck on the network.
So there's a lot of vague terms and hypotheticals, as you say.
I would presume it is possible to have a network where data over even a single connection travels so fast through a Raspberry Pi 4, where you had no access to a parallel implementation of AES, and where the performance impact of routing everything through the Raspberry Pi were otherwise deemed acceptable, that the consequent slowdown might be deemed "infeasible" by some; yet if you were to drop in a comparable device with an AES-NI capable CPU, the consequent ~4x performance improvement would allow for it to be deemed "feasible". Another "feasible" solution would likely be to spend roughly the equivalent of two months of what you were paying for the Internet connection on replacing the bottleneck you've created in your network.
Yes, it's possible to construct the necessary hypothetical, but it's not exactly a common scenario.
I do not agree at all. Some people may actually want to use the technique detailed in the article.
Most people do not have a powerful, enterprise-grade router they can run software on, so they would reach for another device. A Raspberry Pi is frequently used for PiHole and similar functions, so it is logical that someone would reach for a Raspberry Pi 4 here.
What part of this seems hypothetical?
An AES library that might not even work (since no one actually uses it), and that would likely be difficult to integrate into the software stack described in the article, is extremely hypothetical in a way that the actual project would not be. That parallel AES implementation is not some proven library with great documentation... it's a random GitHub repo that hasn't been updated in 5 years. If the feasibility of the project depends on that, that seems like a bad place to start.
> where you had no access to a parallel implementation of AES
You don't. Unless you're saying the author has already integrated this into the described software stack? And proven that it works.
> yet if you were to drop in a comparable device with an AES-NI capable CPU, the consequent ~4x performance improvement would allow for it to be deemed "feasible".
That does not seem hypothetical. That appears to be extremely real. Of course, the speedup would likely come from multiple factors, not just AES-NI, given that you can't find a Raspberry Pi with AES-NI to have a pure apples-to-apples comparison.
> Yes, it's possible to construct the necessary hypothetical, but it's not exactly a common scenario.
I have no idea what you're talking about. This scenario is not convoluted like you're trying to make it out to be.
"Hypothetical" joins a list of a lot of other words like "infeasible", "random", "real", etc. where we apparently have entirely different semantic interpretations.
> An AES library that might not even work (since no one actually uses it)
The library I provided was the first result I got when I searched for "parallel AES", and it's not used because there aren't a lot of scenarios where people need the extra performance extracted by splitting workloads between CPUs & GPUs. How to improve the parallel processing of AES was still the subject of some research a decade ago, but there's no question as to whether it is feasible today. There's just not a lot of call for software that does it because, aside from brute-force attacks, in practical scenarios the hardware is already fast enough.
> You don't. Unless you're saying the author has already integrated this into the described software stack?
So now the scenario you've got here is someone with requirements and means at their disposal to regularly pull down data over a single connection at gigabit speeds from the Internet, but who doesn't invest in their network proxy enough to get hardware that can decrypt at performance levels that were available on commodity hardware over a dozen years ago, who is hacking away on a Raspberry Pi to MITM their Internet access, interpret layer-7 protocols, and develop software to manipulate those protocols in ways that don't break the functionality they require but do break ad platforms, but who doesn't have the resources to swap out their encryption library?
Presumably, it has to MitM all traffic going to/from the WAN in order to MitM YouTube traffic.
Encrypted Client Hello / Secure SNI / Encrypted SNI prevents the hostname for each connection from leaking in plaintext. DNS-over-HTTPS prevents anyone on the local network from snooping on the DNS lookup to realize which connections are for a given domain name. I guess a sufficiently advanced implementation would stop MitMing a connection once it is not talking to YouTube, but as a broader ad-blocking technique, this would apply to more than just YouTube.
Even just focusing on YouTube, lower bandwidth means that you have longer pauses when you skip around any video that isn't super short, as it attempts to buffer that section of the video.
> Encrypted SNI prevents the hostname for each connection from leaking in plaintext.
True, but almost nobody uses that yet. YouTube certainly doesn't.
> DNS-over-HTTPS prevents anyone on the local network from snooping on the DNS lookup to realize which connections are for a given domain name.
The author of TFA is MITMing their own Apple TV. In that scenario, they could just configure their own DNS proxy as well. But given that there's no eSNI, it's not even necessary.
And even if you'd need to MITM all flows to and from YouTube on your local network – that would still be only a few Mbit/s per device, given YouTube's (non-premium) potato-quality data rates.
I think most people with gigabit internet at home never max out their pipe with a single connection, not even close. The real value of gigabit internet, for most people, is being able to handle numerous family members all doing their thing online at once without stepping on each other's toes.
True, and I'd argue that the most significant benefit Gigabit home internet provides is (at least in many cases) a meaningful upload data rate.
Upload congestion, together with massive Bufferbloat powered by horrendously configured CPEs, is what makes home internet connections feel slow most of the time.
The author also seemed to think parsing a <2MB protobuf was CPU intensive. Even for a cheap embedded network device, you'll never convince me that is true.
But... can you do it at 1Gbps on a single core of a Raspberry Pi? You have to both parse and then re-encode it. 1000Mbps = 125MBps. 125MBps/(2MB/message) ≈ 62 messages per second.
62 messages per second means that you have 16ms to do 5 things: decrypt the TLS, parse the protobuf message, filter the message, encode the protobuf message, encrypt the TLS traffic. If you take more than 16ms, you cannot achieve 1Gbps.
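Putting numbers on that budget (same assumptions as above: ~2 MB messages at gigabit line rate):

    # Per-message time budget at gigabit line rate with ~2 MB messages.
    link_mbps = 1000
    message_mb = 2

    bytes_per_sec = link_mbps / 8 * 1e6                     # 125 MB/s on the wire
    messages_per_sec = bytes_per_sec / (message_mb * 1e6)   # ~62.5 messages/s
    budget_ms = 1000 / messages_per_sec                     # ~16 ms per message

    print(f"{budget_ms:.0f} ms to decrypt, parse, filter, re-encode and re-encrypt each message")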
We've already established[0] that you can't even hit 1Gbps with just the TLS traffic. The protobuf messages might be fast to parse... but they will still slow things down even further.
> But... can you do it at 1Gbps on a single core of a Raspberry Pi?
Probably not, but if you've only got a single Raspberry Pi core at your disposal and you're trying to pump 1000 Mbps of network traffic through said Raspberry Pi, you've already got significant challenges.
> 62 messages per second means that you have 16ms to do 5 things: decrypt the TLS, parse the protobuf message, filter the message, encode the protobuf message, encrypt the TLS traffic. If you take more than 16ms, you cannot achieve 1Gbps.
Let's just say you have a system that can do all that in 16ms. I would estimate significantly less than 1ms of that time would be spent parsing and encoding the protobuf message.
The CPU on the Pi 4 isn't that weak. Consider that on processors from well over a decade ago, protobufs were being parsed & encoded at data rates that were easily an order of magnitude more than what gigabit ethernet can support. I don't have a Pi 4 to test on, but I've benchmarked protobuf parsing on machines with far less CPU power, and we measured the parsing times for protobufs in nanoseconds, not milliseconds. Considering the limitations of the I/O subsystem, you're going to have a hell of a time sending data to & from the CPU fast enough to keep up with the rate it parses protobufs.
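If you want to gauge it on whatever hardware you have handy, here's a quick-and-dirty sketch (assuming the Python `protobuf` package; `Struct` is just a stand-in schema, and the Python runtime will be slower than the C++ parsers those old benchmarks used, so treat the absolute numbers with suspicion):

    # Quick-and-dirty protobuf parse/re-encode timing.
    # Struct is a stand-in message type; real YouTube payloads differ.
    import time
    from google.protobuf.struct_pb2 import Struct

    msg = Struct()
    for i in range(20_000):                  # build a payload in the low-MB range
        msg.fields[f"key_{i}"].string_value = "x" * 64
    wire = msg.SerializeToString()
    print(f"payload: {len(wire) / 1e6:.1f} MB")

    runs = 50
    start = time.perf_counter()
    for _ in range(runs):
        parsed = Struct()
        parsed.ParseFromString(wire)
        parsed.SerializeToString()           # parse and re-encode, as a MitM would
    elapsed = time.perf_counter() - start
    print(f"{elapsed / runs * 1000:.2f} ms per parse+encode round trip")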
It's a "given" in the sense that it's a "given" that Raspberry Pi 4's can saturate a gigabit ethernet network.