Google Publishes Latest Linux Patches So Only Trusted Tasks Share a Core (phoronix.com)
251 points by rbanffy on Nov 19, 2020 | hide | past | favorite | 60 comments


I wonder if this is because Google knows about an exploit that hasn't been released yet or because it just makes sense given the side channel attacks we've already seen on CPUs.

Is there any information on the performance impact of this change? The article says they've reduced the impact and gives a single example but I'd be interested to see a proper benchmark of a system with these patches installed.

I can imagine most of a powerful core's performance going unused if the operating system isn't doing anything special and a multithreaded workload is trying to get a lot of work done. The same can likely be true for systems still rocking a dual core design, where "one core for important stuff and one for applications" can be quite taxing on system responsiveness.


This looks like a mechanism that allows more efficient use of CPU resources. You would still be able to schedule workloads on multiple cores as usual, but if they're all in the same "trusted" set, you can skip side-channel mitigations and recover performance. The system would still fall back to using the mitigations when you have too much work to be able to keep workload sets isolated via scheduling.


Sounds like the right approach. Hasn’t there been anything like this up till now?


You could do it by setting boot options that tell the Linux kernel to only make some of the CPU cores generally available, then run programs on the unused cores. That's a technique that developers of software with real-time requirements have used to make sure they have resources dedicated to their program.
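As a sketch of that technique (the core numbers and GRUB file path here are illustrative assumptions, not anything from the article), reserving cores at boot and then pinning a program onto them might look like:

```shell
# Sketch: reserve CPUs 2-3 at boot so the general scheduler leaves them idle.
# These parameters go on the kernel command line, e.g. via GRUB config.
KERNEL_ARGS="isolcpus=2,3 nohz_full=2,3"
echo "GRUB_CMDLINE_LINUX_DEFAULT=\"quiet ${KERNEL_ARGS}\""

# After rebooting, explicitly place a latency-sensitive program on the
# isolated cores -- nothing else will be scheduled there:
#   taskset -c 2,3 ./realtime_task
```

Note the trade-off the parent comment implies: those cores are statically removed from general scheduling whether or not the pinned program is busy.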


How is this different from CPU pinning? I assume the side channel attack mitigations only happen during a context switch, so if you exclusively pin the core to an application, they shouldn't run anyway?


It seems to be dynamic.

CPU pinning means the pinned process always runs on the specified cores. If you instead group processes into trust domains, workloads can still be scheduled on arbitrary cores, but the scheduler can use the grouping information to arrange things such that it can skip the side-channel mitigations.


I think it's in fact CPU pinning while taking into account some basic security requirements.


In CPU pinning, threads do not leave the assigned CPU cores. So this is not exactly CPU pinning; it is a new "trusted" thread-group scheduling implemented through cgroups, where threads can move around CPU cores as long as they end up in "trusted" groups of threads/processes. So, if two mutually untrusted threads are to be scheduled, they won't share the same CPU core/CPU/resource/etc.
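A rough sketch of what that cgroup interface looked like in the patch series (the file names changed between revisions, and the eventually merged API used prctl(2) instead, so treat these paths as assumptions rather than a stable interface):

```shell
# Sketch of the cgroup-based core-scheduling interface from the patch
# series. File names varied across revisions; this only illustrates the
# shape of the mechanism. The real commands (shown via echo) need root
# and a patched kernel.
CG=/sys/fs/cgroup/cpu/trusted_workload
echo "mkdir $CG"
echo "echo 1 > $CG/cpu.core_tag"    # tag: members may share a core
echo "echo \$APP_PID > $CG/tasks"   # tasks still migrate freely across
                                    # cores, but only share SMT siblings
                                    # with tasks carrying the same tag
```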

* Edited for clarification


Is it that easy to turn mitigations on/off? I thought much of it required a recompile.


I think it's just some kernel command-line switches. Like `mitigations=off`.
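For instance, the currently applied mitigations are visible at runtime through sysfs (the guard keeps this harmless on systems without those entries):

```shell
# Inspect which CPU vulnerability mitigations the running kernel applied
# (sysfs entries exist on Linux >= 4.15; guarded in case they are absent).
grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null || true

# Turning them all off is a boot-time switch, not a runtime one:
BOOT_FLAG="mitigations=off"
echo "$BOOT_FLAG"
```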



That requires a reboot and is just the kernel part - so still not easy enough to flip on/off depending on task scheduling.


In embedded systems we often want to dedicate one core to the safety-critical parts. It is a major bug if that core is ever more than 40% busy (the exact number varies; the important part is to leave plenty of headroom in case the real world is more complex than your test situations). We don't care if the UI core gets busy and lags as a result. (Sometimes a safety-critical UI runs on the less busy core.)


You can do that on Linux today quite easily. All one has to do is boot the kernel with an appropriate "isolcpus" parameter on its command line, and then "manually" schedule processes onto the isolated cores/CPUs.


isolcpus sadly isn't complete - some kernel tasks still get scheduled.

There was some stuff to improve this - https://lwn.net/Articles/816298/ - but it wasn't merged into mainline.



We've known about side-channel attacks on SMT/hyperthreading implementations since Intel started selling chips with the technology. That's back in 2005. The rest of the industry more or less slept on it until someone gave side-channel attacks a cool name and logo to worry about.


Amazon had just recently released, or was about to release, S3, so cloud computing wasn't as much of a thing back then. The "oh shit" factor today is that businesses that have absolutely no relation to each other are now sharing CPU cores.


Shared hosting has been a thing for a lot longer, though. Wouldn't that have been exposed to the same set of vulnerabilities? I suppose there was lower-hanging fruit to exploit.


In 2005 if your bank rocked up to their regulator explaining that they wanted shared hosting of their financial systems on commodity hosting providers they would have been told to reconsider or lose their banking license.

Now it's carte blanche to have core financial systems on Amazon, running alongside EC2 instances purchased with stolen credit cards by folks in non-extradition countries.


Anyway, this just confirms the long-disregarded notion that it is not possible to safely execute untrusted code without extreme consideration for hardware isolation.

In other words: any kind of "cloud" offering, or any hosted computation, is at inherent risk of abuse from ISA-level attacks.


AMD EPYC enables this, and Google has GA'd it; confidential GKE will also GA soon.


This method has been talked about since the Spectre/Meltdown days. It has often taken Google much longer to release patches they've been sitting on; e.g. many of the original container patches were like this.


Why didn’t Intel do it then?


Do what, exactly? This is essentially a performance optimization for folks bagholding oceans of vulnerable hardware.


It should work. Who would trust Intel in this?


OpenBSD was sufficiently aware of this kind of thing to disable SMT completely back in mid 2018:

https://www.mail-archive.com/source-changes@openbsd.org/msg9...

The vulnerabilities (MDS etc.) were published in early 2019. There were some mitigations, in both kernels and browsers, but they were and are not 100% effective; the only 100% mitigation is still to disable SMT. Do you really want to do that? Most people, it seems, aren't willing to give up SMT.
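For what it's worth, disabling SMT on Linux doesn't even require a reboot; there's a runtime sysfs knob (available since roughly kernel 4.19):

```shell
# SMT can be inspected and toggled at runtime via sysfs.
SMT_CTL=/sys/devices/system/cpu/smt/control
cat "$SMT_CTL" 2>/dev/null || true   # prints "on", "off", or "notsupported"

# Disabling it (requires root) is a one-liner; shown here via echo:
echo "echo off > $SMT_CTL"
```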

This patchset for Linux was originally proposed over 20 months ago:

https://lwn.net/Articles/780703/

After refining and optimizing this patch set for quite a while, it's just about as fast as ... just disabling SMT completely (which is a lot simpler). The complexity this adds to the scheduler costs that much.

People/companies really love the _idea_ of getting _all_ the performance of SMT and _all_ the security of mitigations, and don't care as much that you actually still can't have both, so this will be merged eventually.


> I can imagine most of a powerful core's performance going unused if the operating system isn't doing anything special and a multithreaded workload is trying to get a lot of work done.

The documentation for this patchset seems to say that threads running in the same address space will be allowed to share core resources as a default, even when core scheduling is enabled - which seems sensible. A separate mechanism is provided to alter this behavior.


Thanks, Google! This is the kind of good impact Google has on the OSS industry.


Google literally killed Linux on consumer devices. They deliberately made every phone they produced incompatible with mainline Linux and made them Android-only to prevent any competition.

Now Purism has to reinvent a phone from scratch with automotive spare parts (i.MX processors), and Pine64 has no choice but to use old & crappy SoCs.

THAT is the impact Google has on FOSS.

Surely that specific contribution is welcome, but overall I don't think those companies ever deserve any "thanks" from anyone, given the harm they are doing to FLOSS, competition, privacy, standards, and society in general.


Android is now developed with the mainline kernel: https://source.android.com/devices/architecture/kernel/andro...

Android has been Linux, and it's even more Linux now than before.


I'd love to see some sources for these claims.


While you're at it, how do you explain Chrome OS? That's still Linux, and you even get a semi-native shell on it. It's almost completely open source, and there are even a couple of distributions out there built on Chromium OS (the fully open source version) that are being used to give life to old laptops in schools. Almost all of the limitations of Chrome OS arise either from being a web-first OS or for security reasons.

Even their new OS Fuchsia is being developed out in the open.


They aren’t doing it out of the goodness of their heart. It’s a pain in the butt to maintain out of tree changes. Getting code upstreamed means less maintenance work.


So? If it benefits OSS and Google, what do you care?

Companies, who make up a sizable portion of OSS commits, do it because their interests align. Why is this different, and suspicious or malicious?


In my experience the types of developers building this stuff are personally fans of open source and companies are willing to let them submit upstream due to PR wins and limited downside.


So? Upstream also benefits. Just because it's Google, people bring out every negative point of view.


So you're basically saying that all of OSS is primarily held together because the interests of various companies align somewhere?

Sounds perfect.


Naive q: given this, can we [begin to] relax other performance-impacting security measures aka re-enable SMT features?

https://www.extremetech.com/computing/276138-is-hyper-thread...


But notice the link in the other thread, to some benchmarks provided by the authors: https://lore.kernel.org/lkml/20201117232003.3580179-1-joel@j...

With the "kernel protection" aspect enabled, the performance is the same as just disabling SMT. And this is their hand-picked benchmark.

Changes like this were first proposed a year and a half ago, I think, or longer, and rejected because making the scheduler consider this constraint cost more performance than just disabling SMT, so it wasn't worth the implementation complexity. But it's true that companies and customers still _want_ it: they want all their cores/threads for performance, and they want all the security... they just want to be told they got those things, even if they really didn't, because who can really notice...


It probably benefits workloads with fewer syscalls much more. Say, a virtual machine that can talk directly to hardware (vfio etc), or a data crunching task. Neither of those would make a benchmark Google would be willing to talk about...


That is the primary goal of the changes yes.


This is just another rev of the core scheduling patches; see the articles indexed at https://lwn.net/Kernel/Index/#Scheduler-Core_scheduling for a lot more information on how this works and why it's useful.


I'd suggest changing the URL to the LWN page.


The title was confusing to me. This has nothing to do with sharing a core by context switching i.e. giving multiple programs a timeslot on the same core.

It's about HyperThreading/SMT, a feature that allows multiple programs to run in the same instruction cycle by using idle execution units within a shared core.
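You can see this sharing in the CPU topology that the kernel exposes: each logical CPU reports which siblings live on the same physical core.

```shell
# Which logical CPUs share a physical core with CPU 0?
# On an SMT machine this prints something like "0,4" or "0-1";
# guarded so it stays harmless where the file is absent.
SIB=/sys/devices/system/cpu/cpu0/topology/thread_siblings_list
cat "$SIB" 2>/dev/null || echo "no SMT topology info"
```

Core scheduling is about controlling which tasks may occupy those sibling slots simultaneously.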


Could VMMs employ the approach for scheduling mutually trusting VMs on sibling hyperthreads?


Sure.


I'm expecting a LWN article about this.



I hope this can be disabled, because why would you ever run tasks you don't trust on your computer at all, let alone share a core?


Most people browsing the web are running untrusted (Javascript) code on their computers and devices. There have been plenty of browser-based, side-channel attack proofs of concept written in Javascript, including a version of Rowhammer[1].

1. https://youtu.be/YniqBaSK-Eg


>why would you ever run tasks you don't trust on your computer at all

Sometimes you have to use proprietary software for work. Or maybe you just really want to play a game.


This refers to tasks that trust each other, not tasks that are trusted in general - for instance, workloads on a shared server from two different users. It's about isolating them from each other to mitigate side-channel attacks.


My understanding is that cloud providers will share the same machine among several different people, therefore allowing tasks that don't trust each other to run in their machines.


If you are doing anything that matters security wise on a shared host, you are doing it incorrectly, would be a thought


If you are doing anything that matters security wise, you are doing it incorrectly

Amended. This seems to hold true in the long term.


If you are doing anything you are doing it incorrectly


You may also be running an untrusted program on any other computer of your own, for whatever definition of "untrusted" you happen to subscribe to.


One could also say the same of anything exposed on a shared network, or anything that has a USB port, because those have been the vectors of many attacks too.

There's a spectrum of risk people are willing to take and this provides another way to share a host but remain a little more isolated.


Check out GCP Confidential Computing.



