
Because C is slower than C.

That is, they are compiling C with GCC, but Rust uses LLVM as a backend. Compiling their C code with clang reveals that it is also slightly slower than the C code compiled by GCC, but not faster than Rust.

The actual conclusion here is that the GCC toolchain is slightly faster than the LLVM toolchain for this particular use case, but that isn't really news - it happens all the time.

If one wants to focus this on Rust, then the downside of Rust vs C here is that C has a GCC toolchain that happens to be faster for this use case, but Rust does not.



Thank you for your thoughtful comment, recommended reading: https://news.ycombinator.com/newsguidelines.html (edit: thanks for editing your comment, we can have a civil discussion here)

To answer your question: Yes, that particular benchmark was done with GCC and yes, it would have been better to use clang/LLVM for a more fair comparison.

We of course also compared gcc with clang/LLVM and found that clang was 0.8% slower, so it does not explain the difference entirely.


> We of course also compared gcc with clang/LLVM and found that clang was 0.8% slower, so it does not explain the difference entirely.

I think it would be quite useful and interesting to figure out why there is still a small difference between clang and Rust. There might be some low hanging fruit in how Rust generates LLVM-IR that could be fixed.


Another thing to point out: our 6% - 11% difference is already quite low. NetBricks [1] has a similar comparison between a Rust and a C network function (only the NF, driver in C in both cases) and they find a 14% (LPM) to 20% (synthetic NF) difference.

Fun fact: we've a system where Rust is faster than C despite still using more instructions, so yeah, neither instructions nor cycles tell the whole story...

[1] https://people.eecs.berkeley.edu/~apanda/assets/papers/osdi1...


> Fun fact: we've a system where Rust is faster than C despite still using more instructions

I think Bryan Cantrill did a pretty good explanation on differences you might see when rewriting something in Rust[1], and one of the things he looked at to see what was going on was Cycles Per Instruction. Instruction count itself means little if the instructions themselves have very different performance profiles and require different amounts of cycles to complete.

Edit: You are tracking and reporting that, so it's not like I'm telling you anything you don't know. I still think the included article is well worth reading though.

1: http://dtrace.org/blogs/bmc/2018/09/28/the-relative-performa...


Not only that, but your tight loop that takes up most of the execution time might have fewer instructions, whereas the rest of the program might have more, making the overall total higher but "instructions per time spent in function" lower.


Thanks for sharing this, I haven't seen this video before but it is mind-blowing.


Since the resources of that paper are not available anymore, do you happen to know if they used Clang for compiling their C code?

A 10% perf difference between LLVM and GCC for different applications is in the order of magnitude of what all the hundreds of Phoronix benchmarks show every time a new version of these toolchains is released for general applications (e.g. not for micro-kernels).


doesn't it also use an older fork of LLVM rather than branching/rebasing off of master? That could account for losing out on optimizations.


The Rust toolchain distributed by the Rust project bundles an LLVM version that's very close to LLVM master. IIRC, this version has some patches on top to allow Rust to query some backend information, but I'm not sure if this information is still up-to-date.

Rust can, however, work with an external LLVM. When installing Rust through a package manager on Linux, typical distros like Debian, RH, Ubuntu, etc. configure Rust to use the system-wide LLVM binaries. So if you have Debian with, say, LLVM 8.0 installed, you can just compare the installed clang 8.0 with the Rust version installed by the distro.

That will not compare the "bleeding edge" clang vs Rust toolchains, but it would be a fair comparison of Clang vs Rust at a particular point in those toolchains' lives.


It could be argued that using the best available implementation for each is the more fair comparison.


I think that comparing the best implementation of Rust vs the best implementation of C is an interesting thing to compare.

But if that's what this post is comparing, most of the content is probably incorrect, because the main reason the C binaries perform better is that the backend of the C implementation used is better than the Rust backend _for this particular application_. This isn't really news. There are hundreds of benchmarks comparing C and C++ using the GCC and LLVM backends, and each is slightly better than the other, depending on the application. You don't even need to write code in two languages to show this.

The authors appear to be aiming to compare how language differences affect the quality of the generated code. For that, you want to compare two equivalent-quality language frontends using an equivalent-quality optimization pipeline and code-generation backend.

This is, in general, not feasible. But for all languages sharing the same LLVM or GCC backend, doing this comparison is trivial, and would limit the differences to either the frontend implementation, or the language itself.


That depends on the purpose of the comparison. But for many practical reasons you are right.

It makes this statement seem a bit dishonest though:

"Our Rust driver is a few percent slower than our C driver, why? Well, it's of course because of safety features in Rust, but can we quantify that?"


That actually depends. In some cases I've seen Clang pull off superhuman feats of vectorization, especially on ARM64. You paste code in Godbolt Compiler Explorer and GCC produces fairly readable assembly, while Clang uses every trick in the book to vectorize. In those cases Clang can be substantially faster. But code needs to be written with vectorization in mind for that to happen (short loops of known size, and a few other restrictions like that). Most of the time it's a few percent behind GCC.


I similarly found that C was slightly faster than Rust in a microbenchmark called LPATHBench: https://gist.github.com/bjourne/4599a387d24c80906475b26b8ac9... That was with clang, which performed significantly better than gcc. clang appears to generate very good code for tree traversals.

But this was way back in 2016. Numbers from microbenchmarks that old are hardly relevant anymore.


Yeah, their initial explanation is that it's the "safety features", but I was under the impression that literally everything related to Rust's safety happens at compile time. Their big selling point is the "zero cost abstractions".


Their initial explanation is bounds checking specifically. While most abstractions are zero-cost, bounds checking isn't.

I do wish there were a way to track expectations about integer ranges in a type system, but there currently isn't one.


Bounds checking is a “zero cost abstraction” in the sense that it is both “pay for what you use” and “you couldn’t implement it yourself any better.” That said, the implementation is not as fast as it theoretically can be, because the compiler isn’t always removing bounds checks that are provably true.

When const generic lands, you will be able to make those assertions! A basic implementation is in nightly.
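A rough sketch of the kind of assertion const generics enable (all names here are hypothetical, not a proposed API): an index type that carries the array length in its type, so the runtime check happens once at construction and the access itself needs none.

```rust
// Hypothetical sketch: `Idx<N>` can only be constructed with a value
// below N, so indexing an `[u32; N]` with it is in bounds by construction.
struct Idx<const N: usize>(usize);

impl<const N: usize> Idx<N> {
    fn new(i: usize) -> Option<Self> {
        if i < N { Some(Idx(i)) } else { None }
    }
}

fn get<const N: usize>(arr: &[u32; N], idx: Idx<N>) -> u32 {
    // idx.0 < N was checked when the Idx was created.
    arr[idx.0]
}

fn main() {
    let arr = [10u32, 20, 30, 40];
    let idx = Idx::<4>::new(2).expect("index in bounds");
    println!("{}", get(&arr, idx)); // prints 30
}
```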


This is taking zero-cost abstraction to the extreme, and I think waters down the concept to the point that it almost isn't useful.

One can argue that any feature is a zero-cost abstraction for the exact set of trade-offs it has (in the limit: "you couldn't implement a feature that executes this exact sequence of instructions 'mov ...' any better if you did it by hand").

I think focusing on a hypothetical perfect coder is more useful, someone who never fails an assertion or bounds check. This person does not need any safety/convenience features (unlike us real humans), but they can still use some such features without a penalty. Bounds checks are not one of them. (But... At least they don't have much of a pervasive/whole program cost.)


Bounds checking is not an abstraction. Therefore you are the one who is twisting the word "zero-cost abstraction" to cover things it doesn't cover.


GC isn't an abstraction either. By that argument, you could claim that you have zero-cost abstractions even if you have stop-the-world GC.


It does abstract over memory management quite a bit, taking out the implementation details. What is an abstraction, then?


Hm, I'm not sure I understand. I don't think I was the one to call bounds checking an abstraction in this thread.


Maybe! I think that the concept is a bit more focused than most people think it is. I can appreciate this perspective though. It's a fuzzy concept, so there's decent room to poke and prod around the edges :)


Yeah, that's fair. I think there's a strong argument that bounds checking satisfies the "don't pay for what you don't use" rule (although one could say that every slice having to carry around its length, doubling its size, is a pervasive cost, but that's fairly weak, given the other things it enables), but I think the other rule is much, much less clear.


how do you not use bounds checking?

(unsafe like get_unchecked doesn't count because then you can say that about almost every rust feature).


You're misunderstanding what "you don't pay for what you don't use" means. It means that a program that does not contain [] in it does not include code for []. There's no "mandatory runtime" code for []. It doesn't mean you can use [] in some special way and somehow get less code.

And even if we were talking about your understanding, I don't see why get_unchecked would be disqualified. unsafe is a part of the language for a reason.


I know, the meaning of "zero cost abstraction" is really "zero cost to not use the abstraction."

But that's like saying if you don't use rust you don't pay for it. Just because there is the unsafe escape hatch in the language, you don't get to label the language as zero cost abstraction. Because practically speaking there is a lot more runtime cost to rust than people will tell you.

Many languages (almost anything without a runtime) meet that definition of zero cost abstraction then.


You still seem to be missing the point.

If you use a Rust construct that does bounds-checked accesses, you're explicitly using bounds checks. You're not paying for anything beyond the bounds checks that you're using.

If you use a Rust construct that elides the bounds checks (e.g. get_unchecked), there are no bounds checks, and you don't pay any cost for the fact that Rust supports a bounds-checked subscript operator.

If you want to compare Rust's subscript operator, do so with an explicitly bounds-checked version of the C code. That's the appropriate comparison for "zero-cost abstraction". Because the whole point of "abstraction" is it's not a change to the observed runtime behavior, it's just a language abstraction.


I don't entirely agree. As an analogy, suppose that you want to pass a pointer to an object as a function argument. As the programmer, you expect the object will necessarily remain allocated as long as the function uses it, since it's kept alive elsewhere, but you might have made a mistake. Before Rust, your options would be:

1. Use a C-style raw pointer. This has no overhead but is unsafe.

2. Use a reference counted or garbage collected pointer. This provides safety but is a non-zero-overhead abstraction. In other situations, reference counting can be necessary to manage unpredictable object lifetimes, but in this case we're assuming it would only be used to guard against mistakes, so the cost represents overhead. Even if some language implements reference counting just as fast as you can implement reference counting in C, that doesn't make it zero-overhead.

But Rust adds a new option:

3. Use a borrow-checked reference. This is safe, yet zero-overhead compared to the unsafe option, as long as your usage pattern can be expressed in terms of Rust lifetimes.

Going back to array indexing, the analogous situation is that you want to access an index that you, the programmer, expect to always be in bounds. Option 1 corresponds to unchecked indexing, and option 2 corresponds to bounds-checked indexing. But Rust brings nothing new in this area: there is no option 3.

Yet an option 3 is theoretically possible: safe unchecked indexing if you can prove to the compiler that the index can never be out of bounds. This can be done in languages with dependent types or built-in proof systems. (It can even be done in Rust with a certain neat hack [1], but with a lot of limitations.)

I'm not saying Rust is bad for not having dependent types. As far as I know, it's an open research problem how to make those features feel ergonomic and easy to use, even to the extent that Rust lifetimes are (i.e. not totally). And on the flipside, bounds checks usually don't cause very much overhead in practice, thanks to CPU branch prediction, plus the optimizer can sometimes remove them.

But I'd say that Rust's choice not to go there (...yet...) justifies calling its current approach a non-zero-overhead abstraction.

[1] https://github.com/bluss/indexing


Bounds-checked indexing isn't an abstraction, it's a safety feature. The abstraction is compared to manually doing bounds checks like you would in C.

  // C safe subscript
  if (i < size) { return buf[i]; } else { abort(); }

  // Rust safe subscript
  return buf[i];
That's the abstraction, and Rust introduces no overhead here.


This is such a useless definition of "zero-cost" abstraction that almost everything qualifies for it.


It's literally the definition. And very little qualifies. An abstraction is zero-cost if there's no runtime penalty, including if the code you'd write by hand to accomplish this goal is no better than what the compiler synthesized for you via the abstraction.

If the use of the abstraction forces your data model into a suboptimal representation, it's not zero-cost. If the compiler emits code that's worse than the straightforward manual implementation, it's not zero-cost. If the emitted code involves runtime checks that aren't necessary without the abstraction, it's not zero-cost.

For example, reference-counting is an abstraction over tracking the lifetime of your object. In some cases (where ownership isn't clear) the retain count introduced by reference counting is required and therefore not a cost, but in most cases the point at which the object ends up freed is actually predictable, and therefore the cost of all the reference counting is something that would have been avoided without the abstraction. Therefore, reference-counting as a replacement for manual alloc/free is generally not zero-cost.

Or how about iteration. Rust has an external iterator model (where you construct an iterator and run that, rather than passing a callback to the collection). In most languages an external iterator model is a non-zero cost, because you need to construct the iterator object and work with that, which is a bit of overhead compared to what the collection could do with an internal iterator model. In Rust, external iterators are frequently zero-cost because they usually get inlined. So you can write a high-level loop that includes a filter and map and the end result is very frequently just as good as (if not better than) what you'd have done if you wrote the iteration/filter/map by hand without Rust's iterator abstraction.
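A minimal sketch of that comparison (function names are illustrative): the chained iterator version and the hand-written loop typically compile to equivalent machine code once the adaptors are inlined.

```rust
// High-level iterator pipeline: filter then map then sum.
fn evens_doubled(data: &[i32]) -> i32 {
    data.iter().filter(|&&x| x % 2 == 0).map(|&x| x * 2).sum()
}

// The same computation written out by hand, without the abstraction.
fn evens_doubled_by_hand(data: &[i32]) -> i32 {
    let mut total = 0;
    for &x in data {
        if x % 2 == 0 {
            total += x * 2;
        }
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4, 5, 6];
    assert_eq!(evens_doubled(&data), evens_doubled_by_hand(&data));
    println!("{}", evens_doubled(&data)); // 2*2 + 4*2 + 6*2 = 24
}
```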


I understand what you're saying, but that's not how I understand people use the term "zero cost abstraction". That term usually refers to things you can use and pay zero cost for using it. It does not usually refer to abstractions that impose a cost for use, but no additional cost if don't use them, as you suggest.

A quick Google search of the top hits for the term seems to align with my understanding.

Maybe there is a term for what you are talking about, like "pay for what you use".


What part of my comment made you think I didn't understand that?

Was it when I wrote: "it really [is] zero cost to not use the abstraction"? Or when I wrote: "if you don't use rust you don't pay for it"?

I understand the concept fine. It is rust's marketing gimmick and not a useful tool to describe the language.

Rust is no more "zero cost" than C++, Fortran, Ada, Objective C, and even D can make the claim to some extent (you can opt out of the GC and use malloc/free directly). I'm sure there are plenty more I'm missing. If you use the Java GC that doesn't collect (used for real-time programs), even that probably fits the description.

Under your description an opt-in generational GC is zero cost. You can have checked vector access (along with other small performance costs), but use the excuse that you can write code in a way that doesn't use checked access and so doesn't have that feature's cost. I can do that in a lot of languages.

Also to some extent you should take the ecosystem into consideration, where a large chunk of it goes to lengths to not use unsafe, such as using a vector for homogeneous objects to get around pointer semantics. That incurs a huge cost, and that should be counted against the language, because it makes the other ways too difficult or the main libraries used rely on those techniques. (And if you don't count rust's ersatz standard library, then you can't count Java's standard library, since you can always write Java code that doesn't use a GC or extra features - I've done it a few times.)

I saw so much promise in rust, but it seems to have gone to complete crap.


> What part of my comment made you think I didn't understand that?

This part:

> But that's like saying if you don't use rust you don't pay for it. Just because there is the unsafe escape hatch in the language, you don't get to label the language as zero cost abstraction.

Rust isn't "zero-cost" because of the unsafe hatch; that's completely orthogonal. It's zero-cost because if you don't use a feature you don't pay for it. The fact that you need unsafe to get non-checked subscripting isn't particularly relevant to the fact that using non-checked subscripting in Rust means you're not paying for the existence of checked subscripting.

> Under your description an opt in generational GC is zero cost.

You're conflating implementation with semantics. If you have a choice between different allocation strategies that all result in the same observable runtime behavior, using a garbage collector over manual alloc/free is a cost. With manual alloc/free there's no runtime tracking to determine when an object should be freed, it's entirely static. Using a GC dramatically simplifies this from the developer's perspective and avoids a whole class of bugs, but comes with a runtime cost. Meanwhile for single-owner models, Rust's Box type has no runtime overhead compared to alloc/free, since there's no runtime tracking, the alloc and free points are determined statically by the compiler.


> Rust is no more "zero cost" than C++

C++ popularized the term. And if C++ is commonly described as having designed its features to be zero-cost (or you only pay for what you choose), there's nothing wrong with Rust describing the same concept with the same term.


What languages don't have a runtime? Even C has one (albeit a very small one). Nobody labels Rust the language as a zero cost abstraction (that'd be silly - there is a cost to learning it!). Rather, they try to provide zero cost abstractions. A great example is that there are no green threads in Rust, because they were consciously removed: they penalized Rust performance regardless of whether people used green threads or not.


What exactly does "you couldn't implement it yourself any better" mean if the current implementation isn't as fast as it could be?

Or are you saying that the current implementation fails to live up to the "zero cost abstraction" goal?


The implementation of bounds checking couldn't be written any better: https://doc.rust-lang.org/1.37.0/src/core/slice/mod.rs.html#...

    #[inline]
    fn get(self, slice: &[T]) -> Option<&T> {
        if self < slice.len() {
            unsafe {
                Some(self.get_unchecked(slice))
            }
        } else {
            None
        }
    }
This is exactly how you'd write the feature, by hand, if you were implementing the language.

That the optimizer could, but does not, do as much optimization as it theoretically can, means that it has more work to do. But that's different than the feature being written in a sub-optimal way.


I don't know if you edited your comment, or if my pre-coffee reading comprehension just missed the part "because the compiler isn’t always removing bounds checks that are provably true."

I understand now, gracias.


I don't think I did, but I am very empathetic to pre-coffee reading comprehension woes :) Glad we got it sorted.


This got me wondering why we don't include dedicated hardware for bounds checking in CPU architectures. Intel made an attempt with MPX but from a brief glance on Wikipedia it looks like a fail (slower than pure software approaches like AddressSanitizer, among other issues).


Probably because it doesn't matter. The only important thing is to make sure that the hardware doesn't mis-predict the bounds check.


Might I interest you in dependent types? Sadly we don't even have a self-hosted compiler yet.


I really like dependent types as a concept! But they're not really in any mainstream languages, so I haven't had much opportunity to play around with them. :(


There are! It is done in compiled Lisp and Scheme runtimes such as SBCL to remove provably redundant bounds checking. For example, in (when (< x 4) (when (>= x 0) (...))) SBCL will infer that if x is an integer, it must have one of the values 0, 1, 2 or 3. The knowledge is useful because if x is used as the index of an array containing 4 or more items, the bounds checking can be elided.
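The same elision happens in Rust, though via the LLVM optimizer rather than the type system; a minimal sketch (function name is illustrative):

```rust
// When a dominating comparison already proves the index in range,
// the optimizer can usually drop the bounds check on the access below it.
fn get_or_zero(data: &[u32], i: usize) -> u32 {
    if i < data.len() {
        data[i] // check is provably redundant here and typically elided
    } else {
        0
    }
}

fn main() {
    let data = [7u32, 8, 9];
    println!("{}", get_or_zero(&data, 1)); // 8
    println!("{}", get_or_zero(&data, 10)); // 0
}
```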


Compilers do that, yes. But I think the grandparent wanted to express this at the source level, as an annotation to a type.


FWIW it isn't hard to write safe Rust code that elides bounds checks, but one needs to write proper code.

If one pinpoints a performance issue due to bounds checks, it is also always trivial to disable them in a sound way by writing proper `unsafe` code where it makes a difference.
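A minimal sketch of that kind of "proper code" (function names are illustrative): iterating instead of indexing lets safe code avoid per-element bounds checks entirely, rather than relying on the optimizer to remove them.

```rust
// Indexed loop: each data[i] carries a bounds check that the optimizer
// must prove redundant (it usually can here, but it's not guaranteed).
fn sum_indexed(data: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..data.len() {
        total += data[i];
    }
    total
}

// Iterator version: no index at all, so no bounds check to elide.
fn sum_iter(data: &[u64]) -> u64 {
    data.iter().sum()
}

fn main() {
    let data = [1u64, 2, 3, 4];
    assert_eq!(sum_indexed(&data), sum_iter(&data));
    println!("{}", sum_iter(&data)); // 10
}
```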


It's never claimed that every abstraction in Rust is zero cost. Just that these are often possible.


While you are spot on with zero cost abstractions a bounds check in Rust is not without penalty (if enabled, should be off in release mode)


Bounds checks are always on. Integer overflow checks are disabled in release mode. Bounds checks are necessary for memory safety, whereas integer overflow checking isn't.


I believe release mode keeps bound checks. It's integer overflow checks it gets rid of.


you can't turn bounds checking off.

there are unsafe methods on Vec and slices that allow unchecked access, but you need to write them in an unsafe block (some performance-sensitive code does this).
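A small sketch of that pattern (`get_unchecked` is the real slice method; the wrapper function is illustrative):

```rust
// The unsafe block opts out of the bounds check; the SAFETY comment
// records why the index is nevertheless guaranteed valid.
fn last_unchecked(v: &[u32]) -> Option<u32> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: v is non-empty, so v.len() - 1 is a valid index.
    Some(unsafe { *v.get_unchecked(v.len() - 1) })
}

fn main() {
    let v = vec![1u32, 2, 3];
    println!("{:?}", last_unchecked(&v)); // Some(3)
    println!("{:?}", last_unchecked(&[])); // None
}
```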


The explanation I heard from a Rust core dev (explaining why Rust tends to be slightly slower than C++) is that the borrow checker makes programmers use a slightly different coding style. Where C programmers tend to pass references around, Rust programmers tend to take copies to avoid the borrow checker.

Passing the smaller reference will be faster in single-threaded code. In multi-threaded code I'd expect making a thread-local copy to be a win, and indeed multi-threaded Rust programs tend to be quicker than their C/C++ counterparts.
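A toy illustration of that style difference (all names hypothetical): cloning sidesteps a borrow conflict that would arise from holding a reference into the collection while mutating it, at the price of copying data where C code would just keep a pointer.

```rust
// Hypothetical Config type for illustration.
#[derive(Clone, Debug, PartialEq)]
struct Config {
    name: String,
}

// The clone means no borrow of `configs` is held across the push,
// which would otherwise be rejected by the borrow checker.
fn duplicate_first(configs: &mut Vec<Config>) {
    let first = configs[0].clone();
    configs.push(first);
}

fn main() {
    let mut configs = vec![Config { name: "eth0".to_string() }];
    duplicate_first(&mut configs);
    println!("{}", configs.len()); // 2
}
```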


I don't know... while it may not be the case here, I wouldn't be exactly shocked if the fact that Rust restricts your ability to do some things ends up having a performance penalty, even if any safety checks themselves happen at compile time.


> the downside of Rust vs C here is that C has a GCC toolchain that happens to be faster for this use case, but Rust does not.

Yes, this is the "legitimate" criticism to be made here.

I say "legitimate" because LLVM has a bright future and I doubt many people see that as an actual issue.


The very different load/store amounts, higher rate of instructions being retired and seemingly better L1 behaviour seem interesting though - does compiling their C code under clang yield the same behaviour, or is it closer to GCC's?



