
> Zig has full spatial memory safety, which is already a huge improvement over C.

I don't believe this is true. Zig has pointers to unknown numbers of items, which don't seem bounds checked: https://ziglang.org/documentation/master/#Pointers

They prefer slices idiomatically, but that's not "full spatial memory safety". C also prefers you to pass the length of every array whenever you pass a pointer to it, in that correct code must do this. But the entire point of language memory safety is that we don't trust programmers to consistently do the right thing.

You might be able to say something like "Zig minus features X, Y, and Z has full spatial memory safety". I'd be interested to see what features those are: it looks like at a minimum you would have to get rid of multi-element pointers and extern unions.

> But it can still have that safety in ReleaseSafe mode, at least - for example, by not reusing memory addresses.

The overhead is extremely high, because it leaks an entire 4kB page if a single allocation from that page is still alive. In the worst case, it's equivalent to rounding every allocation up to 4kB. If you're OK with that overhead, you could just link in the Boehm GC and get the same safety with less memory usage, and as an added benefit you wouldn't have to call free anymore.
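
To put rough numbers on that worst case, here is a back-of-the-envelope sketch. It assumes 4 KiB pages and that one live allocation pins its whole page; the function names and sample sizes are illustrative only and not taken from any Zig allocator.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: pages are 4 KiB, and a page cannot be reused while any
// allocation inside it is still alive.
const std::size_t kPageSize = 4096;

// Worst case: every live allocation sits alone on its own page(s), so it
// effectively costs a whole number of pages.
std::size_t worst_case_bytes(std::size_t alloc_size) {
    assert(alloc_size > 0);  // a zero-byte request pins nothing
    return ((alloc_size + kPageSize - 1) / kPageSize) * kPageSize;
}

// How many times more memory is pinned than was requested (rounded down).
std::size_t worst_case_factor(std::size_t alloc_size) {
    return worst_case_bytes(alloc_size) / alloc_size;
}
```

Under these assumptions a lone 16-byte allocation pins 4096 bytes, so worst_case_factor(16) is 256.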

> Also, while in many domains safety should be the #1 factor when choosing a language (like building a web browser), that's not universally true. Other factors exist.

In the vast majority of those domains, you could just use a GC'd language. I don't see much room for a new language that isn't memory-safe in 2022.



> They prefer slices idiomatically, but that's not "full spatial memory safety"

[*] is a syntactically delineated unsafe feature. Rust has the same thing. It's like saying Rust prefers safety, but doesn't enforce it because it has unsafe features; same goes for Haskell.

> I don't see much room for a new language that isn't memory-safe in 2022.

That statement [1] is about as silly as "I don't see much room for a new language in 2022 that is obviously too complex to see wide adoption." If a language becomes popular then obviously there's room for it, and if it doesn't, then the question is moot. Maybe what you mean is that you don't think Zig would ever become popular -- and you may well be right -- but you work on a language that is subject to similar scepticism and doubt.

Both Rust and Zig have some huge challenges to overcome if either one of them is to have "room", and I wouldn't bet on their chances, but it's good to have some widely different approaches and see which, if any of them, turns out to have "much room".

[1]: Even disregarding the hard question over which of Zig or Rust make it easier to write correct programs, which could go either way; that sound memory safety is a better path to correctness than a balance of less soundness combined with simplicity is your opinion, and few would claim that Rust's flavour of safety doesn't come at a cost.


> Rust has the same thing.

In Rust all of those features are delineated by unsafe. For one, you can disable unsafe with a compiler switch; Zig has no such equivalent feature.

Moreover, we shouldn't assume that Zig pointers are the only feature that breaks spatial memory safety. It was simply the first one I found after like 2 minutes of looking through the docs. After like 5 more minutes I found another: extern unions. Just now I found another: sentinel-terminated pointers, since you could delete the sentinel.
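
The sentinel case has a direct C analogue in NUL-terminated strings. A minimal sketch (the helper name is made up for illustration): the only safe way to look for the terminator is to already know the buffer's capacity, which is exactly the information a sentinel-terminated pointer doesn't carry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: search for the 0 sentinel without ever reading past
// `capacity` bytes. Returns true and stores the length in `len` if found.
bool bounded_length(const char* buf, std::size_t capacity, std::size_t& len) {
    assert(buf != nullptr);
    const void* hit = std::memchr(buf, '\0', capacity);
    if (hit == nullptr) {
        // The sentinel is gone; an unbounded scan would have run off the end.
        return false;
    }
    len = static_cast<std::size_t>(static_cast<const char*>(hit) - buf);
    return true;
}
```

Overwrite the last byte of a 4-byte buffer holding "abc" and this returns false, whereas an strlen-style scan would simply keep reading.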

It feels a bit like these arguments are going like "Zig is spatially memory safe!" "Well, what about X?" "OK, other than X, Zig is spatially memory safe!" "What about Y?" "OK, other than X and Y, Zig is spatially memory safe!" At this point the burden of proof isn't on me anymore.

It's obvious that full spatial memory safety just isn't a design goal of Zig. It seems like Zig's goal is simply to add tools, like slices, that reduce memory errors. Which is the same goal that, for example, the C++ STL has. It's a great goal, but it shouldn't be confused with Rust's goal.

> [1]: Even disregarding the hard question over which of Zig or Rust make it easier to write correct programs, which could go either way; that sound memory safety is a better path to correctness than a balance of less soundness combined with simplicity is your opinion

It's my opinion in the same sense that it's my opinion that wearing seatbelts results in fewer deaths on the road. The idea that memory-safe programming languages improve real-world safety is backed up by decades of experience. Anyone arguing otherwise has a massive burden of proof. And the arguments I've seen by people arguing that you don't need temporal memory safety are weak. The idea that a simple language that doesn't have ironclad safety guarantees reduces errors over a more complex language with those guarantees sounds nice in theory, but it hasn't actually turned out that way in practice.


> Moreover, we shouldn't assume that Zig pointers are the only feature that breaks spatial memory safety. It was simply the first one I found after like 2 minutes of looking through the docs. After like 5 more minutes I found another: extern unions. Just now I found another: sentinel-terminated pointers, since you could delete the sentinel.

You're right on each of those points. But (AFAIK) all those types are explicitly there for C interop. When interfacing with C, yes, Zig will inherit much of C's unsafety. But the big picture is that

1. Idiomatic Zig should not use those things, and

2. Those things are (IIUC) mechanically-identifiable at the type level. A compiler flag could warn about them. You can check if a codebase uses them, and find out exactly where. No such thing is possible for C.

Maybe you'd prefer that an unsafe block be required to be placed around each of them, but in the end that's a matter of notation. In theory, Zig could decide to mandate such explicit unsafe blocks, but C couldn't - there isn't a useful safe subset to delineate.

I could be wrong, however. Possibly the Zig community will decide that say sentinel-terminated pointers are good to use in general (and not just for C interop). And maybe I've missed something and those types cannot be discovered mechanically for auditing purposes. If either of those is the case then I'd agree Zig lacks spatial memory safety.

(Btw, you seem downvoted atm. Not me, of course - there's probably no one else I respect more on the topic of memory safety than you! - and it's disappointing that people downvote comments just because they disagree.)


I do think the 'unsafe' notation is the crucial bit here. Rust has stuff that is heavily (although not completely) motivated by C interop. For example, 'union' in Rust is something that is rarely used outside of C interop. (With some notable exceptions in the standard library.) But, crucially, in order to actually use a Rust 'union', you do have to utter the 'unsafe' notation.

I'm not sure I have an opinion as strong as pcwalton's here, but I do think that you can't call yourself memory safe (even on a particular dimension) if the unsafe language features have to be enumerated in a separate list rather than being explicitly annotated as such in the source. Or at the very least, I don't think you can say that Zig's spatial memory safety is on even footing with Rust's (assuming pcwalton's characterization of Zig's unsafe features is correct).

The annotation really is the point here, because you can pin everything (modulo bugs and OS features that permit subversion) with regard to memory safety down to an unsafe annotation somewhere. That's a powerful tool that crystalizes what it means to be a "safe API."

As I said, I don't necessarily share pcwalton's strong opinion (although I do weakly agree with him) that having an annotation like 'unsafe' is the only way to go. I do think it's possible to go the Zig route and reduce memory safety bugs. But we should be very clear eyed about the claims being made and how comparisons are drawn.


I don't think the annotation itself is actually the point. The point is being able to find all sources of unsafety and reason about them. Annotations do that, but other things can too.

Imagine if you could run a tool on a codebase, and it statically found all the unsafe locations, and in idiomatic code there were very few of those, and you could reason about them. That would be equivalent to writing "unsafe" in the code, except you need to run the tool to "see" the annotations. That would be as good as annotations - it could prevent the same number of bugs. (Though I guess an argument could be, maybe some people forget to run the tool; fair enough.)

No such tool can exist for C or C++. But such a tool could exist for new languages like Zig and Carbon, if they are designed in certain ways, at least for spatial memory safety. Concretely, unsafe things like raw pointers in Zig have different types than safe slices, so a tool can actually find them; and unsafe things are (AFAIK) not idiomatic either, so they'd be rare.
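
As a rough illustration of what such a tool could look like, here is a deliberately naive C++ sketch that scans Zig source text for the constructs named in this thread. It's only a textual scan under my own assumptions (the function name and the pattern list are made up here); a real tool would work on types inside the compiler, not on strings.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of lines that mention a many-item pointer
// ("[*]", "[*c]", "[*:0]", ...) or an extern union.
// Assumption: plain substring matching is good enough for a sketch; it will
// also flag matches that sit inside comments or string literals.
std::vector<std::size_t> find_unchecked_constructs(const std::string& source) {
    std::vector<std::size_t> hits;
    std::istringstream lines(source);
    std::string line;
    std::size_t number = 0;
    while (std::getline(lines, line)) {
        ++number;
        if (line.find("[*") != std::string::npos ||
            line.find("extern union") != std::string::npos) {
            hits.push_back(number);
        }
    }
    return hits;
}
```

The output is just a list of line numbers to audit, which is roughly the information an 'unsafe' keyword would have put in the source directly.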


I think we'll just have to agree to disagree. I'm not certain you're wrong, so we'll have to see how it shakes out. And I love what the Zig project is doing, so I wouldn't at all be surprised if they find some other way here that works well in practice.

Otherwise, I do think you underestimate the differences between an annotation like 'unsafe' and what you can find easily only through a static analysis tool. The key bit of the 'unsafe' annotation is the concept of "safe API" that it inspires, along with the vocabulary it gives one for talking about things like 'soundness.' For example, you can mark Rust functions as 'unsafe' and then document the preconditions for avoiding UB. If you didn't mark that function as 'unsafe', then we would call it unsound because there exists an input that could cause UB, even though it does not necessarily result in UB for every input.
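
C++ has no 'unsafe' keyword, but the distinction can at least be sketched with naming and documentation. In this illustrative example (all names are hypothetical), the first function has a precondition the caller must uphold, which is what Rust would make you mark 'unsafe'; the second is safe for every input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Precondition: i < v.size(). Violating it is undefined behaviour, so in
// Rust terms this function would have to be declared `unsafe`.
int get_unchecked(const std::vector<int>& v, std::size_t i) {
    assert(i < v.size());  // debug-only check; compiled out with NDEBUG
    return v[i];
}

// Safe for all inputs: the bounds check happens inside, so no caller can
// trigger UB through this function. Returns false if i is out of range.
bool get_checked(const std::vector<int>& v, std::size_t i, int& out) {
    if (i >= v.size()) {
        return false;
    }
    out = get_unchecked(v, i);  // precondition established just above
    return true;
}
```

If get_unchecked were exposed without its precondition being documented and flagged, that would be the "unsound" case: fine for most inputs, UB for some.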

The annotation gives people a way to talk about and crystalize precisely what it means for an API to be safe for all uses. It funnels everything about memory safety down into that one concept and makes reasoning about it much much easier. Without an annotation like that, it's cumbersome or downright difficult to even talk or communicate about soundness.


Well, let's turn this around. Is C++ spatially memory safe? C++20 has slices (std::span, alongside ranges), and the standard containers are bounds checked via the .at() method. You can get the integer overflow semantics of Zig with "-fsanitize=signed-integer-overflow -fsanitize=unsigned-integer-overflow -fsanitize=float-cast-overflow" (Clang's sanitizer flags). You could write a checker that enforces that only these features are used (in fact, this checker basically exists--the C++ Core Guidelines).
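
For reference, a small sketch of the .at() behaviour, using std::vector (the helper name is made up): an out-of-range index throws std::out_of_range instead of reading past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(i) rejects the index by throwing std::out_of_range.
bool at_rejects(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;  // index was in range
    } catch (const std::out_of_range&) {
        return true;   // the bounds check fired
    }
}
```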

I don't see any reason why modern C++ wouldn't be just as spatially memory safe as Zig is, if we're allowed to subset the language in ways more trivial than "disable the unsafe keyword". The main thing that distinguishes Zig at that point would be that it's a simpler language. That's true. But I don't actually think that makes for safer programs, empirically speaking--otherwise, C programs would be safer than C++ programs, and they generally aren't.


Yes, technically there might exist a subset of C++ as safe as idiomatic Zig. But in practice inertia and legacy code and lack of ergonomics etc. work against that subset of C++ becoming popular - does anyone constantly type .at()? Zig can do better, if it doesn't squander the opportunity.

In other words, this isn't about theoretical subsets of a language. It's that there is an opportunity when designing a new language to use better defaults and have better idiomatic patterns. If Zig and Carbon do it properly, I think they can get to a place where they have practically no spatial memory errors in the real world. That is measurable, in principle, so we'll see how it plays out I suppose.


I think it's worthwhile to point out that there are two threads of argument here. One is what it means to be spatially memory safe. Another is whether and how effective various mitigations are.


Instead of typing .at() you can #define _ITERATOR_DEBUG_LEVEL 1; then operator[]() does bounds checking as well.


Note that STL containers allow you to bounds check "[]" indexing even in release mode if you define the right preprocessor macros. The macro differs between compilers and stdlib implementations, though. For libstdc++ (the default on Linux), use _GLIBCXX_ASSERTIONS. For example, the following code aborts on release builds too:

  // g++ -O3 -D_GLIBCXX_ASSERTIONS=1 file.cpp
  #include <iostream>
  #include <vector>

  int main() {
    std::vector<int> vec{1, 2, 3, 4};
    // Index 4 is one past the end; with _GLIBCXX_ASSERTIONS defined,
    // operator[] checks the index and aborts here.
    std::cout << vec[4];
  }

MSVC provides similar functionality using the _ITERATOR_DEBUG_LEVEL macro. The point is that there is no inherent limitation that one has to use the .at() method for bounds checking STL container access.


Good to know, thanks!


> I don't see any reason why modern C++ wouldn't be just as spatially memory safe as Zig is

Well, you'd need to avoid arrays and pointer arithmetic, some of the most basic language primitives, but being safer than C++ (even though it is) is not Zig's only or even main differentiation from C++.

> But I don't actually think that makes for safer programs, empirically speaking--otherwise, C programs would be safer than C++ programs, and they generally aren't.

Once again you're begging the question by trying to draw similarities between C and Zig and using extrapolations that you yourself know to be wrong.

We both agree that the sweet spot for correctness is somewhere on the spectrum between C and Idris, but we really don't know more than that. No one is claiming that any language X that's simpler than another language Y will be more effective at producing correct programs, just as no one is claiming the same for any language X that can offer more sound guarantees than Y. In fact, we know that both of these statements are wrong.

What we know is that simplicity and soundness are both sometimes better for correctness but neither is always better for correctness. I.e., we know that we cannot make the extrapolations that you're making.


> In Rust all of those features are delineated by unsafe.

They're also delineated in Zig, just not with a single keyword. Extern unions, unknown-length arrays, and sentinel-terminated arrays are features, with clear syntax, used for C interop only.

> It's obvious that full spatial memory safety just isn't a design goal of Zig.

It's as much of a goal in Zig as it is in Rust; both allow circumventing it with clearly marked unsafe features.

> Which is the same goal that, for example, the C++ STL has. It's a great goal, but it shouldn't be confused with Rust's goal.

You're confusing means and end. No one's ultimate goal is having this feature or another. Both Zig and Rust have, as one primary goal, helping write correct software. They just try to achieve it in different ways. Why is that good? Because we have no idea what is the most effective way of achieving that goal, so we try different approaches.

> It's my opinion in the same sense that it's my opinion that wearing seatbelts results in fewer deaths on the road.

I don't think so, because seatbelts have few accident-related downsides, while a complex language has many.

> Anyone arguing otherwise has a massive burden of proof.

I'm not arguing against seatbelts. Anyone who claims that X leads to more correct programs than Y -- not by looking at this or that property, but as a whole -- has a massive burden of proof. I am not claiming Zig is more effective at writing correct programs than Rust. After years exploring issues in software correctness -- including with different formal methods -- my claim, which is closer to consensus than controversy, is that we simply don't know. We don't know which of Zig's or Rust's approaches leads to more correct software, and therefore we cannot say which of those approaches is preferable, even if correctness is the main thing we care about.

> And the arguments I've seen by people arguing that you don't need temporal memory safety are weak. The idea that a simple language that doesn't have ironclad safety guarantees reduces errors over a more complex language with those guarantees sounds nice in theory, but it hasn't actually turned out that way in practice.

We simply don't know either way. We might know that a language with ironclad guarantees leads to more correct software than C, but given that Zig is as different from C as Rust is -- and as different as they are to each other -- there's nothing that allows us to extrapolate.

Your argument amounts to, "my way is the best way", and your evidence is irrelevant extrapolations. One could then ask, if Rust's design is so great, how come, at its not-so-young age -- which is quickly nearing the age at which all languages (with the possible exception of Python) have reached or approached their all-time peak popularity -- so few people use it? Could it be that there's no "room" for it? (I'm not saying that's the case at all -- I think Rust, like Zig, is a very interesting language.) Its low popularity certainly makes claims along the lines of, "the design of this language is obviously the only design for which there's room in 2022," ring quite hollow. There's no room for anything other than an approach whose success is so uncertain? Really? Given that Rust has proven quite the opposite of an overnight success, I think some humility about its design choices is appropriate.


Part of the issue is that Zig sits in nearly the same category as Rust, and its focus is narrower than that of more general-purpose, convenient languages. Rust already has major corporate backers and far more established popularity in the media. It's easy to see how it could simply overwhelm Zig, particularly by emphasizing safety. Even if people who know the subject well can argue about how much safety matters for a particular project, Rust already has the hype and reputation of being "safer".

Zig has to make stronger and more compelling arguments as to why people should use it, instead of just reaching for Rust.


> You might be able to say something like "Zig minus features X, Y, and Z has full spatial memory safety". I'd be interested to see what features those are: it looks like at a minimum you would have to get rid of multi-element pointers and extern unions.

Well, `[*]` pointers and extern unions exist for C interoperability. I'm sure that Rust has to do something comparable when interfacing with C.

These pointers actually do help make C interop safer. In C there is no distinction at the type level between a pointer to a single element and a pointer to the first element of an array, so if you at some point get it wrong, the compiler can't help you.

In Zig, if you annotate `extern` function signatures correctly, the compiler will be able to tell you when you're wrongly trying to iterate over a single-element pointer (or vice versa).
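
The same idea can be approximated in C++ with wrapper types, which may make the benefit easier to see. This is only a sketch with made-up names (`SinglePtr`, `ManyPtr`), not how Zig implements it: indexing is simply not defined on the single-element wrapper, so the mistake becomes a compile error.

```cpp
#include <cassert>
#include <cstddef>

// Points at exactly one T: can be dereferenced, but has no operator[].
template <typename T>
struct SinglePtr {
    T* raw;
    T& operator*() const {
        assert(raw != nullptr);
        return *raw;
    }
};

// Points at the first of an unknown number of Ts: can be indexed, but the
// bounds remain the caller's responsibility (loosely like Zig's [*]T).
template <typename T>
struct ManyPtr {
    T* raw;
    T& operator[](std::size_t i) const {
        assert(raw != nullptr);
        return raw[i];
    }
};

// Sums `count` elements. Passing a SinglePtr<int> here does not compile,
// because there is no conversion between the two wrapper types.
int sum(ManyPtr<int> items, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += items[i];
    }
    return total;
}
```

Note that `ManyPtr` is still unchecked: the types catch "iterated over a single item", not "iterated too far".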

That's a good feature to have, and IMO probably the worst supporting argument for your usual complaint about Zig's existence.


I didn't say that multi-element pointers were a bad feature, simply that they're a counterexample to the claim that Zig has spatial memory safety.

Generally in memory-safe languages the corresponding features are behind some kind of "unsafe" marker: in Rust and C# they're behind "unsafe", in Java they're behind "sun.misc.Unsafe", etc. I don't see any such marker in Zig.


Is the marker the critical thing here, though? I just wrote in another comment that I think the Zig compiler can warn about using an unsafe pointer.

The critical thing, I believe, is that unsafe can be discovered and audited. Unsafe pointers have a different type in Zig, so they are separable from the rest of the language, unless I'm missing something.

The bottom line is that in a memory-safe language it should be possible to find all the unsafety and reason about it. For spatial memory safety that's not possible in C, but it is in Zig.


I would want to see an example of such a tool before comparing the two approaches or giving credit to Zig. As you admit in the other comment, even if such an "after the fact" tool can exist, still the net safety will be less. A compiler that does not force the developer to consciously think about safety at the time of using the unsafe feature increases the probability of misuse and safety defects.


> They prefer slices idiomatically, but that's not "full spatial memory safety".

I don't think people say a language is unsafe if it has opt-in unsafety in controlled, non-idiomatic ways. Java, C#, Rust, etc. all have unsafe things you can do, but they are still memory-safe languages.

Specifically for Zig, I don't see a reason the compiler couldn't have an optional warning on using an unsafe pointer, so codebases can be audited for this risk, making this a controllable form of unsafety (unlike most of the unsafety in C!).

> The overhead is extremely high, because it leaks an entire 4kB page if a single allocation from that page is still alive.

In general you are right, and in a long-running process that would be the case. But consider a short-running wasm event handler: such overhead is generally not going to be significant there. So this rules out some use cases, but not all.

> I don't see much room for a new language that isn't memory-safe in 2022.

Yeah, I agree the space for a language like Zig is limited: GC languages are the right answer for most things anyhow, as you said, and when they are not, often memory safety is crucial (like in a web browser) and I'd strongly prefer Rust over Zig there.

Still, there are use cases where Zig seems nice, like wasm event handlers that I mentioned: they're sandboxed anyhow, binaries are small, and it's nice to not have GC overhead. Zig's simplicity and fast compile times are a bonus.

Another use case is low-level code where you'd need lots of unsafe in Rust anyhow. I'm not sure if I'd prefer Rust or Zig in such a case myself. I'd prefer either over C, though.


> I don't see much room for a new language that isn't memory-safe in 2022.

IMHO these are the only valid reasons for unsafe code in a systems language:

1. interfacing with unsafe external code (e.g. call the kernel directly)

2. implement the language runtime/GC itself

3. access hardware capabilities not exposed in the language (e.g. SIMD, GPU, memory-mapped IO, memory barriers)

These are semi-valid to bad reasons for unsafe code:

4. generate new machine code to, e.g. implement a guest language

5. explore the performance ceiling for a very specific code pattern that would otherwise be inline asm

For Virgil, I am at reasons 1 and 2. For Wizard, I am at 4 and 5.



