As someone who practices and encourages others to employ formal methods in softw...

sevensor · on April 5, 2019

There are certainly a lot of people who feel that way, though. Especially when the language has an expressive type system. Would you say these languages merely swap one kind of error for another?

pron · on April 5, 2019

There are certainly quite a lot of people who feel homeopathy is effective. In both cases, that feeling is a result of a complex process [1]. Explaining why it is unlikely (from a theory perspective) that languages like Haskell have a significant effect on correctness is easier [2], but it, too, requires much more space. What is simple to show is that organizations that develop in, say, Haskell do not produce more correct software more cheaply (or some combination of the two) than those that use other languages.

[1]: For example, in this case, what you call "a lot" may just be the effect of some sort of enthusiasm that results in publications rather than an actual great number of people. Also, the feeling can be the result of programmers facing an unfamiliar language that is challenging them, hence they are required to focus and think more, and that gives them a feeling that they're writing something that's more correct. That the languages catches early some errors may also strengthen this feeling. They're also likely writing smaller programs. I've programmed in many languages, and one of the languages where I get it right most quickly is assembly, probably because I write smaller things, and because I need to focus much more.

[2]: It's relatively easy to classify which program properties can be assisted by the language and which cannot. Those that can (e.g. memory safety and type safety) are called inductive (or compositional). They can be helpful, but the vast majority of correctness properties aren't inductive. Inductive/compositional means that the property is preserved by all primitive operations in the language.

sevensor · on April 5, 2019

Interesting! Thanks for a detailed reply. Since you bring up the cost of software, do you believe that using formal methods helps reduce costs?

pron · on April 5, 2019

Formal methods encompass a very wide range of techniques that vary wildly in cost. When the appropriate technique is applied to the appropriate problem, then it can certainly reduce costs, even significantly. But like I wrote in another comment, most of the cost of building software is determined by what we build, not how. Tools and techniques, even those that are effective, can only help so much.

marcosdumay · on April 5, 2019

There is a large amount of evidence that shows that homeopathy isn't effective. As far as evidence goes, there is more evidence pointing that FP leads to correctness than otherwise, although none of it is conclusive.

pron · on April 5, 2019

> There is a large amount of evidence that shows that homeopathy isn't effective.

That's not quite how evidence for the existence of effects works. Evidence for lack of an effect is the attempt, and subsequent failure, to find it.

Few studies have been able to substantiate the hypothesis that homeopathy is very effective, just like the case for FP. It's just that there have been far fewer studies overall testing the second hypothesis than the first. So while we can be more confident that, indeed, homeopathy isn't effective than in the case of FP, there is still no weight to the claim that FP is effective when it comes to correctness.

> there is more evidence pointing that FP leads to correctness than otherwise

There isn't.

It's perfectly OK that people write about how much they like FP or OOP because the paradigm fits well with how they think, they like its aesthetics, or any similar reasons. It's not OK to try and justify the preference by making up completely unsupported empirical claims.

And by the way, even when there is no evidence one way or the other, you still cannot claim there is an effect.

Tehnix · on April 5, 2019

>> there is more evidence pointing that FP leads to correctness than otherwise

> There isn't.

But there is. Take this paper for example, http://web.cs.ucdavis.edu/~filkov/papers/lang_github.pdf "A Large Scale Study of Programming Languages and Code Quality in Github".

"The data indicates functional languages are better than procedural languages; it suggests that strong typing is better than weak typing; that static typing is better than dynamic; and that managed memory usage is better than un- managed. Further, that the defect proneness of languages in general is not associated with software domains. Also, languages are more related to individual bug categories than bugs overall."

pron · on April 6, 2019

That study:

1. Found a statistically significant, but a rather small effect (low single digit % IIRC) that cannot justify language/paradigm choice (choose FP for 1.5% fewer bugs!). If anything, it's evidence against a large effect.

2. Had most of even that small effect disappear on reproduction, which increases the evidence against a large effect: https://arxiv.org/abs/1901.10220

cannabis_sam · on April 6, 2019

Which puts types in a different category than homeopathy, despite your attempts to associate the two.

pron · on April 6, 2019

1. When did I mention types? I was talking about FP vs imperative.

There are typed and untyped languages in both paradigms. There actually have been a couple of studies that found a positive effect for types on correctness. The largest effect (15%!) was reported in this paper: http://earlbarr.com/publications/typestudy.pdf but it compared only TypeScript and JavaScript.

2. Why the different category? Both effects were looked for and not found. Homeopathy was just looked for much more, so the certainty it has no effect (beyond placebo) is more certain. Again, the effect of FP on correctness is currently between completely unknown and some evidence against. Repeating this claim without supporting evidence is ridiculous.

cannabis_sam · on April 6, 2019

1. Here: https://news.ycombinator.com/item?id=19586915

> Explaining why it is unlikely (from a theory perspective) that languages like Haskell have a significant effect on correctness is easier [2]

> It's relatively easy to classify which program properties can be assisted by the language and which cannot. Those that can (e.g. memory safety and type safety) are called inductive (or compositional). They can be helpful, but the vast majority of correctness properties aren't inductive. Inductive/compositional means that the property is preserved by all primitive operations in the language.

2. Because homeopathy's claims and method of action can be dismissed a priori, due to the fact that they don't follow the laws of nature or the chemical principles of solvation.

Further, your are conflating the question " could (typed) FP be use for correctness" with " should (typed) FP be use for correctness", and while the answer to the first one is trivially "yes", you appear to be chasing the answer you want ("no") to the second question. This puts your approach closer to that of the homeopaths.

pron · on April 6, 2019

I mentioned type safety in passing, but I wasn't talking about the effect of types on correctness in this discussion. That's a more nuanced one.

> your are conflating the question " could (typed) FP be use for correctness" with " should (typed) FP be use for correctness", and while the answer to the first one is trivially "yes", you appear to be chasing the answer you want ("no") to the second question.

I am not conflating anything, and I certainly didn't say which programming paradigm people should or shouldn't use. I did say that if people care about correctness, they should look elsewhere than to change their preferred paradigm to or from FP, and focus on techniques that seem to have an actual effect on correctness (BTW, switching languages -- among several reasonable choices -- is often costly, and language in general is one of the choices with the least drastic effect on most important software metrics, except maybe when it comes to performance, where most of the time what matters is the runtime, and not so much the programming paradigm).

BTW, while I won't get into the discussion about types (which, again, is complicated, as there are different kind of types and usage of types, each with very different cost/benefit), the largest effect types -- and language choice in general -- have on correctness was found in a study I linked in another comment to be of 15% (in bug reduction), when comparing TypeScript and JavaScript. In contrast, several studies and case studies have found code reviews to have an average effect of ~60%. So the largest effect language choice was ever found to have in a single study is 4x smaller than the effect of techniques that are far cheaper than a language change (well, maybe not in the case of TypeScript/JavaScript). Really, language is one of the costliest and least impactful factors on correctness (except maybe when the language is unsafe).

In summary, no connection between FP and correctness has been established, period (and what little evidence we have seems to suggest there is none). People should use whatever paradigm/language they like; the chosen paradigm doesn't seem to have a large positive or a negative effect on correctness. People shouldn't make empirical claims that are unsubstantiated.

> Because homeopathy's claims and method of action can be dismissed a priori, due to the fact that they don't follow the laws of nature or the chemical principles of solvation.

I wasn't trying to say that people who believe FP has an effect on correctness are exactly the same as those who believe in the effectiveness of homeopathy. I only said that the fact that some people make the claim enthusiastically in blog posts should not be taken as corroboration to its validity, just as the number of enthusiastic posts about homeopathy is not evidence in its favor. People often enthusiastically believe things that aren't true because personal experience can be misleading.

But anyway, the claim that FP improves correctness similarly does not follow any law of software correctness. If either homeopathy or FP were to have an effect, it would have to be through some as-of-yet undiscovered process. As any program could be mechanically translated between pure-FP and imperative, there is simply no CS theory to explain why FP would have an effect on correctness (as opposed to, say, using a model checker), and there is no such programming-process theory, either (as opposed to, say, code reviews).

cannabis_sam · on April 6, 2019

>I mentioned type safety in passing

Thank you, that's what I meant.

> The claim that FP improves correctness similarly does not follow any law of software correctness. If either homeopathy or FP were to have an effect, it would have to be through some as-of-yet undiscovered process.

Again you are conflating could and should. Let's simplify:

Homeopathy CANNOT be used, since the concept fundamentally don't work and hence SHOULD not be used.

FP for correctness CAN be used, since we do have examples of functional languages that can be used with formal verification tools, type-based or otherwise. Whether it SHOULD be used, does not follow from whether it CAN/CANNOT be used. That's a question for the sociology of software engineering, and like you say, that's complex.

>People should use whatever paradigm/language they like

I agree, which is why I dislike when people try to associate certain preferences with pseudoscience.

mpweiher · on April 7, 2019

> who feel that way

You hit the nail on the head:

The Safyness of Static Typing

https://blog.metaobject.com/2014/06/the-safyness-of-static-t...

(This is obviously for static typing, not FP, but the mechanism appears largely the same and the overlap is substantial)

logicchains · on April 6, 2019

Can you point to a project as complex as CompCert (formally verified C compiler) that's been written and proven correct in an imperative language? Note I mean proven correct, proven to do exactly as specified, not just proven to lack certain classes of bugs.

pron · on April 6, 2019

Absolutely, but first, CompCert is not really a good example for verification. It is a very small program (roughly equivalent to 10KLOC of C; i.e. 1/5th the size of jQuery), it has taken years of effort by world experts, and they had to cut quite a few corners so your classification of what exactly has been proven is also exaggerated (watch talks by Xavier Leroy [1]). It is more a heroic story of people who climbed the Everest naked than some generalizable technique. But heroic and impressive it was.

Now, CompCert was verified using deductive proofs. In industry, this is an approach that's hardly used (in part, CompCert was done to show that it could be used at all, at least in principle). Verification in industry is normally done using model checkers, that are fully automated. There are tradeoffs between deductive proofs and model checkers (and sometimes deductive proofs are used to close some holes left by model checkers, or tie together some modules verified by model chekcers [2]), but they scale better. Model checkers used in industry include Spin[3], nuSMV[4] and others.

So model checkers check software that's larger than CompCert every day, but they don't get papers written about them, because their use is more mundane than heroic. They are also used for imperative languages, because formal verification of functional programs is behind (here is Simon Peyton Jones saying that [5]). Model checking functional languages is hard, because higher-order functions are hard to verify. Instead, there are model checkers for C, Java, hardware definition languages and tool-specific languages.

So, this mentions a 1MLOC Java application by Fujitsu that's been model checked[6] with NASA's Java model checker (that's 10x bigger than CompCert), and here's a report about model checking a many small C programs[7]. Again, software verification of imperative languages is almost mundane; what you read about are exceptional things.

Also, people who are interested in programming languages are normally introduced to formal methods through its small intersection with programming language theory, and that intersection includes uses of proof assistants and dependent types. Because functional languages are liked by language theorists (for many reasons I won't go into), then that's what you see. But that is a very small part of formal methods research. Most of it is research into model checking and sound static analysis, and most of that work is done in imperative languages. Not because they're easier to verify (both functional and imperative are just as hard, but pose different challenges), but because they want to check programs in languages many people use, and language doesn't make much of a difference.

Finally, I should add that even though language doesn't have a huge impact (there can be some exceptions; e.g. unsafe languages such as C do add some challenges), there are languages that are more natural for verification; they are not functional or classically imperative, though, as those two are both mediocre. Rather, those languages are synchronous languages (usually imperative, but not like what you're used to). Normally, those languages are used in hardware and real-time software (Esterel[8], SCADE[9], but they are now starting to make their way into the mainstream with Céu[10], "behavioral programming" in JavaScript [11], and, perhaps most famously, with Eve [12] (RIP). To see why these synchronous (but imperative) languages are a good fit for verification, take a look at how easy it is to specify a correctness property in Eve: [13].

Finally, my own favorite specification and verification language is TLA+ [14], which is neither functional nor imperative (but it is synchronous; then again, it's not a programming language), has been used to verify very large programs at Amazon and elsewhere [15].

To get back to where I started, CompCert (and seL4) are not really examples of anything. They were both huge, heroic efforts (that didn't quite prove as much as you think they did) by academics in the lab on quite tiny programs. While these efforts are important for some niches, you can't generalize, just as you can't generalize from the design of particle accelerators used to turn some lead atoms into gold atoms to learn about how best to design metalworking tools. Full, end-to-end verification, is something that we simply aren't yet capable of doing in any generalizable or scalable way, regardless of language.

Real-world verification efforts are quite different, and are done with imperative languages both because that's what people use and the language doesn't have a big impact, and becuase there are still issues with model checking functional programming languages.

[1]: http://video.upmc.fr/differe.php?collec=S_C_colloquium_lip6_...

[2]: http://events.inf.ed.ac.uk/Milner2012/J_Harrison-html5-mp4.h...

[3]: http://spinroot.com/spin/whatispin.html

[4]: http://nusmv.fbk.eu/

[5]: http://events.inf.ed.ac.uk/Milner2012/Monday_Panel-html5-mp4...

[6]: http://javapathfinder.sourceforge.net/events/JPF-workshop-05...

[7]: http://www.csl.sri.com/users/ddean/papers/ndss04.pdf

[8]: https://en.wikipedia.org/wiki/Esterel

[9]: http://www.esterel-technologies.com/products/scade-suite/

[10]: http://ceu-lang.org/

[11]: https://youtu.be/PW8VdWA0UcA , https://vimeo.com/298554103

[12]: http://witheve.com/

[13]: http://witheve.com/#correct

[14]: https://lamport.azurewebsites.net/tla/tla.html

[15]: https://lamport.azurewebsites.net/tla/industrial-use.html

logicchains · on April 15, 2019

I said "Note I mean proven correct, proven to do exactly as specified, not just proven to lack certain classes of bugs.". Model checkers don't prove functional correctness, they just prove the absence of certain kinds of bugs. Sure, if you don't care about correctness model checkers are great, but if you do care about proving a program does exactly as specified then you'll have a much easier time with a functional language.