We Can Do Better Than SQL (edgedb.com)
414 points by kristianp on Aug 11, 2020 | hide | past | favorite | 461 comments


It's pretty arrogant to complain about the syntax being inconsistent across versions and databases and then present your own weird offshoot, as if every other version wasn't introduced for the exact same reason with the exact same lofty delusions of grandeur...

SQL is messy because describing the underlying data relationships is messy. The orthogonality example is a great illustration of this. What exactly should the result be if there are multiple dept heads? Should the result rows be duplicated? It's not clear how EdgeQL would handle either (their EdgeQL orthogonality examples were constructed to only have one result per subquery), but it seems like they would be kept together as sets. In that case, the result set is no longer a table, it's a dataframe, which is a useful data structure but is also not what relational databases do.


> SQL is messy because describing the underlying data relationships is messy.

> SELECT extract(day from timestamp '2001-02-16 20:38:40');

SQL is messy because all the syntax was decided on before there was a community that really understood what good syntax is. The 'from' in that extract does nothing and I can't easily identify if extract is a function or some sort of crazy parsing construct - what are the arguments? Is "day from timestamp '2001-02-16 20:38:40'" the argument? Are the arguments "day", "timestamp" and "'2001-02-16 20:38:40'"?
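
For contrast, the same operation in a plain function-call style has unambiguous arguments. A small sketch using only Python's stdlib datetime (nothing SQL-specific):

```python
from datetime import datetime

# Same operation as extract(day from timestamp '2001-02-16 20:38:40'),
# but the structure is unambiguous: one function call, one attribute access.
ts = datetime.fromisoformat("2001-02-16 20:38:40")
print(ts.day)  # 16
```

There is no question here about what the arguments are or what the keywords mean.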

Are these annoyances crippling? Yes. Yes they are crippling. It should be possible for an amateur to quickly write an SQL validator as a starting project; relational algebra is not complicated. Any fool can write a validator for lisp. Relational algebra isn't that much more complicated - we don't have loops or flow control to contend with here.
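
To illustrate how little machinery a Lisp validator needs, here is a minimal nesting check in Python (a rough sketch: it ignores strings and comments):

```python
def is_balanced(src: str) -> bool:
    """Validate s-expression nesting: every '(' must close, and depth never goes negative."""
    depth = 0
    for ch in src:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' appeared before its '('
    return depth == 0

print(is_balanced("(+ 1 (* 2 3))"))  # True
print(is_balanced("(+ 1 (* 2 3)"))   # False
```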

Tidyverse's dplyr [0] implements the relational model for real dirty data and, as might be expected for something implemented this century, does a much better job than SQL. Not because the operations are that much different (although gather() & spread() are welcome additions) but through ingenious innovations like, as mentioned, functions having arguments instead of I-don't-even-know-what-that-is.

And the pipe operator which is legitimately ingenious. Great operator for data.

[0] https://dplyr.tidyverse.org/reference/index.html
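
The pipe idea itself fits in a few lines. A toy Python sketch (the helper name and example are invented; this is not dplyr's API):

```python
from functools import reduce

def pipe(value, *funcs):
    """Thread value through each function in turn, like value |> f |> g."""
    return reduce(lambda acc, f: f(acc), funcs, value)

result = pipe(
    [3, 1, 2],
    sorted,                           # [1, 2, 3]
    lambda xs: [x * 10 for x in xs],  # [10, 20, 30]
    sum,                              # 60
)
print(result)  # 60
```

Each step reads top-to-bottom in the order it runs, which is the property people like about pipelines.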


> It should be possible for an amateur to quickly write an SQL validator as a starting project

...why? SQL has been wildly paradigm-definingly useful for decades. It has driven hundreds of billions, perhaps trillions, of dollars of value. None of this hinges on the ability for an amateur to be able to write a validator for the language. It just seems like such a non-sequitur to me, such a strange thing to call out as a criticism.

SQL hasn't been wildly successful either because of or despite its syntax, it has been wildly successful because organizing data in the relational model and querying it declaratively is extremely powerful. The syntax is just not the interesting part. Any syntax that meets those criteria would do.


I second this emotion SO HARD. Is SQL weird? Sure. But (IMO) the hard part of implementing or using a SQL database is _not_ the syntax, it's the storage, query planning, and consistency. Don't like SQL? Use LINQ or one of the bazillion DSL libraries that "compile" to SQL. SQL databases are so unbelievably, extremely useful, and the minor syntax quirks between different DBs are a minor issue compared to the huge amount of commonality that they share.

Also FWIW, I'm not sure that writing a SQL validator _is_ all that hard. The grammars for any dialect are readily available and really not so hard to turn into an AST with any of your favorite tools (ANTLR, lex/yacc/bison, any of a million PEG parser generators). In many cases you don't even need to: we do static analysis of Spark SQL using their own parser, and for Postgres you can use the parser straight out of libpg.
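
In that spirit, one cheap way to get a working validator without writing a grammar is to borrow an embedded engine's parser. A sketch with Python's sqlite3 (it validates SQLite's dialect only, and referenced tables must exist in the scratch schema):

```python
import sqlite3

def validate_sql(query: str, schema: str = "") -> bool:
    """Ask SQLite to plan the query; a parse or analysis error means invalid."""
    conn = sqlite3.connect(":memory:")
    try:
        if schema:
            conn.executescript(schema)
        conn.execute("EXPLAIN " + query)  # plans the query without running it
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(validate_sql("SELECT 1 + 1"))  # True
print(validate_sql("SELEC 1"))       # False
print(validate_sql("SELECT x FROM t",
                   schema="CREATE TABLE t (x INTEGER);"))  # True
```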

I guess the OP's point is that if you can validate it easily, you can understand it. I'm certain I could write a lisp parser in 10 minutes, but I don't think that would mean I immediately understand all Clojure code (which is often pretty obtuse). You still need to learn all the underlying concepts and data structures.


Your last paragraph is a non sequitur. It is perfectly possible for the syntax to not be the interesting part, and for SQL to have been successful despite bad syntax. The underlying idea is brilliant, so the first syntax that exposed it sufficiently well got baked in as a path dependency, even though the syntax sucks.


The point I was trying to make was just that the syntax was irrelevant; it was neither a major advantage nor hindrance. I think this is the same as your point.


One thing I have experienced is that often times developers are coming from a world of object oriented programming, or just coming from a JavaScript world.

Then often the storage and retrieval of data is an afterthought during the development of an application. This is definitely true for CRUD apps. Then maybe the person used a framework to handle database interactions, and now, after a year or so, it's not scaling well because data access was not really given priority during development. So then the problem is the database engine or the syntax instead of proper design and planning.

That’s not to say there are no drawbacks to writing most dialects of SQL. I just believe most of the hate over SQL is coming from a procedural mindset to a relational one.


> It should be possible for an amateur to quickly write an SQL validator as a starting project

I think the point he is trying to make here is the same as the post was making concerning orthogonality. Having a smaller set of special syntax, and therefore an easier validator to write, means easier queries to write for the user. I don’t think he was implying that everyone who makes use of the language should know how to write a lexer for it.

> The syntax is just not the interesting part. Any syntax that meets those criteria would do.

I have to disagree with this. If this were the case we would still be programming everything in BASIC or C, because they're just another imperative programming language and they get the job done. Having syntactic sugar, a consistent language, etc. all makes it easier for a programmer (or data analyst) to get the job done more quickly (and therefore reduces cost), makes a program easier to maintain, and so on.


The syntax is not the interesting part for programming languages either. Programming languages now have better semantics - things like runtimes, concurrency, data encapsulation, etc. - in addition to better syntax. SQL is not a programming language, its semantics boil down to the relational model. Don't get me wrong, there is room for improvement in the syntax, I have plenty of gripes with it, I just don't agree that it is a fundamental problem. It's already super useful as it is.


Indeed, semantics is what matters, but I think, for example, that the lack of composability is a semantic problem too.

<snarky> The ability of SQL engines to manage persistence and correctness is very useful, while the language interface not so much. </snarky>


>It has driven hundreds of billions, perhaps trillions, of dollars of value.

The same can (almost) be said for Javascript. A language with a lot of foibles can still be successful if it is the only realistic mechanism to interact with the system.


No disagreement from me there! Though I personally like SQL more :)


>> It should be possible for an amateur to quickly write an SQL validator as a starting project

> ...why?

I don't know about amateurs, but if the typical intended user of the language does not have a good grasp of what is and is not valid, they will often be reduced to trying one thing after another until they hit upon something that seems to work, with no real understanding of what it does. This is not how robust, correct software is written.


That seems like a completely different point. (Unless I misunderstand what a "SQL validator" is, which is definitely possible.)

I do think this way of utilizing SQL is a problem, but I would lay the blame for it at the feet of a lamentably widespread anti-intellectualism in the field. It is true that the relational model is not trivial and must be learned before SQL makes much sense. But it is not too much to expect from someone who wants to be a practitioner in this field.


I agree that someone who wants to be a practitioner in this field should understand the relational model, but the issue here is that SQL introduces inconsistencies and complications that are not inherent in the relational model, and, to some extent are avoidable. A justifiable emphasis on professionalism does not absolve SQL of these faults and does not render irrelevant attempts to find better language choices.


Sure, I don't disagree. My point is that the perfect is the enemy of the good, and SQL is the good.


> all the syntax was decided on before there was a community that really understood what good syntax is

Here you got me thinking: do we even understand that now? There has been quite a bit of contemplation of semantics, data types, and different kinds of abstraction in programming languages for the last couple of generations already, but the general consensus about syntax is that "nice syntax is nice, but it isn't really that important". Most modern languages loosely follow either something C-like or Algol-like, and most people like what they are familiar with.

There are sometimes claims that a good syntax should be successfully parsed by some relatively simple parser (which is totally not obvious, TBH, because while it is clear why C++ is a good counter-example, we don't really have that many good examples: classic LR parsers or something like that are really not that powerful, and most real-world programming languages implement something very non-generic of their own). Some people claim that "good syntax is no syntax" (Lisp), and some say that Haskell has a good syntax (ikr). The bottom line being that this is just a matter of taste, unlike most of what we can say about types and data structures.

So, to summarize: I never actually heard a compelling general theory of good syntax.

This is completely offtopic, BTW, I agree that SQL is trash and it is even kinda funny that somebody tries to defend it like that, since for a long time it was sort of a textbook example of "why a language created with the idea to be used by non-technical people is a failure from the very beginning".


> So, to summarize: I never actually heard a compelling general theory of good syntax.

I started to think along these lines when I got serious about learning a foreign language. Humans appear to have some innate language ability that’s reflected, among other things, in commonalities between disparate languages. As far as I can tell, there’s been no serious effort to design a computer language to take advantage of this.

My pet theory about why the object.method(args) call syntax won is that it really is the most natural to read, as evidenced by the fact that most human languages follow subject-verb-object word order.


> My pet theory about why the object.method(args) call syntax won is that it really is the most natural to read, as evidenced by the fact that most human languages follow subject-verb-object word order.

Most western languages maybe, but according to wikipedia[1] the most common is actually SOV rather than SVO.

[1] https://en.wikipedia.org/wiki/Word_order#Distribution_of_wor...


Most common by number of languages maybe, but if we look at most common by number of people who speak the language, the top 3 most popular languages in the world (Mandarin, Spanish, English) are all SVO.


> [...] as evidenced by the fact that most human languages follow subject-verb-object word order.

Nope. [1]

[1] https://wals.info/chapter/81 (Note that while this is the language count only, the same comment applies to the population using such word order as a native tongue: SOV and SVO are both common.)


Though I program in languages that most follow the obj.method(args) pattern, I really prefer the args |> function |> function pattern and I wish it was the norm in every programming language.


In what language is that used? It reminds me of pipes in bash. I can see it being useful in circumstances where you have lots of function calls and fewer arguments (just like pipes), but I think it would look really ugly and hard to parse with a long arg list.


It's a common functional thing. I know F# has both `|>` and `<|` to go both directions. It also has `||>` and `|||>` variants which take two- or three-value tuples and spread them to arguments.

https://docs.microsoft.com/en-us/dotnet/fsharp/language-refe...


Elixir for one.

Yes it works best when you have one argument to pass, as the first argument is passed implicitly.

https://hexdocs.pm/elixir/Kernel.html#%7C%3E/2



See also Clojure's threading macros: https://clojure.org/guides/threading_macros


These are two different use cases. In the first case, which method the call actually dispatches to depends on obj, so it kinda makes sense for it to be syntactically distinct from other args. The pipeline syntax is normally used with free-standing functions that are statically dispatched.

Now, yes, there are languages with multimethods, where the distinction is less obvious. For those that always dispatch on all arguments, a uniform syntax makes more sense. CLOS is a good example.


The commonality you are referring to sounds like Chomsky's concept of Universal Grammar [1].

The object.method(args) syntax "won" because it offers a low barrier to entry from English to programming indeed, but as you become more and more experienced and shake off the imperative way of thinking in favour of declarative thinking, functional languages become more expressive than imperative languages. They become better at managing complexity because they allow you to build higher and higher towers of abstraction.

And then homoiconicity [2] shines because of its 'universally grammatic' property.

1. https://en.wikipedia.org/wiki/Universal_grammar

2. https://en.wikipedia.org/wiki/Homoiconicity


I was thinking on a more basic level than UG; just the presence of linguistic universals [1]. I agree that functional languages are in a certain way more expressive than imperative ones, but that seems mostly orthogonal to the syntax of the language.

I’m thinking about basic stuff like improper nouns, denoting a clause’s role by inflection or adposition instead of word order, and having a clearly defined spoken representation- none of this is specific to which abstract structures the language is describing.

https://en.wikipedia.org/wiki/Linguistic_universal


>The object.method(args) "won" because it is a low barrier to entry from English to Programming indeed

BASIC and C didn't have a concept of objects* , which didn't seem to hinder adoption at all. object.method() became popular when OOP languages became popular, and that was because IDEs offered better autocompletion with that syntax.

* We could emulate objects with C in some gnarly ways. Almost no one did at the time, excluding C-with-classes/C++.


It's not that simple, D has UFCS (Uniform Function Call Syntax), which makes object.method(args) and method(object, args) equivalent calls.

Also there's a non-language related advantage of dot notation. Type the name of the object, press dot and IDE will show you what things you can do with that object. You don't have that with other notations.


That assumes a very OO mindset. method(object, args)/function(data, parameters) works better in other languages, and IDEs can handle this quite well without dot syntax.

Which goes to your point-- form follows function, in syntax as with many other things.


Why is dot special? Fortran uses % (if I remember properly). IDEs can complete after a % as well as after a dot.


Dot is not special. Putting the receiver objects first is special, because it allows the IDE to only show verbs (methods) that pertain to that object. If you put the verb first, code completion lacks context, so it would necessarily have to show you all verbs that can apply to any object that could conceivably be acted on in that scope.


> SQL is messy because all the syntax was decided on before there was a community that really understood what good syntax is.

It's not that they didn't understand it. There are languages predating SQL that have better syntax.

But SQL was one of the so-called "4th generation programming languages", that were supposed to be operating at a much higher level. And because of that, there was the notion that, if they also have a more "natural" syntax, it would allow people who are not programmers to use them effectively. Hence why SQL was originally SEQUEL - Structured English Query Language. It was meant to be written by domain experts, not database experts.

Of course, that failed (like countless other similar attempts since then), and so now we're stuck with this syntax for no good reason whatsoever.


Please use pivot_wider and pivot_longer instead of gather and spread. The true horror, even admitted by the dplyr team ...


Oh, they're new. Great. Thanks for the tip.


As a point of minor historic curiosity, the pipe operator in R actually predates dplyr and was invented independently multiple times by various people who noticed that the Unix model translates well to data transformations in R (F# also has a pipe but I'm not aware of anybody using F# as an inspiration for the pipe in R).


Except that dplyr is horribly slow and memory inefficient, and (if you were minded to actually put R code into production) it more or less forces you to do data prep in a database anyway.


> SQL is messy because describing the underlying data relationships is messy.

No, the relational model is beautiful and consistent! SQL is messy because the syntax is not consistent and elegantly composable. It could have those properties and still present the same underlying data relationships.

See Linq in C# as an example for how a more composable query syntax can expose the same data model.

For example, in Linq you can chain arbitrarily many select/join/where/group by in arbitrary order. In SQL you need nested subqueries to achieve the same, which is a much more convoluted syntax.
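
A toy Python sketch of that chaining style (illustrative names, not Linq itself):

```python
class Query:
    """Minimal fluent query over a list of dicts, to show arbitrary-order chaining."""
    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, pred):
        return Query(r for r in self.rows if pred(r))

    def select(self, fn):
        return Query(fn(r) for r in self.rows)

    def to_list(self):
        return self.rows

employees = [
    {"name": "Ann", "dept": "Eng", "salary": 120},
    {"name": "Bob", "dept": "Eng", "salary": 100},
    {"name": "Cid", "dept": "Ops", "salary": 90},
]

# where/select can be chained in any order; no nested subqueries needed.
names = (Query(employees)
         .where(lambda e: e["dept"] == "Eng")
         .select(lambda e: e["name"])
         .to_list())
print(names)  # ['Ann', 'Bob']
```

Each operator takes the previous result and returns a new query, which is exactly the composability being praised.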


Slightly tangential, but I really like Linq syntax. The only problem I have with it is that it is really hard to debug if there is a logic error. I often see developers write Linq, find out the record set they get back is incorrect, and then break up the Linq query into a nested if clause to get what they want.


I've seen so many errors around orderby/limit, which most people want at the end, but Linq lets them be defined anywhere (for a good reason). This is probably the biggest problem with Linq: it's a chain of operators that are ORDER dependent!!


That's a quality of tooling issue. And in Visual Studio, for example, you can set breakpoints on individual subexpressions in the query.


Yeah I feel like that happens whenever you have a mapping from one language to another. Eventually you learn the intricacies of the transformation. There are very few situations where the Linq I write does not translate into the query I expect any more. But that took time to learn.

The only criticism I have of Linq is that it is a more general language targeting different back-ends including plain old objects. If Linq did not allow things that do not make sense in SQL this would be a little less jarring.

I also wish it supported full outer joins, unions, and window functions.


I was amazed the first time I used the MongoLinq driver. It makes sense intuitively, but the idea that you can use the same Linq expression and it will be translated to either Sql or MongoQuery based on the provider you send it to was amazing.

I was able to hire four or five contractors who had never used Mongo just by requiring Entity Framework/Linq experience and less than 4 hours of training.


>For example in Linq you can chain arbitrary many select/join/where/group by in arbitrary order. In SQL you need nested subqueries to achieve the same which is a much more convoluted syntax.

WITH statements alleviate some of this issue by allowing you to write subqueries in any order.


WITH (CTEs) make queries so much more readable and digestible. As a programmer who now does data and SQL, I latched on to these as soon as I found I could reduce repetition in a query with them.
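
For example, with Python's sqlite3 (the table and names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 50), (2, 150), (3, 200);
""")

# The CTE names an intermediate result once; the main query then reads top-down.
row = conn.execute("""
    WITH big_orders AS (
        SELECT id, amount FROM orders WHERE amount > 100
    )
    SELECT COUNT(*), SUM(amount) FROM big_orders
""").fetchone()
print(row)  # (2, 350)
```

The same logic as a nested subquery would bury the filter inside the outer SELECT instead of naming it up front.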


Make sure you understand what optimization fences are and how they affect your performance. CTEs are nice to read but can routinely destroy performance.

[1] https://thoughtbot.com/blog/advanced-postgres-performance-ti...


As of Postgres 12 this has changed substantially. Instead of being materialized, CTEs are inlined and optimized with the rest of the query.

Exceptions:

1. If the results of the CTE are used more than once then it is materialized by default, though you can override this by adding "NOT MATERIALIZED" to the call.

2. Recursive and INSERT/UPDATE/DELETE CTEs are always materialized.

https://paquier.xyz/postgresql-2/postgres-12-with-materializ...


>CTEs are nice to read but routinely destroy the performance.

This may be true on archaic versions of MySQL and Postgres but is not the case today, barring some esoteric edge cases (bugs) where the optimizer gets thrown out of whack. Once while doing data science consulting I rewrote a ~1000 line query in Aurora (MySQL flavored) which had a ~2.5s runtime, which was far too slow for the client's use-case.

After rewriting all the CTEs (there were many) into subqueries, there was barely a 2-3% increase in query speed (on the order of under a tenth of a second), a tiny improvement far smaller than the normal variance of the runtime.

Then I rebuilt the query and the joins, and was able to get the query to consistently run in the range of 0.8 - 1.2s. For my own purposes I then duplicated the query, re-implemented the CTEs, and did validate that indeed there is only a negligible increase in query time when using CTE.


For monster queries (thousands of lines of SQL) I find that temporary tables are also great. You can index them and, depending on your DB and its settings, they're usually held in RAM so they're super fast.
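
A small sketch of the pattern with SQLite via Python (whether temp tables stay in RAM depends on the engine and its settings, as noted; names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, amount INTEGER);
    INSERT INTO events VALUES (1, 10), (1, 20), (2, 5);

    -- Stage an intermediate result once, then index it for later joins.
    CREATE TEMP TABLE totals AS
        SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id;
    CREATE INDEX temp.idx_totals_user ON totals(user_id);
""")

row = conn.execute("SELECT total FROM totals WHERE user_id = 1").fetchone()
print(row)  # (30,)
```

Unlike a CTE, the staged table can be reused across many statements and indexed on whatever columns the later joins need.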


That is implementation specific. In some DBs putting an index on a temp table does nothing. In later versions of the same DB it does, but only in particular cases. Make sure you read the docs around that.


CTEs do not get cached though, so they are actually quite bad for repetition without also using a temp table.


Depends on the DBMS and version. Some are smart enough to pipeline or materialize it internally.


Ecto.Query in the Elixir universe seems like it was inspired by linq


WITHs are great and definitely solve some of the issues with SQL. Some implementations unfortunately don't optimize them as subqueries, but that is not the fault of SQL.


> No, the relational model is beautiful and consistent! SQL is messy...

What is beautiful and consistent is the relational algebra. The relational model relies on this formalism to model data by making some rather strong assumptions about how tuples and relations represent things and how relational operations are used to process them. And these assumptions are precisely what propagates to SQL and what some authors (see references in the article) consider controversial and messy. Then the question is whether and how these controversies can be fixed (or whether they are bugs or features).

A radically different approach to fix these controversies is to introduce a different formalism and different data model (as opposed to fixing only syntax) which is based on using functions. In other words, instead of using sets and set operations, we use in addition functions and function operations [1]. Here you can find an implementation of this approach:

https://github.com/prostodata/prosto

Prosto is a data processing toolkit radically changing how data is processed by using both sets and functions, a major alternative to map-reduce, join-groupby and other set-oriented approaches.

[1] Concept-oriented model: Modeling and processing data using functions: https://www.researchgate.net/publication/337336089_Concept-o... -> Read introduction (two pages) for why having only sets is not enough and why functions are important


I disagree, the relational model is very incongruous with the object model that nearly all LOB applications use. Which is where 99% of the usage issues lie.


That doesn't make the relational model bad, it just means there's an impedance mismatch.

FWIW, I find the relational model to be much nicer than typical object models. Even in Java, which I write every day at work, I've found it to be much clearer to use immutable objects representing records. Not just for interacting with the database, but just for handling information in general.

Of course, it would be extremely painful to do without Lombok's @Value, @Builder, and @With annotations.


I agree that the mismatch is not a direct failing of SQL or the relational model - but we continue to use it without adopting a better approach or developing a common abstraction layer. With all of the syntax growth, SQL could certainly have a join syntax that understands object composition and returns structured data.


> With all of the syntax growth, SQL could certainly have a join syntax that understands object composition and returns structured data.

I mean, many RDBMS's have support for JSON and/or XML, so you can sort of get it.

However, I wonder if we'd be better off as an industry if we dealt with our data in code in a more relation-y way than an object-y way.


It's the problem with the object model. Yes, I think ORMs are an anti-pattern, ActiveRecord-style especially.


And then you end up inventing your own ORM and “helpers” because eventually the results end up in objects.

I do hate most ORMs. The only ones I like are Linq based ORMs. The "ORM" is actually well integrated into the language.

You might as well use something like Dapper.

But yes agreed about ActiveRecord.


You have piqued my curiosity, but what would a linq solution to this exercise look like?


Which particular exercise?


The multiple dept. heads case.


A very simple, basic SQL query would be something like "select * from users where foo=bar;"

Already, we're introducing a weird inversion of syntax that, in my experience, trips up people learning it: data in SQL is stored as "rows" with "columns" inside "tables". More formally, we've got a hierarchical relationship where Tables > Rows > Columns, yet we write the query as Columns > Table > Rows.

There are far more consistent and beautiful querying languages than SQL: I would point to MongoDB's query language, which is less of a query language and more of a static javascript-interpretable library, but is still far easier to learn and more consistent than SQL. The same query in MongoDB: "db.users.find({ foo: "bar" });". How is this better? It embeds the operation in the statement ("find"); reading it hierarchically follows how the data is stored (Collection > Rows); the filtering operation is the same shape as the data being stored; and it naturally disallows most injection attacks.


I always thought a SQL query has a very sensible layout, I was never confused as to which I was doing or in what order. For me a query can be simplified to: select <projection> <selection>


> db.users.find({ foo: "bar" })

I don't know MongoDB query language, but gah! that looks horrible. It uses three different syntaxes: dot notation, curlies/brackets, and colon key-value. Full of punctuation, and it doesn't read like English.

There is no distinction between the noun "users" and the verb "find". There's an extraneous "db". Does foo: "bar" mean equals, or is it find() that determines the operator, or maybe a combo of both? How do I do other operations?

Only if you are familiar with a programming language that has that same syntax does any of it make sense. OTOH even educated non-programmers are gonna be able to read the SQL as SELECT "these things" FROM "this table" WHERE "these conditions are true".


> Only if you are familiar with programing language that has that same syntax does any of it make sense.

I'd argue that most relational DB users are familiar with a programming language, and therefore most likely familiar with C-style syntax. It's better to build on something that most of the potential users are familiar with already.

> There is no distinction between noun "users" and verb "find".

There is no such distinction in natural language either (if you see words purely as sequences of characters), you have to know what is what and infer it from the context.

> even educated non-programmers are gonna be able to read the SQL as SELECT "these things" FROM "this table" WHERE "these conditions are true".

Yeah SQL looks a bit more like natural language at first glance, but that's about it. That familiarity is a false friend, it doesn't really help with the learning curve.

This kind of thinking reminds me of the ruby community trend a decade ago when DSLs were created to look beautiful and like written language. It's useless and confusing for long-term, practical purposes. Same with BDD style testing languages. The promise that non-technical people will feel right at home and can start contributing rarely lives up to reality.


> It uses three different syntaxes; dot notation, curlies/brackets and colon key value.

Is that an issue? Using symbols makes the syntax easier to understand.

> There is no distinction between noun "users" and verb "find".

Weird criticism coming from SQL. The verb can be distinguished by the call parentheses.

IMO SQL keywords are harder to recognize.


Why would you want your query language to read like English? Most people don't speak English.


That doesn’t prevent injection, and the solution to injection attacks is using parameterized queries and prepared statements, not switching to MongoDB. Plus ORMs (really query builders) already provide behavior like this against SQL databases anyway.
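
A quick sketch of the parameterized form, using Python's sqlite3 (table and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

# Untrusted input is passed as a bound parameter, never spliced into the SQL
# string, so an injection attempt is just an ordinary (non-matching) value.
user_input = "alice' OR '1'='1"
rows = conn.execute("SELECT name FROM users WHERE name = ?",
                    (user_input,)).fetchall()
print(rows)  # [] -- the attack string matches nothing
```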


That's fair, but I didn't say it prevents injection: I said it prevents most injection attacks.

MongoDB is absolutely still capable of being vulnerable to injection; it's just harder, because it requires the client to provide an object which is parsed by your application with no data validation. In other words, SQL is vulnerable to injection by default, because everything is a string, while you have to opt in to being vulnerable with MongoDB, by writing your application to parse user input with no schema.

In reality, do applications do this? Hell yeah. Wire up a basic Express API, have it auto-parse any JSON it's given, pass it straight to Mongo, and you'll be vulnerable. But a backend which has any kind of type safety or API schema or GraphQL will be safer on MongoDB than one with all that on a SQL database with no ORM or parameterized queries or prepared statements.


The difference is minimal. If you know the first thing about what you're doing with SQL you will use prepared statements. It's not some sort of arcane feature that nobody understands.


And those ORMs have to deal with the SQL composability issues as well, often to the effect of dramatically poorer performance.


You’re right, using a query layer to access an SQL database is still using an SQL database. You need to know what you’re doing with your queries. The important part is not the object mapping and model tracking part, it’s the part that allows you to build typed queries in the native language. Diesel for Rust is a great example, as is Ecto in Elixir.


I think Elixir's Ecto does a very good job at that.

https://hexdocs.pm/ecto/Ecto.Query.html#module-composition


> I would point to MongoDB's query language

seriously?

    db.orders.aggregate([
       {
          $lookup:
             {
               from: "warehouses",
               let: { order_item: "$item", order_qty: "$ordered" },
               pipeline: [
                  { $match:
                     { $expr:
                        { $and:
                           [
                             { $eq: [ "$stock_item",  "$$order_item" ] },
                             { $gte: [ "$instock", "$$order_qty" ] }
                           ]
                        }
                     }
                  },
                  { $project: { stock_item: 0, _id: 0 } }
               ],
               as: "stockdata"
             }
        }
    ])
VS

    SELECT *, stockdata
    FROM orders
    WHERE stockdata IN (SELECT warehouse, instock
                        FROM warehouses
                        WHERE stock_item= orders.item
                        AND instock >= orders.ordered );


Another huge win for SQL is that it's easy to construct from parts. You can very easily run and debug your subquery or common table expression on its own before combining it into a larger, more-complex query. If you (as I usually do) create plenty of views while analysing a dataset, the approach can be extremely powerful.

Doing the same in JavaScript is possible, but it's slow and cumbersome by comparison.
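The workflow is easy to show with a small sqlite3 sketch (schema and data invented): debug the aggregation on its own, then drop the identical text into a CTE of a larger query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'bolt', 5), (2, 'nut', 50), (3, 'bolt', 20);
""")

# Step 1: run and inspect the aggregation by itself.
totals_sql = "SELECT item, SUM(qty) AS total FROM orders GROUP BY item"
totals = conn.execute(totals_sql).fetchall()

# Step 2: compose the exact same text into a larger query as a CTE.
big = conn.execute(
    f"WITH totals AS ({totals_sql}) "
    "SELECT item FROM totals WHERE total > 30"
).fetchall()

print(totals, big)
```

Creating a view from the debugged query text works the same way, which is what makes the view-heavy analysis style so pleasant.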


The original article addresses the deficiency in construction from parts, in its "Lack of Orthogonality" section.

MongoDB queries, while being interpretable by javascript, aren't really javascript. You can't interact with the data using javascript (well, you can, using eval, but you shouldn't). You interact with the data via the query language, which is, again, expressed in JS, just like SQL is expressed in English.

It's more accurate to consider the Aggregation Pipeline as being the "composable" system to get at data in MongoDB. And it's exceedingly composable; far more than SQL. It's literally a pipeline; a series of steps which fetch, mutate, filter, map, limit, calculate, correlate, relate, and otherwise interact with the data in a database. Each step operates on the output of the previous step, in series. You can programmatically swap steps in-and-out, in production, with no string manipulation or ORM, debug each step in series, remove steps, see the output, get performance characteristics on each step. There's no complex black-boxed query execution planner or compiler, because the query plan is the pipeline.
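That composability in miniature: a pipeline is just an ordered list of plain stage documents (stage contents invented here), so stages can be added, removed, or reordered programmatically with no string splicing. With pymongo you would run the result via db.orders.aggregate(pipeline).

```python
# Individual stages are plain data.
match = {"$match": {"status": "shipped"}}
limit = {"$limit": 100}
project = {"$project": {"item": 1, "qty": 1, "_id": 0}}

pipeline = [match, project]

# Debugging a step: run a truncated pipeline and inspect its output.
debug_pipeline = pipeline[:1]

# Swapping a stage in: insert a limit between the existing stages.
pipeline.insert(1, limit)

print(debug_pipeline)
print(pipeline)
```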


> with no string manipulation or ORM

OR, another way to look at it, using MongoDB's ORM, which is quite bad if you ask me.


If you try to directly translate SQL to MongoDB, without changing your schemas or query, then yes, it's gonna look bad. That doesn't make for a good comparison, and I think you know that.


> That doesn't make for a good comparison, and I think you know that.

That query is not my invention, it comes directly from MongoDB documentation

If it looks bad, it means it is bad by design

https://docs.mongodb.com/manual/reference/operator/aggregati...


MongoDB is better because it's Web Scale!

http://www.mongodb-is-web-scale.com/


you're comparing $lookup, an operator that nosql isn't designed for, to a join, an operator sql was designed for


Lookups are very common in MongoDB; starting with SQL, lifting the data as-is into Mongo, and translating the queries 1:1 will just result in garbage queries, like that one.

An "idealized" NoSQL schema is far, far more complex than anything anyone used to SQL would arrive at ([1]), but most of that is because in a "pure" NoSQL/Document-oriented database, the query engine simply isn't that powerful (think Dynamo). MongoDB has an inordinately powerful array of tools to get at data in a performant way, and $lookup is available as one of those tools. Can it be misused? Yeah; just look at the parent comment to see clear misuse. But generally, it's very common to see.

Modern thinking around MongoDB schema design is closer to SQL than NoSQL/Dynamo. Arrays are bad, denormalization can be valuable but use sparingly, that kind of stuff.

[1] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...


I wouldn't say arrays are bad, and the entire paradigm of data modeling in MongoDB is to store your data based on your application usage patterns. If you have to query across multiple collections via a $lookup, then maybe you'd benefit from embedding the smaller of those collections into the other.


Maybe. But, as a general rule, I advise against arrays of unbounded size on documents (arrays of a known bounded size, say, containing enums to act as a multi-value flag, or email addresses on a user's account, or something like that, are fine).

One example: We used mongodb to track the state of a general CSV import system. So, we'd have a document for each csv file a user imported, and on that document, we were storing errors which occurred during the import to later display to the user. Of course, in an array. Worked great for years, until one day, a user uploaded a very bad CSV, non-maliciously, with hundreds of thousands of lines, with dozens of errors on each line, generating an array millions of items large. The failure condition here was wild: the import just got slower, and slower, and slower, until eventually the (modestly provisioned) db cluster started failing. We immediately normalized that array into its own collection, re-ran the import, still generated millions of errors, but with no problem.

As always, a general rule doesn't apply in every situation, but in my experience, unless you have a really strong grasp on how a system's use will scale, years into the future, unbounded arrays are icky. Lookups across two collections are only a modest performance loss over a direct query, and if the arrays get up there in size, they can actually be faster.
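A sketch of the before/after shapes from the story above (field names invented):

```python
# Before: an unbounded errors array embedded on the import document.
import_doc = {
    "_id": "import-123",
    "filename": "data.csv",
    "errors": [{"line": 7, "msg": "bad column count"}],  # can grow forever
}

# After: the import document stays small; each error lives as its own
# document in a separate collection, keyed back to the import.
import_doc_small = {"_id": "import-123", "filename": "data.csv"}
error_docs = [
    {"import_id": "import-123", "line": 7, "msg": "bad column count"},
]

print(import_doc_small, error_docs)
```

The normalized shape trades one extra lookup for a document whose size no longer depends on how bad the CSV was.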


Interesting, we ran into a very similar thing at my current place where we have a CSV importer storing results in mongo. Importantly it also stores the errors that occurred during the import.

In our case someone uploaded a _huge_ CSV with some misalignment in the columns so every row had an error.

The resulting mongo document was larger than the max document size (16MB?), so it couldn't even be saved to the database.

Mongo is painful to work with and I feel like I keep finding more reasons to hate it.


> SQL is messy because describing the underlying data relationships are messy.

Not sure that is true. You can imagine having a smaller query language with clean semantics able to capture messy data relationships. Small functional expression languages come to mind. Where it lies on the spectrum from purely-table-form to can-hold-everything is a design choice.

Your query language doesn’t need to capture the underlying data model exactly. SQL is an example of this itself.

I’ve run into the orthogonality issue myself several times. You don’t need to drop SQL today, but for some data domains more expressive query models can work really well.


Right, it's like criticizing python, or English, for being inconsistent, or "large". Turns out that doesn't matter -- what matters is that the language is useful because it has a wide base of users and libraries, just like SQL does.


Python and English are meant to be general purpose languages, so they are kind of expected to be large and occasionally inconsistent. SQL is (by definition) a domain-specific language which has grown out of proportions.

Are there really any SQL libraries in the traditional sense (i.e. reusable/composable SQL code with a well specified API)? SQL "libraries" typically focus on hiding the inconsistencies and the abhorrent syntax under the carpet.

And the user base is there, mostly because of the database engine properties and features. The query language itself is just a bad side-effect. And TFA does make a point that NoSQL abandoning the underlying RDBMS model is in fact a regression.


SQL is designed around databases. Python is designed around objects. Clojure is designed around expressions. A table is analogous to a list of objects. None of these languages is more domain-specific than any other.


Perhaps libraries doesn't really apply to SQL per se, but instead you have tooling, ecosystem, DB engines, ORMs, extensions etc, that all speak SQL and would be hard to do without.


SQL is messy because describing the underlying data relationships are messy.

The article presents several examples where SQL's messiness cannot be plausibly attributed to underlying data relationship messiness.


I think the real problem is that SQL text is the only interface to most relational databases. There is a SQL standard, of course, but it's difficult to get vendors to implement the standard, because SQL is the user interface to the database.

Instead, perhaps a second format should be standardized, which is machine readable/writable, that more or less represents relational algebra + whatever extra features SQL supports. It can be clunky and verbose, as long as it's straight-forward and easily composable. It would effectively be like a compiler intermediate representation.

Once you have something like that, you can have as many front-ends as you want with whatever syntax you prefer.

Of course, getting all of the RDBMS vendors to agree to it is still a problem, and they'll probably all still include their own vendor-specific extensions and differences, because they want to lock you in to their system.

But at least maybe the open source ones could agree, which would still be quite beneficial.


I don't see how that makes them arrogant. SQL's implementation inconsistencies are a real problem. Their query language doesn't claim to be another version of SQL - it's clearly very different.

Also that is only one of the things they are hoping to fix. It sounds like you're saying "We have 5 variants of SQL already so nobody is allowed to write any other database query languages ever. We must use SQL forever." which is stupid.


In a world where there's a new programming language every other month, what's wrong with suggesting a new query language?

At least we don't have to change our whole data model or give up consistent data to try it!


Most relationships I have encountered are just one-many, many-one, many-many, one-one, indirect, direct, or graphical in nature.

You want a flat result, or an aggregated result. Why does this have to be blamed on the data relationships and not the SQL language itself?

For any graphical type of relationship, I find SQL utterly hard to express my queries properly. It feels like an assembly language at that point.


yeah the edgeql syntax just looks messy and the doc doesn't even explain how to do a group by


It is very common to hear complaints about SQL from new developers who are not familiar with declarative programming. They don't see how the SQL syntax fits into their 'programming' paradigm. It is very simple: SQL is a magic box, you ask it for what you want, and it gives it to you. It has been around for so long for a reason.


I was hoping for something more left field myself. If SQL is based on tables, what about a QL based on relations only. Columns of data that are related, the "TABLE" implementation detail doesn't need to factor into it.


I think you might be misunderstanding what a relation is in the relational algebra or relational db. It's not a relationship. A relation is nothing more than a set of sets. Tables are relations, and views are relations. The results of queries are also relations.

This is a common misunderstanding of what the relational in "relational database" means. It's not (primarily) about relationships, except insofar that relationships can be described using relations and queried using relational algebra. But the key concept is that of the mathematical relation, as per Wikipedia:

"In mathematics, an n-ary relation on n sets, is any subset of Cartesian product of the n sets (i.e., a collection of n-tuples)"

So the goal of relational db languages is to provide tools for querying these sets of sets. Joins, Unions, Selections (Restrictions), and Projections are some of the tools that are provided. SQL provides some of these, though often calls them confusing names.

For example, in SQL the "select" keyword begins the query statement, but selection proper (as per relational algebra, also called restriction) is actually what is expressed in the "where" clause. What comes after "select" in your query is technically "projection" (choosing which attributes/columns exist in the output). Many ORMs and similar tools get this messed up, because the authors of them know SQL, but not the relational algebra on which it is nominally based.
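The distinction is easy to show in code: treating a relation as a set of tuples (data invented), selection restricts tuples (SQL's WHERE), while projection picks attributes (the list after SQL's SELECT).

```python
# A relation: a set of (name, dept, salary) tuples.
employees = {
    ("alice", "eng", 100),
    ("bob", "eng", 90),
    ("carol", "sales", 95),
}

def selection(relation, predicate):
    """Selection/restriction: keep the tuples satisfying a predicate."""
    return {t for t in relation if predicate(t)}

def projection(relation, indices):
    """Projection: keep only the given attribute positions of each tuple."""
    return {tuple(t[i] for i in indices) for t in relation}

# Equivalent to: SELECT name FROM employees WHERE dept = 'eng'
names = projection(selection(employees, lambda t: t[1] == "eng"), [0])
print(names)
```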


Isn't that true for SQL? SQL treats everything as a relation and does not differentiate between a table, a view, a set-returning function, a subquery, or a CTE.


> If SQL is based on tables, what about a QL based on relations only.

Aside from DDL (which I'm not sure how you would do without distinguishing between base relvars [tables], derived relvars [views], and other classes of relations), SQL doesn't particularly treat different kinds of relation differently beyond what is minimally necessary (you can't write, via insert or update, to a relation that isn't a relvar, for instance.)


This is what we are trying to do with Cayley https://github.com/cayleygraph/cayley


The website needs a link to the documentation, I just couldn't find it :/


> If SQL is based on tables, what about a QL based on relations only.

Try Sparql.


Came looking for the reference, and found it - but there are fewer and fewer mentions of SPARQL with each of these HN discussions. I think people just think "oh some semantic web crap" and move on without realising how powerful RDF etc. really can be when used as simple tools, without the 90s hype of changing the world with linked data.

At least the old ideas are resurging in the form of Neo4j (essentially a triplestore) and Cypher (essentially SPARQL).



Are you thinking of Datalog?


>In that case, the result set is no longer a table, it's a dataframe, which is a useful data structure but is also not what relational databases do.

What do you mean here? The difference between a table and a dataframe is that a dataframe is a construct held in memory, while a table is persisted storage written to a database.


I completely agree with both the title of this post and your comment. The solution is in diagrammatic "languages"/tools like Airflow, not more query languages.

Selecting columns before tables always felt weird to me. Doesn't it make more sense if you had a graphical view this way? (imagine boxes around the below items where you can drag lines to make connections between tables/inputs)

    USERS
           --user_id-- [data processing] => ...
    GROUPS
In SQL that would be,

    SELECT users.*, groups.*
    FROM users
    INNER JOIN groups ON users.user_id = groups.user_id

SQL is so messy and full of details that can be much better represented via dataflow diagrams.


Given our source control tools' limitations, I will take a textual format over graphical representation any day.

Either way, queries need to be composed at runtime a lot of the time, so for many uses of SQL you can't just have the queries as blobs that you prepare ahead of time in some other tool - they must be objects you can work with programmatically.


Somebody at work just stepped on a Microsoft Dynamics "integration". The network admin is trying to recover the integration database from a snapshot. An integration maps data from external files into accounting structures: it uses a GUI, with names that make sense to accountants. And the database that holds it is an Access database with the integration details in a BLOB. I'd pay money for a tool to save these out to a textual representation we could put under source control.


There's no reason a diagram can't be in source control. You just need a tool that can convert the diagram to code and the code to a diagram. You probably also want to store the diagram's layout in textual form too so it can be converted back to the original human-created layout.

Then you can use git and optionally build custom tools like a visual diff of diagram v1 vs v2.


The diff tool is the key here. Without that, there is no meaningful version control.


With a visual representation, diffing may be as easy as clicking next/previous between versions as you would images in an album. You could of course add more bells and whistles with highlighting.


Sorry, I should have been more specific. When I say diffing tool, I mean one that can do what diff does - help Git do automatic merges, and help me do manual merges when auto merge finds conflicts. Two pictures side by side can do neither.


I don't see why it couldn't be built. Git merge doesn't always do all the work for you, you still need to evaluate the changes.


There are graphical query generators in at least Microsoft's management tools. Most devs use straight SQL as far as I know because once you get into more complex queries the graphical interface becomes a hassle.


> There are graphical query generators in at least Microsoft's management tools

The dev still needs to type the code first, which does not address any problems with SQL.

> Most devs use straight SQL as far as I know because once you get into more complex queries the graphical interface becomes a hassle.

Those interfaces may be limited, or the devs may not have enough experience writing "code" via a higher-level graphical interface. Going from writing SQL to creating dataflow diagrams may be a bit like switching from imperative to functional programming.


"Selection" in the relational algebra is actually everything that comes after the WHERE clause. SQL confuses people here because "SELECT" seems to imply "select these columns" when in fact that is technically called "projection."

The "Selection" is the set of predicates that restrict the resulting relation. Projection is choosing which attributes ("columns") to use in it.

The relational algebra has no "tables", this is, again, a SQL thing. It has relations (sets of sets) and operations on them. In SQL "tables" are one kind of relation, and views are another.


the first paragraph of your response was unnecessary, or perhaps could have been worded more constructively


XKCD’s comic on competing standards feels appropriate here:

https://xkcd.com/927/


I don't think treating query languages as a standard is the right comparison here. I think it's great that programming languages have evolved from C or PHP quality languages. I view SQL like PHP, I can write it if I have to, but it's certainly not a language I enjoy working with. EdgeQL has its flaws, but it still looks like a big improvement over SQL to me.


I must be in the minority. I enjoy writing SQL and figuring out clever ways to construct queries to get what I need out of the data. Conversely, I am not a fan of PHP at all, but as you said, will write it if I have to.

EdgeQL looks... interesting, but I need to see more to decide for sure. Would be cool if they built out a translator, so you could pass in SQL and get back EdgeQL.


It's a translator, but in the opposite direction. EdgeDB is a postgres frontend that translates EdgeQL and its schema definition language to SQL which it then sends to postgres. (It can also translate GraphQL to EdgeQL to SQL)

I don't think translating from SQL to take a look at the EdgeQL it produces makes much sense, since it'd result in very unidiomatic queries. SQL prefers joins and flat rows, while EdgeQL is based on following links and has great support for nested data output.


is xkcd ever not appropriate? there's probably an xkcd about it


Not a munroe original, but I like this one https://thomaspark.co/2017/01/relevant-xkcd/


oh I thought he did that :)



>It's pretty arrogant to complain about the syntax being inconsistent across versions and databases and then present your own weird offshoot, as if every other version wasn't introduced for the exact same reason with the exact same lofty delusions of grandeur...

Obligatory XKCD https://xkcd.com/927/


Am I the only full stack dev that likes SQL?

SQL is an incredibly expressive and flexible way to read, store, and update data. It's ubiquitous, so the SQL skills I learned six jobs and three industries ago are still relevant and useful to me today. Relational databases and SQL are heavy lifters that I often rely upon to build projects and get things done.


No, I'm with you and prefer SQL for many tasks.

SQL got a bad rap in many ways due to security issues, databases in general, and "web-scale".

SQL as a language within other languages is a nightmare from a security standpoint, and if language integrated query was more common across languages earlier on then this wouldn't have been an issue.

Databases generally depend on normalization, but normalization comes with interesting scaling problems and how do you replicate normalized schemas. Thus denormalization became a thing, and then the emergence of NoSQL and document stores started to infect everywhere. The JOIN was a killer too, and then the discipline required to do sharding made it annoying to manage, so easier to manage solutions became a thing.

I'm looking at databases in a different light these days with more appreciation, but now the hot new thing is GraphQL makes things... interesting. I don't view GraphQL as a server-side solution, but a client solution to overcome the limits of HTTP/1.1. However GraphQL clients are exceptionally complicated, and I'm not sure they are worth it. The only problem is that to overcome them requires engineers "to know how to do things", but that is a hostile stance. People want to go fast and make progress, and GraphQL enables that.


GraphQL and SQL are not mutually exclusive at all.

GraphQL is, in no way, faster than any REST alternative in terms of implementation speed. If anything, it is slower, as you need to be extremely methodical with your API changes, as (same with REST I suppose) deprecating fields / entities, for mobile clients specifically, is a PITA unless your clients have really nicely built out forced upgrades.

What GraphQL _does_ give you, is type safety and extreme client flexibility. It is a better solution than REST in almost every scenario, other than the initial learning curve, which takes a couple weeks and then you know it forever.

Would I recommend some startup write a GraphQL API for their MVP? No, just get something working. Are you at a more medium-sized company looking to build out your much more permanent API? Then yes, you should probably strongly consider GraphQL.


Too bad nobody thinks whether or not they need the flexibility in the first place. MVPs with single clients using GraphQL for flexibility that is not needed or used are common.

GraphQL is a hammer, and now every project is a nail.

Anecdotal, of course, but the first thing all FE developers I've ever worked with do when they start/join a project is add/suggest GraphQL/Apollo.

Nobody considers how awful things look on the back-end when you need to cache or make magic happen to avoid thousands of N+1 queries.

I believe unfortunately it has become the only way many front-end developers learn to interact with any back-end, and now everyone's forced to use it regardless of its drawbacks.

Same with React. Facebook managed to get free training for all of their future hires. I hate the company but that was a genius move.


I have seen a startup using GraphQL for their MVP, precisely because they could change and crank out UI fast. They already knew GraphQL; I did not when I joined, and I learned to modify the already-existing one in a day or so.


MVP? Just use your apollo/GraphQL server as the node monolith and be done with it.


Recently I tried using SQL directly on the frontend: https://medium.com/@unodgs/sql-on-the-frontend-react-postgre... as an alternative to GraphQL which I also find too complicated. You might find this interesting.


I've been experimenting with SQL as an API language - including client-side SQL constructed in JavaScript - for a couple of years with my Datasette project.

I'm using similar security tricks to you: read-only queries with a time limit, against SQLite rather than PostgreSQL.

More here: https://simonwillison.net/2018/Oct/4/datasette-ideas/ and https://github.com/simonw/datasette


Great to see that! Did it work well for you?


It took me a solid month before graphql finally "clicked" but there's no going back now, I absolutely prefer it over rest now. 10/10 would do again and do suggest everybody give it a try.


This makes a ton of sense to me, thanks for sharing your learnings.

I see very little value in using GraphQL, when you can just write SQL on the client!

We desperately need frameworks to better facilitate this, like Hasura or Django -- DB migrations, permissions, authentication, real-time subscriptions, Admin UI.

My most wanted feature is SQL type providers, like Rezoom.SQL - https://github.com/rspeele/Rezoom.SQL


What are examples of those "easier things"?


MongoDB could be an example, but it introduces yet another empire.

Redis is also easier.

The key is what you are designing against. If you design against a DB, then you may find that scaling beyond a single host comes with gotchas. But if you have the discipline to keep everything within a document, then you can scale up more easily, as the relationships between documents are more relaxed.

However, cross document indexing and what-not creates more problems, and that in and of itself is an interesting challenge.


What is a good "web scale" solution?


Given that I’ve seen 25TB MySQL databases, RDBMSs scale up juuuuust fine. Just... have someone on hand who understands them if you’re going to go that high.


These days, SQL.

It's true that in the 2000-2010 period many SQL implementations struggled to scale with growth in websites (many other parts of the webstack did too).


These days, anything.

The question is what is it going to cost in either licensing solutions or engineering effort.


Do you love SQL, or do you love relational algebra? Because actual SQL, the language, is pretty shitty.

Perhaps the best querying language I've ever used is Q-SQL, integrated into kdb+/q. Unlike SQL, it's actually part of the language (q/k) and, most importantly, it's modular and more expressive than SQL.

If you're interested in how we can do a lot better than sending strings to remote databases using an inexpressive and non-Turing-complete language, check it out: https://code.kx.com/q4m3/9_Queries_q-sql/


Not parent poster but I love relational algebra. But so far I have yet to use an alternative to SQL which is less bad. Most of them seem to be designed by people who do not understand SQL.


I agree with you, I haven't found an alternative that is less bad. But that doesn't mean that I like SQL. It is so inconsistent and uncomposable, with too many awkward ways to do different things.

I love the idea of a functional algebra for querying a database, and while SQL is acceptable, it is far from great in my opinion.


Take a look at Suneido. As a product, it's a weird thing that kinda stands by itself, and isn't particularly useful because of that. But the database and the query language is straight up relational algebra.

https://suneido.com/info/suneidoc/Database/Queries/Syntax.ht...


Same for me, even more so after venturing into the NoSQL hype of the early 2010s. I implemented pretty successful systems using non-relational databases, but over time I've come to stick to an RDBMS as much and as far as I can.

Not only do I prefer to work with SQL nowadays, I also prefer SQL over any ORM in older codebases. ORMs are pretty useful for getting up to speed without caring about your persistence layer too much, but after 17 years in this industry I've had enough issues with ORMs to avoid them whenever I can.

Native SQL queries with placeholders for my parameters in their own files, loaded by my database driver to execute and return data is my go-to solution for data access, it's flexible, maintainable and readable if you treat SQL as your normal code (code reviews, quality standards, etc.).
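That workflow is simple to sketch (file name and schema invented): the query lives in its own .sql file with named placeholders, and the driver loads and executes it.

```python
import sqlite3
from pathlib import Path

# The query lives in its own file, reviewable like any other code.
Path("find_users_by_role.sql").write_text(
    "SELECT name FROM users WHERE role = :role"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

# Load the file and bind the named parameters at execution time.
query = Path("find_users_by_role.sql").read_text()
rows = conn.execute(query, {"role": "admin"}).fetchall()
print(rows)
```

Because the SQL is plain text in the repo, it gets diffs, reviews, and syntax highlighting for free.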


Same for me. I love it. I always avoided the NoSQL things so far, since their use case is mostly unstructured data, in contrast to normal SQL databases. I think what most engineers struggle with is just thinking of rows and columns as data. Creating schemas, working with them, inserting data, manipulating data, and reading data are all different skills. Writing proper queries is like learning a language of its own. Most programming languages have their own ORMs, like the LINQ style of C# (EntityFramework), TypeORM for TypeScript, the Django ORM, or Hibernate. So overall, to handle data you have to understand SQL plus the abstraction layer. That sounds a lot harder than having data in a simple object.

Personally, I was lucky because I had great courses, even in high school, regarding SQL, covering how rows work, what normalization is, how it helps with data, and so on. So naturally I developed some feeling for how to handle database tables.


It sounds like you are more in love with relational algebra than SQL itself. Relational algebra is sound mathematics, but SQL used to express those ideas is horribly old and outdated. We have learned a lot about language design in the half century since its inception. We can do better.


I love SQL but I still think we can do better! The strength of SQL is the underlying relational model and relational algebra. Syntax wise SQL is somewhat clunky.

Linq in .net shows IMHO how queries can be expressed in a more consistent and composable syntax while still conforming to the relational model.


The work I do lately tends to be missing an sql layer and yes, I miss it a lot. I loved organizing my data at that layer and having such powerful ways to query it. It felt like I could eternally find ways to optimize it, and I really enjoyed learning year after year.

Lately I use nosql for very light data and otherwise our API outputs heavily cached and extremely simple data. Adding a database as a middleman wouldn’t make sense. Still fun, but I miss Postgres!


When I started at a company, they were using MFC on Windows and started transactions in CDialog::OnOK().

My colleague and I moved everything to stored procedures. Now every IT department says that the product is VERY stable.


I like relational algebra. I dislike SQL. It harkens from a similar era as COBOL, and I dislike COBOL for procedural applications for the same reason. We've learned a lot about language design over the years and it is a shame that we've put very little effort into adopting new languages to address this particular problem space.


No, you're right, SQL is awesome. I mean, the syntax itself is pretty creaky, but it doesn't matter, the power to declaratively query structured data is the key. Everything that provides that power is awesome; SQL is just at the top of that heap in the breadth of its utility.


Seconded. Every language has its uses and quirks. Does SQL get tricky in tricky situations? Sure. But what doesn't.

This piece reads like Joey being unable to open a carton of milk [1], "there's gotta be a better way!".

[1] https://www.youtube.com/watch?v=wwROPN3Fir8


I really like the relational model, and I think it's the best way to model data. Referential integrity is an awesome thing that removes the possibility for many bugs to exist.

As for SQL, it has its warts, but I'm pragmatic when it comes to programming languages. Like C or JavaScript, it's also "ugly", but it's often the best option anyway.


At work we almost exclusively use pure stored procedures[1] and everything is normalized very well. It is an absolute joy to write SQL, because of how terse it is while still being very readable.

Trying to implement business rules about data relations outside of the DB is a nightmare.

[1] We use dynamic SQL within stored procedures for pivots.


You lose source control on your procedures. How do you deal with that?


Why? When I worked doing infrastructure and CI/CD automation we created a pipeline for deploying stored procedures, not that different from any other code lifecycle process.


> You lose source control on your procedures

Why would you do that?


We use daily backups of the whole server, so if we really need to rollback, we can. And if we absolutely need source control, we could write a procedure in combination with a trigger to automatically write the stored procedure to some source control.

Also there are tools out there that provide that functionality. Found with a few seconds of searching. https://host.apexsql.com/sql-tools-source-control.aspx


You can use Liquibase or Flyway and an automated deployment process to keep your SQL code in sync with non-SQL code (if needed.) For bonus points, you can make your stored procedures be callable by other stored procedures, create/teardown mock data, and do TDD where your test suite of stored procedure tests runs on build during deployment and either has a PASS and deploys or hits a FAIL and the deployment aborts.


If you use a DBMS to manually update these, yes. Otherwise you can do this in code whenever you want in your build/deploy pipeline.


Just wrote a stored procedure in a text file and committed it to git. Seemed to work ok.


I love SQL. I'd love someone to make it better for complex queries. Have you seen the enterprise SQL monstrosities? Why do we have ORMs if SQL is perfect?


We have ORMs because our programming languages' object models are not relational, they are hierarchical - from C to Haskell, everyone goes for highly non-relational data representations.

So, when we interact with a relational DB we need some kind of layer to map between the world of relations in the DB and the world of objects in our program.


I would say they're not even hierarchical in many cases, but an unconstrained graph.


Correct but I didn’t explain well. I mean the SQL generation is left to the ORM and I think that’s no accident.


Not for the reason you're implying, actually for the opposite reason. I'm from the days of yore, just before ORMs became popular and it basically replaced a lot of boilerplate code, but it wasn't the SQL that was the bulk of it.

It was mainly to save time in writing code to map columns to object properties really, the sql statements themselves were trivial even if you weren't lazy and just used select *.

Also, here's a now mainly historical pain most devs never encounter any more: Before ORMs and the various migrators, your object properties might not have the same name as your SQL columns. Yes, it was dumb when you did it, yes, it caused loads of bugs, yes, it actually happened quite a lot.


> It was mainly to save time in writing code to map columns to object properties really, the sql statements themselves were trivial even if you weren't lazy and just used select *.

I am running a side project and yes, writing the native SQL statement to take your object and put it in the database is not a problem, put in the parameterised values and off you go.

But getting the data back from the database? Oh, the horror. So much boilerplate: checking whether any records were returned at all, whether there are enough columns with the correct names for the kind of object you are making, whether there is data in each column as appropriate for that specific column, whether a given field can be coerced into being a string or an integer or a date or similar; then it's all marshalled into a DTO object which is passed to the create-new-object validator. 800 lines of code later, and you may have an object back!

Dapper appears to be the sweet spot for me, I am still writing SQL queries and still designing the SQL tables myself, no orm magic here, but it handles the actual marshalling to and from an in-memory dto object versus data in the table for me, and that is very valuable time savings.
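Dapper is .NET-specific, but the same sweet spot is easy to sketch in Python with the standard-library sqlite3 module (the `User` class and table here are invented for illustration): a one-line row factory does the column-to-property marshalling, and everything else stays plain SQL.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

# Map each row to a User by column name -- the whole "micro-ORM" in one line.
conn.row_factory = lambda cur, row: User(**{d[0]: v for d, v in zip(cur.description, row)})

user = conn.execute("SELECT id, name FROM users WHERE id = ?", (1,)).fetchone()
print(user)  # User(id=1, name='Ada')
```

You still write the SQL and design the tables yourself; the only magic is the marshalling, which is exactly the valuable part.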


Most SQL libraries handle this all for you? You really don't have to do any of that.

It's perfectly fine to just 'know' that the data is going to be an int and just do `var id = rs.getInt("id")`.

I'm on my phone so it's too fiddly to write my own brief example, but if you get rid of the silly comments in this you'll see you can do it all in a few lines, just very boilerplate lines:

https://thedeveloperblog.com/sqlconnection


Completely agree. For example, why isn't there functionality to define aliases for complex expressions and then reuse those through a query. Simple query-local SQL functions would be nice too.

I'd love to see a "SQL-like" language that compiles down to SQL itself, much like Babel or TypeScript in the JavaScript world. I think the tricky thing is that there is no single SQL target.


> Completely agree. For example, why isn't there functionality to define aliases for complex expressions and then reuse those through a query. Simple query-local SQL functions would be nice too.

What about common table expressions? Or custom defined functions?

    with X as ( select... )
It's widely used for analysis
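As a runnable sketch of that CTE pattern (SQLite driven from Python's sqlite3; the `orders` table and its columns are invented for illustration) - the derived expression is named once in the CTE and then reused by its alias:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, qty INTEGER, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 2, 10.0), (2, 5, 3.0)])

# Name the computed expression once, then refer to it by alias everywhere.
rows = conn.execute("""
    WITH priced AS (
        SELECT id, qty * price AS total FROM orders
    )
    SELECT id, total FROM priced WHERE total > 16 ORDER BY total
""").fetchall()
print(rows)  # [(1, 20.0)]
```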


CTEs can do that, but it is quite clunky when you just want to reuse a partial per-row computation.


>functionality to define aliases for complex expressions and then reuse those through a query

What's an example of what you're trying to do? I think views, user-defined functions, routines, and select aliases would cover all the bases...


> I think the tricky thing is that there is no single SQL target.

So the jquery of SQL, then.


Multi sql targets means it’s even more of an advantage to compile


Linq


Same here. Every experience I've had in the alternatives reminded me why SQL was superior.


how many devs accept it for what it is, limits included ?

it's alright, it's consistent enough, it's good

I'm reading about datalog and prolog more and more but sql is ok


Do you like relational algebra, SQL the language, or both?

To me relational algebra is the beauty queen, and SQL is the "beauty mark" that prevents it from reaching perfection.


I use it a lot, because I know it fairly well. Not sure I "like" it; it's like a lot of food - I know what of the food varieties I prefer over others, even when I didn't immediately "like" any of them. More of a "got used to" type of thing.


Totally with you, long live SQL!


Team SQL. There have been few data models I've encountered that I couldn't express with some functionality of Postgres.


What you (and I) like is not SQL per se but what comes through of the underlying relational model, that SQL hasn't screwed up. SQL as a syntax and semantics is a mess. It could have been better.


I'm not impressed for two reasons:

1. Anyone striving to build a better SQL should make a comprehensive list of common (but difficult!) database tasks for OLTP and OLAP workloads. This will expose the weakness of their language. SQL has had 50 years and myriads of improvements to cover all these common cases. This is not a fair fight, so come prepared.

2. It's not enough to be just "better than SQL" to replace it. SQL has such a huge momentum that a new language needs to be absolutely better _and_ it should have many features that SQL cannot possibly have. My nice-to-have list would contain predictable performance, lock ordering, ownership relations (for easy data cleanup), and a standard low level language which the query optimizer would output.


Not disagreeing with your point (if purist language nerds had their way over practicality, we'd all be writing Haskell and Prolog) but most of your nice-to-haves strike me as properties of the database engine, not SQL itself.

Predictable performance - this will always be not only implementation-dependent but data-dependent as well. In order to know whether a join will be efficient or not, you need to know things like relative sizes of the tables, which is not necessarily a language problem. I have worked with SQL implementations that had extensions to let the user annotate joins with relative sizes on each side, but I don't think that's quite what you mean.

Lock ordering - again, good databases should have defined semantics (Postgres, for instance, does take locks in order when using ORDER BY), but I'll grant that this one could be stronger. That said, I think this is pretty niche. How often are you doing large multi-row transactions where lock order is a serious problem? If I have enough volume that deadlock is likely, I probably have enough volume that I want to be breaking up the process into a sharded or two-phase commit anyway.

Ownership relations - I think this is a DDL problem rather than a SQL problem.

Low-level language - I don't think you'll get a portable low-level language here (at least not for any definition of "low level" that's much lower than the SQL AST) because, again, the basics are implementation-dependent. What kind of scan is the base atom of a query? Well, it depends - is your database distributed? sharded? row-store-based? column-store-based? I do wish more open source database drivers would let you play with the AST in memory (Postgres has ways to print it out, but I don't think there's a good API). That would tend to solve the most significant problem raised in the article (composability) - plugging together SQL clauses automatically is hard, but plugging together subtrees can be much easier.


> Predictable performance - this will always be not only implementation-dependent but data-dependent as well

This immediately jumped out at me from the parent comment. It would be entirely possible to implement a query language where you specify a plan for your query. But then you’d immediately lose the “better than SQL” competition, because your complexity and maintainability problems would skyrocket.

I’ve had to deal with this problem as an Oracle DBA, and it’s a complete nightmare. It starts with a statistics refresh ruining a couple of execution plans, so you start specifying them manually with the plan manager. Then it gets worse over time, because stats refreshes become a big risk and you don’t want to do them anymore. Eventually you get to the point where you pretty much only run verified plans. Then your verified plans slowly degrade over time, because the underlying cardinality of every table is constantly changing. You’ve replaced the query optimiser with yourself, which is not only tedious work, but it’s simply not possible to do the job as well as any mainstream DB engine could.


I wish that I could upvote your comment more than once, because this rings so true.

There certainly are (rare) situations, where you need to provide hints in one form or another, but it's really a bloody nightmare to maintain and may completely bork, when you - say - upgrade to a new version of the database engine.

I work with relational databases since the early 90s and can give you a no-bullshit money back guarantee that you (not you personally, obviously) are not smarter than the optimizer.

Usually there are weird data patterns involved if you absolutely must provide hints. But basically:

Don't do it!


I think one of the challenges with SQL is that beginner developers can create naive queries that "work" but are extremely complicated for the optimizer to "get right". So in some cases (talking from my own experience) the developer can, with the use of hints, "be better" than the optimizer, when the problem all along was the overall structure of the query.

Edit: don't do it


The relational model can _usually_ save the day here, without a huge amount of effort. SQL certainly has its share of anti-patterns and footguns. But most of the awful SQL I’ve seen over my career hasn’t come from poor mastery of SQL, it’s come from poorly normalized schemas. If you have a properly normalized schema, then you can do a huge amount with very simple SQL. When it’s poorly normalized, you end up with all sorts of strange and inefficient design patterns in your SQL.

This could come across as me saying “well it’s easy if you do it right”, but the thing is, normalizing a schema is incredibly simple. I would expect a relatively inexperienced software engineer to be able to pick it up literally just from reading the Wikipedia page. In my experience, the more common underlying problem is that inexperienced engineers (even if they’re only inexperienced in terms of SQL and RDBMS knowledge), don’t actually know what normal forms are, or why they’re useful.

Data structures and concurrency control are just fundamentally useful computer science, but for some reason they seem to be topics a lot of people don’t pay enough attention to. Maybe it’s just my personal pet peeve, but I’ve seen too many projects start with “wow NoSQL is great”, and a few months later end up with giant nested loops in their lookups, and some poorly built custom implementation of MVCC in their business logic.

(NoSQL is great btw, just not for relational data)
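A minimal sketch of the point about normalization keeping the SQL simple (SQLite via Python; the schema and data are invented): each fact lives in exactly one place, so questions stay one short join away, with no string surgery or app-side loops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each fact lives in exactly one place.
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT,
                        author_id INTEGER REFERENCES authors(id));
    INSERT INTO authors VALUES (1, 'Codd'), (2, 'Date');
    INSERT INTO books VALUES (1, 'Relational Model', 1),
                             (2, 'Database in Depth', 2),
                             (3, 'SQL and Relational Theory', 2);
""")

# "How many books per author?" is one short, obvious join.
rows = conn.execute("""
    SELECT a.name, COUNT(*) FROM books b
    JOIN authors a ON a.id = b.author_id
    GROUP BY a.name ORDER BY a.name
""").fetchall()
print(rows)  # [('Codd', 1), ('Date', 2)]
```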


>> But most of the awful SQL I’ve seen over my career hasn’t come from poor mastery of SQL, it’s come from poorly normalized schemas. If you have a properly normalized schema, then you can do a huge amount with very simple SQL. When it’s poorly normalized, you end up with all sorts of strange and inefficient design patterns in your SQL.

This is the crucial insight that has made tons of money for me over the last 3 decades. I have all these trite HHOS jokes about it, like telling people denormalizing from a schema not in a normal form is actually the process of "abnormalization". And then there's generic EAVil, where there's nothing that can't be stored, not that nothing really ever means something, heheh.

For every well designed and useful schema I've seen, there were 999 awful ones. For example a physical data model where the query writer has to use string manipulation for joins is going to result in all kinds of suckage. The developer will conclude NoSql is a perfectly reasonable alternative. Even though a modern RDBMS provides all sorts of nifty features to identify and correct such issues ex-post-facto.

For a relational model to work well there must be an a priori data design performed with significant discipline. This seems too much like Big Design Up Front for the average developer or technical manager to stomach these days. It is true that a well-designed data collection system will have a simpler data design more amenable to a distributed NoSQL system and will support emergent schema and relations which may be divined via machine learning. It will also make Big Ball Of Mud more convenient to implement, but that's a posteriori observation, heheh, like that damned halting problem...


>footguns

Ran into one recently, where a table was joined to one of two other tables, based on whether a value was null in the first one.

This was fine, until we added a where clause on a base table (reached through multiple joins) shared by both options. This tanked the performance >1000x.[1] If we just returned the value it had basically no impact.

We tried solving it by using the result set as the base for a select where we did the filtering. This also resulted in the slow performance. In the end I wrapped the column in a function call, which solved it. And I still don't know why.

My guess is that somehow without the function call, it optimizes it into one query, which results in basically the original case, while a function forces the evaluation of the subquery first.

[1]Sub 1sec to over 15 minutes


Imo, polymorphic associations are one of the key areas that the relational model in general struggles. You can do them in most RDBMS, but they’re always a bit janky. Even when you’re just modelling your schema, you really have to think quite hard about it, and you’ll really struggle to preserve simplicity.


In this case it was a

    left join sometable
        on sometable.someuid = isnull(someothertable.someuid, somethirdtable.someuid)

I guess that is such an uncommon case that it tripped up the optimizer completely.

Also: Thanks for writing "polymorphic associations". Not knowing that probably is why I struggled to find any info on it.

Edit: Both tables were actually the same one, just retrieved via different joins, so different data.[1]

[1]One was a company, the other was the company we need to send money to. This is for when we deal with a daughter company but pay the parent company directly, for example.
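Not the actual schema from the story above, but a minimal sketch of this "alternative parent" join (SQLite via Python, using COALESCE, the portable spelling of ISNULL; all table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY,
                           payer_uid INTEGER,      -- set when paying a parent directly
                           company_uid INTEGER);   -- the company on the deal
    INSERT INTO companies VALUES (1, 'Parent Co'), (2, 'Daughter Co');
    INSERT INTO invoices VALUES (10, NULL, 2), (11, 1, 2);
""")

# The alternative-parent join: pick the payer if present, else the deal company.
rows = conn.execute("""
    SELECT i.id, c.name
    FROM invoices i
    LEFT JOIN companies c ON c.uid = COALESCE(i.payer_uid, i.company_uid)
    ORDER BY i.id
""").fetchall()
print(rows)  # [(10, 'Daughter Co'), (11, 'Parent Co')]
```

The correctness is easy; as the story illustrates, it is the optimizer's handling of a conditional join key under extra filtering that gets tricky.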


> Thanks for writing "polymorphic associations". Not knowing that probably is why I struggled to find any info on it.

We might have had a similar experience with this. The first time I stumbled across this problem though, I was specifically trying to figure out “what is the relational way to implement polymorphism”, so I pretty much lucked into a rather productive series of Google searches.


It wasn't strictly polymorphism, but the term you wrote led me to an article[1] that mentioned "alternative parent". This alone instantly made the problem more understandable for me.

[1]http://duhallowgreygeek.com/polymorphic-association-bad-sql-...


> Data structures and concurrency control is just fundamentally useful computer science, but for some reason it seems to be a topic a lot of people don’t pay enough attention too.

Because of the endless articles and comments saying basic computer science knowledge "isn't really needed" for the majority of programming jobs.


I've found with distributed PostgreSQL databases that the optimizer needs more help than you might initially expect. After looking at how the optimizer inefficiently implemented a query, I found a different, but equivalent, way to rewrite the query that could then be efficiently optimized. It's more complicated in the distributed case, because the shard distribution rules have a strong impact on performance. The time needed to re-balance data across shards for a query can be significant.


I've had the opposite problem. In a system I've been working on a lot, I know exactly what indices I want used in what order to read the data. Admittedly the database often figures that out first time, but sometimes it doesn't, then I have to go through an infuriating process of changing parameters or tweaking the query into a mathematically equivalent form until it guesses what I want. In one case I was even forced to write a procedural loop in PL/pgSQL (yes, I'm fully aware this is an antipattern).

I don't want to give hints to the optimiser. I want to specify precisely how the data will be read and joined.

I appreciate this is the exception rather than the rule compared to how most developers would prefer to work. And this system is a bit unusual in that it's using a database for passing messages, which it isn't well suited for (especially not the way they're storing the messages.)


I don't know if it's necessary for a better language to do both OLAP and OLTP; it seems that they are used by different classes of users with relatively low overlap in language features. But I definitely agree that such a test as (1) would be a critical prerequisite to an improvement over SQL.


Alternatively you can make a compiler from sql to your alternative, which would make it relatively easy to get the "I just want a familiar thing that works" people on board.


At that point you no longer provide a new language, but a new SQL query engine.


3. The evolution of SQL should be an open standard, backed by academics and the community. Not a proprietary solution proposed by a company.


QUEL ( https://en.wikipedia.org/wiki/QUEL_query_languages ), the original query-language for Ingres, was more orthogonal and consistent than SQL. But IBM decided SQL was more business friendly. Who can argue with that.

And before that there was ALPHA. From https://www.labouseur.com/courses/db/s2-Remembering-Codd-2.p... :

"Ted [Codd] also saw the potential of using predicate logic as a foundation for a database language. He discussed this possibility briefly in his 1969 and 1970 papers, and then, using the predicate logic idea as a basis, went on to describe in detail what was probably the very first relational language to be defined, Data Sublanguage ALPHA, in “A Data Base Sublanguage Founded on the Relational Calculus,” Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Calif. (November 1971). ALPHA as such was never implemented, but it was extremely influential on certain other languages that were, including in particular the Ingres language QUEL and (to a lesser extent) SQL as well."


Ted Codd designed the Relational Calculus as a clean relational-query language. It looks mathematical (scary?) and a little like a set-comprehension. But I think the big mistake is its use of non-ascii chars like ∃ ∈ ∀.

Here's an example from http://arwan.lecture.ub.ac.id/files/2013/10/4.-relationalcal... :

SQL:

  SELECT DISTINCT F.Name
  FROM FACULTY F
  WHERE NOT EXISTS
    (SELECT * FROM CLASS C
     WHERE F.Id=C.InstructorId AND C.Year=2002)
Relational Calculus:

  {F.Name | FACULTY( F) AND NOT
    (∃C ∈ CLASS( F.Id=C.InstructorId AND C.Year=2002))}


I only had a quick look, is it a computer-friendly offshoot of relational algebra[1], the actual mathematical model for relational databases, as an actual query language?

[1]https://en.wikipedia.org/wiki/Relational_algebra


Relational calculus is another mathematical model that is dual to relational algebra: if you have one representation you can always get the other. Relational algebra describes a fairly direct set of manipulations of database rows that can be implemented efficiently. Relational calculus operations are more abstract, but have useful identity transformations you can use to optimize query plans.

So, when you ask a database system to perform a query, you ask it in relational algebra terms because they’re easy to understand. It then transforms your query into a relational calculus-like form to shuffle things around, and then back to a different, but equivalent, algebraic form to actually execute.


Thank you, that makes perfect sense.


Relational algebra is functional programming: you map-reduce your way to a solution set.

Relational calculus is logical programming: you specify what the solution looks like and the query engine figures out the sequence of operations to find a solution.


A useful analogy indeed, thanks.


It goes a bit deeper than an analogy, but regardless I’m glad it helped!

EDIT: To explain, the relational algebra IS a map-reduce functional programming language. The relational calculus IS a logical programming language. They just use tuples and sets instead of the more familiar lists and strings of LISP, Forth, Prolog, etc.

And the sibling comment further up is right to point out that there are mechanical transformations to turn one into the other, just like there are automatic ways to compile Prolog into LISP (and vice-versa, though that is rarely done).


∃ ∈ ∀. quite natural to me as a pure math grad

there exists, in, for all


You don't even need to be a math grad for it to be natural. It's just a natural way to perform set operations; pretty much everybody who knows a little bit of math is familiar with it. And even if you are not, you need like 5-10 minutes to learn it, and then maybe a couple of weeks of using it to become really fluent.

And I feel like "is not ASCII" is a pretty silly complaint in 2020 anyway. First off, if these characters had been widely used in 1980, they would be on every keyboard today, same as APL's symbols were on APL keyboards. Second, you don't really need them to be literally ∀ and ∃; it could have been &A and &E for example, and then every other IDE, including some fancy product of JetBrains, vim & emacs and most likely even your mysql-cli (at least some wrapper around it written in Python) would turn them into pretty ∀ and ∃ on your screen, same as some editors do for λ.

And third, which is a personal pet peeve of mine (so I probably should keep my mouth shut about that, but I cannot): I really, really wish we'd stop with all this ASCII bullshit already. There's nothing special about ASCII except it's 7-bit. And that's outdated.

It may come as a shock to your average American, but most of the texts in the world are not ASCII, just deal with it. If your product doesn't support that, it probably means it's broken, outdated and will eventually lose to the competition, so you don't do yourself a favour by forgetting about that.

And with regards to input, there's nothing complicated about typing ∀ (or anything else like that) on an ordinary US-layout keyboard. For me ∀ is 3 very fluent keystrokes. I use XCompose, which is a blessing, but other operating systems have similar software with similar capabilities as well. I understand that this is not something most people are familiar with, but that's just bad and shouldn't be respected: using a mouse also wasn't familiar to every single PC-user out there. With demand comes the supply.

So I really dream about the day when typing nice, clear syntax like ∀ instead of awkward Perl-like character sequences (to read which you need mentally traumatizing professional deformation and/or an IDE post-processor) becomes the norm of programming.


It's a mistake if your goal is to have the language implemented and become popular. Otherwise I like the notation.

Computer languages would be different if Ascii had provided code points for ∧ ∨ ∩ ∪ ≡ ↑ etc. It's too bad so much real estate is wasted in the control chars.


\ was invented so that ∧ and ∨ could be written /\ and \/. In retrospect, a mistake.

Original ASCII (1963) had ↑, but it was converted to ^ in 1967 so that it could double as a circumflex.

(Also ←, which was a good assignment operator; it's too bad _ didn't replace \ instead.)


There's a lot of negativity here for understandable reasons given the success of SQL empirically. But I'd encourage everybody to read the home page https://edgedb.com/. This project is not trying to replace SQL as its primary goal, it's trying to build a data modeling and query interface on top of Postgres that meshes well with modern applications that have hierarchical data akin to what you'd model with GraphQL. Major kudos for trying to rethink and improve upon things -- I'm sure it wouldn't be too hard to stick an SQL backdoor in there for the laggards too ;).


> for the laggards too

Speaking of negativity ;)


Would be nice, but bazillions of lines of SQL at the core of almost every business system make this as likely as “We can do better than five fingers.”

The article does nicely illustrate many of the well-known shortcomings of SQL. Chris Date and Hugh Darwen unsuccessfully tried to fix SQL with Tutorial D. Never heard of it? Exactly.


You can pry SQL out of my cold dead hands. It's just not that bad.


>>You can pry SQL out of my cold dead hands. Its just not that bad.

I often joke that SQL is the COBOL of the 21st century. HHOS. There are worse things... e.g. COBOL.

Actually it's very difficult to deny the empirically discernible utility of relational databases and SQL. SQLite, for example.


COBOL and SQL are for different purposes, apples and oranges. One can’t be worse than the other. Programmers who have never seen or written COBOL should stop trotting it out as an example of bad. You’d be surprised how many important systems are written in COBOL, and how well-suited it is for the problem domains it was designed for.


I'm alright with that.

I think the problem is that developers love to abstract everything. And with the database often becoming a god object it's often targeted for abstraction.

The problem is, querying data is a complex problem, complicated enough to require a query language.

So the developer goes off to abstract the database for their needs - a helper table object here, a helper row object there - and eventually you start to see an ORM appear.

Developers need to be told to stop this behavior.

The abstraction is the thing you are representing in the database, not the database.

Your models should use plain SQL queries to execute behavior.

If you must, setup some helpers to do very basic crud operations, but as soon as you need to really alter a where or a select, drop to SQL, use the language as intended and watch your code suddenly become modular, readable and maintainable.

Yes, this means you will need to write SQL, yes this means you will write a lot of select and where statements, no this is not code duplication, stop freaking out about non problems.

I swear to god DRY is the source of and solution to all developer problems.
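A minimal sketch of that style (Python with sqlite3; the `Accounts` model and schema are invented for illustration): the model owns plain SQL per behavior, with no query-builder layer in between.

```python
import sqlite3

class Accounts:
    """Thin model layer: one plain SQL query per behavior, no ORM."""
    def __init__(self, conn):
        self.conn = conn

    def overdrawn(self):
        # As soon as the WHERE or SELECT gets specific, drop straight to SQL.
        return self.conn.execute(
            "SELECT id, balance FROM accounts WHERE balance < 0 ORDER BY balance"
        ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 50.0), (2, -20.0)])
print(Accounts(conn).overdrawn())  # [(2, -20.0)]
```

Yes, the select and where clauses repeat across methods; each one is independently readable and independently changeable, which is the point.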


It almost seems as if SQL is a bit of a "must" for any data-heavy type of application/system, so just spend 30 minutes and learn it already?


Probably more than 30 minutes, SQL may look easy but using it correctly assumes a fairly good understanding of the relational model.

I agree that programmers too often seem to wave database design and integrity aside and create RDBMS/SQL worst-cases with ORMs and terrible queries, and then blame the tools.


> Chris Date and Hugh Darwen unsuccessfully tried to fix SQL with Tutorial D.

Well, just the D class of languages; Tutorial D is (as the name suggests) a pedagogy-focussed implementation of the D requirements; the intent was that there would be one or more Industrial Ds.

(Dataphor is a D—the first implemented, IIRC—and is successful enough that it's a still-living commercial product.)


I assume this is different from the D programming language?


Date and Darwen’s D class of languages unifying the OO and relational models are different from Walter Bright’s “what C++ should have been” D language, yes.


> Never heard of it? Exactly.

"Accrington Stanley"

https://www.youtube.com/watch?v=zPFrTBppRfw


It's 2020 not 1995. There are already lots of good, production-ready SQL alternatives out there. There's full-fledged businesses that have zero lines of SQL.


What are good production-ready alternatives to SQL? (That are still relational and don't make you rewrite half of the database logic again in each new backend you make -- not talking about key/value stores)


I would like to hear of one such business. Can you give an example?


Anything that uses ActiveRecord


Isn't ActiveRecord using SQL behind the scenes?


Technically ActiveRecord uses Arel, which in turn compiles, if you will, to SQL. Arel (or a strict subset of it) could theoretically be an alternative to SQL, if databases supported it as a native language. They don't, so the SQL step is a practical requirement.

This is similar to the browser (especially in the pre-WASM days). No matter what language you actually wrote your software in, eventually you were going to end up converting it to Javascript out of necessity. Would you still say that your program written in C, compiled with emscripten, was written in Javascript?


Stripe's primary datastore is MongoDB.


But what does their BI tooling use?


NoSQL databases are non-relational.


Misses the point. Sure, we have alternatives, and always have had alternatives to SQL. But those don't address the bazillions of lines of SQL already running almost every important business system. SQL won't just go away because we can point out its flaws or come up with something we think works better.


Maybe just have SQL Transpilers? I mean we accept ORMs, so we should accept that.


There’s a lot of that in the GraphQL space at the moment. Prisma, Hasura, Postgraphile, EdgeDB, etc...


LINQ is pretty much a cool transpiler from a composable SQL-like dialect to SQL.


Honestly I think SQL is pretty easy. I love it. I can teach the basics to a new person in minutes.

You know what we could do better at? Crappy explains from database engines. Crappy rate-limiting capabilities. Poor feedback on keeping cache pipelines fed during scans. Poor feedback on column size effects on reading stripes from disk and size alignments between the filesystem and database.


Your brain must work differently to mine. SQL is by far the hardest tool I use. I’ve used all the main languages from asm up to js for real work and nothing breaks my brain like SQL.

I use it daily in a business that is heavy on SPs and while I get by and am improving the jump from inner joins and selects to CTEs and the other wizardry is massive.

I want to be better at SQL but so many problems I hit up against and think “well that’s a 2 minute job in js/swift/php”


> I want to be better at SQL but so many problems I hit up against and think “well that’s a 2 minute job in js/swift/php”

The thing about SQL is that it's the fastest way to read&write data in a relational database. Maybe writing the code is faster in js/swift/php, but the code will run faster in SQL. If you need to do something to 100M pieces of data you can do a lot worse than SQL.


You cannot compare SQL to either of those languages, since they are not built solely for data manipulation but also for a lot of other things, which SQL is not. SQL can do a lot too, but just because you can e.g. do formatting in SQL doesn't mean it's the correct place or choice.


That's circular reasoning though. Why do you want your data to be in a relational database? Particularly if you're not actually using its features (I don't think I've ever seen a web application that actually got any value out of database-level transactions, for example). A different kind of datastore could offer you better performance and easier querying.


I love SQL for making it possible to almost trivially enforce most business rules. Such as: this field may only contain one of the values specified in another table.
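As a minimal sketch of that kind of rule (SQLite via Python; table and column names invented for illustration), a foreign key constraint is enough to make the database itself reject values not present in the referenced table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs FK enforcement enabled per connection
conn.execute("CREATE TABLE status (name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "status TEXT REFERENCES status(name))")
conn.execute("INSERT INTO status VALUES ('open'), ('closed')")

conn.execute("INSERT INTO orders VALUES (1, 'open')")  # allowed: 'open' exists in status
try:
    conn.execute("INSERT INTO orders VALUES (2, 'bogus')")  # rejected: not in status
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The enforcement happens in the datastore, so it holds regardless of which application, script, or manual session does the insert.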


I struggle to understand this viewpoint. In my experience business rules are harder to express in SQL than in practically any first-class programming language. "Is this value one of this list of values" - whether that list is hardcoded or dynamically obtained - is completely trivial.

(Of course if you apply some double standard where editing your "source code" requires multiple approvals whereas changing your "database" may be done at will in production then you'll find SQL logic easier to adjust, but that's a reflection of your policies rather than any fundamental reality.)


You're arguing that business rules are easier to express in other programming language, but the point of the parent post (and SQL) is that business rules are easier to enforce in SQL.

It's trivial to write some code verifying "Is this value one of this list of values", but writing such code does not ensure that this constraint will actually be met in 100% of your past and future data.

It's very difficult to guarantee that a business constraint expressed in your client app is actually enforced - there can be different versions of your app applying the constraint differently, there can be multiple apps accessing the datastore, there could be manual interventions in various ways to the datastore, the app could have code paths that may cause data insertion or alterations without running that verification in certain conditions, etc; so for all intents and purposes you can't really rely on that constraint.


> It's very difficult to guarantee that a business constraint expressed in your client app is actually enforced - there can be different versions of your app applying the constraint differently, there can be multiple apps accessing the datastore, there could be manual interventions in various ways to the datastore, the app could have code paths that may cause data insertion or alterations without running that verification in certain conditions, etc; so for all intents and purposes you can't really rely on that constraint.

If you have business logic in your datastore then that puts you in much the same position though. If you have a trigger you might have data written before it was introduced, or queries that ran with it disabled. If you're gradually rolling out a new version then you might have some data that follows a constraint and some that doesn't. And so on.


For triggers, you can run a procedure that executes it for all past data. Or write a procedure that updates the data to a valid state.

When gradually rolling out, you can have adjusted stored procedures that deal with the different versions, and turn off the old one when it is no longer in use.

So, I fail to see the problems you mentioned.


And with good practices (mainly a decent type system that lets you distinguish between checked and unchecked values) you'll have no problems with constraints in the application layer either.


As long as your application layer is a single homogeneous application running the same code.

That is approximately never the case. Current widespread practice is moving away from it with microservices, and old-fashioned practices of mixing customized and off-the-shelf tools basically forbid it.


Current practice is moving away from the idea of having your storage layer be a single homogeneous datastore as well.


So where are you proposing the consensus be placed? A lot (I'd argue most) applications need consensus for them to be stable and reliable in the long term, so either you place it in the datastore or in the logic processing. Where else would you place it?


I prefer to see the whole system as a succession of stream transformations (https://www.confluent.io/blog/turning-the-database-inside-ou...). If you view the sequence of input events as first-class and the "current state of the world" as derived, then a lot of problems go away. You need a datastore that can give you a consistent answer as to what order events occurred in, but it's a lot easier to make appending to an append-only list atomic than to make arbitrary state computations ACID.

For downstream derived computations, basically you either make the causal relationship explicit, or accept that you have eventual consistency. The only case where you can have inconsistency is where you have a "diamond" in your computations (i.e. you compute B that's derived from A and C that's also derived from A, and you compute D that's derived from B and C). So you figure out the business implications and either accept it or eliminate the diamond (by computing (A, B) from A and (B,C) from (A, B) instead of computing B and C separately from A). You will also get inconsistency if you do some totally ad-hoc query that's not part of your existing pipelines, but usually that's the kind of reporting query that doesn't need to be 100% consistent; if you do need a consistent version of that then the best approach is to take a regular snapshot, which can be consistent.

Basically you have a lot more precise control, you have causal relationships where you have explicit dependencies, so you have the level of consistency that you need for all your operational stuff, but you don't have a globally consistent realtime view of everything. It may take more work up front, but IME the notion of that kind of global consistency is a lie in a distributed world; even if you just have a basic webapp then you can't actually achieve the kind of consistency that that model pretends you have, because what the user is viewing in their browser (and then potentially making changes based on) at any given time is not necessarily the same as what's in the central database.

To get back to the question, this means that if inconsistency is due to an actual bug, it's pretty easy to solve: fix the bug and then regenerate from the original events. If you don't understand why data is inconsistent then you can always look back to the source events and figure out what it should be.
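A toy sketch of that view (event shapes and the apply rules are invented for illustration): the append-only log is the source of truth, and the "current state of the world" is just a fold over it, so fixing a bug and replaying the events regenerates consistent state:

```python
from functools import reduce

# Hypothetical event log; shapes are invented for illustration.
events = [
    {"type": "deposit", "account": "a", "amount": 100},
    {"type": "deposit", "account": "b", "amount": 50},
    {"type": "transfer", "src": "a", "dst": "b", "amount": 30},
]

def apply(state, event):
    """Pure transition function: old state + one event -> new state."""
    s = dict(state)
    if event["type"] == "deposit":
        s[event["account"]] = s.get(event["account"], 0) + event["amount"]
    elif event["type"] == "transfer":
        s[event["src"]] -= event["amount"]
        s[event["dst"]] = s.get(event["dst"], 0) + event["amount"]
    return s

# Derived state: never stored as truth, always recomputable from the log.
state = reduce(apply, events, {})
print(state)  # {'a': 70, 'b': 80}
```

Because the fold is deterministic, replaying the same log always yields the same state, which is what makes "fix the bug and regenerate" possible.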


I don't think we'll come to an agreement in regards to that, so, I am gonna voice my disagreement and move on.

It was informational that I saw your viewpoint, even though I disagree. In the future I will definitely try and reevaluate my viewpoints to see if maybe I am wrong after all.


With SQL you can enforce it within the datastore. And if some requirement changes, you can instantly enforce it. Eg.: A new foreign key constraint.

Also: It is centrally managed, so you don't need to look all over the source code to find the constraints and don't need duplicates either, if you can, for example,change the data at multiple points.


> With SQL you can enforce it within the datastore. And if some requirement changes, you can instantly enforce it. Eg.: A new foreign key constraint.

Which is to say that you have little control over deployment. Adding a new foreign key constraint might block queries for an indeterminate amount of time while a new index is built, but you have no way to introspect or predict that at the SQL level (if you're lucky there might be a specific way to find out via the internals of your particular datastore implementation). PHP fans used to tout being able to just edit the code on the production server as an advantage, but most of us recognise it as the opposite.

> Also: It is centrally managed, so you don't need to look all over the source code to find the constraints and don't need duplicates either, if you can, for example,change the data at multiple points.

If your data is well architected so that you only have a single representation of each piece of data, sure. Equally if your application is well architected you can have a single API where any given thing happens.


We use exclusively stored procedures, so if we want to add a FK constraint while the application doesn't adhere to it, we do one of two things:

Either add it to the application and push out a new version. Or handle it within the stored procedure.

Additionally, there would be the option of updating the database when pushing a new software update out.

> If your data is well architected so that you only have a single representation of each piece of data, sure. Equally if your application is well architected you can have a single API where any given thing happens.

I think I formulated it badly before. In this case I meant that we have one stored procedure for each thing we do. So that, if the underlying data changes for example, we only need to change the procedures directly accessing it. We can then call those procedures from multiple different parts of the code. And the procedures are all managed within one application, SSMS in our case, where you can have a much cleaner structure than within code.

In my experience, in PHP for example, you tend to write VERY similar/same queries in multiple locations, which can make it harder to find all instances.


It makes some things simple and insanely fast at runtime, at the cost of making other things impossible or if possible horribly awkward and just as slow as in any other programming language.


In those rare cases, you can do post processing on the data in code if need be. I personally don't see it fully as either/or but as mostly SQL with enhancing the data in rare cases via "traditional" code.


> (I don't think I've ever seen a web application that actually got any value out of database-level transactions, for example)

You only ever have one user in your web applications at a time?

> A different kind of datastore could offer you better performance and easier querying.

Citation needed.

The so called "NoSQL" systems manage to scale better because they have very constrained query models. So they are only "easier", perhaps, in the sense of supporting less functionality, but in most cases that leads to a big increase in complexity in the rest of the application code.


> You only ever have one user in your web applications at a time?

No, but database-level transactions don't help you deal with concurrent users. You don't keep a transaction open between showing the user an edit page and applying their changes (at least I hope you don't), your transactions can last at most through the web request-response cycle. So if you want actual transactional behaviour (e.g. a wiki with "another user is editing this page") you have to reimplement it yourself "in userspace".

> Citation needed.

SQL datastores are very open about the tradeoffs that they make for the sake of ACID isolation - the fact that the transaction isolation level is tuneable shows that, as does the fact that all serious SQL databases use MVCC.

Also MySQL had benchmarks showing that 75% of the time for a query by primary key was spent on parsing the SQL.

> The so called "NoSQL" systems manage to scale better because they have very constrained query models. So they are only "easier", perhaps, in the sense of supporting less functionality, but in most cases that leads to a big increase in complexity in the rest of the application code.

Not my experience, because the overwhelming majority of the time the code doesn't make any use of the complicated SQL functionality. I've literally never seen a cross join in use. I can count the number of times I've seen nontrivial aggregations in live code on one hand. The few times I've written recursive queries I found poor driver support and incompatibilities between different databases. So you pay the cost of flattening your data into the SQL table model, but most of the time you get no benefit from it.

There are pretty much three different kinds of queries: indexed lookups into raw data, indexed lookups into derived data (secondary indexes being a special case of this) and aggregations over a full table (table scans being a special case of this). A query model that represents those cases separately is easier to understand and work with, especially when it comes to understanding the performance. If your datastore supports server-side map-reduce style aggregations then you can literally execute arbitrary code in a "query", but you'll have a clear understanding that you're doing something different (with performance implications) from looking up a value by its indexed key.


You’re not thinking in sets if you’re constantly inclined to move to js/php/swift. Spend some time trying to break this mental barrier and you’ll find that SQL becomes easier and more sensible.


I had three college classes that really sealed this up for me: linear algebra, a set theory 'light' class, and a class on database mechanics itself. After that I was able to reason about mathematical sets, scalar functions, and what the database engine is doing.

If you "just learn SQL" without understanding the abstraction below it, it'll be difficult to be successful, much the same with anything else.


The problem for SQL is (apart from horrible syntax that makes any query refactoring tedious) that databases are very strict structures that, through their rigidity, make some operations very fast but some operations impossible (to perform in this fast manner).

When you are crafting a good SQL query (crafting is the right word, as each non-trivial SQL query is a little puzzle that may take you a few days to solve because of the constraints) you need to stay within the bounds of this fast db world.

Whenever you are forced to open a cursor or use a CTE for recursion or even do a full table scan, you have already left the fast land and landed in the world that all general-purpose languages inhabit, where you have to iterate and recurse and everything takes ages. And in that land any other language beats SQL, because any other language has syntax designed to make things easier in this world, while SQL has syntax that's just good enough for the fast world where operations are highly restricted; when it ventures into the slow world it's just a horrible mess.
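For a concrete taste of that "slow world", here is a recursive CTE walking a toy hierarchy (SQLite via Python; the schema and data are invented for illustration). It is expressible, but the syntax is a long way from a simple loop in a general-purpose language:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INT, boss INT)")  # boss is NULL for the root
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 2), (4, 2)])

# Walk the reporting chain from the root downward, one level per recursion step.
rows = conn.execute("""
    WITH RECURSIVE chain(id) AS (
        SELECT id FROM emp WHERE boss IS NULL
        UNION ALL
        SELECT emp.id FROM emp JOIN chain ON emp.boss = chain.id
    )
    SELECT id FROM chain ORDER BY id
""").fetchall()
print([r[0] for r in rows])  # [1, 2, 3, 4]
```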


An excellent point: there are so many other problems that deserve our attention regarding databases that talking about something that's worked really well for 50 years is kind of silly. We have bigger fish to fry.


Not only this, but resources GALORE, since it's been around longer than the internet and has been used by just about every developer. Java, Python, C#, Node, PHP etc... they all know SQL.


SQL was designed by COBOL / NATURAL people and was intended to be used by "normal people". That was part of a larger movement at the time to make programming more accessible. Unfortunately (a) that experiment mostly failed (b) the resulting syntax does not compose well at all (c) the syntax does not represent the underlying semantics and mathematical operations well at all.

The combined effect is a rather tortured language, as it has been extended over the years.

However, replacing it is equally problematic because of the huge installed base.


What is needed is a way to use something like the new language presented here against existing relational databases, preferably in a way that doesn't involve sending queries as plain text strings back to the database engine - query compilation should happen in the client library, and be sent to the database as some sort of bytecode.

Support for this bytecode should be added to Postgres, MySQL etc, not some new database product. Projects should be able to mix old style SQL queries and queries written in the new language at will.


> SQL was designed by COBOL / NATURAL people and was intended to be used by "normal people".

That's not how I remember it. What I recall is that SEQUEL was the result of looking at how databases and set theory could be connected.

https://en.wikipedia.org/wiki/Edgar_F._Codd

That had nothing whatsoever to do with COBOL or NATURAL.


There is a reason it was originally called SEQUEL:

  Structured __English__ Query Language
see the original paper at

   https://web.archive.org/web/20070926212100/http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf


I didn't say anything about the name. But you made some pretty strong claims about the provenance of SQL that have - as far as I know it - no bearing on reality. The groups behind NATURAL and COBOL had nothing to do with SQL afaik.


One word of caution to those trying to improve on SQL - for the many users of SQL, technology is a secondary aspect of their jobs. Hence any replacement runs into the issue that many of its core users do not have the bandwidth to spend significant effort on learning another querying language.

I will cheer everyone who tries to displace SQL, because I do think it needs to be displaced but would also want to caution such people on the magnitude of the task ahead of them.


I often think that a great way to displace SQL would be to create a language that compiles to SQL similar to how TypeScript compiles to Javascript.


You mean like an ORM?


ORMs I've worked with usually expect you to write the complex queries/joins yourself using a series of functions e.g. orm.select(table1).join(table2).on(keys).where(cond).group(by).having(cond).order(by). It's just SQL with extra steps.

An ORM that understands db schema, indices, and automatically decides where to use outer join or subqueries based on what would be most efficient given the specific schema would be fantastic. Basically an ORM that 95% of the cases comes up with the most efficient way to look up the data.
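A toy illustration of that chaining style (all names invented): the builder is a thin veneer that just concatenates the same SQL back together, which is why it feels like "SQL with extra steps":

```python
# Minimal query-builder sketch; does no escaping or optimization at all.
class Query:
    def __init__(self, table):
        self.parts = [f"SELECT * FROM {table}"]

    def where(self, cond):
        self.parts.append(f"WHERE {cond}")
        return self  # return self so calls chain

    def order(self, col):
        self.parts.append(f"ORDER BY {col}")
        return self

    def sql(self):
        return " ".join(self.parts)

print(Query("users").where("age > 21").order("name").sql())
# SELECT * FROM users WHERE age > 21 ORDER BY name
```

The smarter ORM the parent describes would have to go beyond this: inspecting the schema and indexes before deciding on joins vs. subqueries, rather than mechanically gluing clauses.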


At minimum some ORMs let you reorder things which makes SQL 50% better.

Smarter ORMs do what you want. Linq is magical.


> I will cheer everyone who tries to displace SQL

Why? What are better alternatives, really?


There's a non-trivial number of live running systems out there that interact with RDBMSes entirely through either ORMs or query builders, without a single line of programmer-written SQL. Those systems seem to hold up well.


Those don't replace SQL in the same way that typescript does not replace javascript. They are also often specific to a certain framework or language in a way that makes them infeasible to compete with SQL in a general way.


I for one would welcome a new alternative to sql. It might not be _this_ alternative, but why not try.

SQL is very hard to learn properly, with all of its gotchas and inconsistencies. There are running jokes about noobs truncating their tables after forgetting a WHERE clause. I’ve seen junior devs crying in tears and throwing their mice just because they needed to debug / optimise a complex query.

The mere existence of all the ORMs is a testament that people would opt to write (or use) insanely complex pieces of software just so they don’t have to deal with the lack of composition and ease of use.

All of those look to me like signs that something is wrong with the core itself. We could do better.

If we settled for good enough in all cases we wouldn’t have Go or Rust, React or Postgres. In fact every software that we have is a product of someone thinking “this is hard/wasteful/unexpressive/etc, lets write an alternative”, SQL included.

This alternative looks quite promising. We'll have to wait to see how they handle the edge cases, but the core looks a lot simpler to deal with than regular SQL.


> SQL is very hard to learn properly, with all of its gotchas and inconsistencies.

I don't understand why, though? I've been using SQL (Postgres for the most part, but with a smattering of MySQL thrown in) for around 8(?) years, which isn't much in the grand scheme of things, but I have not hit anything that couldn't be resolved. I've written everything from small, straightforward queries to ones over 200 LOC and never had a problem understanding them if you read them slowly / break them down into smaller queries.

In fact, ORMs have been a massive headache because I can think in SQL but not in whatever the creator of the ORM was thinking in. Those giant queries that I was talking about - there's no way to represent them in ORM form.

SQL works in a language agnostic way, you can explain analyze your SQL queries and run it through whatever medium you prefer. Typical experience with an ORM goes like so:

1. Lets use an ORM because it'll be easier

2. It's not actually easier and it's a complex mess now, but let's stick with it anyway

3. Figure out a way to log the queries that the ORM made up for you/printf it

4. Run that through EXPLAIN ANALYZE

5. Can't make the ORM do that, file a bug report that'll be buried

6. Use native query while you wait

7. Tech debt etc;


In my experience people who have trouble with ORM usually try to use it for something which it is not well suited for: namely OLAP.

ORM can really get in the way when you want to express groupings, aggregates, and all kind of joins sprinkled with let's say stored procedures.

But this has nothing to do with the ORM itself. It's just a fact that many people don't understand/consider the tradeoffs before jumping into acting on something.


> In fact, ORMs have been a massive headache because I can think in SQL but not in whatever the creator of the ORM was thinking in.

Yes, fully agree on ORMs. They DO have one nice feature though: simple CRUD operations are way less verbose than constructing SQL statements.

What we lack is better integration between the host language and the database. Constructing a prepared statement from a string, setting parameters, executing, fetching rows from the result set and mapping back to fields... it's all a major, repetitive PITA.

And yes, I find it easy to think in SQL and often wonder WTH an ORM is going to generate. Just recently I improved performance of an application by going from ORM to SQL: first I reduced the number of round-trips (ORM/efcore first wants you to fetch an entity before you can update it), second, I batched updates into a single session/transaction. Win! :) [Oh, and don't get me started ranting about ORM and transactions.]
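The batching win can be sketched like this (SQLite via Python; the schema is invented for illustration): one transaction plus one executemany, instead of a fetch-then-update round-trip per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, 0) for i in range(5)])

updates = [(10, 1), (20, 2), (30, 3)]  # (new qty, id) pairs
with conn:  # single transaction: commits on success, rolls back on error
    conn.executemany("UPDATE items SET qty = ? WHERE id = ?", updates)

print(conn.execute("SELECT qty FROM items WHERE id = 2").fetchone()[0])  # 20
```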


Just to let you know, those things are not needed at all if you use a different type of ORM, similar to ActiveRecord or Django ORM. It all depends on what you choose. So probably the first mistake was that you didn't do the homework of exploring ORMs maybe?


I'd rearrange the list a bit to what I see experienced devs do, and I don't see a problem with it:

1. Lets use an ORM because it'll be easier

(skipping 2, because it's not really a mess; skipping 3, because they know that before getting to this point)

4. Run that through EXPLAIN ANALYZE

5. Can't make the ORM do that

6. Use native query

7. Profit

There's nothing wrong with having `Users.active.find(id)` where you want it and writing out the complex query as SQL where it gets complicated. They can live next to each other just fine and still improve your life.

Going extreme in any direction is going to cause problems. (whether 100% ORM, or SQL purity) It's fine to use different approaches where they're appropriate.


> If we settled for good enough in all cases we wouldn’t have Go or Rust

funnily enough, golang is a regression on practically every front compared to established ecosystems like Java and C#.


> There are running jokes for noobs truncating their tables due to forgetting a where clause. I’ve seen junior devs crying in tears and throwing their mice just because they needed to debug / optimise a complex query.

Isn't this the same for most shells, including almost all linux distros? The CEO of Red Hat accidentally wiped his computer because he forgot a slash, but we don't throw out a whole tool just because it included a footgun.

The same is true for SQL.


I am a big fan of a relational model. SQL itself is OK but far from great. So, I wish you a lot of success!

I was part of a similar attempt - building a better "SQL" and relational DB. This was roughly 8 years ago. You can have a look at our GitHub projects, or look at some further links, and maybe you'll get inspired :)

* http://bandilab.github.io/ - introduction to the bandicoot project

* https://www.infoq.com/presentations/Bandicoot/ - presentation of the Bandicoot language on InfoQ

* https://github.com/ostap/comp - another interesting attempt, a query language based on a list comprehension


bandicoot looks interesting, and it feels spiritually related to a project I'm working on. I'm designing a programming language for board games: http://www.adama-lang.org/


Very interesting. I am not very familiar with many board games, but i'll definitely look at the language more!


I'm not convinced by a lot of the comments here focusing on SQL as an irreplaceable juggernaut. Currently I think Postgres is just fantastic but recent DBs have shown that even SQL can bleed.

There's some neat stuff here and I wish the project well. I would love to see object/hierarchical result set support grow. SQL ORMs feel so kludgy.


SQL ORMs feel kludgey because they are: entities in an RDBMS are normalized into their component parts and interconnected by a multitude of relations, so that you have multiple entry points to obtain different views of the same data and multiple ways to organise and prune results to find interconnections across multiple entities

once you reduce all that to object traversal, all those options are lost; your only entry point is the entity and the only connections are direct paths

it's not the underlying query language, it's the flattening to objects


> it's not the underlying query language,

SQL and its implementations do not support nested relations. The parent is suggesting that e.g. nested relations would enable better ORM solutions.


> nested relations

parent was right that hierarchical queries are not well supported, but that's not the same as outright using nested relations; also if you're not going to store data following normal forms for convenience no wonder sql is going to have a hard time querying it.

at which point you're better off with an object storage and a searchable index to the side anyway


SQL has a lot of incidental, accidental complexity, no doubt.

Though when I reason about SQL, I think mostly in terms of functional operators over streams of data: projection, filtering, flat-map, join, fold/reduce. Obviously optimization means looking through streams and seeing tables to find indexes etc., but once you get to the execution plan, you're firmly in a concrete world of data flow and streams of tuples.
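That mental model can be sketched directly (data invented for illustration): each relational operator is an ordinary functional operation over streams of tuples:

```python
# Toy relations as lists of tuples: (name, dept, salary) and (dept, building).
emps = [("alice", "eng", 100), ("bob", "eng", 90), ("carol", "sales", 80)]
depts = [("eng", "bldg1"), ("sales", "bldg2")]

# selection (filter) + projection (map)
eng_names = [name for (name, dept, _) in emps if dept == "eng"]

# join as a flat-map over both streams
joined = [(name, bldg) for (name, dept, _) in emps
                       for (d, bldg) in depts if dept == d]

# aggregation as a fold/reduce
total_pay = sum(sal for (_, _, sal) in emps)

print(eng_names, joined, total_pay)
```

The optimizer's job, in this view, is to pick an efficient physical realization (index lookups instead of the nested loops above) while preserving exactly these semantics.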

I didn't get on well with the example syntax in this write-up. It didn't mesh better with my mental model of relational algebra either at the logical or physical execution level - and the truth is you need a foot in both worlds to write good scalable SQL today.

Aside from the complexities of dynamic construction, my biggest problem with SQL is modal changes in query plans, owing to how declarative it is. It's a two-edged sword: the smart planner is great, up until it's stupid. And it usually turns stupid based on index statistics in production at random times.


Yes, SQL has flaws, and the article forgot to mention one of them: you need to build a string to build an SQL query, rather than a more structured object, leading to flaws like SQL injection vulnerabilities, and difficulties adjusting the query.

Let's say you're building a CRUD app with search and filtering capabilities. Unless you are using an ORM (which has problems of its own), you might be tempted to build the SQL query string like this:

  conditions = " AND ".join(filter_key + " = '" + filter_value + "'" for filter_key, filter_value in filter.items())
  order_by = column_name + " DESC"
  query = "SELECT col1 FROM tablename WHERE " + conditions + " ORDER BY " + order_by

But this has multiple SQL injection vulnerabilities. Doing it correctly is not just a matter of using SQL parameters, because column names need to be escaped differently than string literals. Linters can't distinguish between correctly escaped queries and incorrectly escaped queries in non-trivial cases. Also, the query will throw a syntax error if the number of filters is zero, since you can't have an empty WHERE clause.

I don't think a new query language solves this problem.


Isn't the solution a "query builder" pattern for your preferred programming language, that generates a SQL string and sends that to the database?


I think this is an unfair comparison. Constructing SQL query strings is risky, because it is essentially dynamically constructing and evaluating code from user input. However, there is no reason why one can't use parameters (like you mention) or even call stored procedures.
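For the value side at least, parameters do solve it. A minimal sketch (SQLite via Python; the schema is invented for illustration) showing that a hostile input stays data rather than becoming syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Classic injection payload; via a placeholder it is compared as a literal string.
hostile = "alice' OR '1'='1"
rows = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()
print(rows)  # [] -- the quote characters are data, not SQL syntax
```

As the grandparent notes, placeholders only cover values; dynamic column names still need separate identifier quoting, which is where query builders earn their keep.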


This problem would arise with every DSL. And it has been solved by using embedded DSLs.

jOOQ or SQLAlchemy look like SQL (and you don't even have to squint your eyes very much) and solve the problems you mention.


jOOQ and SQLAlchemy are libraries for Java and Python respectively. You can't take your knowledge of those libraries and use them in a different programming language. And you still need to know SQL well in order to do any debugging, because the database is still receiving an SQL query string.

What I am wishing for is for the language to be more like JSON, something that maps closely to commonly found structures in programming languages (like lists, objects, numbers, strings and booleans), and that the database can support natively.


> jOOQ and SQLAlchemy are libraries

While it is true, they are specifically embedded DSLs, which is exactly what you said you needed. Having a (query) language (a DSL) that is not composed by string concatenation. Embedded DSL solve exactly that problem and they are usually just translations from some DSL into concrete syntax in some programming language, ideally supporting type correctness and preventing syntax errors, etc..

And what you "wish for" is, again, just an embedded DSL, you only wish for you constructors/functions to be native instead of some import/module.


> the article forgot to mention one of them: you need to build a string to build an SQL query

IMO this is in the article mentioned as 'poor system cohesion':

> poor system cohesion — SQL does not integrate well enough with application languages and protocols

----

> I don't think a new query language solves this problem.

LINQ ?


You're right that a new query language doesn't solve this. Datalog is over 40 years old, after all.


And we're still researching how aggregation should work. While Datalog is really cool and my PhD research is based on it, the fixpoint semantics really do not work well with aggregation, grouping, etc.


SQL's fundamental misstep was to try to present the relational model in a form they thought was more comprehensible by a) using an "English-like" language and b) renaming core relational concepts (relation -> table/view, tuple -> row, attribute -> column, etc.) It has only muddied the waters and led to all sorts of misunderstandings and complaints ("but my data isn't tabular!")

The "English-like" syntax means that what is actually happening is obscured (so many misunderstandings of what "selection" is, for example), and it means that composing multiple operations gets very awkward and hard to read, and in fact many things that the relational algebra itself permits are not really expressible.

And renaming core concepts means people get confused. They don't understand what the "relation" in relational is, and think it's about relationships. They think SQL is all about tables, when tables are just one way of representing predicates. Etc. etc.

The relational model is a very elegant method for presenting facts about the world and then the relational algebra is a nice functional programming style system for slicing and dicing those facts into information in basically arbitrary and recomposable ways.

SQL has obscured that. It's awful.


> The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases.

I would dispute this. The antecedents of NoSQL were the parallel programming models of HPC. They weren’t specifically excluding SQL, and NoSQL was a term that was invented after the fact.


> The antecedents of NoSQL were the parallel programming models of HPC.

Can you elaborate on what you are thinking of? As a refresher, here's when and how (the current usage at least of) NoSQL was introduced: https://subscription.packtpub.com/book/big_data_and_business... in 2009.

> As Oskarsson had described, the meeting was about open source, distributed, non-relational databases, for anyone who had "… run into limitations with traditional relational databases…," with the aim of "… figuring out why these newfangled Dynamo clones and BigTables have become so popular lately."

I was using MongoDB at the time (we became one of their first paying customers -- they didn't even want to take money for support at first!) and HPC wasn't in the air. So please elaborate.

http://2009.drupalcamp.at/sessions/chx-session.html as far as I can remember this was my first MongoDB talk. It's been a long time ago.


I’m referring to Google’s 2004 MapReduce paper.

The functional style that MapReduce derives from had been used in parallel computing models, e.g. the scatter/compute/gather model of MPI, and in turn this was adopted by Hadoop, CouchDB, MongoDB and others.


Agreed. The name NoSQL succeeded because it was concise and attention grabbing but the main goal wasn’t about eliminating SQL.

A big part was that document-oriented databases like CouchDB and MongoDB made more sense for a lot of web-based use cases, where in the end you're serving a page of content. Building a relational model often makes little sense for the web and makes managing the content harder; that a lot of websites can be built with a static site generator highlights that.


I'm not sold.

I acknowledge that these are real issues, and commend the authors for attempting to address them. However, these issues rarely cause any real friction for me - I generally find SQL among the most ergonomic languages I use (regardless of dialect).


Started reading believing the article would be utter nonsense but by the end was convinced they might be onto something. My hat goes off to them if they pull this off. It will take a lot to push SQL out of its stronghold.


Datalog is superior to SQL in every single way, except that very few people know it: http://www.learndatalogtoday.org/


SQL is definitely not perfect but it's supported everywhere and integrates with everything, so you can augment it with other languages. That fact makes a lot of the critiques sort of moot.


Yup. The fact that it has been around for 46 years and dominant in its problem space for most of that time is pretty compelling too. In the 90s it was going to be replaced by object databases, said the hype. In the aughts it was up against Mongo and Couch and the like. The pretenders to throne keep coming and going.


I see a lot of Stockholm syndrome in this thread / maybe low expectations.

Many people are saying SQL isn't that hard to learn but as someone who is new to SQL, I disagree.

It takes a max of 15 minutes to understand basic JavaScript/Go/Python primitives and write a program. SQL on the other hand seems much more complex. I might as well be reading Haskell or Lisp. At least those languages are consistent.

SQL does not feel like a language where I can learn a few primitives really quickly and compose them together.


People who learn the nitty-gritty details and all the gotchas and tricks in the book appear to be experts, and their job positions depend on it. It's all a bit "ludditic": people confuse this familiarity with arcane commands and proficiency with real expertise and deep knowledge. When you challenge that, you challenge their existence. Knowing how null behaves (null = null vs. null is null) is one of those shibboleths that differentiates an SQL wizard from a lowly commoner, like knowing all the nuances of prototype inheritance makes one a true JS champion. These people charge a pretty penny to optimize badly written SQL queries and fix your joins. And you want to take that away from them.


At the other side of the fence we have JS/TS experts writing inscrutable one-liners by chaining 15 lodash commands.

Having tried both SQL and the approach of using a programming language + framework of the day, I prefer SQL for data manipulation. It's far easier to troubleshoot, scale, hand-over or maintain in the long run.


> It takes a max of 15 minutes to understand basic JavaScript/Go/Python primitives and write a program.

Basic SQL can be learnt just as quickly, if not quicker, I'd say, as it is close to English in comparison with other languages. IMO the hardest parts are stuff like pivots and cursors, along with performance problems in complex queries.

I personally wrote my first few queries within 30 minutes of starting to learn it.[1] Of course it wasn't particularly good SQL, but workable enough.

[1]Basically got an apprenticeship and was almost instantly told to write some queries.


Sorry, but what is hard to understand about

SELECT <columns> FROM <table> WHERE <column> = <value>


Since it's so easy, I think you won't have a problem with any of the questions on this page: https://www.toptal.com/sql/interview-questions


Correct, but I have years of experience with SQL so I feel like I'm cheating a bit.


As I read this I wondered:

"Has anyone fixed these problems elsewhere?"

Then:

>The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases. Unfortunately, in the pursuit of ditching SQL, the NoSQL approaches also abandoned the relational model and other good parts of RDBMSes.

Yeah that's what I was thinking, they really don't fix the issues listed, just have chosen to solve other problems, but not in a "going to fix SQL" kind of way.


> The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases

I'm a little bit young, but isn't this a bit of a revisionist take by the author?

I thought that Amazon, Google, FB et al moved away from relational databases because the sharding logic they needed to build on top of these databases was approaching the complexity of a RDBMS. They didn't need strong consistency or support for complex queries, on the kind of data they were storing at scale, and so made compromises in those areas while engineering their purpose-built alternatives (Dynamo, BigTable, Cassandra).

It's not that SQL didn't work, but that its persistence guarantees were too strong and therefore too slow for their very particular needs. It's like comparing a minivan/SUV (MySQL/Postgres) to a drag racer (NoSQL databases). You don't want to drive your kids to soccer practice in a two-seater with no airbags, and a 5-star crash safety rating isn't as important to the pink-slip racers as horsepower and 0-60.

Or am I missing something?


> I thought that Amazon, Google, FB et al moved away from relational databases because the sharding logic they needed to build

Note that these companies did not move away from relational DBs until long after the "NoSQL is Web Scale" video. Yes, Google invented Big Table to help power search (and others), but their revenue system, AdWords, didn't move off MySQL until like 2015. And last I checked, Facebook is still a heavy user of MySQL with sharding.

The original NoSQL software had two major value adds: you didn't have to learn a new language, and it was faster (typically via disabling fsync -- the DBA equivalent of running with scissors). If you knew SQL or an ORM already, you were really just hoping MongoDB was faster magically.

These days you can even just tune pgsql to support kv store formats: https://www.postgresql.org/docs/9.1/hstore.html. Yes, you'll have to pay someone to know how to DBA pgsql, or pay AWS to pay someone, but I'm comfortable paying that price.


>These days you can even just tune pgsql to support kv store formats

And you can turn off fsync! Though if you do, disable synchronous_commit instead for most of the performance but none of the potential data corruption (you're still risking data loss, of course, just not corruption).


SQL RDBMSes solve a lot of problems / provide a lot of services: locking & concurrency control, computation, consistency, replication, transactions, etc.

Not all of those services can scale horizontally. Concurrency control most especially, but other things which assert global invariants are often too expensive. Most NoSQL systems remove some of these features in order to scale horizontally. But the problems they solved remain, and need new solutions. This has two effects: it forces clients to do more which hopefully means doing less complex stuff (you can write mega expensive computations in SQL where the nested loops might offend you in handwritten code); and it means higher risk of bugs and more engineering effort for correctness (e.g. transactions in application, eventual consistency, reimplementation of transaction log in queuing systems, etc.).

In transport analogies, RDBMS is like a lift helicopter, NoSQL is like a fleet of container ships. Or RDBMS is like a car, and NoSQL is like a train network. NoSQL is inflexible and needs lots of extra attention at the edges, while RDBMS needs a careful operator and doesn't work well beyond a certain scale unless you give everyone (or subgroups) their own instance (which could be sharding, it can work).

And if you don't have a really big problem, NoSQL is probably the wrong choice: not because it's fast with few safety checks, but because most NoSQL systems do very little for you; they just do that little scaled out.

If you're not scaling out, stuffing JSON into Postgres will give you a better experience even if you hate relational algebra.


> I thought that Amazon, Google, FB et al moved away from relational databases

Surprisingly, Spanner has tables with columns and you can run SQL on top of it.


Though notably the SQL interface was not present initially and was added later.


Isn't NoSQL at this point used in 99% of the cases for basically analytics on logs, especially for ads?


That is probably true. But I'd also suspect that in 99% of cases some form of that data (after lots of post-processing) ends up back in a data store where all kinds of people use SQL to analyze it. Two main reasons: 1) joining it with other data, and 2) SQL being so widely understood across functions / roles.


I love SQL. I HATE using Elasticsearch because it doesn't support SQL. If you ever used Elasticsearch, you know exactly the nightmares I am talking about.

Products support SQL because everyone knows it and it works, even if it isn't perfect. Trying to create a new version of SQL means giving up the millions of trained users who already know how to use your product.


Agree. Building elastic query AST is a PITA. Check out https://www.elastic.co/what-is/elasticsearch-sql


The last technology that attempted this task was NoSQL, and we all know how that ended.

First, people realized that schemas (just like static types) are extraordinarily important for robust software.

Second, NoSQL lost to SQL over the long run in pretty much all dimensions: query language, performance, scalability, concision, etc...

As a result, not only is NoSQL on the way out, but SQL databases have actually become better at supporting NoSQL features than any NoSQL database.

SQL didn't just win, it absorbed its opponent and became even better as a result. Never underestimate the versatility and adaptiveness of a technology.


> The last technology that attempted this task was NoSQL, and we all know how that ended.

Do you mean billion dollar companies?

Don't get me wrong, wouldn't go near the popular NoSQL databases I've used in the past again, but I sure wish I invented them.


If NoSQL caused SQL databases to improve, then I'd argue it was a success.

Maybe EdgeQL can have the same success by demonstrating improvements that can be added to SQL databases.


I do not really agree with the two cons the authors listed.

Null handling isn't intuitive in the beginning, but it makes it harder to let missing data go unnoticed.
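For readers who haven't hit it yet, SQL's NULL semantics are three-valued logic; here is a rough sketch in Python, using None to stand in for UNKNOWN (illustrative only, real engines implement this internally):

```python
# SQL's three-valued logic, sketched with Python's None standing in
# for SQL's UNKNOWN truth value. Illustrative, not any engine's code.

def sql_eq(a, b):
    # Any comparison involving NULL yields UNKNOWN, not True or False.
    if a is None or b is None:
        return None
    return a == b

def sql_not(x):
    # NOT UNKNOWN is still UNKNOWN.
    return None if x is None else (not x)

# This is why NULL = NULL is not true, and why IS NULL exists:
print(sql_eq(None, None))        # None (UNKNOWN), not True
print(sql_not(sql_eq(1, None)))  # None, since NOT UNKNOWN is UNKNOWN
print(sql_eq(1, 1))              # True
```

A WHERE clause keeps only rows whose predicate evaluates to True, so UNKNOWN rows are filtered out, which is the behaviour that most often surprises newcomers.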

The expression / table thing can be solved like we solved it in OctoSQL[0], and I think others have solved it in a similar way. Whenever you have more than a single scalar value in expression position, just create a tuple, or tuple of tuples out of it, which does act like a single value.

[0]: https://github.com/cube2222/octosql


If I could change SQL, I would flip the ordering in the syntax.

Instead of "SELECT ... FROM ... WHERE ..."

I would change it to: "FROM ... WHERE ... SELECT ..."

And you get the idea...
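This is essentially the ordering C#'s LINQ chose, and it matches the *logical* evaluation order SQL already uses internally (FROM, then WHERE, then SELECT). A pipeline sketch in Python with made-up data:

```python
# SQL already *evaluates* in FROM -> WHERE -> SELECT order; the proposed
# syntax just makes the written order match. Illustrative data only.
orders = [
    {"id": 1, "total": 250},
    {"id": 2, "total": 40},
    {"id": 3, "total": 120},
]

rows = orders                                 # FROM orders
rows = [o for o in rows if o["total"] > 100]  # WHERE total > 100
result = [o["id"] for o in rows]              # SELECT id

print(result)  # [1, 3]
```

Reading the pipeline top to bottom is arguably easier than mentally reordering SELECT-first syntax.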


Can you model this in EdgeQL? https://developer.mongodb.com/community/forums/t/is-this-que...

An area of curiosity at the moment, as I too agree that SQL is a poor fit, even if the better DSL inputs eventually get reduced to SQL command text and parameter arrays.


> Can you model this in EdgeQL?

Absolutely!

  WITH
    april := <datetime>'2020-04-01T00:00+00',

    NewCustomers := (
      SELECT Customer
      FILTER
        NOT EXISTS (
          SELECT .orders
          FILTER .date < april
        )
    ),

    AprilCustomers := (
      SELECT Customer
      FILTER
        datetime_truncate(.orders.date, 'months') = april
    ),

    NewAprilCustomers := (
      SELECT AprilCustomers
      FILTER AprilCustomers IN NewCustomers
    )

  SELECT
    (count(NewAprilCustomers) / count(AprilCustomers)) * 100;
This assumes the following schema:

  type Order_ {
     property date -> datetime;
  }

  type Customer {
     multi link orders -> Order_;
  }


Thanks. I'm going to dig in a bit more. I've been sold by the above and the homepage.. :)


This was written on my mobile, so haven't had a chance to test it, but here's my first pass at modelling it in SQL:

  select 100.0 * sum(case when prev_cust.cust_id is null then 1 else 0 end) / sum(april_cust_count) as pc_new_cust
  from (
    /* get unique customers in April */
    select distinct cust_id,
        1 as april_cust_count
    from orders
    where order_date between date '2020-04-01' and date '2020-04-30'
  ) as april_cust
  left join
  (
    /* get customers with a transaction prior to April */
    select distinct cust_id
    from orders
    where order_date < date '2020-04-01'
  ) as prev_cust
  on april_cust.cust_id = prev_cust.cust_id
Apologies for the lack of code formatting... I find that when SQL is written with nice formatting (e.g. nested subqueries with indentation) it reads a whole lot better.


There was a bullet in there regarding "poor system cohesion — SQL does not integrate well enough with application languages and protocols." I was curious to hear the author's thoughts on this. I feel ORMs have mostly solved this even while introducing their own set of problems and learning curves. And when you're done fighting your ORM on those bitchy <= 1% queries, well, you just write SQL.


Completely agree regarding the complexity of various flavors of SQL. The thing is, this complexity came from legitimate need.

My team gets by with a very small subset of PostgreSQL functionality day to day because most of the stuff we're doing with our database is just not that complicated. Simple lookups, writes, joins when our applications interact with the database. Simple joins, grouping, aggregation when we personally interact with the database.

We are not confronted with the full complexity of PostgreSQL every day. And on the flip side, the database itself gives us killer functionality in the form of constraints and transactions. It offers a lot more, but this is all we care about an overwhelming majority of the time.

I am curious how you see it. Is there a compelling reason for a team like mine to leave their comfort zone to work with your new database? Does the new query language really solve any problems for Joe Sixpack, developer?


We could do a lot better than SQL. It's just that nearly every replacement falls into one of two traps:

A) Not being able to express relational semantics.

In general, every programming language replacing its predecessor allowed you to express about the same semantics, even if it wasn't natural to the new language.

One can write imperative Java/C++ even if the language doesn't like it, and successful functional languages allow escape hatches for mutable objects.

The various NoSQL languages typically fail this hard.

B) Not offering enough of an improvement.

Minor improvement isn't worth the 'yet another query language' burden. EdgeQL doesn't fall into A, but it may fall into B.

NULL is an annoyance, but not big enough to justify another language. Throwing an exception instead is a very double-edged sword. The author needs to show far more improvement to justify a new language (I'd have liked to see more examples of composability for instance).



thanks!


I feel like SQL is, in general, misunderstood. How often have I seen application developers apply the same principles and design patterns to an SQL database as they do in JS or C# or Rust: they use functions to encapsulate functionality, write sequential code, and use cursors where they aren't needed.

Then the language is blamed for not performing well or yielding different results than expected.

Some points in this article are valid, but I think the main issue is the general notion that SQL would be like JavaScript or C#. It is not; it is very different and needs a deep understanding, including the workings of the underlying DBMS, to perform well.

I guess I'm just not a fan of throwing new languages and tools at problems we identify, which seems to be a trend nowadays.


Overall I like the ideas behind it and the problems with SQL that it solves, at least on a superficially observational level. It could use some real brain bending examples for those of us that deal with difficult data sets.

The tasks needed to actually take over where SQL has left off seem absolutely monumental though. I can't help but think it will never get there, just like every other attempted query language out there.

In my opinion, there are two things that can make SQL 100% better, which wouldn't be a new language, but rather an update to the language standard:

1) algebraic data types, allowing us to get rid of terrible ternary null logic and more closely model real world data domains.

2) a really well thought out date/time API, along the lines of JSR310.


The problem I see with this is that the "What's wrong with SQL" is an excellent argument against SQL the language, but not SQL the engine. The optimal solution to that problem seem to be a syntactic skin over an existing SQL engine.


No.

Having worked with MongoDB for 5+ years and now back on PostgreSQL, I like SQL so much more than a custom query language. I do wish JSON were better integrated into SQL; it kind of feels like an add-on, not part of the core. Otherwise SQL is a fine language.

Also I can leverage 20y of SQL.


EdgeDB is built on postgresql and has excellent JSON integration.

https://edgedb.com/docs/datamodel/scalars/json#type::std::js...


I wish instead of

SELECT to_json('{"hello": "world"}');

it would be

SELECT {"hello": "world"};


> In EdgeQL every value is a set

This is trouble for the example for calculating the average number of reviews across movies:

   SELECT math::mean(
       Movie {
          description,
          number_of_reviews := count(.reviews)
      }.number_of_reviews
   );
Never mind, they are not sets:

> Strictly speaking, EdgeQL sets are multisets, as they do not require the elements to be unique.

The relational model is firmly based on the idea of a relation as a "set of tuples", and a major criticism of SQL has been that it views data as an ordered sequence of tuples.

So I'm skeptical of the claim that EdgeQL is really based on the relational model.

(Not clear whether multisets are ordered - wondering about window functions etc...)


Both EdgeQL and SQL disregard the RM proscription about duplicate tuples for practical reasons:

1. Elimination of duplicates from every projection is prohibitively expensive.

2. Sometimes you actually _want_ duplicates to show up without injecting a synthetic key into every projection.

3. There's DISTINCT.
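The bag-vs-set distinction is easy to sketch. Here's an illustrative Python rendering (hypothetical data) using `Counter` for multiset semantics:

```python
from collections import Counter

# A projection in SQL/EdgeQL behaves like a multiset (bag): duplicates
# survive unless you ask for DISTINCT. Data here is invented.
projected = ["drama", "drama", "comedy"]  # e.g. SELECT genre FROM movies

bag = Counter(projected)   # multiset semantics: duplicates kept, with counts
distinct = set(projected)  # DISTINCT: pure relational set semantics

print(bag)               # Counter({'drama': 2, 'comedy': 1})
print(sorted(distinct))  # ['comedy', 'drama']
```

Point 2 above is exactly the information the `Counter` keeps and the `set` throws away.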


I can see the improvement of their EdgeQL over SQL...

But (as I haven't read all of their blog posts) I am a bit more reluctant about the whole thing when they describe it as an ORM.

Can we leave the ORM and take the query language and implement this as a Postgres extension?


Exactly my question. If improving on SQL also means that I have to throw out all the maturity of Postgres, it's very unlikely to happen.

But if you can define a new query language that can be implemented by existing relational DBs, you might actually have a shot at displacing SQL.


I'm not doing SQL every day but i have rarely any issues with it.

Do we really need to do better than SQL? I don't think so and i also don't think that the chosen new syntax is better.

At the end of the day, most critical is not the language but understanding how it works to optimize indexes etc. If you are only able to write simple SQL because you are not good in SQL/Databases, you will not optimize your Database independently from the language.

If you are good in SQL/Databases, you (or at least i) do not care about syntax details; You just look it up, and get acquainted to your specific underlying Database.


I'm not doing C every day but i have rarely any issues with it.

Do we really need to do better than C? I don't think so and i also don't think that the chosen new syntax is better.

At the end of the day, most critical is not the language but understanding how it works to optimize assembly etc. If you are only able to write simple C because you are not good in C/algorithms, you will not optimize your algorithms independently from the language.

If you are good in C/algorithms, you (or at least i) do not care about syntax details; You just look it up, and get acquainted to your specific underlying microarchitecture.


I see my comment in the context of using SQL for your queries while your main focus as a developer (in Java etc.) is not writing SQL.

Also, the article does state 'we can do better than SQL', and I do have a certain amount of practical experience behind my personal opinion that their approach is not actually better than SQL.

They did show fairly average examples; examples which lead me to assume certain things, like where they would like to replace SQL.


I've grown pretty fond of the way Spark SQL queries can be represented with DataFrame operations. There is a more or less 1:1 relationship with SQL, except the commands can be programatically generated and composed. It sure beats stitching together a SQL string when you have a bunch of query clauses that might be optional, or need a generalized way to take 30 result columns, and rename them with a prefix or something.

e.g. result = dframe.select(*[f.col(colname).alias(f"{colname}_old") for colname in dframe.columns]).join(other_df, 'joincolumn', type='outer')

and so forth.


I used to use SAS macros to conditionally generate monstrous SQL queries all the time. It was a bit hacky, but man I could make a thousand lines of SAS do just about whatever I wanted however I wanted. It really feels like a powerful way to tackle messy real-life business logic.


I often find SQL is extremely good for explaining what you are trying to do.

It is of course a declarative language, but more than that it does what a good language should do: explain in both directions.

Languages need to tell the machine what to do, and to tell the person reading the code what it was the original author was supposed to be doing. Many bugs happen when the two don’t match up, and the maintainer is often not the person who wrote the original code.

Well written and handwritten SQL is some of the least commented code I’ve seen because it doesn’t need comments — it is self explanatory.


Mostly, unless the author was enamoured of subqueries, or believes that the only acceptable name for a CTE is cte1, 2 etc.

But yeah, as long as you know what the underlying tables look like and what they enforce, then SQL is really easy to maintain and understand.


I feel like a lot of these concerns are resolved in tools like R's dplyr. You use mostly the same R code, whether your data is a data.frame, or living in a SQL source. dplyr generates the SQL query for you.

By reading the queries it generates it's quick to pick up how the SQL works! Another big advantage is that you can always pull the data into R and have a ton of general purpose tools available.

dplyr is aimed at data analysis though, so there may be other use-cases for EdgeDB?


I've always wondered if a newer language could be designed with ANSI SQL as a transpilation target, or each of the vendor SQLs as targets. Optimization of queries would be a huge problem, but it always seemed like the only way it would be possible to break out of the SQL hegemony, i.e. first transpile, then start developing native support in the open source databases, then pressure the proprietary databases to adopt native support.
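To show the shape of the idea, here's a toy of the "transpile to SQL" approach: a tiny invented FROM/WHERE/SELECT syntax compiled to ANSI-ish SQL. The syntax and function names are made up for illustration; a real effort would need a proper parser and a story for query optimization:

```python
import re

def transpile(query: str) -> str:
    # Parse an invented "from T [where C] select X" mini-language
    # and emit the equivalent SQL string. Illustration only.
    m = re.match(
        r"from\s+(\w+)(?:\s+where\s+(.+?))?\s+select\s+(.+)$",
        query.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"cannot parse: {query!r}")
    table, cond, cols = m.groups()
    sql = f"SELECT {cols} FROM {table}"
    if cond:
        sql += f" WHERE {cond}"
    return sql

print(transpile("from users where age > 30 select name, email"))
# SELECT name, email FROM users WHERE age > 30
```

The hard part, as noted above, isn't the syntax translation but preserving the target engine's ability to optimize what comes out.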


That's what frameworks like Hibernate or Doctrine do, they have their own object oriented query language that compiles to SQL.


Well, that's an object oriented design, and it's not really a full language in that it doesn't have its own syntax. I'm talking about a relational language with its own syntax.


> and it's not really a full language

Both DQL and HQL are complete query languages.


I've had similar thoughts to the title, but my ideas were more around improving the ease of use/writing.

1. A cleaner universal more natural syntax for analytics: I love writing python as it is to me such a cleaner syntax than C or Java. We could do the same for SQL and make something that feels more natural. Turning a common query like

> SELECT count(*), TO_CHAR(created_at, 'YYYY-MM-DD') FROM Accounts GROUP BY TO_CHAR(created_at, 'YYYY-MM-DD') ORDER BY TO_CHAR(created_at, 'YYYY-MM-DD');

into something much more natural like

> count by Day(Accounts.created_at)

2. A Visual SQL: for analytics it's so much faster to query and explore visually. Building queries visually means you don't make common typo or syntax or structure errors, joins happen smoothly, you can browse the data as you build, you don't need to google for syntax (what's that date function again?), and it works across dialects and databases. We've built and launched this a few months ago at Chartio https://chartio.com/blog/why-we-made-sql-visual-and-how-we-f...


I might be ignorant here ... but why not have disc backed datastructures and have some sort of ECS style interface that is like "normal" programming?

In pseudocode:

    newtype id = int
    names = dict<id,string>()
    balances = dict<id,int>()
    credit_scores = dict<id,float>()
    
    function broke_customers():
      return balances.filter(balance => balance < 0).keys()
    
    function exploitable_customers():
      bs = balances.filter(balance => balance < 0)
      cs = credit_scores.filter(score => score > 100)
      return bs.intersect(cs).keys()
In the end, most queries are "just" set theory ... and having a very thin disc io layer allows to use the host language to process queried data on the fly.

It's very basic ... and does not address performant views, clustering, migration, etc ... but it's simple ... and does work well as demonstrated by the ECS systems in game development (which are an application of the concept).

(Sidenote: that's some messed up pseudocode ... I've been working with C#, Python and Haskell lately :-) )
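For what it's worth, the dict-per-component sketch above runs fine as plain Python (in-memory only; the disc-backed layer, which is the genuinely hard part, is elided, and all data is invented):

```python
# Runnable rendering of the dict-per-component (ECS-style) idea.
# In-memory only; persistence, views, clustering etc. are out of scope.

names = {1: "alice", 2: "bob", 3: "carol"}
balances = {1: -50, 2: 200, 3: -10}
credit_scores = {1: 120.0, 2: 90.0, 3: 640.0}

def broke_customers():
    return {cid for cid, balance in balances.items() if balance < 0}

def exploitable_customers():
    # Set intersection of the two filtered key sets.
    bs = {cid for cid, balance in balances.items() if balance < 0}
    cs = {cid for cid, score in credit_scores.items() if score > 100}
    return bs & cs

print(sorted(broke_customers()))        # [1, 3]
print(sorted(exploitable_customers()))  # [1, 3]
```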


Man, I'm so happy that DB hype is over and I can work with Postgres (which is ~35 years old) without anyone thinking it's the wrong choice.


The author seems to have overlooked the Lindy effect behind SQL. SQL, like other ubiquitous technologies such as JavaScript and Excel, hits the sweet spot of "satisficing" user needs. Few things have the longevity of SQL in programming, and the fact that it's battle-tested will always be a significant advantage over any new player.


I agree with the criticism of SQL here but to make yet another SQL but not SQL is not the way forward imo

I hope the writer reads http://www.learndatalogtoday.org/

In Clojure there are multiple databases that you can query by API, SQL and datalog


If by datalog with Clojure, you mean Cognitect's Datomic, yeah it's pretty friggin' cool. Really really liked it (well, the concept)... Until we started working out licensing and realized we were going down a rabbit hole of unclear future costs. They upped the cost on us in the middle of working on a project, their tooling was awful, they deprecated the REST API (which was a selling point for us they pushed) halfway through our efforts. We lost a lot of time, money (we paid for consulting work which was useless), and our product was delayed significantly. I know Stu will probably read this, but it's just the way it is. We also threw away Clojure then and never looked back.

Doh, that's not what this discussion was about. It's an old wound that won't heal. :-)


I've come to love SQL databases, and even writing raw SQL.

All the modern SQL databases are unbelievably powerful.


I sometimes don’t love sql especially when debugging a huge script of dynamic sql that was only partially scripted to generate the actual sql that is executed and there are between 1500 and 5000 lines to sort through. But you get used to it.


I think they make good points. This was written May 2019: how have they fared since then?


We're approaching the 1.0 release; latest blog post: https://edgedb.com/blog/edgedb-1-0-alpha-4-barnard-s-star


>>A language with good orthogonality is smaller, more consistent, and is easier to learn due to there being few exceptions from the overall set of rules (wikipedia btw)

table = (SELECT column | scalar expression FROM graph | table WHERE ... GROUP BY ... HAVING ... ORDER BY ...)

So in SQL, each scope is a table and that is the main primitive.

More metrics would be needed to criticize SQL orthogonality, instead of providing only one example of subqueries as scalar expressions, when they more generally produce tables.

Actually, SQL uses the same query syntax for scalars and for tables, and that could be seen as good orthogonality.


Very few people truly understand databases, and therefore very few people truly understand SQL. I would be suspicious of any “SQL replacement” that didn’t come from someone with many, many years in the field.


This comment is such an empty statement. The one plus one equals two kind. Are you implying something? Do you know if the authors have many or few years in the field? What does it mean to truly understand databases or SQL? To point out the vacuousness, let me rephrase this statement to come up with an equally profound message of my own:

Very few people truly understand CMS-es, and therefore very few people truly understand Wordpress. I would be suspicious of any “Wordpress replacement” that didn’t come from someone with many, many years in the field.


Good idea actually! One day one of these efforts will succeed.

As a front-end programmer for 7 years I feel I have a fine understanding of how a relational database and its queries can support my use case. I understand the basics well enough to advise the backenders. Anyway, SQL or any object-oriented abstraction on top of it gives me migraine.

Let the critics criticize. Most people mistake pragmatism (SQL) for sound solutions anyway. I do feel there is also a need for graphical editors. Yet it is much better to build a graphical editor that compiles to something with comprehensible syntax.

Good luck


I love SQL, but I liked the analysis in this article. However, the EdgeQL language that was proposed looked horrible to me. It looks more like a general purpose programming language than a query language.


+1 for SQL.

I never understood the need to rebuild a SQL-like solution bc SQL seems like the right answer already?

Between inner joins and SPs, what else could you possibly need for data?


I mean sure, probably possible, but I don’t really have a problem with SQL. I’m pretty sure the query dialect the EdgeDB guys are talking about will die with the company...


I feel like he is using the word Orthogonal incorrectly. I once had to describe a complex relationship where there was a hierarchy of objects, however instead of being a strictly up-down relationship there was an up, down and right relationship. From the little chemistry I've learned I was familiar with phenol and its bond positions: para, ortho, and meta. I defined the up object as para, the right object as ortho and the descending object as meta.


I think it's a mistake to design a language in 2020 with C-style curly bracket syntax.

It just looks like line noise compared to SQL, or whitespace significant languages.


SQL is also an unfortunate mix of relational calculus and relational algebra. The two can be shown to be equivalent.

While in calculus you declaratively describe the set of data you want and let the system figure out how to get it, in the algebra you describe how to construct the set you want.

With that in mind SELECT ... WHERE ... is calculus, UNION and JOIN, etc. are algebra.
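That equivalence can be shown on a toy example via sqlite3 (table `t` and its contents are made up): the same result set is described declaratively with WHERE (calculus style) and constructed with a set operation (algebra style).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# Calculus style: declare the property the rows must satisfy
calc = conn.execute("SELECT x FROM t WHERE x > 1 ORDER BY x").fetchall()

# Algebra style: construct the set by subtracting the unwanted rows
alg = conn.execute(
    "SELECT x FROM t EXCEPT SELECT x FROM t WHERE x <= 1 ORDER BY x"
).fetchall()

assert calc == alg == [(2,), (3,)]
```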

I like SQL, though.


EdgeSQL is no better than the SQL out there, and is arguably much worse as it introduces yet another version of SQL. However, in the end EdgeSQL will only succeed if it has a better query optimiser than Postgres, SQL Server and Oracle. That requires a significant amount of investment and somehow I can't see this happening.


> However, in the end EdgeSQL will only succeed if it has a better query optimiser than Postgres, SQL Server and Oracle

EdgeQL is the query language, EdgeDB is the engine. The latter does query optimisation. In theory EdgeQL could become popular with another engine.


I'm a senior dev that up until recently managed to avoid having SQL in my knowledge stack as we've always used no-sql databases.

I knew the basics, but I took a weekend to catch up on some more advanced use cases and I can really resonate with this article. Unsurprisingly I came to the conclusion that SQL is just a bad language, no matter how you look at it. It throws away every code flow standard in favor of its own nonsensical flavor. Where normal synchronous programs go top-to-bottom, SQL is a complete spaghetti of flow and logic.

Just take a look at the most basic syntax: `SELECT person.name FROM person` The variable is defined at the end of the program which is just absolutely silly, what if the program is 100 lines long; do I need to start reading from the bottom? SQL must be the reason mouse scroll wheels were invented.

As an alternative take a look at view based systems: `for person in people: yield person.name` — isn't it infinitely more understandable and readable?
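The comparison being made, roughly, as runnable Python (the `people` data is invented for illustration):

```python
# SQL style: the source appears after the projection
#   SELECT person.name FROM person

# "View based" style: source first, projection last, reads top-to-bottom
people = [{"name": "ann"}, {"name": "bob"}]

def names(people):
    for person in people:
        yield person["name"]

print(list(names(people)))  # ['ann', 'bob']
```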

Unfortunately it seems SQL is here to stay as most people would rather work with this mess rather than invest some time to adopt something better.


Speaking of SQL alternatives, Datalog seems to be another language that leverages relational model. I love how straightforward and expressive Datalog is.

I wonder, why Datalog is not very popular for databases as a query language? Is it because of performance optimization? Could anyone provide some insights?


Fads will come and go but SQL has staying power. It will be around for another 50 years me thinks.


Stupid question, slightly off-topic, but why does every web app need to be ready to scale and serve a billion users? I don't mean this in the pessimistic "be real, you are not the next Facebook" way. My question is: what do you think all that money is for, the money you are asking VCs for? Facebook was not ready for a million users in the beginning. You need old, stable and safe tech that will not crash your demo or provide a horrible user experience to important first-generation users. Every time an alpha breaks catastrophically it's because somebody ambitiously tried something not well documented or understood. People understand MySQL. Why would you add extra time and cost to your original budget? To prove what? I understand it makes the job less "job" and more fun, but is it truly worth experimenting while you are fighting for survival? If I put all the dead startups on my CV it would be 50 pages long.


It doesn't. Who claimed that?



Seems fitting, XKCD - Standards: https://xkcd.com/927/

The post has weird self faults in its complaints - lack of consistency, and poor system cohesion.

Lack of consistency isn't with the SQL standard, it's with the implementations (they recognize this later in the post too). Like browsers and the HTML spec - everyone implements the standard SLIGHTLY but non-trivially differently.

poor system cohesion - Being able to integrate/inter-op with every other language literally means poor system cohesion. They're mutually exclusive ideas. This is why ORMs exist, so there is strong cohesion with a given language.

I'm sad because there's so much work put in here by extremely smart people, but this is a tool looking for a problem. SQL has its shortcomings for sure, but simply replacing it with a slightly cleaner language is not the answer. The benefits do not outweigh the huge amounts of drawbacks that would be required to adopt something like this.

I don't know much about EdgeDB and hopefully there's a lot more benefit I'm not aware of, but purely on the post itself, too many drawbacks.


Judging by the comments here it seems lots of people suffer from Stockholm syndrome - me included. I'm so used to SQL that it's hard to imagine something better.

I wonder if we started from scratch today would we end up with something like SQL.


He didn't even show an example of a join in his EdgySQL.

This is a marketing article.


The ergonomics of a language I don't know are always worse than those of one I do know.

As long as there's nothing valuable in the technology itself why would I bother changing things?


> As long as there's nothing valuable in the technology itself

You're assuming there's nothing valuable. I'd say there is. Whether that's enough to displace SQL I really doubt.


Maybe there is. I didn't try it. But it's written rather explicitly "OK, so we have highlighted the shortcomings of SQL. Why do they matter? It’s all about ergonomics." That's what drove me off.


> "...It’s all about ergonomics." That's what drove me off.

I don't understand. Their critique seems eminently reasonable, what else should it have been about that would have not driven you away? (thanks for the answer, upvoted, it lets me try to understand your position).


For instance, we choose Riak+LevelDB some 8 years ago as our data store. It doesn't support SQL but it's fast and easily scalable. And free. I'm not sure if we could achieve the same speed and fault tolerance with any SQL solution available at that point. So the deal is, you give up your time invested in learning SQL and get some advantages in return. This is reasonable.

But if it's only about language, no additional perks, then it just isn't worth investing your time.

To continue the parallel with programming languages, there are tons of C-killers nobody knows about. I think roughly every third programmer in the world has tried to write a more ergonomic C at some point in their lives. And there is Rust. Which is not really about ergonomics but about memory safety combined with comparable performance. No more buffer overflows, no more Heartbleed. People see what they are trading their time on Earth for, and the language is getting traction because of that.

This is, of course, only my personal opinion but languages are just thin interfaces over technologies.


With respect, I think you're missing a lot. Sit with an expert SQL guy and you'll learn a great deal. Maybe not enough to change your mind but certainly enough to realise you're blocking out a really valuable technology.

> but languages are just thin interfaces over technologies

No. Not at all. SQL is derived from a mathematical basis. The principle came first, then the language and tech together developed to fulfil it.


But that doesn't contradict my point. The mathematical basis matters, sure, but the language on top of it (exactly because it derives from the math anyway) is mostly irrelevant.

I don't really care if I have to write SELECT or \forall or σ as long as they all have the same meaning. But I'd prefer to use SELECT because I already spent time learning how to work this way.

SQL as a language is fine. There is no real need to reinvent SQL unless you're going to reinvent the math behind it too.


2 points then;

1. Do you know SQL currently? Because your previous comments implied you don't

2. SQL is a bad implementation of the underlying relational model. And it's just bad in other ways. For one, it's bloody wordy:

  select * from people
but as SQL is an expression language, why not?

  people
From an answer of mine a few days ago (slightly modified here) to pick out differences:

  (
  select *
  from t1
  except
  select *  from t2
  )
  union 
  (
  select *
  from t2
  except
  select *
  from t1
  )
I'd prefer to write that crap as

(t1 \ t2) + (t2 \ t1)

or similar. If you write a lot of SQL, that matters.
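Both notations compute the symmetric difference; a sketch checking that with sqlite3 and Python sets (t1/t2 contents invented; the parenthesized compound operands are rewritten as derived tables, since SQLite does not accept parentheses directly around compound SELECTs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INT)")
conn.execute("CREATE TABLE t2 (x INT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,)])

# The verbose SQL version from the comment above
rows = conn.execute("""
    SELECT * FROM (SELECT x FROM t1 EXCEPT SELECT x FROM t2)
    UNION
    SELECT * FROM (SELECT x FROM t2 EXCEPT SELECT x FROM t1)
""").fetchall()

# The "(t1 \ t2) + (t2 \ t1)" version, as Python's set symmetric difference
sym = {1, 2} ^ {2, 3}

assert {x for (x,) in rows} == sym == {1, 3}
```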

SQL aggregates don't nest easily. Transactions are screwed up (https://news.ycombinator.com/item?id=23569513). SQL is poorly composable (https://news.ycombinator.com/item?id=23550420#23561301). For better or worse it has nulls therefore tristate logic. This causes errors! And more.

BTW. transactions don't work in the way you might think, atomically. Which is why MS added https://docs.microsoft.com/en-us/sql/t-sql/statements/set-xa... and I'm sure Pgres has similar. From the link:

"When SET XACT_ABORT is OFF, in some cases only the Transact-SQL statement that raised the error is rolled back and the transaction continues processing"

That peculiarity is from the standard BTW.

SQL is problematic. It could have been much better.


In my experience Apache Spark and PySpark is the most convenient way to make complex queries / to select, transform data


10x better is me telling my database what metrics and values I want. No-code, and probably not another language.


Oh man, have I ever heard this one before. No, we can't do better than SQL. SQL is powerful, and beautiful.


SQL would be the last thing that I give up for something else. It does a pretty good job for what it is used.


I love critiques of SQL about implementations of NULL. "NULL is so special that it's not equal to anything, not even itself!"

Like, duh. WTF should NULL be equal to?

Anytime I see people making this kind of argument about "doing better than SQL" I can immediately tell they are pretty much fucked in the head.

Good luck, edgedb peeps. You haven't got a clue.


> WTF should NULL be equal to?

In programming languages, NULL tends to be equal to itself. Why couldn't it be the same way in SQL?


No. That is not even close to being true. You're confusing None with NULL, like everyone else. Only a very few programming languages make this error.

NULL is undefined. It can't be equal or unequal to anything, including itself, for reasons that should be obvious.


> That is not even close to being true.

In C, NULL is equal to itself. In C++, nullptr is equal to itself. In Java, null is equal to itself. Same in C#. In JavaScript, null is equal to itself, and so is undefined.

> You're confusing None with NULL, like everyone else. Only a very few programming languages make this error.

Not so, as I've just shown.

SQL takes the philosophy that NULL isn't a value, but a marker for the absence of a value, and gives it special treatment so that it is not treated as equal to itself. Most programming languages do not take this approach, they instead treat null as a special value, special in that dereferencing it is disallowed, but it's still subject to the usual comparison rules (i.e. it's equal to itself).

Your contrasting of None against NULL isn't meaningful. They're just words. The semantics depend on the language.

> NULL is undefined

Depends on the language. In C it's defined as 0, roughly speaking. (Curiously the bit-pattern used to represent NULL is not required to be zero. [0])

As a curious aside, in C, the special float value NaN is not equal to itself.

[0] https://stackoverflow.com/a/9894047/
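The behaviors contrasted above can be checked directly; a sketch using sqlite3 for the SQL side (sqlite3 maps SQL NULL to Python's None):

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL, NULL = NULL is not TRUE; it evaluates to NULL ("unknown")
(sql_eq,) = conn.execute("SELECT NULL = NULL").fetchone()
print(sql_eq)  # None, not a boolean

# IS is the identity test that does treat NULL as matching NULL
(sql_is,) = conn.execute("SELECT NULL IS NULL").fetchone()
print(sql_is)  # 1, i.e. true

# In most programming languages the null value equals itself
print(None == None)  # True in Python

# Whereas IEEE 754 NaN is the value that is unequal to itself
print(math.nan == math.nan)  # False
```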


I am so completely unsurprised that people who don't understand the concept of undefined are telling me that it's actually defined.


Maybe if you introduce the concept of a mathematically-principled empty set, you can do away with the unprincipled concept of NULL. Empty set equals empty set.


Null isn't empty set.


If SQL and GraphQL had a baby...


The claim can be made that: We can do better than ANYthing.

The counter is: Why hasn't anyone yet?


.Net Linq is a developer friendly alternative that has a way better syntax.


How do you even do Select col from table group by grp in EdgeQL?


Narrator: we cannot


Wouldn't it be easier to just propose Datalog?


The only thing I would use other than SQL for querying would be the way pandas handles queries, or Datalog. All this other stuff is DOA.


People have tried for decades. Go nuts.


SQL is the COBOL of relational data


I do not know what the standard data query language of 2120 will look like, but I do know it is going to be called SQL.


(2019)


No, you can’t.

SQL is easy. Data is hard.


No, you can’t. Another competing standard against SQL will die trying to be SQL.


sql is the language of the gods


> There are many more cases like these, and there is no consistency in a single SQL implementation, let alone across all implementations

So you decided to create yet another incompatible "SQL"?

https://xkcd.com/927/

And quoting MySQL's broken handling of a division by zero as a reference. Seriously?


>sql alternative #1427904 that absolutely nobody will ever use

Love it or hate it, sql is a standard and there is a ton of knowledge (stackoverflow answers, books, tutorials) and tooling (query builders, orms). This is either hopelessly naive, or hopelessly arrogant.


Good post.

> Swift, Rust, Kotlin, Go, just to name a few, are great examples in the advancement of engineer ergonomics and productivity.

golang is definitely not an advancement in engineering ergonomics. It can't be grouped with the other mentioned languages.


I shall name it...NoSQL.


Data integrity, speed, replication: pick two. NoSQL gave up integrity. If you are fine with your bank account sometimes missing your salary, great. It's not quite the same as Twitter sometimes missing a comment.


Why continue to use the keyword SELECT? It's just annoying.


What's the alternative? FETCH?


Why is anything needed at all? You don't need a keyword preceding an expression in other languages.


Ah, so a new challenger arrives. Good luck overcoming lock-in.


Please leave it alone. I'm sure we can do better than English as a conversational language, knowing more about how people use it, but I don't want to learn English 2.0, because I already know English.


If you want people to take your replacement language seriously, lead with an RFC that's been reviewed by industry peers. We all know the status quo sucks, but it's the status quo for a reason.


Some comments here, in my opinion, are missing the point. Saying "SQL is good enough, no need for EdgeQL" is like saying "C/C++/Java/... is good enough, no need for Rust/Kotlin/...".

EdgeQL is an explicit attempt to make not a new version of SQL, but a new language. It's about evolution, which is, again, explicitly noted. Saying "evolution is not needed" or "I don't want/need evolution" is strange to me. At the same time, giving constructive feedback is useful, as always.


People have a love/hate relation with SQL. I think most software engineers are OK with it, some even find it a curious case to tumble down the rabbit hole of multiple JOINs or inner SELECTs.

But for most people out there who simply want to poke at their business data, SQL, and this article's stated solution, EdgeQL, are all pains in the wrong places. If you are making the life of engineers a bit easier, then you have to think of the millions of My/Pg SQL installations.

If you want to make the larger audiences' life easier (the business people who need insights), then you need to think outside of SQL altogether.


I was looking around yesterday to find alternatives to SQL and found nothing. Everything is in this ridiculous table-based model, the queries of which are complicated and error-prone.

It’s heresy to criticize SQL these days, or even to suggest that DBs could be easier and more robust. I envision a future DB language that offers perfect ease and safety with queries, the way Rust has shown us that the memory unsafety of C was optional all along.


Did you come across Datalog?



