We Can Do Better Than SQL (edgedb.com)
414 points by kristianp on Aug 11, 2020 | hide | past | favorite | 461 comments


It's pretty arrogant to complain about the syntax being inconsistent across versions and databases and then present your own weird offshoot, as if every other version wasn't introduced for the exact same reason with the exact same lofty delusions of grandeur...

SQL is messy because describing the underlying data relationships is messy. The orthogonality example is a great illustration of this. What exactly should the result be if there are multiple dept heads? Should the result rows be duplicated? It's not clear how EdgeQL would handle either (their EdgeQL orthogonality examples were constructed to only have one result per subquery), but it seems like they would be kept together as sets. In that case, the result set is no longer a table, it's a dataframe, which is a useful data structure but is also not what relational databases do.


> SQL is messy because describing the underlying data relationships is messy.

> SELECT extract(day from timestamp '2001-02-16 20:38:40');

SQL is messy because all the syntax was decided on before there was a community that really understood what good syntax is. The 'from' in that extract does nothing and I can't easily identify if extract is a function or some sort of crazy parsing construct - what are the arguments? Is "day from timestamp '2001-02-16 20:38:40'" the argument? Are the arguments "day", "timestamp" and "'2001-02-16 20:38:40'"?
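
For contrast, the same operation in a plain function-call style has unambiguous arguments. A small sketch using only Python's stdlib datetime (nothing SQL-specific):

```python
from datetime import datetime

# Same operation as extract(day from timestamp '2001-02-16 20:38:40'),
# but the structure is unambiguous: one function call, one attribute access.
ts = datetime.fromisoformat("2001-02-16 20:38:40")
print(ts.day)  # 16
```

There is no question here about what the arguments are or what the keywords mean.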

Are these annoyances crippling? Yes. Yes they are crippling. It should be possible for an amateur to quickly write an SQL validator as a starting project; relational algebra is not complicated. Any fool can write a validator for lisp. Relational algebra isn't that much more complicated - we don't have loops or flow control to contend with here.
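
To illustrate how little machinery a Lisp validator needs, here is a minimal nesting check in Python (a rough sketch: it ignores strings and comments):

```python
def is_balanced(src: str) -> bool:
    """Validate s-expression nesting: every '(' must close, and depth never goes negative."""
    depth = 0
    for ch in src:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' appeared before its '('
    return depth == 0

print(is_balanced("(+ 1 (* 2 3))"))  # True
print(is_balanced("(+ 1 (* 2 3)"))   # False
```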

Tidyverse's dplyr [0] implements the relational model for real dirty data and, as might be expected for something implemented this century, does a much better job than SQL. Not because the operations are that much different (although gather() & spread() are welcome additions) but through ingenious innovations like, as mentioned, functions having arguments instead of I-don't-even-know-what-that-is.

And the pipe operator which is legitimately ingenious. Great operator for data.

[0] https://dplyr.tidyverse.org/reference/index.html
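
The pipe idea itself fits in a few lines. A toy Python sketch (the helper name and example are invented; this is not dplyr's API):

```python
from functools import reduce

def pipe(value, *funcs):
    """Thread value through each function in turn, like value |> f |> g."""
    return reduce(lambda acc, f: f(acc), funcs, value)

result = pipe(
    [3, 1, 2],
    sorted,                           # [1, 2, 3]
    lambda xs: [x * 10 for x in xs],  # [10, 20, 30]
    sum,                              # 60
)
print(result)  # 60
```

Each step reads top-to-bottom in the order it runs, which is the property people like about pipelines.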


> It should be possible for an amateur to quickly write an SQL validator as a starting project

...why? SQL has been wildly paradigm-definingly useful for decades. It has driven hundreds of billions, perhaps trillions, of dollars of value. None of this hinges on the ability for an amateur to be able to write a validator for the language. It just seems like such a non-sequitur to me, such a strange thing to call out as a criticism.

SQL hasn't been wildly successful either because of or despite its syntax, it has been wildly successful because organizing data in the relational model and querying it declaratively is extremely powerful. The syntax is just not the interesting part. Any syntax that meets those criteria would do.


I second this emotion SO HARD. Is SQL weird? Sure. But (IMO) the hard part of implementing or using a SQL database is _not_ the syntax, it's the storage, query planning, and consistency. Don't like SQL? Use LINQ or one of the bazillion DSL libraries that "compile" to SQL. SQL databases are so unbelievably, extremely useful, and the minor syntax quirks between different DBs are a minor issue compared to the huge amount of commonality that they share.

Also FWIW, I'm not sure that writing a SQL validator _is_ all that hard. The grammars for any dialect are readily available and really not so hard to turn into an AST with any of your favorite tools (ANTLR, lex/yacc/bison, any of a million PEG parser generators). In many cases you don't even need to: we do static analysis of Spark SQL using their own parser, and for Postgres you can use the parser straight out of libpg.
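
In that spirit, one cheap way to get a working validator without writing a grammar is to borrow an embedded engine's parser. A sketch with Python's sqlite3 (it validates SQLite's dialect only, and referenced tables must exist in the scratch schema):

```python
import sqlite3

def validate_sql(query: str, schema: str = "") -> bool:
    """Ask SQLite to plan the query; a parse or analysis error means invalid."""
    conn = sqlite3.connect(":memory:")
    try:
        if schema:
            conn.executescript(schema)
        conn.execute("EXPLAIN " + query)  # plans the query without running it
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(validate_sql("SELECT 1 + 1"))  # True
print(validate_sql("SELEC 1"))       # False
print(validate_sql("SELECT x FROM t",
                   schema="CREATE TABLE t (x INTEGER);"))  # True
```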

I guess the OP's point is that if you can validate it easily, you can understand it. I'm certain I could write a lisp parser in 10 minutes, but I don't think that would mean I immediately understand all Clojure code (which is often pretty obtuse). You still need to learn all the underlying concepts and data structures.


Your last paragraph is a non sequitur. It is perfectly possible for the syntax to not be the interesting part, and for SQL to have been successful despite bad syntax. The underlying idea is brilliant, so the first syntax that exposed it sufficiently well got baked in as a path dependency, even though the syntax sucks.


The point I was trying to make was just that the syntax was irrelevant; it was neither a major advantage nor hindrance. I think this is the same as your point.


One thing I have experienced is that often times developers are coming from a world of object oriented programming, or just coming from a JavaScript world.

Then often the storage and retrieval of data is an afterthought during the development of an application. This is definitely true for CRUD apps. Then maybe the person used a framework to handle database interactions, and now, after a year or so, it's not scaling well because data access was not really given priority during development. So then the problem is the database engine or the syntax instead of proper design and planning.

That’s not to say there are no drawbacks to writing most dialects of SQL. I just believe most of the hate over SQL is coming from a procedural mindset to a relational one.


> It should be possible for an amateur to quickly write an SQL validator as a starting project

I think the point he is trying to make here is the same as the post was making concerning orthogonality. Having a smaller set of special syntax, and therefore an easier validator to write, means easier queries to write for the user. I don’t think he was implying that everyone who makes use of the language should know how to write a lexer for it.

> The syntax is just not the interesting part. Any syntax that meets those criteria would do.

I have to disagree with this. If this were the case we would still be programming everything in BASIC or C, because they're just another imperative programming language and they get the job done. Having syntactic sugar, a consistent language, etc. all makes it easier for a programmer (or data analyst) to get the job done more quickly (and therefore reduces cost), makes a program easier to maintain, and so on.


The syntax is not the interesting part for programming languages either. Programming languages now have better semantics - things like runtimes, concurrency, data encapsulation, etc. - in addition to better syntax. SQL is not a programming language, its semantics boil down to the relational model. Don't get me wrong, there is room for improvement in the syntax, I have plenty of gripes with it, I just don't agree that it is a fundamental problem. It's already super useful as it is.


Indeed, semantics is what matters, but I think, for example, that the lack of composability is a semantic problem too.

<snarky> The ability of SQL engines to manage persistence and correctness is very useful, while the language interface not so much. </snarky>


>It has driven hundreds of billions, perhaps trillions, of dollars of value.

The same can (almost) be said for Javascript. A language with a lot of foibles can still be successful if it is the only realistic mechanism to interact with the system.


No disagreement from me there! Though I personally like SQL more :)


>> It should be possible for an amateur to quickly write an SQL validator as a starting project

> ...why?

I don't know about amateurs, but if the typical intended user of the language does not have a good grasp of what is and is not valid, they will often be reduced to trying one thing after another until they hit upon something that seems to work, with no real understanding of what it does. This is not how robust, correct software is written.


That seems like a completely different point. (Unless I misunderstand what a "SQL validator" is, which is definitely possible.)

I do think this way of utilizing SQL is a problem, but I would lay the blame for it at the feet of a lamentably widespread anti-intellectualism in the field. It is true that the relational model is not trivial and must be learned before SQL makes much sense. But it is not too much to expect from someone who wants to be a practitioner in this field.


I agree that someone who wants to be a practitioner in this field should understand the relational model, but the issue here is that SQL introduces inconsistencies and complications that are not inherent in the relational model, and, to some extent are avoidable. A justifiable emphasis on professionalism does not absolve SQL of these faults and does not render irrelevant attempts to find better language choices.


Sure, I don't disagree. My point is that the perfect is the enemy of the good, and SQL is the good.


> all the syntax was decided on before there was a community that really understood what good syntax is

Here you got me thinking: do we even understand that now? There has been quite a bit of contemplation of semantics, data types, and different kinds of abstraction in programming languages for the last couple of generations already, but the general consensus about syntax is that "nice syntax is nice, but it isn't really that important". Most modern languages loosely follow either something C-like or Algol-like, and most people like what they are familiar with.

There are sometimes claims that a good syntax should be successfully parsed by some relatively simple parser (which is totally not obvious, TBH, because while it is clear why C++ is a good counter-example, we don't really have that many good examples: classic LR parsers or something like that are really not that powerful, and most real-world programming languages implement something very non-generic of their own). Some people claim that "good syntax is no syntax" (Lisp), and some say that Haskell has a good syntax (ikr). The bottom line being that this is just a matter of taste, unlike most of what we can say about types and data structures.

So, to summarize: I never actually heard a compelling general theory of good syntax.

This is completely offtopic, BTW, I agree that SQL is trash and it is even kinda funny that somebody tries to defend it like that, since for a long time it was sort of a textbook example of "why a language created with the idea to be used by non-technical people is a failure from the very beginning".


> So, to summarize: I never actually heard a compelling general theory of good syntax.

I started to think along these lines when I got serious about learning a foreign language. Humans appear to have some innate language ability that’s reflected, among other things, in commonalities between disparate languages. As far as I can tell, there’s been no serious effort to design a computer language to take advantage of this.

My pet theory about why the object.method(args) call syntax won is that it really is the most natural to read, as evidenced by the fact that most human languages follow subject-verb-object word order.


> My pet theory about why the object.method(args) call syntax won is that it really is the most natural to read, as evidenced by the fact that most human languages follow subject-verb-object word order.

Most western languages maybe, but according to wikipedia[1] the most common is actually SOV rather than SVO.

[1] https://en.wikipedia.org/wiki/Word_order#Distribution_of_wor...


Most common by number of languages maybe, but if we look at most common by number of people who speak the language, the top 3 most popular languages in the world (Mandarin, Spanish, English) are all SVO.


> [...] as evidenced by the fact that most human languages follow subject-verb-object word order.

Nope. [1]

[1] https://wals.info/chapter/81 (Note that while this is the language count only, the same comment applies to the population using such word order as a native tongue: SOV and SVO are both common.)


Though I program in languages that most follow the obj.method(args) pattern, I really prefer the args |> function |> function pattern and I wish it was the norm in every programming language.


In what language is that used? It reminds me of pipes in bash. I can see it being useful in circumstances where you have lots of function calls and fewer arguments (just like pipes), but I think it would look really ugly and hard to parse with a long arg list.


It's a common functional thing. I know F# has both `|>` and `<|` to go both directions. It also has `||>` and `|||>` variants which take two- or three-value tuples and spread them to arguments.

https://docs.microsoft.com/en-us/dotnet/fsharp/language-refe...


Elixir for one.

Yes it works best when you have one argument to pass, as the first argument is passed implicitly.

https://hexdocs.pm/elixir/Kernel.html#%7C%3E/2



See also Clojure's threading macros: https://clojure.org/guides/threading_macros


These are two different use cases. In the first case, which method the call actually dispatches to depends on obj, so it kinda makes sense for it to be syntactically distinct from other args. The pipeline syntax is normally used with free-standing functions that are statically dispatched.

Now, yes, there are languages with multimethods, where the distinction is less obvious. For those that always dispatch on all arguments, a uniform syntax makes more sense. CLOS is a good example.


The commonality you are referring to sounds like Chomsky's concept of Universal Grammar [1].

The object.method(args) syntax "won" because it offers a low barrier to entry from English to programming indeed, but as you become more and more experienced and shake off the imperative way of thinking in favour of declarative thinking, functional languages become more expressive than imperative languages. They become better at managing complexity because they allow you to build higher and higher towers of abstraction.

And then homoiconicity [2] shines because of its 'universally grammatic' property.

1. https://en.wikipedia.org/wiki/Universal_grammar

2. https://en.wikipedia.org/wiki/Homoiconicity


I was thinking on a more basic level than UG; just the presence of linguistic universals [1]. I agree that functional languages are in a certain way more expressive than imperative ones, but that seems mostly orthogonal to the syntax of the language.

I’m thinking about basic stuff like improper nouns, denoting a clause’s role by inflection or adposition instead of word order, and having a clearly defined spoken representation- none of this is specific to which abstract structures the language is describing.

https://en.wikipedia.org/wiki/Linguistic_universal


>The object.method(args) "won" because it is a low barrier to entry from English to Programming indeed

BASIC and C didn't have a concept of objects* , which didn't seem to hinder adoption at all. object.method() became popular when OOP languages became popular, and that was because IDEs offered better autocompletion with that syntax.

* We could emulate objects with C in some gnarly ways. Almost no one did at the time, excluding C-with-classes/C++.


It's not that simple, D has UFCS (Uniform Function Call Syntax), which makes object.method(args) and method(object, args) equivalent calls.

Also there's a non-language related advantage of dot notation. Type the name of the object, press dot and IDE will show you what things you can do with that object. You don't have that with other notations.


That assumes a very OO mindset. method(object, args)/function(data, parameters) works better in other languages, and IDEs can handle this quite well without dot syntax.

Which goes to your point-- form follows function, in syntax as with many other things.


Why is dot special? Fortran uses % (if I remember properly). IDEs can complete after a % as well as after a dot.


Dot is not special. Putting the receiver objects first is special, because it allows the IDE to only show verbs (methods) that pertain to that object. If you put the verb first, code completion lacks context, so it would necessarily have to show you all verbs that can apply to any object that could conceivably be acted on in that scope.


> SQL is messy because all the syntax was decided on before there was a community that really understood what good syntax is.

It's not that they didn't understand it. There are languages predating SQL that have better syntax.

But SQL was one of the so-called "4th generation programming languages", that were supposed to be operating at a much higher level. And because of that, there was the notion that, if they also have a more "natural" syntax, it would allow people who are not programmers to use them effectively. Hence why SQL was originally SEQUEL - Structured English Query Language. It was meant to be written by domain experts, not database experts.

Of course, that failed (like countless other similar attempts since then), and so now we're stuck with this syntax for no good reason whatsoever.


Please use pivot_wider and pivot_longer instead of gather and spread. The true horror, even admitted by the dplyr team ...


Oh, they're new. Great. Thanks for the tip.


As a point of minor historic curiosity, the pipe operator in R actually predates dplyr and was invented independently multiple times by various people who noticed that the Unix model translates well to data transformations in R (F# also has a pipe but I'm not aware of anybody using F# as an inspiration for the pipe in R).


Except that dplyr is horribly slow and memory inefficient, and (if you were minded to actually put R code into production) it more or less forces you to do data prep in a database anyway.


> SQL is messy because describing the underlying data relationships is messy.

No, the relational model is beautiful and consistent! SQL is messy because the syntax is not consistent and elegantly composable. It could have those properties and still present the same underlying data relationships.

See Linq in C# as an example for how a more composable query syntax can expose the same data model.

For example, in Linq you can chain arbitrarily many select/join/where/group by in arbitrary order. In SQL you need nested subqueries to achieve the same, which is a much more convoluted syntax.
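
A toy Python sketch of that chaining style (illustrative names, not Linq itself):

```python
class Query:
    """Minimal fluent query over a list of dicts, to show arbitrary-order chaining."""
    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, pred):
        return Query(r for r in self.rows if pred(r))

    def select(self, fn):
        return Query(fn(r) for r in self.rows)

    def to_list(self):
        return self.rows

employees = [
    {"name": "Ann", "dept": "Eng", "salary": 120},
    {"name": "Bob", "dept": "Eng", "salary": 100},
    {"name": "Cid", "dept": "Ops", "salary": 90},
]

# where/select can be chained in any order; no nested subqueries needed.
names = (Query(employees)
         .where(lambda e: e["dept"] == "Eng")
         .select(lambda e: e["name"])
         .to_list())
print(names)  # ['Ann', 'Bob']
```

Each operator takes the previous result and returns a new query, which is exactly the composability being praised.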


Slightly tangential, but I really like Linq syntax. The only problem I have with it is that it is really hard to debug if there is a logic error. I often see developers write Linq, find out the record set they get back is incorrect, and then break up the Linq query into a nested if clause to get what they want.


I've seen so many errors around orderby/limit, which most people want at the end, but Linq lets them be defined anywhere (for a good reason). This is probably the biggest problem with Linq: it's a chain of operators that are ORDER dependent!!


That's a quality of tooling issue. And in Visual Studio, for example, you can set breakpoints on individual subexpressions in the query.


Yeah I feel like that happens whenever you have a mapping from one language to another. Eventually you learn the intricacies of the transformation. There are very few situations where the Linq I write does not translate into the query I expect any more. But that took time to learn.

The only criticism I have of Linq is that it is a more general language targeting different back-ends including plain old objects. If Linq did not allow things that do not make sense in SQL this would be a little less jarring.

I also wish it supported full outer joins, unions, and window functions.


I was amazed the first time I used the MongoLinq driver. It makes sense intuitively, but the idea that you can use the same Linq expression and it will be translated to either Sql or MongoQuery based on the provider you send it to was amazing.

I was able to hire four or five contractors who had never used Mongo just by requiring Entity Framework/Linq experience and less than 4 hours of training.


>For example in Linq you can chain arbitrary many select/join/where/group by in arbitrary order. In SQL you need nested subqueries to achieve the same which is a much more convoluted syntax.

WITH statements alleviate some of this issue by allowing you to write subqueries in any order.


WITH (CTEs) make queries so much more readable and digestible. As a programmer who now does data and SQL, I latched on to these as soon as I found I could reduce repetition in a query with them.
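
For example, with Python's sqlite3 (the table and names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 50), (2, 150), (3, 200);
""")

# The CTE names an intermediate result once; the main query then reads top-down.
row = conn.execute("""
    WITH big_orders AS (
        SELECT id, amount FROM orders WHERE amount > 100
    )
    SELECT COUNT(*), SUM(amount) FROM big_orders
""").fetchone()
print(row)  # (2, 350)
```

The same logic as a nested subquery would bury the filter inside the outer SELECT instead of naming it up front.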


Make sure you understand what optimization fences are and how they affect your performance. CTEs are nice to read but can routinely destroy performance.

[1] https://thoughtbot.com/blog/advanced-postgres-performance-ti...


As of Postgres 12 this has changed substantially. Instead of being materialized, CTEs are inlined and optimized with the rest of the query.

Exceptions:

1. If the results of the CTE are used more than once then it is materialized by default, though you can override this by adding "NOT MATERIALIZED" to the call.

2. Recursive and INSERT/UPDATE/DELETE CTEs are always materialized.

https://paquier.xyz/postgresql-2/postgres-12-with-materializ...


>CTEs are nice to read but routinely destroy the performance.

This may be true on archaic versions of MySQL and Postgres but is not the case today, barring some esoteric edge cases (bugs) where the optimizer gets thrown out of whack. Once while doing data science consulting I rewrote a ~1000 line query in Aurora (MySQL flavored) which had a ~2.5s runtime, which was far too slow for the client's use-case.

After rewriting all the CTEs (there were many) into subqueries, there was barely a 2-3% increase in query speed (on the order of under a tenth of a second), a tiny improvement far smaller than the normal variance of the runtime.

Then I rebuilt the query and the joins, and was able to get the query to consistently run in the range of 0.8 - 1.2s. For my own purposes I then duplicated the query, re-implemented the CTEs, and did validate that indeed there is only a negligible increase in query time when using CTE.


For monster queries (thousands of lines of SQL) I find that temporary tables are also great. You can index them and, depending on your DB and its settings, they're usually held in RAM so they're super fast.
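
A small sketch of the pattern with SQLite via Python (whether temp tables stay in RAM depends on the engine and its settings, as noted; names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, amount INTEGER);
    INSERT INTO events VALUES (1, 10), (1, 20), (2, 5);

    -- Stage an intermediate result once, then index it for later joins.
    CREATE TEMP TABLE totals AS
        SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id;
    CREATE INDEX temp.idx_totals_user ON totals(user_id);
""")

row = conn.execute("SELECT total FROM totals WHERE user_id = 1").fetchone()
print(row)  # (30,)
```

Unlike a CTE, the staged table can be reused across many statements and indexed on whatever columns the later joins need.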


That is implementation specific. In some DBs putting an index on a temp table does nothing. In later versions of the same DB it does, but only in particular cases. Make sure you read the docs around that.


CTEs do not get cached though, so they are actually quite bad for repetition without also using a temp table.


Depends on the DBMS and version. Some are smart enough to pipeline or materialize it internally.


Ecto.Query in the Elixir universe seems like it was inspired by linq


WITHs are great and definitely solve some of the issues with SQL. Some implementations unfortunately don't optimize them as subqueries, but that is not the fault of SQL.


> No, the relational model is beautiful and consistent! SQL is messy...

What is beautiful and consistent is the relational algebra. The relational model relies on this formalism to model data by making some rather strong assumptions about how tuples and relations represent things and how relational operations are used to process them. And these assumptions are precisely what propagates to SQL and what some authors (see references in the article) consider controversial and messy. Then the question is whether and how these controversies can be fixed (or whether they are bugs or features).

A radically different approach to fix these controversies is to introduce a different formalism and different data model (as opposed to fixing only syntax) which is based on using functions. In other words, instead of using sets and set operations, we use in addition functions and function operations [1]. Here you can find an implementation of this approach:

https://github.com/prostodata/prosto

Prosto is a data processing toolkit radically changing how data is processed by using both sets and functions, a major alternative to map-reduce, join-groupby and other set-oriented approaches.

[1] Concept-oriented model: Modeling and processing data using functions: https://www.researchgate.net/publication/337336089_Concept-o... -> Read introduction (two pages) for why having only sets is not enough and why functions are important


I disagree, the relational model is very incongruous with the object model that nearly all LOB applications use. Which is where 99% of the usage issues lie.


That doesn't make the relational model bad, it just means there's an impedance mismatch.

FWIW, I find the relational model to be much nicer than typical object models. Even in Java, which I write every day at work, I've found it to be much clearer to use immutable objects representing records. Not just for interacting with the database, but just for handling information in general.

Of course, it would be extremely painful to do without Lombok's @Value, @Builder, and @With annotations.


I agree that the mismatch is not a direct failing of SQL or the relational model - but we continue to use it without adopting a better approach or developing a common abstraction layer. With all of the syntax growth, SQL could certainly have a join syntax that understands object composition and returns structured data.


> With all of the syntax growth, SQL could certainly have a join syntax that understands object composition and returns structured data.

I mean, many RDBMS's have support for JSON and/or XML, so you can sort of get it.

However, I wonder if we'd be better off as an industry if we dealt with our data in code in a more relation-y way than an object-y way.


It's the problem with the object model. Yes, I think ORMs are an anti-pattern, ActiveRecord-style especially.


And then you end up inventing your own ORM and “helpers” because eventually the results end up in objects.

I do hate most ORMs. The only ones I like are Linq based ORMs. The "ORM" is actually well integrated into the language.

You might as well use something like Dapper.

But yes agreed about ActiveRecord.


You have piqued my curiosity, but what would a linq solution to this exercise look like?


Which particular exercise?


The multiple dept. heads case.


A very simple, basic SQL query would be something like "select * from users where foo=bar;"

Already, we're introducing a weird inversion of syntax that, in my experience, trips up people learning it: data in SQL is stored as "rows" with "columns" inside "tables". More formally, we've got a hierarchical relationship where Tables > Rows > Columns, yet we write the query as Columns > Table > Rows.

There are far more consistent and beautiful querying languages than SQL: I would point to MongoDB's query language, which is less of a query language and more of a static javascript-interpretable library, but is still far easier to learn and more consistent than SQL. The same query in MongoDB: "db.users.find({ foo: "bar" });". How is this better? It embeds the operation in the statement ("find"); reading it hierarchically follows how the data is stored (Collection > Rows); the filtering operation is the same shape as the data being stored; and it naturally disallows most injection attacks.


I always thought a SQL query has a very sensible layout, I was never confused as to which I was doing or in what order. For me a query can be simplified to: select <projection> <selection>


> db.users.find({ foo: "bar" })

I don't know MongoDB query language, but gah! that looks horrible. It uses three different syntaxes: dot notation, curlies/brackets, and colon key-value. Full of punctuation, and it doesn't read like English.

There is no distinction between the noun "users" and the verb "find". There's an extraneous "db". Does foo: "bar" mean equals, or is it find() that determines the operator, or maybe a combo of both? How do I do other operations?

Only if you are familiar with a programming language that has that same syntax does any of it make sense. OTOH even educated non-programmers are gonna be able to read the SQL as SELECT "these things" FROM "this table" WHERE "these conditions are true".


> Only if you are familiar with programing language that has that same syntax does any of it make sense.

I'd argue that most relational DB users are familiar with a programming language, and therefore most likely familiar with C-style syntax. It's better to build on something that most of the potential users are familiar with already.

> There is no distinction between noun "users" and verb "find".

There is no such distinction in natural language either (if you see words purely as sequences of characters), you have to know what is what and infer it from the context.

> even educated non-programmers are gonna be able to read the SQL as SELECT "these things" FROM "this table" WHERE "these conditions are true".

Yeah SQL looks a bit more like natural language at first glance, but that's about it. That familiarity is a false friend, it doesn't really help with the learning curve.

This kind of thinking reminds me of the ruby community trend a decade ago when DSLs were created to look beautiful and like written language. It's useless and confusing for long-term, practical purposes. Same with BDD style testing languages. The promise that non-technical people will feel right at home and can start contributing rarely lives up to reality.


> It uses three different syntaxes; dot notation, curlies/brackets and colon key value.

Is that an issue? Using symbols makes the syntax easier to understand.

> There is no distinction between noun "users" and verb "find".

Weird criticism coming from SQL. The verb can be distinguished by the call parentheses.

IMO SQL keywords are harder to recognize.


Why would you want your query language to read like English? Most people don't speak English.


That doesn’t prevent injection, and the solution to injection attacks is using parameterized queries and prepared statements, not switching to MongoDB. Plus ORMs (really query builders) already provide behavior like this against SQL databases anyway.
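
A quick sketch of the parameterized form, using Python's sqlite3 (table and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

# Untrusted input is passed as a bound parameter, never spliced into the SQL
# string, so an injection attempt is just an ordinary (non-matching) value.
user_input = "alice' OR '1'='1"
rows = conn.execute("SELECT name FROM users WHERE name = ?",
                    (user_input,)).fetchall()
print(rows)  # [] -- the attack string matches nothing
```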


That's fair, but I didn't say it prevents injection: I said it prevents most injection attacks.

MongoDB is absolutely still capable of being vulnerable to injection; it's just harder, because it requires the client to provide an object which is parsed by your application with no data validation. In other words, SQL is vulnerable to injection by default, because everything is a string, while you have to opt in to being vulnerable with MongoDB, by writing your application to parse user input with no schema.

In reality, do applications do this? Hell yeah. Wire up a basic Express API, have it auto-parse any JSON it's given, pass it straight to Mongo, and you'll be vulnerable. But a backend which has any kind of type safety or API schema or GraphQL will be safer on MongoDB than one with all that on a SQL database with no ORM or parameterized queries or prepared statements.


The difference is minimal. If you know the first thing about what you're doing with SQL you will use prepared statements. It's not some sort of arcane feature that nobody understands.


And those ORMs have to deal with the SQL composability issues as well, often to the effect of dramatically poorer performance.


You’re right, using a query layer to access an SQL database is still using an SQL database. You need to know what you’re doing with your queries. The important part is not the object mapping and model tracking part, it’s the part that allows you to build typed queries in the native language. Diesel for Rust is a great example, as is Ecto in Elixir.


I think Elixir's Ecto does a very good job at that.

https://hexdocs.pm/ecto/Ecto.Query.html#module-composition


> I would point to MongoDB's query language

seriously?

    db.orders.aggregate([
       {
          $lookup:
             {
               from: "warehouses",
               let: { order_item: "$item", order_qty: "$ordered" },
               pipeline: [
                  { $match:
                     { $expr:
                        { $and:
                           [
                             { $eq: [ "$stock_item",  "$$order_item" ] },
                             { $gte: [ "$instock", "$$order_qty" ] }
                           ]
                        }
                     }
                  },
                  { $project: { stock_item: 0, _id: 0 } }
               ],
               as: "stockdata"
             }
        }
    ])
VS

    SELECT *, stockdata
    FROM orders
    WHERE stockdata IN (SELECT warehouse, instock
                        FROM warehouses
                        WHERE stock_item= orders.item
                        AND instock >= orders.ordered );


Another huge win for SQL is that it's easy to construct from parts. You can very easily run and debug your subquery or common table expression on its own before combining it into a larger, more-complex query. If you (as I usually do) create plenty of views while analysing a dataset, the approach can be extremely powerful.

Doing the same in JavaScript is possible, but it's slow and cumbersome by comparison.
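The workflow is easy to show with a small sqlite3 sketch (schema and data invented): debug the aggregation on its own, then drop the identical text into a CTE of a larger query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'bolt', 5), (2, 'nut', 50), (3, 'bolt', 20);
""")

# Step 1: run and inspect the aggregation by itself.
totals_sql = "SELECT item, SUM(qty) AS total FROM orders GROUP BY item"
totals = conn.execute(totals_sql).fetchall()

# Step 2: compose the exact same text into a larger query as a CTE.
big = conn.execute(
    f"WITH totals AS ({totals_sql}) "
    "SELECT item FROM totals WHERE total > 30"
).fetchall()

print(totals, big)
```

Creating a view from the debugged query text works the same way, which is what makes the view-heavy analysis style so pleasant.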


The original article addresses the deficiency in construction from parts, in its "Lack of Orthogonality" section.

MongoDB queries, while being interpretable by javascript, aren't really javascript. You can't interact with the data using javascript (well, you can, using eval, but you shouldn't). You interact with the data via the query language, which is, again, expressed in JS, just like SQL is expressed in English.

It's more accurate to consider the Aggregation Pipeline as being the "composable" system to get at data in MongoDB. And it's exceedingly composable; far more than SQL. It's literally a pipeline; a series of steps which fetch, mutate, filter, map, limit, calculate, correlate, relate, and otherwise interact with the data in a database. Each step operates on the output of the previous step, in series. You can programmatically swap steps in-and-out, in production, with no string manipulation or ORM, debug each step in series, remove steps, see the output, get performance characteristics on each step. There's no complex black-boxed query execution planner or compiler, because the query plan is the pipeline.
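That composability in miniature: a pipeline is just an ordered list of plain stage documents (stage contents invented here), so stages can be added, removed, or reordered programmatically with no string splicing. With pymongo you would run the result via db.orders.aggregate(pipeline).

```python
# Individual stages are plain data.
match = {"$match": {"status": "shipped"}}
limit = {"$limit": 100}
project = {"$project": {"item": 1, "qty": 1, "_id": 0}}

pipeline = [match, project]

# Debugging a step: run a truncated pipeline and inspect its output.
debug_pipeline = pipeline[:1]

# Swapping a stage in: insert a limit between the existing stages.
pipeline.insert(1, limit)

print(debug_pipeline)
print(pipeline)
```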


> with no string manipulation or ORM

OR, another way to look at it, using MongoDB's ORM, which is quite bad if you ask me.


If you try to directly translate SQL to MongoDB, without changing your schemas or query, then yes, it's gonna look bad. That doesn't make for a good comparison, and I think you know that.


> That doesn't make for a good comparison, and I think you know that.

That query is not my invention, it comes directly from MongoDB documentation

If it looks bad, it means it is bad by design

https://docs.mongodb.com/manual/reference/operator/aggregati...


MongoDB is better because it's Web Scale!

http://www.mongodb-is-web-scale.com/


you're comparing $lookup, an operator that nosql isn't designed for, to a join, an operator sql was designed for


Lookups are very common in MongoDB; starting with SQL, lifting the data as-is into Mongo, and translating the queries 1:1 will just result in garbage queries, like that one.

An "idealized" NoSQL schema is far, far more complex than anything anyone used to SQL would arrive at ([1]), but most of that is because in a "pure" NoSQL/Document-oriented database, the query engine simply isn't that powerful (think Dynamo). MongoDB has an inordinately powerful array of tools to get at data in a performant way, and $lookup is available as one of those tools. Can it be misused? Yeah; just look at the parent comment to see clear misuse. But generally, it's very common to see.

Modern thinking around MongoDB schema design is closer to SQL than NoSQL/Dynamo. Arrays are bad, denormalization can be valuable but use sparingly, that kind of stuff.

[1] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...


I wouldn't say arrays are bad, and the entire paradigm of data modeling in MongoDB is to store your data based on your application usage patterns. If you have to query across multiple collections via a $lookup, then maybe you'd benefit from embedding the smaller of those collections into the other.


Maybe. But, as a general rule, I advise against arrays of unbounded size on documents (arrays of a known bounded size, say, containing enums to act as a multi-value flag, or email addresses on a user's account, or something like that, are fine).

One example: We used mongodb to track the state of a general CSV import system. So, we'd have a document for each csv file a user imported, and on that document, we were storing errors which occurred during the import to later display to the user. Of course, in an array. Worked great for years, until one day, a user uploaded a very bad CSV, non-maliciously, with hundreds of thousands of lines, with dozens of errors on each line, generating an array millions of items large. The failure condition here was wild: the import just got slower, and slower, and slower, until eventually the (modestly provisioned) db cluster started failing. We immediately normalized that array into its own collection, re-ran the import, still generated millions of errors, but with no problem.

As always, a general rule doesn't apply in every situation, but in my experience, unless you have a really strong grasp on how a system's use will scale, years into the future, unbounded arrays are icky. Lookups across two collections are only a modest performance loss over a direct query, and if the arrays get up there in size, they can actually be faster.
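A sketch of the before/after shapes from the story above (field names invented):

```python
# Before: an unbounded errors array embedded on the import document.
import_doc = {
    "_id": "import-123",
    "filename": "data.csv",
    "errors": [{"line": 7, "msg": "bad column count"}],  # can grow forever
}

# After: the import document stays small; each error lives as its own
# document in a separate collection, keyed back to the import.
import_doc_small = {"_id": "import-123", "filename": "data.csv"}
error_docs = [
    {"import_id": "import-123", "line": 7, "msg": "bad column count"},
]

print(import_doc_small, error_docs)
```

The normalized shape trades one extra lookup for a document whose size no longer depends on how bad the CSV was.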


Interesting, we ran into a very similar thing at my current place where we have a CSV importer storing results in mongo. Importantly it also stores the errors that occurred during the import.

In our case someone uploaded a _huge_ CSV with some misalignment in the columns so every row had an error.

The resulting mongo document was larger than the max document size (16MB?), so it couldn't even be saved to the database.

Mongo is painful to work with and I feel like I keep finding more reasons to hate it.


> SQL is messy because describing the underlying data relationships are messy.

Not sure that is true. You can imagine having a smaller query language with clean semantics able to capture messy data relationships. Small functional expression languages come to mind. Where it lies on the spectrum from purely-table-form to can-hold-everything is a design choice.

Your query language doesn’t need to capture the underlying data model exactly. SQL is an example of this itself.

I’ve run into the orthogonality issue myself several times. You don’t need to drop SQL today, but for some data domains more expressive query models can work really well.


Right, it's like criticizing python, or English, for being inconsistent, or "large". Turns out that doesn't matter -- what matters is that the language is useful because it has a wide base of users and libraries, just like SQL does.


Python and English are meant to be general purpose languages, so they are kind of expected to be large and occasionally inconsistent. SQL is (by definition) a domain-specific language which has grown out of proportions.

Are there really any SQL libraries in the traditional sense (i.e. reusable/composable SQL code with a well specified API)? SQL "libraries" typically focus on hiding the inconsistencies and the abhorrent syntax under the carpet.

And the user base is there, mostly because of the database engine properties and features. The query language itself is just a bad side-effect. And TFA does make a point that NoSQL abandoning the underlying RDBMS model is in fact a regression.


SQL is designed around databases. Python is designed around objects. Clojure is designed around expressions. A table is analogous to a list of objects. None of these languages is more domain-specific than any other.


Perhaps libraries doesn't really apply to SQL per se, but instead you have tooling, ecosystem, DB engines, ORMs, extensions etc, that all speak SQL and would be hard to do without.


SQL is messy because describing the underlying data relationships are messy.

The article presents several examples where SQL's messiness cannot be plausibly attributed to underlying data relationship messiness.


I think the real problem is that SQL text is the only interface to most relational databases. There is a SQL standard, of course, but it's difficult to get vendors to implement the standard, because SQL is the user interface to the database.

Instead, perhaps a second format should be standardized, which is machine readable/writable, that more or less represents relational algebra + whatever extra features SQL supports. It can be clunky and verbose, as long as it's straight-forward and easily composable. It would effectively be like a compiler intermediate representation.

Once you have something like that, you can have as many front-ends as you want with whatever syntax you prefer.

Of course, getting all of the RDBMS vendors to agree to it is still a problem, and they'll probably all still include their own vendor-specific extensions and differences, because they want to lock you in to their system.

But at least maybe the open source ones could agree, which would still be quite beneficial.


I don't see how that makes them arrogant. SQL's implementation inconsistencies are a real problem. Their query language doesn't claim to be another version of SQL - it's clearly very different.

Also that is only one of the things they are hoping to fix. It sounds like you're saying "We have 5 variants of SQL already so nobody is allowed to write any other database query languages ever. We must use SQL forever." which is stupid.


In a world where there's a new programming language every other month, what's wrong with suggesting a new query language?

At least we don't have to change our whole data model or give up consistent data to try it!


Most relationships I have encountered are just one-many, many-one, many-many, one-one, indirect, direct, or graphical in nature.

You want a flat result, or an aggregated result. Why does this have to be blamed on the data relationships and not the SQL language itself?

For any graphical type of relationship, I find SQL utterly hard to express my queries properly. It feels like an assembly language at that point.


yeah the edgeql syntax just looks messy and the doc doesn't even explain how to do a group by


It is very common to hear complaints about SQL from new developers who are not familiar with declarative programming. They don't see how the SQL syntax fits into their 'programming' paradigm. It is very simple: SQL is a magic box, you ask it for what you want, and it gives it to you. It has been around for so long for a reason.


I was hoping for something more left field myself. If SQL is based on tables, what about a QL based on relations only. Columns of data that are related, the "TABLE" implementation detail doesn't need to factor into it.


I think you might be misunderstanding what a relation is in the relational algebra or relational db. It's not a relationship. A relation is nothing more than a set of sets. Tables are relations, and views are relations. The results of queries are also relations.

This is a common misunderstanding of what the relational in "relational database" means. It's not (primarily) about relationships, except insofar that relationships can be described using relations and queried using relational algebra. But the key concept is that of the mathematical relation, as per Wikipedia:

"In mathematics, an n-ary relation on n sets, is any subset of Cartesian product of the n sets (i.e., a collection of n-tuples)"

So the goal of relational db languages is to provide tools for querying these sets of sets. Joins, Unions, Selections (Restrictions), and Projections are some of the tools that are provided. SQL provides some of these, though often calls them confusing names.

For example, in SQL the "select" keyword begins the query statement, but selection proper (as per relational algebra, also called restriction) is actually what is expressed in the "where" clause. What comes after "select" in your query is technically "projection" (choosing which attributes/columns exist in the output). Many ORMs and similar tools get this messed up, because the authors of them know SQL, but not the relational algebra on which it is nominally based.
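The distinction is easy to show in code: treating a relation as a set of tuples (data invented), selection restricts tuples (SQL's WHERE), while projection picks attributes (the list after SQL's SELECT).

```python
# A relation: a set of (name, dept, salary) tuples.
employees = {
    ("alice", "eng", 100),
    ("bob", "eng", 90),
    ("carol", "sales", 95),
}

def selection(relation, predicate):
    """Selection/restriction: keep the tuples satisfying a predicate."""
    return {t for t in relation if predicate(t)}

def projection(relation, indices):
    """Projection: keep only the given attribute positions of each tuple."""
    return {tuple(t[i] for i in indices) for t in relation}

# Equivalent to: SELECT name FROM employees WHERE dept = 'eng'
names = projection(selection(employees, lambda t: t[1] == "eng"), [0])
print(names)
```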


Isn't that true for SQL? SQL treats everything as a relation and does not differentiate between a table, a view, a set-returning function, a subquery, or a CTE.


> If SQL is based on tables, what about a QL based on relations only.

Aside from DDL (which I'm not sure how you would do without distinguishing between base relvars [tables], derived relvars [views], and other classes of relations), SQL doesn't particularly treat different kinds of relation differently beyond what is minimally necessary (you can't write, via insert or update, to a relation that isn't a relvar, for instance.)


This is what we are trying to do with Cayley https://github.com/cayleygraph/cayley


The website needs a link to the documentation, I just couldn't find it :/


> If SQL is based on tables, what about a QL based on relations only.

Try Sparql.


Came looking for the reference, and found it - but there are fewer and fewer mentions of SPARQL with each of these HN discussions. I think people just think "oh some semantic web crap" and move on without realising how powerful RDF etc. really can be when used as simple tools, without the 90s hype of changing the world with linked data.

At least the old ideas are resurging in the form of Neo4j (essentially a triplestore) and Cypher (essentially SPARQL).



Are you thinking of Datalog?


>In that case, the result set is no longer a table, it's a dataframe, which is a useful data structure but is also not what relational databases do.

What do you mean here? The difference between a table and a dataframe is that a dataframe is a construct held in memory, while a table is persisted storage written to a database.


I completely agree with both the title of this post and your comment. The solution is in diagrammatic "languages"/tools like Airflow, not more query languages.

Selecting columns before tables always felt weird to me. Doesn't it make more sense if you had a graphical view this way? (imagine boxes around the below items where you can drag lines to make connections between tables/inputs)

    USERS
           --user_id-- [data processing] => ...
    GROUPS
In SQL that would be,

    SELECT users.*, groups.*
    FROM users
    INNER JOIN groups ON users.user_id = groups.user_id

SQL is so messy and full of details that can be much better represented via dataflow diagrams.


Given our source control tools' limitations, I will take a textual format over graphical representation any day.

Either way, queries need to be composed at runtime a lot of the time, so for many uses of SQL you can't just have the queries as blobs that you prepare ahead of time in some other tool - they must be objects you can work with programmatically.


Somebody at work just stepped on a Microsoft Dynamics "integration". The network admin is trying to recover the integration database from a snapshot. An integration maps data from external files into accounting structures: it uses a GUI, with names that make sense to accountants. And the database that holds it is an Access database with the integration details in a BLOB. I'd pay money for a tool to save these out to a textual representation we could put under source control.


There's no reason a diagram can't be in source control. You just need a tool that can convert the diagram to code and the code to a diagram. You probably also want to store the diagram's layout in textual form too so it can be converted back to the original human-created layout.

Then you can use git and optionally build custom tools like a visual diff of diagram v1 vs v2.


The diff tool is the key here. Without that, there is no meaningful version control.


With a visual representation, diffing may be as easy as clicking next/previous between versions as you would images in an album. You could of course add more bells and whistles with highlighting.


Sorry, I should have been more specific. When I say diffing tool, I mean one that can do what diff does - help Git do automatic merges, and help me do manual merges when auto merge finds conflicts. Two pictures side by side can do neither.


I don't see why it couldn't be built. Git merge doesn't always do all the work for you, you still need to evaluate the changes.


There are graphical query generators in at least Microsoft's management tools. Most devs use straight SQL as far as I know because once you get into more complex queries the graphical interface becomes a hassle.


> There are graphical query generators in at least Microsoft's management tools

The dev still needs to type the code first, which does not address any problems with SQL.

> Most devs use straight SQL as far as I know because once you get into more complex queries the graphical interface becomes a hassle.

Those interfaces may be limited, or the devs may not have enough experience writing "code" via a higher-level graphical interface. Going from writing SQL to creating dataflow diagrams may be a bit like switching from imperative to functional programming.


"Selection" in the relational algebra is actually everything that comes after the WHERE clause. SQL confuses people here because "SELECT" seems to imply "select these columns" when in fact that is technically called "projection."

The "Selection" is the set of predicates that restrict the resulting relation. Projection is choosing which attributes ("columns") to use in it.

The relational algebra has no "tables", this is, again, a SQL thing. It has relations (sets of sets) and operations on them. In SQL "tables" are one kind of relation, and views are another.


the first paragraph of your response was unnecessary, or perhaps could have been worded more constructively


XKCD’s comic on competing standards feels appropriate here:

https://xkcd.com/927/


I don't think treating query languages as a standard is the right comparison here. I think it's great that programming languages have evolved from C or PHP quality languages. I view SQL like PHP, I can write it if I have to, but it's certainly not a language I enjoy working with. EdgeQL has its flaws, but it still looks like a big improvement over SQL to me.


I must be in the minority. I enjoy writing SQL and figuring out clever ways to construct queries to get what I need out of the data. Conversely, I am not a fan of PHP at all, but as you said, will write it if I have to.

EdgeQL looks... interesting, but I need to see more to decide for sure. Would be cool if they built out a translator, so you could pass in SQL and get back EdgeQL.


It's a translator, but in the opposite direction. EdgeDB is a postgres frontend that translates EdgeQL and its schema definition language to SQL which it then sends to postgres. (It can also translate GraphQL to EdgeQL to SQL)

I don't think translating from SQL to take a look at the EdgeQL it produces makes much sense, since it'd result in very unidiomatic queries. SQL prefers joins and flat rows, while EdgeQL is based on following links and has great support for nested data output.


is xkcd ever not appropriate? there's probably an xkcd about it


Not a munroe original, but I like this one https://thomaspark.co/2017/01/relevant-xkcd/


oh I thought he did that :)



>It's pretty arrogant to complain about the syntax being inconsistent across versions and databases and then present your own weird offshoot, as if every other version wasn't introduced for the exact same reason with the exact same lofty delusions of grandeur...

Obligatory XKCD https://xkcd.com/927/


Am I the only full stack dev that likes SQL?

SQL is an incredibly expressive and flexible way to read, store, and update data. It's ubiquitous, so the SQL skills I learned six jobs and three industries ago are still relevant and useful to me today. Relational databases and SQL are heavy lifters that I often rely upon to build projects and get things done.


No, I'm with you and prefer SQL for many tasks.

SQL got a bad rap in many ways due to security issues, databases in general, and "web-scale".

SQL as a language within other languages is a nightmare from a security standpoint, and if language integrated query was more common across languages earlier on then this wouldn't have been an issue.

Databases generally depend on normalization, but normalization comes with interesting scaling problems and how do you replicate normalized schemas. Thus denormalization became a thing, and then the emergence of NoSQL and document stores started to infect everywhere. The JOIN was a killer too, and then the discipline required to do sharding made it annoying to manage, so easier to manage solutions became a thing.

I'm looking at databases in a different light these days with more appreciation, but now the hot new thing is GraphQL makes things... interesting. I don't view GraphQL as a server-side solution, but a client solution to overcome the limits of HTTP/1.1. However GraphQL clients are exceptionally complicated, and I'm not sure they are worth it. The only problem is that to overcome them requires engineers "to know how to do things", but that is a hostile stance. People want to go fast and make progress, and GraphQL enables that.


GraphQL and SQL are not mutually exclusive at all.

GraphQL is, in no way, faster than any REST alternative in terms of implementation speed. If anything, it is slower, as you need to be extremely methodical with your API changes, as (same with REST I suppose) deprecating fields / entities, for mobile clients specifically, is a PITA unless your clients have really nicely built out forced upgrades.

What GraphQL _does_ give you, is type safety and extreme client flexibility. It is a better solution than REST in almost every scenario, other than the initial learning curve, which takes a couple weeks and then you know it forever.

Would I recommend some startup write a GraphQL API for their MVP? No, just get something working. Are you at a more medium-sized company looking to build out your much more permanent API? Then yes, you should probably strongly consider GraphQL.


Too bad nobody thinks whether or not they need the flexibility in the first place. MVPs with single clients using GraphQL for flexibility that is not needed or used are common.

GraphQL is a hammer, and now every project is a nail.

Anecdotal, of course, but the first thing all FE developers I've ever worked with do when they start/join a project is add/suggest GraphQL/Apollo.

Nobody considers how awful things look on the back-end when you need to cache or make magic happen to avoid thousands of N+1 queries.

I believe unfortunately it has become the only way many front-end developers learn to interact with any back-end, and now everyone's forced to use it regardless of its drawbacks.

Same with React. Facebook managed to get free training for all of their future hires. I hate the company but that was a genius move.


I have seen a startup using GraphQL for their MVP, precisely because they could change and crank out UI fast. They already knew GraphQL; I did not when I joined, and I learned to modify the already-existing one in a day or so.


MVP? Just use your apollo/GraphQL server as the node monolith and be done with it.


Recently I tried using SQL directly on the frontend: https://medium.com/@unodgs/sql-on-the-frontend-react-postgre... as an alternative to GraphQL which I also find too complicated. You might find this interesting.


I've been experimenting with SQL as an API language - including client-side SQL constructed in JavaScript - for a couple of years with my Datasette project.

I'm using similar security tricks to you: read-only queries with a time limit, against SQLite rather than PostgreSQL.

More here: https://simonwillison.net/2018/Oct/4/datasette-ideas/ and https://github.com/simonw/datasette


Great to see that! Did it work well for you?


It took me a solid month before graphql finally "clicked" but there's no going back now, I absolutely prefer it over rest now. 10/10 would do again and do suggest everybody give it a try.


This makes a ton of sense to me, thanks for sharing your learnings.

I see very little value in using GraphQL, when you can just write SQL on the client!

We desperately need frameworks to better facilitate this, like Hasura or Django -- DB migrations, permissions, authentication, real-time subscriptions, Admin UI.

My most wanted feature is SQL type providers, like Rezoom.SQL - https://github.com/rspeele/Rezoom.SQL


What are examples of those "easier things"?


MongoDB could be an example, but it introduces yet another empire.

Redis is also easier.

The key is what you are designing against. If you design against a DB, then you may find that scaling beyond a single host comes with gotchas. But if you have the discipline to keep everything within a document, then you can scale up more easily, as the relationships between documents are more relaxed.

However, cross document indexing and what-not creates more problems, and that in and of itself is an interesting challenge.


What is a good "web scale" solution?


Given that I’ve seen 25TB MySQL databases, RDBMSs scale up juuuuust fine. Just... have someone on hand who understands them if you’re going to go that high.


These days, SQL.

It's true that in the 2000-2010 period many SQL implementations struggled to scale with growth in websites (many other parts of the webstack did too).


These days, anything.

The question is what is it going to cost in either licensing solutions or engineering effort.


Do you love SQL, or do you love relational algebra? Because actual SQL, the language, is pretty shitty.

Perhaps the best querying language I've ever used is Q-SQL, integrated into kdb+/q. Unlike SQL, it's actually part of the language (q/k) and, most importantly, it's modular and more expressive than SQL.

If you're interested in how we can do a lot better than sending strings to remote databases using an inexpressive and non-Turing-complete language, check it out: https://code.kx.com/q4m3/9_Queries_q-sql/


Not parent poster but I love relational algebra. But so far I have yet to use an alternative to SQL which is less bad. Most of them seem to be designed by people who do not understand SQL.


I agree with you, I haven't found an alternative that is less bad. But that doesn't mean that I like SQL. It is so inconsistent and uncomposable, with too many awkward ways to do different things.

I love the idea of a functional algebra for querying a database, and while SQL is acceptable, it is far from great in my opinion.


Take a look at Suneido. As a product, it's a weird thing that kinda stands by itself, and isn't particularly useful because of that. But the database and the query language is straight up relational algebra.

https://suneido.com/info/suneidoc/Database/Queries/Syntax.ht...


Same for me, even more so after venturing into the NoSQL hype of the early 2010s. I implemented pretty successful systems using non-relational databases, but over time I've come to stick to an RDBMS as much and as far as I can.

Not only do I prefer to work with SQL nowadays, I also prefer SQL over any ORM in older codebases. ORMs are pretty useful for getting up to speed without caring about your persistence layer too much, but after 17 years in this industry I've had enough issues with ORMs to avoid them whenever I can.

Native SQL queries with placeholders for my parameters in their own files, loaded by my database driver to execute and return data is my go-to solution for data access, it's flexible, maintainable and readable if you treat SQL as your normal code (code reviews, quality standards, etc.).
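That workflow is simple to sketch (file name and schema invented): the query lives in its own .sql file with named placeholders, and the driver loads and executes it.

```python
import sqlite3
from pathlib import Path

# The query lives in its own file, reviewable like any other code.
Path("find_users_by_role.sql").write_text(
    "SELECT name FROM users WHERE role = :role"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

# Load the file and bind the named parameters at execution time.
query = Path("find_users_by_role.sql").read_text()
rows = conn.execute(query, {"role": "admin"}).fetchall()
print(rows)
```

Because the SQL is plain text in the repo, it gets diffs, reviews, and syntax highlighting for free.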


Same for me. I love it. I always avoided the NoSQL things so far, since their use case is mostly unstructured data, in contrast to normal SQL databases. I think what most engineers struggle with is just thinking of rows and columns as data. Creating schemas, working with them, inserting data, manipulating data, and reading data are all different skills. Writing proper queries is like learning a language of its own. Most programming languages have their own ORMs, like the LINQ style of C# (EntityFramework), TypeORM for TypeScript, the Django ORM, or Hibernate. So overall, to handle data you have to understand SQL plus the abstraction layer. That sounds a lot harder than having data in a simple object.

Personally, I was lucky because I had great courses, even in high school, regarding SQL, covering how rows work, what normalization is, how it helps with data, and so on. So naturally I developed some feeling for how to handle database tables.


It sounds like you are more in love with relational algebra than SQL itself. Relational algebra is sound mathematics, but SQL used to express those ideas is horribly old and outdated. We have learned a lot about language design in the half century since its inception. We can do better.


I love SQL but I still think we can do better! The strength of SQL is the underlying relational model and relational algebra. Syntax wise SQL is somewhat clunky.

Linq in .net shows IMHO how queries can be expressed in a more consistent and composable syntax while still conforming to the relational model.


The work I do lately tends to be missing an sql layer and yes, I miss it a lot. I loved organizing my data at that layer and having such powerful ways to query it. It felt like I could eternally find ways to optimize it, and I really enjoyed learning year after year.

Lately I use nosql for very light data and otherwise our API outputs heavily cached and extremely simple data. Adding a database as a middleman wouldn’t make sense. Still fun, but I miss Postgres!


When I started at a company, they were using MFC on Windows and started transactions in CDialog::OnOK().

My colleague and I moved everything to stored procedures. Now every IT department says that the product is VERY stable.


I like relational algebra. I dislike SQL. It harkens from a similar era as COBOL, and I dislike COBOL for procedural applications for the same reason. We've learned a lot about language design over the years and it is a shame that we've put very little effort into adopting new languages to address this particular problem space.


No, you're right, SQL is awesome. I mean, the syntax itself is pretty creaky, but it doesn't matter, the power to declaratively query structured data is the key. Everything that provides that power is awesome; SQL is just at the top of that heap in the breadth of its utility.


Seconded. Every language has its uses and quirks. Does SQL get tricky in tricky situations? Sure. But what doesn't.

This piece reads like Joey being unable to open a carton of milk [1], "there's gotta be a better way!".

[1] https://www.youtube.com/watch?v=wwROPN3Fir8


I really like the relational model, and I think it's the best way to model data. Referential integrity is an awesome thing that removes the possibility for many bugs to exist.

As for SQL, it has its warts, but I'm pragmatic when it comes to programming languages. Like C or JavaScript, it's also "ugly", but it's often the best option anyway.


At work we almost exclusively use pure stored procedures[1] and everything is normalized very well. It is an absolute joy to write SQL, because of how terse it is while still being very readable.

Trying to implement business rules about data relations outside of the DB is a nightmare.

[1] We use dynamic SQL within stored procedures for pivots.


You lose source control on your procedures. How do you deal with that?


Why? When I worked doing infrastructure and CI/CD automation we created a pipeline for deploying stored procedures, not that different from any other code lifecycle process.


> You lose source control on your procedures

Why would you do that?


We use daily backups of the whole server, so if we really need to rollback, we can. And if we absolutely need source control, we could write a procedure in combination with a trigger to automatically write the stored procedure to some source control.

Also there are tools out there that provide that functionality. Found with a few seconds of searching. https://host.apexsql.com/sql-tools-source-control.aspx


You can use Liquibase or Flyway and an automated deployment process to keep your SQL code in sync with non-SQL code (if needed.) For bonus points, you can make your stored procedures be callable by other stored procedures, create/teardown mock data, and do TDD where your test suite of stored procedure tests runs on build during deployment and either has a PASS and deploys or hits a FAIL and the deployment aborts.


If you use a DBMS to manually update these, yes. Otherwise you can do this in code whenever you want in your build/deploy pipeline.


Just wrote a stored procedure in a text file and committed it to git. Seemed to work ok.


I love SQL. I'd love someone to make it better for complex queries. Have you seen the enterprise SQL monstrosities? Why do we have ORMs if SQL is perfect?


We have ORMs because our programming languages' object models are not relational, they are hierarchical - from C to Haskell, everyone goes for highly non-relational data representations.

So, when we interact with a relational DB we need some kind of layer to map between the world of relations in the DB and the world of objects in our program.


I would say they're not even hierarchical in many cases, but an unconstrained graph.


Correct but I didn’t explain well. I mean the SQL generation is left to the ORM and I think that’s no accident.


Not for the reason you're implying, actually for the opposite reason. I'm from the days of yore, just before ORMs became popular and it basically replaced a lot of boilerplate code, but it wasn't the SQL that was the bulk of it.

It was mainly to save time in writing code to map columns to object properties really, the sql statements themselves were trivial even if you weren't lazy and just used select *.

Also, here's a now mainly historical pain most devs never encounter any more: Before ORMs and the various migrators, your object properties might not have the same name as your SQL columns. Yes, it was dumb when you did it, yes, it caused loads of bugs, yes, it actually happened quite a lot.


> It was mainly to save time in writing code to map columns to object properties really, the sql statements themselves were trivial even if you weren't lazy and just used select *.

I am running a side project and yes, writing the native SQL statement to take your object and put it in the database is not a problem, put in the parameterised values and off you go.

But getting the data back from the database? Oh, the horror. So much boilerplate: checking whether any records were returned at all, whether there are enough columns with the correct names for the kind of object you are making, whether there is data in each column as appropriate for that specific column, whether a given field can be coerced into being a string or an integer or a date or similar; then it's all marshalled into a DTO object which is passed to the create-new-object validator. 800 lines of code later, and you may have an object back!

Dapper appears to be the sweet spot for me, I am still writing SQL queries and still designing the SQL tables myself, no orm magic here, but it handles the actual marshalling to and from an in-memory dto object versus data in the table for me, and that is very valuable time savings.
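Dapper is .NET-specific, but the same sweet spot is easy to sketch in Python with the standard-library sqlite3 module (the `User` class and table here are invented for illustration): a one-line row factory does the column-to-property marshalling, and everything else stays plain SQL.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

# Map each row to a User by column name -- the whole "micro-ORM" in one line.
conn.row_factory = lambda cur, row: User(**{d[0]: v for d, v in zip(cur.description, row)})

user = conn.execute("SELECT id, name FROM users WHERE id = ?", (1,)).fetchone()
print(user)  # User(id=1, name='Ada')
```

You still write the SQL and design the tables yourself; the only magic is the marshalling, which is exactly the valuable part.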


Most SQL libraries handle this all for you? You really don't have to do any of that.

It's perfectly fine to just 'know' that the data is going to be an int and just do `var id = rs.getInt("id")`.

I'm on my phone so it's too fiddly to write my own brief example, but if you get rid of the silly comments in this you'll see you can do it all in a few lines, just very boilerplate lines:

https://thedeveloperblog.com/sqlconnection


Completely agree. For example, why isn't there functionality to define aliases for complex expressions and then reuse those through a query. Simple query-local SQL functions would be nice too.

I'd love to see a "SQL-like" language that compiles down to SQL itself, much like Babel or TypeScript in the JavaScript world. I think the tricky thing is that there is no single SQL target.


> Completely agree. For example, why isn't there functionality to define aliases for complex expressions and then reuse those through a query. Simple query-local SQL functions would be nice too.

What about common table expressions? Or custom defined functions?

    with X as ( select... )
It's widely used for analysis
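As a runnable sketch of that CTE pattern (SQLite driven from Python's sqlite3; the `orders` table and its columns are invented for illustration) - the derived expression is named once in the CTE and then reused by its alias:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, qty INTEGER, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 2, 10.0), (2, 5, 3.0)])

# Name the computed expression once, then refer to it by alias everywhere.
rows = conn.execute("""
    WITH priced AS (
        SELECT id, qty * price AS total FROM orders
    )
    SELECT id, total FROM priced WHERE total > 16 ORDER BY total
""").fetchall()
print(rows)  # [(1, 20.0)]
```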


CTEs can do that, but it is quite clunky when you just want to reuse a partial per-row computation.


>functionality to define aliases for complex expressions and then reuse those through a query

What's an example of what you're trying to do? I think views, user-defined functions, routines, and select aliases would cover all the bases...


> I think the tricky thing is that there is no single SQL target.

So the jquery of SQL, then.


Multi sql targets means it’s even more of an advantage to compile


Linq


Same here. Every experience I've had in the alternatives reminded me why SQL was superior.


how many devs accept it for what it is, limits included ?

it's alright, it's consistent enough, it's good

I'm reading about datalog and prolog more and more but sql is ok


Do you like relational algebra, SQL the language, or both?

To me relational algebra is the beauty queen, and SQL is the "beauty mark" that prevents it from reaching perfection.


I use it a lot, because I know it fairly well. Not sure I "like" it; it's like a lot of food - I know what of the food varieties I prefer over others, even when I didn't immediately "like" any of them. More of a "got used to" type of thing.


Totally with you, long live SQL!


Team SQL. There have been few data models I've encountered that I couldn't express with some functionality of Postgres.


What you (and I) like is not SQL per se but what comes through of the underlying relational model, that SQL hasn't screwed up. SQL as a syntax and semantics is a mess. It could have been better.


I'm not impressed for two reasons:

1. Anyone striving to build a better SQL should make a comprehensive list of common (but difficult!) database tasks for OLTP and OLAP workloads. This will expose the weakness of their language. SQL has had 50 years and myriads of improvements to cover all these common cases. This is not a fair fight, so come prepared.

2. It's not enough to be just "better than SQL" to replace it. SQL has such a huge momentum that a new language needs to be absolutely better _and_ it should have many features that SQL cannot possibly have. My nice-to-have list would contain predictable performance, lock ordering, ownership relations (for easy data cleanup), and a standard low level language which the query optimizer would output.


Not disagreeing with your point (if purist language nerds had their way over practicality, we'd all be writing Haskell and Prolog) but most of your nice-to-haves strike me as properties of the database engine, not SQL itself.

Predictable performance - this will always be not only implementation-dependent but data-dependent as well. In order to know whether a join will be efficient or not, you need to know things like relative sizes of the tables, which is not necessarily a language problem. I have worked with SQL implementations that had extensions to let the user annotate joins with relative sizes on each side, but I don't think that's quite what you mean.

Lock ordering - again, good databases should have defined semantics (Postgres, for instance, does take locks in order when using ORDER BY), but I'll grant that this one could be stronger. That said, I think this is pretty niche. How often are you doing large multi-row transactions where lock order is a serious problem? If I have enough volume that deadlock is likely, I probably have enough volume that I want to be breaking up the process into a sharded or two-phase commit anyway.

Ownership relations - I think this is a DDL problem rather than a SQL problem.

Low-level language - I don't think you'll get a portable low-level language here (at least not for any definition of "low level" that's much lower than the SQL AST) because, again, the basics are implementation-dependent. What kind of scan is the base atom of a query? Well, it depends - is your database distributed? sharded? row-store-based? column-store-based? I do wish more open source database drivers would let you play with the AST in memory (Postgres has ways to print it out, but I don't think there's a good API). That would tend to solve the most significant problem raised in the article (composability) - plugging together SQL clauses automatically is hard, but plugging together subtrees can be much easier.


> Predictable performance - this will always be not only implementation-dependent but data-dependent as well

This immediately jumped out at me from the parent comment. It would be entirely possible to implement a query language where you specify a plan for your query. But then you’d immediately lose the “better than SQL” competition, because your complexity and maintainability problems would skyrocket.

I’ve had to deal with this problem as an Oracle DBA, and it’s a complete nightmare. It starts with a statistics refresh ruining a couple of execution plans, so you start specifying them manually with the plan manager. Then it gets worse over time, because stats refreshes become a big risk and you don’t want to do them anymore. Eventually you get to the point where you pretty much only run verified plans. Then your verified plans slowly degrade over time, because the underlying cardinality of every table is constantly changing. You’ve replaced the query optimiser with yourself, which is not only tedious work, but it’s simply not possible to do the job as well as any mainstream DB engine could.


I wish that I could upvote your comment more than once, because this rings so true.

There certainly are (rare) situations, where you need to provide hints in one form or another, but it's really a bloody nightmare to maintain and may completely bork, when you - say - upgrade to a new version of the database engine.

I work with relational databases since the early 90s and can give you a no-bullshit money back guarantee that you (not you personally, obviously) are not smarter than the optimizer.

Usually there are weird data patterns involved if you absolutely must provide hints. But basically:

Don't do it!


I think one of the challenges with SQL is that beginner developers can create naive queries that "work" but are extremely complicated for the optimizer to "get right". So in some cases (talking from my own experience) the developer can, with the use of hints, "be better" than the optimizer, when the problem all along was the overall structure of the query.

Edit: don't do it


The relational model can _usually_ save the day here, without a huge amount of effort. SQL certainly has its share of anti-patterns and footguns. But most of the awful SQL I’ve seen over my career hasn’t come from poor mastery of SQL, it’s come from poorly normalized schemas. If you have a properly normalized schema, then you can do a huge amount with very simple SQL. When it’s poorly normalized, you end up with all sorts of strange and inefficient design patterns in your SQL.

This could come across as me saying “well it’s easy if you do it right”, but the thing is, normalizing a schema is incredibly simple. I would expect a relatively inexperienced software engineer to be able to pick it up literally just from reading the Wikipedia page. In my experience, the more common underlying problem is that inexperienced engineers (even if they’re only inexperienced in terms of SQL and RDBMS knowledge), don’t actually know what normal forms are, or why they’re useful.

Data structures and concurrency control are just fundamentally useful computer science, but for some reason they seem to be topics a lot of people don’t pay enough attention to. Maybe it’s just my personal pet peeve, but I’ve seen too many projects start with “wow NoSQL is great”, and a few months later end up with giant nested loops in their lookups, and some poorly built custom implementation of MVCC in their business logic.

(NoSQL is great btw, just not for relational data)
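A minimal sketch of the point about normalization keeping the SQL simple (SQLite via Python; the schema and data are invented): each fact lives in exactly one place, so questions stay one short join away, with no string surgery or app-side loops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each fact lives in exactly one place.
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT,
                        author_id INTEGER REFERENCES authors(id));
    INSERT INTO authors VALUES (1, 'Codd'), (2, 'Date');
    INSERT INTO books VALUES (1, 'Relational Model', 1),
                             (2, 'Database in Depth', 2),
                             (3, 'SQL and Relational Theory', 2);
""")

# "How many books per author?" is one short, obvious join.
rows = conn.execute("""
    SELECT a.name, COUNT(*) FROM books b
    JOIN authors a ON a.id = b.author_id
    GROUP BY a.name ORDER BY a.name
""").fetchall()
print(rows)  # [('Codd', 1), ('Date', 2)]
```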


>> But most of the awful SQL I’ve seen over my career hasn’t come from poor mastery of SQL, it’s come from poorly normalized schemas. If you have a properly normalized schema, then you can do a huge amount with very simple SQL. When it’s poorly normalized, you end up with all sorts of strange and inefficient design patterns in your SQL.

This is the crucial insight that has made tons of money for me over the last 3 decades. I have all these trite HHOS jokes about it, like telling people denormalizing from a schema not in a normal form is actually the process of "abnormalization". And then there's generic EAVil, where there's nothing that can't be stored, not that nothing really ever means something, heheh.

For every well designed and useful schema I've seen, there were 999 awful ones. For example a physical data model where the query writer has to use string manipulation for joins is going to result in all kinds of suckage. The developer will conclude NoSql is a perfectly reasonable alternative. Even though a modern RDBMS provides all sorts of nifty features to identify and correct such issues ex-post-facto.

For a relational model to work well there must be an a priori data design performed with significant discipline. This seems too much like Big Design Up Front for the average developer or technical manager to stomach these days. It is true that a well-designed data collection system will have a simpler data design more amenable to a distributed NoSQL system and will support emergent schema and relations which may be divined via machine learning. It will also make Big Ball Of Mud more convenient to implement, but that's a posteriori observation, heheh, like that damned halting problem...


>footguns

Ran into one recently, where a table was joined to one of two other tables, based on whether a value was null in the first one.

This was fine, until we added a where clause on a base table (reached through multiple joins) shared by both options. This tanked the performance >1000x.[1] If we just returned the value it had basically no impact.

We tried solving it by using the result set as the base for a select where we did the filtering. This also resulted in the slow performance. In the end I wrapped the column in a function call, which solved it. And I still don't know why.

My guess is that somehow without the function call, it optimizes it into one query, which results in basically the original case, while a function forces the evaluation of the subquery first.

[1]Sub 1sec to over 15 minutes


Imo, polymorphic associations are one of the key areas that the relational model in general struggles. You can do them in most RDBMS, but they’re always a bit janky. Even when you’re just modelling your schema, you really have to think quite hard about it, and you’ll really struggle to preserve simplicity.


In this case it was a

    left join sometable
        on sometable.someuid = isnull(someothertable.someuid, somethirdtable.someuid)

I guess that is such an uncommon case that it tripped up the optimizer completely.

Also: Thanks for writing "polymorphic associations". Not knowing that probably is why I struggled to find any info on it.

Edit: Both tables were actually the same one, just retrieved via different joins, so different data.[1]

[1]One was a company, the other was the company we need to send money to. This is for when we deal with a daughter company but pay the parent company directly, for example.
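Not the actual schema from the story above, but a minimal sketch of this "alternative parent" join (SQLite via Python, using COALESCE, the portable spelling of ISNULL; all table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY,
                           payer_uid INTEGER,      -- set when paying a parent directly
                           company_uid INTEGER);   -- the company on the deal
    INSERT INTO companies VALUES (1, 'Parent Co'), (2, 'Daughter Co');
    INSERT INTO invoices VALUES (10, NULL, 2), (11, 1, 2);
""")

# The alternative-parent join: pick the payer if present, else the deal company.
rows = conn.execute("""
    SELECT i.id, c.name
    FROM invoices i
    LEFT JOIN companies c ON c.uid = COALESCE(i.payer_uid, i.company_uid)
    ORDER BY i.id
""").fetchall()
print(rows)  # [(10, 'Daughter Co'), (11, 'Parent Co')]
```

The correctness is easy; as the story illustrates, it is the optimizer's handling of a conditional join key under extra filtering that gets tricky.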


> Thanks for writing "polymorphic associations". Not knowing that probably is why I struggled to find any info on it.

We might have had a similar experience with this. The first time I stumbled across this problem though, I was specifically trying to figure out “what is the relational way to implement polymorphism”, so I pretty much lucked into a rather productive series of Google searches.


It wasn't strictly polymorphism, but the term you wrote led me to an article[1] that mentioned "alternative parent". This alone instantly made the problem more understandable for me.

[1]http://duhallowgreygeek.com/polymorphic-association-bad-sql-...


> Data structures and concurrency control is just fundamentally useful computer science, but for some reason it seems to be a topic a lot of people don’t pay enough attention too.

Because of the endless articles and comments saying basic computer science knowledge "isn't really needed" for the majority of programming jobs.


I've found with distributed PostgreSQL databases that the optimizer needs more help than you might initially expect. After looking at how the optimizer inefficiently implemented a query, I found a different, but equivalent, way to rewrite the query that could then be efficiently optimized. It's more complicated in the distributed case, because the shard distribution rules have a strong impact on performance. The time needed to re-balance data across shards for a query can be significant.


I've had the opposite problem. In a system I've been working on a lot, I know exactly what indices I want used in what order to read the data. Admittedly the database often figures that out first time, but sometimes it doesn't, then I have to go through an infuriating process of changing parameters or tweaking the query into a mathematically equivalent form until it guesses what I want. In one case I was even forced to write a procedural loop in PL/pgSQL (yes, I'm fully aware this is an antipattern).

I don't want to give hints to the optimiser. I want to specify precisely how the data will be read and joined.

I appreciate this is the exception rather than the rule compared to how most developers would prefer to work. And this system is a bit unusual in that it's using a database for passing messages, which it isn't well suited for (especially not the way they're storing the messages.)


I don't know if it's necessary for a better language to do both OLAP and OLTP; it seems that they are used by different classes of users with relatively low overlap in language features. But I definitely agree that such a test as (1) would be a critical prerequisite to an improvement over SQL.


Alternatively you can make a compiler from sql to your alternative, which would make it relatively easy to get the "I just want a familiar thing that works" people on board.


At that point you no longer provide a new language, but a new SQL query engine.


3. The evolution of SQL should be an open standard, backed by academics and the community. Not a proprietary solution proposed by a company.


QUEL ( https://en.wikipedia.org/wiki/QUEL_query_languages ), the original query-language for Ingres, was more orthogonal and consistent than SQL. But IBM decided SQL was more business friendly. Who can argue with that.

And before that there was ALPHA. From https://www.labouseur.com/courses/db/s2-Remembering-Codd-2.p... :

"Ted [Codd] also saw the potential of using predicate logic as a foundation for a database language. He discussed this possibility briefly in his 1969 and 1970 papers, and then, using the predicate logic idea as a basis, went on to describe in detail what was probably the very first relational language to be defined, Data Sublanguage ALPHA, in “A Data Base Sublanguage Founded on the Relational Calculus,” Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Calif. (November 1971). ALPHA as such was never implemented, but it was extremely influential on certain other languages that were, including in particular the Ingres language QUEL and (to a lesser extent) SQL as well."


Ted Codd designed the Relational Calculus as a clean relational-query language. It looks mathematical (scary?) and a little like a set-comprehension. But I think the big mistake is its use of non-ascii chars like ∃ ∈ ∀.

Here's an example from http://arwan.lecture.ub.ac.id/files/2013/10/4.-relationalcal... :

SQL:

  SELECT DISTINCT F.Name
  FROM FACULTY F
  WHERE NOT EXISTS
    (SELECT * FROM CLASS C
     WHERE F.Id=C.InstructorId AND C.Year=2002)
Relational Calculus:

  {F.Name | FACULTY( F) AND NOT
    (∃C ∈ CLASS( F.Id=C.InstructorId AND C.Year=2002))}


I only had a quick look, is it a computer-friendly offshoot of relational algebra[1], the actual mathematical model for relational databases, as an actual query language?

[1]https://en.wikipedia.org/wiki/Relational_algebra


Relational calculus is another mathematical model that is dual to relational algebra: if you have one representation you can always get the other. Relational algebra describes a fairly direct set of manipulations of database rows that can be implemented efficiently. Relational calculus operations are more abstract, but have useful identity transformations you can use to optimize query plans.

So, when you ask a database system to perform a query, you ask it in relational algebra terms because they’re easy to understand. It then transforms your query into a relational calculus-like form to shuffle things around, and then back to a different, but equivalent, algebraic form to actually execute.


Thank you, that makes perfect sense.


Relational algebra is functional programming: you map-reduce your way to a solution set.

Relational calculus is logical programming: you specify what the solution looks like and the query engine figures out the sequence of operations to find a solution.


A useful analogy indeed, thanks.


It goes a bit deeper than an analogy, but regardless I’m glad it helped!

EDIT: To explain, the relational algebra IS a map-reduce functional programming language. The relational calculus IS a logical programming language. They just use tuples and sets instead of the more familiar lists and strings of LISP, Forth, Prolog, etc.

And the sibling comment further up is right to point out that there are mechanical transformations to turn one into the other, just like there are automatic ways to compile Prolog into LISP (and vice-versa, though that is rarely done).


∃ ∈ ∀. quite natural to me as a pure math grad

there exists, in, for all


You don't even need to be a math grad for it to be natural. It's just a natural way to perform set operations; pretty much everybody who knows a little bit of math is familiar with it. And even if you are not, you need like 5-10 minutes to learn it, and then maybe a couple of weeks of using it to become really fluent.

And I feel like "is not ASCII" is a pretty silly complaint in 2020 anyway. First off, if these characters had been widely used in 1980, they would be on every keyboard today, same as APL's symbols were on APL keyboards. Second, you don't really need them to be literally ∀ and ∃; it could have been &A and &E for example, and then every other IDE, including some fancy product of JetBrains, vim & emacs and most likely even your mysql-cli (at least some wrapper around it written in Python) would turn them into pretty ∀ and ∃ on your screen, same as some editors do for λ.

And third, which is a personal pet peeve of mine (so I probably should keep my mouth shut about that, but I cannot): I really, really wish we'd stop with all this ASCII bullshit already. There's nothing special about ASCII except it's 7-bit. And that's outdated.

It may come as a shock to your average American, but most of the texts in the world are not ASCII, just deal with it. If your product doesn't support that, it probably means it's broken, outdated and will eventually lose to the competition, so you don't do yourself a favour by forgetting about that.

And with regards to input, there's nothing complicated about typing ∀ (or anything else like that) on an ordinary US-layout keyboard. For me ∀ is 3 very fluent keystrokes. I use XCompose, which is a blessing, but other operating systems have similar software with similar capabilities as well. I understand that this is not something most people are familiar with, but that's just bad and shouldn't be respected: using a mouse also wasn't familiar to every single PC-user out there. With demand comes the supply.

So I really dream about the day when typing nice, clear syntax like ∀ instead of awkward Perl-like character sequences (to read which you need mentally traumatizing professional deformation and/or an IDE post-processor) becomes the norm of programming.


It's a mistake if your goal is to have the language implemented and become popular. Otherwise I like the notation.

Computer languages would be different if Ascii had provided code points for ∧ ∨ ∩ ∪ ≡ ↑ etc. It's too bad so much real estate is wasted in the control chars.


\ was invented so that ∧ and ∨ could be written /\ and \/. In retrospect, a mistake.

Original ASCII (1963) had ↑, but it was converted to ^ in 1967 so that it could double as a circumflex.

(Also ←, which was a good assignment operator; it's too bad _ didn't replace \ instead.)


There's a lot of negativity here for understandable reasons given the success of SQL empirically. But I'd encourage everybody to read the home page https://edgedb.com/. This project is not trying to replace SQL as its primary goal, it's trying to build a data modeling and query interface on top of Postgres that meshes well with modern applications that have hierarchical data akin to what you'd model with GraphQL. Major kudos for trying to rethink and improve upon things -- I'm sure it wouldn't be too hard to stick an SQL backdoor in there for the laggards too ;).


> for the laggards too

Speaking of negativity ;)


Would be nice, but bazillions of lines of SQL at the core of almost every business system make this as likely as “We can do better than five fingers.”

The article does nicely illustrate many of the well-known shortcomings of SQL. Chris Date and Hugh Darwen unsuccessfully tried to fix SQL with Tutorial D. Never heard of it? Exactly.


You can pry SQL out of my cold dead hands. It's just not that bad.


>>You can pry SQL out of my cold dead hands. Its just not that bad.

I often joke that SQL is the COBOL of the 21st century. HHOS. There are worse things... e.g. COBOL.

Actually it's very difficult to deny the empirically discernible utility of relational databases and SQL. SQLite, for example.


COBOL and SQL are for different purposes, apples and oranges. One can’t be worse than the other. Programmers who have never seen or written COBOL should stop trotting it out as an example of bad. You’d be surprised how many important systems are written in COBOL, and how well-suited it is for the problem domains it was designed for.


I'm alright with that.

I think the problem is that developers love to abstract everything. And with the database often becoming a god object it's often targeted for abstraction.

The problem is, querying data is a complex problem, complicated enough to require a query language.

So the developer goes off to abstract the database for their needs - a helper table object here, a helper row object there - and eventually you start to see an ORM appear.

Developers need to be told to stop this behavior.

The abstraction is the thing you are representing in the database, not the database.

Your models should use plain SQL queries to execute behavior.

If you must, setup some helpers to do very basic crud operations, but as soon as you need to really alter a where or a select, drop to SQL, use the language as intended and watch your code suddenly become modular, readable and maintainable.

Yes, this means you will need to write SQL, yes this means you will write a lot of select and where statements, no this is not code duplication, stop freaking out about non problems.

I swear to god DRY is the source of and solution to all developer problems.
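A minimal sketch of that style (Python with sqlite3; the `Accounts` model and schema are invented for illustration): the model owns plain SQL per behavior, with no query-builder layer in between.

```python
import sqlite3

class Accounts:
    """Thin model layer: one plain SQL query per behavior, no ORM."""
    def __init__(self, conn):
        self.conn = conn

    def overdrawn(self):
        # As soon as the WHERE or SELECT gets specific, drop straight to SQL.
        return self.conn.execute(
            "SELECT id, balance FROM accounts WHERE balance < 0 ORDER BY balance"
        ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 50.0), (2, -20.0)])
print(Accounts(conn).overdrawn())  # [(2, -20.0)]
```

Yes, the select and where clauses repeat across methods; each one is independently readable and independently changeable, which is the point.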


It almost seems as if SQL is a bit of a "must" for any data-heavy type of application/system, so just spend 30 minutes and learn it already?


Probably more than 30 minutes, SQL may look easy but using it correctly assumes a fairly good understanding of the relational model.

I agree that programmers too often seem to wave database design and integrity aside and create RDBMS/SQL worst-cases with ORMs and terrible queries, and then blame the tools.


> Chris Date and Hugh Darwen unsuccessfully tried to fix SQL with Tutorial D.

Well, just the D class of languages; Tutorial D is (as the name suggests) a pedagogy-focussed implementation of the D requirements; the intent was that there would be one or more Industrial Ds.

(Dataphor is a D—the first implemented, IIRC—and is successful enough that it's a still-living commercial product.)


I assume this is different from the D programming language?


Date and Darwen’s D class of languages unifying the OO and relational models are different from Walter Bright’s “what C++ should have been” D language, yes.


> Never heard of it? Exactly.

"Accrington Stanley"

https://www.youtube.com/watch?v=zPFrTBppRfw


It's 2020 not 1995. There are already lots of good, production-ready SQL alternatives out there. There's full-fledged businesses that have zero lines of SQL.


What are good production-ready alternatives to SQL? (That are still relational and don't make you rewrite half of the database logic again in each new backend you make -- not talking about key/value stores)


I would like to hear of one such business. Can you give an example?


Anything that uses ActiveRecord


Isn't ActiveRecord using SQL behind the scenes?


Technically ActiveRecord uses Arel, which in turn compiles, if you will, to SQL. Arel (or a strict subset of it) could theoretically be an alternative to SQL, if databases supported it as a native language. They don't, so the SQL step is a practical requirement.

This is similar to the browser (especially in the pre-WASM days). No matter what language you actually wrote your software in, eventually you were going to end up converting it to Javascript out of necessity. Would you still say that your program written in C, compiled with emscripten, was written in Javascript?


Stripe's primary datastore is MongoDB.


But what does their BI tooling use?


NoSQL databases are non-relational.


Misses the point. Sure, we have alternatives, and always have had alternatives to SQL. But those don't address the bazillions of lines of SQL already running almost every important business system. SQL won't just go away because we can point out its flaws or come up with something we think works better.


Maybe just have SQL Transpilers? I mean we accept ORMs, so we should accept that.


There’s a lot of that in the GraphQL space at the moment. Prisma, Hasura, Postgraphile, EdgeDB, etc...


LINQ is pretty much a cool transpiler from a composable SQL-like dialect to SQL.


Honestly I think SQL is pretty easy. I love it. I can teach the basics to a new person in minutes.

You know what we could do better at? Crappy explains from database engines. Crappy rate-limiting capabilities. Poor feedback on keeping cache pipelines fed during scans. Poor feedback on column size effects on reading stripes from disk and size alignments between the filesystem and database.


Your brain must work differently to mine. SQL is by far the hardest tool I use. I’ve used all the main languages from asm up to js for real work and nothing breaks my brain like SQL.

I use it daily in a business that is heavy on SPs and while I get by and am improving the jump from inner joins and selects to CTEs and the other wizardry is massive.

I want to be better at SQL but so many problems I hit up against and think “well that’s a 2 minute job in js/swift/php”


> I want to be better at SQL but so many problems I hit up against and think “well that’s a 2 minute job in js/swift/php”

The thing about SQL is that it's the fastest way to read&write data in a relational database. Maybe writing the code is faster in js/swift/php, but the code will run faster in SQL. If you need to do something to 100M pieces of data you can do a lot worse than SQL.


You cannot compare SQL to either of those languages, since they are not built solely for data manipulation but also for a lot of other things, which SQL is not. SQL can do a lot too, but just because you can e.g. do formatting in SQL doesn't mean it's the correct place or choice.


That's circular reasoning though. Why do you want your data to be in a relational database? Particularly if you're not actually using its features (I don't think I've ever seen a web application that actually got any value out of database-level transactions, for example). A different kind of datastore could offer you better performance and easier querying.


I love SQL for making it possible to almost trivially enforce most business rules. Such as: this field may only contain one of the values specified in another table.
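As a minimal sketch of that kind of rule (SQLite via Python; table and column names invented for illustration), a foreign key constraint is enough to make the database itself reject values not present in the referenced table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs FK enforcement enabled per connection
conn.execute("CREATE TABLE status (name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "status TEXT REFERENCES status(name))")
conn.execute("INSERT INTO status VALUES ('open'), ('closed')")

conn.execute("INSERT INTO orders VALUES (1, 'open')")  # allowed: 'open' exists in status
try:
    conn.execute("INSERT INTO orders VALUES (2, 'bogus')")  # rejected: not in status
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The enforcement happens in the datastore, so it holds regardless of which application, script, or manual session does the insert.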


I struggle to understand this viewpoint. In my experience business rules are harder to express in SQL than in practically any first-class programming language. "Is this value one of this list of values" - whether that list is hardcoded or dynamically obtained - is completely trivial.

(Of course if you apply some double standard where editing your "source code" requires multiple approvals whereas changing your "database" may be done at will in production then you'll find SQL logic easier to adjust, but that's a reflection of your policies rather than any fundamental reality.)


You're arguing that business rules are easier to express in other programming language, but the point of the parent post (and SQL) is that business rules are easier to enforce in SQL.

It's trivial to write some code verifying "Is this value one of this list of values", but writing such code does not ensure that this constraint will actually be met in 100% of your past and future data.

It's very difficult to guarantee that a business constraint expressed in your client app is actually enforced - there can be different versions of your app applying the constraint differently, there can be multiple apps accessing the datastore, there could be manual interventions in various ways to the datastore, the app could have code paths that may cause data insertion or alterations without running that verification in certain conditions, etc; so for all intents and purposes you can't really rely on that constraint.


> It's very difficult to guarantee that a business constraint expressed in your client app is actually enforced - there can be different versions of your app applying the constraint differently, there can be multiple apps accessing the datastore, there could be manual interventions in various ways to the datastore, the app could have code paths that may cause data insertion or alterations without running that verification in certain conditions, etc; so for all intents and purposes you can't really rely on that constraint.

If you have business logic in your datastore then that puts you in much the same position though. If you have a trigger you might have data written before it was introduced, or queries that ran with it disabled. If you're gradually rolling out a new version then you might have some data that follows a constraint and some that doesn't. And so on.


For triggers, you can run a procedure that executes it for all past data. Or write a procedure that updates the data to a valid state.

When gradually rolling out, you can have adjusted stored procedures that deal with the different versions, and turn off the old one when it is no longer in use.

So, I fail to see the problems you mentioned.


And with good practices (mainly a decent type system that lets you distinguish between checked and unchecked values) you'll have no problems with constraints in the application layer either.


As long as your application layer is a single homogeneous application running the same code.

That is approximately never the case. Current widespread practice is moving away from it with microservices, and old-fashioned practices of mixing customized and off-the-shelf tools basically forbid it.


Current practice is moving away from the idea of having your storage layer be a single homogeneous datastore as well.


So where are you proposing the consensus be placed? A lot (I'd argue most) applications need consensus for them to be stable and reliable in the long term, so either you place it in the datastore or in the logic processing. Where else would you place it?


I prefer to see the whole system as a succession of stream transformations (https://www.confluent.io/blog/turning-the-database-inside-ou...). If you view the sequence of input events as first-class and the "current state of the world" as derived, then a lot of problems go away. You need a datastore that can give you a consistent answer as to what order events occurred in, but it's a lot easier to make appending to an append-only list atomic than to make arbitrary state computations ACID.

For downstream derived computations, basically you either make the causal relationship explicit, or accept that you have eventual consistency. The only case where you can have inconsistency is where you have a "diamond" in your computations (i.e. you compute B that's derived from A and C that's also derived from A, and you compute D that's derived from B and C). So you figure out the business implications and either accept it or eliminate the diamond (by computing (A, B) from A and (B,C) from (A, B) instead of computing B and C separately from A). You will also get inconsistency if you do some totally ad-hoc query that's not part of your existing pipelines, but usually that's the kind of reporting query that doesn't need to be 100% consistent; if you do need a consistent version of that then the best approach is to take a regular snapshot, which can be consistent.

Basically you have a lot more precise control, you have causal relationships where you have explicit dependencies, so you have the level of consistency that you need for all your operational stuff, but you don't have a globally consistent realtime view of everything. It may take more work up front, but IME the notion of that kind of global consistency is a lie in a distributed world; even if you just have a basic webapp then you can't actually achieve the kind of consistency that that model pretends you have, because what the user is viewing in their browser (and then potentially making changes based on) at any given time is not necessarily the same as what's in the central database.

To get back to the question, this means that if inconsistency is due to an actual bug, it's pretty easy to solve: fix the bug and then regenerate from the original events. If you don't understand why data is inconsistent then you can always look back to the source events and figure out what it should be.
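A toy sketch of that view (event shapes and the apply rules are invented for illustration): the append-only log is the source of truth, and the "current state of the world" is just a fold over it, so fixing a bug and replaying the events regenerates consistent state:

```python
from functools import reduce

# Hypothetical event log; shapes are invented for illustration.
events = [
    {"type": "deposit", "account": "a", "amount": 100},
    {"type": "deposit", "account": "b", "amount": 50},
    {"type": "transfer", "src": "a", "dst": "b", "amount": 30},
]

def apply(state, event):
    """Pure transition function: old state + one event -> new state."""
    s = dict(state)
    if event["type"] == "deposit":
        s[event["account"]] = s.get(event["account"], 0) + event["amount"]
    elif event["type"] == "transfer":
        s[event["src"]] -= event["amount"]
        s[event["dst"]] = s.get(event["dst"], 0) + event["amount"]
    return s

# Derived state: never stored as truth, always recomputable from the log.
state = reduce(apply, events, {})
print(state)  # {'a': 70, 'b': 80}
```

Because the fold is deterministic, replaying the same log always yields the same state, which is what makes "fix the bug and regenerate" possible.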


I don't think we'll come to an agreement in regards to that, so, I am gonna voice my disagreement and move on.

It was informational that I saw your viewpoint, even though I disagree. In the future I will definitely try and reevaluate my viewpoints to see if maybe I am wrong after all.


With SQL you can enforce it within the datastore. And if some requirement changes, you can instantly enforce it. Eg.: A new foreign key constraint.

Also: It is centrally managed, so you don't need to look all over the source code to find the constraints and don't need duplicates either, if you can, for example,change the data at multiple points.


> With SQL you can enforce it within the datastore. And if some requirement changes, you can instantly enforce it. Eg.: A new foreign key constraint.

Which is to say that you have little control over deployment. Adding a new foreign key constraint might block queries for an indeterminate amount of time while a new index is built, but you have no way to introspect or predict that at the SQL level (if you're lucky there might be a specific way to find out via the internals of your particular datastore implementation). PHP fans used to tout being able to just edit the code on the production server as an advantage, but most of us recognise it as the opposite.

> Also: It is centrally managed, so you don't need to look all over the source code to find the constraints and don't need duplicates either, if you can, for example,change the data at multiple points.

If your data is well architected so that you only have a single representation of each piece of data, sure. Equally if your application is well architected you can have a single API where any given thing happens.


We use exclusively stored procedures, so if we want to add a FK constraint while the application doesn't adhere to it, we do one of two things:

Either add it to the application and push out a new version. Or handle it within the stored procedure.

Additionally, there would be the option of updating the database when pushing a new software update out.

> If your data is well architected so that you only have a single representation of each piece of data, sure. Equally if your application is well architected you can have a single API where any given thing happens.

I think I formulated it badly before. In this case I meant that we have one stored procedure for each thing we do. So that, if the underlying data changes for example, we only need to change the procedures directly accessing it. We can then call those procedures from multiple different parts of the code. And the procedures are all managed within one application, SSMS in our case, where you can have a much cleaner structure than within code.

In my experience, in PHP for example, you tend to write VERY similar/same queries in multiple locations, which can make it harder to find all instances.


It makes some things simple and insanely fast at runtime, at the cost of making other things impossible or if possible horribly awkward and just as slow as in any other programming language.


In those rare cases, you can do post processing on the data in code if need be. I personally don't see it fully as either/or but as mostly SQL with enhancing the data in rare cases via "traditional" code.


> (I don't think I've ever seen a web application that actually got any value out of database-level transactions, for example)

You only ever have one user in your web applications at a time?

> A different kind of datastore could offer you better performance and easier querying.

Citation needed.

The so called "NoSQL" systems manage to scale better because they have very constrained query models. So they are only "easier", perhaps, in the sense of supporting less functionality, but in most cases that leads to a big increase in complexity in the rest of the application code.


> You only ever have one user in your web applications at a time?

No, but database-level transactions don't help you deal with concurrent users. You don't keep a transaction open between showing the user an edit page and applying their changes (at least I hope you don't), your transactions can last at most through the web request-response cycle. So if you want actual transactional behaviour (e.g. a wiki with "another user is editing this page") you have to reimplement it yourself "in userspace".

> Citation needed.

SQL datastores are very open about the tradeoffs that they make for the sake of ACID isolation - the fact that the transaction isolation level is tuneable shows that, as does the fact that all serious SQL databases use MVCC.

Also MySQL had benchmarks showing that 75% of the time for a query by primary key was spent on parsing the SQL.

> The so called "NoSQL" systems manage to scale better because they have very constrained query models. So they are only "easier", perhaps, in the sense of supporting less functionality, but in most cases that leads to a big increase in complexity in the rest of the application code.

Not my experience, because the overwhelming majority of the time the code doesn't make any use of the complicated SQL functionality. I've literally never seen a cross join in use. I can count the number of times I've seen nontrivial aggregations in live code on one hand. The few times I've written recursive queries I found poor driver support and incompatibilities between different databases. So you pay the cost of flattening your data into the SQL table model, but most of the time you get no benefit from it.

There are pretty much three different kinds of queries: indexed lookups into raw data, indexed lookups into derived data (secondary indexes being a special case of this) and aggregations over a full table (table scans being a special case of this). A query model that represents those cases separately is easier to understand and work with, especially when it comes to understanding the performance. If your datastore supports server-side map-reduce style aggregations then you can literally execute arbitrary code in a "query", but you'll have a clear understanding that you're doing something different (with performance implications) from looking up a value by its indexed key.


You’re not thinking in sets if you’re constantly inclined to move to js/php/swift. Spend some time trying to break this mental barrier and you’ll find that SQL becomes easier and more sensible.


I had three college classes that really sealed this up for me: linear algebra, a set theory 'light' class, and a class on database mechanics itself. After that I was able to reason about mathematical sets, scalar functions, and what the database engine is doing.

If you "just learn SQL" without understanding the abstraction below it, it'll be difficult to be successful, much the same with anything else.


The problem for SQL is (apart from horrible syntax that makes any query refactoring tedious) that databases are very strict structures that, through their rigidity, make some operations very fast but some operations impossible (to perform in this fast manner).

When you are crafting a good SQL query (crafting is the right word, as each non-trivial SQL query is a little puzzle that may take you a few days to solve because of the constraints) you need to stay within the bounds of this fast db world.

Whenever you are forced to open a cursor or use a CTE for recursion or even do a full table scan, you have already left the fast land and landed in the world that all general-purpose languages inhabit, where you have to iterate and recurse and everything takes ages. And in that land any other language beats SQL, because any other language has syntax designed to make things easier in this world, while SQL has syntax that's just good enough for the fast world where operations are highly restricted; when it ventures into the slow world it's just a horrible mess.
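For a concrete taste of that "slow world", here is a recursive CTE walking a toy hierarchy (SQLite via Python; the schema and data are invented for illustration). It is expressible, but the syntax is a long way from a simple loop in a general-purpose language:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INT, boss INT)")  # boss is NULL for the root
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 2), (4, 2)])

# Walk the reporting chain from the root downward, one level per recursion step.
rows = conn.execute("""
    WITH RECURSIVE chain(id) AS (
        SELECT id FROM emp WHERE boss IS NULL
        UNION ALL
        SELECT emp.id FROM emp JOIN chain ON emp.boss = chain.id
    )
    SELECT id FROM chain ORDER BY id
""").fetchall()
print([r[0] for r in rows])  # [1, 2, 3, 4]
```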


An excellent point: there are so many other problems that deserve our attention regarding databases that talking about something that's worked really well for 50 years is kind of silly. We have bigger fish to fry.


Not only this, but resources GALORE, since it's been around longer than the internet and has been used by just about every developer. Java, Python, C#, Node, PHP etc... they all know SQL.


SQL was designed by COBOL / NATURAL people and was intended to be used by "normal people". That was part of a larger movement at the time to make programming more accessible. Unfortunately (a) that experiment mostly failed (b) the resulting syntax does not compose well at all (c) the syntax does not represent the underlying semantics and mathematical operations well at all.

The combined effect is a rather tortured language, as it has been extended over the years.

However, replacing it is equally problematic because of the huge installed base.


What is needed is a way to use something like the new language presented here against existing relational databases, preferably in a way that doesn't involve sending queries as plain text strings back to the database engine - query compilation should happen in the client library, and be sent to the database as some sort of bytecode.

Support for this bytecode should be added to Postgres, MySQL etc, not some new database product. Projects should be able to mix old style SQL queries and queries written in the new language at will.


> SQL was designed by COBOL / NATURAL people and was intended to be used by "normal people".

That's not how I remember it. What I recall is that SEQUEL was the result of looking at how databases and set theory could be connected.

https://en.wikipedia.org/wiki/Edgar_F._Codd

That had nothing whatsoever to do with COBOL or NATURAL.


There is a reason it was originally called SEQUEL:

  Structured __English__ Query Language
see the original paper at

   https://web.archive.org/web/20070926212100/http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf


I didn't say anything about the name. But you made some pretty strong claims about the provenance of SQL that have - as far as I know it - no bearing on reality. The groups behind NATURAL and COBOL had nothing to do with SQL afaik.


One word of caution to those trying to improve on SQL - for the many users of SQL, technology is a secondary aspect of their jobs. Hence any replacement runs into the issue that many of its core users do not have the bandwidth to spend significant effort on learning another querying language.

I will cheer everyone who tries to displace SQL, because I do think it needs to be displaced but would also want to caution such people on the magnitude of the task ahead of them.


I often think that a great way to displace SQL would be to create a language that compiles to SQL similar to how TypeScript compiles to Javascript.


You mean like an ORM?


ORMs I've worked with usually expect you to write the complex queries/joins yourself using a series of functions e.g. orm.select(table1).join(table2).on(keys).where(cond).group(by).having(cond).order(by). It's just SQL with extra steps.

An ORM that understands db schema, indices, and automatically decides where to use outer join or subqueries based on what would be most efficient given the specific schema would be fantastic. Basically an ORM that 95% of the cases comes up with the most efficient way to look up the data.
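A toy illustration of that chaining style (all names invented): the builder is a thin veneer that just concatenates the same SQL back together, which is why it feels like "SQL with extra steps":

```python
# Minimal query-builder sketch; does no escaping or optimization at all.
class Query:
    def __init__(self, table):
        self.parts = [f"SELECT * FROM {table}"]

    def where(self, cond):
        self.parts.append(f"WHERE {cond}")
        return self  # return self so calls chain

    def order(self, col):
        self.parts.append(f"ORDER BY {col}")
        return self

    def sql(self):
        return " ".join(self.parts)

print(Query("users").where("age > 21").order("name").sql())
# SELECT * FROM users WHERE age > 21 ORDER BY name
```

The smarter ORM the parent describes would have to go beyond this: inspecting the schema and indexes before deciding on joins vs. subqueries, rather than mechanically gluing clauses.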


At minimum some ORMs let you reorder things which makes SQL 50% better.

Smarter ORMs do what you want. Linq is magical.


> I will cheer everyone who tries to displace SQL

Why? What are better alternatives, really?


There's a non-trivial number of live running systems out there that interact with RDBMSes entirely through either ORMs or query builders, without a single line of programmer-written SQL. Those systems seem to hold up well.


Those don't replace SQL in the same way that typescript does not replace javascript. They are also often specific to a certain framework or language in a way that makes them infeasible to compete with SQL in a general way.


I for one would welcome a new alternative to sql. It might not be _this_ alternative, but why not try.

SQL is very hard to learn properly, with all of its gotchas and inconsistencies. There are running jokes about noobs truncating their tables after forgetting a WHERE clause. I’ve seen junior devs crying in tears and throwing their mice just because they needed to debug / optimise a complex query.

The mere existence of all the ORMs is a testament that people would opt to write (or use) insanely complex pieces of software just so they don’t have to deal with the lack of composition and ease of use.

All of those look to me like signs that something is wrong with the core itself. We could do better.

If we settled for good enough in all cases we wouldn’t have Go or Rust, React or Postgres. In fact every software that we have is a product of someone thinking “this is hard/wasteful/unexpressive/etc, lets write an alternative”, SQL included.

This alternative looks quite promising. We'll have to wait to see how they handle the edge cases, but the core looks a lot simpler to deal with than regular SQL.


> SQL is very hard to learn properly, with all of its gotchas and inconsistencies.

I don't understand why, though? I've been using SQL (Postgres for the most part, but with a smattering of MySQL thrown in) for around 8(?) years, which isn't much in the grand scheme of things, but I have not hit anything that couldn't be resolved. I've written everything from small, straightforward queries to ones over 200 LOC and never had a problem understanding them if you read them slowly / break them down into smaller queries.

In fact, ORMs have been a massive headache because I can think in SQL but not in whatever the creator of the ORM was thinking in. Those giant queries that I was talking about - there's no way to represent them in ORM form.

SQL works in a language agnostic way, you can explain analyze your SQL queries and run it through whatever medium you prefer. Typical experience with an ORM goes like so:

1. Lets use an ORM because it'll be easier

2. It's not actually easier and it's a complex mess now, but let's stick with it anyway

3. Figure out a way to log the queries that the ORM made up for you/printf it

4. Run that through EXPLAIN ANALYZE

5. Can't make the ORM do that, file a bug report that'll be buried

6. Use native query while you wait

7. Tech debt etc;


In my experience people who have trouble with ORM usually try to use it for something which it is not well suited for: namely OLAP.

ORM can really get in the way when you want to express groupings, aggregates, and all kind of joins sprinkled with let's say stored procedures.

But this has nothing to do with the ORM itself. It's just a fact that many people don't understand/consider the tradeoffs before jumping into acting on something.


> In fact, ORMs have been a massive headache because I can think in SQL but not in whatever the creator of the ORM was thinking in.

Yes, fully agree on ORMs. They DO have one nice feature though: simple CRUD operations are way less verbose than constructing SQL statements.

What we lack is better integration between the host language and the database. Constructing a prepared statement from a string, setting parameters, executing, fetching rows from the result set and mapping back to fields... it's all a major, repetitive PITA.

And yes, I find it easy to think in SQL and often wonder WTH an ORM is going to generate. Just recently I improved performance of an application by going from ORM to SQL: first I reduced the number of round-trips (ORM/efcore first wants you to fetch an entity before you can update it), second, I batched updates into a single session/transaction. Win! :) [Oh, and don't get me started ranting about ORM and transactions.]
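The batching win can be sketched like this (SQLite via Python; the schema is invented for illustration): one transaction plus one executemany, instead of a fetch-then-update round-trip per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, 0) for i in range(5)])

updates = [(10, 1), (20, 2), (30, 3)]  # (new qty, id) pairs
with conn:  # single transaction: commits on success, rolls back on error
    conn.executemany("UPDATE items SET qty = ? WHERE id = ?", updates)

print(conn.execute("SELECT qty FROM items WHERE id = 2").fetchone()[0])  # 20
```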


Just to let you know, those things are not needed at all if you use a different type of ORM, similar to ActiveRecord or Django ORM. It all depends on what you choose. So probably the first mistake was that you didn't do the homework of exploring ORMs maybe?


I'd rearrange the list a bit to what I see experienced devs do, and I don't see a problem with it:

1. Lets use an ORM because it'll be easier

(skipping 2, because it's not really a mess; skipping 3, because they know that before getting to this point)

4. Run that through EXPLAIN ANALYZE

5. Can't make the ORM do that

6. Use native query

7. Profit

There's nothing wrong with having `Users.active.find(id)` where you want it and writing out the complex query as SQL where it gets complicated. They can live next to each other just fine and still improve your life.

Going extreme in any direction is going to cause problems. (whether 100% ORM, or SQL purity) It's fine to use different approaches where they're appropriate.


> If we settled for good enough in all cases we wouldn’t have Go or Rust

funnily enough, golang is a regression on practically every front compared to established ecosystems like Java and C#.


> There are running jokes for noobs truncating their tables due to forgetting a where clause. I’ve seen junior devs crying in tears and throwing their mice just because they needed to debug / optimise a complex query.

Isn't this the same for most shells, including almost all linux distros? The CEO of Red Hat accidentally wiped his computer because he forgot a slash, but we don't throw out a whole tool just because it included a footgun.

The same is true for SQL.


I am a big fan of a relational model. SQL itself is OK but far from great. So, I wish you a lot of success!

I was part of a similar attempt - building a better "SQL" and relational DB. This was roughly 8 years ago. You can have a look at our GitHub projects, or look at some further links, and maybe you'll get inspired :)

* http://bandilab.github.io/ - introduction to the bandicoot project

* https://www.infoq.com/presentations/Bandicoot/ - presentation of the Bandicoot language on InfoQ

* https://github.com/ostap/comp - another interesting attempt, a query language based on a list comprehension


bandicoot looks interesting, and it feels spiritually related to a project I'm working on. I'm designing a programming language for board games: http://www.adama-lang.org/


Very interesting. I am not very familiar with many board games, but i'll definitely look at the language more!


I'm not convinced by a lot of the comments here focusing on SQL as an irreplaceable juggernaut. Currently I think Postgres is just fantastic but recent DBs have shown that even SQL can bleed.

There's some neat stuff here and I wish the project well. I would love to see object/hierarchical result set support grow. SQL ORMs feel so kludgy.


SQL ORMs feel kludgey because they are: entities in an RDBMS are normalized into their component parts and interconnected by a multitude of relations, so that you have multiple entry points to obtain different views of the same data and multiple ways to organise and prune results to find interconnections across multiple entities

once you reduce all that to object traversal, all those options are lost; your only entry point is the entity and the only connections are direct paths

it's not the underlying query language, it's the flattening to objects


> it's not the underlying query language,

SQL and its implementations do not support nested relations. The parent is suggesting that e.g. nested relations would enable better ORM solutions.


> nested relations

parent was right that hierarchical queries are not well supported, but that's not the same as outright using nested relations; also if you're not going to store data following normal forms for convenience no wonder sql is going to have a hard time querying it.

at which point you're better off with an object storage and a searchable index to the side anyway


SQL has a lot of incidental, accidental complexity, no doubt.

Though when I reason about SQL, I think mostly in terms of functional operators over streams of data: projection, filtering, flat-map, join, fold/reduce. Obviously optimization means looking through streams and seeing tables to find indexes etc., but once you get to the execution plan, you're firmly in a concrete world of data flow and streams of tuples.
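That mental model can be sketched directly (data invented for illustration): each relational operator is an ordinary functional operation over streams of tuples:

```python
# Toy relations as lists of tuples: (name, dept, salary) and (dept, building).
emps = [("alice", "eng", 100), ("bob", "eng", 90), ("carol", "sales", 80)]
depts = [("eng", "bldg1"), ("sales", "bldg2")]

# selection (filter) + projection (map)
eng_names = [name for (name, dept, _) in emps if dept == "eng"]

# join as a flat-map over both streams
joined = [(name, bldg) for (name, dept, _) in emps
                       for (d, bldg) in depts if dept == d]

# aggregation as a fold/reduce
total_pay = sum(sal for (_, _, sal) in emps)

print(eng_names, joined, total_pay)
```

The optimizer's job, in this view, is to pick an efficient physical realization (index lookups instead of the nested loops above) while preserving exactly these semantics.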

I didn't get on well with the example syntax in this write-up. It didn't mesh better with my mental model of relational algebra either at the logical or physical execution level - and the truth is you need a foot in both worlds to write good scalable SQL today.

Aside from the complexities of dynamic construction, my biggest problem with SQL is modal changes in query plans, owing to how declarative it is. It's a two-edged sword: the smart planner is great, up until it's stupid. And it usually turns stupid based on index statistics in production at random times.


Yes, SQL has flaws, and the article forgot to mention one of them: you need to build a string to build an SQL query, rather than a more structured object, leading to flaws like SQL injection vulnerabilities, and difficulties adjusting the query.

Let's say you're building a CRUD app with search and filtering capabilities. Unless you are using an ORM (which has problems of its own), you might be tempted to build the SQL query string like this:

  conditions = " AND ".join(filter_key + " = '" + filter_value + "'" for filter_key, filter_value in filter.items())
  order_by = column_name + " DESC"
  query = "SELECT col1 FROM tablename WHERE " + conditions + " ORDER BY " + order_by

But this has multiple SQL injection vulnerabilities. Doing it correctly is not just a matter of using SQL parameters, because column names need to be escaped differently than string literals. Linters can't distinguish between correctly escaped queries and incorrectly escaped queries in non-trivial cases. Also, the query will throw a syntax error if the number of filters is zero, since you can't have an empty WHERE clause.

I don't think a new query language solves this problem.


Isn't the solution a "query builder" pattern for your preferred programming language, that generates a SQL string and sends that to the database?


I think this is an unfair comparison. Constructing SQL query strings is risky, because it is essentially dynamically constructing and evaluating code from user input. However, there is no reason why one can't use parameters (like you mention) or even call stored procedures.
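For the value side at least, parameters do solve it. A minimal sketch (SQLite via Python; the schema is invented for illustration) showing that a hostile input stays data rather than becoming syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Classic injection payload; via a placeholder it is compared as a literal string.
hostile = "alice' OR '1'='1"
rows = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()
print(rows)  # [] -- the quote characters are data, not SQL syntax
```

As the grandparent notes, placeholders only cover values; dynamic column names still need separate identifier quoting, which is where query builders earn their keep.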


This problem would arise with every DSL. And it has been solved by using embedded DSLs.

jOOQ or SQLAlchemy look like SQL (and you don't even have to squint your eyes very much) and solve the problems you mention.


jOOQ and SQLAlchemy are libraries for Java and Python respectively. You can't take your knowledge of those libraries and use them in a different programming language. And you still need to know SQL well in order to do any debugging, because the database is still receiving an SQL query string.

What I am wishing for is for the language to be more like JSON, something that maps closely to commonly found structures in programming languages (like lists, objects, numbers, strings and booleans), and that the database can support natively.


> jOOQ and SQLAlchemy are libraries

While it is true, they are specifically embedded DSLs, which is exactly what you said you needed. Having a (query) language (a DSL) that is not composed by string concatenation. Embedded DSL solve exactly that problem and they are usually just translations from some DSL into concrete syntax in some programming language, ideally supporting type correctness and preventing syntax errors, etc..

And what you "wish for" is, again, just an embedded DSL, you only wish for you constructors/functions to be native instead of some import/module.


> the article forgot to mention one of them: you need to build a string to build an SQL query

IMO this is in the article mentioned as 'poor system cohesion':

> poor system cohesion — SQL does not integrate well enough with application languages and protocols

----

> I don't think a new query language solves this problem.

LINQ ?


You're right that a new query language doesn't solve this. Datalog is over 40 years old, after all.


And we're still researching how aggregation should work. While Datalog is really cool and my PhD research is based on it, the fixpoint semantics really do not work well with aggregation, grouping, etc.


SQL's fundamental misstep was to try to present the relational model in a form they thought was more comprehensible by a) using an "English-like" language and b) renaming core relational concepts (relation -> table/view, tuple -> row, attribute -> column, etc.) It has only muddied the waters and led to all sorts of misunderstandings and complaints ("but my data isn't tabular!")

The "English-like" syntax means that what is actually happening is obscured (so many misunderstandings of what "selection" is, for example), and it means that composing multiple operations gets very awkward and hard to read, and in fact many things that the relational algebra itself permits are not really expressible.

And renaming core concepts means people get confused. They don't understand what the "relation" in relational is, and think it's about relationships. They think SQL is all about tables, when tables are just one way of representing predicates. Etc. etc.

The relational model is a very elegant method for presenting facts about the world and then the relational algebra is a nice functional programming style system for slicing and dicing those facts into information in basically arbitrary and recomposable ways.

SQL has obscured that. It's awful.


> The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases.

I would dispute this. The antecedents of NoSQL were the parallel programming models of HPC. They weren’t specifically excluding SQL, and NoSQL was a term that was invented after the fact.


> The antecedents of NoSQL were the parallel programming models of HPC.

Can you elaborate on what you are thinking of? As a refresher, here's when and how (the current usage at least of) NoSQL was introduced: https://subscription.packtpub.com/book/big_data_and_business... in 2009.

> As Oskarsson had described, the meeting was about open source, distributed, non-relational databases, for anyone who had "… run into limitations with traditional relational databases…," with the aim of "… figuring out why these newfangled Dynamo clones and BigTables have become so popular lately."

I was using MongoDB at the time (we became one of their first paying customers -- they didn't even want to take money for support at first!) and HPC wasn't in the air. So please elaborate.

http://2009.drupalcamp.at/sessions/chx-session.html as far as I can remember this was my first MongoDB talk. It's been a long time ago.


I’m referring to Google’s 2004 MapReduce paper.

The functional style that MapReduce derives from had been used in parallel computing models, e.g. the scatter/compute/gather model of MPI, and in turn this was adopted by Hadoop, CouchDB, MongoDB and others.


Agreed. The name NoSQL succeeded because it was concise and attention grabbing but the main goal wasn’t about eliminating SQL.

A big part was that document-oriented databases like CouchDB and MongoDB made more sense for a lot of web-based use cases, where in the end you're serving a page of content. Building a relational model often makes little sense for the web and makes managing the content harder; that a lot of websites can be built with a static site generator highlights that.


I'm not sold.

I acknowledge that these are real issues, and commend the authors for attempting to address them. However, these issues rarely cause any real friction for me - I generally find SQL among the most ergonomic languages I use (regardless of dialect).


Started reading believing the article would be utter nonsense but by the end was convinced they might be onto something. My hat goes off to them if they pull this off. It will take a lot to push SQL out of its stronghold.


Datalog is superior to SQL in every single way, except that very few people know it: http://www.learndatalogtoday.org/


SQL is definitely not perfect but it's supported everywhere and integrates with everything, so you can augment it with other languages. That fact makes a lot of the critiques sort of moot.


Yup. The fact that it has been around for 46 years and dominant in its problem space for most of that time is pretty compelling too. In the 90s it was going to be replaced by object databases, said the hype. In the aughts it was up against Mongo and Couch and the like. The pretenders to throne keep coming and going.


I see a lot of Stockholm syndrome in this thread / maybe low expectations.

Many people are saying SQL isn't that hard to learn but as someone who is new to SQL, I disagree.

It takes a max of 15 minutes to understand basic JavaScript/Go/Python primitives and write a program. SQL on the other hand seems much more complex. I might as well be reading Haskell or Lisp. At least those languages are consistent.

SQL does not feel like a language where I can learn a few primitives really quickly and compose them together.


People who learn the nitty-gritty details and all the gotchas and tricks in the book appear to be experts, and their job positions depend on it. It's all a bit "ludditic": people confuse this familiarity with arcane commands and proficiency with real expertise and deep knowledge. When you challenge that, you challenge their existence. Knowing how null behaves (null = null vs. null is null) is one of those shibboleths that differentiates an SQL wizard from a lowly commoner, like knowing all the nuances of prototype inheritance makes one a true JS champion. These people charge a pretty penny to optimize badly written SQL queries and fix your joins. And you want to take that away from them.


At the other side of the fence we have JS/TS experts writing inscrutable one-liners by chaining 15 lodash commands.

Having tried both SQL and the approach of using a programming language + framework of the day, I prefer SQL for data manipulation. It's far easier to troubleshoot, scale, hand-over or maintain in the long run.


> It takes a max of 15 minutes to understand basic JavaScript/Go/Python primitives and write a program.

Basic SQL can be learnt just as quickly, if not quicker, I'd say, as it is close to English in comparison with other languages. IMO the hardest parts are stuff like pivots and cursors, along with performance problems in complex queries.

I personally wrote my first few queries within 30 minutes of starting to learn it.[1] Of course it wasn't particularly good SQL, but workable enough.

[1]Basically got an apprenticeship and was almost instantly told to write some queries.


Sorry, but what is hard to understand about

SELECT <columns> FROM <table> WHERE <column> = <value>


Since it's so easy, I think you won't have a problem with any of the questions on this page: https://www.toptal.com/sql/interview-questions


Correct, but I have years of experience with SQL so I feel like I'm cheating a bit.


As I read this I wondered:

"Has anyone fixed these problems elsewhere?"

Then:

>The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases. Unfortunately, in the pursuit of ditching SQL, the NoSQL approaches also abandoned the relational model and other good parts of RDBMSes.

Yeah that's what I was thinking, they really don't fix the issues listed, just have chosen to solve other problems, but not in a "going to fix SQL" kind of way.


> The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases

I'm a little bit young, but isn't this a bit of a revisionist take by the author?

I thought that Amazon, Google, FB et al moved away from relational databases because the sharding logic they needed to build on top of these databases was approaching the complexity of a RDBMS. They didn't need strong consistency or support for complex queries, on the kind of data they were storing at scale, and so made compromises in those areas while engineering their purpose-built alternatives (Dynamo, BigTable, Cassandra).

It's not that SQL didn't work, but that its persistence guarantees were too strong and therefore too slow for their very particular needs. It's like comparing a minivan/SUV (MySQL/Postgres) to a drag racer (NoSQL databases). You don't want to drive your kids to soccer practice in a two-seater with no airbags, and a 5-star crash safety rating isn't as important to the pink-slip racers as horsepower and 0-60.

Or am I missing something?


> I thought that Amazon, Google, FB et al moved away from relational databases because the sharding logic they needed to build

Note that these companies did not move away from relational DBs until long after the "NoSQL is Web Scale" video. Yes, Google invented Big Table to help power search (and others), but their revenue system, AdWords, didn't move off MySQL until like 2015. And last I checked, Facebook is still a heavy user of MySQL with sharding.

The original NoSQL software had two major value adds: you didn't have to learn a new language, and it was faster (typically via disabling fsync -- the DBA equivalent of running with scissors). If you knew SQL or an ORM already, you were really just hoping MongoDB was faster magically.

These days you can even just tune pgsql to support kv store formats: https://www.postgresql.org/docs/9.1/hstore.html. Yes, you'll have to pay someone to know how to DBA pgsql, or pay AWS to pay someone, but I'm comfortable paying that price.


>These days you can even just tune pgsql to support kv store formats

And you can turn off fsync! Though if you do, disable synchronous_commit instead for most of the performance but none of the potential data corruption (you're still risking data loss, of course, just not corruption).


SQL RDBMSes solve a lot of problems / provide a lot of services: locking & concurrency control, computation, consistency, replication, transactions, etc.

Not all of those services can scale horizontally. Concurrency control most especially, but other things which assert global invariants are often too expensive. Most NoSQL systems remove some of these features in order to scale horizontally. But the problems they solved remain, and need new solutions. This has two effects: it forces clients to do more which hopefully means doing less complex stuff (you can write mega expensive computations in SQL where the nested loops might offend you in handwritten code); and it means higher risk of bugs and more engineering effort for correctness (e.g. transactions in application, eventual consistency, reimplementation of transaction log in queuing systems, etc.).

In transport analogies, RDBMS is like a lift helicopter, NoSQL is like a fleet of container ships. Or RDBMS is like a car, and NoSQL is like a train network. NoSQL is inflexible and needs lots of extra attention at the edges, while RDBMS needs a careful operator and doesn't work well beyond a certain scale unless you give everyone (or subgroups) their own instance (which could be sharding, it can work).

And if you don't have a really big problem, NoSQL is probably the wrong choice: not because it's fast with few safety checks, but because most NoSQL systems do very little for you; they just do that little scaled out.

If you're not scaling out, stuffing JSON into Postgres will give you a better experience even if you hate relational algebra.


> I thought that Amazon, Google, FB et al moved away from relational databases

Surprisingly, Spanner has tables with columns and you can run SQL on top of it.


Though notably the SQL interface was not present initially and was added later.


Isn't NoSQL at this point used in 99% of the cases for basically analytics on logs, especially for ads?


That is probably true. But I'd also suspect that in 99% of cases some form of that data (after lots of post-processing) ends up back in a data store where all kinds of people use SQL to analyze it. Two main reasons: 1) joining it with other data, and 2) SQL being so widely understood across functions / roles.


I love SQL. I HATE using Elasticsearch because it doesn't support SQL. If you ever used Elasticsearch, you know exactly the nightmares I am talking about.

Products support SQL because everyone knows it and it works, even if it isn't perfect. Trying to create a new version of SQL means giving up the millions of trained users who already know how to use your product.


Agree. Building elastic query AST is a PITA. Check out https://www.elastic.co/what-is/elasticsearch-sql


The last technology that attempted this task was NoSQL, and we all know how that ended.

First, people realized that schemas (just like static types) are extraordinarily important for robust software.

Second, NoSQL lost to SQL over the long run in pretty much all dimensions: query language, performance, scalability, concision, etc...

As a result, not only is NoSQL on the way out, but SQL databases have actually become better at supporting NoSQL features than any NoSQL database.

SQL didn't just win, it absorbed its opponent and became even better as a result. Never underestimate the versatility and adaptiveness of a technology.


> The last technology that attempted this task was NoSQL, and we all know how that ended.

Do you mean billion dollar companies?

Don't get me wrong, wouldn't go near the popular NoSQL databases I've used in the past again, but I sure wish I invented them.


If NoSQL caused SQL databases to improve, then I'd argue it was a success.

Maybe EdgeQL can have the same success by demonstrating improvements that can be added to SQL databases.


I do not really agree with the two cons the authors listed.

Null handling isn't intuitive in the beginning, but it makes it harder to let missing data go unnoticed.
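For readers who haven't hit it yet, SQL's NULL semantics are three-valued logic; here is a rough sketch in Python, using None to stand in for UNKNOWN (illustrative only, real engines implement this internally):

```python
# SQL's three-valued logic, sketched with Python's None standing in
# for SQL's UNKNOWN truth value. Illustrative, not any engine's code.

def sql_eq(a, b):
    # Any comparison involving NULL yields UNKNOWN, not True or False.
    if a is None or b is None:
        return None
    return a == b

def sql_not(x):
    # NOT UNKNOWN is still UNKNOWN.
    return None if x is None else (not x)

# This is why NULL = NULL is not true, and why IS NULL exists:
print(sql_eq(None, None))        # None (UNKNOWN), not True
print(sql_not(sql_eq(1, None)))  # None, since NOT UNKNOWN is UNKNOWN
print(sql_eq(1, 1))              # True
```

A WHERE clause keeps only rows whose predicate evaluates to True, so UNKNOWN rows are filtered out, which is the behaviour that most often surprises newcomers.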

The expression / table thing can be solved like we solved it in OctoSQL[0], and I think others have solved it in a similar way. Whenever you have more than a single scalar value in expression position, just create a tuple, or tuple of tuples out of it, which does act like a single value.

[0]: https://github.com/cube2222/octosql


If I could change SQL, I would flip the ordering in the syntax.

Instead of "SELECT ... FROM ... WHERE ..."

I would change it to: "FROM ... WHERE ... SELECT ..."

And you get the idea...
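This is essentially the ordering C#'s LINQ chose, and it matches the *logical* evaluation order SQL already uses internally (FROM, then WHERE, then SELECT). A pipeline sketch in Python with made-up data:

```python
# SQL already *evaluates* in FROM -> WHERE -> SELECT order; the proposed
# syntax just makes the written order match. Illustrative data only.
orders = [
    {"id": 1, "total": 250},
    {"id": 2, "total": 40},
    {"id": 3, "total": 120},
]

rows = orders                                 # FROM orders
rows = [o for o in rows if o["total"] > 100]  # WHERE total > 100
result = [o["id"] for o in rows]              # SELECT id

print(result)  # [1, 3]
```

Reading the pipeline top to bottom is arguably easier than mentally reordering SELECT-first syntax.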


Can you model this in EdgeQL? https://developer.mongodb.com/community/forums/t/is-this-que...

An area of curiosity at the moment, as I too agree that SQL is a poor fit, even if the better DSL inputs eventually get reduced to SQL command text and parameter arrays.


> Can you model this in EdgeQL?

Absolutely!

  WITH
    april := <datetime>'2020-04-01T00:00+00',

    NewCustomers := (
      SELECT Customer
      FILTER
        NOT EXISTS (
          SELECT .orders
          FILTER .date < april
        )
    ),

    AprilCustomers := (
      SELECT Customer
      FILTER
        datetime_truncate(.orders.date, 'months') = april
    ),

    NewAprilCustomers := (
      SELECT AprilCustomers
      FILTER AprilCustomers IN NewCustomers
    )

  SELECT
    (count(NewAprilCustomers) / count(AprilCustomers)) * 100;
This assumes the following schema:

  type Order_ {
     property date -> datetime;
  }

  type Customer {
     multi link orders -> Order_;
  }


Thanks. I'm going to dig in a bit more. I've been sold by the above and the homepage.. :)


This was written on my mobile, so haven't had a chance to test it, but here's my first pass at modelling it in SQL:

  select 100.0 * sum(case when prev_cust.cust_id is null then 1 else 0 end) / sum(april_cust_count) as pc_new_cust
  from (
    /* get unique customers in April */
    select distinct cust_id,
        1 as april_cust_count
    from orders
    where order_date between date '2020-04-01' and date '2020-04-30'
  ) as april_cust
  left join
  (
    /* get customers with a transaction prior to April */
    select distinct cust_id
    from orders
    where order_date < date '2020-04-01'
  ) as prev_cust
  on april_cust.cust_id = prev_cust.cust_id
Apologies for the lack of code formatting... I find that when SQL is written with nice formatting (e.g. nested subqueries with indentation) it reads a whole lot better.


There was a bullet in there regarding "poor system cohesion — SQL does not integrate well enough with application languages and protocols." I was curious to hear the author's thoughts on this. I feel ORMs have mostly solved this even while introducing their own set of problems and learning curves. And when you're done fighting your ORM on those bitchy <= 1% queries, well, you just write SQL.


Completely agree regarding the complexity of various flavors of SQL. The thing is, this complexity came from legitimate need.

My team gets by with a very small subset of PostgreSQL functionality day to day because most of the stuff we're doing with our database is just not that complicated. Simple lookups, writes, joins when our applications interact with the database. Simple joins, grouping, aggregation when we personally interact with the database.

We are not confronted with the full complexity of PostgreSQL every day. And on the flip side, the database itself gives us killer functionality in the form of constraints and transactions. It offers a lot more, but this is all we care about an overwhelming majority of the time.

I am curious how you see it. Is there a compelling reason for a team like mine to leave their comfort zone to work with your new database? Does the new query language really solve any problems for Joe Sixpack, developer?


We could do a lot better than SQL. It's just that nearly every replacement falls into one of two traps:

A) Not being able to express relational semantics.

In general, every programming language replacing its predecessor allowed you to express about the same semantics, even if it wasn't natural to the new language.

One can write imperative Java/C++ even if the language doesn't like it, and successful functional languages allow escape hatches for mutable objects.

The various NoSQL languages typically fail this hard.

B) Not offering enough of an improvement.

Minor improvement isn't worth the 'yet another query language' burden. EdgeQL doesn't fall into A, but it may fall into B.

NULL is an annoyance, but not big enough to justify another language. Throwing an exception instead is a very double-edged sword. The author needs to show far more improvement to justify a new language (I'd have liked to see more examples of composability for instance).



thanks!


I feel like SQL is, in general, misunderstood. How often have I seen application developers apply the same principles and design patterns to an SQL database as they do in JS or C# or Rust: they use functions to encapsulate functionality, write sequential code, and use cursors where they aren't needed.

Then the language is blamed for not performing well or yielding different results than expected.

Some points in this article are valid, but I think the main issue is the general notion that SQL would be like JavaScript or C#. It is not; it is very different and needs a deep understanding, including the workings of the underlying DBMS, to perform well.

I guess I'm just not a fan of throwing new languages and tools at problems we identify, which seems to be a trend nowadays.


Overall I like the ideas behind it and the problems with SQL that it solves, at least on a superficially observational level. It could use some real brain bending examples for those of us that deal with difficult data sets.

The tasks needed to actually take over where SQL has left off seem absolutely monumental though. I can't help but think it will never get there, just like every other attempted query language out there.

In my opinion, there are two things that can make SQL 100% better, which wouldn't be a new language, but rather an update to the language standard:

1) algebraic data types, allowing us to get rid of terrible ternary null logic and more closely model real world data domains.

2) a really well thought out date/time API, along the lines of JSR310.


The problem I see with this is that the "What's wrong with SQL" is an excellent argument against SQL the language, but not SQL the engine. The optimal solution to that problem seem to be a syntactic skin over an existing SQL engine.


No.

Having worked with MongoDB for 5+ years and now back on PostgreSQL, I like SQL so much more than a custom query language. I do wish JSON were better integrated into SQL; it kind of feels like an add-on, not part of the core. Otherwise SQL is a fine language.

Also I can leverage 20y of SQL.


EdgeDB is built on postgresql and has excellent JSON integration.

https://edgedb.com/docs/datamodel/scalars/json#type::std::js...


I wish instead of

SELECT to_json('{"hello": "world"}');

it would be

SELECT {"hello": "world"};


> In EdgeQL every value is a set

This is trouble for the example for calculating the average number of reviews across movies:

   SELECT math::mean(
       Movie {
          description,
          number_of_reviews := count(.reviews)
      }.number_of_reviews
   );
Never mind, they are not sets:

> Strictly speaking, EdgeQL sets are multisets, as they do not require the elements to be unique.

The relational model is firmly based on the idea of a relation as a "set of tuples", and a major criticism of SQL has been that it views data as an ordered sequence of tuples.

So I'm skeptical of the claim that EdgeQL is really based on the relational model.

(Not clear whether multisets are ordered - wondering about window functions etc...)


Both EdgeQL and SQL disregard the RM proscription about duplicate tuples for practical reasons:

1. Elimination of duplicates from every projection is prohibitively expensive.

2. Sometimes you actually _want_ duplicates to show up without injecting a synthetic key into every projection.

3. There's DISTINCT.
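The bag-vs-set distinction is easy to sketch. Here's an illustrative Python rendering (hypothetical data) using `Counter` for multiset semantics:

```python
from collections import Counter

# A projection in SQL/EdgeQL behaves like a multiset (bag): duplicates
# survive unless you ask for DISTINCT. Data here is invented.
projected = ["drama", "drama", "comedy"]  # e.g. SELECT genre FROM movies

bag = Counter(projected)   # multiset semantics: duplicates kept, with counts
distinct = set(projected)  # DISTINCT: pure relational set semantics

print(bag)               # Counter({'drama': 2, 'comedy': 1})
print(sorted(distinct))  # ['comedy', 'drama']
```

Point 2 above is exactly the information the `Counter` keeps and the `set` throws away.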


I can see the improvement of their EdgeQL over SQL...

But (as I haven't read all of their blog posts) I am a bit more reluctant about the whole thing when they describe it as an ORM.

Can we leave the ORM and take the query language and implement this as a Postgres extension?


Exactly my question. If improving on SQL also means that I have to throw out all the maturity of Postgres, it's very unlikely to happen.

But if you can define a new query language that can be implemented by existing relational DBs, you might actually have a shot at displacing SQL.


I'm not doing SQL every day but i have rarely any issues with it.

Do we really need to do better than SQL? I don't think so and i also don't think that the chosen new syntax is better.

At the end of the day, most critical is not the language but understanding how it works to optimize indexes etc. If you are only able to write simple SQL because you are not good in SQL/Databases, you will not optimize your Database independently from the language.

If you are good in SQL/Databases, you (or at least i) do not care about syntax details; You just look it up, and get acquainted to your specific underlying Database.


I'm not doing C every day but i have rarely any issues with it.

Do we really need to do better than C? I don't think so and i also don't think that the chosen new syntax is better.

At the end of the day, most critical is not the language but understanding how it works to optimize assembly etc. If you are only able to write simple C because you are not good in C/algorithms, you will not optimize your algorithms independently from the language.

If you are good in C/algorithms, you (or at least i) do not care about syntax details; You just look it up, and get acquainted to your specific underlying microarchitecture.


I see my comment in the context of using SQL for your queries while your main focus as a developer (in Java etc.) is not writing SQL.

Also, the article does state 'we can do better than SQL', and I do have a certain amount of practical experience behind my personal opinion that their approach is not actually better than SQL.

They did show fairly average examples; examples which lead me to assume certain things, like where they would like to replace SQL.


I've grown pretty fond of the way Spark SQL queries can be represented with DataFrame operations. There is a more or less 1:1 relationship with SQL, except the commands can be programatically generated and composed. It sure beats stitching together a SQL string when you have a bunch of query clauses that might be optional, or need a generalized way to take 30 result columns, and rename them with a prefix or something.

e.g. result = dframe.select(*[f.col(colname).alias(f"{colname}_old") for colname in dframe.columns]).join(other_df, 'joincolumn', type='outer')

and so forth.


I used to use SAS macros to conditionally generate monstrous SQL queries all the time. It was a bit hacky, but man I could make a thousand lines of SAS do just about whatever I wanted however I wanted. It really feels like a powerful way to tackle messy real-life business logic.


I often find SQL is extremely good for explaining what you are trying to do.

It is of course a declarative language, but more than that it does what a good language should do: explain in both directions.

Languages need to tell the machine what to do, and to tell the person reading the code what it was the original author was supposed to be doing. Many bugs happen when the two don’t match up, and the maintainer is often not the person who wrote the original code.

Well written and handwritten SQL is some of the least commented code I’ve seen because it doesn’t need comments — it is self explanatory.


Mostly, unless the author was enamoured of subqueries, or believes that the only acceptable name for a CTE is cte1, 2 etc.

But yeah, as long as you know what the underlying tables look like and what they enforce, then SQL is really easy to maintain and understand.


I feel like a lot of these concerns are resolved in tools like R's dplyr. You use mostly the same R code, whether your data is a data.frame, or living in a SQL source. dplyr generates the SQL query for you.

By reading the queries it generates it's quick to pick up how the SQL works! Another big advantage is that you can always pull the data into R and have a ton of general purpose tools available.

dplyr is aimed at data analysis though, so there may be other use-cases for EdgeDB?


I've always wondered if a newer language could be designed with ANSI SQL as a transpilation target, or each of the vendor SQLs as targets. Optimization of queries would be a huge problem, but it always seemed like the only way it would be possible to break out of the SQL hegemony, i.e. first transpile, then start developing native support in the open source databases, then pressure the proprietary databases to adopt native support.
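To show the shape of the idea, here's a toy of the "transpile to SQL" approach: a tiny invented FROM/WHERE/SELECT syntax compiled to ANSI-ish SQL. The syntax and function names are made up for illustration; a real effort would need a proper parser and a story for query optimization:

```python
import re

def transpile(query: str) -> str:
    # Parse an invented "from T [where C] select X" mini-language
    # and emit the equivalent SQL string. Illustration only.
    m = re.match(
        r"from\s+(\w+)(?:\s+where\s+(.+?))?\s+select\s+(.+)$",
        query.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"cannot parse: {query!r}")
    table, cond, cols = m.groups()
    sql = f"SELECT {cols} FROM {table}"
    if cond:
        sql += f" WHERE {cond}"
    return sql

print(transpile("from users where age > 30 select name, email"))
# SELECT name, email FROM users WHERE age > 30
```

The hard part, as noted above, isn't the syntax translation but preserving the target engine's ability to optimize what comes out.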


That's what frameworks like Hibernate or Doctrine do, they have their own object oriented query language that compiles to SQL.


Well, that's an object oriented design, and it's not really a full language in that it doesn't have its own syntax. I'm talking about a relational language with its own syntax.


> and it's not really a full language

Both DQL and HQL are complete query languages.


I've had similar thoughts to the title, but my ideas were more around improving the ease of use/writing.

1. A cleaner universal more natural syntax for analytics: I love writing python as it is to me such a cleaner syntax than C or Java. We could do the same for SQL and make something that feels more natural. Turning a common query like

> SELECT count(*), TO_CHAR(created_at, 'YYYY-MM-DD') FROM Accounts GROUP BY TO_CHAR(created_at, 'YYYY-MM-DD') ORDER BY TO_CHAR(created_at, 'YYYY-MM-DD');

into something much more natural like

> count by Day(Accounts.created_at)

2. A Visual SQL: for analytics it's so much faster to query and explore visually. Building queries visually means you don't make common typo or syntax or structure errors, joins happen smoothly, you can browse the data as you build, you don't need to google for syntax (what's that date function again?), and it works across dialects and databases. We've built and launched this a few months ago at Chartio https://chartio.com/blog/why-we-made-sql-visual-and-how-we-f...


I might be ignorant here ... but why not have disc backed datastructures and have some sort of ECS style interface that is like "normal" programming?

In pseudocode:

    newtype id = int
    names = dict<id,string>()
    balances = dict<id,int>()
    credit_scores = dict<id,float>()
    
    function broke_customers():
      return balances.filter(balance => balance < 0).keys()
    
    function exploitable_customers():
      bs = balances.filter(balance => balance < 0)
      cs = credit_scores.filter(score => score > 100)
      return bs.intersect(cs).keys()
In the end, most queries are "just" set theory ... and having a very thin disc io layer allows to use the host language to process queried data on the fly.

It's very basic ... and does not address performant views, clustering, migration, etc ... but it's simple ... and does work well as demonstrated by the ECS systems in game development (which are an application of the concept).

(Sidenote: that's some messed up pseudocode ... I've been working with C#, Python and Haskell lately :-) )
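For what it's worth, the dict-per-component sketch above runs fine as plain Python (in-memory only; the disc-backed layer, which is the genuinely hard part, is elided, and all data is invented):

```python
# Runnable rendering of the dict-per-component (ECS-style) idea.
# In-memory only; persistence, views, clustering etc. are out of scope.

names = {1: "alice", 2: "bob", 3: "carol"}
balances = {1: -50, 2: 200, 3: -10}
credit_scores = {1: 120.0, 2: 90.0, 3: 640.0}

def broke_customers():
    return {cid for cid, balance in balances.items() if balance < 0}

def exploitable_customers():
    # Set intersection of the two filtered key sets.
    bs = {cid for cid, balance in balances.items() if balance < 0}
    cs = {cid for cid, score in credit_scores.items() if score > 100}
    return bs & cs

print(sorted(broke_customers()))        # [1, 3]
print(sorted(exploitable_customers()))  # [1, 3]
```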


Man, I'm so happy that DB hype is over and I can work with Postgres (which is ~35 years old) without anyone thinking it's the wrong choice.


The author seems to have overlooked the Lindy effect behind SQL. SQL, like other ubiquitous technologies such as JavaScript and Excel, hits the sweet spot of "satisficing" user needs. Few things have the longevity of SQL in programming, and the fact that it's battle-tested will always be a significant advantage over any new player.


I agree with the criticism of SQL here but to make yet another SQL but not SQL is not the way forward imo

I hope the writer reads http://www.learndatalogtoday.org/

In Clojure there are multiple databases that you can query by API, SQL and datalog


If by datalog with Clojure, you mean Cognitect's Datomic, yeah it's pretty friggin' cool. Really really liked it (well, the concept)... Until we started working out licensing and realized we were going down a rabbit hole of unclear future costs. They upped the cost on us in the middle of working on a project, their tooling was awful, they deprecated the REST API (which was a selling point for us they pushed) halfway through our efforts. We lost a lot of time, money (we paid for consulting work which was useless), and our product was delayed significantly. I know Stu will probably read this, but it's just the way it is. We also threw away Clojure then and never looked back.

Doh, that's not what this discussion was about. It's an old wound that won't heal. :-)


I've come to love SQL databases, and even writing raw SQL.

All the modern SQL databases are unbelievably powerful.


I sometimes don’t love sql especially when debugging a huge script of dynamic sql that was only partially scripted to generate the actual sql that is executed and there are between 1500 and 5000 lines to sort through. But you get used to it.


I think they make good points. This was written May 2019: how have they fared since then?


We're approaching the 1.0 release; latest blog post: https://edgedb.com/blog/edgedb-1-0-alpha-4-barnard-s-star


>>A language with good orthogonality is smaller, more consistent, and is easier to learn due to there being few exceptions from the overall set of rules (wikipedia btw)

table = (SELECT column | scalar expression FROM graph | table WHERE ... GROUP BY ... HAVING ... ORDER BY ...)

So in SQL, each scope is a table and that is the main primitive.

More metrics would be needed to criticize SQL orthogonality, instead of providing only one example of subqueries as scalar expressions, when they more generally produce tables.

Actually, SQL uses the same query syntax for scalars and for tables, and that could be seen as good orthogonality.


Very few people truly understand databases, and therefore very few people truly understand SQL. I would be suspicious of any “SQL replacement” that didn’t come from someone with many, many years in the field.


This comment is such an empty statement. The one plus one equals two kind. Are you implying something? Do you know if the authors have many or few years in the field? What does it mean to truly understand databases or SQL? To point out the vacuousness, let me rephrase this statement to come up with an equally profound message of my own:

Very few people truly understand CMS-es, and therefore very few people truly understand Wordpress. I would be suspicious of any “Wordpress replacement” that didn’t come from someone with many, many years in the field.


Good idea actually! One day one of these efforts will succeed.

As a front-end programmer for 7 years I feel I have a fine understanding of how a relational database and its queries can support my use case. I understand the basics well enough to advise the backenders. Anyway, SQL or any object-oriented abstraction on top of it gives me migraine.

Let the critics criticize. Most people mistake pragmatism (SQL) for sound solutions anyway. I do feel there is also a need for graphical editors. Yet it is much better to build a graphical editor that compiles to something with comprehensible syntax.

Good luck


I love SQL, but I liked the analysis in this article. However, the EdgeQL language that was proposed looked horrible to me. It looks more like a general purpose programming language than a query language.


+1 for SQL.

I never understood the need to rebuild a SQL-like solution bc SQL seems like the right answer already?

Between inner joins and SPs, what else could you possibly need for data?


I mean sure, probably possible, but I don’t really have a problem with SQL. I’m pretty sure the query dialect the EdgeDB guys are talking about will die with the company...


I feel like he is using the word Orthogonal incorrectly. I once had to describe a complex relationship where there was a hierarchy of objects, however instead of being a strictly up-down relationship there was an up, down and right relationship. From the little chemistry I've learned I was familiar with phenol and its bond positions: para, ortho, and meta. I defined the up object as para, the right object as ortho and the descending object as meta.


I think it's a mistake to design a language in 2020 with C-style curly bracket syntax.

It just looks like line noise compared to SQL, or whitespace significant languages.


SQL is also an unfortunate mix of relational calculus and relational algebra. The two can be shown to be equivalent.

While in calculus you declaratively describe the set of data you want and let the system figure out how to get it, in the algebra you describe how to construct the set you want.

With that in mind SELECT ... WHERE ... is calculus, UNION and JOIN, etc. are algebra.
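That equivalence can be shown on a toy example via sqlite3 (table `t` and its contents are made up): the same result set is described declaratively with WHERE (calculus style) and constructed with a set operation (algebra style).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# Calculus style: declare the property the rows must satisfy
calc = conn.execute("SELECT x FROM t WHERE x > 1 ORDER BY x").fetchall()

# Algebra style: construct the set by subtracting the unwanted rows
alg = conn.execute(
    "SELECT x FROM t EXCEPT SELECT x FROM t WHERE x <= 1 ORDER BY x"
).fetchall()

assert calc == alg == [(2,), (3,)]
```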

I like SQL, though.


EdgeSQL is no better than the SQL out there, and is arguably much worse as it introduces yet another version of SQL. However, in the end EdgeSQL will only succeed if it has a better query optimiser than Postgres, SQL Server and Oracle. That requires a significant amount of investment and somehow I can't see this happening.


> However, in the end EdgeSQL will only succeed if it has a better query optimiser than Postgres, SQL Server and Oracle

EdgeQL is the query language, EdgeDB is the engine. The latter does query optimisation. In theory EdgeQL could become popular with another engine.


I'm a senior dev that up until recently managed to avoid having SQL in my knowledge stack as we've always used no-sql databases.

I knew the basics, but I took a weekend to catch up on some more advanced use cases and I can really resonate with this article. Unsurprisingly I came to the conclusion that SQL is just a bad language, no matter how you look at it. It throws away every code flow standard in favor of its own nonsensical flavor. Where normal synchronous programs go top-to-bottom, SQL is a complete spaghetti of flow and logic.

Just take a look at the most basic syntax: `SELECT person.name FROM person` The variable is defined at the end of the program which is just absolutely silly, what if the program is 100 lines long; do I need to start reading from the bottom? SQL must be the reason mouse scroll wheels were invented.

As an alternative take a look at view based systems: `for person in people: yield person.name` — isn't it infinitely more understandable and readable?
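The comparison being made, roughly, as runnable Python (the `people` data is invented for illustration):

```python
# SQL style: the source appears after the projection
#   SELECT person.name FROM person

# "View based" style: source first, projection last, reads top-to-bottom
people = [{"name": "ann"}, {"name": "bob"}]

def names(people):
    for person in people:
        yield person["name"]

print(list(names(people)))  # ['ann', 'bob']
```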

Unfortunately it seems SQL is here to stay as most people would rather work with this mess rather than invest some time to adopt something better.


Speaking of SQL alternatives, Datalog seems to be another language that leverages relational model. I love how straightforward and expressive Datalog is.

I wonder, why Datalog is not very popular for databases as a query language? Is it because of performance optimization? Could anyone provide some insights?


Fads will come and go but SQL has staying power. It will be around for another 50 years me thinks.


Stupid question, slightly off-topic, but why does every web app need to be ready to scale and serve a billion users? I don't mean this in the pessimistic "be real, you are not the next Facebook" way. My question is: what do you think all that money is for, the money you are asking VCs for? Facebook was not ready for a million users in the beginning. You need old, stable and safe tech that will not crash your demo or provide a horrible user experience to important first-generation users. Every time an alpha breaks catastrophically it's because somebody ambitiously tried something not well documented or understood. People understand MySQL. Why would you add extra time and cost to your original budget? To prove what? I understand it makes the job less "job" and more fun, but is it truly worth experimenting while you are fighting for survival? If I put all the dead startups on my CV it would be 50 pages long.


It doesn't. Who claimed that?



Seems fitting, XKCD - Standards: https://xkcd.com/927/

The post has weird self faults in its complaints - lack of consistency, and poor system cohesion.

Lack of consistency isn't with the SQL standard, it's with the implementations (they recognize this later in the post too). Like browsers and the HTML spec - everyone implements the standard SLIGHTLY but non-trivially differently.

poor system cohesion - Being able to integrate/inter-op with every other language literally means poor system cohesion. They're mutually exclusive ideas. This is why ORMs exist, so there is strong cohesion with a given language.

I'm sad because there's so much work put in here by extremely smart people, but this is a tool looking for a problem. SQL has its shortcomings for sure, but simply replacing it with a slightly cleaner language is not the answer. The benefits do not outweigh the huge amounts of drawbacks that would be required to adopt something like this.

I don't know much about EdgeDB and hopefully there's a lot more benefit I'm not aware of, but purely on the post itself, too many drawbacks.


Judging by the comments here it seems lots of people suffer from Stockholm syndrome - me included. I'm so used to SQL that it's hard to imagine something better.

I wonder if we started from scratch today would we end up with something like SQL.


He didn't even show an example of a join in his EdgySQL.

This is a marketing article.


The ergonomics of a language I don't know are always worse than those of one I do know.

As long as there's nothing valuable in the technology itself why would I bother changing things?


> As long as there's nothing valuable in the technology itself

You're assuming there's nothing valuable. I'd say there is. Whether that's enough to displace SQL I really doubt.


Maybe there is. I didn't try it. But it's written rather explicitly "OK, so we have highlighted the shortcomings of SQL. Why do they matter? It’s all about ergonomics." That's what drove me off.


> "...It’s all about ergonomics." That's what drove me off.

I don't understand. Their critique seems eminently reasonable, what else should it have been about that would have not driven you away? (thanks for the answer, upvoted, it lets me try to understand your position).


For instance, we choose Riak+LevelDB some 8 years ago as our data store. It doesn't support SQL but it's fast and easily scalable. And free. I'm not sure if we could achieve the same speed and fault tolerance with any SQL solution available at that point. So the deal is, you give up your time invested in learning SQL and get some advantages in return. This is reasonable.

But if it's only about language, no additional perks, then it just isn't worth investing your time.

To continue the parallel with programming languages, there are tons of C-killers nobody knows about. I think roughly every third programmer in the world has tried to write a more ergonomic C at some point in their lives. And there is Rust. Which is not really about ergonomics but about memory safety combined with comparable performance. No more buffer overflows, no more Heartbleed. People see what they are trading their time on Earth for, and the language is getting traction because of that.

This is, of course, only my personal opinion but languages are just thin interfaces over technologies.


With respect, I think you're missing a lot. Sit with an expert SQL guy and you'll learn a great deal. Maybe not enough to change your mind but certainly enough to realise you're blocking out a really valuable technology.

> but languages are just thin interfaces over technologies

No. Not at all. SQL is derived from a mathematical basis. The principle came first, then the language and tech together developed to fulfil it.


But that doesn't contradict my point. The mathematical basis matters, sure, but the language on top of it (exactly because it derives from the math anyway) is mostly irrelevant.

I don't really care if I have to write SELECT or \forall or σ as long as they all have the same meaning. But I'd prefer to use SELECT because I already spent time learning how to work this way.

SQL as a language is fine. There is no real need to reinvent SQL unless you're going to reinvent the math behind it too.


2 points then;

1. Do you know SQL currently? Because your previous comments implied you don't

2. SQL is a bad implementation of the underlying relational model. And it's just bad in other ways. For one, it's bloody wordy:

  select * from people
but as SQL is an expression language, why not?

  people
From an answer of mine a few days ago (slightly modified here) to pick out differences:

  (
  select *
  from t1
  except
  select *  from t2
  )
  union 
  (
  select *
  from t2
  except
  select *
  from t1
  )
I'd prefer to write that crap as

(t1 \ t2) + (t2 \ t1)

or similar. If you write a lot of SQL, that matters.
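Both notations compute the symmetric difference; a sketch checking that with sqlite3 and Python sets (t1/t2 contents invented; the parenthesized compound operands are rewritten as derived tables, since SQLite does not accept parentheses directly around compound SELECTs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INT)")
conn.execute("CREATE TABLE t2 (x INT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,)])

# The verbose SQL version from the comment above
rows = conn.execute("""
    SELECT * FROM (SELECT x FROM t1 EXCEPT SELECT x FROM t2)
    UNION
    SELECT * FROM (SELECT x FROM t2 EXCEPT SELECT x FROM t1)
""").fetchall()

# The "(t1 \ t2) + (t2 \ t1)" version, as Python's set symmetric difference
sym = {1, 2} ^ {2, 3}

assert {x for (x,) in rows} == sym == {1, 3}
```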

SQL aggregates don't nest easily. Transactions are screwed up (https://news.ycombinator.com/item?id=23569513). SQL is poorly composable (https://news.ycombinator.com/item?id=23550420#23561301). For better or worse it has nulls therefore tristate logic. This causes errors! And more.

BTW. transactions don't work in the way you might think, atomically. Which is why MS added https://docs.microsoft.com/en-us/sql/t-sql/statements/set-xa... and I'm sure Pgres has similar. From the link:

"When SET XACT_ABORT is OFF, in some cases only the Transact-SQL statement that raised the error is rolled back and the transaction continues processing"

That peculiarity is from the standard BTW.

SQL is problematic. It could have been much better.


In my experience Apache Spark and PySpark is the most convenient way to make complex queries / to select, transform data


10x better is me telling my database what metrics and values I want. No-code, and probably not another language.


Oh man, have I ever heard this one before. No, we can't do better than SQL. SQL is powerful, and beautiful.


SQL would be the last thing that I give up for something else. It does a pretty good job for what it is used.


I love critiques of SQL about implementations of NULL. "NULL is so special that it's not equal to anything, not even itself!"

Like, duh. WTF should NULL be equal to?

Anytime I see people making this kind of argument about "doing better than SQL" I can immediately tell they are pretty much fucked in the head.

Good luck, edgedb peeps. You haven't got a clue.


> WTF should NULL be equal to?

In programming languages, NULL tends to be equal to itself. Why couldn't it be the same way in SQL?


No. That is not even close to being true. You're confusing None with NULL, like everyone else. Only a very few programming languages make this error.

NULL is undefined. It can't be equal or unequal to anything, including itself, for reasons that should be obvious.


> That is not even close to being true.

In C, NULL is equal to itself. In C++, nullptr is equal to itself. In Java, null is equal to itself. Same in C#. In JavaScript, null is equal to itself, and so is undefined.

> You're confusing None with NULL, like everyone else. Only a very few programming languages make this error.

Not so, as I've just shown.

SQL takes the philosophy that NULL isn't a value, but a marker for the absence of a value, and gives it special treatment so that it is not treated as equal to itself. Most programming languages do not take this approach, they instead treat null as a special value, special in that dereferencing it is disallowed, but it's still subject to the usual comparison rules (i.e. it's equal to itself).

Your contrasting of None against NULL isn't meaningful. They're just words. The semantics depend on the language.

> NULL is undefined

Depends on the language. In C it's defined as 0, roughly speaking. (Curiously the bit-pattern used to represent NULL is not required to be zero. [0])

As a curious aside, in C, the special float value NaN is not equal to itself.

[0] https://stackoverflow.com/a/9894047/
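The behaviors contrasted above can be checked directly; a sketch using sqlite3 for the SQL side (sqlite3 maps SQL NULL to Python's None):

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL, NULL = NULL is not TRUE; it evaluates to NULL ("unknown")
(sql_eq,) = conn.execute("SELECT NULL = NULL").fetchone()
print(sql_eq)  # None, not a boolean

# IS is the identity test that does treat NULL as matching NULL
(sql_is,) = conn.execute("SELECT NULL IS NULL").fetchone()
print(sql_is)  # 1, i.e. true

# In most programming languages the null value equals itself
print(None == None)  # True in Python

# Whereas IEEE 754 NaN is the value that is unequal to itself
print(math.nan == math.nan)  # False
```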


I am so completely unsurprised that people who don't understand the concept of undefined are telling me that it's actually defined.


Maybe if you introduce the concept of a mathematically-principled empty set, you can do away with the unprincipled concept of NULL. Empty set equals empty set.


Null isn't empty set.


If SQL and GraphQL had a baby...


The claim can be made that: We can do better than ANYthing.

The counter is: Why hasn't anyone yet?


.Net Linq is a developer friendly alternative that has a way better syntax.


How do you even do Select col from table group by grp in EdgeQL?


Narrator: we cannot


Wouldn't it be easier to just propose Datalog?


The only thing I would use other than SQL for querying would be the way pandas handles queries, or Datalog. All this other stuff is DOA.


People have tried for decades. Go nuts.


SQL is the COBOL of relational data


I do not know what the standard data query language of 2120 will look like, but I do know it is going to be called SQL.


(2019)


No, you can’t.

SQL is easy. Data is hard.


No, you can’t. Another competing standard against SQL will die trying to be SQL.


sql is the language of the gods


> There are many more cases like these, and there is no consistency in a single SQL implementation, let alone across all implementations

So you decided to create yet another incompatible "SQL"?

https://xkcd.com/927/

And quoting MySQL's broken handling of a division by zero as a reference. Seriously?


>sql alternative #1427904 that absolutely nobody will ever use

Love it or hate it, sql is a standard and there is a ton of knowledge (stackoverflow answers, books, tutorials) and tooling (query builders, orms). This is either hopelessly naive, or hopelessly arrogant.


Good post.

> Swift, Rust, Kotlin, Go, just to name a few, are great examples in the advancement of engineer ergonomics and productivity.

golang is definitely not an advancement in engineering ergonomics. It can't be grouped with the other mentioned languages.


I shall name it...NoSQL.


Data integrity, speed, replication: pick two. NoSQL gave up integrity. If you are fine with your bank account sometimes missing your salary, great. It's not quite the same as Twitter sometimes missing a comment.


Why continue to use the keyword SELECT? It's just annoying.


What's the alternative? FETCH?


Why is anything needed at all? You don't need a keyword preceding an expression in other languages.


Ah, so a new challenger arrives. Good luck overcoming lock-in.


Please leave it alone. I'm sure we can do better than English as a conversational language, knowing more about how people use it, but I don't want to learn English 2.0, because I already know English.


If you want people to take your replacement language seriously, lead with an RFC that's been reviewed by industry peers. We all know the status quo sucks, but it's the status quo for a reason.


Some comments here, in my opinion, are missing the point. Saying "SQL is good enough, no need for EdgeQL" is like saying "C/C++/Java/... is good enough, no need for Rust/Kotlin/...".

EdgeQL is an explicit attempt to make not a new version of SQL, but a new language. It's about evolution, which is, again, explicitly noted. Saying "evolution is not needed" or "I don't want/need evolution" is strange to me. At the same time, giving constructive feedback is useful, as always.


People have a love/hate relation with SQL. I think most software engineers are OK with it, some even find it a curious case to tumble down the rabbit hole of multiple JOINs or inner SELECTs.

But for most people out there who simply want to poke at their business data, SQL, and this article's stated solution, EdgeQL, are all pains in the wrong places. If you are making the life of engineers a bit easier, then you have to think of the millions of My/Pg SQL installations.

If you want to make the larger audiences' life easier (the business people who need insights), then you need to think outside of SQL altogether.


I was looking around yesterday to find alternatives to SQL and found nothing. Everything is in this ridiculous table-based model, the queries of which are complicated and error-prone.

It’s heresy to criticize SQL these days, or even to suggest that DBs could be easier and more robust. I envision a future DB language that offers perfect ease and safety with queries, the way Rust has shown us that the memory unsafety of C was optional all along.


Did you come across Datalog?



