
The other way you have to leave the Clojure ecosystem is when you have to communicate across the network or persist to disk. Writing everything-fits-in-memory data analysis in Clojure is some of the cleanest and most fun programming I've ever done.

But once you want to spread work across multiple computers or operate on data larger than what comfortably fits in memory, you end up coding just like you would in C#.

I started working on some versions of persistent data structures that serialize to a KV store, which lets you send a hash map to another machine, have that machine add a key, then send it back. It seemed to work pretty well, but there's a ~10x performance penalty over Clojure's built-in data types (which are themselves not lighting the world on fire speed-wise).

That's a pretty harsh price to pay - you can use 100 machines to simulate single-address-space Clojure with the throughput of a single machine 10x the size. Awesome, but likely impractical. If you need the persistence between program runs, or would like to run incremental computation on large data, I could see it working well, though.
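Semantically, the round trip described above is nothing more than the following (sketched here with plain pr-str/read-string standing in for the KV-backed structures and the network hop):

```clojure
;; Machine A: serialize a map and "send" it over the wire.
(def wire (pr-str {:a 1 :b 2}))

;; Machine B: read it, add a key, send it back.
(def wire2 (pr-str (assoc (read-string wire) :c 3)))

;; Machine A again: an ordinary persistent map, one key richer.
(read-string wire2) ;; => {:a 1, :b 2, :c 3}
```

The hard part, of course, is making that round trip cheap - which is where the ~10x penalty comes from.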



When I was using HoneySQL[0] speaking to PostgreSQL over JDBC to deal with persisted data, I did not feel non-idiomatic. When I used Kafka and Onyx[1] to process large dataflows, I did not feel non-idiomatic. Granted my working set was only a few tens of TB, but there wasn't much pain involved. I tended to use Nippy[2] for serialization.

I didn't use special data structures (except when the problem called for it), with a clear separation using nippy serialization (in place of pr-str/read-string) at the edges of my app.
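For reference, that edge serialization is about this much code (a minimal sketch using Nippy's standard freeze/thaw API):

```clojure
(require '[taoensso.nippy :as nippy])

;; Serialize an arbitrary Clojure data structure to a byte array
;; before sending it over the wire or writing it to disk.
(def payload {:user-id 42 :tags #{:a :b} :scores [1.0 2.5]})
(def frozen (nippy/freeze payload)) ;; => byte[]

;; On the other side (or on the next program run), thaw it back.
(= payload (nippy/thaw frozen)) ;; => true
```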

[0] https://github.com/jkk/honeysql

[1] http://www.onyxplatform.org/

[2] https://github.com/ptaoussanis/nippy


But did you really get much of the benefit of using Clojure? If you store your data in an SQL database, for instance, that layer provides the equivalent of what Clojure's STM can do within the process. You open a transaction, run some queries, perform some logic, then update the database. Clojure's persistent data structures don't give you extra leverage here - you could just as well write the logic with mutable data structures in C# (that you throw away afterwards).

This is no accident, as Clojure's STM was built around the premise of emulating the MVCC provided by databases to provide ACI (minus the D) guarantees to program against. If you're already getting those guarantees elsewhere, doubling up does no good.
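For anyone unfamiliar, the STM in question is Clojure's built-in ref/dosync machinery, which gives you transactional, snapshot-consistent updates inside one process - a minimal sketch:

```clojure
;; Two refs updated atomically; readers inside a transaction see a
;; consistent snapshot (MVCC-style), much like a database transaction.
(def checking (ref 100))
(def savings  (ref 500))

(defn transfer! [amount]
  (dosync
    (alter checking - amount)
    (alter savings  + amount)))

(transfer! 50)
[@checking @savings] ;; => [50 550]
```

If PostgreSQL is already providing that transactional boundary, running a second one inside the process buys you little - which is the point being made above.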

Similarly, code that operates on streams is generally not where you get bogged down by state and multithreading in something like C#.

I say this as somebody that believes Clojure is a powerful addition to the toolbox. I even use refs in production code...


I say you still do get a benefit from using Clojure, because of the REPL-driven development process. I miss that instantaneous feedback and the feeling that you are in dialogue with your program whenever I develop in other languages.

Add to that the fact that the language shepherds you into using immutable data structures and writing pure code, and I still think Clojure is a significant advantage when writing distributed applications.


Yes, we leveraged Clojure a lot. PostgreSQL has far more advanced indexing and querying capabilities, particularly around data too big to fit into RAM, than what I'd want to waste time building myself (if my team could even do it). The D in ACID is pretty important, btw, and being able to offload the replication/backups/failover to AWS (using RDS) was a huge win operationally.

Clojure's HoneySQL is the best way to programmatically interact with a database I have found, and it's not particularly close. It provides nice data structures to work with (that can even be schema'd or spec'ed), and combined with Clojure it makes writing complicated logic a breeze. It is easily extensible to new SQL functions, clauses, and operators, even vendor-specific stuff (we used a lot of PostgreSQL-specific features around function-based indexing). LINQ doesn't compare to the flexibility provided here by data structures and a just-in-time SQL compiler. No ORM does.

As an example, the user might be submitting AJAX requests to my backend, filtering some data by adding more filters or other clauses to my query. What I store in the session or in the database is fundamentally just (pr-str my-query). When the user comes back and wants to continue modifying it, I (read-string my-query-str) from my store. Then I just keep assoc'ing, filtering, and manipulating the data. If they decide this is a nice query, I just save it. It can become an alert, a report, etc. The only machinery is basic Clojure data structure manipulation, basic print/reading, and the power of compiling data structures to SQL.
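To make the queries-as-data point concrete, here is a minimal sketch against the HoneySQL[0] API (the table and column names are made up for illustration):

```clojure
(require '[honeysql.core :as sql])

;; A query is just a map; no special query-builder objects.
(def base-query {:select [:id :name]
                 :from   [:users]
                 :where  [:= :active true]})

;; A user-supplied filter is added with ordinary data manipulation.
(def refined
  (update base-query :where (fn [w] [:and w [:> :age 21]])))

;; Compile the data structure to parameterized SQL plus params.
(sql/format refined)

;; And persisting/restoring the query really is just print/read:
(= refined (read-string (pr-str refined))) ;; => true
```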

As far as operating on streaming data via Kafka (raw) or with Kafka + Onyx, I suspect you've never actually done this in Clojure. It's a breeze, quick to write and performant. I've yet to meet someone who has worked with Onyx who wasn't blown away by how simple it is to use and how quickly you can evolve very complicated dataflow graphs. It provides reasonably low latency and customizable batching, and I can still write straightforward, simple Clojure. You don't really deal with state or multithreading when dealing with Onyx (Onyx handles that for you, and you can declare the parallelism behavior you're looking for).

When I consumed Kafka directly, I didn't have many problems that get bogged down by state. We'd keep some local state, but the system was designed either to run in batch to consume some entire time interval or to tolerate being kill -9'd at any point, in which case the data was designed with idempotence in mind (such that we can handle/detect processing data more than once).
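To give a flavor of why Onyx feels like plain Clojure: a job is described with ordinary data structures, and the processing step is just a pure function. A minimal sketch (task and namespace names here are illustrative, not from a real deployment):

```clojure
;; The dataflow graph: a vector of edges between task names.
(def workflow
  [[:read-events :enrich]
   [:enrich :write-results]])

;; The processing step is an ordinary pure function over a segment (a map).
(defn enrich [segment]
  (assoc segment :processed-at (System/currentTimeMillis)))

;; The catalog describes each task as data; :onyx/fn is a fully
;; qualified keyword naming the function above.
(def catalog
  [{:onyx/name :enrich
    :onyx/fn   :my.app/enrich
    :onyx/type :function
    :onyx/batch-size 100}])
```

Evolving the graph means editing those vectors and maps, which is why iterating on complicated dataflows is so quick.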

I selected and championed Clojure because it was a dynamic language (very similar to Python or Ruby) that had access to battle-tested JVM libraries. I selected it over Python and Ruby because it made choices that resulted in simpler programs with far better locality when I or a co-worker must understand and modify a program in the future. The emphasis on immutable data and functional programming allows nice straight-line programming where all I need to know to reason about the code is in the function arguments. There aren't as many surprises. The reason I've stayed with Clojure is the community's focus on simplicity and the stability of my code. The code I wrote 8-9 years ago works just fine today. That's not a huge brag (cough Fortran cough Common Lisp), but it's refreshing when I look at how quickly languages like Rust and Python evolve.


Thanks for taking the time to write this up!


So this got 3 downvotes, and I don't care about the karma, but I'd be quite curious to hear about opposing experiences scaling up Clojure programs since there seems to be some disagreement.

If you started to struggle with fitting everything in one node due to CPU or memory constraints, but still felt like you were writing Clojure in a halfway idiomatic style when moving to a cluster, I'd love to hear from you.


A ~10x performance penalty doesn't seem that bad. In C, a pointer dereference will beat a web service call by far more than 10x. Also, why a hand-jammed KV store? EDN and other (faster) serialization libraries have been around for quite a while.


I used Nippy originally, then later Java serialization libraries once it became clear optimization was required. I tried using both Redis and RocksDB. In the happy case, all but the last layer of nodes sat in cache in memory (as long as your memory is more than 1/32 of the size of your disk).

I think I could do better starting from scratch given what I've learned.


That’s just what happens around here when you imply that a Lisp might not be perfect.

Don't think too hard about it; they don't either.



