Sirius: A distributed system library from Comcast

ojilles · on April 23, 2014

I'm not an American citizen, but I can't have been the only one whose reaction was: "Comcast? ... Comcast?!". But this is really awesome -- I hope more tech gets open sourced by Comcast and similar companies.

anon_coward932 · on April 23, 2014

Comcast does a lot of really good work on the back end. They've been spending a ton of money developing all of their new platforms with open source tools -- which inevitably means they'll create new things like this and contribute them back. They have a literal army of open source nerds hacking in Ruby, Scala, Erlang, etc.

Comcast realized a long time ago that if they didn't want Google to eat their lunch, they needed to approach product development more like Google does. It's starting to pay off. I would expect to see more open source projects coming from them in the near future.

opendais · on April 23, 2014

Comcast is terrible to its consumers because they know you can't run away from them and get better service at that price point [generally].

Open sourcing stuff that costs them [very little], may result in cost savings in the future, and doesn't allow you to disrupt their core business? Why not?

Maybe I'm just a bit cynical. ;)

ars · on April 23, 2014

You are cynical.

People complain about Comcast because it's the only choice -- when you don't get a choice, you hate what you get.

Their actual service is perfectly fine. I'm sure you can dig up some bad stories, but think about it from a percentage point of view: Comcast is huge, so there are bound to be some problems, but as a percentage of total customers the number of problems is perfectly normal.

PS. I have no connection with Comcast; I just have a sore spot about herd mentality. People believe things without checking for themselves (or based on anecdotes).

Said a different way:

If you have tons of choices, you will rate your choice highly as a form of choice validation. If you have only one or a few choices, you will only review the company when there are problems.

The result of that is poor reviews without actual poor service.

opendais · on April 23, 2014

I refuse to use Comcast due to my experience with them which is anecdotal. But fair enough to call me cynical. :)

So, ummm, maybe you should ask if I've used them before going on a rant?

ars · on April 24, 2014

It wasn't really directed at you (except maybe the cynical part).

opendais · on April 24, 2014

Fair enough. :)

I just assumed it was because it was in reply to my comment. My mistake.

ars · on April 24, 2014

I wrote "people" to try to make it more general. Your comment was just a jumping off point for it.

Bjoern · on April 23, 2014

Reminds me of Infinispan [1] or other similar Data Grid software.

[1] http://infinispan.org/

agibsonccc · on April 23, 2014

Hazelcast is a favorite of mine: http://hazelcast.org

bri3d · on April 23, 2014

On cursory first glance Sirius looks more advanced than Hazelcast: Sirius seems to provide substantially more flexible cluster topologies along with "real" consistency behavior (via Paxos). Plus, Sirius seems to come with a built-in persistence system with at least some thought applied to it, while last I checked Hazelcast left persistence entirely up to the library consumer.

As a disclaimer I haven't looked at the implementations of either, only the documentation (and I used Hazelcast in a project a while ago).

larsmak · on April 23, 2014

Having used Hazelcast extensively, and considering making some of my code Sirius-dependent, I can tell you that it is not more advanced. Sirius is basically just a distributed key-value store with no partitioning, whereas Hazelcast has a lot of abstractions built on top of the simpler principle[1]. In fact, Hazelcast might be a bit overkill in some situations, and that's one of the reasons I will consider Sirius.

[1] http://www.hazelcast.org/docs/3.1/manual/html-single/

comcast-jonm · on April 23, 2014

Our particular reference dataset (TV and movie data) had some application use cases that required custom data structures to get the in-memory performance we wanted. Sirius doesn't provide data structures per se (and isn't really even a datastore itself) -- it provides an eventually-consistent, replicated update stream that you can use to maintain your own data structures. Having that developer freedom and control over the data structures was an important design principle.
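To make the "update stream, not datastore" idea concrete, here is a minimal Python sketch, assuming an ordered log of put/delete entries tagged with sequence numbers. `Update`, `TitleIndex`, and the sample keys are hypothetical illustrations, not Sirius's actual API; the point is only that every node replays the same stream into whatever in-memory structure the application chooses (here, a word index a plain key-value store wouldn't give you).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    """One entry in a hypothetical replicated update log."""
    seq: int               # position in the totally ordered stream
    key: str
    value: Optional[str]   # None means "delete this key"


class TitleIndex:
    """Application-owned structure kept in sync by replaying updates (hypothetical)."""

    def __init__(self) -> None:
        self.by_key: dict[str, str] = {}
        self.by_word: dict[str, set[str]] = {}  # custom secondary index
        self.last_seq = 0                        # highest sequence number applied

    def _unindex(self, key: str) -> None:
        # Remove the key's old value (if any) from both maps.
        old = self.by_key.pop(key, None)
        if old is None:
            return
        for word in old.lower().split():
            keys = self.by_word.get(word)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self.by_word[word]

    def apply(self, update: Update) -> None:
        # Skip entries already applied, so replaying the log is idempotent.
        if update.seq <= self.last_seq:
            return
        self._unindex(update.key)
        if update.value is not None:
            self.by_key[update.key] = update.value
            for word in update.value.lower().split():
                self.by_word.setdefault(word, set()).add(update.key)
        self.last_seq = update.seq

    def search(self, word: str) -> set[str]:
        return set(self.by_word.get(word.lower(), set()))


log = [
    Update(1, "tt0111161", "The Shawshank Redemption"),
    Update(2, "tt0068646", "The Godfather"),
    Update(3, "tt0111161", None),  # delete
]
index = TitleIndex()
for u in log + log:  # replaying the same entries a second time has no effect
    index.apply(u)
print(index.search("the"))  # {'tt0068646'}
```

Because reads hit `index` directly in-process, there is no network hop on the read path; the trade-off is that every node holds the full dataset, consistent with the "no partitioning" point made earlier in the thread.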

bri3d · on April 23, 2014

Good point!

I agree, and I think we were looking at different dimensions: the consistency/partition tolerance story (the "distributed system" bits) seems more thought out in Sirius, while the variety of data structures and connectors looks more built up in Hazelcast.

This may be an issue of documentation as well: Because it's built more "vertically" with a large consumer-facing cross-section (lots of data structures stacked over each other), the HazelCast documentation seems to be very use-oriented while the Sirius documentation focuses on the fundamentals.

agibsonccc · on April 23, 2014

Yeah, I definitely use hazelcast for the variety of data structures.

I personally use it for a variety of tasks including distributed locks, a task tracker (think like hadoop) and for phases. If you're curious, here's the impl I'm talking about:

https://github.com/agibsonccc/java-deeplearning/blob/master/...

This is to augment an akka clustering distributed run time.

whadar · on April 23, 2014

How does it compare to Redis?

bri3d · on April 23, 2014

Redis implements operations against simple in-memory data structures with a network frontend, not a distributed storage/replication system.

Sirius lives at a lower level: you could, for example, use Sirius to build the storage system backing a distributed Redis clone.

ihsw · on April 23, 2014

The network front-end is key here, especially since the goal of Redis is -- in spirit and in actuality -- to act as a network front-end to in-memory data structures.

It's damn good at it too, but disassociating that network front-end may make sense in the future.

comcast-jonm · on April 23, 2014

Redis and Sirius target slightly different use cases, although they share the goal of keeping data in memory. For the use case that motivated Sirius' development, we wanted to avoid doing I/O to an external system (even a fast one like Redis) in order to simplify development by having direct access to the data in native data structures.

Additionally, we found that we needed some custom data structures to reach the performance we were after, so giving application developers control over those data structures was an important motivation.

Bjoern · on April 23, 2014

Is it possible to do write through / write back to some other data store?

comcast-jonm · on April 23, 2014

This isn't currently supported, although Sirius has some options that might be helpful. Sirius isn't really a data store per se; it's really a persistent, replicated update stream (transaction log) aimed at letting you build and maintain your own in-process representation of the dataset.

As such, you could set up a separate cluster running in "follower" mode, subscribing to the stream of updates and then writing them out to some other datastore.
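A rough Python sketch of that write-out step, assuming the follower sees log entries in order as `(sequence, key, value)` tuples. The `SqliteFollower` class, the tuple format, and SQLite as the "other datastore" are all hypothetical stand-ins, not Sirius's follower-mode interface. The part worth copying is recording the last-applied sequence number in the same transaction as the data, so a restarted follower resumes exactly where it left off.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical log entry: (sequence number, key, value); value None means delete.
Entry = Tuple[int, str, Optional[str]]


class SqliteFollower:
    """Mirrors a replicated update stream into SQLite (illustrative only)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        with conn:  # commits on success, rolls back on exception
            conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")
            conn.execute(
                "CREATE TABLE IF NOT EXISTS checkpoint "
                "(id INTEGER PRIMARY KEY CHECK (id = 0), seq INTEGER NOT NULL)"
            )
            conn.execute("INSERT OR IGNORE INTO checkpoint (id, seq) VALUES (0, 0)")

    def last_seq(self) -> int:
        return self.conn.execute("SELECT seq FROM checkpoint WHERE id = 0").fetchone()[0]

    def consume(self, entries: Iterable[Entry]) -> None:
        for seq, key, value in entries:
            if seq <= self.last_seq():
                continue  # already written out; harmless to see again after a restart
            # Write the data and advance the checkpoint atomically.
            with self.conn:
                if value is None:
                    self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))
                else:
                    self.conn.execute(
                        "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
                    )
                self.conn.execute("UPDATE checkpoint SET seq = ? WHERE id = 0", (seq,))


conn = sqlite3.connect(":memory:")
follower = SqliteFollower(conn)
stream = [(1, "a", "x"), (2, "b", "y"), (3, "a", None), (4, "b", "z")]
follower.consume(stream[:2])
follower.consume(stream)  # entries 1-2 are skipped, 3-4 are applied
print(conn.execute("SELECT k, v FROM kv ORDER BY k").fetchall())  # [('b', 'z')]
print(follower.last_seq())  # 4
```

Committing one transaction per entry keeps the sketch simple; a real write-through follower would likely batch entries per transaction for throughput.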

Bjoern · on April 23, 2014

Thank you for clarifying that.

fizwhiz · on April 23, 2014

LSM tree, anyone?