I'm planning to build a CephFS cluster at this point. I've given up on finding a cloud storage provider that will work well for us. This will require a fairly high fixed monthly cost to get space and transit at a datacenter, but after roughly 10-30TB of bandwidth per month you start to save a lot of money.
There are trade-offs. Aside from higher upfront costs and a fixed monthly fee, Ceph is ridiculously complicated. Interface-wise, it needs much better abstractions.
B2 was a strong candidate; their bandwidth rates are still a little high, but approaching reasonable. The one issue is that their latency seems inconsistent. That's not a problem for most use cases, but I need nearly all requests to come back in under 100ms consistently, since I'm using this for web hosting. Their product seems more focused on "hot standby backup" than on high availability at the moment. I'm strongly considering them as the backup provider for my storage cluster.
FWIW, GCS is not winning any best-of-show awards for latency either. S3 was doing a better job at the time I ran tests.
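For what it's worth, the "nearly all requests under 100ms" requirement is really a claim about tail latency, so it's worth measuring as a high percentile rather than an average. A minimal sketch of that check (the samples here are simulated so the example runs offline; against a real provider you'd time actual GET requests):

```python
# Sketch: check time-to-first-byte tail latency against a 100 ms budget.
# The samples below are simulated stand-ins for measured TTFB values.
import random
import statistics

def p95(samples_ms):
    # statistics.quantiles with n=20 yields cut points at 5% steps;
    # the 19th cut point (index 18) is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20)[18]

random.seed(1)
# Simulated TTFB samples (ms): mostly fast, with occasional slow outliers,
# mimicking the "inconsistent latency" pattern described above.
samples = [random.gauss(40, 10) for _ in range(95)] + \
          [random.uniform(150, 400) for _ in range(5)]
print(f"p95 TTFB: {p95(samples):.1f} ms (budget: 100 ms)")
```

Even a 5% slow tail is enough to blow a "consistently under 100ms" requirement, which an average would hide.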
> Backblaze B2 was a strong candidate... The one
> issue is that they seem to have inconsistent latency.
I work at Backblaze, and B2 is our second product (Online Backup was our first product). We are less experienced at serving up data that has fixed latency requirements, but we're learning and actively improving this area all the time. I'm curious when you last tried it and I would be interested in hearing about your experiences (in a personal message if you like).
In a nutshell: the first time we read a file we build it from the vaults (the vaults are our lowest layer, which is reliable but slowest), and that should be fairly consistent, with some caveats (see below). Then for at least the next 24 hours it should be extremely fast and consistent, served out of our SSD cache layer.
Caveats: if you upload a ton of files, they get loaded into the most recent vault we have deployed. So for three or four days, you and everybody else are uploading tons and tons of files to that exact vault, putting it under heavy load. After about ten days the vault is "full", we deploy another vault, and suddenly serving your files becomes a lot more consistent, easier, and faster, because that vault is almost entirely idle.
What we just started doing is deploying twice as many vaults at once to cut their load in half while they "fill up", making responses in the first ten days faster and more consistent.
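The load math behind that change can be sketched in a couple of lines (the upload rate is a made-up illustrative number, not a Backblaze figure):

```python
# Sketch: with a fixed aggregate upload rate spread across the vaults
# currently "filling", doubling the active vault count halves each
# vault's share of the write load. Numbers are purely illustrative.
uploads_per_sec = 10_000  # hypothetical aggregate upload rate

def per_vault_load(total_rate, active_vaults):
    return total_rate / active_vaults

before = per_vault_load(uploads_per_sec, 1)  # one vault filling
after = per_vault_load(uploads_per_sec, 2)   # two vaults filling
print(f"per-vault load: {before:.0f} -> {after:.0f} uploads/sec")
```

Since read latency on a filling vault competes with that write load, halving the write share per vault should directly smooth out first-read latency during the fill window.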
hurstdog can speak to this more, but we absolutely don't scream on latency (particularly time to first byte). Internally, we rely heavily on caching, and that capability has just been made available (in alpha) via Cloud CDN wrapping GCS. Does <100ms matter for cache fills, or just an aggregate p95 < 100ms including caching?
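The distinction in that question matters a lot numerically: if cache fills are rare enough, they fall entirely above the 95th percentile and the aggregate p95 never sees them. A small sketch with made-up hit rates and latencies:

```python
# Sketch: aggregate p95 with a high cache hit rate. Hit rate and
# per-path latencies are assumed numbers for illustration only.
import statistics

hit_ms, fill_ms = 15, 250  # assumed cached vs cache-fill latency

# 1000 requests at a 97% hit rate: 970 hits, 30 fills.
samples = [hit_ms] * 970 + [fill_ms] * 30

agg_p95 = statistics.quantiles(samples, n=20)[18]
print(f"aggregate p95: {agg_p95:.0f} ms")
```

With fills under 5% of traffic, the aggregate p95 equals the cached latency even though every fill individually busts the 100ms budget; so "aggregate p95 < 100ms including caching" is a much easier target than "every cache fill < 100ms".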
I'd also like to point everyone at Zach Bjornson's wonderful blog post from last year [1], which was both thorough and independent!
It really shouldn't be this complex. I would love to be able to just boot an executable with a simple config file and be done with it. SeaweedFS shines a light on how this could be improved: https://github.com/chrislusf/seaweedfs
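For contrast with Ceph's mon/osd/mds zoo, a SeaweedFS single-node setup is roughly one command (flags as described in the project's README; ports and paths here are illustrative):

```shell
# Hypothetical single-node SeaweedFS: one binary runs the master,
# volume server, and (optionally) an S3 gateway together.
weed server -dir=/data/seaweedfs -s3
```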
... which naturally raises the question: why choose Ceph and not SeaweedFS? I've read a bit about the latter (especially the Facebook paper), and the design and operability seem simple enough that it might be a good starting point.
Ceph is more production-ready, and has a lot of things I believe SeaweedFS doesn't yet have (such as consistency checking and rebuilding of failed nodes). It also has a filesystem layer. SeaweedFS can use a filer, but the filer currently doesn't support indexing (so you can't do subdirectory lookups, for example; you'd need to know the full filename).
SeaweedFS is fundamentally trying to do a different thing than Ceph is, and it benefits certain use cases. Reading the Facebook Haystack paper will give you an idea of the differences.
That said, I'm extremely impressed with how simple the interface is. It's very easy to get started.