I think there's really no doubt that the storage prices themselves (for all tiers) are pretty amazing. But let's face it: traffic costs are the elephant in the room. At 12ct and more per GB, traffic easily becomes your biggest expense and makes the storage price reduction from 2.6 to 2ct per GB almost forgettable.
For me this is in no way acceptable, and it seems like a vicious attempt to sneak in some extra profit without the customer noticing it upfront. Sure, these harsh words seem like a big exaggeration, but I literally never hear or read anything about traffic costs in Google's fancy blog posts, and I'd assume that only a fraction of the HN community is aware of this. 120 bucks for a measly TB of traffic is just way too high.
This x 1000. Even when you throw your own CDN on top of this, the bandwidth markup is nasty. Neocities would be unsustainable if we used GCS for our hosting.
I'm paying about $0.01/GB right now, and I've seen market rate at half that. And that's not even directly using IP transit providers. You can get a gigabit unmetered for $450-1000/mo which you can shove a theoretical 324TB through every month. The difference in the numbers is so staggering I sometimes wonder if I'm even doing the math right.
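For anyone who wants to check that arithmetic, here's a quick sketch (decimal units, 30-day month, and assuming you actually saturate the port, which real traffic never quite does):

```python
# Sanity-check the transit math: a saturated 1 Gbps port over a 30-day month.
port_gbps = 1
seconds_per_month = 30 * 86400
bytes_per_month = port_gbps * 1e9 / 8 * seconds_per_month
print(f"{bytes_per_month / 1e12:.0f} TB/month")        # 324 TB

# Effective per-GB rate at the top of the quoted $450-1000/mo range:
gb_per_month = bytes_per_month / 1e9
print(f"${1000 / gb_per_month:.4f}/GB")                # ~$0.0031/GB vs ~$0.12 cloud egress
```

So even at the expensive end of the unmetered range, a full pipe works out to a small fraction of a cent per GB.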
Perhaps their bandwidth is better somehow (prove it), but 12-18x better it is probably not, and having truffle shavings added to your IP transit really adds up when you're hauling a lot of traffic. If you're doing something with heavy BW usage and low margins, be careful with stuff like this. It quickly becomes much more expensive than doing it yourself.
I'd love to be wrong here. I'm sitting next to 60 pounds of storage servers I'm setting up for a data center, they're taking up my entire living room. I would love to get out of the data persistence business forever. But at these BW rates, it's never going to happen.
I've looked at this. Easily the best deal in town. A few notes for people considering it:
- It's IP transit and a DC from HE, which means it's possibly not BGP peering through other transit providers? If this is true, if HE's network fails, so does your server's internet connection. Typically you want to be multi-homed to ensure redundancy, unless you're doing something like an Anycast CDN and you don't care if it craps out for a few hours.
- They only provide a 15A circuit (likely only 80% usable) for a 42U rack, which is pretty ridiculous. You'll blow through that pretty quickly if you're using dual Xeon servers. It's cheaper under their pricing to get a second 42U rack than it is to get the proper amount of power needed for a full cabinet.
- If you're doing Anycast, be aware that HE doesn't provide any BGP communities except blackholing, which could make it hard to tune your network.
(If any of this sounds annoying to deal with and think about, you know exactly why I'd prefer that the cloud providers had better BW costs so I could not have to do this anymore. Again, if the business model works with GCS, congratulations, use it.)
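For what it's worth, the 15A point above pencils out roughly like this; the 120V circuit and ~350W-per-server draw are my assumptions, not anything from the colo's spec sheet:

```python
# Rough power budget for a 15A cabinet (assumes a 120V circuit and ~350W
# sustained draw per dual-Xeon server -- both numbers are guesses).
amps, volts, derate = 15, 120, 0.8      # 80% continuous-load derating
usable_watts = amps * volts * derate
print(usable_watts)                      # 1440.0 W usable

servers = int(usable_watts // 350)
print(servers)                           # 4 servers -- in a 42U rack
```

Four loaded boxes in 42U of space is why the power allotment, not the rack units, is the real constraint.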
> It's IP transit and a DC from HE, which means it's possibly not BGP peering through other transit providers?
The gigabit transit that comes with the cabinet is HE-only, yep. You can however buy transit and an interconnect from any other carrier in the datacenter (there are quite a few in FMT2 at least).
> They only provide a 15A circuit for a 42U rack, which is pretty ridiculous.
Yeah, it's pretty shitty. I ended up going with Xeon D for the power efficiency. Works pretty well.
At our highest egress tier, our CDN offering around GCE (or GCS in Alpha), starts being competitive [1]. I agree though that if you just want some bandwidth, no major public cloud provider comes close to the kind of transit rates you'll find for "this isn't VOIP or gaming...". How much egress does Neocities do a month? Feel free to ping me if you'd prefer to discuss offline.
$0.02 for North America is a rate where I start to go "okay, that could make sense" for entry level. I would call it priced right if it was extremely high quality bandwidth, as in you made some good peering arrangements with the eyeball ISPs and had some strong low-saturation IP transit in the mix. You would have to convince me of its quality. My traceroutes should show a lot of direct regional peering to the major ISPs and Level 3 caliber transit.
I really think you could differentiate hugely from your competition by just using that $0.02/GB point for all your customers, regardless of the amount of egress they're using. It's a big chicken-and-egg problem: those bandwidth costs hurt most when you're a startup, and by the time you can handle them you're a bigger organization. It's what made us move to our own infrastructure, and once I'm already running infrastructure in a datacenter, why bother with the cloud?
Even better (for me anyways) would be to provide an option for the "this isn't VOIP or gaming" bandwidth. Not everybody needs that. Actually, probably most people don't. For what I would use GCS for I only need transit to other datacenters. That bandwidth is much cheaper than major ISP peering bandwidth because there aren't monopolies like Comcast extorting everybody for peering.
I'm planning to build a CephFS cluster at this point. I've given up on finding a cloud storage provider that will work well for us. This will require a fairly high fixed monthly cost to get space and transit at a datacenter, but after ~10-30TB BW you start to save a lot of money.
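A rough sketch of where that breakeven lands; the fixed monthly figures here are hypothetical placeholders, not real quotes:

```python
# Hypothetical breakeven vs ~$0.12/GB cloud egress. The fixed monthly
# figures are placeholders; substitute a real colo + transit quote.
cloud_egress_per_gb = 0.12
for fixed_monthly in (1200, 2400, 3600):
    breakeven_tb = fixed_monthly / cloud_egress_per_gb / 1000
    print(f"${fixed_monthly}/mo fixed -> breakeven around {breakeven_tb:.0f} TB egress/month")
```

Anywhere in that cost range, you come out ahead once monthly egress passes the ~10-30TB mark.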
There are trade-offs. Aside from more upfront costs and a fixed monthly, Ceph is ridiculously complicated. Interface-wise, it needs much better abstractions.
B2 was a strong candidate, their BW rates are a little high still but approaching reasonable. The one issue is that they seem to have inconsistent latency. Not a problem for most use cases, but I need nearly all requests to come back in <100ms consistently, as I'm using this for web hosting. Their use case seems more focused on "hot standby backup" than on high availability ATM. I'm strongly considering them as the backup provider for my storage cluster.
FWIW, GCS is not winning any best-of-show awards for latency either. S3 at the time I ran tests was doing a better job.
> Backblaze B2 was a strong candidate... The one
> issue is that they seem to have inconsistent latency.
I work at Backblaze, and B2 is our second product (Online Backup was our first product). We are less experienced at serving up data that has fixed latency requirements, but we're learning and actively improving this area all the time. I'm curious when you last tried it and I would be interested in hearing about your experiences (in a personal message if you like).
In a nutshell: the first time we read a file, we build it from the vaults (vaults are our lowest layer, reliable but slowest), and that should be fairly consistent with some caveats (see below). Then, for at least the next 24 hours, it should be extremely fast and consistent, served out of our SSD cache layer.
Caveats: if you upload a ton of files, they get loaded into the most recent vault we have deployed. So for three or four days, you and everybody else are uploading tons and tons of files to this exact vault, causing heavy load. After ten days the vault will be "full", we'll deploy another vault, and suddenly serving up your files becomes a lot more consistent and faster because the vault is almost entirely idle.
What we just started doing is deploying twice as many vaults at once to cut their load in half while they "fill up", making responses in the first ten days faster and more consistent.
hurstdog can speak to this more, but we're absolutely not screaming fast on latency (particularly time to first byte). Internally, we rely heavily on caching, and that's just now been made available (in Alpha) for Cloud CDN to wrap GCS. Does <100ms for cache fill matter, or just aggregate p95 < 100ms including caching?
I'd also like to point everyone at Zach Bjornson's wonderful blogpost from last year [1], which was both thorough and independent!
It really shouldn't be this complex. I would love to just be able to boot an executable with a simple config file and be done with it. SeaweedFS shines a light on how this could be improved: https://github.com/chrislusf/seaweedfs
... which naturally raises the question: why choose Ceph and not SeaweedFS? I've read a bit about the latter (especially the Facebook paper), and the design and operability seem simple enough that it might be a good starting point.
Ceph is more production-ready, and has a lot of things SeaweedFS I believe doesn't yet have (such as consistency checking and rebuilding of failed nodes). It also has FS layer capability. SeaweedFS can use a filer but it currently doesn't support indexing (so you can't do subdirectory lookups for example, you would need to know the full filename).
SeaweedFS is fundamentally trying to do a different thing than Ceph is, and it benefits certain use cases. Reading the Facebook Haystack paper will give you an idea of the differences.
That said, I'm extremely impressed with how simple the interface is. It's very easy to get started.
I am currently using the Hetzner SX131 in RAID 1 and http://minio.io. For about 200€ you get (2x) 30 TB of storage and 50 TB of traffic included. That's about €0.007 per GB with "free" traffic.
There are a couple of interesting infrastructure/business trends driving this across providers.
The call-out of 12ct for throughput and 2ct for storage actually reflects a typical blended data environment. In short, data access frequency definitely follows the Pareto principle: the vast majority of data is written once and rarely read, with very little of it accessed frequently. For those generalized storage populations the pricing actually aligns.
Related is the issue of "stalled" throughput of dense storage like HDDs. Four years ago it was 4TB spindles at 5-7,200 RPM. We're up to 10TB per spindle these days, but rotational speed is still 5-7K RPM, limiting throughput to, say, 20-200MB/s depending on how much you like latency and queueing. As I recall, HDDs are going to be up to ~20TB per spindle over the next few years. And still 5-7K RPM. Storage keeps getting cheaper; throughput is not.
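One way to see the squeeze: the time to read a drive end-to-end (think rebuild or verify) keeps growing, because capacity outruns throughput. The sequential speeds below are my ballpark assumptions:

```python
# Full-drive sequential read time: capacity grows, throughput doesn't,
# so rebuild/verify windows keep stretching. Speeds are ballpark.
for capacity_tb, mb_per_s in [(4, 150), (10, 200), (20, 250)]:
    hours = capacity_tb * 1e12 / (mb_per_s * 1e6) / 3600
    print(f"{capacity_tb:>2} TB @ {mb_per_s} MB/s -> {hours:.1f} h end-to-end")
```

Roughly 7 hours for the old 4TB drive versus half a day or more for today's, and that's the best case with zero seeks.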
And lastly, it's really important to understand that you're not paying for traffic; you're paying for your provider's network. That's their datacenters, their backbone, their leased fiber investments, their hundreds of millions in capital. It's not something as simple as "oh, I can get Cogent for $0.2/Mbps." It's an old post, but for the idea see https://news.ycombinator.com/item?id=7479030.
The best thing about Google Cloud is the Console. I haven't seen a comparable console for managed services. Think about the UX nightmare that is the AWS Management Console and take that complexity and separate it out neatly and make everything available in a fast search box and you have Google Cloud Console.
It's the primary reason I've switched over to Datastore and App Engine.
Hmm. I actually really like the AWS management console. I find that everything is logically organized and easy to manipulate. That could just be because I've been using it for a long time.
What about ~~reliability~~ durability of storage (bitrot, not the uptime)? I didn't see any information comparing this, is it practically the same across both?
At this point, durability is hard to compare because the published numbers are total fantasies. S3's published durability is 11 9s, which I've complained about in the past because the chance that the human race will be wiped out next year by an asteroid impact is probably not too far from one in a hundred million, which puts a hard cap on durability at around eight nines.
Then ask yourself, "is the chance that someone made a mistake in designing the system which causes catastrophic data loss higher or lower than one in a billion?"
Yes, the number is total fantasy. The only actual meaning of the durability SLA is that if the durability SLA is violated, your cloud provider will give you some service credits. If the data is worth more than some service credits, this is small consolation.
In practice, if you absolutely must compare durability, what you are trying to do is compare the frequency of black swan events, but actually estimating the frequency of those events requires access to proprietary data which you don't have.
Both providers are undoubtedly storing the data in multiple locations on multiple types of storage systems with multiple layers of error coding.
After thinking about this issue earlier, the most likely data loss scenario in my opinion was "I get injured, the company hires someone incompetent who doesn't pay the bill for a year while I'm in the hospital."
I've always seen durability from the perspective of Amazon telling me, "Look, we're going to lose your data; here's how quickly you should expect it to rot." Basically, take the number of objects you have and multiply by 1e-7 each year. So if I have 400 billion objects (4e11), Amazon is going to lose approx 40,000 of them per year, and I should be prepared for that.
With regards to your asteroid strike (and other disasters such as riot, insurrection, war, hurricane, earthquake, etc.), these are all disclaimed in the agreement you sign with Amazon under a clause known as force majeure, which essentially means "acts of God". The durability clause comes into effect in the normal course of business, not during exceptional events. For those types of scenarios, you'll want a business continuity plan in place, not a durability formula on your storage service.
Your numbers are off... according to Amazon's durability notes, if you have 400 billion objects then Amazon would expect to lose less than 40 per year, not 40,000. And for that level of storage, you're paying Amazon something like a quarter million dollars per year to do storage for you.
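For anyone following along, a sketch of that arithmetic, reading eleven 9s as an annual per-object durability:

```python
# Expected annual loss if eleven 9s is read as per-object annual durability.
durability = 0.99999999999            # 99.999999999%
objects = 400e9                       # 400 billion
expected_loss = objects * (1 - durability)
print(round(expected_loss))           # ~4 objects/year -- well under "less than 40"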
My point is that there is no point in taking that kind of durability guarantee into account, because there are much bigger threats to data storage. It's kind of like saying that driving by car is very safe if you don't get into an accident--while technically true, it's not a very useful fact.
The exact values may be off, but that is how to think about it. Losing any specific object is less likely than the earth ending, but these are providers with many trillions of objects, so on average a handful of objects go missing each year. The durability guarantee allows customers to make an educated decision about how much they invest in loss prevention.
But let's get back to that 400B-object customer who's losing 40. How much does it cost to verify (resilver) all those bits, even annually? Is it even tenable?
No, that's absolutely not how to think about it. In short, it's a dangerous simplification and it's not at all representative of how failures work in a system like S3 or Google Cloud Storage. It does make good marketing copy, but if you are an engineer or a program lead you have a certain level of responsibility for understanding why cloud storage does not, in practice, give you eleven 9s of durability per object every year. (At a first approximation, you would at least expect the data loss to follow a Poisson distribution, but…)
Drilling down into the technical guts: S3 and Google Cloud Storage do not store separate objects in the lower storage layers; it's too inefficient. Below the S3/GCS object API you have stripes of data spread across multiple data centers with error encoding, along with a redundant copy on tape or optical media. Randall Munroe estimated Google's data storage at 15 EB (https://what-if.xkcd.com/63/), so taking that number, let's suppose a stripe size of 100 MB (just picking a number out of the air that seems reasonable) and you get 1.5e11 stripes. Taking Amazon's 11 nines, that gives a loss of 1 stripe every 8 months.
So, all we've done so far is look at how a system would be implemented, and we've already completely destroyed the notion that you would expect 40 of 400B objects to disappear due to bit rot in any particular year. Supposing the objects are 10KB in average size, you might expect most years to lose no objects at all, and if you lose any you might lose 10,000 at the same time—and the entire extent of your recourse is to get a service credit from your cloud provider.
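A sketch of that stripe arithmetic, including the Poisson point: with a low event rate, most years see zero losses and the occasional bad year eats a whole stripe. The 15 EB and 100 MB figures are the assumptions from above:

```python
import math

# Stripe arithmetic: 15 EB in 100 MB stripes, eleven 9s per stripe per year.
stripes = 15e18 / 100e6               # 1.5e11 stripes
expected_events = stripes * 1e-11     # ~1.5 stripe losses/year, i.e. one per ~8 months
print(round(expected_events, 2))

# If losses are Poisson, most years see nothing and a bad year loses a whole stripe:
for k in range(3):
    p = math.exp(-expected_events) * expected_events**k / math.factorial(k)
    print(f"P({k} stripe losses in a year) = {p:.3f}")
```

About a 22% chance of a clean year, and when a loss does happen it takes out every object in the stripe at once.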
The gotcha is that the system simply isn't that reliable. First of all, engineers at Amazon and Google are constantly pushing new configuration and software updates to their stack. Some of these software updates can result in catastrophic data loss, and some of these errors will not get caught by canaries. "Catastrophic" might mean metadata corruption, it might mean the loss of many stripes all at the same time, but from the cloud provider's perspective they're still meeting SLA for most of their customers so most of their customers are happy. On top of that, you also have to take into account the possibility that a design flaw in the storage media would cause massive data loss across multiple data centers simultaneously, or other nightmare scenarios like that. Given that I've personally experienced data loss due to design flaws in storage media and I only have ever owned twenty hard drives or so in my life, you can imagine that a fleet with millions of hard drives presents some unique reliability and durability problems.
You can pretend that these configuration and programming errors are "unusual events" but the fact is that stripe loss for any reason is already an unusual event, and you might as well include the most probable cause of data loss in your model if you are going to model it at all.
So, what is the SLA? It's part of a contract. It defines when the contract is performed and when it is broken. It's also a piece of marketing and sales leverage. That's all. It's not a realistic or particularly useful description of how a system actually works—so the responsible engineers and program managers at companies which use cloud services are always asking themselves, "What happens if Amazon or Google violates their SLA? Will I lose my job?"
(A footnote: You don't need to verify the bits yourself, cloud providers will send you messages when your data is lost. If you want more durability then you go multi-cloud or buy a tape library.)
(Disclosure: I work at a company that provides cloud storage.)
I found this response to be fantastic, educational, and I've already bookmarked it for future discussions with people regarding cloud storage - so thanks very much for your response. I wish I could highlight it as an "Answer of the Month" - but at the very least, I'll share it with others when discussing the topic.
So perhaps another way of looking at this is that 11 9s of durability means Amazon is guaranteeing they will lose objects at roughly that rate, but, for all the reasons you highlighted, their underlying redundancy mechanism means there are all sorts of ways a catastrophic failure could lose many orders of magnitude more data, not only during exceptional events but in the normal course of business.
I would note that your anecdote about losing "data on a hard drive" isn't as relevant at cloud-storage scale, because one of the things that has been drilled into my mind by a colleague who works on a cloud system at scale is that they not only assume they will lose a single device, but also scale so that they can lose (in order) a complete rack, a complete PDU, and a complete data center, and still continue to provide availability to storage. That is, in the normal course of business, they plan on losing data centers and continuing to provide full availability (albeit at reduced durability until they restore that data center). Google takes this up a notch and provides availability in the event that an entire region is lost. Cautious companies can roll their own business continuity plans on top of this as well.
What would be interesting, but unlikely, is for Amazon/Microsoft/Google to share with their customer what their actual loss of data was in the prior periods (Say, per year), and then provide a rolling graph of actual loss. Also useful (and almost guaranteed not to be available) - is what percentage of customers lost more than their SLA each month.
Disclosure: I currently work at AWS and have a bit more than passing familiarity with large scale storage systems. I know what erasure encoding is. I know from experience just how wrong things can go.
The passing comment about the distribution of errors is important. However, the "1 stripe is 100MB, and objects are 10KB, ergo you'll lose 0 or 10,000 objects" bit is bizarro. I suspect you're letting your personal experience lead to assumptions that may not be true in other implementations.
Yes, that's definitely not true on all the storage systems I've worked on. Some storage systems will pack a single customer's data into a single stripe and others won't.
Google is making their cloud offering very attractive compared to AWS.
The issue with Glacier is the convoluted retrieval pricing. I understand they want to dissuade people from using it as a primary storage, but the potential for a surprise bill is hard to swallow.
Interesting how they still offer unlimited storage through their consumer Amazon Drive service.
The AWS web console is a pain point. I had a particularly frustrating experience once where attempting to restore a Glacier storage class file in S3 silently failed due to permission issues (I had forgotten to assign permissions to the bucket owner in my automated upload script), but the console told me that the file would be "available in 3-5 hours". I wasted days due to this.
Meanwhile other competitors (e.g. Nearline) promise to have your file available within seconds...
Yep. Nearline is great. Just so we're clear though: milliseconds ;). There's no latency penalty to get your bytes back, even with Coldline. We do charge a retrieval fee because the pricing is determined by a "promise" that you won't touch the data frequently (roughly once a month for Nearline and once a quarter for Coldline), otherwise you should use our regular flavors.
Disclosure: I work on Google Cloud and am a happy GCS customer.
Coldline seems to be a perfect solution for personal backups. I used Amazon Glacier, but got hit with an extremely huge bill when I decided to retrieve my backups, so I'm not going to deal with them anymore; their retrieval pricing is absurd. If there are good UI and scripting solutions for manual and automatic backup to Google Coldline, I'm in.
I highly recommend the "Arq" [1] commercial program for backing stuff up to these kinds of services. They support most of the big names (AWS, Google, Dropbox, etc.) and will probably be supporting the new Google options soon.
It does local encryption before sending your data up to the cloud.
Notably Arq restores all macOS permissions/meta-data, and there is an open-source test kit to show whether such a program does so.
It also works on Windows.
I have no relationship with the company, other than as a happy user.
Likewise - I can't say enough great things about Arq. In particular, when I moved to Singapore, I was able to really take advantage of the gigabit line I got (for $39/month), and use the S3 storage that Amazon offers in Singapore. Looking forward to using the Google Storage in Singapore as well...
FQDNs to the rescue, I very much doubt anyone but me is going to call a bucket backups.ninjagiraffes.co.uk (and if someone does so now, well, good trolling)
$60/year is too much for me. I have around 200GB of data for backup now, slowly growing. It's $0.007 × 200 × 12 = $16.80/year for Google cold storage. Also I'm skeptical about "unlimited storage": there's always a limit, and I prefer to know about it beforehand.
Of course Amazon Cloud Drive looks nice for people who want to back up their media with little effort. I'm using a somewhat different approach: I back up my data to a home server, but I need a reliable mirror in case of emergency.
Google Nearline seems to be $0.01 / GB / month which would be about $24 / year for you. Sounds good since you can retrieve data whenever you want without any extra cost. I would probably use that for my backups if I ran out of space on my normal storage.
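A quick sketch of where the metered options cross that $60/yr unlimited plan, storage cost only (retrieval and egress fees excluded):

```python
# Storage-only crossover against a $60/yr unlimited plan (ignores retrieval fees).
unlimited_per_year = 60.0
for name, per_gb_month in [("Nearline", 0.01), ("Coldline", 0.007)]:
    crossover_gb = unlimited_per_year / (per_gb_month * 12)
    print(f"{name}: metered wins below ~{crossover_gb:.0f} GB")
```

So with ~200GB the metered options are comfortably cheaper, and the flat plan only starts winning past roughly half a terabyte.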
But per this announcement, Google's new "Coldline" storage hits the AWS Glacier 0.7 cents/GB/month price point without Glacier's mind-boggling operational and billing complexity, long delays on retrievals, or the particularly huge bills if you need a lot of that data all at once, i.e. for restoring a backup. (I can't remember the Glacier numbers for the latter; too complicated for me to remember. Google makes Coldline retrieval simple: 5 cents/GB.)
In fact, it's implied they believe Coldline is the biggest new thing they're doing, since it's the first thing in the announcement they discuss in detail. It's certainly got me interested.
For the record I'm aware of people with 10s of TB in Amazon Cloud Drive without any issues. It is of course yet to be seen whether they'll pull a Microsoft and take it back.
Heads-up about Amazon Cloud Drive. From their TOS:
"We may use, access, and retain Your Files in order to provide the Services to you, enforce the terms of the Agreement, and improve our services, and you give us all permissions we need to do so. These permissions include, for example, the rights to copy Your Files, modify Your Files to enable access in different formats, use information about Your Files to organize them on your behalf."
Amazon has limits on the types of files you can store in Drive, and they can terminate the service at any time without notice. That's all according to their terms of use.
That's very scary; they could terminate my service and delete all my binary backups just because they didn't like binary files.
IANAL, but it's just a catch-all clause so they could kick "abusers" without worrying that whatever "infraction" was not explicitly enumerated in the terms.
For comparison, CrashPlan has a similar clause (see section 11)[0].
Ditto Backblaze (see "Our Rights")[1].
I wouldn't be surprised that all "unlimited" consumer products have similar clauses. If this is a concern for you, use a metered product.
For our Personal Backup product (fixed $5/month for as much data as you can keep on your laptop) your data is encrypted on your laptop BEFORE UPLOADING to Backblaze, and we have no idea what is inside your files and I assure you we don't want to know. (Look into setting your own private encryption key if you are worried at all about this.) In our 10 year history we have never once removed a customer's file because we objected to the file contents in that customer's Personal Backup because we simply don't know what is in the files.
For our other product line (B2 Cloud Storage at half a penny per GByte per month), it's a bit different: you can have a "Private" bucket, in which case we absolutely DO NOT care what you store. But a "Public" bucket is a public website. If you have a public bucket serving up illegal content, such as a phishing website or bootleg songs or movies you don't have the legal rights to share, then we may shut you down (turn the bucket "Private" so nobody but you can get the data).
TL;DR - use our Personal Backup product or a "Private" bucket and store WHATEVER YOU WANT. But if you break the law serving up a website from Backblaze we have an obligation to shut you down.
In case it's not widely known, rsync.net maintains 'gsutil' in the environment (along with s3cmd) so you can move data between google cloud services and rsync.net.
Although it should be noted that, circa 2016, the cool kids are all backing up to rsync.net with borg[1][2] (the "holy grail of backup software") which limits the use cases of gsutil with an rsync.net account.
I say this most times I see you advertising and I'll say it again: You really need to work on your pricing to be competitive with things like this. Even with the HN discount your service is expensive and even with the Borg/Attic pricing which is cheaper still, your service is 3x more expensive than alternatives like Nearline (aside from bandwidth), which themselves can be used as a backend for Attic (just mount GCS locally).
I really like the look of the service but I can't justify paying over 3x for it.
"your service is 3x more expensive than alternatives like Nearline (aside from bandwidth), which themselves can be used as a backend for Attic (just mount GCS locally)."
Well, our storage platform is online and random access so it would be inappropriate to compare it to either nearline or glacier.
The appropriate comparison is to S3 - or in this case, the multi-regional GCS option that this discussion points to.
In that case, our attic/borg pricing is 3 cents, compared to roughly 3 cents for S3 and 2 cents for GCS, and that assumes you use no bandwidth. Since we charge nothing for usage/bandwidth, the comparable prices from Amazon/Google would be slightly higher.
It sounds like a steal to me and every day plenty of people agree enough to commence using our services.
Also, unrelated, you know you can just drive up to an rsync.net location and get your data - even if the Internet is crippled.[1]
> Well, our storage platform is online and random access so it would be inappropriate to compare it to either nearline or glacier.
That would be true but you stated "circa 2016, the cool kids are all backing up to rsync.net with borg", so comparing your service pricing as a backup product to Nearline is appropriate I believe.
Don't get me wrong, your service definitely has a lot of good applications and the free bandwidth is quite great but for those of us that just want somewhere to park a whole pile of data we aren't going to touch much, Rsync.net is quite expensive.
Also a downside of rsync.net vs. AWS/GCS etc. is that you can't direct users' HTTP requests to rsync.net. That dampens the usefulness of online storage and unlimited bandwidth, since it essentially limits it to servers under our own control. If I could run HTTP off it directly, my feelings would be very very different.
I agree that the benefit of "it's ZFS" is an important thing people miss when they say "oh, I'll just use GCS or S3". And the bandwidth to get your data back out is absolutely material. Your bundled price is seriously a good deal at lots of points in the option space.
But, I would like to quibble with the "online" comment about Nearline or Coldline: there's no access penalty (now). We kind of quietly announced it in June or so, but Kirill the PM linked to it again. I had understood the "default" rsync.net choice to be in a single location, so comparing to our multi-regional (as opposed to Regional, Nearline or Coldline) seems incongruous.
For that price, I'm better off just paying $60 / yr for CrashPlan, which is unlimited storage. I'd say that applies to anyone who has a terabyte or more of stuff to back up.
CrashPlan removed all my data since I wasn't keeping my disks online. Fair enough -- it's a backup solution, not remote storage. Still, I'd rather have paid for the latter in the first place and saved myself the trouble of having to upload everything again.
This has happened to me too and it is annoying. However, I believe the delete is a soft-delete unless you specifically configure CrashPlan to remove deleted files after so many days. I have mine configured to NEVER remove files, even deleted ones.
I've found that you can alleviate some of the headache by assigning the disks unique drive letters. The reason for the deletion is that you might plug in your external HDD, which gets mounted as drive F:. You let it finish backing up, then disconnect it. If you later plug in a tiny flash drive that also gets mounted as F:, CrashPlan will see the contents of that drive, assume you deleted everything from the previous drive, and accordingly mark it as deleted in the backup.
If you mount the external HDD as drive Z:, however, when you disconnect it, CrashPlan will simply show the device as "Missing", but won't delete it unless something else gets mounted as Z while CrashPlan is running.
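A toy sketch (not CrashPlan's actual logic) of why keying a backup source off the mount point alone produces exactly this behavior:

```python
def mark_deleted(previous_index, current_scan):
    """Naive differ: any previously indexed path that is absent from
    the current scan of the same mount point is flagged as deleted."""
    return sorted(p for p in previous_index if p not in current_scan)

# Scan 1: external HDD mounted as F: is fully backed up.
index = {r"F:\photos\a.jpg", r"F:\photos\b.jpg"}

# Scan 2: a flash drive now occupies F: with completely different contents.
flash = {r"F:\notes.txt"}

# Every file from the HDD looks "deleted", even though the drive was
# merely unplugged -- the differ only sees the mount point, not the device.
assert mark_deleted(index, flash) == [r"F:\photos\a.jpg", r"F:\photos\b.jpg"]
```

Mounting the HDD as Z: only avoids the collision because nothing else reuses Z:; identifying drives by volume ID rather than letter would fix it properly.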
Not a significant portion, no. CrashPlan is only my third layer of backup, the offsite one. Each of my drives has a clone backup drive (typically just a WD My Passport that makes them easy to transport when I travel), and then a larger external holds backups of all of them -- I usually keep that one in storage and only plug it in once a month or so to sync. The most I do in a typical week is use the Android app to grab a few movies or documents from the CP storage to get them onto my phone and tablet.
I should probably "test" doing a recovery from CrashPlan at some point, but I fear that even getting a third of it (1TB) would probably trip some sort of overcharge from my ISP.
I have. It was decent. I accidentally hosed a VMware image and it was easier for me to just pull the whole thing down from CrashPlan. Speeds were decent, limited mostly by my cable modem. I don't recall the exact throughput, but I remember looking and saying "Hey, this isn't bad" and not being annoyed by anything.
As far as I can tell, they're still operating out of one data center, and while they claim its Sacramento location is outside of earthquake danger zones, I can still see a quake to the west impacting availability depending on fiber routing, electrical supply, etc.
Echoing another comment, I'm not sure if this is as big a danger as a programmer's or operator's potential to wipe out all your data no matter how many datacenters it's located in, but....
That seems acceptable for a consumer backup (not cloud storage) solution.
I have data on my laptop, an external disk that I keep at home, and BackBlaze. The likelihood of all three failing in the same week so that I can't restore from any single copy is low.
> That seems acceptable for a consumer backup (not cloud storage) solution.
I work at Backblaze, so I'm biased. :-) But the use case is important, meaning I wouldn't group together all "cloud storage solution" as one use case.
Backblaze does not have compute at all (like EC2) so Backblaze is a terrible choice for a company that will spend a lot of cycles analyzing the data stored in the cloud or doing compute on the data stored in the cloud. That is a much larger issue than the one data center for that use case.
On the other hand, it is my profound belief that for long term durability of data you should have AT LEAST three copies of the data stored by profoundly different vendors. Hopefully different file systems and stored by software written by different programmers so the same bug that affects one copy won't affect the other copies. Hopefully stored at separate physical locations. Put a different way: the only way to get more reliable than data stored at Amazon Glacier is to have one copy in Amazon Glacier, another copy on Google Coldline, and another copy in Backblaze. In that use case, the single data center of Backblaze is obviously a non-issue.
Indeed, and I have seriously considered them for their 71% lower price than AWS Glacier and now Google's Coldline, but I wanted to point out the apples vs. oranges difference in one way each does redundancy, or not in the case of Backblaze.
Google is finally pulling ahead of the other cloud offerings. I was skeptical about them ~1.5 years ago, when they were only following AWS without big innovations.
Google is smart to be aggressive on the storage side of its Cloud business. Once you have the data (assuming it's big data), it naturally makes sense to move the processing of that data into Google Cloud services as well...
There are two outages there, one that only affected a few projects and one that affected only service in the central US (we have regions over much of the world).
Anecdotally, from watching and working with other services internally I don't think most of our outages affect all regions. We actually spend a significant amount of engineering effort ensuring that we're as decoupled as possible.
Disclaimer: I'm an Engineering Manager on Google Cloud Storage.
On the flip side, many outages lately have hit an entire region, making the availability-zone distinction not so useful. For example, the last US Central load balancer issue took out the entire central region for anyone using it.
Not sure what happened there, but I think a deploy should be rolled out one availability zone at a time for hosted things (like the load balancer).
Is there documentation anywhere that talks about how Google Cloud (any product) creates isolation between regions while at the same time exposing a simple "regionless" programming model?
Probably the best reference is the SRE book, specifically the chapters on load balancing and distributed consensus protocols.
Other than that, the general approach is to minimize global control planes and dependencies in our software stack. In the case of GCS, we do have a single namespace which means we need to look up the locations of data early in the request. Once we know the locations of data we can route the request to the right datacenter to serve it. That global location table is highly replicated and cached, of course.
When outages happen, most are caused by changes to the stack, so we also are careful to roll out code or configuration slowly and carefully, slowly increasing the blast radius after it's been proven safe. For example rolling out new binaries first to a few canary instances in one zone, then to a few instances in many regions, then to a full region, then to the world, all spread over a few days.
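That staged widening of the blast radius can be sketched as follows (stage names and fractions here are made up for illustration, not Google's actual rollout configuration):

```python
# Each stage widens the blast radius only after the previous one has
# soaked without incident; fractions are illustrative assumptions.
STAGES = [
    ("canary instances, one zone",  0.001),
    ("few instances, many regions", 0.01),
    ("one full region",             0.20),
    ("everywhere",                  1.00),
]

def rollout_plan(total_instances):
    """Return (stage name, instance count) pairs, smallest blast radius first."""
    return [(name, max(1, round(total_instances * frac)))
            for name, frac in STAGES]

plan = rollout_plan(10_000)
# e.g. starts at ('canary instances, one zone', 10) and ends at
# ('everywhere', 10000), spread over a few days in practice.
```

The point is that a bad binary or config gets caught while it can only hurt tens of instances, not a region.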
Disclaimer: I'm an Engineering Manager on Google Cloud Storage.
That's what I was thinking. The various outages I've seen (at least big enough to get mentioned here) seem to affect things globally. The AWS model is traditionally having things isolated between regions. While that makes programming slightly more complicated, for me it creates an added layer of safety.
To be clear, GCS has had multi-region support (the "Standard" storage class), we're just making it explicit in the name. To add to hurstdog's response, I'd say: No, most outages for GCP have not been global. In particular, most Compute Engine outages are single zone and caused by either a networking configuration change (often affecting VM <=> Internet connectivity, like https://status.cloud.google.com/incident/compute/16004) or sometimes a software rollout to our VM infrastructure.
Much like with GCS (and any service at Google), the most common source of outages is a rollout. While we strive to offer zonal and regional services where appropriate, some like GCS and PubSub do have real value as "global" APIs. Trust me though, hurstdog spends a lot of his time struggling with that balance ;).
Can someone explain to me why I shouldn't be worried about Google becoming a monopoly? I think they make some fantastic products, and they have been expanding their reach into hardware, DNS, web hosting, and storage.
You don't need to start worrying about them becoming a monopoly until they actually figure out how to properly provide support for their products to the people (eg enterprises) that need it. :P
It's difficult to see how AWS and GCE can justify their ridiculous bandwidth charges given how cheaply they buy it in bulk. This is plain profiteering. With cloud, bandwidth was supposed to become something one takes for granted, not something to fuss over with calculators.
And the nickel-and-dime charges that force users to needlessly separate their computing needs into storage, compute, memory, bandwidth, reads, writes and whatnot impose considerable complexity on users.
This cannot be brushed aside as a good model for cloud when valuable user time is being wasted grappling with needless intricacies that have no reason not to be flat-rate.
End users cannot buy bandwidth at low bulk rates, and for things like backups they may face considerable overage charges on their local connections.
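The markup is easy to put numbers on, using the figures quoted upthread (~$0.12/GB cloud egress vs ~$0.01/GB market-rate transit; both are this thread's numbers, not official price sheets):

```python
# Rough cost comparison for serving 1 TB/month of egress.
# Rates are assumptions taken from this discussion.
GB_PER_TB = 1000        # decimal TB

cloud_rate = 0.12       # $/GB, the ~12ct cloud egress figure
own_rate = 0.01         # $/GB, market-rate transit quoted upthread

tb_cloud = cloud_rate * GB_PER_TB   # $120 per TB served
tb_own = own_rate * GB_PER_TB       # $10 per TB served
markup = tb_cloud / tb_own          # ~12x
```

At that ratio, the egress bill dwarfs the storage bill for any traffic-heavy workload, which is the whole complaint.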
I'm extremely reluctant to rely on Google services for core infrastructure for anything revenue generating after seeing the shenanigans they've played on pricing with their other services (especially Google Maps, where they initially jacked up the pricing to an absurd level, then backpedaled and made the prices 1/10 what they were initially threatening) and the relatively short migration windows they typically provide.
Besides capricious pricing changes, their support is notoriously bad, even for paid apps for business.
They don't seem to understand business customers needs nearly as well as Microsoft/Amazon, from what I've seen.
Really not who I want running my business's server infrastructure, even if it's completely free.
Dear god, the copy on that page is awful. The first paragraph is just vacuous twaddle. Anyone who starts a minor product announcement with the words "Today, we’re excited to announce..." deserves their own circle of hell. Most of that first section is redundant preamble and should be rewritten as bullet points.
In fact the whole thing could do with being rewritten in a simple bullet-pointed skimmable form. That way I wouldn't have to come to HN comments to find out what the hell it's all about. Is it too early in the day for a drink?