I specifically worded this to be about money, not brains. Most readers here can probably imagine how to implement a bounded time service. Most readers here also cannot afford to operate one. That is the point. Operating software reliably at large scale happens to be very expensive. 24x7 coverage with a short time-to-repair costs, at a minimum, several million dollars per year.
> 24x7 coverage with a short time-to-repair costs at a minimum several million dollars per year.
Interesting - what are the constituents of that cost?
What sort of challenges do you face? Do you use PTP grandmaster clocks, or something else? How many sites, and how many clocks per site? Are the support issues mostly hardware failures, configuration problems, or something else? Is 24/7 support needed because the equipment lacks failover support, or is the failover support unreliable or insufficient?
You generally need at least 4-5 SREs for a high-availability subsystem at large (Big Five) scale in a multinational corporation, just to cover all of the time zones and to make sure you're not frantically calling everyone when someone goes on vacation or has to pick up their kid from the nurse. Salary plus benefits and overhead for that team easily runs into the millions.
I think the comment meant that Google has costs that high. I read somewhere that Google operates two atomic clocks in each of its data centers, but I can't find a source for it right now, just this: https://www.wired.com/2012/11/google-spanner-time/