End of Moore's Law: It's not just about physics (cnet.com)
60 points by davidiach on May 3, 2014 | 61 comments


For a nice dose of doom and gloom, I quite like the dark silicon paper[1], which explores the limited use we'll get out of Moore's Law even if it manages to continue (and as far as I understand it, transistors/$ has already flatlined, so it's at least temporarily over).

Since we're no longer getting good power scaling out of shrinking, if Moore's Law keeps up, we're just essentially getting price discounts.

A nice thought exercise for me is: what would computing look like if you could fab wafers for free, today? Sort of the ad absurdum take on Moore's Law continuing.

We'd have more memory, a ridiculous amount of flash storage, and high-end graphics cards would become cheaper (but not faster). Desktop CPUs might look more like server CPUs, but single-core performance wouldn't change one bit.

We'd probably see a whole lot more heterogeneous computing. Xeon Phi-like processors on package next to Haswell cores, at the very least.

Computers would probably start to have FPGAs in them, as well as large amounts of niche circuitry to compute hashes, encode video, and so on.

We might see computation embedded inside memory (I think Hynix did this recently but I couldn't find a link). Maybe memory that accelerates garbage collection in order to accelerate modern workloads.

So there are some interesting places to go, especially from a Hacker News perspective, but even if transistors became free, we still wouldn't see the rate of progress we did in the 80s or 90s (in terms of speed up to the tasks we're doing).

Of course, this is all near to medium term stuff. I personally believe we'll see that rate of progress again when we move off of silicon, to a different computing paradigm, or (most likely) both - I can't believe we'll inch our way to science fiction computing, or that science fiction computing won't be possible at all.

[1] Dark Silicon and the End of Multicore Scaling (ftp://ftp.cs.utexas.edu/pub/dburger/papers/ISCA11.pdf)


Oh, if transistors became free we could realize a huge boost in performance. We could basically take the cream of the crop and use only 9-sigma parts. (The range of part quality is pretty wide; cost/rarity is the challenge.)

We could also abandon yield-oriented design constraints, and a few other things.

Now this probably isn't something that would keep doubling every year, but there would be an initial large bump.
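For a sense of scale on the "9-sigma parts" idea, here's a rough order-statistics sketch in Python. It assumes part quality is roughly normally distributed, which real speed binning certainly isn't, so treat the numbers as purely illustrative:

    import math

    # If quality were roughly normal, the best of N parts sits near
    # sqrt(2 * ln N) standard deviations above the mean.
    # Illustrative only; real binning is much messier than this.
    for n in (1e6, 1e9, 1e12, 1e14):
        sigma = math.sqrt(2 * math.log(n))
        print(f"best of {n:.0e} parts: roughly {sigma:.1f} sigma above the mean")

With ~100 trillion parts to pick from, that lands around 8 sigma, which is at least in the neighborhood of the 9-sigma figure above.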


Not so fast! I must object to the loophole you claim to have found.

I said the wafers would be free; testing ~100 trillion of them to find the best costs extra :)

I never even thought of that angle - do you know how much better such a part would realistically be? I thought binning was mostly a matter of defect reduction (where some defects are slight). But at that extreme level, it could be a different game altogether.


People keep demonstrating 1THz transistors in the lab. So I'd say such a part could get up to around a 100GHz clock (a few switchings per pipeline stage) with our usual architectures.

And it would use an insane amount of power.


Thanks for linking to that paper. It was interesting reading, especially since it makes predictions for 32nm and 22nm nodes. I wanted to judge the accuracy of those predictions, but I think I found an issue with their methodology.

Figure 2(b) shows SPEC CPU2006 scores for then-current 45nm CPUs. For example, the Xeon X3440 is a Nehalem microarchitecture quad-core CPU with a 95W TDP.[1] It seems to benchmark around 30 points.[2] That means it's the lower-left of the two gray triangles in figure 2(b). The upper-right triangle is an Extreme Edition i7. Figures 2(e) and 2(f) show ITRS and conservative predictions for then-future feature sizes.

The SPEC CPU2006 benchmark is single-threaded, making it hard to calculate performance-per-watt for multi-core CPUs. The researchers knew this. In the paper, they mention budgeting power for caches, then they drop this bombshell:

> In the case of a multicore CPU, the remainder of the chip power budget is divided by the number of cores, resulting in the power budget allocated to a single core.

That's a mistake. In a single-core benchmark, one core will run faster than base frequency, burning much more than its typical fraction of TDP. Some CPUs will run a single core at twice the base frequency.

To pick a ridiculous example: the Xeon E5-2695 v2 has 12 cores and a 115W TDP.[3] It gets 55 points in the SPEC benchmark.[4] After doing the math for subtracting caches and dividing the remaining TDP by the number of cores, the performance per core is even better than the ITRS estimates for 22nm.
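To make the critique concrete, here's a back-of-envelope version of that per-core power-budget arithmetic in Python, using the E5-2695 v2 numbers above. The cache power figure is a placeholder I made up, not a number from the paper:

    # TDP and core count from Intel ARK; SPEC score from spec.org.
    tdp_watts = 115.0
    num_cores = 12
    cache_power_watts = 15.0   # placeholder; the paper budgets cache power separately

    # The paper's approach: split the remaining budget evenly across cores.
    per_core_budget = (tdp_watts - cache_power_watts) / num_cores

    # SPEC CPU2006 (speed) is effectively single-threaded, so the benchmarked
    # core can turbo well past this "fair share" of the TDP.
    spec_score = 55.0
    print(f"assumed per-core budget: {per_core_budget:.1f} W")
    print(f"implied perf/W: {spec_score / per_core_budget:.1f} SPEC points per watt")

However you pick the cache number, dividing by 12 makes the single busy core look implausibly frugal, which is exactly the problem.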

Comparing their predictions with modern quad-core CPUs, it does look like the numbers are closer to their conservative estimates than the ITRS's. Still, that oversight is pretty bad, and it makes me wonder what other mistakes are in the paper.

1. http://ark.intel.com/products/42928/Intel-Xeon-Processor-X34...

2. http://www.spec.org/cgi-bin/osgresults?conf=cpu2006&op=fetch...

3. http://ark.intel.com/products/75281/Intel-Xeon-Processor-E5-...

4. http://spec.org/cgi-bin/osgresults?conf=cpu2006&op=fetch&fie...


How much of "the tasks we're doing" is repetitive? Someone in the 4TB SSD discussion said MicroSD cards at the moment could fit 50TB of storage in a standard laptop hard drive form factor - if anyone would pay for it.

If transistors became free, how much could they be used for not-computation/storage? How much would your cheaper "(but not faster)" video card be able to get a speed boost by having terabytes of pre-computed things in cache?

e.g. the difference between brute force password cracking and rainbow tables, or basic arithmetic and log tables.
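To make the precomputation idea concrete, here's a toy time/memory trade in Python. Real rainbow tables compress the table with hash chains; this is just the naive lookup-table version of the same trade, with a made-up 4-digit keyspace:

    import hashlib

    passwords = [f"{i:04d}" for i in range(10_000)]   # tiny toy keyspace

    def crack_by_search(target_hash):
        # "Brute force": recompute every hash on every query.
        for p in passwords:
            if hashlib.sha256(p.encode()).hexdigest() == target_hash:
                return p
        return None

    # Spend memory once so every later query becomes a single lookup.
    table = {hashlib.sha256(p.encode()).hexdigest(): p for p in passwords}

    def crack_by_lookup(target_hash):
        return table.get(target_hash)

    target = hashlib.sha256(b"1234").hexdigest()
    assert crack_by_search(target) == crack_by_lookup(target) == "1234"

The question above is essentially how far that trade can be pushed when the memory side of it is nearly free.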


I wouldn't underestimate our ability to through raw power at a problem. Imagine if we could give each process its own CPU/RAM (with some mechanism for accessing a segment of another process's RAM if given permission). Not only would this be good for security, responsiveness, and general parallelism, but it would also reduce a lot of the complexity we have in emulating concurrency in software.

Of course this also assumes that power is free.


Did you mean to throw raw power at a problem?


Dark silicon or even CPUs in general are orthogonal to Moore's law.

There are other domains that can eat up as many and as small transistors as you can throw at them. FPGAs are a great example.

New fabrication technology will allow for new paradigms that we perhaps cannot even sufficiently predict right now.


I have, perhaps, a different perspective. I have been spending a lot of my income on computer hardware and power for the last decade or so, and was involved in spending other people's money on the same for another half-decade before that.

My take? When Intel thinks they are ahead, compute doesn't get cheaper.

In the DDR2 days, if you were on Intel, you had the choice between the stunningly inefficient and expensive Rambus RAM, or a stunningly shitty memory bus with (not very many) low-power DDR2 modules.

At the time, the AMD HyperTransport system was absolutely beautiful. Even on cheap boards, you could get more than 2x the low-power DDR2 RAM modules per CPU that Intel could. (At the time, lower-density modules were dramatically cheaper, per gigabyte, than higher-density modules.) It worked way better when you had multiple CPUs, too.

Then DDR3 came, and Intel came up with their QPI systems, which were awesome. AMD came back with a competently built DDR3 platform, too; the G34 systems were a huge upgrade from the MCP55-chipset Socket F platform.

But the benchmarks came out in Intel's favor, even when AMD had twice the cores. I mean, you could argue that the AMD systems had advantages in some limited situations, but they had lost the dramatic advantage.

As far as I can tell, Intel has been largely resting on their laurels, price-wise. The E5-2620 is better than, but really not radically better than, the E5520. Now, some of the higher-end E5s are pretty nice, but they are priced accordingly.

Until Intel gets some real competition again, we have to pay for our performance gains.

So yeah, really, until AMD gets their legs back under them, and I hope the A1100[1] will do it, I don't expect dramatic performance per dollar gains from Intel.

[1]http://www.amd.com/en-us/press-releases/Pages/amd-to-acceler...


I agree. HyperTransport was stunning when it came out. Revolutionary, even. Ditto for AMD64, still the standard as we speak.

I'm not sure, however, if this is due to Intel resting on their laurels vs an entire Intel generation being shown the door because of epic (excuse the pun) fuckups. With the (largely wasted) effort expended on Itanium / Itanium 2 / EPIC / IXP / Netburst / etc, no wonder other vendors excelled. The MHz wars took a horrible toll on Intel for mainstream x86 with things like Prescott, with its 31 stage (!!) pipeline.

Stalls on Prescott were horrific for performance. On IXP, microcode screwups (often due to explicit caching) were horrific for customers. On Itanium, everything was horrific. I doubt we will ever know exactly how much these escapades cost humanity. On the other hand, maybe we're all richer for lessons learned.

It seems Intel is not interested in screwing up so badly anymore, so I think it's the competitors' turn to sweat. Intel still has a long way to go in recapturing territory it could have already had; ARM and MIPS have come a long way in the phone/server and NPU/packet processing space respectively, and they don't look as easily dislodged as AMD...


It's even worse than you enumerate: echoing lsc and his mention of "stunningly inefficient and expensive Rambus RAM", the highest-level architects at Intel were petrified by concerns over DRAM (size, I think it was), and ordered some stunningly stupid things that, at least in some cases, the engineers under them knew wouldn't work. Intel had not one but two *1 million part recalls*, one of which was for motherboards just before OEMs were going to start shipping.

And AMD, which only occasionally manages this, did everything right for a short period of time with their K8 microarchitecture (P6 style, 64 bits, HyperTransport plus on-chip memory controller) while Intel was screwing up so much.

I wonder how history would have gone if they hadn't then taken 2.5 years to start delivering the successor K10 microarchitecture, and another half a year to deliver one that didn't have a screwed up TLB. Intel is not the sort of adversary you can just give three years to get its act together, especially with their historical manufacturing prowess keeping them at least a process node ahead of you (and pretty much everyone else?).


It's always been economics. EUV already works. E-beam lithography already works. Carbon nanotube transistors already work. III-V transistors already work. It's just that none of these technologies work as cheaply as double patterned silicon.


The fact that we can't manufacture transistors more cheaply doesn't mean we won't see big jumps in compute value. We could see improvements from FPGAs (500x max), approximate computing (100x-1000x), packaging (3x-10x), or the radical option of analog computing (6-9 orders of magnitude for brain simulation, etc.).

Also, there's still a huge amount of value to unlock from software, interfaces, and the rest. Just look at the iPhone, where most of the value comes from a great touch interface and an app store. Moore's law is just a bonus.


500x performance improvement from using FPGAs? Where do you get this number?


ASICs can max out at about 1000x, and Zvi Orbach has a recent column talking about how to make an FPGA with 50% of the efficiency of an ASIC.

http://www.eetimes.com/author.asp?doc_id=1322021


I must have misunderstood; I thought you were saying FPGAs will surpass CPUs 500-fold.


And it will be economics that breaks through the cost challenges. If you can do something that nobody else can, that's a nice spot to be in; you can even license out the actual implementation.

There are companies like IBM that take a more holistic approach to R&D. I'd expect someone like them to prevail in adopting new techniques, or even a paradigm shift.


Could these eventually be as cheap as double patterned silicon?


Yes, potentially. There are strong research efforts underway for all of them.


We still have an order of magnitude or two of computing power that we can squeeze out iff semiconductor density/price scaling stops.

We don't implement many optimization possibilities in each hw/sw layer because the "layer below" keeps changing all the time and we need to keep compatibility. Once we could say "this is it, this layer is as good as it will ever get", then (and not a day before) you can start to re-architect everything above it to maximize performance by throwing away flexibility that won't be needed anymore.

E.g., instead of transistors being spent on translating x86 to the underlying microcode, and cycles being spent on translating JVM/CLR bytecode (or JavaScript) to x86, we'd be able to define a single standard and adapt both processors and compilers to that. You can't break compatibility at every technological change. But if you have a reason to believe that things will finally be stable (which hasn't ever happened yet), then it does make sense to make a single final switch that disregards all compatibility and legacy issues - even if the benefits are small, they accumulate for each such layer and you only have to do it once.


I've been hearing about the end of Moore's Law ever since I was a kid. I remember smart people citing wavelengths and economics, convincing me that there was no way we'd go below 300nm. Oh yes, and that would mean the end of x86 as well (since Intel wouldn't have the process advantage to compensate for the less efficient CISC).

Well, maybe they are right this time, who knows, but I'm way more skeptical.


Moore's Law broke a while ago. Note the past tense.

Nvidia couldn't make their economics work at 28nm. RAM cells are now more expensive at 20nm than at 28nm. There are lots of other examples.

Remember: the strong form of Moore's law was that transistors get cheaper every 12-18 months. The original formulation of Moore's law was always about economics, not technology.

The problem is: nobody knows what to do.

You now need clever circuit design and humans who can squeeze another 15-20% out of existing technologies. Unfortunately, all the humans who knew how to do custom VLSI design are dead, retired, or doing something else, since custom VLSI was a fool's errand for so long (see DEC, IBM, Silicon Graphics, etc.).


> You now need clever circuit design and humans who can squeeze another 15-20% out of existing technologies. Unfortunately, all the humans who knew how to do custom VLSI design are dead, retired, or doing something else, since custom VLSI was a fool's errand for so long (see DEC, IBM, Silicon Graphics, etc.).

Broadwell is a custom-designed part, and it's scaled extremely well from 22nm to 14nm. There have been yield issues going to 14nm, but the biggest hurdles have been solved, with a lot of help from the design team. I guess Intel must have hired all the people you think are dead or retired.


It really helps to specify what you mean by custom. There is hand-built vs. tool-built, there is full-custom vs. semi-custom, etc...

I for one doubt Broadwell is hand-built, hand-routed, full-custom. Which is what your parent was probably talking about, as they mention DEC.


> Unfortunately, all the humans who knew how to do custom VLSI design are dead, retired, or doing something else

Well, sort of. The ol' DEC guys are still out there, many at semiconductor companies.

Anyway, design is not the biggest lever right now, and it stabilizes last. When architecture & process slow down, design will probably see increased focus, as you finally have more time for squeezing out 20% extra speed - and without everything being tossed the next go-round.


I'm not sure you can extrapolate that from one company's use of one foundry's process nodes. Intel at least seems to be doing well with their 22 nm process node.


Yep, they seem to be doing well, but they are almost an entire doubling time later if you compare the two previous doublings with Moore's law.


This might in part be explained by their moving to nonplanar FinFET transistors. And in practice their geometry doesn't quite match the idealized drawings: http://www.eetimes.com/document.asp?doc_id=1261761


Who knows? How can anybody tell if Intel's 22nm is more cost-effective than 28nm when it isn't sold on the market as a fab service?


The difference this time is that we're already there. Sure, the only physical limit to computing I'm willing to grant, given current research, is Bremermann's limit [1], but that doesn't mean we'll ever reach it.

Our brain is awesome with merely a few tens of billions of elements. We can surely do better than what we have with CPUs with a few billion transistors -- the problem must be that we don't have a clue what to do with them. A hint of that is that even the most powerful supercomputers, with hundreds of trillions of transistors, still can't do better than our brain at some tasks.

[1] http://en.wikipedia.org/wiki/Bremermann%27s_limit


A single transistor can't do the job of a neuron and 500 synapses.


A couple of transistors may be enough to do the job of a synapse - and they would do that "job" many orders of magnitude faster.

A modern CPU has some 100,000 times fewer transistors than we have synapses, but its effective frequency is at least 1,000,000 times faster than synapse firing rates. So on a "volume" ratio our CPUs are already more powerful than our brains; we just don't know how to wire those transistors properly.

It may well be that the power of such a neural network is nonlinear in its size, and a tenfold smaller/tenfold faster network is much weaker. But we do already have supercomputers with a similar number of transistors as our brains have synapses, while still having the millionfold speed advantage.

Our limit is in our [lack of] ability to build brains. If we knew how, then our current silicon tech would be enough to run ubiquitous stronger-than-human AIs - but we don't; and "bruteforcing" a brain emulation without a better understanding is so inefficient that it requires currently unfeasible resources.
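Putting rough numbers on that ratio argument, all of them order-of-magnitude guesses rather than measurements:

    synapses        = 1e14   # ~100 trillion synapses in a human brain
    synapse_rate_hz = 1e3    # synaptic events per second, generously
    cpu_transistors = 1e9    # a modern CPU, order of magnitude
    cpu_clock_hz    = 3e9    # ~3 GHz

    brain_events = synapses * synapse_rate_hz        # ~1e17 events/s
    cpu_switches = cpu_transistors * cpu_clock_hz    # ~3e18 switchings/s

    print(f"brain: ~{brain_events:.0e} synaptic events/s")
    print(f"cpu:   ~{cpu_switches:.0e} transistor switchings/s")
    print(f"raw 'volume' ratio: ~{cpu_switches / brain_events:.0f}x for the CPU")

This is only about raw switching "volume"; it says nothing about whether a transistor switching is anywhere near as useful as a synaptic event, which is the whole point of the wiring problem.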


Synapses turned out to be more complicated than we thought. Single cell organisms did all of their sensing and data processing through their membranes. Guess what synapses evolved from?


You're right that people cried wolf before and were wrong. However, that doesn't mean they're wrong this time too. There are many good reasons to think that Moore's Law is ending. Next-gen lithography is just getting too expensive for scaling to be economically feasible, let alone technically feasible.


This is an economically invalid argument without more detail. The only way a lithography process can be too expensive is if it has an extremely skewed lag time (i.e. 6 months prep per line, with low throughput) or if the consumables of the process are very expensive or very wasted.

None of the possible future processes has any of these constraints (with the exception of perhaps E-beam lithography, but it's not a hard constraint).

If you can build a continuous fabrication line with a process, then no matter how much it costs, the capital investment eventually diminishes to a tiny fraction of the cost of each chip, compared to consumables.

Current lithography processes were "too expensive" previously. But they've been running continuously since they were built, and so are again a tiny fraction of the cost of an actual processor today.
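A toy amortization sketch of that argument, with made-up figures for the capital cost and throughput of a line:

    capital_cost         = 5e9       # e.g. a ~$5B fab build-out (illustrative)
    wafers_per_year      = 500_000   # sustained wafer starts per year
    chips_per_wafer      = 400
    consumables_per_chip = 8.0       # materials, masks, energy, etc. per chip

    for years in (1, 3, 10):
        chips = wafers_per_year * chips_per_wafer * years
        capital_per_chip = capital_cost / chips
        print(f"after {years:2d} years: ${capital_per_chip:6.2f} capital/chip "
              f"vs ${consumables_per_chip:.2f} consumables/chip")

Whatever the real numbers are, the capital term shrinks with every year the line keeps running, while the consumables term doesn't.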


Another factor to consider is discounting. Even if a technology has zero marginal cost, it's not necessarily profitable when you take opportunity cost and discounting into account.


Moore's Law is already mostly dead (for Intel). Look how little they add in performance with each new generation. At best, they get some significant gains for ultra-specific instructions by adding accelerators to their chips (say for computational photography or whatever), but that's something everyone else can do, too, and at that point we're not talking about CPU performance anyway.


One thing I can predict with complete confidence is that we have no clue what the path forward for improving computer processing power will be from 2022. It's fun to speculate but that's a long time for new technologies, approaches, and architectures to take hold. Intel isn't the only company with an interest in improving the state of the art here.


I think maybe we might have been mis-reading Moore's Law for years, or rather, Moore himself might have mis-stated it.

The radically changing reality that it describes is the number of transistors a person could economically employ at any one time to perform work on their behalf.

So yes, physics and economics might provide us with a limit (or an increasing slope of difficulty) for the construction of single chips, but the practical effects of the Law continue unabated, at least as far as I can see. The average person continues to be able to employ more and more electronics to perform work for them. This is increasing geometrically.

I'm not trying to dismiss either this DARPA guy or Moore, just point out that the specific details of Moore's Law may not be as important as we make them out to be.

In my mind, the big obstacle we have now to continued growth is small-system, imperative thinking. Systems of the future will be massively parallel. I have no idea how long it will take the IT industry to truly transition, but that's the next big hurdle, not counting atoms inside a switch.


Yeah, I remember reading that D-Wave could tackle classical computing problems at 20 petaflops. All of Google today is 20 petaflops, across many cartoon-colored datacenters.

If you can produce small numbers of expensive computers that are more efficient than millions of cheap computers cobbled together, you are still advancing how much computational power each person has access to.

But a significant problem with this approach is most real time video games aren't practical when the computer is far away.


Latency isn't the only issue - do we want to be building a world where people are constantly trusting third parties to process their data privately and correctly?

Certainly there are also cryptographic workarounds to those problems but their overheads are so high as to make the outsourced computation somewhat pointless.


Except that, for most classes of problems, creating a software solution that uses a massively parallel system efficiently is much more difficult - and thus more expensive. This may be the real economic limit to using more transistors.



Worst formatting I've ever seen. Anyway:

Colwell said that for the Defense Department, he uses the year 2020 and 7 nanometers as the "last process technology node." But he adds, "In reality, I expect the industry to do whatever heavy lifting is needed to push to 5nm, even if 5nm doesn't offer much advantage over 7, and that moves the earliest end to 2022. I think the end comes right around those nodes."

Let's assume that in 10 years time everyone agrees we have hit the wall on how small we can go. What then? Is there any reason to believe that popular architectures of today are necessarily optimal? I'm curious to hear people's ideas about what we do instead of shrinking dies.

My personal guess is that we move towards massively parallel systems with large numbers of low-power cores, abandon bare-metal programming completely, and work on developing smarter and smarter compilers to take advantage of parallelism. My personal hope is that we find some sort of cold optical switching technology that lets us build ridiculously fast computers that look like glowing crystal cubes. Of course, I have no idea how that would work, or I'd be out pitching it XD


Have you heard of photonic crystals?

http://optoelectronics.eecs.berkeley.edu/eliy_SCIAM.pdf

But I think that since the features of the materials are on the same order of magnitude as the wavelengths of the light you are trying to switch, this actually makes things bigger. But still really cool.


I had heard of it, and I know there has been some progress with optoelectronics recently, but I hadn't looked into it in this detail - thanks. Most of his recent work seems focused (groan) on things like solar cell efficiency and suchlike rather than my fanciful goal of a completely optical processor, but I need to read more.

Maybe I should ask him to lunch, since I live within walking distance of UC Berkeley and don't really take advantage of it :-/


There are several other structures that behave like transistors; there is no reason to believe that silicon is the optimum computer building material.


Yeah, I'm sure someone had similar doom and gloom to say about vacuum tubes.

The thing is, you just don't know where the next technological revolution will come from. Yes, we've about hit the limit for silicon, and we may very well stagnate for another decade because of it, but there's always going to be someone scrappy enough to try what the big slow incumbents won't.


There are gains yet to be realized besides reducing the size of the process. For example, reversible (a.k.a. isentropic or adiabatic) computing offers a way to reduce heat generation, which might combine with 3D construction in interesting ways. New ways of designing chips might allow progress to continue, but they're hard and risky. They're not terribly attractive as long as shrinking the process offers predictable advances and remains economically feasible.

Still, it's worth taking a moment to appreciate how crazy just getting under 10 nm is. The wavelength of light that is visible to the human eye starts at around 380 nm. Looking at a 10 nm chip with violet light would be like trying to navigate your house by sonar using a sub-woofer as your emitter!


One thing that seems to be glossed over in his argument is the impending move to 450mm wafer technology. That should allow Intel and others to continue shrinking at reasonable cost.


> it's time to start planning for the end of Moore's Law

Do we? All current software runs on current hardware. Even if in the next ten years we only get another 1% increase in speed, current software will still run at the current speed (something we are all perfectly fine with right now). I think it is indeed a doom-and-gloom article.


I want it. Python is 100x slower than C. I want a language that doesn't need to worry about performance. I want computers optimized for humans, not the other way around. I want computers that can model the weather and the ocean. I want search algorithms that optimize car engines and electric plants. A million x isn't nearly good enough.


Smaller devices will provide the economic drive, just as they did back when 14" HDDs had enough performance.


Either this article or the guy's speech, or perhaps both, are pretty lame. No supporting facts are presented as to why he thinks the 7nm generation will be the last frontier. He just basically states the obvious: if there is no profit motive or the technology isn't there, then it won't happen. No kidding.

The whole article reads like an appeal to authority.

Are there any factual reasons to believe this time is actually different than the past 35 years?


> The whole article reads like an appeal to authority.

You might be right. But Colwell isn't just a country rube who hitched a ride into town on the turnip truck. He's been there, done that, literally made billions of dollars in profit for Intel. I reiterate:

   Bob was the chief architect for a number of design
   teams who, *LITERALLY*, no exaggeration, generated
   billions of dollars in profit for Intel.
You might have heard of a few of the products he architected. [1]

"Authority" is frequently wrong. But this guy certainly deserves a little more respect than your flippant dismissal.

[1] http://en.wikipedia.org/wiki/Bob_Colwell


Interesting. I wonder why the article doesn't say that; it would lend a lot more credibility to it. It just talks about his position with DARPA.


> No supporting facts are presented as to why he thinks the 7nm generation will be the last frontier.

I don't know about his reasons, but charge carriers in doped silicon have orbitals that are typically around 10nm in diameter. (They vary with dopant density, temperature, etc., and don't have clearly determined boundaries, thus there is no hard number.)


For decades I have heard that 11nm would be the end of Moore's law as we know it. This is proving to be true. CPUs have been frozen at 1 to 4GHz for nearly a decade, with the latest advances going to power savings. Currently at 22nm, the next step is 14nm, and then 11nm. Perhaps they can eke out one more jump to 7nm or 5nm, but I expect that will barely be worth the effort and thus will drag out for at least a decade itself.

But Moore fans need not worry. There is a clear next step, and I am wondering why no one is talking about it: Optical Interconnects. Connecting chips and circuits within chips with optical channels should allow plenty of room for speeding up processors and reducing power requirements well into the mid-21st century.


Clock speed is not relevant. In that "decade" our single-thread performance has increased a fair amount, but we now have an additional 4-8 CPUs in the same amount of space.

An Ivy Bridge will very handily beat a Northwood-based system even in single-threaded performance.


Don't get me wrong, I love having all of my 12 cores, but I don't really count that as improved speed. More (no pun intended) importantly, the rule was every 18 months if I'm not mistaken, not 10 years. Let's double it and add a little bit: 3 years ago you pretty much could have bought the same machine.

In the last 3 years, processors have been about 3.4GHz; 3 years before that it was still around 3.1ish. I personally haven't noticed any improved performance on single threads. They pretty much do things in the same completion time.

I have a 12-core 4.5GHz machine (3.4GHz, overclocked), and I ran a compute-intensive process on it and on a 2010 machine that was 8 cores at 3.4GHz; they finished within about 5 minutes of each other.

I'm not saying all things are equal or that things haven't improved, just that, 4 years on, I would have expected a lot more from something that was supposed to double (and get cheaper) every 18 months.



