Why hardware flaws are so bad

Readers' comments

Reader comments are listed below. Commenting on this article has now closed.

tgmoog

I am far from an expert, but from my basic electronics tech courses I understand that chips are built from semiconductors; transistors are still the underlying elements, and semiconductors are simply better at switching and control. It brings back fond memories of messing around with a Motorola chip around 1996, which taught me how much one can do with a single chip, a simple BASIC program, a cheap computer and some knowledge of converting written specs into machine code, or simple ON-OFF switching. With such a switch I could control variable-speed drives on pumps and fans and fill a tank as fast as possible to the maximum with no spillage. A chip like that takes time to design and test, so if there is some way to manipulate it, that is quite understandable. Work through the calculations needed just to send signals through a single transistor, let alone a whole chip, and at least for me the system becomes understandable. Now that the flaws have been found, the hardware will be fixed to stop Meltdown and Spectre. No, fixing the hardware will not mean recalling everybody's computer, which would be rather silly, but buying new hardware, just like Y2K. Yes, it will be expensive, just as Y2K was, but it will get done, the economy will be boosted and many will benefit. As far as Spectre is concerned: where was James Bond when you needed him?

Chris Jankowski

The problems demonstrated by the Meltdown and Spectre security exploits are real and the root cause is in a deficient hardware implementation of a technique known as speculative execution. This is indisputable and well documented.
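
For readers curious what the flaw actually looks like, the published Spectre variant 1 proof of concept centres on a tiny code pattern like this (a sketch in C following the identifiers in the Kocher et al. paper; the multiplier just spreads accesses across cache lines):

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];            /* attacker-chosen index reads beyond this    */
    unsigned int array1_size = 16;
    uint8_t array2[256 * 512];     /* probe array: one cache line per byte value */
    volatile uint8_t temp;         /* sink, so the compiler keeps the access     */

    /* Architecturally, an out-of-range x is rejected by the bounds check.
       Speculatively, the CPU may predict the branch taken, read the secret
       byte at array1[x], and use it to index array2, leaving a cache
       footprint that a later timing measurement can recover. */
    void victim_function(size_t x) {
        if (x < array1_size)
            temp &= array2[array1[x] * 512];
    }

The code itself is perfectly correct; the leak happens entirely in the hardware's speculative machinery, which is why this is a hardware problem and not a software bug.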

However, the practical repercussions will not be as severe as some of the alarmist articles in the press pronounce.

First of all, the problems belong to the non-functional category. This means that all the calculations the computers do are still deterministic and do exactly what the software tells them to do. We can still calculate the orbits of planets and the balances of bank accounts exactly. Functionally, all these computers and phones are working as they should.

The problems, however, create security risks of unauthorised access to information. Like most data-security risks, they can be mitigated in a variety of ways. Certainly, at least the Meltdown exploit can be disabled by a software change in the operating system; in fact, the first versions of these patches are already being rolled out by OS vendors. A similar approach can be taken against the Spectre exploit. And one can always tighten access to the compute resources.

Eventually, the root cause of the problems will be eliminated in future processors as they are designed, tested and enter production. Of course, it will take a long time before the old computers are replaced by new ones; it will be about ten years before the great majority of today's computers, phones, etc. are replaced by new designs. This will happen naturally. We certainly do not need to replace the current systems just because they are open to these exploits.

Oreg

One niggle:
"Because chips are physical objects, testing them is comparatively slow and difficult."
On the contrary, verifying physical chips is much faster than verifying their software representation. Unfortunately, producing hardware is very expensive and physical chips cannot be modified to fix bugs. That's why vendors try to catch as many bugs as possible before producing silicon, using virtual models.

blue asgard in reply to Oreg

Uncommitted logic: a logic array whose function is not fixed at manufacture, in which there are fusible links of one kind or another. These days people more usually use FPGAs, a particular form of uncommitted logic.
.
If you are saying that any logic system (hardware or software) which is capable of reconstructing its internal state indefinitely can enumerate all possible internal states, then I cannot agree. On that basis you could tell whether an arbitrary Turing machine, when presented with an arbitrary tape, would or would not terminate. The TM has limited registers (basically recording state and the limited actions which can be taken for a given combination of state and input), and there is that fundamental result, the Turing Machine Halting Problem (related to Gödel's theorem), which says that this is impossible. You really cannot decide in all cases whether the TM will halt or not, any more than you can predict the outcome of providing arbitrary inputs to any logic structure.
.
Logic is sometimes defined using a functional language, which has no concept of state. By definition this is a subset of all possible logic structures, but at least logic so defined is 'safe' and provably correct. At least until you try to put it onto a chip using conventional structures. You then face a placement problem, for which there are no complete automated solutions, so it is done by hand (all manner of partial solutions are available, but they nearly always waste too much 'silicon real estate'). When you do that you step outside the proven logic structure, and then there are no guarantees (you've just thrown them away). This is the territory in which the Halting Problem and Gödel lurk.

Oreg in reply to blue asgard

Maybe I'm missing something, but isn't Gödel's Theorem about natural numbers whereas computers are integer-based, even digital? In theory you can prove anything about them by complete enumeration or formal verification.

The problem is the complexity of the computer's logic, which makes complete verification infeasible in practice, as the article rightly points out. Hence, only small subsets of the logic are formally verified, whereas the rest is just probed with constrained-random approaches. That's what makes escapes (undetected bugs) possible. That and the human factor.
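
For what it's worth, the constrained-random idea fits in a few lines. A minimal sketch in C, with an invented saturating adder standing in for both the design under test and its reference model:

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy design under test: a saturating 4-bit adder. */
    static unsigned dut_add_sat4(unsigned a, unsigned b) {
        unsigned s = a + b;
        return s > 15 ? 15 : s;
    }

    /* Independent reference model the DUT is checked against. */
    static unsigned ref_add_sat4(unsigned a, unsigned b) {
        return a + b > 15 ? 15 : a + b;
    }

    int main(void) {
        srand(1);  /* fixed seed so a failing run can be reproduced */
        for (int i = 0; i < 100000; i++) {
            /* Constraint: drive only legal 4-bit operands. */
            unsigned a = rand() % 16, b = rand() % 16;
            if (dut_add_sat4(a, b) != ref_add_sat4(a, b)) {
                printf("mismatch: a=%u b=%u\n", a, b);
                return 1;
            }
        }
        puts("no mismatch in 100000 trials (evidence, not proof)");
        return 0;
    }

Unlike exhaustive enumeration, a clean run here is only statistical evidence; a bug hiding in an unprobed corner is exactly the kind of escape I mean.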

blue asgard in reply to Oreg

If you look at my reply to guest-seealmn you will see that it is about hardware. We are used to chips which provide fixed functions, but it is quite possible to have self-modifying logic, for example using memristors, which for the most part aren't deployed yet.
.
The point of my original post was that there are fundamental limit theorems which say that we can never guarantee that hardware will function as specified: that the logic responds the way it is supposed to, and only the way it is supposed to, no matter what inputs it is provided with. In short, you cannot guarantee completeness and consistency at the same time; nor can you guarantee, even if the logic responds the way it is supposed to for the inputs it is anticipated to see, that it isn't presented with unexpected inputs, or that it responds correctly to those unexpected inputs. You cannot enumerate these possibilities if they are infinite (i.e. the logic has memory and the capability of looping). The hardware/software distinction is also a red herring, because the hardware provided might be uncommitted logic which software then comes along and commits. Where is the barrier? The reasoning is the same.

Oreg in reply to blue asgard

Now you're talking about software but the article is about hardware where your points don't apply. It is very finite and not self-evolving. Again, any hardware truth is provable by complete enumeration.

Oreg in reply to Oreg

Hm, I didn't get that quite right. I think the point is that the range of natural numbers goes to infinity whereas the range represented in a computer is severely limited with a fixed upper bound. That seems a better guess at why Gödel doesn't apply here. The rest of my original post still stands.

Oreg in reply to blue asgard

Storage is an interesting point. In a processor core, storage is rather limited, basically registers. The possible values of a register are finite and can be enumerated, resulting in finite register values at the other end of a logic cone. There are no loops between two registers. Tiling verification like this is a common approach in practice (as is verifying behavior for unexpected inputs). With memory and looping thus eliminated, Gödel doesn't apply.
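.
To make the tiling idea concrete, here is a toy sketch in C (the two-bit adder and its spec are invented for illustration): the gate-level function is the "logic cone" between registers, and because the registers feeding it are finite, every input case can simply be enumerated.

    #include <stdio.h>

    /* Gate-level 2-bit adder built from XOR/AND/OR primitives:
       the "logic cone" between input and output registers.     */
    static unsigned gate_level_add2(unsigned a, unsigned b) {
        unsigned a0 = a & 1, a1 = (a >> 1) & 1;
        unsigned b0 = b & 1, b1 = (b >> 1) & 1;
        unsigned s0 = a0 ^ b0;
        unsigned c0 = a0 & b0;
        unsigned s1 = a1 ^ b1 ^ c0;
        unsigned c1 = (a1 & b1) | (c0 & (a1 ^ b1));
        return (c1 << 2) | (s1 << 1) | s0;
    }

    int main(void) {
        /* Registers feeding the cone are finite: enumerate all 4 x 4 values. */
        for (unsigned a = 0; a < 4; a++)
            for (unsigned b = 0; b < 4; b++)
                if (gate_level_add2(a, b) != a + b) {
                    printf("bug at a=%u b=%u\n", a, b);
                    return 1;
                }
        puts("cone verified exhaustively: all 16 input cases match the spec");
        return 0;
    }

Real cones have far more inputs, which is where tractability bites, but the finiteness of the problem is the same.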
.
What do you mean by "uncommitted logic"?

Oreg in reply to blue asgard

To qualify as correct, a processor only needs to guarantee to execute all programs perfectly; it does not have to (and cannot) guarantee any properties of those programs. The Turing argument is about proving properties of the program, not properties of the Turing machine. Placement doesn't change the logic or any fundamental property and is therefore orthogonal to functional correctness. None of this contradicts enumerating the states of a processor core.

(Interesting that I've never come across the term "uncommitted logic." I guess it's what the community calls "configurable logic" these days.)

blue asgard

There are deeper issues here than the article suggests. While the identified problems might indeed be fixed in future generations of chips there is no guarantee that the fixes won't expose new vulnerabilities or that there are no other independent vulnerabilities still to be found.
.
The present situation is not so disastrous. Although Moore's law seems to be slowing down (the present hiatus in single-core speeds is actually due to heat-dissipation problems, not inherent gate switching speeds), there are still performance hikes waiting in the wings as ancillary technologies come on stream: the introduction of new materials into gate manufacture, the increase in intensity of extreme-UV lamps to usable levels, and so on. And there are new technologies beyond that, like electron-beam lithography (too slow, so too expensive today) which could give us even smaller, and so even faster, gates; 3D lithography; atomic-level gates; spintronics; and so on. So, sooner rather than later, we'll all be upgrading to chips which have higher performance and the exposed flaws fixed. In the meantime, software fixes which hobble chip performance will persuade us to upgrade sooner rather than later anyway. Now is not the time to sell Intel, AMD etc. shares if you have them, as business is now going to be better than it might otherwise have been.
.
However the real villain of the piece is Gödel's Theorem, and its computational handmaiden, the Turing Machine Halting Problem. Both are limit theorems and are presented as decision problems. It is possible to prove, in mathematics, that certain properties are undecidable. The Halting Problem says that there is no decision process which will tell you whether or not an arbitrary Turing machine will stop calculating when presented with an arbitrary input tape. (Turing machines, like computers, can loop round, and a TM can rewind or overwrite its input tape, generally behaving in a way which makes termination unlikely, but not certain, either way.) That doesn't mean you can never tell (often it's obvious), it just means you cannot always tell. There will always be a case which gets through any decision procedure.
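.
For the record, the standard diagonal argument fits in a few lines (my notation, not Turing's):

    % Suppose a total decider H exists, with H(P, x) = 1 iff program P halts on input x.
    % Define the diagonal program D, which feeds a program to itself:
    \[
      D(P) =
      \begin{cases}
        \text{loop forever} & \text{if } H(P, P) = 1,\\
        \text{halt}         & \text{if } H(P, P) = 0.
      \end{cases}
    \]
    % Now run D on itself: if H(D, D) = 1 then D(D) loops forever, so H was
    % wrong; if H(D, D) = 0 then D(D) halts, so H was wrong again.
    % No such H can exist.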
.
Gödel's Theorem is worse, so much so that mathematicians are still not clear about its full implications today, although the theorem was first presented back in 1931. In essence it says that two essential properties of an algebraic system (of which computer arithmetic and software are a proper subset), namely consistency and completeness, cannot both be demonstrated at the same time. If an algebra is provably consistent, its completeness is not provable, and vice versa. Consistency is defined in terms of decisions, i.e. 'Theorem T is true', written as just T. So a consistent algebra would be one for which, for all T deducible from the axioms and theorems of the algebraic system, "not (T and not T)" is always true. A complete algebra would be one in which all such T are deducible only from the axioms and theorems of the system. In short, you cannot guarantee that a given piece of computational hardware (and processors have finite-state machines at their heart, and FSMs are simplified Turing machines) will do what it is designed to do, and nothing else, particularly in unanticipated situations. That is true both in hardware and in software, and that, in essence, allows the bad guys a way in somewhere.
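.
Stated slightly more formally (my own notation, not Gödel's):

    % For a formal system S with provability relation $\vdash$:
    \[
      \text{Consistent: there is no sentence } T
        \text{ with } S \vdash T \text{ and } S \vdash \neg T.
    \]
    \[
      \text{Complete: for every sentence } T,\;
        \text{either } S \vdash T \text{ or } S \vdash \neg T.
    \]
    % Gödel (1931): any consistent, effectively axiomatised S strong enough
    % to express arithmetic is incomplete; some sentence G is neither
    % provable nor refutable in S.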
.
Well, evolution has faced such a situation before, notably in the constant duel between single-celled organisms and multi-celled organisms, like us for instance. We are plagued by bacteria and viruses (i.e. entities outside our own 'axiom system') and we have evolved sophisticated immune systems which can respond quickly enough to novel challenges. And novel challenges we meet practically every day, as a bacterial strain can evolve in as little as 24 hours while it takes us hundreds, if not thousands, of years to do so. We will need to engineer a similar strategy if we are going to hold, rather than stop, the ever-increasing waves of malware attacks which follow from creating the natural ecosystem for them to evolve and be passed on. And we will need to learn to live with it, which, up till now, we haven't.

blue asgard in reply to guest-seealmn

The essence of my point is that there are fundamental limitations preventing complete verification of functions either in hardware or software. We have a similar situation in nature in the battle between eukaryotes (creatures like us) and prokaryotes (bugs, not of the software kind) and the defence there is to have an adaptive system which can deal with unknown threats.
.
In computer-systems terms this becomes the engineering of reliable systems out of unreliable components, whether software or hardware. Imagine firing a sophisticated computer into deep space, where it is at risk from high-energy particles ploughing through it. All those NASA probes manage to survive years of bombardment, not to mention really cold temperatures, yet they keep going. Usually they work in terms of redundant components and the ability to reconfigure pathways around faulty ones. That assumes that the 'virtual' architecture is sound, so it isn't a solution on the ground. But if the virtual architecture were encoded somehow and repeatedly checked against performance, then the virtual architecture could be amended and any rerouting needed effected at the same time. The nearest you can get to that is an FPGA plus a software specification of function, but amending the virtual architecture in real time could be an interesting challenge. It would necessitate designing FPGAs so they could be updated in real time and then doing the update without screwing up all the signals in transit.
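.
One classic engineering form of that redundancy is triple modular redundancy (TMR): run three copies of the computation and take a majority vote, so a particle strike that corrupts one copy is simply outvoted. A toy sketch in C (the squaring function is just an invented stand-in for the real workload):

    #include <stdio.h>

    /* Three redundant copies of the same computation. In real flight
       hardware they run on separate circuits, so a single fault
       corrupts at most one of them.                                 */
    static int unit_a(int x) { return x * x; }
    static int unit_b(int x) { return x * x; }
    static int unit_c(int x) { return x * x; }

    /* Majority voter: any single faulty unit is outvoted by the other two. */
    static int vote(int a, int b, int c) {
        if (a == b || a == c) return a;
        return b;   /* a is the odd one out */
    }

    int main(void) {
        int x = 7;
        int r = vote(unit_a(x), unit_b(x), unit_c(x));
        printf("voted result: %d\n", r);   /* 49, even if one unit misbehaves */
        return 0;
    }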
.
If all our computers were like that then hardware verification would be easy: lots and lots of simple cells. But then you'd have to have layers of software to make it as convenient to use as modern computers, and that would have to be endlessly self-checking and self-correcting. I suspect it could be done with multi-level architectures in which each level was sufficiently cellular to be verifiable below the Gödel limit, but the solutions we have now are vulnerable to being eviscerated by the dreaded theorem.

guest-seealmn in reply to blue asgard

Hi, nice post. Could you elaborate, though, on where you see the negative practical/engineering implications of Gödel's theorem and the halting problem? According to my understanding, while software does from time to time contain unintended infinite loops (it does not halt), this appears not to be a big issue in modern multi-tasking operating systems (kill the task and start again).

blue asgard in reply to Oreg

Somehow my reply to you failed to post. I suggest reading my reply to guest-seealmn, it's sort-of relevant.
.
But finite vs. infinite arithmetic is just a red herring. You can simulate any precision you like if you are prepared to program it; all computers do is help you with what most people want. Natural numbers are just 1, 2, etc. anyway; both the natural numbers and the real numbers map onto infinite sets. The real issue is complexity. If it takes you a finite time to check, and in the end the only solution is to simulate, how long do you wait to see if the simulation is going to terminate? Imagine a self-evolving program. Will it evolve to the function you want, or is it doomed to pursue ever-more unlikely possible solutions? Gödel's Theorem says you can never tell.
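.
To make the precision point concrete, here is a toy sketch in C of simulating more precision than the hardware gives you: adding two decimal numbers of arbitrary length held as strings (big_add is an invented name, and error handling is omitted):

    #include <stdio.h>
    #include <string.h>

    /* Add two arbitrary-length decimal numbers held as strings. Nothing
       here depends on the machine word size: precision is limited only
       by memory, which is the whole point.                              */
    static void big_add(const char *a, const char *b, char *out) {
        int la = (int)strlen(a), lb = (int)strlen(b);
        int n = (la > lb ? la : lb) + 1;   /* room for a final carry */
        int carry = 0;
        out[n] = '\0';
        for (int i = 0; i < n; i++) {
            int da = i < la ? a[la - 1 - i] - '0' : 0;
            int db = i < lb ? b[lb - 1 - i] - '0' : 0;
            int s  = da + db + carry;
            out[n - 1 - i] = (char)('0' + s % 10);
            carry = s / 10;
        }
    }

    int main(void) {
        char out[64];
        big_add("99999999999999999999", "1", out);
        printf("%s\n", out);   /* prints 100000000000000000000 */
        return 0;
    }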

Zouf in reply to guest-aalljlwm

Intel is often compared to a law office with a small tech division tacked on the side... One can safely assume that they will not do a mass replacement. Their position so far is that operating-system fixes correct the flaw (which, by the way, is really more a vulnerability than an outright bug), that the performance penalties of these fixes are "load dependent" (i.e. if there's a large penalty it's your fault, not theirs), and that therefore everything is fine.
.
That said, everyone has been operating equipment (laptops, phones, servers, you name it) with these vulnerabilities for a few years, and the planet has not stopped turning. On the other hand, once you expose this sort of vulnerability, you create a challenge for hackers to develop an exploit.

guest-aalljlwm

How does Intel plan to replace the flawed chips? Or does it somehow plan to push the problem onto customers? This article leaves a worrying ambiguity as to whether electronics producers are responsible for the safety of their products, or whether they sell everything the way a street hawker sells apple pies: you bought it, so it is your problem now.

2MNnH4s4w5

Over three hours since posting, and the mods still can't get rid of idiotic rambling posts such as the ones from "guest-aalinine".

guest-aalleisl

On the subject of hacking, it puzzles me why law-enforcement agencies do not turn the tables on hackers and hunt them and their money.
Hackers have the same vulnerabilities as the institutions they attack, only bigger: they establish an internet connection to the attacked computer, which is traceable; they use hardware and software, which are themselves vulnerable and findable; and they must provide an account number to wire the stolen money to, which is by definition matchable to some financial institution in the real world.
The easiest way to catch hackers would be to create lots of traps: bogus bank accounts, bogus company emails and bogus internet servers, and then catch anyone hacking them.
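
The trap idea is what security people call a honeypot, and the simplest possible version is tiny. A toy sketch in C: listen on a decoy port and log whoever connects (port 2222 is an arbitrary choice, and error checking is omitted for brevity):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Listen on a decoy port that no legitimate user has any reason to
       touch, and log every address that connects. Anything that shows
       up here is, by construction, probing.                            */
    int main(void) {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port        = htons(2222);   /* arbitrary decoy port */
        bind(srv, (struct sockaddr *)&addr, sizeof addr);
        listen(srv, 8);
        for (;;) {
            struct sockaddr_in peer;
            socklen_t len = sizeof peer;
            int c = accept(srv, (struct sockaddr *)&peer, &len);
            printf("probe from %s:%u\n",
                   inet_ntoa(peer.sin_addr), (unsigned)ntohs(peer.sin_port));
            close(c);   /* log and drop */
        }
    }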

CaptainRon

If it's going to take years to fix, they may as well not bother. By the time it's done, those chips will be outdated. Better to make sure the next generation of chips is not vulnerable.