Michelson Is Turing-complete: What Does It Mean And Why It Matters
A Turing-complete programming language is a language on which one can realise any calculable function, i.e. write literally any program. Most languages are Turing-complete, and it’s bad.
Why? In this post, we explain the danger of Turing-complete languages, why there are only a handful of Turing-incomplete ones, and how Tezos’s language, Michelson, is protected from the danger.
What is Turing-completeness?
Alan Turing was a British mathematician, the founding father of informatics, and the creator of the first modern computer. In 1936, he described an algorithm as a machine that can calculate any function through simple commands like addition. The abstraction in question is known as the Turing machine.
A Turing machine has a tape of infinite length broken down into cells containing numbers or commands: go to the cell to the right or to the left, read its value, or write a new number therein. The machine also can work with the rules of transition: thus, if a cell contains 0000, the machine will go three cells to the left and write the current number there. This set of rules is sufficient for writing any program.
A real-life example of this principle is the programming language aptly named BrainFuck: it has only eight commands: read, write, move the cell to the right / the left, add one to or subtract one from the current value, and open or close a while cycle.
Here’s how you write hello world in BrainFuck.
Computers work with letters in ASCII encoding: H is 72, e is 101, l is 108, etc. The program first writes 10 into cell 0 and then launches a cycle of 10 iterations, write 7 in cell 1, 10 in cell 2 etc. Then it sums up the missing numbers into each cell and produces the result: 72 101 108 108 111, which means “Hello” in ASCII.
Any modern-day computer works with the same set of commands as BrainFuck does. When the user presses “e” in a text document, the computer similarly sums up bits and uses cycles. Thanks to the high speed of operation, however, users don’t notice that.
Just like a computer, a program in any high-level language that displays a line on the screen reads zeros and ones from the data storage, makes an ANSII code out of them, and then says “Hello, world!” In fact, programming language commands launch enormous cycles of primitives that Turing had described back in 1936.
Why Turing-complete Languages Are Dangerous
The main danger lurks in the cycles. A developer cannot prove his or her program will necessarily end its operation in the final point instead of relaunching a cycle. The developer cannot also determine the program operation’s results with any input data. There can always be a combination of input arguments that will break down the primitive causing the program to execute something unintended.
Thus, GameBoy has an 8-bit processor that uses numbers from 0 to 255 as input arguments. Pokemon Yellow stores the player’s inventory and pokemons in the same form in a separate file. If you hoard a certain amount of objects, the processor will see an executable file instead of your Pikachu. Robert McIntur used this feature to hack the Pokemon Yellow, and write a programms to play midi files and show png images by only tapping GameBoy buttons in special sequence. Similarly, another developer nicknamed Masterjun hacked Super Mario World on a 16-bit console SNES. He connected 8 controllers to the unit and input an executable code of simple games like Snake and Pong into Mario game. In a somewhat similar fashion, players hacked into Starcraft by overflowing the memory buffer and entering binary code via in-game actions. As Starcraft works on full-fledged computers, so they managed to launch Super Mario Bros. in it as well as create a level editor for it.
The same principle can be employed by hackers who break into smart contracts on blockchains. They crack the algorithms with carefully selected arguments and then make the contract send the funds to a specified address. That’s how the famous The DAO heist occurred once. That’s what still happens these days.
The most dangerous feature of Turing-complete languages is the fact they’re everywhere. If there is a quite banal addition in a language, it is Turing-complete, and therefore, programs written in it are vulnerable to attacks.
There are Turing-incomplete languages like HQ9+ that has only four commands: display Hello world, its own source code, 99 beer bottles, and an endless timer. It can literally do nothing else but that.
Why Michelson Is Reliable Despite Its Turing-Completeness
The main feature of Michelson is its strict typification. Thanks to that, a hacker won’t be able to convince a smart contract to mistake a string for clear bytes or a number with a decimal for a natural one. The lack of strict typification is what helped the devs from the examples above break games via the interface and implant a foreign executable code. Smart contracts on Michelson can be also formally verified to make sure that the contract does what the developer intended with any input arguments.
Finally, there is a time limitation for execution. A Turing machine can calculate any function because it has infinite data storage and infinite time. In the blockchain, the memory volume and execution time are limited, so the hacker won’t be able to implant the foreign code even if they find a loophole.
Subscribe and never miss updates from the world of Tezos: