What is blockchain?
A blockchain is a special type of database. You may also have heard the term distributed ledger technology (or DLT) – in many cases, they're referring to the same thing. A blockchain has certain unique properties. There are rules about how data can be added, and once the data has been stored, it's virtually impossible to modify or delete it.
Data is added over time in structures called blocks. Each block is built on top of the last and includes a piece of information that links back to the previous one. By looking at the most recent block, we can check that it was created after the one before it. So if we continue all the way down the "chain," we'll reach our very first block – known as the genesis block.
To analogize, suppose that you have a spreadsheet with two columns. In the first cell of the first row, you put whatever data you want to hold.
The first cell's data is converted into a two-letter identifier, which will then be used as part of the next input. In this example, the two-letter identifier KP must be used to fill out the next cell in the second row (defKP). This means that if you change the first input data (abcAA), you'd get a different combination of letters in every other cell.
A database where each entry is linked to the last.
Looking at row 4 now, our most recent identifier is TH. Remember how we said you can't go back and modify or delete entries? That's because it would be easy for anyone to tell that it's been done, and they'd just ignore your attempted change.
Suppose you change the data in the very first cell – you'd get a different identifier, which would mean your second block would have different data, leading to a different identifier in row 2, and so on. TH is, in essence, a product of all the information coming before it.
How are blocks connected?
What we discussed above – with our two-letter identifiers – is a simplified analogy of how a blockchain uses hash functions. Hashing is the glue that holds blocks together. It consists of taking data of any size and passing it through a mathematical function to produce an output (a hash) that's always the same length.
The hashes used in blockchains are interesting, in that the odds of you finding two pieces of data that give the exact same output are astronomically low. Like our identifiers above, any slight modification of our input data will give a totally different output.
Let's illustrate with SHA256, a function used extensively in Bitcoin. Even changing the capitalization of a single letter is enough to completely scramble the output.
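To try this yourself, here is a minimal sketch using Python's standard hashlib module. The input strings and the helper name sha256_hex are our own choices for illustration, not something taken from Bitcoin.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two inputs that differ only in capitalization (example strings are assumptions).
lower = sha256_hex("binance academy")
upper = sha256_hex("Binance Academy")
print(lower)
print(upper)

# The output length is fixed, no matter how long the input is.
long_input = sha256_hex("a" * 10_000)
print(len(lower), len(upper), len(long_input))  # 64 64 64
```

Running it prints two digests that look unrelated, even though the inputs differ by only two capital letters, and every digest is 64 hexadecimal characters (256 bits) long.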
The fact that there aren't any known SHA256 collisions (i.e., two different inputs that give us the same output) is incredibly valuable in the context of blockchains. It means that each block can point back to the previous one by including its hash, and any attempt to edit older blocks will immediately become apparent.
Each block contains a fingerprint of the previous.
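The linking described above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: the block layout (a dict with data and prev_hash), the all-zero placeholder for the genesis block, and the function names are hypothetical and much simpler than a real block format.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"data": block["data"], "prev_hash": block["prev_hash"]}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries: list) -> list:
    """Create one block per entry, each pointing at the hash of the last."""
    chain = []
    prev_hash = "0" * 64  # placeholder: the genesis block has no predecessor
    for data in entries:
        block = {"data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def is_valid(chain: list) -> bool:
    """Check that every block points at the hash of the block before it."""
    for previous, current in zip(chain, chain[1:]):
        if current["prev_hash"] != block_hash(previous):
            return False
    return True


chain = build_chain(["genesis", "Alice pays Bob", "Bob pays Carol"])
print(is_valid(chain))  # True

chain[0]["data"] = "tampered"  # edit an old block...
print(is_valid(chain))  # False: the next block no longer matches
```

Editing the first block changes its hash, so the pointer stored in the second block stops matching and the change is easy to spot.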
Blockchains and decentralization
We've explained the basic structure of a blockchain. But when you hear people talking about blockchain technology, they’re likely not just talking about the database itself, but the ecosystems built around blockchains.
As standalone data structures, blockchains are only really useful in niche applications. Where things get interesting is when we use them as tools for strangers to coordinate amongst themselves. Combined with other technologies and some game theory, a blockchain can act as a distributed ledger that's controlled by no one.
What this means is that no one has the power to edit the entries outside of the rules of the system (more on the rules shortly). In that sense, you could argue that the ledger is simultaneously owned by everyone: participants reach an agreement on what it looks like at any given moment.
The Byzantine Generals Problem
The real challenge standing in the way of a system like that described above is something called the Byzantine Generals Problem. Conceived in the 1980s, it describes a dilemma in which isolated participants must communicate to coordinate their actions. The specific dilemma involves a handful of army generals who surround a city, deciding whether to attack it. The generals can only communicate via messenger.
Each must decide whether to attack or retreat. It doesn't matter whether they attack or retreat, as long as all generals agree on a common decision. If they decide to attack, they will only be successful if they move in at the same time. So how do we ensure that they can pull this off?
Sure, they could communicate via messenger. But what if the messenger is intercepted with a message that says "we’re attacking at dawn," and that message is replaced with “we're attacking tonight”? What if one of the generals is malicious and intentionally misleads the others to ensure they're defeated?
All generals are successful when attacking (left). When some retreat while others attack, they will be defeated (right).
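To see why a single dishonest general is enough to cause trouble, consider the following toy simulation. It assumes three generals (A, B and C) who each follow a naive rule of our own invention: go with the majority of all the votes you have seen. It is not a consensus algorithm, just a sketch of how the naive approach breaks.

```python
from collections import Counter


def decide(own_vote: str, received: list) -> str:
    """Naive rule: follow the majority of all votes seen, own vote included."""
    tally = Counter([own_vote] + received)
    return tally.most_common(1)[0][0]


# Honest case: general C reports "attack" to both A and B.
honest_a = decide("attack", ["retreat", "attack"])   # A hears from B and C
honest_b = decide("retreat", ["attack", "attack"])   # B hears from A and C
print(honest_a, honest_b)  # attack attack

# Byzantine case: C tells A "attack" but tells B "retreat".
split_a = decide("attack", ["retreat", "attack"])
split_b = decide("retreat", ["attack", "retreat"])
print(split_a, split_b)  # attack retreat
```

In the second case, the two loyal generals each followed the rule faithfully and still ended up with different decisions, which is exactly the outcome that leads to defeat.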
We need a strategy wherein consensus can be reached, even if participants turn malicious or messages get intercepted. Not being able to maintain a database isn't a life-and-death situation like attacking a city without reinforcements, but the same principle holds. If there's no one to oversee the blockchain and to give users “correct” information, then the users must be able to communicate amongst themselves.
To overcome the potential failure of one (or several) users, the mechanisms of the blockchain must be carefully engineered to be resistant to such setbacks. A system that can achieve this is referred to as Byzantine fault-tolerant. As we'll see shortly, consensus algorithms are used to enforce robust rules.
Why do blockchains need to be decentralized?
You could, of course, operate a blockchain by yourself. But you'd end up with a database that's clunky in comparison to superior alternatives. Its real potential can be exploited in a decentralized environment – that is, one where all users are equal. That way, the blockchain can’t be deleted or maliciously taken over. It's a single source of truth that anyone can see.
What's the peer-to-peer network?
The peer-to-peer (P2P) network is our layer of users (or the generals in our previous example). There's no administrator, so instead of phoning into a central server anytime they want to exchange information with another user, the user sends it directly to their peers.
Consider the graphic below. On the left, A needs to route their message through the server to get it to F. On the right-hand side, however, they're connected without an intermediary.
A centralized network (left) vs. a decentralized one (right).
Normally, the server holds all the information that users need. When you access Binance Academy, you're asking its servers to feed you all the articles. If the website goes offline, you won't be able to see them. However, if you downloaded all of the content, you could load it on your computer without querying Binance Academy.
In essence, that's what every peer does with the blockchain: the entire database is stored on their computer. If anyone leaves the network, the remaining users will still be able to access the blockchain, and share information with each other. When a new block is added to the chain, the data is propagated across the network so that everyone can update their own copy of the ledger.
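As a rough sketch of this idea, the snippet below floods a new block through a small network in which every peer keeps its own copy of the ledger. The six-peer layout, the function names, and the simple breadth-first flooding are all assumptions made for illustration. Real networks relay data with more elaborate protocols.

```python
from collections import deque


def propagate(peers: dict, origin: str, block: str) -> dict:
    """Flood `block` from `origin`; return every peer's local ledger copy."""
    ledgers = {name: [] for name in peers}
    queue = deque([origin])
    seen = {origin}
    while queue:
        peer = queue.popleft()
        ledgers[peer].append(block)  # the peer stores the block locally
        for neighbor in peers[peer]:
            if neighbor not in seen:  # relay only to peers not yet reached
                seen.add(neighbor)
                queue.append(neighbor)
    return ledgers


def remove_peer(peers: dict, gone: str) -> dict:
    """Return a copy of the network after `gone` has left."""
    return {p: [n for n in ns if n != gone] for p, ns in peers.items() if p != gone}


network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}

ledgers = propagate(network, "A", "block 1")
print(all(copy == ["block 1"] for copy in ledgers.values()))  # True

smaller = remove_peer(network, "B")  # B goes offline
ledgers_after = propagate(smaller, "A", "block 2")
print(all(copy == ["block 2"] for copy in ledgers_after.values()))  # True
```

Even with B gone, the block still reaches every remaining peer, because no single machine is the only route for information.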
What are blockchain nodes?
Nodes are simply what we call the machines connected to the network – they're the ones that store copies of the blockchain, and share information with other machines. Users don't need to manually handle these processes. Generally, all they need to do is download and run the blockchain’s software, and the rest will be taken care of automatically.
The above describes what a node is in the purest sense, but the definition can also encompass other users that interact with the network in any way. In cryptocurrency, for instance, a simple wallet application on your phone is what's known as a light node.
Public vs. private blockchains
As you may know, Bitcoin laid the foundation for the blockchain industry to grow into what it is today. Ever since Bitcoin started proving itself as a legitimate financial asset, innovators have been thinking about the potential of the underlying technology for other fields. This has resulted in an exploration of blockchain for countless use cases outside of finance.
Bitcoin is what we call a public blockchain. This means that anyone can view the transactions on it, and all it takes to join is an Internet connection and the necessary software. Since there aren't any other requirements for participation, we may refer to this as a permissionless environment.
In contrast, there are other types of blockchains out there called private blockchains. These systems establish rules regarding who can see and interact with the blockchain. As such, we refer to them as permissioned environments. While private blockchains may seem redundant at first, they do have some important applications – mainly in enterprise settings.
How do transactions work?
If Alice wants to pay Bob via bank transfer, she notifies her bank. Let’s assume that the two parties use the same bank for simplicity’s sake. The bank checks that Alice has the funds to perform the transaction, before updating its database (e.g., -$50 to Alice, +$50 to Bob).
This isn’t too dissimilar to what goes on with a blockchain. After all, it’s also a database. The key difference is that there isn’t a single party performing the checks and updating the balances. All of the nodes must do it.
If Alice wants to send five bitcoins to Bob, she broadcasts a message saying this to the network. It won’t be added to the blockchain straight away – nodes will see it, but other actions must be completed for the transaction to be confirmed.
Once that transaction is added to the blockchain, all of the nodes can see that it’s been made. They’ll update their copy of the blockchain to reflect it. Now, Alice can’t send those same five units to Carol (thus, double-spending), because the network knows that she’s already spent them in an earlier transaction.
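The double-spending check can be sketched as follows. To keep it short, each node tracks simple account balances in a hypothetical Node class. That is a simplification of our own, since Bitcoin itself tracks unspent transaction outputs rather than balances.

```python
class Node:
    """A node keeping its own copy of balances (a simplified account model)."""

    def __init__(self, balances: dict):
        self.balances = dict(balances)

    def apply(self, sender: str, recipient: str, amount: int) -> bool:
        """Apply a confirmed transaction if the sender can afford it."""
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            return False  # rejected: the sender doesn't have these coins
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        return True


# Three independent nodes, each starting from the same ledger state.
nodes = [Node({"Alice": 5}) for _ in range(3)]

first = [node.apply("Alice", "Bob", 5) for node in nodes]
second = [node.apply("Alice", "Carol", 5) for node in nodes]

print(first)   # [True, True, True]
print(second)  # [False, False, False]
```

Every node reaches the same verdict on its own: the payment to Bob goes through, and the later attempt to send the same five units to Carol is rejected.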
There’s no concept of usernames and passwords – public-key cryptography is used to prove ownership of funds. To receive funds in the first place, Bob needs to generate a private key. That’s just a very long random number that would be virtually impossible for anyone to guess, even with hundreds of years at their disposal. But if he tells anyone his private key, they’ll be able to prove ownership over (and therefore spend) his funds. So it’s important that he keeps it secret.
What Bob can do, however, is derive a public key from his private one. He can then give the public key to anyone because it’s practically infeasible for them to reverse-engineer it to get the private key. In most cases, he’ll perform another operation (like hashing) on the public key to get a public address.
He’ll give Alice the public address so that she knows where to send funds. She constructs a transaction that says “pay these funds to this public address.” Then, to prove to the network that she isn’t trying to spend funds that aren’t hers, she generates a digital signature using her own private key. Anyone can take Alice’s signed message, check the signature against her public key, and say with certainty that she has the right to send those funds to Bob.
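The flow of private key, public key, address and signature can be sketched with nothing but hashing. Note the big assumption here: Bitcoin uses elliptic-curve signatures (ECDSA), whereas the snippet below swaps in a Lamport one-time signature, a much simpler hash-based scheme, purely because it fits in a few lines of standard-library Python. All function names are hypothetical, and a Lamport key should only ever sign one message.

```python
import hashlib
import secrets


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_private_key() -> list:
    """256 pairs of random secrets, one pair per bit of the message hash."""
    return [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]


def derive_public_key(private_key: list) -> list:
    """Hash every secret; the hashes can be shared without revealing the secrets."""
    return [(sha256(a), sha256(b)) for a, b in private_key]


def derive_address(public_key: list) -> str:
    """Hash the whole public key down to a short, shareable address."""
    return sha256(b"".join(a + b for a, b in public_key)).hex()


def message_bits(message: bytes) -> list:
    digest = int.from_bytes(sha256(message), "big")
    return [(digest >> i) & 1 for i in range(256)]


def sign(private_key: list, message: bytes) -> list:
    """Reveal one secret from each pair, chosen by the bits of the message hash."""
    return [pair[bit] for pair, bit in zip(private_key, message_bits(message))]


def verify(public_key: list, message: bytes, signature: list) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    bits = message_bits(message)
    return all(sha256(s) == pair[b] for s, pair, b in zip(signature, public_key, bits))


alice_private = generate_private_key()
alice_public = derive_public_key(alice_private)
bob_address = derive_address(derive_public_key(generate_private_key()))

transaction = f"pay 5 BTC to {bob_address}".encode("utf-8")
signature = sign(alice_private, transaction)

print(verify(alice_public, transaction, signature))              # True
print(verify(alice_public, b"pay 5 BTC to Mallory", signature))  # False
```

Only someone holding Alice's private key could have produced a signature that checks out against her public key, and the signature stops verifying if the message is changed.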
Who invented blockchain technology?
Blockchain technology was formalized in 2009 with the release of Bitcoin – the first and most popular blockchain. However, its pseudonymous creator Satoshi Nakamoto took inspiration from earlier technologies and proposals.
Blockchains make heavy use of hash functions and cryptography, which had existed for decades before the release of Bitcoin. Interestingly, the blockchain’s structure can be traced back to the early 1990s, though it was merely used for timestamping documents such that they couldn’t be altered later.
Pros and cons of blockchain technology
Properly-engineered blockchains solve a problem that plagues stakeholders in a number of industries, ranging from finance to agriculture. A distributed network presents many advantages over the traditional client-server model, but it also comes with some trade-offs.
One of the immediate benefits noted in the Bitcoin white paper is that payments could be transmitted without involving an intermediary. Subsequent blockchains have taken this even further, allowing users to send all kinds of information. Removing intermediaries means that there’s less counterparty risk for the users involved, and it can result in lower fees as there is no middleman taking a cut.
As we mentioned earlier, a public blockchain network is also permissionless – there’s no barrier to entry since there’s no one in charge. If a prospective user can connect to the Internet, then they’re able to interact with other peers on the network.
Many would argue that the most important quality of blockchains is that they have a high degree of censorship-resistance. To cripple a centralized service, all that a malicious actor would need to do is target a server. But in a peer-to-peer network, every node acts as a server of its own.
A system like Bitcoin has over 10,000 visible nodes scattered around the world, making it virtually impossible for even a well-resourced attacker to compromise the network. It should be noted that there are many hidden nodes, too, which aren’t visible to the broader network.
Blockchains are not a silver bullet for every problem. In being optimized for the advantages described above, they end up lacking in other areas. The most obvious obstacle to mass adoption of blockchains is that they don’t scale very well.
This is true of any distributed network. Since all participants must stay in sync, new information can’t be added too fast as nodes would be unable to keep up. Therefore, developers tend to intentionally limit the speed at which the blockchain can update to ensure that the system remains decentralized.
For users of a network, this can manifest itself in lengthy waiting periods if too many people are trying to make transactions. Blocks can only hold so much data, and they’re not added to the chain instantly. If there are more transactions than can fit in the block, then any additional ones must wait for the next block.
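The waiting effect can be shown with a small sketch. The capacity of two transactions per block and the first-come, first-served ordering are assumptions chosen to keep the example tiny. Real networks have far larger blocks and typically prioritize transactions by fee.

```python
def pack_blocks(pending: list, capacity: int) -> list:
    """Split pending transactions into blocks of at most `capacity` each."""
    return [pending[i:i + capacity] for i in range(0, len(pending), capacity)]


pending = ["tx1", "tx2", "tx3", "tx4", "tx5"]
blocks = pack_blocks(pending, capacity=2)

for height, block in enumerate(blocks, start=1):
    print(height, block)
# 1 ['tx1', 'tx2']
# 2 ['tx3', 'tx4']
# 3 ['tx5']
```

Here tx5 has to wait until the third block, simply because the first two were already full.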
Another possible con of decentralized blockchain systems is that they can’t easily be upgraded. If you’re building your own software, you can add new features as you please. You don’t need to work with others or ask for permission to make modifications.
In an environment with potentially millions of users, making changes is considerably more difficult. You could change some of the parameters of your node software, but you’d eventually find yourself separated from the network. If the modified software is incompatible with other nodes, they will recognize this and refuse to interact with your node.
Suppose you wanted to change a rule about how big blocks can be (from 1MB to 2MB). You could try sending a 2MB block to the nodes you’re connected to, but they have a rule that says “do not accept blocks over 1MB”. If they receive anything bigger, they will not include it in their copy of the blockchain.
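A node's side of this exchange might look something like the sketch below. The constant MAX_BLOCK_SIZE, its value of 1,000,000 bytes, and the accept_block function are illustrative assumptions standing in for the 1MB rule. Real node software checks many more rules than size.

```python
MAX_BLOCK_SIZE = 1_000_000  # bytes; stands in for the "1MB" rule in the text


def accept_block(chain: list, block: bytes, max_size: int = MAX_BLOCK_SIZE) -> bool:
    """Append `block` to this node's copy of the chain only if it obeys the size rule."""
    if len(block) > max_size:
        return False  # rule violated: the block is ignored
    chain.append(block)
    return True


small_block = b"\x00" * 900_000
big_block = b"\x00" * 2_000_000

# A node running the standard rules.
standard_chain = []
print(accept_block(standard_chain, small_block))  # True
print(accept_block(standard_chain, big_block))    # False

# Your modified node, with the limit raised to 2MB.
modified_chain = []
print(accept_block(modified_chain, small_block, max_size=2_000_000))  # True
print(accept_block(modified_chain, big_block, max_size=2_000_000))    # True

print(len(standard_chain), len(modified_chain))  # 1 2
```

The two copies of the chain now differ, which is how a node with modified rules ends up separated from the rest of the network.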
The only way to push changes is to have the majority of the ecosystem accept them. With major blockchains, there can be months – or even years – of intensive discussion in forums before changes can be coordinated.