A common chant from many in this space these days in response to any discussion of changes to the Bitcoin protocol is “Don’t mess with Layer 1! You can just build it on Layer 2!” This seems like a very logical thing to do, right? Why risk the security and stability of L1 when you can just build on top of it? The problem is this fundamentally fails to understand the relationship between Layer 1 and Layer 2.
An L2 protocol is an extension of the L1. Everything that an L2 is designed to do must ultimately reduce down to what the L1 is capable of. The blanket statement of “just do it on L2!” obfuscates numerous implicit realities of what can or can’t be done on an L2 given the current state of the base layer. For instance, imagine trying to build the Lightning Network without the existence of multisignature scripts. You couldn’t. It wouldn’t be possible to share control between more than one person, and the whole concept of a payment channel wouldn’t be possible.
The Evolution of Payment Channels
Payment channels can exist in the first place only because Bitcoin’s L1 lets multiple people share control of a UTXO with a multisig script. What is possible on an L2 is inherently constrained by what is possible on L1. Yes, of course it is possible to do things on L2 that aren’t possible on L1, but the ultimate limiting factor of what you can do off-chain is what is possible on-chain. Faster payment confirmation in a payment channel is only possible because on-chain custody can be shared between multiple people.
Even that isn’t enough for a safe payment channel though. The original payment channel design used a pre-signed transaction with an nLockTime timelock that gave the funder their money back after a set number of blocks, and it only supported payments in one direction. Transaction malleability made these original payment channels unsafe to use. If the funding transaction was malleated by someone before confirming, its transaction ID would change, the refund transaction would become invalid, and the funder would have no way to claim their money back. The other party in the channel could effectively hold their money hostage.
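To make the malleability problem concrete, here is a minimal Python sketch. It is a toy model, not real transaction serialization: the function `legacy_txid` and the byte strings are hypothetical, and the only property it tries to capture is that a pre-SegWit transaction ID is a hash over data that includes the signature.

```python
import hashlib


def legacy_txid(body: bytes, signature: bytes) -> str:
    # Toy serialization: the signature bytes are part of what gets hashed,
    # which is the property that made pre-SegWit transactions malleable.
    serialized = body + signature
    return hashlib.sha256(hashlib.sha256(serialized).digest()).hexdigest()


funding_body = b"fund 2-of-2 channel output"

# The funder pre-signs a refund that spends the funding transaction by its ID.
original_txid = legacy_txid(funding_body, b"sig-encoding-A")
refund_spends = original_txid

# Someone re-encodes the signature before confirmation. In this model the
# signature is still valid, but its bytes differ, so the ID differs too.
malleated_txid = legacy_txid(funding_body, b"sig-encoding-B")

# The refund now points at a transaction ID that never confirmed.
refund_still_valid = refund_spends == malleated_txid
```

In this sketch `refund_still_valid` ends up `False`, which is the hostage scenario described above: the funds are locked in the multisig and the only pre-signed way out no longer applies.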
CHECKLOCKTIMEVERIFY, the absolute timelock opcode, was the solution. CLTV allows you to make a coin unspendable until a certain blockheight or time in the future. This, in combination with the ability to make scripts that can be spent in multiple ways, allowed the multisig UTXO to have a script path where the funder could spend all of the funds themselves after a timelock. This guaranteed the funder would be able to claim the money back in a worst case scenario even if the funding transaction was malleated. The channel could still only facilitate one-way payments though.
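The two spending paths described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not real Bitcoin Script: the names `ChannelOutput` and `can_spend` are hypothetical, signatures are reduced to a set of names, and the exact consensus rules for when a timelocked spend becomes valid are simplified to a height comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelOutput:
    funder: str
    receiver: str
    refund_height: int  # absolute block height, standing in for the CLTV value


def can_spend(output, signers, current_height):
    """Return True if this set of signers can spend the output at this height."""
    signers = set(signers)
    # Path 1: both parties sign (the shared multisig path), valid at any time.
    if {output.funder, output.receiver} <= signers:
        return True
    # Path 2: the funder alone, but only once the absolute timelock has passed.
    return output.funder in signers and current_height >= output.refund_height


# Hypothetical one-way channel with a refund path opening at height 800,000.
channel = ChannelOutput(funder="alice", receiver="bob", refund_height=800_000)
```

Because the refund condition lives in the output's own script rather than in a separate pre-signed transaction, it does not depend on the funding transaction's ID, which is why malleation no longer threatens the funder in this design.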
In order to facilitate two-way payments, a proper solution to transaction malleability was necessary. This was a huge motivator for Segregated Witness. A timelock was all that was necessary for a one-way channel because the money only ever moved in one direction. The only risk to the sender was that the other party would never claim on-chain what they had already been sent, leaving the rest of the sender’s money trapped. The timelock refund gave the receiver an incentive to claim funds on-chain before the timelock expired, since after that point they could lose everything they had already been sent, and it gave the sender a worst-case recourse in case something permanently knocked the receiver offline. Script does not support enforcing that certain amounts go to certain future scripts, so a pre-signed transaction is the only viable initial refund mechanism if payments are to flow in both directions. This reopened the risk of funds being held hostage.
With the upgrade to Segwit, this problem was solved. In place of the timelock refund incentivizing honest behavior, the penalty key was introduced. Because the funds in a two-way channel can flow back and forth, each side will inevitably have had more money in some prior state of the channel than in the current one. By establishing a branch in each channel state’s pre-signed transaction that uses a penalty key, users can exchange the keys for the old state after signing the new one, and know that if the other party tries to use an old transaction they can claim 100% of the funds in the channel. Timelocks are used to guarantee that the normal spending path, where users take their respective balances, isn’t valid for a time, giving channel parties the chance to use the penalty key if necessary. There’s a problem with this though: using CLTV means that at some point in the future the channel has to close, or else the timelock will expire and you no longer have that safety period to penalize the dishonest party.
Bi-directional payment channels also needed CHECKSEQUENCEVERIFY, the relative timelock opcode, in order to solve this issue. Unlike CLTV, which specifies an absolute time or block height in the future, CSV specifies a length of time or number of blocks relative to the point at which the UTXO using CSV in its script is confirmed in the blockchain. This allowed the safety period to function for penalty key use without requiring channels to close on-chain at a pre-decided time.
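The difference between an absolute and a relative safety period, and how it interacts with the penalty key, can be sketched as follows. This is a simplified model under stated assumptions rather than how any Lightning implementation works: the function names, the two-party balance dictionaries, and the idea of measuring the victim's reaction time in blocks are all hypothetical.

```python
def absolute_window(confirm_height, lock_height):
    # CLTV-style: the delay ends at a fixed height, so the window shrinks to
    # zero once the chain has passed that height.
    return max(0, lock_height - confirm_height)


def relative_window(confirm_height, delay_blocks):
    # CSV-style: the delay is counted from confirmation, so the window is the
    # same no matter when the transaction confirms (confirm_height is unused).
    return delay_blocks


def settle(states, broadcast_index, broadcaster, reaction_blocks, window):
    """states: list of {party: balance}, oldest first; the last one is current."""
    balances = states[broadcast_index]
    is_revoked = broadcast_index < len(states) - 1
    # The penalty path only helps if the victim can act inside the window.
    if is_revoked and reaction_blocks < window:
        victim = next(p for p in balances if p != broadcaster)
        return {broadcaster: 0, victim: sum(balances.values())}
    return dict(balances)


states = [{"alice": 7, "bob": 3}, {"alice": 4, "bob": 6}]

# Alice broadcasts the old state at height 802,000; Bob needs 6 blocks to react.
with_cltv = settle(states, 0, "alice", 6, absolute_window(802_000, 801_000))
with_csv = settle(states, 0, "alice", 6, relative_window(802_000, 144))
honest_close = settle(states, 1, "alice", 6, relative_window(802_000, 144))
```

In this model the absolute deadline has already passed by the time the old state confirms, so Alice's cheating succeeds, while the relative delay always leaves Bob his full window to take the entire channel balance.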
Even this does not give us the Lightning Network though. There is still no way to actually route a payment across multiple payment channels. Channels can conduct payments in both directions, but only between the two people involved in the channel. In order to route payments across multiple channels you need, you guessed it, other functionality from the L1. Hash Time Locked Contracts (HTLCs) are how this is accomplished, and they require both CLTV and hashlocks. A hashlock requires providing the preimage to a hash in order to spend the coins. It’s like a signature, except you actually just reveal the “private key” instead of signing with it. This allows the receiver in a Lightning payment to provide a hash, and every intermediate channel between sender and receiver creates a script that allows spending immediately with the hash preimage, or refunding the money backwards after a timelock. If the receiver reveals the preimage, everyone can claim the money for forwarding the payment; if not, the money can be claimed backwards and the payment reversed without finalizing it.
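A rough Python sketch of the hashlock and timelock branches may help here. It is an illustration under stated assumptions, not Lightning's actual scripts: the function names, the party names, and the block heights are hypothetical, SHA-256 is used as the hash, and the timeouts are assumed to shrink toward the receiver so that each forwarder has time to claim upstream after learning the preimage downstream.

```python
import hashlib


def hashlock_of(preimage: bytes) -> bytes:
    # The receiver publishes only the hash; the preimage stays secret for now.
    return hashlib.sha256(preimage).digest()


def can_claim(hashlock: bytes, preimage: bytes) -> bool:
    # Hashlock branch: spendable by whoever reveals the matching preimage.
    return hashlib.sha256(preimage).digest() == hashlock


def can_refund(current_height: int, timeout_height: int) -> bool:
    # Timelock branch: the payer can take the money back after the timeout.
    return current_height >= timeout_height


secret = b"carol's payment secret"
payment_hash = hashlock_of(secret)

# (payer, payee, timeout height) for each hop from sender to receiver.
route = [("alice", "bob", 800_144), ("bob", "carol", 800_072)]

# Success: the preimage travels backwards, so hops settle receiver-first.
claimed = [
    (payer, payee)
    for payer, payee, _ in reversed(route)
    if can_claim(payment_hash, secret)
]

# Failure: nobody reveals the preimage, and by height 800,200 every payer
# along the route can reclaim their funds.
refunded = [payer for payer, _, timeout in route if can_refund(800_200, timeout)]
```

The same hash locks every hop, which is what ties the separate channels into a single payment: either the preimage is revealed and every hop can settle, or it is not and every hop can unwind.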
So the Lightning Network as it exists today depends entirely on five functionalities being possible on the base layer of Bitcoin: multisignature scripts, absolute timelocks, relative timelocks, Segregated Witness, and hashlocks. Without any one of these features existing on L1, Lightning as we know it today would not be an L2 we could construct. Its existence as an L2 is entirely dependent on L1’s capability to do certain things. So if one were to, in a world with a Bitcoin that had no hashlocks, no timelocks in script, and no malleability fix, simply say “Just build a bidirectional multi-hop payment channel system on Layer 2! We shouldn’t be messing around with Layer 1,” it would be a completely incoherent statement.
The Catch
That said, strictly technically speaking, it still would have been possible to build that bidirectional multi-hop payment channel system in that world without those three features on L1, but only at a massive cost: trusting other people not to steal your money when they are capable of doing so. That cost is a federated sidechain. Everyone could have just set up a federated chain like Liquid or Rootstock and added those features to the sidechain, building the Lightning Network there instead of on the mainchain. The problem with that is, it’s not the same thing. On a technical level the network would function exactly the same, but no one using it would actually have the same degree of control over their coins.
When they closed out a Lightning channel it would settle on a sidechain backed by a federation, i.e. it would just be an accounting entry on top of someone else’s multisig wallet where you have no ability to control those coins on L1. You just have to trust the distributed group operating the federation to not rug everyone. Even drivechains (which ironically themselves require new L1 functionality to be built) are just another form of federation at the end of the day, with some extra restrictions added to the withdrawal process. The federation is just miners instead of people holding private keys.
This is the implicit reality underlying the reaction “just build it on L2!” whenever someone is discussing improvements to L1, whether the people saying it understand that or not. There is the scope of what is already possible to build on L2, which is rather limited and restricted by its own scaling limitations, and then there is the scope of what is not already possible. Everything falling into the latter category is impossible to build without introducing some trusted entity or group of entities that is ultimately in control of users’ funds for them.
What’s the Point?
“Layer 2” is not a magic incantation. You can’t just wave a magic wand and chant the words, and anything and everything becomes magically possible. There are strict, inescapable limitations on what an L2 can accomplish, and those limitations are what the L1 can accomplish. This is just an inherent fact of engineering reality when looking at a system like Bitcoin. You can’t escape it in any way except by degrading the trust assumptions, and the further beyond the capabilities of L1 you push an L2’s flexibility, the more you degrade them.
So when discussions around these issues occur, such as what improvements can be made to L1, two things are of utmost importance. First, those improvements to L1 are almost entirely centered around enabling the construction of more flexible and scalable L2s. Second, L2s cannot magically enable everything. L2s have their own limitations based on those of the L1, and a discussion of changes to L1 that does not acknowledge that the only way around those limitations is to introduce trusted entities is not an honest conversation.
It’s time to start acknowledging reality if we are going to discuss what to do with Bitcoin going forward. Otherwise nothing is happening but denial of reality and gaslighting, and that is not productive.