SegWit and Taproot: Protocol Design, Witness Economics, and the Exploitation of Bitcoin's Validation Layer
A detailed look at how SegWit introduced the witness discount and how Taproot, by relaxing script limits, extended its reach, making it financially sensible to use Bitcoin blockspace for non-monetary data.
1. Contextual Overview
Segregated Witness (SegWit) and Taproot, implemented via BIP141/BIP143/BIP144 and BIP340/BIP341/BIP342, respectively, introduced sweeping changes to Bitcoin’s protocol layer. SegWit moved signature data out of the txid computation, eliminating third-party txid malleability for SegWit inputs, and replaced the 1 MB block size limit with a weight limit that discounts witness data. Taproot built on SegWit by adding Schnorr signatures (BIP340) and a new way to spend through Merkleized Alternative Script Trees (MAST). Both were deployed as soft forks, so unupgraded nodes continue to accept blocks produced by upgraded nodes.
However, SegWit introduced, and Taproot inherited, a structural asymmetry in the fee model. Witness data, which nodes must download and validate, is discounted by a factor of four relative to base transaction data. The discount was intended in part to encourage SegWit adoption, but in practice it paved the way for economic abuse: the mechanism is now routinely exploited to store arbitrary, non-monetary data on the blockchain, most notably through Ordinal inscriptions and BRC-20 metadata.
The following sections examine how this exploitation works, starting from the base protocol and moving up: transaction serialization, witness evaluation, validation, and weight-based fee calculation. They also cover how Taproot extends these capabilities and makes such exploits more efficient.
2. Pre-SegWit Transaction Structure and Malleability
In legacy (non-SegWit) Bitcoin transactions, each input contains a scriptSig field that holds the unlocking data; for a P2PKH input, that is the signature and the public key. The scriptSig is covered by the transaction ID, which is the double SHA-256 hash of the entire serialized transaction. This makes the txid unstable: any change to the scriptSig that does not affect validity (for example, re-encoding the signature’s DER structure) produces a different txid. As a result, chained pre-signed transactions, such as those used in payment channels, were unreliable, because a child transaction references its parent by txid and becomes invalid if the parent is malleated before it confirms.
The full legacy transaction serialization is defined as follows:
version: 4 bytes, little-endian
vin_count: varint
vin: vin_count entries, each containing:
  prev_txid: 32 bytes, little-endian
  vout: 4 bytes, little-endian
  scriptSig_length: varint
  scriptSig: variable
  sequence: 4 bytes
vout_count: varint
vout: vout_count entries, each containing:
  value: 8 bytes, little-endian
  scriptPubKey_length: varint
  scriptPubKey: variable
locktime: 4 bytes
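The txid is computed as SHA256(SHA256(serialized_tx)) over all of these fields. Because a signature cannot commit to the scriptSig that contains it, anyone relaying the transaction can alter the scriptSig’s encoding without invalidating the signature, which changes the txid of any such transaction until it confirms.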
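To make the malleability problem concrete, here is a minimal Python sketch of the legacy serialization and txid computation. The helper names are hypothetical, and the one-byte scriptSigs are placeholders rather than real signatures; the sketch only shows that two scriptSig encodings of the same spend hash to different txids.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def compact_size(n: int) -> bytes:
    """Encode n as a Bitcoin CompactSize ("varint")."""
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def serialize_legacy(version, inputs, outputs, locktime) -> bytes:
    """inputs: (prev_txid, vout, script_sig, sequence); outputs: (value, script_pubkey)."""
    raw = version.to_bytes(4, "little") + compact_size(len(inputs))
    for prev_txid, vout, script_sig, sequence in inputs:
        raw += prev_txid + vout.to_bytes(4, "little")
        raw += compact_size(len(script_sig)) + script_sig
        raw += sequence.to_bytes(4, "little")
    raw += compact_size(len(outputs))
    for value, script_pubkey in outputs:
        raw += value.to_bytes(8, "little")
        raw += compact_size(len(script_pubkey)) + script_pubkey
    return raw + locktime.to_bytes(4, "little")


def txid(raw: bytes) -> str:
    """txids are conventionally displayed in reversed byte order."""
    return sha256d(raw)[::-1].hex()


# Hypothetical spend: same outpoint and outputs, two encodings of the unlocking data.
prev = bytes(32)
p2pkh = bytes.fromhex("76a914" + "00" * 20 + "88ac")
original = serialize_legacy(1, [(prev, 0, b"\x01\xaa", 0xFFFFFFFF)], [(50_000, p2pkh)], 0)
malleated = serialize_legacy(1, [(prev, 0, b"\x01\xbb", 0xFFFFFFFF)], [(50_000, p2pkh)], 0)
print(txid(original) != txid(malleated))  # True
```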
3. SegWit Structural Changes (BIP141)
SegWit moves signature data out of the base transaction into a new section called the “witness,” which is not included in the txid computation. A SegWit transaction uses the following serialization format (defined in BIP144):
version: 4 bytes
marker: 1 byte (always 0x00)
flag: 1 byte (always 0x01)
vin_count: varint
vin vector: same structure as legacy, but scriptSig is empty for P2WPKH and P2WSH inputs
vout_count: varint
vout vector: same as legacy
witness: one witness vector per input, each containing:
  item_count: varint
  item_1 ... item_n: each a varint length followed by the data
locktime: 4 bytes
Nodes compute two transaction IDs:
txid: the legacy-compatible hash of the transaction, computed over the serialization without the marker, flag, and witness sections.
wtxid: the hash of the complete serialization, including the marker, flag, and witness sections.
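The sketch below builds both serializations for a hypothetical single-input P2WPKH spend and shows that changing witness bytes changes the wtxid but not the txid. The helper names are illustrative, the witness items are placeholders rather than a real signature and public key, and only single-byte CompactSize lengths are supported.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def compact_size(n: int) -> bytes:
    # Single-byte CompactSize is enough for this small sketch.
    if n >= 0xFD:
        raise ValueError("sketch only handles lengths below 0xFD")
    return bytes([n])


def serialize(version, inputs, outputs, witnesses, locktime, include_witness):
    """Build the stripped (txid) or full BIP144 (wtxid) serialization."""
    body = compact_size(len(inputs))
    for prev_txid, vout, script_sig, sequence in inputs:
        body += prev_txid + vout.to_bytes(4, "little")
        body += compact_size(len(script_sig)) + script_sig + sequence.to_bytes(4, "little")
    body += compact_size(len(outputs))
    for value, script_pubkey in outputs:
        body += value.to_bytes(8, "little") + compact_size(len(script_pubkey)) + script_pubkey
    head, tail = version.to_bytes(4, "little"), locktime.to_bytes(4, "little")
    if not include_witness:
        return head + body + tail
    witness = b""
    for stack in witnesses:  # one witness vector per input
        witness += compact_size(len(stack))
        witness += b"".join(compact_size(len(item)) + item for item in stack)
    return head + b"\x00\x01" + body + witness + tail  # marker 0x00, flag 0x01


def ids(inputs, outputs, witnesses):
    stripped = serialize(2, inputs, outputs, witnesses, 0, include_witness=False)
    full = serialize(2, inputs, outputs, witnesses, 0, include_witness=True)
    return sha256d(stripped)[::-1].hex(), sha256d(full)[::-1].hex()


# Hypothetical P2WPKH spend: empty scriptSig, placeholder signature and pubkey in the witness.
ins = [(bytes(32), 0, b"", 0xFFFFFFFF)]
outs = [(40_000, b"\x00\x14" + bytes(20))]
txid_a, wtxid_a = ids(ins, outs, [[b"\x30" * 71, b"\x02" * 33]])
txid_b, wtxid_b = ids(ins, outs, [[b"\x31" * 71, b"\x02" * 33]])
print(txid_a == txid_b, wtxid_a == wtxid_b)  # True False
```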
4. Witness Discount and Fee Model
Block weight is defined as:
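weight = (stripped_size * 4) + witness_size
The stripped size is the size of the transaction excluding the marker, flag, and witness sections; witness_size is the number of bytes in those three sections, counted once. Witness data therefore consumes 1 weight unit per byte, while base data consumes 4. BIP141 writes the same rule as weight = base_size * 3 + total_size. A block may not exceed 4,000,000 weight units, and fee rates are quoted per virtual byte (vbyte), where vsize is weight / 4, rounded up.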
This arrangement introduces a fee asymmetry: 1,000 bytes of witness data weigh the same as 250 bytes of base data. Users who want to embed data are therefore incentivized to place it in the witness rather than in outputs, where it would cost four times as much. This behavior is economically rational under the fee model and has enabled large-scale storage of data in the witness.
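A short Python sketch of the weight arithmetic, using hypothetical sizes (a 200-byte transaction skeleton carrying a 10,000-byte payload), shows the roughly fourfold difference in virtual size depending on where the payload sits. The function names are illustrative.

```python
import math

WITNESS_SCALE_FACTOR = 4
MAX_BLOCK_WEIGHT = 4_000_000


def tx_weight(stripped_size: int, total_size: int) -> int:
    """BIP141 weight: base_size * 3 + total_size.

    This equals stripped_size * 4 + (marker + flag + witness bytes).
    """
    return stripped_size * (WITNESS_SCALE_FACTOR - 1) + total_size


def vsize(weight: int) -> int:
    """Virtual size in vbytes, rounded up."""
    return math.ceil(weight / WITNESS_SCALE_FACTOR)


# Hypothetical sizes: a 200-byte transaction skeleton carrying a 10,000-byte payload.
payload, skeleton = 10_000, 200
in_base = tx_weight(skeleton + payload, skeleton + payload)  # payload in outputs
in_witness = tx_weight(skeleton, skeleton + 2 + payload)     # payload in witness (+2 marker/flag)
print(in_base, vsize(in_base))          # 40800 10200
print(in_witness, vsize(in_witness))    # 10802 2701
print(MAX_BLOCK_WEIGHT // in_witness)   # 370
```

At any given fee rate in sat/vB, the witness placement costs about 27% of the base placement here, which is the incentive the paragraph above describes.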
5. Signature Digest Changes (BIP143)
For version 0 witness inputs, BIP143 also replaced the signature hash (sighash) algorithm. It defines a deterministic serialization of the transaction state that is hashed for signing and verification. The serialized message includes:
version: 4 bytes
hashPrevouts: SHA256d of all input outpoints
hashSequence: SHA256d of all sequence numbers
outpoint: 36 bytes (txid + vout of the input being signed)
scriptCode: for P2WPKH, the template OP_DUP OP_HASH160 <20-byte pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG; for P2WSH, the witness script
amount: 8 bytes (value of the output being spent)
sequence: 4 bytes
hashOutputs: SHA256d of all outputs
locktime: 4 bytes
sighash_type: 4 bytes
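The digest commits to neither the scriptSig nor any witness data, since a signature cannot sign itself. Txid stability comes from BIP141 excluding the witness from the txid, so changes to witness encoding no longer alter the txid. Committing to the input amount also lets a signer verify the fee without fetching the previous transaction.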
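As an illustration, the sketch below assembles the BIP143 message for a single P2WPKH input under SIGHASH_ALL only; the ANYONECANPAY, NONE, and SINGLE variants, which zero out some of these fields, are not handled. The outpoint, key hash, and amounts are placeholders, and the function names are hypothetical. No witness bytes enter the preimage.

```python
import hashlib
import struct

SIGHASH_ALL = 0x01


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def p2wpkh_script_code(pubkey_hash: bytes) -> bytes:
    """0x19 length prefix + OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG."""
    if len(pubkey_hash) != 20:
        raise ValueError("pubkey hash must be 20 bytes")
    return b"\x19\x76\xa9\x14" + pubkey_hash + b"\x88\xac"


def bip143_preimage(version, outpoints, sequences, outputs_ser, index,
                    script_code, amount, locktime, sighash=SIGHASH_ALL):
    """BIP143 message for input `index`, SIGHASH_ALL only.

    outpoints: 36-byte (txid + vout) entries; sequences: ints;
    outputs_ser: concatenated serialized outputs.
    """
    if sighash != SIGHASH_ALL:
        raise NotImplementedError("sketch covers SIGHASH_ALL only")
    hash_prevouts = sha256d(b"".join(outpoints))
    hash_sequence = sha256d(b"".join(struct.pack("<I", s) for s in sequences))
    hash_outputs = sha256d(outputs_ser)
    return (struct.pack("<I", version) + hash_prevouts + hash_sequence
            + outpoints[index] + script_code + struct.pack("<Q", amount)
            + struct.pack("<I", sequences[index]) + hash_outputs
            + struct.pack("<I", locktime) + struct.pack("<I", sighash))


# Hypothetical single-input spend of 50,000 sats to one P2WPKH output.
outpoint = bytes(32) + struct.pack("<I", 0)
script_code = p2wpkh_script_code(bytes(20))
outputs_ser = struct.pack("<Q", 40_000) + b"\x16\x00\x14" + bytes(20)
preimage = bip143_preimage(2, [outpoint], [0xFFFFFFFF], outputs_ser, 0,
                           script_code, 50_000, 0)
digest = sha256d(preimage)  # what an ECDSA signature for this input commits to
print(len(preimage))  # 182
```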
6. Taproot Extensions (BIP341, BIP342)
Taproot introduces a new native SegWit output format:
ScriptPubKey:
OP_1 <32-byte x-only pubkey>
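The x-only pubkey is the tweaked Taproot output key, computed from the internal key as:
output_key = internal_key + H_TapTweak(internal_key || merkle_root) * G
where H_TapTweak is the BIP341 tagged hash with tag "TapTweak", G is the secp256k1 generator point, and merkle_root is omitted when the output commits to no script tree.
There are two spending paths: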
Key-path spend: spend with a single Schnorr signature (64 bytes, or 65 with an explicit sighash type) against the tweaked key. The witness includes only the signature.
Script-path spend: spend by revealing a Tapleaf script and its inclusion proof (the control block).
Witness includes:
Execution stack items
Serialized tapscript
Control block
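The control block consists of one byte combining the leaf version with the output key’s parity bit, the 32-byte internal key, and a Merkle path of up to 128 32-byte nodes, proving the inclusion of the Tapleaf script. BIP341 defines the leaf version, the tagged hashes used along the Merkle path, and the commitment check. BIP342 defines tapscript semantics: among other changes, it removes the 10,000-byte script size limit and the 201 non-push opcode limit, and replaces the block-wide sigop limit with a per-input budget tied to witness size.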
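The output-key tweak can be sketched in pure Python with hand-written secp256k1 affine arithmetic, loosely following the notation of BIP340 and BIP341. This is an illustration under stated assumptions: it is slow, not constant-time, and must not be used with real keys. The internal key below is G’s x-coordinate (whose private key is 1), and the Merkle root is a placeholder.

```python
import hashlib

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    if p1[0] == p2[0] and (p1[1] + p2[1]) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * p1[0] * p1[0] * pow(2 * p1[1], P - 2, P) % P
    else:
        lam = (p2[1] - p1[1]) * pow(p2[0] - p1[0], P - 2, P) % P
    x3 = (lam * lam - p1[0] - p2[0]) % P
    return (x3, (lam * (p1[0] - x3) - p1[1]) % P)


def point_mul(point, k):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def lift_x(x):
    """Return the curve point with this x-coordinate and an even y (BIP340)."""
    y_sq = (pow(x, 3, P) + 7) % P
    y = pow(y_sq, (P + 1) // 4, P)
    if pow(y, 2, P) != y_sq:
        raise ValueError("x is not on the curve")
    return (x, y if y % 2 == 0 else P - y)


def tagged_hash(tag: str, msg: bytes) -> bytes:
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()


def taproot_output_key(internal_key: bytes, merkle_root: bytes = b"") -> bytes:
    """output_key = internal_key + H_TapTweak(internal_key || merkle_root) * G."""
    t = int.from_bytes(tagged_hash("TapTweak", internal_key + merkle_root), "big")
    if t >= N:
        raise ValueError("tweak out of range")
    q = point_add(lift_x(int.from_bytes(internal_key, "big")), point_mul(G, t))
    if q is None:
        raise ValueError("tweaked key is the point at infinity")
    return q[0].to_bytes(32, "big")  # x-only output key


# Illustration only: G's x-coordinate as the internal key (its private key is 1).
internal = G[0].to_bytes(32, "big")
output_key = taproot_output_key(internal, bytes(32))  # placeholder Merkle root
script_pubkey = bytes([0x51, 0x20]) + output_key       # OP_1 <32-byte x-only key>
```

A real wallet also records the parity of the tweaked point, since the control block must carry it; this sketch returns only the x-coordinate.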
7. Exploit Pattern: Witness Data Embedding via SegWit and Taproot
The economic design of witness discounting enabled data insertion strategies using both SegWit and Taproot inputs.
With SegWit, users construct dummy P2WSH inputs whose witness stack carries arbitrary binary data. The witness script can be trivial, for example one that drops the data items and leaves OP_TRUE on the stack, so it always evaluates to true. The transaction involves no meaningful value transfer; the input’s sole purpose is to carry data in the witness. Because each witness byte costs one quarter of a base byte, 1 MB of witness data weighs 1,000,000 weight units, or 250,000 vbytes, which costs roughly 0.01 to 0.02 BTC at fee rates of 4 to 8 sat/vB.
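A minimal sketch of this SegWit pattern, assuming a witness script that uses OP_2DROP and OP_DROP to clear the data items and OP_1 to satisfy the clean-stack rule. The function names are hypothetical. Each witness item is capped at 520 bytes by consensus; Bitcoin Core’s relay policy limits standard P2WSH stack items to 80 bytes, so a spend like this is non-standard and would typically be submitted to a miner directly. The cost helper reproduces the 1 MB fee arithmetic.

```python
import hashlib

MAX_ELEMENT_SIZE = 520  # consensus limit per P2WSH witness stack item
OP_DROP, OP_2DROP, OP_1 = 0x75, 0x6D, 0x51


def data_carrying_p2wsh(payload: bytes):
    """Return (witness_stack, script_pubkey) for a hypothetical data-carrying P2WSH input."""
    chunks = [payload[i:i + MAX_ELEMENT_SIZE]
              for i in range(0, len(payload), MAX_ELEMENT_SIZE)]
    # Drop every data item, then leave exactly one true value (clean-stack rule).
    witness_script = (bytes([OP_2DROP]) * (len(chunks) // 2)
                      + bytes([OP_DROP]) * (len(chunks) % 2)
                      + bytes([OP_1]))
    script_pubkey = b"\x00\x20" + hashlib.sha256(witness_script).digest()  # OP_0 <32-byte hash>
    return chunks + [witness_script], script_pubkey


def witness_cost_btc(witness_bytes: int, sat_per_vbyte: float) -> float:
    """Witness bytes weigh 1 WU each; 4 WU make one vbyte; 1 BTC = 100,000,000 sats."""
    return witness_bytes / 4 * sat_per_vbyte / 100_000_000


stack, spk = data_carrying_p2wsh(bytes(1_300))  # hypothetical 1,300-byte payload
print(len(stack), stack[-1].hex())  # 4 6d7551
print(witness_cost_btc(1_000_000, 4), witness_cost_btc(1_000_000, 8))
```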
With Taproot, the attack becomes more efficient. The user creates a Taproot output with a Merkle tree that commits to a Tapleaf script, such as:
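<data_chunk_1> OP_DROP ... <data_chunk_n> OP_DROP OP_TRUE
Each push is still limited to 520 bytes, so a large blob is split into multiple pushes; BIP342’s removal of the 10,000-byte script size limit lets the script hold as many chunks as the transaction’s weight allows. The Tapleaf is valid under BIP342 rules, and the script evaluates to true. When the UTXO is spent, the user reveals the Tapleaf and control block in the witness, along with any dummy stack values the script consumes. All of this data sits in the witness and is charged the same 1 weight unit per byte.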
Because the Tapleaf script can be made very large (bounded by the 4,000,000 weight-unit block limit and by miners’ relay policies), this method enables large-scale, permanent data inscription onto the Bitcoin blockchain at minimal cost.
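The Tapleaf pattern can be sketched as follows, assuming the data is split into pushes of at most 520 bytes, each followed by OP_DROP, and the default 0xC0 tapscript leaf version. Function names are hypothetical and minimal-push edge cases are ignored. The sketch builds the leaf script, computes the BIP341 TapLeaf hash that the output commits to, and sizes the reveal witness for a single-leaf tree (empty Merkle path, 33-byte control block).

```python
import hashlib

OP_PUSHDATA1, OP_PUSHDATA2, OP_DROP, OP_1 = 0x4C, 0x4D, 0x75, 0x51
MAX_PUSH = 520  # per-push limit still applies inside tapscript
TAPSCRIPT_LEAF_VERSION = 0xC0


def push(data: bytes) -> bytes:
    """Encode a data push (1-byte edge cases such as OP_1..OP_16 are ignored)."""
    n = len(data)
    if n <= 75:
        return bytes([n]) + data
    if n <= 0xFF:
        return bytes([OP_PUSHDATA1, n]) + data
    return bytes([OP_PUSHDATA2]) + n.to_bytes(2, "little") + data


def blob_leaf_script(blob: bytes) -> bytes:
    """<chunk_1> OP_DROP ... <chunk_n> OP_DROP OP_TRUE"""
    script = b""
    for i in range(0, len(blob), MAX_PUSH):
        script += push(blob[i:i + MAX_PUSH]) + bytes([OP_DROP])
    return script + bytes([OP_1])


def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    return b"\xfe" + n.to_bytes(4, "little")


def tagged_hash(tag: str, msg: bytes) -> bytes:
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()


def tapleaf_hash(script: bytes) -> bytes:
    """BIP341 leaf hash, committed to by the output and revealed at spend time."""
    return tagged_hash("TapLeaf", bytes([TAPSCRIPT_LEAF_VERSION])
                       + compact_size(len(script)) + script)


script = blob_leaf_script(bytes(2_000))  # hypothetical 2,000-byte blob
leaf = tapleaf_hash(script)
# Reveal witness for a single-leaf tree: item count, [script, control block (33 bytes)].
witness_bytes = 1 + len(compact_size(len(script))) + len(script) + 1 + 33
print(len(script), witness_bytes)  # 2017 2055
```

Every one of those 2,055 witness bytes weighs 1 weight unit, so the reveal adds roughly 514 vbytes to the transaction.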
8. Validation Path and Node Resource Impact
Nodes that validate a SegWit or Taproot transaction with large witness data must:
Parse and deserialize each witness stack.
Execute the associated script, either via the legacy interpreter (P2WSH) or the Tapscript interpreter (for Taproot).
For Taproot script-path spends, reconstruct the Merkle root and check the commitment:
Parse the control block to extract the leaf version, the internal key, and the inclusion path.
Hash the Tapleaf, then iteratively hash the result with each Merkle node in the path, ordering each pair lexicographically.
Confirm that tweaking the internal key with the resulting root yields the output key.
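The first two verification steps can be sketched in Python; the third, recomputing the tweaked key and comparing it (and its parity) with the output key, needs secp256k1 point arithmetic and is left out of this sketch. The two-leaf tree, the all-zero placeholder internal key, and the function names are hypothetical.

```python
import hashlib


def tagged_hash(tag: str, msg: bytes) -> bytes:
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()


def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return bytes([n])
    return b"\xfd" + n.to_bytes(2, "little")  # enough for scripts under 64 KiB


def parse_control_block(cb: bytes):
    """1 byte (leaf version | output key parity), 32-byte internal key, 0-128 path nodes."""
    m, rem = divmod(len(cb) - 33, 32)
    if len(cb) < 33 or rem != 0 or m > 128:
        raise ValueError("malformed control block")
    path = [cb[33 + 32 * i: 65 + 32 * i] for i in range(m)]
    return cb[0] & 0xFE, cb[0] & 1, cb[1:33], path


def tapleaf_hash(script: bytes, leaf_version: int) -> bytes:
    return tagged_hash("TapLeaf", bytes([leaf_version]) + compact_size(len(script)) + script)


def merkle_root_from_path(script: bytes, leaf_version: int, path) -> bytes:
    """Hash the leaf, then combine with each path node in lexicographic order."""
    k = tapleaf_hash(script, leaf_version)
    for node in path:
        k = tagged_hash("TapBranch", k + node if k < node else node + k)
    return k


# Hypothetical two-leaf tree; the control block proves script_a's inclusion.
script_a, script_b = bytes([0x51]), bytes([0x52, 0x75, 0x51])
hash_a, hash_b = tapleaf_hash(script_a, 0xC0), tapleaf_hash(script_b, 0xC0)
expected_root = tagged_hash("TapBranch", min(hash_a, hash_b) + max(hash_a, hash_b))
control_block = bytes([0xC0]) + bytes(32) + hash_b  # placeholder internal key, parity 0
version, parity, internal_key, path = parse_control_block(control_block)
print(merkle_root_from_path(script_a, version, path) == expected_root)  # True
```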
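Archival nodes retain witness data so they can serve complete blocks to peers that are syncing. Every node, including a pruned one, must download the witness data on first sync and check it against the block’s witness commitment; under Bitcoin Core’s default assumevalid setting, script execution may be skipped for blocks that are ancestors of a hard-coded assumed-valid block, but the data itself is still downloaded and checked.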
Nodes thus incur non-trivial disk, bandwidth, and CPU load for non-financial data that must be parsed, validated, stored, and potentially relayed, even though it has no economic function in terms of bitcoin transfer.
9. Mempool Policy Limitations
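Although node implementations cannot change consensus rules without a soft or hard fork, they can apply policy-level restrictions to mempool admission. Criteria that have been implemented or proposed for deprioritizing or rejecting such transactions include:
Excessively large witness fields
Tapscripts that contain known spam patterns, such as OP_FALSE OP_IF <data> OP_ENDIF
Large Taproot control blocks with no meaningful economic activity
Bitcoin Core’s default policy enforces general limits, such as a 400,000 weight-unit maximum for standard transactions and small per-item limits for P2WSH witness stacks, but it does not filter the OP_FALSE OP_IF envelope; pattern-based filtering of that kind is offered by other implementations, such as Bitcoin Knots.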
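To illustrate what a pattern-based policy filter might look like, here is a hypothetical Python sketch that walks a tapscript’s opcodes and counts the bytes pushed inside OP_FALSE OP_IF ... OP_ENDIF envelopes. It is not taken from Bitcoin Core, Bitcoin Knots, or any other implementation; nested conditionals are not handled, and the threshold parameter is an assumption.

```python
OP_0, OP_IF, OP_ENDIF = 0x00, 0x63, 0x68
OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4C, 0x4D, 0x4E


def iter_ops(script: bytes):
    """Yield (opcode, pushed_data or None) for each operation in a script."""
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:
            size = op
        elif op == OP_PUSHDATA1:
            size, i = script[i], i + 1
        elif op == OP_PUSHDATA2:
            size, i = int.from_bytes(script[i:i + 2], "little"), i + 2
        elif op == OP_PUSHDATA4:
            size, i = int.from_bytes(script[i:i + 4], "little"), i + 4
        else:
            yield op, None
            continue
        if i + size > len(script):
            raise ValueError("push runs past end of script")
        yield op, script[i:i + size]
        i += size


def envelope_payload_size(tapscript: bytes) -> int:
    """Bytes pushed inside OP_FALSE OP_IF ... OP_ENDIF (nested IFs not handled)."""
    total, inside, prev = 0, False, None
    for op, data in iter_ops(tapscript):
        if not inside and prev == OP_0 and op == OP_IF:
            inside = True
        elif inside and op == OP_ENDIF:
            inside = False
        elif inside and data is not None:
            total += len(data)
        prev = op
    return total


def rejects(tapscript: bytes, max_envelope_bytes: int = 0) -> bool:
    """Hypothetical local policy: reject any envelope above the threshold."""
    return envelope_payload_size(tapscript) > max_envelope_bytes


checksig = bytes([0x20]) + bytes(32) + bytes([0xAC])  # <32-byte key> OP_CHECKSIG
envelope = (bytes([OP_0, OP_IF, 0x03]) + b"ord"
            + bytes([OP_PUSHDATA1, 100]) + bytes(100) + bytes([OP_ENDIF]))
print(envelope_payload_size(checksig + envelope), rejects(checksig))  # 103 False
```

Such a check only governs what the local node admits and relays; as the next paragraph notes, it cannot change what a valid block may contain.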
These policies are not consensus rules. They apply only to the local node’s mempool and do not affect block validity: if a miner includes a transaction with large witness data in a block, every node must accept that block as valid regardless of its local policy.
As a result, while policy filtering may reduce the propagation of spam, it cannot eliminate it from inclusion in the timechain.
10. Future Discussion: Protocol and Social Tradeoffs
Any mitigation at the consensus level would require a change to the protocol’s validation rules.
Proposals include:
Removing or reducing the witness discount.
Capping witness size per input or per transaction.
Prohibiting Tapleaf scripts that exceed specific byte thresholds.
Adding "data carrier" flags to scriptPubKeys and enforcing size limits on them.
All such changes have tradeoffs. Reducing the witness discount could increase fees for legitimate activity such as Lightning channel operations. Capping script size could rule out valid but complex Tapleaf scripts. Enforcing script semantics would violate Bitcoin’s philosophy of script agnosticism.
Bitcoin’s neutral validation model does not distinguish between a valid script that drops 200 KB of data and returns OP_TRUE, and a valid multi-signature spend. Both are valid under current consensus rules.
Conclusion
SegWit and Taproot were foundational upgrades to Bitcoin’s protocol. They fixed long-standing limitations and enabled second-layer protocols and contract primitives. But the fee model they introduced—specifically the heavily discounted witness area—created an exploit surface that users have leveraged for large-scale data inscription. These transactions are consensus-valid, economically rational under the weight-based fee model, and increasingly common.
Mitigating these behaviors requires balancing openness with cost control. Keeping blockspace available for monetary transactions while protecting Bitcoin’s validation processes from misuse will depend on a combination of social incentives, standardness (policy) rules, and possible future consensus changes.
This content is intended solely for informational use. It is not a substitute for professional financial or legal counsel. We cannot guarantee the accuracy of the information, so we recommend consulting a qualified financial advisor before making any substantial financial commitments.






