Data Availability Sampling

Problem Statement

In a blockchain, light clients store block headers, which contain things like a hash of the block contents. When a new block is published, light clients can verify that the header is valid (e.g. the hash is correct) but they do not have the full block contents.

The problem is that a malicious block producer could create a valid header but then not actually publish the full block contents referenced by that header. So nodes would see a valid header but the actual block data would not be available on the network.

This could allow attacks where the header commits to one thing (e.g. a certain transaction) but the block contains something else. Light clients would believe the transaction is included based on the header, but full nodes would see it's not actually in the published block.

DAS prevents this attack by allowing light clients to still verify the full block contents are available, without downloading everything. The light clients sample random parts of the encoded block data to check it matches the header.

If enough of these probes accept, DAS guarantees the full data must be available somewhere on the network. Otherwise, an adversary could not have responded consistently to the probes.

So in summary, DAS lets light clients verify that valid headers really do have the appropriate block contents available and unpublished, not just making up a header with no data behind it. This verifies data availability without downloading everything.

How It Works

The key idea behind how data availability sampling (DAS) works is the use of erasure codes and associated cryptographic commitments:

  1. The data is first encoded using an erasure code. This introduces some redundancy and spreads the data across a longer codeword.

  2. The codeword symbols are then committed to using a commitment scheme tailored for that erasure code. This is what the paper calls an "erasure code commitment".

  3. Clients can now randomly sample and probe positions of the encoded data. Each probe will return a codeword symbol and a commitment opening at that position.

  4. Clients verify the opening is valid against the overall commitment. Enough accepting probes ensures the full data can be reconstructed from the codeword symbols.

  5. The consistency property of the commitment scheme ensures all valid openings are for the same data.

Some key mathematical ingredients:

So in summary, the erasure code provides the sampling capability, while the cryptographic commitments ensure soundness and consistency of probes. The math ensures it all works out properly.

What is an erasure code?

An erasure code is a technique used in computer science and information theory to encode data in a redundant way such that the original data can be reconstructed even if some of the encoded data is lost or corrupted.

The main components of an erasure code are:

Some examples of erasure codes:

So in summary, erasure codes allow for efficient and redundant encoding such that data can be reliably reconstructed from a subset of the encoded data. This makes them useful for communications and storage when data may be lost or corrupted.

What is the Codeword?

In the context of erasure codes and data availability sampling, a "codeword symbol" refers to an element of the encoded data. Let me explain in a bit more detail:

So in summary, a "codeword symbol" refers to the individual elements of the encoded data that make up the full codeword. The openings to the commitment scheme are tied to specific codeword symbols.

https://eprint.iacr.org/2023/1079.pdf

!data-availability.pdf