<aside> 💬 This document is open for commenting.

</aside>

Table of contents

Intro

The state of the entire database can be described with the following data structure:

struct State {
	data_root: Document,
	...
}

enum Document {
	Leaf(Vec<u8>),
	Fork(BTreeMap<Nat, Document>), 
}

In other words, the whole database is just a very big Document that may include many sub-Documents. These documents are arranged in a tree structure - each document is a node of the tree. This tree is a BTree.

These trees use plain numbers (Nat) as keys - each Document has its own separate number space. Because of that, it is easy to create serializable references to subdocuments, stored inside the database. All we need to do is to simply join the keys which form the path to a document. Such references (which are called composite keys, but we sometimes refer to them simply as keys) can be compared against each other, even if they point to two completely different parts of the database - as long as they share a common prefix (and they always are), it is possible to decide which one goes first.

struct CompositeKey { 
  segments: Vec<Nat>,
}

Composite keys are interesting to us, because of several things:

Because of how BTrees work, it’s entries are always stored in sorted order. This means, that each Document is sorted and the whole database is also globally sorted. This will come in handy when we talk about locking in Transactions section.

state.drawio.png

<aside> 📖 This illustation uses text keys (like users or age) for educational purposes. In reality, every string would have to be transformed into a Nat in order to be stored inside our data structure. There are different ways of doing this for different use-cases. In general, hashing with some insecure, but collission resistant, hash function should be enough.

</aside>

Such a way of representing the database’s state is known as schemaless database - the storage doesn’t know anything about schema or type information at all. Note, that we are still able to fortify such a system with types - by encoding our data structures into btrees, while using them in our program, - but this can only be done as an add-on. Such an approach has its limitations (the most obvious one is performance), but it also makes the whole database a lot simpler.

<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/3a8cce60-5d0c-4b8c-aa91-8a5703b5f7d4/internet-computer-icp-logo.png" alt="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/3a8cce60-5d0c-4b8c-aa91-8a5703b5f7d4/internet-computer-icp-logo.png" width="40px" /> While this way of representing state is very certification friendly, union-db does not pursue the goal of data certification and expects the IC itself to eventually implement a general purpose certification mechanism, aka Certified Queries.

But, if the community decides that certification is a crucial part of the database that should be available as soon as possible, this statement may change.

</aside>

You can store data as both: Leafs or Forks. The difference is that when you store it as a Leaf, you can simply encode it as bytes and just store as is. Leafs can’t be split during rebalancing - they are indivisible atoms of the state tree. And you’re guaranteed that if a shard stores a Leaf under some particular key, the value of this Leaf will be complete and ready to read. Data that is stored as a Fork behaves differently. Forks can be split during rebalancing, which means that keys of a fork can end up stored on multiple different shards. This means, that union-db cannot guarantee that when you read a Fork subdocument from the state tree, its value will be complete - some (maybe all) parts of it may reside on different shards.

Because our database is just a BTree, it means that we can treat it like a sorted list of key-value pairs. For example, the tree above could be represented as a list like this:

state1.drawio (1).png