

Intro

In union-db, all data is distributed across a dynamic number of shards (from one to, in principle, unbounded). Whenever a shard runs out of memory, it first tries to redistribute some of its data to its parent and child shards. If they are also out of memory, it spawns a new shard and moves part of its data into it.
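The overflow rule above can be sketched as a toy model. This is not union-db's code: the `Shard` class, `insert`, `_make_room`, counting capacity in items instead of bytes, moving one item to a neighbour, and inserting the spawned shard between a shard and its current child while moving half the data are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shard:
    """Hypothetical shard model; capacity is counted in items, not bytes."""

    name: str
    capacity: int  # assumed >= 1
    items: list = field(default_factory=list)
    parent: Optional[Shard] = field(default=None, repr=False, compare=False)
    child: Optional[Shard] = field(default=None, repr=False, compare=False)

    def free(self) -> int:
        return self.capacity - len(self.items)

    def insert(self, item) -> None:
        if self.free() == 0:
            self._make_room()
        self.items.append(item)

    def _make_room(self) -> None:
        # Step 1: hand one item to the parent or the child if either has space.
        for neighbour in (self.parent, self.child):
            if neighbour is not None and neighbour.free() > 0:
                neighbour.items.append(self.items.pop())
                return
        # Step 2: both neighbours are full (or absent), so spawn a new shard
        # between this shard and its current child and move half the data into it.
        spawned = Shard(f"{self.name}/spawned", self.capacity,
                        parent=self, child=self.child)
        if self.child is not None:
            self.child.parent = spawned
        self.child = spawned
        half = len(self.items) // 2
        spawned.items = self.items[half:]
        del self.items[half:]
```

Inserting five items into a root shard with `capacity=2` first pushes data to the existing child and only spawns a second new shard once that child is full too.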

Sharded state

This data can only be referenced and mutated inside transactions. Sharded state is composed of special sharded Collections backed by stable memory. You can think of Collections as tables in SQL databases, or as collections in the NoSQL world. A database can have as many different sharded Collections as you want.
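A minimal sketch of the "only inside transactions" rule, assuming a hypothetical `Database` with named collections. It models the access rule only; sharding, atomicity and rollback are deliberately left out, and none of these names come from union-db's real API.

```python
from contextlib import contextmanager


class OutsideTransactionError(RuntimeError):
    """Raised when a Collection is touched outside a transaction."""


class Database:
    """Hypothetical model of the access rule only: no sharding, no atomicity."""

    def __init__(self) -> None:
        self._collections = {}  # collection name -> {key: value}
        self._in_transaction = False

    def create_collection(self, name: str) -> None:
        self._collections.setdefault(name, {})

    @contextmanager
    def transaction(self):
        self._in_transaction = True
        try:
            yield self
        finally:
            self._in_transaction = False

    def _require_transaction(self) -> None:
        if not self._in_transaction:
            raise OutsideTransactionError(
                "Collections are only accessible inside a transaction")

    def get(self, collection: str, key):
        self._require_transaction()
        return self._collections[collection].get(key)

    def put(self, collection: str, key, value) -> None:
        self._require_transaction()
        self._collections[collection][key] = value
```

Reads and writes succeed inside `with db.transaction() as tx:` and raise `OutsideTransactionError` anywhere else.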

To read more about sharded collections, visit this section:

Collections & Routing

As it grows, sharded state will eventually occupy all of a shard's stable memory that is not used by pending transactions or the cache.
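The memory split above reduces to simple arithmetic. The function name and the byte figures below are illustrative assumptions, not union-db's real accounting.

```python
def sharded_state_budget(stable_total: int, pending_tx: int, cache: int) -> int:
    """Stable memory (bytes) left for sharded state in one shard; hypothetical accounting."""
    reserved = pending_tx + cache
    if reserved > stable_total:
        raise ValueError("pending transactions and cache exceed the shard's stable memory")
    return stable_total - reserved


MiB = 1024 ** 2
GiB = 1024 ** 3

# e.g. 10 GiB of stable memory, 256 MiB of pending transactions, 512 MiB of cache
budget = sharded_state_budget(10 * GiB, 256 * MiB, 512 * MiB)  # 9472 MiB
```

Because pending transactions and the cache change over time, the space available to sharded state is a moving target rather than a fixed quota.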

Sharded state is the main way for your dapp to scale. It can range in size from kilobytes to petabytes and beyond. Store all of your data in sharded Collections, even data that exists only once. For example, if you’re building an infinite token dapp, store your accounts (and balances) in sharded state. If you’re building an order-book-based DEX, store your orders in sharded state. Even a unique data object, like the admin’s principal, should get its own Collection, even if that Collection only ever stores a single object.
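The data layout recommended above can be sketched with plain dicts standing in for sharded Collections. The record types, collection names, `ADMIN_KEY` and the `transfer` helper are hypothetical and only show the "one Collection per kind of data, even for singletons" idea.

```python
from dataclasses import dataclass


@dataclass
class Account:
    owner: str
    balance: int


@dataclass
class Order:
    order_id: int
    side: str  # "buy" or "sell"
    price: int
    amount: int


# Hypothetical layout: one collection per kind of data, each mapping key -> record.
db = {
    "accounts": {},  # owner -> Account (infinite token dapp)
    "orders": {},    # order_id -> Order (order-book-based DEX)
    "admin": {},     # singleton collection: one record under a fixed key
}
ADMIN_KEY = "admin"


def transfer(db: dict, sender: str, recipient: str, amount: int) -> None:
    accounts = db["accounts"]
    src = accounts[sender]
    if amount < 0 or src.balance < amount:
        raise ValueError("invalid amount or insufficient balance")
    dst = accounts.setdefault(recipient, Account(recipient, 0))
    src.balance -= amount
    dst.balance += amount
```

Keeping even the admin principal in its own collection means every piece of state follows the same storage and scaling path.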

Read more about transactions here:

Transactions

Sharding and rebalancing

Memory on the Internet Computer is very dynamic. Even if at some moment t there are 10GB of free memory in a subnet, by moment t+1 a few more canisters may be deployed to that subnet and occupy all the available space (by calling the stable_grow() API or by setting the memory_allocation deployment parameter). On the IC, heap memory and stable memory come from the same subnet memory, so when one canister occupies heap memory, other canisters on that subnet have less stable memory available. This is why it’s very hard to make any assumptions about memory on the IC. The best strategy is simply to assume that a canister can run out of memory at any time.
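The t / t+1 scenario above can be sketched as a toy model in which heap growth, stable_grow() and memory_allocation all draw from one shared pool. The `Subnet` class and the sizes are hypothetical; the point is that a free-memory check at t says nothing about whether an allocation at t+1 will succeed.

```python
class Subnet:
    """Hypothetical model: one memory pool shared by every canister's heap and stable memory."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.allocated = 0

    def allocate(self, amount: int) -> bool:
        # Any canister's growth draws from the same pool in this model.
        if self.allocated + amount > self.capacity:
            return False
        self.allocated += amount
        return True


GiB = 1024 ** 3
subnet = Subnet(capacity=10 * GiB)

# Moment t: our shard checks free memory and sees 10 GiB.
free_at_t = subnet.capacity - subnet.allocated

# Between t and t+1, another canister reserves 9 GiB.
subnet.allocate(9 * GiB)

# Moment t+1: our 2 GiB growth fails even though 10 GiB looked free at t.
grew = subnet.allocate(2 * GiB)
```

This is why the only safe design is to treat every allocation as possibly failing and to handle that failure where it happens.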

union-db is designed to be conflict-free as long as you follow a few rules. You are guaranteed that your database will never run out of stable memory for sharded state. This is how it works:

  1. a developer deploys the root shard;
  2. sharded state grows, until the shard occupies, for example, 10GB of memory;
  3. the shard finds out that the subnet is out of memory, so it needs to scale horizontally by deploying a new shard;
  4. the new shard has to take over at least some of the sharded state to make room for new data;