<aside> 💬 This document is open for commenting.
</aside>
This functionality is based on ic-stable-memory library. In particular, we’re interested in SBTreeMap, which is an infinite (up to 2**64 entries) collection, where all keys are sorted. BTrees are good, because they provide us with a way to implement other efficient data structures on top. For example you can transform it into:
()
as a value;All of these use cases will perform decently even on huge datasets, because BTree’s performance for each basic operation is O(logN). This versatility makes BTreeMap
a good basic collection to build a database on top on.
One interesting aspect of ic-stable-memory
's BTreeMap implementation is that it is aware of possible OutOfMemory
events - each method of this collection, that allocates memory, returns a Result
, where the Err
variant represents an OutOfStableMemory
error. This allows developers to programmatically react to these events and scale horizontally on-demand.
<aside>
📖 And if you think about this feature closely, you’ll see that it is actually amazing. Especially when you remember that on the IC the only thing you need to spawn new software instances is money, and money is also programmable. If we somehow could connect OutOfMemory
errors with deployment of new canisters, then it would be possible to implement an infinite database that scales automatically, without us even noticing it. union-db
is an implementation of that idea.
</aside>
Collection
typeThe Collection
data type is just a wrapper around SBTreeMap
with extra functionality.
Collection
exposes bounds
- minimum and maximum keys living in this collection. Bounds are required for the routing protocol (details below).Collection
contains the code that automatically triggers sharding (horizontal scaling). You can read more about this in State & Rebalancing section.Collection
may appear in two states:
unlocked
(or default) - this state means that an entry is not in use by any transaction and has only one publicly available value;locked
- this state means that an entry is currently being accessed by a transaction and has two values:
committed
- this is a value that will be assigned to an entry if the transaction fails;uncommitted
- this is a value that will be assigned to an entry if the transaction succeeds.Collection
is a transparent interface for developers to interact with other shards. We want developers to have an illusion of interacting with a bunch of collections, exactly in a way they would interact with regular std
collections, and let the undelying system do all the heavylifting. For example, here is incomplete API of the Collection
type:impl<K, V> Collection<K, V> {
// returns a coroutine, that will find the shard responsible for storing the key
// and insert the entry into its state, responding back to the invoking shard
// submits Option<V> into the parent coroutine on completion
// panics if the key is in `unlocked` state
fn insert(key: K, value: V) -> <Coroutine> { ... }
// returns a coroutine, that will find the shard responsible for storing the key
// and remove the entry from its state, if there is one, responding back to the invoking shard
// submits Option<V> into the parent coroutine on completion
// panics if the key is in `unlocked` state
fn remove(key: &K) -> <Coroutine> { ... }
// returns a coroutine, that will REBASE the transaction execution
// to the shard responsible for storing the key
// panics if the key is in `unlocked` state
// submits Option<V> into the parent coroutine on completion
// never panics
fn get(key: &K) -> <Coroutine> { ... }
// returns a coroutine, that will find the shard responsible for storing the key
// and responds the value of the entry back to the invoking shard
// if the entry is in `locked` state, responds with `committed` value
// submits Option<V> into the parent coroutine on completion
// never panics
fn fetch(key: &K) -> <Coroutine> { ... }
}