<aside> 💬 This document is open for commenting.
</aside>
This functionality is based on ic-stable-memory library. In particular, we’re interested in SBTreeMap, which is an infinite (up to 2**64 entries) collection, where all keys are sorted. BTrees are good, because they provide us with a way to implement other efficient data structures on top. For example you can transform it into:
() as a value;All of these use cases will perform decently even on huge datasets, because BTree’s performance for each basic operation is O(logN). This versatility makes BTreeMap a good basic collection to build a database on top on.
One interesting aspect of ic-stable-memory's BTreeMap implementation is that it is aware of possible OutOfMemory events - each method of this collection, that allocates memory, returns a Result, where the Err variant represents an OutOfStableMemory error. This allows developers to programmatically react to these events and scale horizontally on-demand.
<aside>
📖 And if you think about this feature closely, you’ll see that it is actually amazing. Especially when you remember that on the IC the only thing you need to spawn new software instances is money, and money is also programmable. If we somehow could connect OutOfMemory errors with deployment of new canisters, then it would be possible to implement an infinite database that scales automatically, without us even noticing it. union-db is an implementation of that idea.
</aside>
Collection typeThe Collection data type is just a wrapper around SBTreeMap with extra functionality.
Collection exposes bounds - minimum and maximum keys living in this collection. Bounds are required for the routing protocol (details below).Collection contains the code that automatically triggers sharding (horizontal scaling). You can read more about this in State & Rebalancing section.Collection may appear in two states:
unlocked (or default) - this state means that an entry is not in use by any transaction and has only one publicly available value;locked - this state means that an entry is currently being accessed by a transaction and has two values:
committed - this is a value that will be assigned to an entry if the transaction fails;uncommitted - this is a value that will be assigned to an entry if the transaction succeeds.Collection is a transparent interface for developers to interact with other shards. We want developers to have an illusion of interacting with a bunch of collections, exactly in a way they would interact with regular std collections, and let the undelying system do all the heavylifting. For example, here is incomplete API of the Collection type:impl<K, V> Collection<K, V> {
// returns a coroutine, that will find the shard responsible for storing the key
// and insert the entry into its state, responding back to the invoking shard
// submits Option<V> into the parent coroutine on completion
// panics if the key is in `unlocked` state
fn insert(key: K, value: V) -> <Coroutine> { ... }
// returns a coroutine, that will find the shard responsible for storing the key
// and remove the entry from its state, if there is one, responding back to the invoking shard
// submits Option<V> into the parent coroutine on completion
// panics if the key is in `unlocked` state
fn remove(key: &K) -> <Coroutine> { ... }
// returns a coroutine, that will REBASE the transaction execution
// to the shard responsible for storing the key
// panics if the key is in `unlocked` state
// submits Option<V> into the parent coroutine on completion
// never panics
fn get(key: &K) -> <Coroutine> { ... }
// returns a coroutine, that will find the shard responsible for storing the key
// and responds the value of the entry back to the invoking shard
// if the entry is in `locked` state, responds with `committed` value
// submits Option<V> into the parent coroutine on completion
// never panics
fn fetch(key: &K) -> <Coroutine> { ... }
}