HedgeDB

Built for the Hardware.

HedgeDB is a key-value store, built on a partitioned LSM-tree, C++20 coroutines, and io_uring. Larger-than-memory, persisted, tuned for modern NVMe SSDs and modern CPUs.

Architecture deep-dive » Getting started » GitHub » Discord » C++20 Status License Build


Features and core design

HedgeDB is an LSM-Tree engine designed to saturate the NVMe device. Inspired by RocksDB, the engine targets write-heavy workloads with uniformly-distributed keys (UUIDs, hashes), and is structured around:

  • Concurrent execution. io_uring + C++20 coroutines via the TooManyCooks work-stealing scheduler. Every I/O is a co_await; no callbacks, no thread-per-request.

  • Partitioned LSM-tree. The key space is sharded into 2^N independent partitions (default 16 partitions). Compactions on different partitions run fully in parallel.

  • Size-tiered compaction. Lower write amplification than leveled, with a quotient filter on the read path to skip SSTs that can’t contain a key.

  • Per-thread WAL. Each writer thread owns its own WAL file — no inode contention.

  • Direct I/O. files opened with O_DIRECT flag on the SST path: predictable latencies and no opaque OS-managed cache layer or buffering.

  • MVCC: HedgeDB implements MVCC for snapshot isolation over range scans.

Warning

HedgeDB is currently a prototype and there are a few known gaps. The codebase features a set of test cases, and it’s been extensively tested with sanitizers on. However, it has never been tested in any production environment.


Performance

Performance comparison with RocksDB on a 13th Gen Intel i7-13700H (6 P-cores + 4 E-cores, 32 GB DDR5 RAM) with a Samsung 980 Pro 1TB NVMe; 100M records, 24-byte keys, 100-byte values:

Workload

HedgeDB

RocksDB

HedgeDB / RocksDB

Load (100M puts)

3.97M ops/s

1.14M ops/s

3.5×

Load + compactions drained

3.59M ops/s

1.13M ops/s

3.2×

Read (100M random gets)

1.03M ops/s

194K ops/s

5.3×

Mixed 50/50 read-write

1.33M ops/s

262K ops/s

5.1×


Quickstart

Linux only. See Getting started for full prerequisites and the larger API surface.

Build

# install dependencies (liburing is mandatory; hwloc/jemalloc are recommended)
sudo sh install_liburing.sh
sudo apt install libhwloc-dev

# configure & build
cmake . -B build -DCMAKE_BUILD_TYPE=Release -DUSE_JEMALLOC=1 -Wno-dev
cmake --build build -j$(nproc)

Try it out with benchtool

The fastest way to confirm everything works end-to-end is to run a small load through the bundled benchmark CLI:

# Bump the FD limit (HedgeDB keeps many SST files open)
ulimit -n 1048576

# Write 1M keys with 100-byte values, then read them back
./build/benchtool -m write -n 1000000 -v 100 -p /tmp/hedge_demo
./build/benchtool -m read  -n 1000000 -v 100 -p /tmp/hedge_demo

Hello world from the API

A minimal HedgeDB Hello World! example (check examples/hello_world.cc):

#include <filesystem>
#include <iostream>
#include <string_view>
#include <vector>

#include "db/database.h"
#include "io/static_pool.h"
#include "tmc/sync.hpp"
#include "tmc/task.hpp"

tmc::task<void> run(std::shared_ptr<hedge::db::database> db)
{
    hedge::key_t k{"test_key"};
    std::string v{"Hello, world!"};

    if(auto s = co_await db->put_async(k, std::as_bytes(std::span{v})); !s)
    {
        std::cerr << "put failed: " << s.error().to_string() << "\n";
        co_return;
    }

    auto maybe_value = co_await db->get_async(k);
    if(!maybe_value)
    {
        std::cerr << "get failed: " << maybe_value.error().to_string() << "\n";
        co_return;
    }

    const auto& bytes = maybe_value.value();
    std::string_view readback(reinterpret_cast<const char*>(bytes.data()), bytes.size());
    std::cout << "read back: " << readback << "\n";
}

int main()
{
    const std::filesystem::path db_path = "/tmp/examples_db";
    if(std::filesystem::exists(db_path))
        std::filesystem::remove_all(db_path);

    hedge::io::static_pool::instance()->init(hedge::io::executor_config{
        .name        = "examples-pool",
        .queue_depth = 16,
        .n_threads   = 4,
    });

    auto maybe_db = hedge::db::database::make_new(db_path, hedge::db::db_config{});
    if(!maybe_db)
    {
        std::cerr << "failed to create database: " << maybe_db.error().to_string() << "\n";
        return 1;
    }

    std::shared_ptr<hedge::db::database> db = std::move(maybe_db.value());
    tmc::post_waitable(*hedge::io::static_pool::instance(), run(db)).wait();

    db.reset();
    hedge::io::static_pool::instance()->shutdown();
}

What’s missing

HedgeDB is a prototype. Things that aren’t here yet:

  • Full-fledged crash recovery — WAL replay works, but partial-write and corrupted-file edge cases aren’t handled.

  • Battle-testing & hardening — never run against real-world workloads or for long execution periods.

  • Cross-platform support — Linux only (io_uring).

  • Block compression — many workloads would see meaningful size, space, and write-amplification reduction from lossless compression.

  • Batched operations — no batch put/get APIs to amortize call overhead.

  • Column families — single keyspace per database.

  • Large values support — if key.size() + value.size() (a bit less than 4KB) exceeds the index-block page size, the flush will break.

Future plans

  • Hyper-Clock Cache — approximate LRU Cache that trades counting precision for a faster algorithm — and works well with Direct I/O.

  • Key-value separation — SSTs would store keys + pointers, with values in separate append-only .vlog files; dramatically reduces value write-amplification during compaction (paired with a GC story).

  • Rate-limiting — soft-stall writes when L0 SSTs cross a threshold, smoothing long-tail latencies due to compaction backlog.

  • Rewrite in Rust? — 👀


Built and maintained by Federico Vaccaro. Questions, ideas, or war stories — open an issue or reach out via GitHub.

Last updated: 2026-05-10 · Version: prototype (v0.0.1)