# Performance
HedgeDB vs RocksDB on the same machine, same workload, same harness.
## Setup
| Parameter | Value |
|---|---|
| CPU | 13th Gen Intel i7-13700H (14 cores / 20 threads) |
| RAM | 32 GB DDR5 |
| Storage | Samsung 980 Pro 1TB NVMe |
| Records | 100M, 24-byte keys, 100-byte values (~12 GB raw) |
| Key space | uniformly-distributed random |
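As a quick check on the Setup figures, the snippet below derives the raw dataset size from the record count and the key and value sizes. It is plain arithmetic with no HedgeDB or RocksDB code, and the constant and function names are illustrative. 100M × 124 B is 12.4 GB (decimal), or about 11.5 GiB. That is small enough to fit in the machine's 32 GB of RAM. Because both engines run with O_DIRECT (described below), reads bypass the OS page cache and have to reach the SSD.

```cpp
#include <cassert>
#include <cstdint>

// Dataset parameters from the Setup table (illustrative constant names).
constexpr std::uint64_t kRecords = 100'000'000;  // 100M records
constexpr std::uint64_t kKeyBytes = 24;
constexpr std::uint64_t kValueBytes = 100;

// Raw (logical) size: key + value bytes per record. Ignores WAL, SST
// block/index overhead and any per-record encoding cost.
constexpr std::uint64_t raw_dataset_bytes(std::uint64_t records,
                                          std::uint64_t key_bytes,
                                          std::uint64_t value_bytes) {
    return records * (key_bytes + value_bytes);
}

constexpr std::uint64_t kRawBytes =
    raw_dataset_bytes(kRecords, kKeyBytes, kValueBytes);

// 100M x 124 B = 12.4e9 bytes (12.4 GB), i.e. about 11.5 GiB.
static_assert(kRawBytes == 12'400'000'000ULL, "unexpected raw dataset size");
```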
Both RocksDB and HedgeDB were tested in O_DIRECT I/O mode, with 12 workload threads plus 8 background threads (for flush and compaction). The 20 threads in total match the test CPU's 20 hardware threads (6 P-cores with SMT plus 4+4 E-cores).
In the HedgeDB benchmarks, operations are submitted through the TooManyCooks coroutine-based thread pool. In the RocksDB benchmarks, they are submitted directly from plain std::thread workers.
RocksDB was tested with Universal Compaction (size-tiered) and 1 GB of cache, with pin_l0_filter_and_index_blocks_in_cache enabled.
RocksDB was configured to match HedgeDB's feature set as closely as possible. For the exact configuration, see src/benchtool/utils.cc and rocksdb/benchtool.cc.
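The throughput figures below are aggregate operations per second across all workload threads. To illustrate the plain std::thread submission model used for RocksDB, here is a minimal harness sketch. It is hypothetical and is not the code in rocksdb/benchtool.cc. `run_threaded_benchmark` and `BenchResult` are names invented for this example, and `op` stands in for a single put or get call.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch.
struct BenchResult {
    std::uint64_t total_ops;
    double seconds;
    double ops_per_sec;
};

// Hypothetical harness: splits `total_ops` operations across `num_threads`
// plain std::thread workers and reports aggregate throughput.
// `op(thread_id, op_index)` is a placeholder for one engine call.
BenchResult run_threaded_benchmark(
        unsigned num_threads, std::uint64_t total_ops,
        const std::function<void(unsigned, std::uint64_t)>& op) {
    std::atomic<std::uint64_t> completed{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Static partitioning: thread t handles indices t, t+N, t+2N, ...
            std::uint64_t local = 0;
            for (std::uint64_t i = t; i < total_ops; i += num_threads) {
                op(t, i);
                ++local;
            }
            completed.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();  // join() makes all counts visible
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    const std::uint64_t done = completed.load();
    const double secs = elapsed.count();
    return {done, secs, secs > 0 ? done / secs : 0.0};
}
```

For example, `run_threaded_benchmark(12, 100'000'000, op)` mirrors the 12 workload threads. The background flush and compaction threads would be configured in the engine, not in this loop.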
## Throughput
| Workload | HedgeDB | RocksDB | HedgeDB / RocksDB |
|---|---|---|---|
| Load (100M puts) | 3.97M ops/s | 1.14M ops/s | 3.5× |
| Load + compactions drained | 3.59M ops/s | 1.13M ops/s | 3.2× |
| Read (100M random gets) | 1.03M ops/s | 194K ops/s | 5.3× |
| Mixed 50/50 read-write | 1.33M ops/s | 262K ops/s | 5.1× |
## Latency
### Read (read-only workload)
HedgeDB’s per-request read latency is higher than RocksDB’s, even though its throughput is 5.3× higher. This is the expected tradeoff of the batching model. Each thread runs its own io_uring ring at queue depth 16 (QD16), which keeps multiple I/O requests in flight at once. More requests in flight means higher aggregate throughput, but each individual request spends more time waiting in the queue. See the io_uring queue-depth section below for a direct QD8 vs QD16 comparison.
| Percentile | HedgeDB | RocksDB |
|---|---|---|
| avg | 185 µs | 60 µs |
| p50 | 155 µs | 61 µs |
| p90 | 298 µs | 112 µs |
| p99 | 632 µs | 198 µs |
| p99.9 | 1.05 ms | 295 µs |
### Write (memtable insert + WAL append)
| Percentile | HedgeDB | RocksDB |
|---|---|---|
| avg | 2.73 µs | 10.28 µs |
| p50 | 2.0 µs | 9.5 µs |
| p99 | 6.0 µs | 17.0 µs |
| p99.9 | 23.5 µs | 25.5 µs |
### Read latency under the mixed workload
| Percentile | HedgeDB | RocksDB |
|---|---|---|
| avg | 285 µs | 84 µs |
| p50 | 237 µs | 72 µs |
| p90 | 430 µs | 136 µs |
| p99 | 1.09 ms | 281 µs |
## Range scans
| Range size (keys) | Metric | HedgeDB | RocksDB | HedgeDB / RocksDB |
|---|---|---|---|---|
| Small (1–100) | scans/s | 87.5K | 26.3K | 3.3× |
| Small (1–100) | keys/s | 4.38M | 1.32M | 3.3× |
| Medium (512–1024) | scans/s | 24.9K | 6.7K | 3.7× |
| Medium (512–1024) | keys/s | 19.2M | 5.12M | 3.7× |
| Large (114K–131K) | scans/s | 240 | 192 | 1.25× |
| Large (114K–131K) | keys/s | 29.5M | 23.7M | 1.25× |
Small and medium scans favor HedgeDB by ~3.3–3.7×. Very large scans converge to 1.25×. At that range size, both engines are bottlenecked by sequential SSD bandwidth rather than by the index structure.
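The two metrics in the range-scan table are linked by the average scan length. Dividing keys/s by scans/s gives the mean number of keys per scan. For HedgeDB this comes to about 50, 771 and 122.9K, each near the midpoint of its range. Multiplying keys/s by the 124-byte record size gives a logical data rate. The sketch below does this arithmetic. The function names are illustrative. The byte rate ignores on-disk block, index and metadata overhead, so it is not the raw device read rate.

```cpp
#include <cassert>

// Size of one benchmark record: 24-byte key + 100-byte value.
constexpr double kRecordBytes = 24.0 + 100.0;

// Mean number of keys returned per scan, implied by the two table metrics.
constexpr double keys_per_scan(double keys_per_sec, double scans_per_sec) {
    return keys_per_sec / scans_per_sec;
}

// Logical key+value bytes delivered per second. Actual bytes read from the
// SSD will differ (block framing, index/filter blocks, etc.).
constexpr double logical_bytes_per_sec(double keys_per_sec) {
    return keys_per_sec * kRecordBytes;
}
```

For the large-scan rows, this gives roughly 3.7 GB/s of logical key-value data for HedgeDB (29.5M keys/s) and 2.9 GB/s for RocksDB (23.7M keys/s).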
## Memory (peak RSS)
| Workload | HedgeDB | RocksDB |
|---|---|---|
| Load (100M puts) | 1.53 GB | 1.03 GB |
| Read (100M gets) | 455 MB | 1.30 GB |
| Range scans | 633 MB | 1.30 GB |
| Mixed 50/50 read-write | 1.82 GB | 1.89 GB |
HedgeDB uses more memory during load because the memtable holds pending writes until they are flushed to SSTs. On the read path it is significantly lighter. The SST index cache is demand-filled and shares nothing with the OS page cache (all reads go through O_DIRECT), so memory usage tracks the actual working set rather than page-cache accumulation.
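To make the "demand-filled" behavior concrete, here is a minimal sketch of a cache that loads an entry only on first access. Its resident size therefore grows with the set of distinct keys actually touched. The sketch is hypothetical. `DemandFilledCache` is an invented name, and the loader callback stands in for an O_DIRECT read of an index block. LRU eviction with a fixed entry capacity is an assumption of this sketch, not a description of HedgeDB's cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical demand-filled cache with (assumed) LRU eviction.
class DemandFilledCache {
public:
    using Key = std::uint64_t;
    using Loader = std::function<std::string(Key)>;

    DemandFilledCache(std::size_t capacity, Loader loader)
        : capacity_(capacity), loader_(std::move(loader)) {}

    // Returns the cached value, loading it on a miss. The reference stays
    // valid until the entry is evicted; copy it if you need it longer.
    const std::string& get(Key key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            lru_.splice(lru_.begin(), lru_, it->second);
            return it->second->second;
        }
        // Miss: evict the least recently used entry if full, then load.
        ++misses_;
        if (!lru_.empty() && lru_.size() >= capacity_) {
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(key, loader_(key));
        index_[key] = lru_.begin();
        return lru_.front().second;
    }

    std::size_t size() const { return lru_.size(); }
    std::size_t misses() const { return misses_; }

private:
    using Entries = std::list<std::pair<Key, std::string>>;
    std::size_t capacity_;
    Loader loader_;
    Entries lru_;                                       // front = most recent
    std::unordered_map<Key, Entries::iterator> index_;  // key -> list node
    std::size_t misses_ = 0;
};
```

Nothing is loaded until a key is requested, so a read workload that touches only part of the key space keeps only that part resident, up to the capacity bound.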
## io_uring queue-depth effect on read latencies
All the tests above were run with each thread-local io_uring instance configured with a queue depth of 16.
For latency-sensitive workloads, the io_uring queue depth can be lowered while still maintaining high bandwidth utilization.
Let’s see what happens if we reduce the QD to 8:
| Measurement | HedgeDB QD8 | HedgeDB QD16 | RocksDB |
|---|---|---|---|
| Throughput (reads/s) | 881K | 1.03M | 193K |
| avg | 108 µs | 185 µs | 60 µs |
| p50 | 99 µs | 155 µs | 61 µs |
| p90 | 153.5 µs | 298 µs | 112 µs |
| p99 | 237.5 µs | 632 µs | 198 µs |
| p99.9 | 331.5 µs | 1025 µs | 295 µs |
At QD8, HedgeDB no longer maximizes the device bandwidth, and throughput is 14.5% below the QD16 figure. Latencies, however, improve substantially. The average drops from 185 µs to 108 µs, and p99 drops from 632 µs to 237.5 µs (about 62% lower). HedgeDB now behaves much closer to RocksDB while still delivering about 4.6× its throughput. This shows that it can be tuned for latency-sensitive scenarios as well.
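The QD8 and QD16 numbers are consistent with Little's law, L = λW. Suppose each of the 12 workload threads keeps its ring of depth QD full. Then about 12 × QD requests are in flight, so throughput ≈ 12 × QD / average latency. The sketch below assumes every ring stays full, which is an idealization. `littles_law_throughput` is an illustrative name.

```cpp
#include <cassert>

// Little's law: L = lambda * W  =>  lambda = L / W.
//   L = requests in flight = threads * per-thread io_uring queue depth
//       (idealized: assumes every ring is always full)
//   W = mean per-request latency, converted from microseconds to seconds
// Returns the predicted throughput in requests per second.
constexpr double littles_law_throughput(int threads, int queue_depth,
                                        double avg_latency_us) {
    return static_cast<double>(threads) * queue_depth /
           (avg_latency_us * 1e-6);
}
```

With QD16 and a 185 µs average, this predicts about 1.04M reads/s (measured: 1.03M). With QD8 and 108 µs, it predicts about 889K (measured: 881K). Halving the queue depth also cuts the average latency, so throughput drops by much less than half.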
Q: Did you try RocksDB’s MultiGet? It even supports io_uring!
A: I did try it, but it gave no meaningful throughput gain, only higher latencies.
## Conclusions
The results show that HedgeDB's multi-core, NVMe-aware architecture delivers the intended gains.
Writes achieve more than 3× RocksDB's throughput (3.2–3.5× on load) with lower latencies. This comes from the high degree of parallelism, fast synchronization structures and the per-thread WAL.
Random reads can finally saturate the NVMe bandwidth thanks to the io_uring integration. However, the maximum throughput comes at the cost of higher per-request latency, which can be reduced by lowering the queue depth.
Short and medium range scan workloads are IOPS-bound, and this is where the asynchronous architecture shines the most.
Long range scans are bandwidth-intensive rather than IOPS-intensive, so the concurrent model is less of a differentiator.
## Reproducing
The benchtool and rocksdb_benchtool binaries that produced these
numbers live in src/benchtool* in the repo. See
Getting started for the build steps and CLI flags.