April 9, 2024 · databases

tmpdb

tmpdb is a embedded key value store based off bitcask and go-caskdb written in NodeJS. The code can be found here.

Why

The main reason I worked on this is because I am interested in database internals and I wanted to work on a project related to that. I also thought this could be used in something like AWS Lambda as a cache – where it could be used like a hash map that stores the values on disk. It is not intended to be a real persistence layer since the Lambda instance would be torn down anyways.

AWS Lambda offers an ephemeral /tmp directory in the file system that could go up to 10GB. tmpdb might be able to take advantage of that to cache data on disk, as opposed to using an in memory Map to cache data.

Internals

The bitcask paper describes “a Log-Structured Hash Table for Fast Key/Value Data”. In a nutshell, it appends serialized key-value pairs to a file on disk, and then for an index, it uses an in memory hash map to map between the key and the location (offset) of the value data on disk.

data format

The storage format can be found in format.ts

It consists of a header and then the key and value pair, serialized to binary.

The header is the timestamp, key size, and value size.

So each record would look like this: Timestamp (4 bytes) , key size (4 bytes), value size (4 bytes), key, value

Then the KeyDir, or the in memory hash\map that maps the key to the offset of the value on disk looks like this: key: string → value (serialized to binary): timestamp (4 bytes), value size (4 bytes), value offset (4 bytes)

I did some rough tests and found serializing values to binary in an in memory hash map uses much less memory than to store the value as objects. Probably not a surprising result.

Next, I will describe how the data is written and retrieved. The code can be found in disk_store.ts

get

To get the value for a key, first we need to look up the value in the KeyDir. With that value, we will have the offset to locate the value in the file on disk. We also have the value size here, and that tells us how many bytes to read starting at the offset.

set

To set a value, we first encode the record into the appropriate format for tmpdb. Then we simply append this to the end of the database file on disk. If that is successful, we follow up by adding (or updating) the entry in the KeyDir.

setMany

In my tests, I found that writing lots of data could take a while. Perhaps someone wants to populate the cache with data from somewhere else, and they want to write lots of values quickly.

I tried filling up a buffer up to an arbitrary length, and then once the buffer hits that length, flushing the values to disk. I think it works fairly well, and I saw a good speed up. I also collected the KeyDir entries in an array, and then commited those entries to the KeyDir after flushing the buffer of records disk.

measurements

On my laptop, I saw write speeds consistent with what was reported in the paper; on average about 6 ms per record. Using the setMany I saw about 0.05 ms per record (the time to write the number of records / # of records). Reading a value took less than 0.1 ms.

I was curious to compare tmpdb to sqlite – if I used sqlite to store key value pairs. I created a table with a key column and value column, and then created a B-Tree index on the key column. I used Prisma ORM to query the sqlite database. I saw ~20 ms for writing and ~0.3 ms to read a value.

Conclusion

This was a fun, short project, and I learned several new things.