Links for Feb and March

Sat Apr 13 2024E.W.Ayers

1. PostgreSQL

I read PostgreSQL Internals. Very well written. I have mad respect for Postgres. It is a good supplement to Pavlo's database course. Some features that are worth knowing about:

1.1. Techdebt in Postgres

However, there were lots of points reading it where you can see the pain from keeping decisions made in the 80s piling up:

The problem is you have N complicated components, which are tightly coupled. Then some fundamental issue is found with component i (eg row size limit), so a solution is duct-taped on in a backwards compatible manner. But then there are N² interactions from adding this duct-tape, so you have a cascade of mess throughout the entire architecture.

1.2. Postgres dreams

The main thing that frustrates me about PostgreSQL is there is almost a full-featured programming environment in there, but then people just write their business logic in Python, Javascript, Java and stick an API on the front. If Postgres had gone a bit further (support HTTP, better auth, better DX, maybe even html templating), you could stick it directly on the internet and have it as your backend without needing to wrap it in nonsense backend layer. The reasons why this isn't done seem to be:

I think that most of these bullet points could be addressed with a carefully designed extension to Postgres, but the reality is that making postgres extensions is extremely hard for all of the tech-debt reasons discussed above. There are loads of startups (see [/blog/dbs](my database roundup)) trying to solve this but they are all doing it by adding overcomplicated application-layer cruft on top of Postgres so they can sell it as a service.

There is a Figma blog post on 'scaling postgres'. It just feels like they are bending over backwards to keep using Postgres but it's not fit for purpose.

2. RocksDB

It's a key-value (KV) store database. Keys and values are just byte arrays. It is really cool. Reading about the internals feels lovely compared to Postgres. Basically all the hot new databases cash out as a fancy frontend on a KV store. Then they choose RocksDB for the store and it is fast.

Something about the levelled compaction mechanism captures my imagination. The way I see it: you can either store state as logs, where it's an append-only stream of actions (fast write, easy to version), or you can store it as a mutating table or state (fast query, hard to version). LST seems enlightened in that there is not a hard switch between logs and state, the logs are gradually converted to a sorted KV store through compaction. Since the blocks are immutable we can also keep snapshots without a lot of trouble.

Part of me still wants a 'time-travelling' DB where you can rewind and replay transactions to arbitrary points in time. However there is a performance cost to this which is unacceptable in real-world systems. You usually only want this for certain user-facing tables, and it's probably better to have the time-travel auditing managed at the application layer.

I like how explicit the docs are about the tradeoffs between read/write space/time. Compare with Postgres, where figuring out what the performance tradeoffs are is difficult because everything is wrapped in a query planner abstraction.

Oh yeah, also rocksdb is embedded, so you can just import it as a library rather than needing a ratsnest of dedicated processes like postgres.

Conclusion: next time I need a low-level persistence layer I'm choosing RocksDB.

3. Zig

I have been writing a GLR parser in Zig. It's just for fun. I was inspired to do so by this video about TreeSitter.

I'm starting to get into the flow of Zig. But there are still some things that I really miss:

good things: