Links for January 2025

Sun Jan 26 2025, E.W.Ayers

I'm trying to get back into the habit of writing these links posts :-) Some of these are links from last year that I never posted.

1. Tripping over the potholes in too many libraries

Tripping over the potholes in too many libraries. This is a controversial opinion that I hold about software development. There are too many libraries. I have some additional thoughts.

Lots of hackers will happily depend on huge numbers of libraries to avoid writing their own code. This is fine, but we should be more biased against depending on random code from the internet. Taking on a new external code dependency isn’t a costless operation:

Meanwhile

There are a few cases where you really do want a library

The mistake that library dependers make is overlooking that each additional dependency multiplies the cost, since each library interacts with the others. You end up writing workarounds to get the library working in your specific circumstance, and eventually the workarounds are stacked on top of each other to the point where maybe you should have just written the exact thing you wanted yourself in the first place.

So when do you want a library? I think the rule I am going with is:

  1. Write down a specific problem that I need to solve with a library.

  2. Look at the ecosystem and see what libraries are available.

  3. Investigate the library:

    • Are GitHub issues being replied to?

    • When was the last commit?

    • When was the last release?

    • Are the library maintainers using static typing? (i.e. TypeScript or Python type annotations)

    • Is there more than autogenned documentation and a single tutorial?

    • Find the 'workhorse' source file in the library that is doing most of the work and make sure it is something you would be happy checking in to your source. For example in starlette, it's routing.py.

    • Are they OOP architecture astronauts? My check for this is whether there is dependency injection, but the only two items that are ever injected are the actual implementation and mocks for testing. It's ok to use design patterns like visitor etc.

    • Look at its dependencies: does it pull in half of the internet? Red flags are when it pulls in multiple libraries that do the same thing (e.g. both httpx and requests in Python), or when it depends on is-even.

  4. Grudgingly take on the library as a dependency.

  5. As soon as it breaks, or has missing features, or causes your CI to fail: rip it out, paste in the inlined 'workhorse' file and DIY. You can even copy in their unit tests!
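The "multiple libraries that do the same thing" red flag from step 3 is easy to check mechanically. Here is a minimal sketch, assuming a hand-written table of overlapping libraries (OVERLAP_GROUPS, requirement_name and red_flags are all names I made up for illustration, and the groups are nowhere near a complete list):

```python
import re
from importlib import metadata

# Hypothetical groups of libraries that do roughly the same job.
# This table is an illustration, not an authoritative list.
OVERLAP_GROUPS = {
    "http client": {"requests", "httpx", "aiohttp"},
    "json": {"orjson", "ujson", "simplejson"},
}


def requirement_name(requirement: str) -> str:
    """Pull the project name out of a requirement string like 'httpx>=0.23'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0).lower().replace("_", "-") if match else ""


def red_flags(requirements: list[str]) -> dict[str, list[str]]:
    """Return each purpose for which more than one overlapping library is required."""
    names = {requirement_name(r) for r in requirements}
    flags = {}
    for purpose, group in OVERLAP_GROUPS.items():
        overlap = sorted(names & group)
        if len(overlap) > 1:
            flags[purpose] = overlap
    return flags


def red_flags_for_installed(package: str) -> dict[str, list[str]]:
    """Same check, run against the declared requirements of an installed package."""
    # metadata.requires returns None when the package declares no requirements.
    return red_flags(metadata.requires(package) or [])
```

Calling red_flags(["httpx>=0.23", "requests (>=2.0)", "click"]) reports both HTTP clients under "http client". This only looks at direct dependencies; the transitive ones are usually where half of the internet comes from.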

A large number of libraries offer little benefit over writing your own:

This is especially true now that we have LLMs, which can copy the bit of library code that you need inline and tailor it to your exact requirements.

This whole section might all be NIH cope.

2. Video: Bet against SQL

Bet Against SQL

Thesis: SQL is a bad abstraction

It's a declarative, logic programming language like Prolog. Prolog looks great in the tutorial, but the moment you try to do something complicated, it will slow to a crawl. Now you have to reason about how your Prolog query will get executed, which means you have to think carefully about what the compiler is doing. At this point the abstraction is in the way and is actively making your job harder. Same for SQL. The SQL query is a declaration of what data you want, and it is intended to be completely decoupled from the execution of that query. But if you want to use SQL in a production setting, you need to have an intimate knowledge of the query planner. Query optimisation is a whole job category that fetches mid-to-high six-figure salaries at enterprises.
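A small way to see the leak for yourself: the sketch below runs the same query text twice and asks the planner what it intends to do each time. It uses SQLite only because it ships with Python (it is not what the video discusses), and the users table, the idx_users_email index and the query_plan helper are all made up for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it intends to execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is a human-readable description of one plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

sql = "SELECT name FROM users WHERE email = ?"
params = ("user42@example.com",)

# Without an index the planner has no choice but to scan the whole table.
plan_before = query_plan(conn, sql, params)
result_before = conn.execute(sql, params).fetchall()

# The query text stays identical; only the physical schema changes.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, sql, params)
result_after = conn.execute(sql, params).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
```

The declaration and its result are the same in both runs, but the first plan is a full table scan and the second is an index search. Nothing in the SQL itself tells you which one you are getting, which is the point: you end up reading planner output anyway.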

An interesting note is that query planners sit badly with serverless DBs (PlanetScale, Supabase, ...). Your costs are proportional to rows read, so the serverless provider is not incentivised to write a query planner that minimises table scans etc.

SQL's type system sucks. Clear failure. SQL is not modular and composable at all. Any isolation level less than serializable is just too hard for the programmer to think about.

What about NoSQL? It took off 10 years ago, but then everyone went back to relational. This is mainly because Mongo threw out loads of important stuff that relational dbs got right: transactions, relational models, schemas, consistency.

So what do they propose? Ah, they are trying to sell yet another framework: Convex. They don't give a lot of detail on how it works, other than that it uses a DynamoDB-style key-only querying thing. I'll check it out.

Cool papers they reference

3. Erg

I found another language called Erg. Someone shared Pylyzer which is a spinoff Python typechecker from this Erg project.

They have a specific type for functions with side-effects (called 'procedures'), whose names must be marked with an exclamation mark. There are some primitive procedures for mutating external state, like print!. You can define a mutable variable as x =! 0. Mutating this state within a procedure then taints the procedure as mutating x. In this way a procedure builds up a catalogue of the side-effects it has, like a good effect system.