Links for January 2025
Sun Jan 26 2025, E.W.Ayers
I'm trying to get back into the habit of writing these links posts :-) Some of these are links from last year that I never posted.
an API for changes in data -- good stuff
Hot Module Replacement is Easy -- an explanation of how hot module reloading works in Vite.
an api for changes -- really good.
video: Deno 2 -- top quality video
Red knot is Astral's typechecker project! I am greatly looking forward to throwing pyright in the bin.
I found a new blog with loads of posts on Zig: Karl Seguin.
I got excited reading about automerge, reading their binary format spec gave me a good feel for how it works.
There is this dude Anselm Eickhoff who I followed ages ago because he made a city simulator, but now he's working on a web persistence layer Jazz. I am very excited by this next round of local-first tooling coming through. The best place to stay up to date with it is localfirst.fm.
In that same space we've also got PocketBase and Convex.
1. Tripping over the potholes in too many libraries
Tripping over the potholes in too many libraries. This is a controversial opinion that I hold about software development. There are too many libraries. I have some additional thoughts.
Lots of hackers will happily depend on huge numbers of libraries to avoid writing their own code. This is fine, but we should be more biased away from depending on random code from the internet. Taking on a new external code dependency isn’t a costless operation:
Libraries have bugs. And you can’t control when the fixes get released.
Libraries mutate uncontrollably. The reality of patch releases is they usually break things or introduce bugs.
The library you want doesn’t quite do what you want and you end up spending 5 hours writing workarounds and scrolling through library docs, when you could have written the code from scratch in 5 hours. If an abstraction isn’t working, go down not up.
Supply chain attacks are a real threat. A spectacular example was the xz backdoor. You implicitly trust the library writer at the same level that you trust a colleague pushing directly to your source.
IP issues (for use at your company): the explicit dep might be MIT. But child dependencies might have a more restrictive licence.
Each library adds package-manager bloat, particularly in the JavaScript ecosystem. Each dependency increases the friction of installing dependencies. It's only a matter of time before something halfway down the dependency graph becomes deprecated or has some weird, incomprehensible build step that breaks npm install.
Meanwhile
The code you actually want to do the task is usually ~100 lines, with lots of cruft around it to cater to the 50 other use cases that don’t matter to you.
LLMs already know the libraries’ source back to front (prompt: write a mini react-router library in one file). They already know how to write unit tests for your code.
If you learn how library code works, you can simplify it and make it work really well for your actual use case, instead of working around it.
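As a rough illustration of the "~100 lines" claim, here is a sketch of a tiny path router in Python. It is not react-router or any real library's code; MiniRouter, add and dispatch are made-up names, and it only handles simple {name} placeholders.

```python
import re


class MiniRouter:
    """Maps URL paths such as /users/{user_id} to handler functions."""

    def __init__(self):
        self._routes = []  # list of (compiled regex, handler) pairs

    def add(self, template, handler):
        # re.split with a capturing group alternates literal text and
        # placeholder names: "/users/{id}" -> ["/users/", "id", ""].
        pieces = re.split(r"\{(\w+)\}", template)
        regex = ""
        for i, piece in enumerate(pieces):
            if i % 2 == 0:
                regex += re.escape(piece)          # literal path text
            else:
                regex += f"(?P<{piece}>[^/]+)"     # one path segment
        self._routes.append((re.compile(f"^{regex}$"), handler))

    def dispatch(self, path):
        # First matching route wins; None means "no route matched".
        for pattern, handler in self._routes:
            match = pattern.match(path)
            if match:
                return handler(**match.groupdict())
        return None


router = MiniRouter()
router.add("/", lambda: "home")
router.add("/users/{user_id}", lambda user_id: f"user {user_id}")
print(router.dispatch("/users/42"))
```

Everything a general-purpose router adds on top of this (type converters, mounting, middleware) is the kind of cruft you only pay for if you actually need it.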
There are a few cases where you really do want a library
Anything to do with cryptography -- although you have to be very careful it's done properly.
Any implementation of a protocol such as HTTP, MessagePack, compression.
Don't write your own graphics or ui framework -- just use web technology.
Self-contained data structures, eg B-trees.
The mistake that library dependers are making is that each additional dependency multiplies the cost, since each library interacts with the others. You end up writing workarounds to get the library working in your specific circumstance, and then eventually the workarounds are stacked on top of each other to the point where maybe you should have just written the exact thing you wanted yourself in the first instance.
So when do you want a library? I think the rule I am going with is:
Write down a specific problem that I need to solve with a library
Look at the ecosystem and see what libraries are available.
Investigate the library:
Are GitHub issues being replied to?
When was the last commit?
When was the last release?
Are the library maintainers using static typing? (ie TypeScript or Python type annotations)
Is there more than autogenned documentation and a single tutorial?
Find the 'workhorse' source file in the library that is doing most of the work and make sure it is something you would be happy checking in to your source. For example in starlette, it's routing.py.
Are they OOP architecture astronauts? My check for this is whether there is dependency injection, but the only two items that are ever injected are the actual implementation and mocks for testing. It's ok to use design patterns like visitor etc.
Look at its dependencies: does it pull in half of the internet? Red flags are when it pulls in multiple libraries that do the same thing (eg both httpx and requests in python), or it depends on is-even.
Grudgingly take on the library as a dependency.
As soon as it breaks, or has missing features, or causes your CI to fail: rip it out, paste in the inlined 'workhorse' file and DIY. You can even copy in their unit tests!
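The dependency check in the list above can be partly automated. This is a minimal sketch using only the standard library; OVERLAPPING, MAX_DIRECT_DEPS, requirement_name, red_flags and check_installed are hypothetical names and thresholds chosen for illustration, and it only looks at direct dependencies of a package that is already installed.

```python
import re
from importlib import metadata

# Assumption: groups of packages that do the same job. Extend to taste.
OVERLAPPING = [{"httpx", "requests"}]
# Assumption: an arbitrary threshold for "pulls in half of the internet".
MAX_DIRECT_DEPS = 20


def requirement_name(requirement: str) -> str:
    """Extract the bare package name from a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0).lower().replace("_", "-") if match else ""


def red_flags(requirements: list[str]) -> list[str]:
    """Return human-readable warnings for a list of requirement strings."""
    names = {requirement_name(r) for r in requirements}
    flags = []
    for group in OVERLAPPING:
        overlap = sorted(names & group)
        if len(overlap) > 1:
            flags.append("overlapping: " + ", ".join(overlap))
    if len(names) > MAX_DIRECT_DEPS:
        flags.append(f"{len(names)} direct dependencies")
    return flags


def check_installed(package: str) -> list[str]:
    # metadata.requires returns None when no dependencies are declared,
    # and raises PackageNotFoundError if the package is not installed.
    return red_flags(metadata.requires(package) or [])


print(red_flags(["httpx>=0.24", "requests (>=2.0)", "click"]))
```

It says nothing about maintainer responsiveness or code quality; those checks still need a human reading the repository.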
A large number of libraries offer little benefit over writing your own:
They are a wrapper for HTTP requests or for CLI calls (which are often themselves wrapping HTTP requests):
python docker library
boto3 (AWS python client)
azure python client
They are <100 lines of code to replicate
tqdm
asyncio-stdlib
They are implementing a special case of recursion
pydantic, attrs, dacite = recursion on dataclasses
Harmful abstraction
SQLAlchemy and other ORMs -- just write SQL.
UI frameworks like React, Vue, etc. Just manipulate the DOM and use CSS. The whole point of the browser is it abstracts the ui for you already.
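To make the "recursion on dataclasses" point concrete, here is a minimal sketch of a dict-to-dataclass loader built on the standard library. from_dict, Point and Path are illustrative names, not the API of pydantic, attrs or dacite, and the sketch only handles nested dataclasses and list[T] fields with no validation.

```python
import dataclasses
from typing import Any, get_args, get_origin, get_type_hints


def from_dict(cls: Any, data: Any) -> Any:
    """Recursively build `cls` from plain dicts and lists."""
    if dataclasses.is_dataclass(cls):
        hints = get_type_hints(cls)
        kwargs = {
            f.name: from_dict(hints[f.name], data[f.name])
            for f in dataclasses.fields(cls)
            if f.name in data  # missing keys fall back to field defaults
        }
        return cls(**kwargs)
    if get_origin(cls) is list:
        (item_type,) = get_args(cls)
        return [from_dict(item_type, item) for item in data]
    return data  # leaf value: returned as-is, with no validation


@dataclasses.dataclass
class Point:
    x: int
    y: int


@dataclasses.dataclass
class Path:
    name: str
    points: list[Point]


path = from_dict(Path, {"name": "a", "points": [{"x": 1, "y": 2}]})
print(path)
```

The real libraries add validation, coercion, unions and error reporting on top, so this only replaces them if your data is already well-behaved.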
This is especially true now we have LLMs, which can copy the bit of library code that you need inline and tailor it to your exact requirements.
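For the "<100 lines to replicate" category, a progress bar in the spirit of tqdm can be sketched as a generator. This is not tqdm's implementation; progress and its parameters are names invented here, and it skips rate estimates, nesting and terminal-width detection.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def progress(
    items: Iterable[T],
    total: Optional[int] = None,
    width: int = 30,
    out: TextIO = sys.stderr,
) -> Iterator[T]:
    """Yield items unchanged while redrawing a one-line text progress bar."""
    if total is None and hasattr(items, "__len__"):
        total = len(items)  # type: ignore[arg-type]
    done = 0
    for item in items:
        yield item
        done += 1
        if total:
            filled = width * done // total
            bar = "#" * filled + "." * (width - filled)
            out.write(f"\r[{bar}] {done}/{total}")
        else:
            # Unknown length: fall back to a plain counter.
            out.write(f"\r{done} items")
        out.flush()
    out.write("\n")


for _ in progress(range(5), width=10):
    pass
```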
This whole section might all be NIH cope.
2. Video: Bet against SQL
Thesis: SQL is a bad abstraction
It's a declarative, logic programming language like Prolog. Prolog looks great in the tutorial, but the moment you try to do something complicated, it will slow to a crawl. And now you have to reason about how your Prolog query will get executed, which means you have to think carefully about what the compiler is doing. At this point the abstraction is in the way and is actively making your job harder. Same for SQL. The SQL query is a declaration of what data you want, and it is intended to be completely decoupled from the execution of that query. In practice, if you want to use SQL in a production setting, you need to have an intimate knowledge of the query planner. Query optimisation is a whole job category that fetches high/mid six-figure salaries at enterprises.
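The gap between the declared query and its execution is easy to see in SQLite, which ships with Python. In this sketch the same SELECT gets a different plan once an index exists; the users table, idx_users_email index and query_plan helper are made up for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row is the human-readable plan step.
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
query = "SELECT name FROM users WHERE email = ?"

# Same query text both times; only the schema changes in between.
before = query_plan(conn, query, ("a@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, query, ("a@example.com",))

print(before)  # a full scan of the users table
print(after)   # a search using idx_users_email
```

Nothing in the query text tells you which of the two you will get, which is the sense in which you end up reasoning about the planner rather than the declaration.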
An interesting note is query planners are incompatible with serverless dbs (planetscale, supabase, ...). Your costs are proportional to rows read, so the serverless provider is not incentivised to write a query planner that minimises table scans etc.
SQL's type system sucks. Clear failure. SQL is not modular and composable at all. Any isolation level less than serializable is just too hard for the programmer to think about.
What about NoSQL? It took off 10 years ago, but then everyone went back to relational. This is mainly because Mongo threw out loads of important stuff that relational dbs got right: transactions, relational models, schemas, consistency.
So what do they propose? Ah they are trying to sell yet another framework Convex. They don't give a lot of detail on how it works other than it uses a DynamoDB-style key-only querying thing. I'll check it out.
Cool papers they reference
3. Erg
I found another language called Erg. Someone shared Pylyzer which is a spinoff Python typechecker from this Erg project.
The syntax looks like a cross between F# and Python.
It looks like they have thought a great deal about the type system. Really complicated. People need to stop making complicated type systems.
They have a specific type for functions with side-effects (called 'procedures') that must be decorated with an exclamation mark.
There are some primitive procedures for mutating external state like print!.
You can define a mutable variable as x =! 0. Then mutating this state within a procedure will taint the procedure as mutating x.
So in this way a procedure will build up a catalogue of side-effects that it has like a good effect system.