My dream citation manager
Mon Aug 21 2023
E.W.Ayers
Here is a list of features that I want for a reference manager.
Any time you are browsing Google Scholar, arXiv, etc., you can add papers to your library (Paperpile supports this).
Web pages are also valid documents.
Given a paper, automatically determine links to supplementary material, reviews, talks, images and videos, project webpages, Twitter threads, etc.
papers are decompiled to HTML so that you can read them properly on a screen (e.g. see ar5iv)
headings, figures and paragraphs are assigned unique ids so you can reference them
Tables can easily be extracted to csv or dataframe
in-document links are handled properly
figures are ideally stored as SVG; otherwise the canonical image file is sought out.
math, code and other formatting are stored properly with semantics, not as janky images etc.
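One way the unique ids for headings, figures and paragraphs could work is sketched below. This is only an illustration under my own assumptions: the `Block` type, the `assign_ids` helper and the "kind plus short content hash" id scheme are all hypothetical, not something an existing tool is known to do.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical unit of a decompiled paper."""
    kind: str  # "heading", "figure" or "paragraph"
    text: str
    id: str = ""


def assign_ids(blocks):
    """Give each block an id made of its kind and a short hash of its text.

    Identical blocks get a numeric suffix so every id stays unique.
    """
    seen = {}
    for block in blocks:
        digest = hashlib.sha1(block.text.encode("utf-8")).hexdigest()[:8]
        base = f"{block.kind}-{digest}"
        count = seen.get(base, 0)
        seen[base] = count + 1
        block.id = base if count == 0 else f"{base}-{count}"
    return blocks
```

Because the id depends on the content rather than the position, a reference to a paragraph would survive reordering, at the cost of changing whenever the text itself is edited.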
Citations:
each citation is replaced with a direct link to the paper cited.
bibliography is replaced with a properly resolved list of citations
each citation appears as a 'card' with info like author, citations, versions, canonical URL, 'save to library' button.
see papers, webpages, tweets that cite this document.
usual features for exporting to BibTeX or whatever.
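As a rough illustration of the export feature, here is a minimal sketch that renders a resolved citation as a BibTeX entry. The `to_bibtex` helper is hypothetical and the entry in the usage lines is a made-up placeholder; a real exporter would also need to escape special characters.

```python
def to_bibtex(key, entry_type, fields):
    """Render one citation as a BibTeX entry string (no escaping is done)."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder entry, purely for illustration.
example = to_bibtex(
    "example2023",
    "article",
    {"title": "An Example Paper", "author": "A. Author", "year": "2023"},
)
```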
Versioning:
easily see all versions of a document and what changed.
Much better object identity resolution for papers. Even Google Scholar does this badly sometimes.
Preprints and published versions should be stored as different versions of the same document.
PDFs, source and HTML versions are stored together.
each document artefact is identified by its SHA-1.
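The versioning model above might look something like the following sketch. The `Document`, `Version` and `Artefact` names and their fields are my own assumptions about how this could be laid out; the only part taken from the list is that versions group under one document and each artefact is identified by its SHA-1.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Artefact:
    media_type: str  # e.g. "application/pdf", "text/html"
    sha1: str        # hex digest of the artefact's bytes


@dataclass
class Version:
    label: str  # e.g. "preprint v1", "published"
    artefacts: list = field(default_factory=list)


@dataclass
class Document:
    title: str
    versions: list = field(default_factory=list)


def make_artefact(media_type, content: bytes) -> Artefact:
    """Identify an artefact by the SHA-1 of its content."""
    return Artefact(media_type, hashlib.sha1(content).hexdigest())
```

With this shape, a preprint and its published version are two `Version` entries on the same `Document`, and the PDF, source and HTML of each version sit side by side in `artefacts`.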
Personal Annotations:
Highlight text
Add inline comments (shown in a sidebar)
Summary document (shown side-by-side or block-inline)
full markdown and math supported.
user can add tags (coloured, etc.)
AI summaries, tagging.
Community annotations (optional)
Users can help maintain metadata, versioning and formatting issues for each document.
Each document eventually has a community-driven QA, reviews, summary, links and wiki discussion on it.
Users can rate papers, save 'playlists' of papers etc.
See frequently highlighted parts of the paper.
It's not primarily a social-media-for-academic-papers tool, but it's all there if needed.
There is a 'stream' page which gives you all the latest papers from a variety of news sources: arXiv with filters, OpenReview, papers tweeted by channels that you follow, and authors you follow.
reading state management:
I want a to-read queue, custom status fields like in Notion.
I want to know stats like whether I have seen it and how long I've looked at it etc.
reading history (like in Google Chrome)
I can bookmark points in papers
If I open a paper, it automatically scrolls to the point I was last looking at (this is infuriatingly absent in most browsers and PDF readers).
tabs, tab trees, tab groups, tab windows, etc.
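The reading-state features above could hang off a small per-document record. The sketch below is hypothetical throughout: `ReadingState`, `ReadingLog` and the status names are invented for illustration, and the position is assumed to be the id of whichever block was last on screen.

```python
from dataclasses import dataclass, field


@dataclass
class ReadingState:
    status: str = "to-read"      # custom statuses would extend this
    seconds_viewed: float = 0.0  # total time spent looking at the paper
    last_position: str = ""      # id of the block last on screen
    bookmarks: list = field(default_factory=list)


class ReadingLog:
    def __init__(self):
        self.states = {}   # doc_id -> ReadingState
        self.history = []  # (timestamp, doc_id), oldest first

    def record_view(self, doc_id, position, seconds, timestamp):
        """Update stats, remember where the reader stopped, log the visit."""
        state = self.states.setdefault(doc_id, ReadingState())
        state.seconds_viewed += seconds
        state.last_position = position
        if state.status == "to-read":
            state.status = "in-progress"
        self.history.append((timestamp, doc_id))

    def resume_position(self, doc_id):
        """Where to scroll to when the document is reopened ("" if unseen)."""
        state = self.states.get(doc_id)
        return state.last_position if state else ""
```

A viewer would call `record_view` as the reader scrolls and `resume_position` when the paper is opened again.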
storage:
there is a managed cloud service that stores all the metadata.
But there are copyright issues with storing content on this server, so you can also maintain a local store of the content, and only content hashes are stored on the cloud instance. (This is what Paperpile does: the content is stored on the user's Google Drive to avoid the copyright issue.)
People can run their own content-addressed servers that users can plug into (at their own legal risk) if they can't access the official source.
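To make the split between cloud metadata and local content concrete, here is a minimal sketch of a content-addressed local store. `LocalContentStore` and `cloud_record` are hypothetical names, the directory layout is an assumption, and SHA-1 is used only because it is the hash named in the versioning list.

```python
import hashlib
import tempfile
from pathlib import Path


class LocalContentStore:
    """Keeps file contents on local disk, keyed by their SHA-1 digest."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, content: bytes) -> str:
        digest = hashlib.sha1(content).hexdigest()
        (self.root / digest).write_bytes(content)
        return digest

    def get(self, digest: str):
        path = self.root / digest
        return path.read_bytes() if path.exists() else None


def cloud_record(title, digests):
    """What the managed service would hold: metadata and hashes, no content."""
    return {"title": title, "content_hashes": list(digests)}


# Usage: the content stays local and only its hash goes into the cloud record.
with tempfile.TemporaryDirectory() as tmp:
    store = LocalContentStore(tmp)
    digest = store.put(b"example paper bytes")
    record = cloud_record("An example paper", [digest])
    fetched = store.get(digest)
    missing = store.get("0" * 40)
```

A self-hosted content server would expose the same `get`-by-hash lookup over the network, so the client could fall back to it when the local store has no copy.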
Galaxy-brain move: do all of the above, but with all web pages, and get it to replace your browser (at least for consuming content, not for webapps). Ditto with books.
My understanding is that the Arc browser is trying to do this, but it never stuck with me. I really want the focus to be on having a library of static content rather than dynamic pages.