My first contact with the ideas of formalised mathematics came from reading the anonymously authored *QED Manifesto* [Ano94[Ano94]Anonymous*The QED manifesto* (1994)Automated Deduction--CADE]In this thesis, brief details on citations appear in the sidebar, a full bibliography is provided at the end of the document. which envisions a 'QED system' in which all mathematical knowledge is stored in a single, computer-verified repository.
This idea dizzied me: perhaps review of mathematics will amount to remarking on style and interest, with checking of proofs performed automatically from a machine readable document.

The general term that I will use for software that works towards this vision is **proof assistant** or **Interactive Theorem Prover** ITP. A proof assistant at its most general is a piece of software that allows users to create and verify mathematical proofs. In Section 2.1 I will provide more detail how proof assistants are generally constructed.

In 2007, Freek Wiedijk [Wie07[Wie07]Wiedijk, Freek*The QED manifesto revisited* (2007)Studies in Logic, Grammar and Rhetoric] pronounced the QED project to have "not been a success (yet)", citing not enough people working on formalised mathematics and the severe differences between formalised and 'real' mathematics, both at a syntactic level (formalised mathematics resembles source code) and at a foundational level (formalised mathematics is usually constructive and procedural as opposed to classical and declarative).
Similarly, Alan Bundy [Bun11[Bun11]Bundy, Alan*Automated theorem provers: a practical tool for the working mathematician?* (2011)Annals of Mathematics and Artificial Intelligence] notes that although mathematicians have readily adopted computational tools such as $TE X$ [Knu86[Knu86]Knuth, Donald E.*The TeXbook* (1986)] and computer algebra systemsA computer algebra system (CAS) is a tool for symbolic manipulation of formulae and expressions, without necessarily having a formalised proof that the manipulation is sound. Examples of CASes include Maple and Mathematica., computer aided proving has had very little impact on the workflow of a working mathematician. Bundy cites several reasons for this which will be discussed in Section 1.1.

Now, a decade later, the tide may be turning.
In 2021, proof assistants are pretty good. There are several well-supported large-scale systems such as Isabellehttps://isabelle.in.tum.de/ [Pau89], Coqhttps://coq.inria.fr/ [Coq], Leanhttps://leanprover.github.io/ [MKA+15], HOL Lighthttps://www.cl.cam.ac.uk/~jrh13/hol-light/ [Har09], Agdahttps://wiki.portal.chalmers.se/agda/pmwiki.php [Nor08], Mizarhttp://mizar.org/ [GKN15], PVShttps://pvs.csl.sri.com/ [SORS01] and many more. These systems are used to define and prove mathematical facts in a variety of logics (e.g. FOL, HOL, CIC, univalent foundations). These systems are bridged to powerful, automated reasoning systems (e.g. Vampirehttps://vprover.github.io [RV02], Z3https://github.com/Z3Prover/z3 [MB08], Ehttps://www.eprover.org [SCV19] and Leo-IIIhttp://page.mi.fu-berlin.de/lex/leo3/ [SB18a]. Within these systems, many theorems big and small (4-colour theorem [Gon05], Feit-Thompson theorem [GAA+13], Kepler conjecture [HAB+17]) have been proved in a variety of fields, accompanied by large mathematical libraries (Isabelle's Archive of Formal Proofshttps://www.isa-afp.org/, Lean's mathlibhttps://github.com/leanprover-community/mathlib, Coq's *Mathematical Components*https://math-comp.github.io/, Mizar's *Formalized Mathematics*https://fm.mizar.org/) whose intersection with undergraduate and research level mathematics is steadily growingSee, for example, the rate of growth of the Lean 3 mathematical library..

However, in spite of these advances, we are still yet to see widespread adoption of ITP by mathematicians outside of some (growing) cliques of enthusiasts. In this thesis I wish to address this problem through engaging with how mathematicians use and understand proofs to create new ways of interacting with formalised proof. Let's first expand on the problem a little more and then use this to frame the research questions that I will tackle for the remainder of the thesis.

# 1.1. Mathematicians and proof assistants

Here I offer 3 possible explanations for why mathematicians have not adopted proof assistants. Many have commented on these before: Bundy [Bun11] summarises the main challenges well.

**1. Differing attitudes towards correctness and errors.**
Mathematicians don't worry about mistakes in the same way as proof assistants doI will present some evidence for this in Section 2.6..
Mathematicians care deeply about correctness, but historically the dynamics determining whether a result is considered to be true are also driven by sociological mechanisms such as peer-review; informal correspondences; 'folk' lemmas and principles; reputation of authors; and so on [MUP80[MUP80]de Millo, Richard A; Upton, Richard J; *et al.**Social processes and proofs of theorems and programs* (1980)The mathematical intelligencer].
A proxy for trustworthiness of a result is the number of other mathematicians that have scrutinized the work.
That is, if the proof is found on an undergraduate curriculum, you can predict with a high degree of confidence that any errors in the proof will be brought to the lecturer's attention.
In contrast, a standalone paper that has not yet been used for any subsequent work by others is typically treated with some degree of caution.

**2. High cost.**
Becoming proficient in an ITP system such as Isabelle or Coq can require a lot of time.
And then formalising an area of maths can take around ten times the amount of time required to write a corresponding paper or textbook on the topic.
This time quickly balloons if it is also necessary to write any underlying assumed knowledge of the topic (e.g., measure theory first requires real analysis).
This 'loss factor' of the space cost of developing a formalised proof over that of a natural language proof was first noted by de Bruijn in relation to his AUTOMATH prover [DeB80[DeB80]De Bruijn, Nicolaas Govert*A survey of the project AUTOMATH* (1980)To H.B.Curry: Essays on Combinatory Logic,Lambda Calculus and Formalism]. De Bruijn estimates a factor of 20 for AUTOMATH, and Wiedijk later estimates this factor to be closer to three or four in Mizarhttps://mizar.org [Wie00[Wie00]Wiedijk, Freek*The de Bruijn Factor* (2000)]. There are costs other than space too, the main one of concern here being the time to learn to use the tools and the amount of work required per proof.

**3. Low reward.**
What does a mathematician have to gain from formalising their research? In many cases, there is little to gain other than confirming something the researcher knew to be true anyway. The process of formalisation may bring to light 'bugs' in the work: perhaps there is a trivial case that wasn't accounted for or an assumption needs to be weakened.
Sometimes the reward *is* high enough that there is a clear case for formalisation, particularly when the proof involves some computer-generated component.
This is exemplified by Hales' proof [Hal05[Hal05]Hales, Thomas C*A proof of the Kepler conjecture* (2005)Annals of mathematics] and later formalised proof [HAB+17[HAB+17]Hales, Thomas C; Adams, Mark; *et al.**A formal proof of the Kepler conjecture* (2017)Forum of Mathematics, Pi] of the Kepler conjecture. The original proof involved lengthy computer generated steps that were difficult for humans to check, and so Hales led the *Flyspeck* project to formalise it, taking 21 collaborators around a decade to complete.
Another celebrated example is Gonthier's formalisation of the computer-generated proof of the four-colour theorem [Gon05[Gon05]Gonthier, Georges*A computer-checked proof of the four colour theorem* (2005)].
Formalisation is also used regularly in formalising expensive hardware and safety-critical computer software (e.g., [KEH+09[KEH+09]Klein, Gerwin; Elphinstone, Kevin; *et al.**seL4: Formal verification of an OS kernel* (2009)Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, Pau98[Pau98]Paulson, Lawrence C*The inductive approach to verifying cryptographic protocols* (1998)Journal of computer security]).

The economics of the matter are such that the gains of using ITP are too low compared to the benefits for the majority of cases. Indeed, since mathematicians have a different attitude to correctness, there are sometimes *no* benefits to formalisation.
As ITP developers, we can improve the situation by either decreasing the learning cost or increasing the utility.

How can we make ITP easier to learn?
One way is to teach it in undergraduate mathematics curricula (not just computer science).
An example of such a course is Massot's *Introduction aux mathématiques formaliséeshttps://www.imo.universite-paris-saclay.fr/~pmassot/enseignement/math114/* taught at the Université Paris Sud.
Another way is to improve the usability of the user interface for the proof assistant; I will consider this point in more detail in Section 2.5.

How can we increase the utility that mathematicians gain from using a proof assistant?
In this thesis I will argue that one way to help with these three issues is to put more emphasis on interactive theorem provers providing *explanations* rather than a mere guarantee of correctness.
We can see that explanations are important because mathematicians care about new proofs of old results that treat the problem in a new way.
*Proofs from the Book* [AZHE10[AZHE10]Aigner, Martin; Ziegler, Günter M; *et al.**Proofs from the Book* (2010)] catalogues some particularly lovely examples of this.

Can computers also provide informal proofs with more emphasis on explanations?
Gowers [Gow10[Gow10]Gowers, W. T.*Rough structure and classification* (2010)Visions in Mathematics §2] presents an imagined interaction between a mathematician and a proof assistant of the future.

Mathematician.Is the following true? Let $δ>0$. Then for $N$ sufficiently large, every set $A⊆{1,2,...,N}$ of size at least $δN$ contains a subset of the form ${a,a+d,a+2d}$?

Computer.Yes. If $A$ is non-empty, choose $a∈A$ and set $d=0$.

M.All right all right, but what if $d$ is not allowed to be zero?

C.Have you tried induction on $N$, with some $δ=δ(N)$ tending to zero?

M.That idea is no help at all. Give me some examples please.

C.The obvious greedy algorithm gives the set ${1,2,4,5,10,11,13,14,28,29,31...}$

An interesting feature of this conversation is that the status of the formal correctness of any of the statements conveyed by the computer is not mentioned.
Similar notions are brought to light in the work of Corneli *et al.* [CMM+17[CMM+17]Corneli, Joseph; Martin, Ursula; *et al.**Modelling the way mathematics is actually done* (2017)Proceedings of the 5th ACM SIGPLAN International Workshop on Functional Art, Music, Modeling, and Design] in their modelling of informal mathematical dialogues and exposition.

Why not have both explanatory and verified proofs? I suspect that if an ITP system is to be broadly adopted by mathematicians, it must concisely express theorems and their proofs in a way similar to that which a mathematician would communicate with fellow mathematicians. This not only requires constructing human-readable explanations, but also a reimagining of how the user can interact with the prover.

In this thesis, I will focus on problems that are considered 'routine' for a mathematician.
That is, problems that a mathematician would typically do 'on autopilot' or by 'following their nose' For example, showing that $(a+b)_{2}=a_{2}+2ab+b_{2}$ from the ring axioms..
I choose to focus on this class of problem because I believe it is an area where ITP could produce proofs that explain why they are true rather than merely provide a certificate of correctness. The typical workflow when faced with a problem like this is to either meticulously provide a low-level proof or apply automation such as Isabelle's `auto`

, or an automation orchestration tool such as Isabelle's Sledgehammer [BN10[BN10]Böhme, Sascha; Nipkow, Tobias*Sledgehammer: judgement day* (2010)International Joint Conference on Automated Reasoning]. In the case of using an automation tacticBroadly, a tactic is a program for creating proofs. I will drill down on how this works in Chapter 2. like `auto`

the tactic will either fail or succeed, leaving the user with little feedback on why the result is true.
There are some tools for producing intelligible proofs from formalised ones, for example, the creation of Isar [Wen99[Wen99]Wenzel, Makarius*Isar-a generic interpretative approach to readable formal proof documents* (1999)TPHOLs] proofs from Sledgehammer by Blanchette *et al.* [BBF+16[BBF+16]Blanchette, Jasmin Christian; Böhme, Sascha; *et al.**Semi-intelligible Isar proofs from machine-generated proofs* (2016)Journal of Automated Reasoning].
However, gaining an intuition for a proof will be easier if the proof is generated in a way that reflects how a human would solve the problem, and so translating a machine proof to a proof which a human will extract meaning from is an uphill battle.

# 1.2. Research questions

The general arc of this thesis is to help make ITP systems more appealing to mathematicians. In the context of this arc, there arise some key research questions that I seek to study.

**Question 1.** What constitutes a human-like, understandable proof?

Objectives:

Identify what 'human-like' and 'understandable' mean to different people.

Distinguish between human-like and machine-like proofs in the context of ITP.

Merge these strands to determine a working definition of human-like proof.

**Question 2.** How can human-like reasoning be represented within an interactive theorem prover to produce *formalised*, understandable proofs?

Objectives:

Form a calculus of representing goal states and inference steps that act at the abstraction layer that a human uses when solving proofs.

Create a system for producing natural language proofs from this calculus.

Evaluate the resulting system by performing a study on real mathematicians.

**Question 3.** How can this mode of human-like reasoning be presented to the user in an interactive, multimodal way?

Objectives:

Investigate new ways of interacting with proof objects.

Make it easier to create novel

*graphical user interfaces*(GUIs) for interactive theorem provers.Produce an interactive interface for a human-like reasoning system.

# 1.3. Contributions

This thesis presents a number of contributions towards the above research questions:

An abstract calculus for developing human-like proofs (Chapter 3).

An interface between this abstraction layer and a metavariable-driven tactic state, as is used in theorem provers such as Coq and Lean, producing formally verified proofs (Chapter 3 and Appendix A).

A procedure for generating natural language proofs from this calculus (Section 3.6).

The 'subtasks' algorithm, a system for automating the creation of chains of equalities and inequalities. This work has been published in [AGJ19[AGJ19]Ayers, E. W.; Gowers, W. T.;

*et al.**A human-oriented term rewriting system*(2019)KI] (Chapter 4).A graphical user interface framework for interactive theorem provers (Chapter 5). This has been published in [AJG21[AJG21]Ayers, E. W.; Jamnik, Mateja;

*et al.**A graphical user interface framework for formal verification*(2021)Interactive Theorem Proving].An implementation of all of the above contributions in the Lean 3 theorem prover. Link to source code coming soon.

A study assessing the impact of natural language proofs with practising mathematicians (Chapter 6).

# 1.4. Structure of this document

In Chapter 2, I will provide an overview of the background material needed for the remaining chapters. Next, in Chapter 3, I introduce the HumanProof software for producing human-like proofs within the Lean proof assistant. I provide motivation of the design in Section 3.1, an overview of the system in Section 3.2 and then dive in to the details of how the system is designed, including the natural-language generation engine in Section 3.6. Chapter 4 discusses a system for producing equational reasoning proofs called the subtask algorithm. Chapter 5 details the ProofWidgets system, which is used to produce the user interface of HumanProof. Chapter 6 provides the design and results of a user study that I conducted on mathematicians to determine whether HumanProof really does provide understandable proofs. Finally, Chapter 7 wraps things up with some reflection on my progress and a look ahead to future work.

There are also four appendices:

Appendix A presents some additional technical detail on interfacing HumanProof with Lean.

Appendix B is a tutorial for using ProofWidgets.

Appendix C is some additional detail on the algorithms used by ProofWidgets.

Appendix D provides supplementary material for Chapter 6.

# 1.5. Previously published work and collaboration

The work in Chapter 3 is my own, although the box calculus presented is inspired through many sessions of discussion with W.T. Gowers and the design of Gowers' previous collaboration with Ganesalingam [GG17[GG17]Ganesalingam, Mohan; Gowers, W. T.*A fully automatic theorem prover with human-style output* (2017)J. Automated Reasoning]. More on this will be given when it is surveyed in Section 2.7 and Section 3.3.5.

The work in Chapter 4 is previously published at KI 2019 [AGJ19[AGJ19]Ayers, E. W.; Gowers, W. T.; *et al.**A human-oriented term rewriting system* (2019)KI].

The work presented in Chapter 5 is pending publication in ITP 2021 [AJG21[AJG21]Ayers, E. W.; Jamnik, Mateja; *et al.**A graphical user interface framework for formal verification* (2021)Interactive Theorem Proving] and is also merged in to the Lean 3 community repositoryhttps://github.com/leanprover-community/lean. The design is strongly influenced by Elm and React; however, there are a number of novel architectural contributions necessitated by the unique challenges of implementing a portable framework within a proof assistant.

The user study presented in Chapter 6 is all my own work with a lot of advisory help from Mateja Jamnik, Gem Stapleton and Aaron Stockdill on designing the study.

# 1.6. Acknowledgements

I thank my supervisors W. T. Gowers and Mateja Jamnik for their ideas, encouragement and support and for letting me work on such a wacky topic.
I thank Gabriel Ebner and Brian Gin-ge Chen for reading through my ProofWidgets PRs. I thank Patrick Massot, Kevin Buzzard and the rest of the Lean Prover community for complaining about my PRs after the fact.
I thank Jeremy Avigad for taking the time to introduce me to Lean at the *Big Proof* conference back in 2017.
I thank Bohua Zhan, Chris Sangwin, and Makarius Wenzel and many more for the enlightening conversations on automation for mathematicians at *Big Proof* and beyond.
I thank Peter Koepke for being so generous in inviting me to Bonn to investigate Naproche/SAD with Steffan Frerix and Andrei Paskevich.
I thank Larry Paulson and the ALEXANDRIA team for letting me crash their weekly meetings.
I thank my parents for letting me write up in the house during lockdown.

I thank my friends and colleagues in the CMShttp://www.cms.cam.ac.uk. Andrew, Eric, Sammy P, Sven, Ferdia, Mithuna, Oliver×2, Kasia, Sam O-T, Bhavik, Wojciech, ... all kinds of people. In parallel, the CLhttps://www.cst.cam.ac.uk: Chaitanya, Botty, Duo, Daniel, Aaron, Angeliki, Yiannos, Wenda, Zoreh.

I thank everyone in the world.

This research was supported by EPSRC and the Cantab Capital Institute for the Mathematics of Informationhttps://www.ccimi.maths.cam.ac.uk/.

## 1.6.1. Typesetting acknowledgements

I decided to typeset this thesis as HTML-first, print second. The digital copy may be found at edayers.com There is no $TE X$ anywhere, and so I would first like to apologise to Donald Knuth. The printed version of this thesis was generated by printing out the website version and concatenating.

I was able to create the thesis in this way thanks to many open-source projects. I will acknowledge the main ones here. Reacthttps://reactjs.org/, Gatsbyhttps://www.gatsbyjs.org/, Tachyonshttps://tachyons.io, PrismJShttps://prismjs.com. Thanks to Titus Woormerhttps://github.com/wooorm for remarkJShttp://github.com/remarkjs and also adding my feature request in less than 24 hours! The code font is PragmataProhttps://fsd.it/pp created by Fabrizio Schiavi. The style of the site is a modified version of the Edward Tufte Handout stylehttps://github.com/edwardtufte/tufte-css. The syntax colouring style is based on the VS theme by Andrew Lockhttps://andrewlock.net. I also use some of the vscode-iconshttps://github.com/microsoft/vscode-icons icons.