E.W.Ayers
Chapter 7
Conclusion

We made it.

7.1. Revisiting the research questions

Let's review the research questions that I set out to answer in Section 1.2 and outline the contributions that I have made towards them.

7.1.1. What constitutes a human-like, understandable proof?

Identify what 'human-like' and 'understandable' mean to different people.

In Section 2.6, I investigated the work of other mathematicians and mathematics educators on the question of what it means to understand a proof of a theorem. This review took me from educational research to the works of Spinoza, but yielded little in the way of explicit answers to this question. Perhaps this is to be expected: what makes a proof understandable means different things to different people. In Chapter 6, I asked some practicing mathematics students at the University of Cambridge what features of a proof made it understandable to them. The participants remarked that a proof being understandable is a function of numerous factors: providing the intuition or motivation of a proof first; signposting the purpose of the various sections of a proof; and providing the right level of detail. One thing that was frequently stressed, however, was that the syntax and notation of a proof play only a minor role in how understandable it is: unfamiliar syntax hinders understanding only temporarily and can be overcome by becoming familiar with the notation.

Distinguish between human-like and machine-like in the context of ITP. A similar review was undertaken in Section 2.7 of what constitutes 'human-like' reasoning. The topic has received attention from the earliest efforts to create proof assistants and automated theorem provers up to the present day. My conclusion from this review is that 'human-like' is best understood as referring to a general approach to ATP algorithm design, in contrast to 'machine-like'. Human-like proving techniques emphasise reasoning methods that are compatible with how humans reason, in the sense that the resulting proof is intelligible to a human. Before the 1980s, for example, Robinson's resolution was the dominant architecture of provers [BG01]. However, as noted by Bledsoe [Ble81], repeated application of the resolution rule (A ∨ B), (¬A ∨ C) ⊢ B ∨ C can hardly be called 'human-like', and such a proof would not be found in a mathematical textbook. Since human-like is defined more in terms of what it is not, there is a wide variety of approaches which may all be described as human-like: proof planning, diagrammatic reasoning, and graphical discourse models. However, I chose to focus on human-like logical reasoning and on modelling how an undergraduate may approach writing a proof.
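To illustrate the contrast, here is the resolution rule stated and proved in Lean 3 (an illustrative example of my own, not drawn from the provers above): even this single step unfolds into a case analysis that a textbook would never spell out, let alone a long chain of such steps.

```lean
-- The resolution rule as a Lean 3 lemma: from A ∨ B and ¬A ∨ C we may
-- conclude B ∨ C. The proof term is a case analysis on both disjuncts.
example (A B C : Prop) (h₁ : A ∨ B) (h₂ : ¬A ∨ C) : B ∨ C :=
h₁.elim
  (λ ha, h₂.elim (λ hna, absurd ha hna) or.inr)  -- case A: resolve against ¬A ∨ C
  or.inl                                         -- case B: done immediately
```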

Merge these strands to determine a working definition of human-like. In Section 3.1, I decided that I would deem the design of the system human-like if its reasoning was similar enough to that of humans that it could produce natural language write-ups convincing to mathematicians. I also restricted myself to elementary 'follow your nose' proofs.

7.1.2. How can human-like reasoning be represented within an interactive theorem prover to produce formalised, understandable proofs?

Form a calculus for representing goal states and inference steps that acts at the abstraction layer that a human uses when solving proofs. In Chapter 3 I detail a development calculus created for the purpose of representing human-like goal states in accordance with the working definition given in Section 7.1.1. The calculus (defined in Section 3.3.2) makes use of a hierarchical proof-state structure that also incrementally constructs a formal proof term in dependent type theory. The calculus is compared with other designs (Section 3.3.5), the closest of which are Gowers and Ganesalingam's Robot prover [GG17] and McBride's OLEG [McB00]. I then provided a set of moves for manipulating these proof states and proved that these moves are sound (Section 3.4 and Appendix A).
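For flavour, here is a drastically simplified, illustrative approximation in Lean 3 of what such a hierarchical proof-state structure can look like; this is not the actual definition of Section 3.3.2, whose boxes carry considerably more structure.

```lean
-- An illustrative approximation (not the Chapter 3 definition) of a
-- hierarchical proof state: goals and hypotheses form a tree, and
-- solving a sub-box contributes a fragment of the final proof term.
meta inductive box
| done : box                        -- nothing left to prove
| goal (target : expr) : box        -- an open goal with the given target
| conj (left right : box) : box     -- two independent sub-boxes
| hyp (h : expr) (rest : box) : box -- a hypothesis scoped over rest
```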

In Chapter 4, I also introduce a new algorithm for creating human-like equational reasoning proofs, developed in collaboration with W.T. Gowers and Mateja Jamnik.

Create a system for also producing natural language proofs from this calculus. The component that performs this is detailed in Section 3.6.

Evaluate the resulting system by performing a study on real mathematicians. This was discussed in Chapter 6.

I made significant progress towards this research goal; however, the solution that I have implemented stumbles when given harder examples, both in terms of automation and in the write-ups becoming progressively clunkier as they grow in size. The implementation as it stands also does not extend to more difficult domains where some detail must be hidden in the name of brevity. I outline some specific solutions to these issues in Section 7.2. In the end, I chose to focus less on extending the automation of moves beyond what was available in Robot, and instead focussed on subtasks (Chapter 4) and on interactive theorem proving through a graphical user interface.

I believe that these defects could be fixed with more research; however, one has to ask whether such a human-research-intensive approach is a good long-term solution. This question becomes particularly salient when faced with the advent of large-scale deep-learning language models:

Very recently, we are starting to see applications of attention-based models [VSP+17] (also known as transformers) to the problem of predicting human-written proofs of mathematics, with promising results: Li et al. direct transformer models towards predicting steps in Isabelle/Isar proofs [Wen99] (see Section 2.7.1) [LYWP21]. See also Lample and Charton's work on applying transformers to algebraic reasoning and integration [LC20]. Some work in this space that I have been involved with is with Han, Rute, Wu and Polu [HRW+21] on training GPT-3 [BMR+20] to predict Lean 3 tactics. The success of this approach strongly suggests that deep learning methods will play a critical role in the future of human-like automated reasoning. Through the use of statistical learning, the nuances of generating natural language and determining a precise criterion for what counts as 'human-like' can be avoided by simply providing a corpus of examples of human-like reasoning. Deep learning models are notorious for being data-hungry, and so many questions remain on how the data will be robustly extracted from our mathematical texts and formal proof archives. Perhaps few-shot techniques (see [BMR+20]) will help here. The research touched on above indicates that this method is not incompatible with also producing formalised proofs, although some care will need to be taken to ensure that the formal proofs and the human-readable accounts correspond correctly to each other.
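To make the tactic-prediction setup concrete, the following is a much-simplified, hypothetical sketch of the shape of a training example; the actual serialisation format used in [HRW+21] differs in its details.

```lean
-- A simplified, hypothetical sketch of a tactic-prediction datapoint
-- in the spirit of [HRW+21]: the language model is shown a
-- pretty-printed goal state and is trained to emit the next tactic.
def sample_datapoint : string × string :=
("n : ℕ ⊢ n + 0 = n",  -- prompt: the pretty-printed goal state
 "simp")               -- completion: the tactic a human wrote here
```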

7.1.3. How can this mode of human-like reasoning be presented to the user in an interactive, multimodal way?

Investigate new ways of interacting with proof objects. The result of working on this subgoal was the interactive expression engine of the ProofWidgets framework, as discussed in Section 5.4. This system follows a long history of research on 'proof-by-pointing' starting with Bertot and Théry [BT98], and my approach mainly follows similar work in other systems, for example that found in KeY [ABB+16]. My approach is unique in coupling the implementation of proof-by-pointing with the general-purpose ProofWidgets framework.

Make it easier to create novel GUIs for interactive theorem provers. This was the primary mission of Chapter 5. As noted in Section 2.5, there are many existing GUI systems used to create user interfaces for interactive theorem proving. In Chapter 5 I contribute an alternative paradigm for creating user interfaces, in which the metalanguage of the prover itself is used to build the interface (a minimal example is sketched below). The ProofWidgets system as implemented in Lean 3 is already in use today.
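As a flavour of this paradigm, here is a minimal 'hello world' widget written against the Lean 3 ProofWidgets API as I remember it; the exact signatures may differ slightly from the released version.

```lean
open widget

-- A minimal stateless component. Because the interface is an ordinary
-- metalanguage value, it can be composed, abstracted over and passed
-- around like any other Lean 3 program.
meta def hello : component tactic_state empty :=
component.stateless (λ _, [h "div" [] ["hello, world"]])

-- Render the component in the editor's information view.
#html hello
```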

Produce an interactive interface for a human-like reasoning system. In Section 5.7, I connected the ProofWidgets framework to the Box data structure to create an interactive, formalised, human-like proof assistant. There are still plenty of things to do, but as a prototype it serves to achieve the research goal.

I regard this research goal as achieved. As mentioned, there are many more implementation improvements that could be made, and I suggest some of them in Section 7.2.

I hope that this work will be viewed as a modern revival of the spirit and approach taken by the older, proof-planning-centric provers such as LΩUI for Ωmega [SHB+99, BCF+97] and XBarnacle for CLAM [LD97]. This spirit was to make proof assistants accessible to a wider userbase through the use of multimodal user interfaces that could represent proofs in many different ways. I want to rekindle some of this optimism that better user interfaces can lead to wider adoption of ITP.

7.2. Future work and closing remarks

There are still many things that I want to do with HumanProof and many avenues left to investigate in the world of human-like automated reasoning. Some of the more technical and chapter-specific ideas for future work are covered in their respective chapters (Section 3.7, Section 4.6, Section 5.8, Section 6.8); in this section I restrict my attention to future research directions in a broader sense.

In this thesis, the purpose of the natural language generator was to demonstrate that the system was human-like. However, one question that arises from the evaluation study in Chapter 6 is whether natural language generation is useful for creating accessible ITPs. Participants were generally willing to learn a new syntax and language for mathematics (although note that this may be due to sample selection bias; see Section 6.7), which suggests that the main hurdle to adoption is not the use of a technical language. An additional focus group or study investigating whether natural language proofs play a role in the ease of use of a theorem prover would help determine whether natural language generation of mathematics should be pursued in the future.

The evaluation showed that mathematicians value signposting (as discussed in Section 6.6.1, signposting here means an indication of how the proof is going to progress without actually performing any inference), motivation, and the right level of detail. I did not focus on these aspects as much in the design of HumanProof. Is there a way of automating these more expository aspects of human-written proofs? Determining the right level of exposition has a subjective, audience-specific component to it; however, I suspect that it is still possible to make some progress in this direction: the gap in comprehensibility between a human-written proof and a generated proof for any non-trivial example is undeniable. Rather than trying to build an 'expert system' for determining the right level of exposition, I think that the right approach is to use modern machine learning methods, as touched on in Section 7.1.2.

7.2.1. Closing remarks

Thank you for reading my thesis. HumanProof is not production-ready software, and so I would not recommend using it for day-to-day formalisation. I do hope that HumanProof will provide some ideas and inspiration for how the theorem provers of the future are designed.

Bibliography for this chapter

  • [ABB+16] Ahrendt, Wolfgang; Beckert, Bernhard; Bubel, Richard; Hähnle, Reiner; Schmitt, Peter H.; Ulbrich, Mattias. Deductive Software Verification - The KeY Book (2016). Volume 10001. Springer. doi:10.1007/978-3-319-49812-6. ISBN 978-3-319-49811-9.
  • [BCF+97] Benzmüller, Christoph; Cheikhrouhou, Lassaad; Fehrer, Detlef; Fiedler, Armin; Huang, Xiaorong; Kerber, Manfred; Kohlhase, Michael; Konrad, Karsten; Meier, Andreas; Melis, Erica; Schaarschmidt, Wolf; Siekmann, Jörg H.; Sorge, Volker. Omega: Towards a Mathematical Assistant (1997). Automated Deduction - CADE-14, volume 1249, pages 252--255. Editor: McCune, William. Springer. doi:10.1007/3-540-63104-6_23.
  • [BG01] Bachmair, Leo; Ganzinger, Harald. Resolution theorem proving (2001). Handbook of Automated Reasoning, pages 19--99. Elsevier.
  • [BMR+20] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario. Language Models are Few-Shot Learners (2020). NeurIPS.
  • [BT98] Bertot, Yves; Théry, Laurent. A generic approach to building user interfaces for theorem provers (1998). Journal of Symbolic Computation, volume 25, number 2, pages 161--194. Elsevier.
  • [Ble81] Bledsoe, Woodrow W. Non-resolution theorem proving (1981). Readings in Artificial Intelligence, pages 91--108. Elsevier.
  • [GG17] Ganesalingam, Mohan; Gowers, W. T. A fully automatic theorem prover with human-style output (2017). J. Automated Reasoning, volume 58, number 2, pages 253--291. doi:10.1007/s10817-016-9377-1.
  • [HRW+21] Han, Jesse Michael; Rute, Jason; Wu, Yuhuai; Ayers, Edward W.; Polu, Stanislas. Proof Artifact Co-training for Theorem Proving with Language Models (2021). arXiv preprint arXiv:2102.06203.
  • [LC20] Lample, Guillaume; Charton, François. Deep Learning For Symbolic Mathematics (2020). ICLR. OpenReview.net.
  • [LD97] Lowe, Helen; Duncan, David. XBarnacle: Making theorem provers more accessible (1997). Automated Deduction—CADE-14, pages 404--407. Springer.
  • [LYWP21] Li, Wenda; Yu, Lei; Wu, Yuhuai; Paulson, Lawrence C. IsarStep: a Benchmark for High-level Mathematical Reasoning (2021). International Conference on Learning Representations.
  • [McB00] McBride, Conor. Dependently typed functional programs and their proofs (2000). PhD thesis, University of Edinburgh.
  • [SHB+99] Siekmann, Jörg; Hess, Stephan; Benzmüller, Christoph; Cheikhrouhou, Lassaad; Fiedler, Armin; Horacek, Helmut; Kohlhase, Michael; Konrad, Karsten; Meier, Andreas; Melis, Erica; Pollet, Martin; Sorge, Volker. LOUI: Lovely OMEGA user interface (1999). Formal Aspects of Computing, volume 11, number 3, pages 326--342. Springer.
  • [VSP+17] Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia. Attention is All you Need (2017). Neural Information Processing Systems, pages 5998--6008.
  • [Wen99] Wenzel, Makarius. Isar - a generic interpretative approach to readable formal proof documents (1999). TPHOLs, volume 99, pages 167--184. Springer.