This chapter is about the
subtask algorithm. This algorithm produces human-like equality proofs by representing equality reasoning problems using a hierarchy of subtasks.
The work in this chapter also appears as a paper published in KI 2019 [AGJ19[AGJ19]Ayers, E. W.; Gowers, W. T.; et al.A human-oriented term rewriting system (2019)KI].
The work was a collaboration with my PhD supervisors W.T. Gowers and Mateja Jamnik.
I'm going to keep using the first person to keep style with the rest of the document.
Equality and inequality chains are ubiquitous in mathematics.
Here, by equality chain, I mean parts of proofs found in mathematical literature that consist of a list of expressions separated by
= or some other transitive relation symbol.
The chains are chosen such that each pair of adjacent expressions are clearly equal to the reader, in the sense that the equality does not need to be explicitly justified. And hence, by transitivity, the chain shows that first expression is equal to the last one.
For example, take some vector space
V. Suppose that one wishes to prove that given a linear map
𝐴 : V ⟶ V, its adjoint
𝐴† : V* → V* (
V* denotes the dual spacehttps://en.wikipedia.org/wiki/Dual_space of
V) is linear.
To do so one typically provides the equality chain (4.1) for all vectors
𝑥 and all dual vectors
The equations that we can compose our reasoning chain from (e.g.,
⟨𝐴† 𝑎, 𝑏⟩ = ⟨𝑎, 𝐴 𝑏⟩) are called
For the example (4.1), there are a large number of axiomatic rewrite rules available (4.2) and still more rules derived from these.
A central part of automated theorem proving (ATP) is constructing equality proofs such as (4.1) from (4.2) automatically. This can be done with well-researched techniques from the field of term rewriting systems [BN98[BN98]Baader, Franz; Nipkow, TobiasTerm rewriting and all that (1998)]. These techniques take advantage of the fact that computers can perform many operations per second, and large search spaces can be explored quickly, though heuristic functions are still needed to prevent a combinatorial explosion. Many domains - such as checking that two expressions are equal using the ring axioms - also have specialised decision procedures available for them. I'll call these approaches to solving equalities machine-oriented; this contrasts with human-oriented as discussed in Section 2.7.
In accordance with the research goals of this thesis (Section 1.2), the purpose here is to investigate alternative, human-like ways of producing equality proofs. As motivated in Section 1.1, this serves the purpose of improving the usability of proof assistants by making the proofs generated more understandable (Section 2.6). The goal of this chapter is not to develop methods that compete with machine-oriented techniques to prove more theorems or prove them faster. Instead, I want to focus on the abstract reasoning that seems to occur in the mind of a human mathematician when they encounter an equality reasoning problem such as (4.1).
With this in mind, the goal of this chapter is to create an algorithm which:
can solve simple equality problems of the kind that an undergraduate might find easy;
does not encode any domain-specific knowledge of mathematics. That is, it does not invoke specialised procedures if it detects that the problem lies in a particular domain such as Presburger arithmetichttps://en.wikipedia.org/wiki/Presburger_arithmetic;
is efficient in the sense that it does not store a large state and does not perform a significant search when a human would not.
A side-effect of this approach is that such an approach may be more efficient than existing ATP methods. Typically, existing ATP methods do not scale well with the number of competing rules introduced, as one would expect of algorithms that make use of significant amounts of brute-force search. If we can devise new architectures that solve simple equalities with less search, then it may be possible to scale up these techniques to larger problems and improve the efficiency of established ATP methods.
This chapter presents the
subtask algorithm which has some success with respect to the above goals.
The algorithm is written in Lean 3 [MKA+15[MKA+15]de Moura, Leonardo; Kong, Soonho; et al.The Lean theorem prover (system description) (2015)International Conference on Automated Deduction] and can be found on GitHubhttps://github.com/EdAyers/lean-subtask.
In the remainder of the paper we give a motivating example (Section 4.2) followed by a description of the algorithm (Section 4.3).
The algorithm is then contrasted with existing approaches (Section 4.4) and evaluated against the above goals (Section 4.5).
Let us begin with the example (4.1) in elementary linear algebra mentioned above.
We have to solve the target of the
BoxSee Chapter 3 for more detail on
Box. (4.3) using some subset of the rewrite rules given in (4.2).
To do this, a human's (not fully conscious) thought process might proceed as follows.
I need to create the expression
⟨𝐴† 𝑢 + 𝐴† 𝑣, 𝑥⟩.
In particular, I need to make the subexpressions
𝐴† 𝑣. Let's focus on
The only sensible way I can get this is to use the definition
⟨𝑢, 𝐴 ?𝑧⟩ = ⟨𝐴† 𝑢, ?𝑧⟩, presumably with
?𝑧 = 𝑥.
In particular, I'll need to make the subterm
𝐴 ?𝑧for some
I can do that straight away:
⟨𝐴† (𝑢 + 𝑣), 𝑥⟩ = ⟨𝑢 + 𝑣, 𝐴 𝑥⟩.
Now I'm in a position to obtain the subexpression
⟨𝑢, 𝐴 𝑥⟩I wanted in step 3, so let me do that using bilinearity:
⟨𝑢 + 𝑣, 𝐴 𝑥⟩ = ⟨𝑢, 𝐴 𝑥⟩ + ⟨𝑣, 𝐴 𝑥⟩.
And now I can get the subexpression
𝐴† 𝑢I wanted even earlier in step 2, so let me do that:
⟨𝑢, 𝐴 𝑥⟩ + ⟨𝑣, 𝐴 𝑥⟩ = ⟨𝐴† 𝑢, 𝑥⟩ + ⟨𝑣, 𝐴 𝑥⟩.
In step 2 I also wanted to create
𝐴† 𝑣, which I can now get too:
⟨𝐴† 𝑢, 𝑥⟩ + ⟨𝑣, 𝐴 𝑥⟩ = ⟨𝐴† 𝑢, 𝑥⟩ + ⟨𝐴† 𝑣, 𝑥⟩.
And with one application of bilinearity I'm home:
⟨𝐴† 𝑢, 𝑥⟩ + ⟨𝐴† 𝑣, 𝑥⟩ = ⟨𝐴† 𝑢 + 𝐴† 𝑣, 𝑥⟩.
The key aspect of the above kind of thought process to model is the setting of intermediate aims, such as obtaining certain subexpressions when we do not immediately see how to obtain the entire expression. Let's do this by creating a tree of subtasks Figure 4.4.
The tree in Figure 4.4 represents what the algorithm does with the additivity-of-adjoint problem (4.3).
It starts with the subtask
create_all ⟨𝐴† 𝑢 + 𝐴† v, x⟩ at
Since it cannot achieve that in one application of an available rule, it creates a set of subtasks and then chooses the one that is most promising: later in Section 4.3, I will explain how it generates and evaluates possible subtasks.
In this case the most promising subtask is
create 𝐴† 𝑢, so it selects that in
② and identifies a rewrite rule - the definition of adjoint:
∀ 𝑤 𝑧, ⟨𝐴† 𝑤, 𝑧⟩ = ⟨𝑤, 𝐴 𝑧⟩ - that can achieve it; adding
use ⟨𝑢, 𝐴 ?𝑧⟩ = ⟨𝐴† 𝑢, ?𝑧⟩ to the tree at
?𝑧 that appears at
③ in Figure 4.4 is a metavariableThat is, a placeholder for an expression to be chosen later. See Section 2.4 for background information on metavariables. that will in due course be assigned to
Now the process repeats on
③, a set of subtasks are again created for the lhs of
⟨𝑢, 𝐴 ?𝑧⟩ = ⟨𝐴† 𝑢, ?𝑧⟩ and the subtask
create 𝐴 ?𝑧 is selected (
Now, there does exist a rewrite rule that will achieve
create 𝐴 ?𝑧:
⟨𝐴† (𝑢 + 𝑣), 𝑥⟩ = ⟨𝑢 + 𝑣, 𝐴 𝑥⟩, so this is applied and now the algorithm iterates back up the subtask tree, testing whether the new expression
⟨𝑢 + 𝑣, 𝐴 𝑥⟩ achieves any of the subtasks and whether any new rewrites rules can be used to achieve them.
Now let's see how we can design an algorithm that behaves according to these motivating principles.
4.3. Design Of The Algorithm
subtask algorithm acts on a tree of tasks (as depicted in Figure 4.4) and
an expression called the current expression (CE).
A task is any object which implements the following methods:
refine : Task → List Task
test : Task → Boolwhich returns true when the task is achieved for the current expression.
execute : Task → Unitwhich updates the current expression
𝑦by providing a proof of
𝑥 = 𝑦. Tasks with
executemethods are called strategies. In this case,
testreturns true when
executecan be applied successfully.
The main tasks are given in Table 4.5, however more are added in the software implementation.
reduce_distance (𝑥, 𝑦) will greedily apply any rewrite that causes
𝑦 to be closer in the parse tree.
The algorithm is summarised in the pseudocode in (4.6).
A table of some of the tasks that are used by the
subtasks algorithm. The tasks system is designed to be extensible so other tasks can be added.
|Returns a list of ||True whenever the CE is ||none|
|Returns a list ||True whenever the CE contains ||none|
|Returns a list of ||True whenever the rule ||Apply |
explore phase, we take a task
𝑋 on the tree and
refine it to produce a list of child tasks
𝐶₁, 𝐶₂ ⋯.
We add these to the task tree if they are not already present on it.
We then score the strategies
𝑆₁, 𝑆₂ ⋯ in this list - that is, score the children where
execute is defined.
The score is intended to represent the likelihood of the strategy being successful and is determined by heuristics discussed in Section 4.3.1.
The reason why the algorithm focusses on strategies before non-strategies is a heuristic that seems to work well in practice.
The underlying idea behind the heuristic is that often the first sensible strategy found is enough of a signpost to solve simple problems.
That is, once one has found one plausible strategy of solving a simple problem it is often fruitful to stop looking for other strategies which achieve the same thing and to get on with finding a way of performing the new strategy.
If the overall score is above zero then add a backtrack point and take the highest-scoring strategy
test 𝑆 is false then
𝑆 otherwise execute
𝑆 and ascend
𝑆's parents until a task
𝑌 is found that can not be achieved then
Otherwise if the overall score is less than or equal to zero then
explore a non-strategy child task of
𝑋 or backtrack if none exist.
The backtracking procedure works by taking the list of backtracking points and choosing the one with the highest overall score if the failed branch is removed.
𝑙 = 𝑟, the algorithm is initialised with the tree
create_all 𝑟 and the current expression
We then run
execute (create_all 𝑟) until a timeout is reached or we run out of backtracking points.
Both lists of strategies and individual strategies are scored using a heuristic to guide the exploration of the tree. The system prioritises strategies if they:
achieve a task higher in the task tree;
achieve a task on a different branch of the task tree;
have a high degree of term overlap with the current expression (this is measured using symbol counting and finding largest common subterms);
use local hypotheses;
can be achieved in one rewrite step from the current expression.
The overall score heuristic evaluates sets of strategies.
If there is only one strategy then it scores
If there are multiple strategies, it discards any scoring less than
If there are positive-scoring strategies then all negative-scoring strategies are discarded.
The overall score is then set to be
5 minus the number of strategies in the list.
The intention of this simple procedure is that we should prefer smaller sets of strategies, even if their scores are bad because it limits choice in what to do next.
4.4. Related Work
As mentioned in Section 4.1 there has been a substantial amount of research on the automation of solving equality chain problems over the last decade. Again, I recommend Term Rewriting And All That [BN98[BN98]Baader, Franz; Nipkow, TobiasTerm rewriting and all that (1998)] as an introduction to this topic and appropriate chapters of The Handbook of Automated Reasoning for a more in-depth reference.
The approach of the subtasks algorithm is to combine these rewriting techniques with a hierarchical search. As such there are a few different fields of related work which we may compare subtasks which are each addressed in the following subsections.
4.4.1. Term Rewriting
One way to find equality proofs is to perform a graph search using a heuristic.
This is the approach of the
rewrite-search algorithm by Hoek and Morrison [HM19[HM19]Hoek, Keeley; Morrison, Scottlean-rewrite-search GitHub repository (2019)], which uses the heuristic of string edit-distance between the strings' two pretty-printed expressions.
rewrite-search algorithm does capture some human-like properties in the heuristic, since the pretty printed expressions are intended for human consumption.
Our algorithm is different from
rewrite-search in that we guide search according to achieving sequences of tasks.
Since both our software and
rewrite-search are written in Lean, some future work could be to investigate a combination of both systems.
A term rewriting system (TRS)
𝑅 is a set of oriented rewrite rules.
There are many techniques available for turning a set of rewrite rules in to procedures that check whether two terms are equal.
One technique is completion, where
𝑅 is converted into an equivalent TRS
𝑅' that is convergent.
This means that any two expressions
𝑏 are equal under
𝑅 if and only if repeated application of rules in
𝑏 will produce the same expression.
Finding equivalent convergent systems, if not by hand, is usually done by finding decreasing orderings on terms and using Knuth-Bendix completion [KB83[KB83]Knuth, Donald E; Bendix, Peter BSimple word problems in universal algebras (1983)Automation of Reasoning].
When such a system exists, automated rewriting systems can use these techniques to quickly find proofs, but the proofs are often overly long and needlessly expand terms.
Another method is rewrite tables, where a lookup table of representatives for terms is stored in a way that allows for two terms to be matched through a series of lookups.
Both completion and rewrite tables can be considered machine-oriented because they rely on large datastructures and systematic applications of rewrite rules. Such methods are certainly highly useful, but they can hardly be said to capture the process by which humans reason.
Finally, there are many normalisation and decision procedures for particular domains, for example on rings [GM05[GM05]Grégoire, Benjamin; Mahboubi, AssiaProving equalities in a commutative ring done right in Coq (2005)International Conference on Theorem Proving in Higher Order Logics]. Domain specific procedures do not satisfy our criterion of generality.
4.4.2. Proof Planning
Background information on proof planning is covered in Section 2.7.2.
The subtasks algorithm employs a structure that is similar to a hierarchical task network (HTN) [Sac74[Sac74]Sacerdoti, Earl DPlanning in a hierarchy of abstraction spaces (1974)Artificial intelligence, Tat77[Tat77]Tate, AustinGenerating project networks (1977)IJCAI, MS99[MS99]Melis, Erica; Siekmann, JörgKnowledge-based proof planning (1999)Artificial Intelligence]. The general idea of a hierarchical task network is to break a given abstract task (e.g. "exit the room") in to a sequence of subtasks ("find a door" then "go to door" then "walk through the door") which may themselves be recursively divided into subtasks ("walk through the door" may have a subtask of "open the door" which may in turn have "grasp doorhandle" until bottoming out with a ground actuation such as "move index finger 10°"). This approach has found use for example in the ICARUS robotics architecture [CL18[CL18]Choi, Dongkyu; Langley, PatEvolution of the ICARUS cognitive architecture (2018)Cognitive Systems Research, LCT08[LCT08]Langley, Pat; Choi, Dongkyu; et al.Icarus user’s manual (2008)]. HTNs have also found use in proof planning [MS99[MS99]Melis, Erica; Siekmann, JörgKnowledge-based proof planning (1999)Artificial Intelligence].
The main difference between the approach used in the subtasks algorithm and proof planning and hierarchical task networks is that the subtasks algorithm is greedier: the subtasks algorithm generates enough of a plan to have little doubt what the first rewrite rule in the sequence should be, and no more. I believe that this reflects how humans reason for solving simple problems: favouring just enough planning to decide on a good first step, and then planning further only once the step is completed and new information is revealed.
A difference between HTNs and subtasks is that the the chains of subtasks do not necessarily reach a ground subtask (for subtasks this is a rewrite step that can be performed immediately). This means that the subtasks algorithm needs to use heuristics to determine whether it is appropriate to explore a subtask tree or not instead of relying on the task hierarchy eventually terminating with ground tasks. The subtasks algorithm also inherits all of the problems found in hierarchical planning: the main one being finding heuristics for determining whether a subtask should be abandoned or refined further. The heuristics given in Section 4.3.1 help with this but there are plenty more ideas from the hierarchical task planning literature that could be incorporated also. Of particular interest for me are the applications of hierarchical techniques from the field of reinforcement learningA good introductory text to modern reinforcement learning is Reinforcement Learning; An Introduction by Sutton and Barto [SB18b]. Readers wishing to learn more about hierarchical reinforcement learning may find this survey article by Flet-Berliac to be a good jumping-off point [Fle19].[SB18b]Sutton, Richard S; Barto, Andrew GReinforcement learning: An introduction (2018)[Fle19]Flet-Berliac, YannisThe Promise of Hierarchical Reinforcement Learning (2019)The Gradient.
Our ultimate motivation is to make an algorithm that behaves as a human mathematician would. We do not wish to claim that we have fully achieved this, but we can evaluate our algorithm with respect to some general goals that we mentioned in Section 4.1.
Scope: can it solve simple equations?
Generality: does it avoid techniques specific to a particular area of mathematics?
Reduced search space: does the algorithm avoid search when finding proofs that humans can find easily without search?
Straightforwardness of proofs: for easy problems, does it give a proof that an experienced human mathematician might give?
Our method of evaluation is to use the algorithm implemented as a tactic in Lean on a library of thirty or so example problems. This is not large enough for a substantial quantitative comparison with existing methods, but we can still investigate some properties of the algorithm. The source code also contains many examples which are outside the abilities of the current implementation of the algorithm. Some ways to address these issues are discussed in Section 4.6.
Table 4.7 gives some selected examples. These are all problems that the algorithm can solve with no backtracking.
subtask's performance on some example problems. Steps gives the number of rewrite steps in the final proof. Location gives the file and declaration name of the example in the source code.
From this Table 4.7 we can see that the algorithm solves problems from several different domains. We did not encode any decision procedures for monoids or rings.
In fact we did not even include reasoning under associativity and commutativity, although we are not in principle against extending the algorithm to do this.
The input to the algorithm is simply a list of over 100 axioms and equations for sets, rings, groups and vector spaces which can be found in the file
equate.lean in the source code. Thus, the algorithm exhibits considerable generality.
All of the solutions to the above examples are found without backtracking, which adds support to the claim that our algorithm requires less search. There are other examples in the source where backtracking occurs, so there is still some work to be done on choosing scoring heuristics here.
Our final criterion is that the proofs are more straightforward than those produced by machine-oriented special purpose tactics. This is a somewhat subjective measure but there are some proxies that indicate that
subtasks can be used to generate simpler proofs.
To illustrate this point, consider the problem of proving
(𝑥 + 𝑦)² + (𝑥 + 𝑧)² = (𝑧 + 𝑥)² + (𝑦 + 𝑥)² within ring theory.
We choose this example because it is easy for a human to spot how to do it with three uses of commutativity, but it is easy for a program to be led astray by expanding the squares.
subtask proves this equality with 3 uses of commutativity and with no backtracking or expansion of the squares.
This is an example where domain specific tactics do worse than
ring tactic for reasoning on problems in commutative rings will produce a proof by expanding out the squares.
The built-in tactics
blast in Lean which reason under associativity and commutativity both use commutativity 5 times.
If one is simply interested in verification, then such a result is perfectly acceptable.
However, we are primarily interested in modelling how humans would solve such an equality, so we want our algorithm not to perform unnecessary steps such as this.
It is difficult to fairly compare the speed of
subtask in the current implementation because it is compiled to Lean bytecode which is much slower than native built-in tactics that are written in C++.
However it is worth noting that, even with this handicap,
subtask takes 1900ms to find the above proof whereas
blast take 600ms and 900ms respectively.
There are still proofs generated by
subtask that are not straightforward.
For example, the lemma
(𝑥 * 𝑧) * (𝑧⁻¹ * 𝑦) = 𝑥 * 𝑦 in group theory is proved by
subtask with a superfluous use of the rule
e = 𝑥 * 𝑥⁻¹. We hope that some of these defects will be ironed out in future versions of the program.
4.6. Conclusions and Further Work
In this paper, we introduced a new, task-based approach to finding equalities in proofs and provided a demonstration of the approach by building the
subtask tactic in Lean.
We show that the approach can solve simple equality proofs with very little search.
I hope that our work will renew interest in proof planning and spark interest in human-oriented reasoning for at least some classes of problems.
In future work, we wish to add more subtasks and better heuristics for scoring them. The framework we outlined here allows for easy experimentation with such different sets of heuristics and subtasks. In this way, we also wish to make the subtask framework extensible by users, so that they may add their own custom subtasks and scoring functions.
There are times when the algorithm fails and needs guidance from the user.
We wish to study further how the subtask paradigm might be used to enable more human-friendly interactivity than is currently possible.
For example, in real mathematical textbooks, if an equality step is not obvious, a relevant lemma will be mentioned.
Similarly, we wish to investigate ways of passing 'hint' subtasks to the tactic.
For example, when proving
𝑥 * 𝑦 = (𝑥 * 𝑧) * (𝑧⁻¹ * 𝑦), the algorithm will typically get stuck (although it can solve the flipped problem), because there are too many ways of creating
However, the user - upon seeing
subtask get stuck - could steer the algorithm with a suggested subtask such as
create (𝑥 * (𝑧 * 𝑧⁻¹)).
Using subtasks should help to give better explanations to the user. The idea of our algorithm is that the first set of strategies in the tree roughly corresponds to the high-level actions that a human would first consider when trying to solve the problem. Thus, the algorithm could use the subtask hierarchy to determine when no further explanation is needed and thereby generate abbreviated proofs of a kind that might be found in mathematical textbooks.
Another potential area to explore is to perform an evaluation survey where students are asked to determine whether an equality proof was generated by our software or a machine.
Bibliography for this chapter
- [AGJ19]Ayers, E. W.; Gowers, W. T.; Jamnik, MatejaA human-oriented term rewriting system (2019)KIpages 76--86organization Springerview online
- [BN98]Baader, Franz; Nipkow, TobiasTerm rewriting and all that (1998)publisher Cambridge University Pressisbn 978-0-521-45520-6
- [CL18]Choi, Dongkyu; Langley, PatEvolution of the ICARUS cognitive architecture (2018)Cognitive Systems Researchvolume 48pages 25--38publisher Elsevier
- [Fle19]Flet-Berliac, YannisThe Promise of Hierarchical Reinforcement Learning (2019)The Gradientview online
- [GM05]Grégoire, Benjamin; Mahboubi, AssiaProving equalities in a commutative ring done right in Coq (2005)International Conference on Theorem Proving in Higher Order Logicspages 98--113organization Springerview online
- [HM19]Hoek, Keeley; Morrison, Scottlean-rewrite-search GitHub repository (2019)view online
- [KB83]Knuth, Donald E; Bendix, Peter BSimple word problems in universal algebras (1983)Automation of Reasoningpages 342--376publisher Springerview online
- [LCT08]Langley, Pat; Choi, Dongkyu; Trivedi, NishantIcarus user’s manual (2008)institution Institute for the Study of Learning and Expertiseview online
- [MKA+15]de Moura, Leonardo; Kong, Soonho; Avigad, Jeremy; Van Doorn, Floris; von Raumer, JakobThe Lean theorem prover (system description) (2015)International Conference on Automated Deductionpages 378--388organization Springerview online
- [MS99]Melis, Erica; Siekmann, JörgKnowledge-based proof planning (1999)Artificial Intelligencevolume 115number 1pages 65--105publisher Elsevierview online
- [SB18b]Sutton, Richard S; Barto, Andrew GReinforcement learning: An introduction (2018)publisher MIT pressview online
- [Sac74]Sacerdoti, Earl DPlanning in a hierarchy of abstraction spaces (1974)Artificial intelligencevolume 5number 2pages 115--135publisher Elsevierdoi 10.1016/0004-3702(74)90026-5view online
- [Tat77]Tate, AustinGenerating project networks (1977)IJCAIpages 888--893organization Morgan Kaufmann Publishers Inc.view online