This chapter is about the `subtask` algorithm. This algorithm produces human-like equality proofs by representing equality reasoning problems using a hierarchy of subtasks.
The work in this chapter also appears as a paper published in KI 2019 [AGJ19].
The work was a collaboration with my PhD supervisors W. T. Gowers and Mateja Jamnik.
As in the rest of the document, I continue to write in the first person.

# 4.1. Introduction

Equality and inequality chains are ubiquitous in mathematics.
Here, by **equality chain**, I mean a part of a proof found in the mathematical literature that consists of a list of expressions separated by `=` or some other transitive relation symbol.
The chains are chosen such that each pair of adjacent expressions is clearly equal to the reader, in the sense that the equality does not need to be explicitly justified. Hence, by transitivity, the chain shows that the first expression is equal to the last one.
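In Lean 3, such a chain can be written directly with `calc`, which stitches the adjacent equalities together by transitivity. A toy illustration (a sketch; `add_comm` and `mul_comm` are core lemmas):

```lean
-- Each link is a single, easily justified rewrite; `calc` composes
-- the links by transitivity into one equality proof.
example (a b c : ℕ) : (a + b) * c = c * (b + a) :=
calc (a + b) * c = (b + a) * c : by rw add_comm a b
             ... = c * (b + a) : by rw mul_comm
```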

For example, take some vector space `V`. Suppose that one wishes to prove that, given a linear map `A : V ⟶ V`, its adjoint `A† : V* ⟶ V*` (where `V*` denotes the [dual space](https://en.wikipedia.org/wiki/Dual_space) of `V`) is linear.
To do so one typically provides the equality chain (4.1) for all vectors `x` and all dual vectors `u`, `v`.

The equations from which we can compose our reasoning chain (e.g., `⟨A† a, b⟩ = ⟨a, A b⟩`) are called **rewrite rules**.
For the example (4.1), there are a large number of axiomatic rewrite rules available (4.2) and still more rules derived from these.

A central part of automated theorem proving (ATP) is constructing equality proofs such as (4.1) from (4.2) automatically.
This can be done with well-researched techniques from the field of **term rewriting systems** [BN98].
These techniques take advantage of the fact that computers can perform many operations per second, and large search spaces can be explored quickly, though heuristic functions are still needed to prevent a combinatorial explosion.
Many domains - such as checking that two expressions are equal using the ring axioms - also have specialised decision procedures available.
I call these approaches to solving equalities **machine-oriented**; this contrasts with the **human-oriented** approaches discussed in Section 2.7.
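For instance, mathlib's `ring` tactic is one such decision procedure; a small example of its use:

```lean
import tactic.ring

-- `ring` decides any equality that follows from the commutative ring
-- axioms, with no user-visible search.
example (x y : ℤ) : (x + y)^2 = x^2 + 2*x*y + y^2 := by ring
```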

In accordance with the research goals of this thesis (Section 1.2), the purpose here is to investigate alternative, human-like ways of producing equality proofs.
As motivated in Section 1.1, this serves the purpose of improving the usability of proof assistants by making the proofs generated more understandable (Section 2.6).
The goal of this chapter is *not* to develop methods that compete with machine-oriented techniques to prove more theorems or prove them faster.
Instead, I want to focus on the abstract reasoning that seems to occur in the mind of a human mathematician when they encounter an equality reasoning problem such as (4.1).

With this in mind, the goal of this chapter is to create an algorithm which:

1. can solve simple equality problems of the kind that an undergraduate might find easy;
2. does not encode any domain-specific knowledge of mathematics; that is, it does not invoke specialised procedures if it detects that the problem lies in a particular domain such as [Presburger arithmetic](https://en.wikipedia.org/wiki/Presburger_arithmetic);
3. is efficient, in the sense that it does not store a large state and does not perform a significant search when a human would not.

A side-effect of this approach is that it may be more efficient than existing ATP methods. Typically, existing ATP methods do not scale well with the number of competing rules introduced, as one would expect of algorithms that make use of significant amounts of brute-force search. If we can devise new architectures that solve simple equalities with less search, then it may be possible to scale up these techniques to larger problems and improve the efficiency of established ATP methods.

This chapter presents the `subtask` algorithm, which has some success with respect to the above goals.
The algorithm is written in Lean 3 [MKA+15] and can be found [on GitHub](https://github.com/EdAyers/lean-subtask).
In the remainder of the chapter we give a motivating example (Section 4.2) followed by a description of the algorithm (Section 4.3).
The algorithm is then contrasted with existing approaches (Section 4.4) and evaluated against the above goals (Section 4.5).

# 4.2. Motivation

Let us begin with the example (4.1) in elementary linear algebra mentioned above.
We have to solve the target of the `Box` (4.3) (see Chapter 3 for more detail on `Box`) using some subset of the rewrite rules given in (4.2).

To do this, a human's (not fully conscious) thought process might proceed as follows.

1. I need to create the expression `⟨A† u + A† v, x⟩`.
2. In particular, I need to make the subexpressions `A† u` and `A† v`. Let's focus on `A† u`.
3. The only sensible way I can get this is to use the definition `⟨u, A ?z⟩ = ⟨A† u, ?z⟩`, presumably with `?z = x`.
4. In particular, I'll need to make the subterm `A ?z` for some `?z`.
5. I can do that straight away: `⟨A† (u + v), x⟩ = ⟨u + v, A x⟩`.
6. Now I'm in a position to obtain the subexpression `⟨u, A x⟩` I wanted in step 3, so let me do that using bilinearity: `⟨u + v, A x⟩ = ⟨u, A x⟩ + ⟨v, A x⟩`.
7. And now I can get the subexpression `A† u` I wanted even earlier in step 2, so let me do that: `⟨u, A x⟩ + ⟨v, A x⟩ = ⟨A† u, x⟩ + ⟨v, A x⟩`.
8. In step 2 I also wanted to create `A† v`, which I can now get too: `⟨A† u, x⟩ + ⟨v, A x⟩ = ⟨A† u, x⟩ + ⟨A† v, x⟩`.
9. And with one application of bilinearity I'm home: `⟨A† u, x⟩ + ⟨A† v, x⟩ = ⟨A† u + A† v, x⟩`.

The key aspect of the above kind of thought process to model is the setting of intermediate aims, such as obtaining certain subexpressions when we do not immediately see how to obtain the entire expression. Let's do this by creating a tree of subtasks (Figure 4.4).

The tree in Figure 4.4 represents what the algorithm does with the additivity-of-adjoint problem (4.3).
It starts with the subtask `create_all ⟨A† u + A† v, x⟩` at ①.
Since it cannot achieve that in one application of an available rule, it creates a set of subtasks and then chooses the one that is most promising; later, in Section 4.3, I will explain how it generates and evaluates possible subtasks.
In this case the most promising subtask is `create A† u`, so it selects that at ② and identifies a rewrite rule that can achieve it - the definition of adjoint, `∀ w z, ⟨A† w, z⟩ = ⟨w, A z⟩` - adding `use ⟨u, A ?z⟩ = ⟨A† u, ?z⟩` to the tree at ③.
The `?z` that appears at ③ in Figure 4.4 is a metavariable (that is, a placeholder for an expression to be chosen later; see Section 2.4 for background information on metavariables) that will in due course be assigned to `x`.
Now the process repeats on ③: a set of subtasks is again created for the lhs of `⟨u, A ?z⟩ = ⟨A† u, ?z⟩`, and the subtask `create A ?z` is selected (④).
This time there does exist a rewrite rule that will achieve `create A ?z`, namely `⟨A† (u + v), x⟩ = ⟨u + v, A x⟩`, so this is applied, and the algorithm iterates back up the subtask tree, testing whether the new expression `⟨u + v, A x⟩` achieves any of the subtasks and whether any new rewrite rules can be used to achieve them.

Now let's see how we can design an algorithm that behaves according to these motivating principles.

# 4.3. Design Of The Algorithm

The `subtask` algorithm acts on a tree of **tasks** (as depicted in Figure 4.4) and an expression called the **current expression** (CE).
A task is any object which implements the following methods:

- `refine : Task → List Task`;
- `test : Task → Bool`, which returns true when the task is **achieved** for the current expression;
- optionally, `execute : Task → Unit`, which updates the current expression `x` to `y` by providing a proof of `x = y`. Tasks with `execute` methods are called **strategies**. In this case, `test` returns true when `execute` can be applied successfully.

The main tasks are given in Table 4.5, though more are added in the software implementation.
For example, `reduce_distance (x, y)` will greedily apply any rewrite that causes `x` and `y` to become closer in the parse tree.
The algorithm is summarised in the pseudocode in (4.6).

**Table 4.5.** A table of some of the tasks that are used by the `subtask` algorithm. The task system is designed to be extensible, so other tasks can be added.

subtask | refine | test | execute |
---|---|---|---|
`create_all e` | Returns a list of `create b` subtasks where each `b` is a minimal subterm of `e` not present in the CE. | True whenever the CE is `e`. | none |
`create b` | Returns a list of `use (l = r)` subtasks where `b` occurs within `r`. | True whenever the CE contains `b`. | none |
`use (l = r)` | Returns a list of `create b` subtasks where each `b` is a minimal subterm of `l` not present in the CE. | True whenever the rule `l = r` can be applied to the CE. | Apply `l = r` to the CE. |
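The three tasks of Table 4.5 can be viewed as a small grammar, sketched here in Lean 3 syntax (constructor names are mine):

```lean
-- `create_all e`: make the CE equal to `e`.
-- `create b`    : make `b` appear somewhere in the CE.
-- `use l r`     : apply the rewrite rule `l = r` to the CE.
meta inductive subtask : Type
| create_all (e : expr)
| create     (b : expr)
| use        (l r : expr)
```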

In the `explore` phase, we take a task `T` on the tree and `refine` it to produce a list of child tasks `C₁, C₂, ⋯`.
We add these to the task tree if they are not already present on it.
We then score the strategies `S₁, S₂, ⋯` in this list - that is, we score the children for which `execute` is defined.
The score is intended to represent the likelihood of the strategy being successful and is determined by heuristics discussed in Section 4.3.1.
The reason why the algorithm focusses on strategies before non-strategies is a heuristic that seems to work well in practice.
The underlying idea behind the heuristic is that often the first sensible strategy found is enough of a signpost to solve simple problems.
That is, once one has found one plausible strategy for solving a simple problem, it is often fruitful to stop looking for other strategies which achieve the same thing and to get on with finding a way of performing the new strategy.

If the overall score is above zero, then we add a backtrack point and take the highest-scoring strategy `S`.

If `test S` is false, then we `explore` `S`; otherwise we execute `S` and ascend `S`'s parents until a task `U` is found that is not yet achieved, and then `explore` `U`.
Otherwise, if the overall score is less than or equal to zero, we `explore` a non-strategy child task of `T`, or backtrack if none exists.
The backtracking procedure works by taking the list of backtrack points and choosing the one with the highest overall score once the failed branch is removed.

To find `l = r`, the algorithm is initialised with the tree `create_all r` and the current expression `l`.
We then run `execute (create_all r)` until a timeout is reached or we run out of backtrack points.

## 4.3.1. Heuristics

Both lists of strategies and individual strategies are scored using a heuristic to guide the exploration of the tree. The system prioritises strategies if they:

- achieve a task higher in the task tree;
- achieve a task on a different branch of the task tree;
- have a high degree of term overlap with the current expression (this is measured using symbol counting and finding largest common subterms);
- use local hypotheses;
- can be achieved in one rewrite step from the current expression.

The overall-score heuristic evaluates sets of strategies.
If there is only one strategy, then it scores `10`.
If there are multiple strategies, it discards any scoring less than `-5`.
If there are positive-scoring strategies, then all negative-scoring strategies are discarded.
The overall score is then set to be `5` minus the number of strategies remaining in the list.
The intention of this simple procedure is that we should prefer smaller sets of strategies, even if their scores are bad, because a small set limits the choice of what to do next.
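As a concrete reconstruction of this scoring rule (my own reading of the prose above, not the implementation's code):

```lean
-- A lone strategy scores 10; otherwise drop scores below -5, drop the
-- negatives whenever a positive score exists, and return 5 minus the
-- number of surviving strategies.
def overall_score : list ℤ → ℤ
| [_] := 10
| ss  :=
  let viable := ss.filter (λ s, -5 ≤ s) in
  let viable := if viable.any (λ s, to_bool (0 < s))
                then viable.filter (λ s, 0 ≤ s)
                else viable in
  5 - (viable.length : ℤ)
```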

# 4.4. Related Work

As mentioned in Section 4.1, there has been a substantial amount of research on the automation of solving equality chain problems over the last decades.
Again, I recommend *Term Rewriting And All That* [BN98] as an introduction to this topic and the appropriate chapters of *The Handbook of Automated Reasoning* for a more in-depth reference.

The approach of the subtasks algorithm is to combine these rewriting techniques with a hierarchical search. As such, there are a few different fields of related work to which we may compare subtasks, each addressed in the following subsections.

## 4.4.1. Term Rewriting

One way to find equality proofs is to perform a graph search using a heuristic.
This is the approach of the `rewrite-search` algorithm by Hoek and Morrison [HM19], which uses the heuristic of string edit-distance between the two pretty-printed expressions.
The `rewrite-search` algorithm does capture some human-like properties in its heuristic, since pretty-printed expressions are intended for human consumption.
Our algorithm differs from `rewrite-search` in that we guide the search according to achieving sequences of tasks.
Since both our software and `rewrite-search` are written in Lean, some future work could be to investigate a combination of both systems.

A term rewriting system (TRS) `R` is a set of oriented rewrite rules.
There are many techniques available for turning a set of rewrite rules into a procedure that checks whether two terms are equal.
One technique is **completion**, where `R` is converted into an equivalent TRS `R'` that is **convergent**.
This means that any two expressions `s`, `t` are equal under `R` if and only if repeated application of rules in `R'` to `s` and `t` will produce the same expression.
Finding equivalent convergent systems, if not by hand, is usually done by finding decreasing orderings on terms and using Knuth-Bendix completion [KB83].
When such a system exists, automated rewriting systems can use these techniques to quickly find proofs, but the proofs are often overly long and needlessly expand terms.
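Lean's `simp` tactic behaves like rewriting with such an oriented rule set: it applies the given rules left-to-right until no more apply. For example, with the core lemma `list.append_nil`:

```lean
-- Orienting `l ++ [] = l` left-to-right and rewriting to normal form
-- makes both sides syntactically identical.
example (l s : list ℕ) : (l ++ []) ++ (s ++ []) = l ++ s :=
by simp only [list.append_nil]
```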

Another method is rewrite tables, where a lookup table of representatives for terms is stored in a way that allows for two terms to be matched through a series of lookups.

Both completion and rewrite tables can be considered machine-oriented because they rely on large data structures and systematic applications of rewrite rules. Such methods are certainly highly useful, but they can hardly be said to capture the process by which humans reason.

Finally, there are many normalisation and decision procedures for particular domains, for example on rings [GM05].
Domain-specific procedures do not satisfy our criterion of generality.

## 4.4.2. Proof Planning

Background information on proof planning is covered in Section 2.7.2.

The subtasks algorithm employs a structure that is similar to a **hierarchical task network** (HTN) [Sac74, Tat77, MS99]. The general idea of a hierarchical task network is to break a given abstract task (*e.g.* "exit the room") into a sequence of subtasks ("find a door", then "go to door", then "walk through the door") which may themselves be recursively divided into subtasks ("walk through the door" may have a subtask of "open the door", which may in turn have "grasp doorhandle", until bottoming out with a ground actuation such as "move index finger 10°").
This approach has found use, for example, in the ICARUS robotics architecture [CL18, LCT08].
HTNs have also found use in proof planning [MS99].

The main difference between the subtasks algorithm and proof planning with hierarchical task networks is that the subtasks algorithm is greedier: it generates just enough of a plan to leave little doubt about what the first rewrite rule in the sequence should be, and no more. I believe that this reflects how humans reason when solving simple problems: favouring just enough planning to decide on a good first step, and then planning further only once the step is completed and new information is revealed.

A difference between HTNs and subtasks is that the chains of subtasks do not necessarily reach a ground subtask (for subtasks, a ground task is a rewrite step that can be performed immediately).
This means that the subtasks algorithm needs to use heuristics to determine whether it is appropriate to explore a subtask tree, instead of relying on the task hierarchy eventually terminating with ground tasks. The subtasks algorithm also inherits all of the problems found in hierarchical planning, the main one being finding heuristics for determining whether a subtask should be abandoned or refined further. The heuristics given in Section 4.3.1 help with this, but there are plenty more ideas from the hierarchical task planning literature that could also be incorporated. Of particular interest to me are the applications of hierarchical techniques from the field of reinforcement learning (a good introductory text to modern reinforcement learning is *Reinforcement Learning: An Introduction* by Sutton and Barto [SB18b]; readers wishing to learn more about hierarchical reinforcement learning may find the survey article by Flet-Berliac to be a good jumping-off point [Fle19]).

# 4.5. Evaluation

Our ultimate motivation is to make an algorithm that behaves as a human mathematician would. We do not wish to claim that we have fully achieved this, but we can evaluate our algorithm with respect to some general goals that we mentioned in Section 4.1.

- Scope: can it solve simple equations?
- Generality: does it avoid techniques specific to a particular area of mathematics?
- Reduced search space: does the algorithm avoid search when finding proofs that humans can find easily without search?
- Straightforwardness of proofs: for easy problems, does it give a proof that an experienced human mathematician might give?

Our method of evaluation is to use the algorithm implemented as a tactic in Lean on a library of thirty or so example problems. This is not large enough for a substantial quantitative comparison with existing methods, but we can still investigate some properties of the algorithm. The source code also contains many examples which are outside the abilities of the current implementation of the algorithm. Some ways to address these issues are discussed in Section 4.6.

Table 4.7 gives some selected examples. These are all problems that the algorithm can solve with no backtracking.

**Table 4.7.** `subtask`'s performance on some example problems. **Steps** gives the number of rewrite steps in the final proof. **Location** gives the file and declaration name of the example in the source code.

Problem | Steps | Location |
---|---|---|
`a : α` `l s : List α` `rev(l ++ s) = rev(s) ++ rev(l)` `∀ a, rev(a :: l) = rev(l) ++ [a]` `rev(a :: l ++ s) = rev(s) ++ rev(a :: l)` | 5 | `datatypes.lean/rev_app_rev` |
`A : Monoid` `a : A` `m n : ℕ` `a ^ (m + n) = a ^ m * a ^ n` `a ^ (succ(m) + n) = a ^ succ(m) * a ^ n` | 8 | `groups.lean/my_pow_add` |
`R : Ring` `a b c d e f : R` `a * b = c * d` `c * d = e * f` `a * (b * c) = e * (f * c)` | 9 | `rat.lean` |
`R : Ring` `a b : R` `(a + b) * (a + b) = a * a + 2 * (a * b) + b * b` | 7 | `rings.lean/sumsq_with_equate` |
`B C X : set` `X \ (B ∪ C) = (X \ B) \ C` | 4 | `sets.lean/example_4` |

From Table 4.7 we can see that the algorithm solves problems from several different domains. We did not encode any decision procedures for monoids or rings.
In fact, we did not even include reasoning under associativity and commutativity, although we are not in principle against extending the algorithm to do this.
The input to the algorithm is simply a list of over 100 axioms and equations for sets, rings, groups and vector spaces, which can be found in the file `equate.lean` in the source code. Thus, the algorithm exhibits considerable generality.

All of the solutions to the above examples are found without backtracking, which adds support to the claim that our algorithm requires less search. There are other examples in the source where backtracking occurs, so there is still some work to be done on choosing scoring heuristics here.

Our final criterion is that the proofs are more straightforward than those produced by machine-oriented special-purpose tactics. This is a somewhat subjective measure, but there are some proxies indicating that `subtask` can be used to generate simpler proofs.

To illustrate this point, consider the problem of proving `(x + y)² + (x + z)² = (z + x)² + (y + x)²` within ring theory.
We choose this example because it is easy for a human to spot how to do it with three uses of commutativity, but it is easy for a program to be led astray by expanding the squares. `subtask` proves this equality with three uses of commutativity and with no backtracking or expansion of the squares.
This is an example where domain-specific tactics do worse than `subtask`: the `ring` tactic for reasoning about problems in commutative rings will produce a proof by expanding out the squares.
The built-in tactics `ac_refl` and `blast` in Lean, which reason under associativity and commutativity, both use commutativity five times.
If one is simply interested in verification, then such a result is perfectly acceptable.
However, we are primarily interested in modelling how humans would solve such an equality, so we want our algorithm not to perform unnecessary steps such as these.
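For comparison, the human-style proof with exactly three uses of commutativity can be written directly in Lean 3 (a sketch assuming mathlib's `comm_ring` and `add_comm`):

```lean
import algebra.group_power

-- Three rewrites by commutativity; the squares are never expanded.
example (R : Type) [comm_ring R] (x y z : R) :
  (x + y)^2 + (x + z)^2 = (z + x)^2 + (y + x)^2 :=
by rw [add_comm x y, add_comm x z, add_comm ((y + x)^2)]
```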

It is difficult to compare the speed of `subtask` fairly in the current implementation, because it is compiled to Lean bytecode, which is much slower than the native built-in tactics written in C++.
However, it is worth noting that, even with this handicap, `subtask` takes 1900 ms to find the above proof, whereas `ac_refl` and `blast` take 600 ms and 900 ms respectively.

There are still proofs generated by `subtask` that are not straightforward.
For example, the lemma `(x * z) * (z⁻¹ * y) = x * y` in group theory is proved by `subtask` with a superfluous use of the rule `e = x * x⁻¹`. We hope that some of these defects will be ironed out in future versions of the program.

# 4.6. Conclusions and Further Work

In this chapter, we introduced a new, task-based approach to finding equalities in proofs and provided a demonstration of the approach by building the `subtask` tactic in Lean.
We showed that the approach can solve simple equality proofs with very little search.
I hope that our work will renew interest in proof planning and spark interest in human-oriented reasoning for at least some classes of problems.

In future work, we wish to add more subtasks and better heuristics for scoring them. The framework we outlined here allows for easy experimentation with such different sets of heuristics and subtasks. In this way, we also wish to make the subtask framework extensible by users, so that they may add their own custom subtasks and scoring functions.

There are times when the algorithm fails and needs guidance from the user.
We wish to study further how the subtask paradigm might be used to enable more human-friendly interactivity than is currently possible.
For example, in real mathematical textbooks, if an equality step is not obvious, a relevant lemma will be mentioned.
Similarly, we wish to investigate ways of passing 'hint' subtasks to the tactic.
For example, when proving `x * y = (x * z) * (z⁻¹ * y)`, the algorithm will typically get stuck (although it can solve the flipped problem), because there are too many ways of creating `z`.
However, the user - upon seeing `subtask` get stuck - could steer the algorithm with a suggested subtask such as `create (x * (z * z⁻¹))`.

Using subtasks should help to give better explanations to the user. The idea of our algorithm is that the first set of strategies in the tree roughly corresponds to the high-level actions that a human would first consider when trying to solve the problem. Thus, the algorithm could use the subtask hierarchy to determine when no further explanation is needed and thereby generate abbreviated proofs of a kind that might be found in mathematical textbooks.

Another potential area to explore is to run an evaluation survey in which students are asked to determine whether a given equality proof was generated by our software or written by a human.

# Bibliography for this chapter

- [AGJ19] Ayers, E. W.; Gowers, W. T.; Jamnik, Mateja. *A human-oriented term rewriting system*. KI (2019), pages 76–86. Springer.
- [BN98] Baader, Franz; Nipkow, Tobias. *Term rewriting and all that*. Cambridge University Press (1998). ISBN 978-0-521-45520-6.
- [CL18] Choi, Dongkyu; Langley, Pat. *Evolution of the ICARUS cognitive architecture*. Cognitive Systems Research, volume 48 (2018), pages 25–38. Elsevier.
- [Fle19] Flet-Berliac, Yannis. *The Promise of Hierarchical Reinforcement Learning*. The Gradient (2019).
- [GM05] Grégoire, Benjamin; Mahboubi, Assia. *Proving equalities in a commutative ring done right in Coq*. International Conference on Theorem Proving in Higher Order Logics (2005), pages 98–113. Springer.
- [HM19] Hoek, Keeley; Morrison, Scott. *lean-rewrite-search GitHub repository* (2019).
- [KB83] Knuth, Donald E.; Bendix, Peter B. *Simple word problems in universal algebras*. Automation of Reasoning (1983), pages 342–376. Springer.
- [LCT08] Langley, Pat; Choi, Dongkyu; Trivedi, Nishant. *Icarus user's manual* (2008). Institute for the Study of Learning and Expertise.
- [MKA+15] de Moura, Leonardo; Kong, Soonho; Avigad, Jeremy; van Doorn, Floris; von Raumer, Jakob. *The Lean theorem prover (system description)*. International Conference on Automated Deduction (2015), pages 378–388. Springer.
- [MS99] Melis, Erica; Siekmann, Jörg. *Knowledge-based proof planning*. Artificial Intelligence, volume 115, number 1 (1999), pages 65–105. Elsevier.
- [SB18b] Sutton, Richard S.; Barto, Andrew G. *Reinforcement learning: An introduction*. MIT Press (2018).
- [Sac74] Sacerdoti, Earl D. *Planning in a hierarchy of abstraction spaces*. Artificial Intelligence, volume 5, number 2 (1974), pages 115–135. Elsevier. doi:10.1016/0004-3702(74)90026-5.
- [Tat77] Tate, Austin. *Generating project networks*. IJCAI (1977), pages 888–893. Morgan Kaufmann Publishers Inc.