-\section{{\tt walk}, the code walker}\r
-\r
-When you write advanced macros, or when you're analyzing some code to check for\r
-some property, you often need to design a function that walks through an\r
-arbitrary piece of code, and does complex stuff on it. Such a function is called\r
-a code walker. Code walkers can be used for some punctual adjustments, e.g.\r
-changing a function's {\tt return} statements into something else, or checking\r
-that a loop performs no {\tt break}, up to pretty advanced transformations, such\r
-as CPS transformation (a way to encode full continuations into a language taht\r
-doesn't support them natively; see Paul Graham's On Lisp for an accessible\r
-description of how it works), lazy semantics...\r
-\r
-Anyway, code walkers are tricky to write, can involve a lot of boilerplate code,\r
-and are generally brittle. To ease things as much as possible, Metalua comes\r
-with a walk library, which intends to accelerate code walker implementation.\r
-Since code walking is intrinsically tricky, the lib won't magically make it\r
-trivial, but at least it will save you a lot of time and code, when compared to\r
-writing all walkers from scratch. Moreover, other people who took the time to\r
-learn the walker generator's API will enter into your code much faster.\r
-\r
-\subsection{Principles}\r
-\r
-Code walking is about traversing a tree, first from root to leaves, then from\r
-leaves back to the root. This tree is not uniform: some nodes are expressions,\r
-some others statements, some others blocks; and each of these node kinds is\r
-subdivided in several sub-cases (addition, numeric for loop...). The basic code\r
-walker just goes from root to leaves and back to root without doing anything.\r
-Then it's up to you to plug some action callbacks in that walker, so that it\r
-does interesting things for you.\r
-\r
-Without entering into the details of AST structure, here is a simplified version\r
-of the walking algorithm, pretending to work on a generic tree:\r
-\r
-\begin{verbatim}\r
-function traverse(node)\r
- local down_result = down(node)\r
- if down_result ~= 'break' then \r
- for c in children(node) do\r
- traverse(node)\r
- end\r
- end\r
- up(node)\r
-end\r
-\end{verbatim}\r
-\r
-The algorithm behind 'walk' is almost as simple as the one above, except that\r
-it's specialized for AST trees. You can essentially specialize it by providing\r
-your own up() and down() functions. These visitor functions perform whatever\r
-action you want on the AST; moreover, down() has the option to return 'break':\r
-in that case, the sub-nodes are not traversed. It allows the user to shun parts\r
-of the tree, or to provide his own special traversal method in the down()\r
-visitor.\r
-\r
-The real walker generator is only marginally more complex than that:\r
-\begin{itemize}\r
-\item It lets you define your up() and down(), and down() can return 'break' to\r
- cut the tree traversal; however, these up() and down() functions are\r
- specialized by node kind: there can be separate callbacks for expr nodes, stat\r
- nodes, block nodes.\r
-\item There's also a binder() visitor for identifier binders. Binders are\r
- variables which declare a new local variable; you'll find them in nodes\r
- `Local, `Localrec, `Forin, `Fornum, `Function. The binders are visited just\r
- before the variable's scope begins, i.e. before traversing a loop or a\r
- function's body, after traversing a `Local's right-hand side, before\r
- traversing a `Localrec's right-hand side. \\ \r
- Notice that separate up() and down() visitors wouldn't make sense for\r
- binders, since they're leave nodes.\r
-\item Visitor functions don't only take the visited node as parameter: they also\r
- take the list of all expr, stat and block nodes above it, up to the AST's\r
- root. This allows a child node to act on its parent nodes.\r
-\end{itemize}\r
-\r
-\subsection{API}\r
-There are 3 main tree walkers: {\tt walk.expr()}, {\tt walk.stat()} and {\tt\r
- walk.block()}, to walk through the corresponding kinds of ASTs. Each of these\r
-walker take as parameters a table {\tt cfg} containing the various visitor\r
-functions, and the AST to walk throuhg. the configuration table {\tt cfg} can\r
-contain fields:\r
-\begin{itemize}\r
-\item {\tt cfg.stat.down(node, parent, grandparent...)}, which applies when\r
- traversing a statement down, i.e. before its children nodes are parsed, and\r
- can modify the tree, and return {\tt nil} or {\tt'break'}. The way children\r
- are traversed is decided {\em after} the {\tt down()} visitor has been run:\r
- this point matters when the visitor modifies its children nodes.\r
-\item {\tt cfg.stat.up(node, parent, grandparent...)}, which is applies on the\r
- way back up. It is applied even if {\tt cfg.stat.down()} returned\r
- {\tt'break'}, but in that case, the children have not been (and will not be)\r
- traversed. \r
-\item {\tt cfg.expr.down()} and {\tt cfg.expr.up()}, which work just as their\r
- {\tt stat} equivalent, but apply to expression nodes.\\\r
- Notice that in Lua, function calls and method invocations can be used as\r
- statements as well as as espressions: in such cases, they are visited only by\r
- the statement visitor, not by the expression visitor.\r
-\item {\tt cfg.block.down()} and {\tt cfg.block.up()} do the same for statements\r
- blocks: loops, conditional and function bodies.\r
-\item {\tt cfg.binder(identifier, id\_parent, id\_grandparent...)}: this\r
- is run on identifiers which create a new local variable, jsut before that\r
- variable's scope begins.\r
-\end{itemize}\r
-\r
-Moreover, there is a {\tt walk.guess(cfg, ast)} walker which tries to guess the\r
-type of the AST it receives, and applies the appropriate walker. When an AST can\r
-be either an expression or a statement (nodes {\tt`Call} and {\tt`Invoke}), it\r
-is interpreted as an expression.\r
-\r
-\subsection{Examples}\r
-\r
-A bit of practice now. Let's build the simplest walker possible, that does\r
-nothing:\r
-\r
-\begin{verbatim}\r
-cfg = { }\r
-walker = |ast| walk.block(cfg, ast)\r
-\end{verbatim}\r
-\r
-Now, let's say we want to catch and remove all statement calls to function\r
-assert(). This can be done by removing its tag and content: an empty list is\r
-simply ignored in an AST. So we're only interested by `Call nodes, and within\r
-these nodes, we want the function to be `Id 'assert'. All of this is only\r
-relevant to stat nodes:\r
-\r
-\begin{verbatim}\r
-function cfg.stat.down (x)\r
- match x with\r
- | `Call{ `Id 'assert', ... } -> x.tag=nil; x <- { }\r
- | _ -> -- not interested by this node, do nothing\r
- end\r
-end\r
-\end{verbatim}\r
-\r
-You'll almost always want to use the 'match' extension to implement visitors.\r
-The imperative table overrider ({\tt x <- y} a.k.a. {\tt table.override(x, y)}\r
-also often comes handy to modify an AST.\r
-\r
-We'll now remove {\tt assert()} calls in non-statement; we cannot replace an\r
-expression by nothing, so we'll replace these nodes by these will simply be\r
-replaced by {\tt nil}:\r
-\r
-\begin{verbatim}\r
-function cfg.expr.down (x)\r
- match x with\r
- | `Call{ `Id 'assert', ... } -> x <- `Nil\r
- | _ -> -- not interested by this node, do nothing\r
- end\r
-end\r
-\end{verbatim}\r
-\r
-\r
-Here's a remark for functional programmers: this API is very imperative; you\r
-might cringe at seeing the `Call nodes transformed in-place. Well, I tend to\r
-agree but it's generally counter-productive to work against the grain of the\r
-wood: Lua is imperative at heart, and design attempts at doing this functionally\r
-sucked more than approaches that embraced imperativeness. \r
-\r
-\paragraph{Cuts}\r
-By making down() return 'break', you can prevent the traversal to go further\r
-down. This might be either because you're not interested by the subtrees, or\r
-because you want to traverse them in a special way. In that later case, just do\r
-the traversal by yourself in the down() function, and cut the walking by\r
-returning 'break', so that nodes aren't re-traversed by the default walking\r
-algorithm. We'll see that in the next, more complex example, listing of free\r
-variables.\r
-\r
-This example is exclusively there for demonstration purposes. For actual work on\r
-identifiers that require awareness of an identifier's binder of freedom, there\r
-is a dedicated {\tt walk.id} library.\r
-\r
-We'll progressively build a walker that gathers all global variables used in a\r
-given AST. This involves keeping, at all times, a set of the identifiers\r
-currently bound by a "local" declaration, by function parameters, as for loop\r
-variables etc. Then, every time an identifier is found in the AST, its presence\r
-is checked in the current set of bound variables. If it isn't in it, then it's a\r
-free (global) identifier.\r
-\r
-The first thing we'll need is a scope handling system: something that keeps\r
-track of what identifiers are currently in scope. It must also allow to save the\r
-current scope (e.g. before we enter a new block) and restore it afterwards (e.g.\r
-after we leave the block). This is quite straightforward and unrelated to code\r
-walking; here is the code:\r
-\r
-\begin{Verbatim}[fontsize=\scriptsize]\r
-require 'std'\r
-require 'walk'\r
-\r
--{ extension 'match' }\r
-\r
---------------------------------------------------------------------------------\r
--- Scope handling: ':push()' saves the current scope, ':pop()'\r
--- restores the previously saved one. ':add(identifiers_list)' adds\r
--- identifiers to the current scope. Current scope is stored in\r
--- '.current', as a string->boolean hashtable.\r
---------------------------------------------------------------------------------\r
-\r
-local scope = { }\r
-scope.__index = scope\r
-\r
-function scope:new()\r
- local ret = { current = { } }\r
- ret.stack = { ret.current }\r
- setmetatable (ret, self)\r
- return ret\r
-end\r
-\r
-function scope:push()\r
- table.insert (self.stack, table.shallow_copy (self.current))\r
-end\r
-\r
-function scope:pop()\r
- self.current = table.remove (self.stack)\r
-end\r
-\r
-function scope:add (vars)\r
- for id in values (vars) do\r
- match id with `Id{ x } -> self.current[x] = true end\r
- end\r
-end\r
-\end{Verbatim}\r
-\r
-(There is an improved version of that class in library {\tt walk.scope}; cf.\r
-its documentation for details).\r
-\r
-Now let's start designing the walker. We'll keep a scope object up to date, as\r
-well as a set of found free identifiers, gathered every time we find an `Id{ }\r
-node. To slightly simplify matter, we'll consider that the AST represent a\r
-block.\r
-\r
-\begin{Verbatim}[fontsize=\scriptsize]\r
-local function fv (term)\r
- local freevars = { }\r
- local scope = scope:new()\r
- local cfg = { expr = { } }\r
-\r
- function cfg.expr.down(x)\r
- match x with\r
- | `Id{ name } -> if not scope.current[name] then freevars[name] = true end\r
- | _ -> -- pass\r
- end\r
- end\r
-\r
- walk.guess(cfg, term)\r
- return freevars\r
-end\r
-\end{Verbatim}\r
-\r
-Since we don't ever add any identifier to the scope, this will just list all the\r
-identifiers used in the AST, bound or not. Now let's start doing more\r
-interesting things:\r
-\r
-\begin{itemize}\r
-\item We'll save the scope's state before entering a new block, and restore it\r
- when we leave it. That will be done by providing functions {\tt cfg.block.down()}\r
- and {\tt cfg.block.up()}. Saving and restoring will be performed by methods\r
- {\tt :push()} and {\tt :pop()}.\r
-\item Whenever we find a {\tt local} declaration, we'll add the list of\r
- identifiers to the current scope, so that they won't be gathered when parsing\r
- the {\tt `Id} expression nodes. Notice that the identifiers declared by the\r
- 'local' statement only start being valid after the statement, so we'll add\r
- them in the {\tt cfg.stat.up()} function rather than {\tt cfg.stat.down()}.\r
-\end{itemize}\r
-\r
-\begin{Verbatim}[fontsize=\scriptsize]\r
-local cfg = { expr = { },\r
- stat = { },\r
- block = { } }\r
- [...]\r
- function cfg.stat.up(x)\r
- match x with\r
- | `Local{ vars, ... } -> scope:add(vars)\r
- | _ -> -- pass\r
- end\r
- end\r
-\r
- -----------------------------------------------------------------------------\r
- -- Create a separate scope for each block, close it when leaving.\r
- -----------------------------------------------------------------------------\r
- function cfg.block.down() scope:push() end\r
- function cfg.block.up() scope:pop() end \r
-\end{Verbatim}\r
-\r
-This starts to be useful. We can also easily add the case for `Localrec{ } nodes\r
-(the ones generated by {\tt "local function foo() ... end"}), where the variable\r
-is already bound in the {\tt`Localrec} statement's right-hand side; so we do the\r
-same as for {\tt`Local}, but we do it in the {\tt down()} function rather than\r
-in the up() one.\r
-\r
-We'll also take care of {\tt`Function}, {\tt`Forin} and {\tt`Fornum} nodes,\r
-which introduce new bound identifiers as function parameters or loop variables.\r
-This is quite straightforward; the only thing to take care of is to save the\r
-scope before entering the function/loop body (in {\tt down()}), and restore it\r
-when leaving (in {\tt up()}). The code becomes:\r
-\r
- \r
-\begin{Verbatim}[fontsize=\scriptsize]\r
-local function fv (term)\r
- local freevars = { }\r
- local scope = scope:new()\r
- local cfg = { expr = { },\r
- stat = { },\r
- block = { } }\r
-\r
- -----------------------------------------------------------------------------\r
- -- Check identifiers; add functions parameters to newly created scope.\r
- -----------------------------------------------------------------------------\r
- function cfg.expr.down(x)\r
- match x with\r
- | `Id{ name } -> if not scope.current[name] then freevars[name] = true end\r
- | `Function{ params, _ } -> scope:push(); scope:add (params)\r
- | _ -> -- pass\r
- end\r
- end\r
-\r
- -----------------------------------------------------------------------------\r
- -- Close the function scope opened by 'down()'.\r
- -----------------------------------------------------------------------------\r
- function cfg.expr.up(x) \r
- match x with\r
- | `Function{ ... } -> scope:pop()\r
- | _ -> -- pass\r
- end\r
- end\r
-\r
- -----------------------------------------------------------------------------\r
- -- Create a new scope and register loop variable[s] in it\r
- -----------------------------------------------------------------------------\r
- function cfg.stat.down(x)\r
- match x with\r
- | `Forin{ vars, ... } -> scope:push(); scope:add(vars)\r
- | `Fornum{ var, ... } -> scope:push(); scope:add{var}\r
- | `Localrec{ vars, ... } -> scope:add(vars)\r
- | `Local{ ... } -> -- pass\r
- | _ -> -- pass\r
- end\r
- end\r
-\r
- -----------------------------------------------------------------------------\r
- -- Close the scopes opened by 'up()'\r
- -----------------------------------------------------------------------------\r
- function cfg.stat.up(x)\r
- match x with\r
- | `Forin{ ... } | `Fornum{ ... } -> scope:pop()\r
- | `Local{ vars, ... } -> scope:add(vars)\r
- | _ -> -- pass\r
- end\r
- end\r
-\r
- -----------------------------------------------------------------------------\r
- -- Create a separate scope for each block, close it when leaving.\r
- -----------------------------------------------------------------------------\r
- function cfg.block.down() scope:push() end\r
- function cfg.block.up() scope:pop() end\r
-\r
- walk.guess(cfg, term)\r
- return freevars\r
-end\r
-\end{Verbatim}\r
-\r
-This is almost correct now. There's one last tricky point of Lua's semantics\r
-that we need to address: in {\tt repeat foo until bar} loops, "bar" is included\r
-in "foo"'s scope. For instance, if we write {\tt repeat local x=foo() until\r
- x>3}, the "x" in the condition is the local variable "x" declared inside the\r
-body. This violates our way of handling block scopes: the scope must be kept\r
-alive after the block is finished. We'll fix this by providing a custom walking\r
-for the block inside `Repeat, and preventing the normal walking to happen by\r
-returning 'break':\r
-\r
-\begin{Verbatim}[fontsize=\scriptsize]\r
- -----------------------------------------------------------------------------\r
- -- Create a new scope and register loop variable[s] in it\r
- -----------------------------------------------------------------------------\r
- function cfg.stat.down(x)\r
- match x with\r
- | `Forin{ vars, ... } -> scope:push(); scope:add(vars)\r
- | `Fornum{ var, ... } -> scope:push(); scope:add{var}\r
- | `Localrec{ vars, ... } -> scope:add(vars)\r
- | `Local{ ... } -> -- pass\r
- | `Repeat{ block, cond } -> -- 'cond' is in the scope of 'block'\r
- scope:push()\r
- for s in values (block) do walk.stat(cfg)(s) end -- no new scope\r
- walk.expr(cfg)(cond)\r
- scope:pop()\r
- return 'break' -- No automatic walking of subparts 'cond' and 'body'\r
- | _ -> -- pass\r
- end\r
- end\r
-\end{Verbatim}\r
-\r
-That's it, we've now got a full free variables lister, and have demonstrated\r
-most APIs offered by the basic 'walk' library. If you want to walk through\r
-identifiers in a scope-aware way, though, you'll want to look at the {\tt\r
- walk.id} library.\r
-\r
-\subsection{Library {\tt walk.id}, the scope-aware walker}\r
-\r
-This library walks AST to gather information about the identifiers in it. It\r
-call distinct visitor functions depending on whether an identifier is bound or\r
-free; moreover, when an identifier is bound, the visitor also receives its\r
-binder node as a parameter. For instance, in {\tt +\{function(x) print(x)\r
- end\}}, the bound identifier walker will be called on the \verb|+{x}| in the\r
-\verb|print| call, and the visitor's second parameter will be the {\tt`Function}\r
-node which created the local variable {\tt x}.\r
-\r
-\paragraph{API}\r
-The library is loaded with \verb|require 'walk.id'|. The walkers provided are:\r
-\begin{itemize}\r
-\item {\tt walk\_id.expr()};\r
-\item {\tt walk\_id.stat()};\r
-\item {\tt walk\_id.block()};\r
-\item {\tt walk\_id.guess()}.\r
-\end{itemize}\r
-\r
-They take the same config tables as regular walkers, except that they also\r
-recognize the following entries:\r
-\begin{itemize}\r
-\item {\tt cfg.id.free(identifier, parent, grandparent...)}, which is run on\r
- free variables;\r
-\item {\tt cfg.id.bound(identifier, binder, parent, grandparent...)}, which is\r
- run on bound variables. The statement or expression which created this bound\r
- veriable's scope is passed as a second parameter, before the parent nodes.\r
-\end{itemize}\r
-\r
-\paragraph{Examples}\r
-Let's rewrite the free variables walker above, with the id walker:\r
-\r
-\begin{Verbatim}[fontsize=\scriptsize]\r
-function fv (term)\r
- local cfg = { id = { } }\r
- local freevars = { }\r
- function cfg.id.free(id)\r
- local id_name = id[1]\r
- freevars[id_name] = true\r
- end\r
- walk_id.guess (cfg, term)\r
- return freevars\r
-end\r
-\end{Verbatim}\r
-\r
-Now, let's $\alpha$-rename all bound variables in a term. This is slightly\r
-trickier than one could think: we need to first walk the whole tree, then\r
-perform all the replacement. If we renamed binders as we went, they would stop\r
-binding their variables, and something messy would happen. For instance, if we\r
-took {\tt +\{function(x) print(x) end\}} and renamed the binder {\tt x} into\r
-{\tt foo}, we'd then start walking the function body on the tree {\tt\r
- +\{function(foo) print(x) end\}}, where {\tt x} isn't bound anymore.\r
-\r
-\begin{Verbatim}[fontsize=\scriptsize]\r
---------------------------------------------------------------------------------\r
--- bound_vars keeps a binder node -> old_name -> new_name dictionary.\r
--- It allows to associate all instances of an identifier together,\r
--- with the binder that created them\r
---------------------------------------------------------------------------------\r
-local bound_vars = { }\r
-\r
---------------------------------------------------------------------------------\r
--- local_renames will keep all identifier nodes to rename as keys,\r
--- and their new name as values. The renaming must happen after \r
--- the whole tree has been visited, in order to avoid breaking captures.\r
---------------------------------------------------------------------------------\r
-local local_renames = { }\r
-\r
---------------------------------------------------------------------------------\r
--- Associate a new name in bound_vars when a local variable is created.\r
---------------------------------------------------------------------------------\r
-function cfg.binder (id, binder)\r
- local old_name = id[1]\r
- local binder_table = bound_vars[binder]\r
- if not binder_table then\r
- -- Create a new entry for this binder:\r
- binder_table = { }\r
- bound_vars[binder] = binder_table\r
- end\r
- local new_name = mlp.gensym(old_name)[1] -- generate a new name\r
- binder_table[old_name] = new_name -- remember name for next instances\r
- local_renames[id] = new_name -- add to the rename todo-list\r
-end\r
-\r
---------------------------------------------------------------------------------\r
--- Add a bound variable the the rename todo-list\r
---------------------------------------------------------------------------------\r
-function cfg.id.bound (id, binder)\r
- local old_name = id[1]\r
- local new_name = bound_vars[binder][old_name]\r
- local_renames[id] = new_name\r
-end\r
-\r
--- walk the tree and fill laocal_renames:\r
-walk_id.guess(cfg, ast)\r
-\r
--- perform the renaming actions:\r
-for id, new_name in pairs(local_renames) do id[1] = new_name end\r
-\end{Verbatim}\r
-\r
-\subsection{Library {\tt walk.scope}, the scope helper}\r
-This library allows to easily store data, in an AST walking function, in a scope\r
-aware way. Cf. comments in the sources for details.\r
+\section{{\tt walk}, the code walker}
+
+When you write advanced macros, or when you're analyzing some code to check for
+some property, you often need to design a function that walks through an
+arbitrary piece of code, and does complex stuff on it. Such a function is called
+a code walker. Code walkers can be used for some punctual adjustments, e.g.
+changing a function's {\tt return} statements into something else, or checking
+that a loop performs no {\tt break}, up to pretty advanced transformations, such
+as CPS transformation (a way to encode full continuations into a language taht
+doesn't support them natively; see Paul Graham's On Lisp for an accessible
+description of how it works), lazy semantics...
+
+Anyway, code walkers are tricky to write, can involve a lot of boilerplate code,
+and are generally brittle. To ease things as much as possible, Metalua comes
+with a walk library, which intends to accelerate code walker implementation.
+Since code walking is intrinsically tricky, the lib won't magically make it
+trivial, but at least it will save you a lot of time and code, when compared to
+writing all walkers from scratch. Moreover, other people who took the time to
+learn the walker generator's API will enter into your code much faster.
+
+\subsection{Principles}
+
+Code walking is about traversing a tree, first from root to leaves, then from
+leaves back to the root. This tree is not uniform: some nodes are expressions,
+some others statements, some others blocks; and each of these node kinds is
+subdivided in several sub-cases (addition, numeric for loop...). The basic code
+walker just goes from root to leaves and back to root without doing anything.
+Then it's up to you to plug some action callbacks in that walker, so that it
+does interesting things for you.
+
+Without entering into the details of AST structure, here is a simplified version
+of the walking algorithm, pretending to work on a generic tree:
+
+\begin{verbatim}
+function traverse(node)
+ local down_result = down(node)
+ if down_result ~= 'break' then
+ for c in children(node) do
+ traverse(node)
+ end
+ end
+ up(node)
+end
+\end{verbatim}
+
+The algorithm behind 'walk' is almost as simple as the one above, except that
+it's specialized for AST trees. You can essentially specialize it by providing
+your own up() and down() functions. These visitor functions perform whatever
+action you want on the AST; moreover, down() has the option to return 'break':
+in that case, the sub-nodes are not traversed. It allows the user to shun parts
+of the tree, or to provide his own special traversal method in the down()
+visitor.
+
+The real walker generator is only marginally more complex than that:
+\begin{itemize}
+\item It lets you define your up() and down(), and down() can return 'break' to
+ cut the tree traversal; however, these up() and down() functions are
+ specialized by node kind: there can be separate callbacks for expr nodes, stat
+ nodes, block nodes.
+\item There's also a binder() visitor for identifier binders. Binders are
+ variables which declare a new local variable; you'll find them in nodes
+ `Local, `Localrec, `Forin, `Fornum, `Function. The binders are visited just
+ before the variable's scope begins, i.e. before traversing a loop or a
+ function's body, after traversing a `Local's right-hand side, before
+ traversing a `Localrec's right-hand side. \\
+ Notice that separate up() and down() visitors wouldn't make sense for
+ binders, since they're leave nodes.
+\item Visitor functions don't only take the visited node as parameter: they also
+ take the list of all expr, stat and block nodes above it, up to the AST's
+ root. This allows a child node to act on its parent nodes.
+\end{itemize}
+
+\subsection{API}
+There are 3 main tree walkers: {\tt walk.expr()}, {\tt walk.stat()} and {\tt
+ walk.block()}, to walk through the corresponding kinds of ASTs. Each of these
+walker take as parameters a table {\tt cfg} containing the various visitor
+functions, and the AST to walk throuhg. the configuration table {\tt cfg} can
+contain fields:
+\begin{itemize}
+\item {\tt cfg.stat.down(node, parent, grandparent...)}, which applies when
+ traversing a statement down, i.e. before its children nodes are parsed, and
+ can modify the tree, and return {\tt nil} or {\tt'break'}. The way children
+ are traversed is decided {\em after} the {\tt down()} visitor has been run:
+ this point matters when the visitor modifies its children nodes.
+\item {\tt cfg.stat.up(node, parent, grandparent...)}, which is applies on the
+ way back up. It is applied even if {\tt cfg.stat.down()} returned
+ {\tt'break'}, but in that case, the children have not been (and will not be)
+ traversed.
+\item {\tt cfg.expr.down()} and {\tt cfg.expr.up()}, which work just as their
+ {\tt stat} equivalent, but apply to expression nodes.\\
+ Notice that in Lua, function calls and method invocations can be used as
+ statements as well as as espressions: in such cases, they are visited only by
+ the statement visitor, not by the expression visitor.
+\item {\tt cfg.block.down()} and {\tt cfg.block.up()} do the same for statements
+ blocks: loops, conditional and function bodies.
+\item {\tt cfg.binder(identifier, id\_parent, id\_grandparent...)}: this
+ is run on identifiers which create a new local variable, jsut before that
+ variable's scope begins.
+\end{itemize}
+
+Moreover, there is a {\tt walk.guess(cfg, ast)} walker which tries to guess the
+type of the AST it receives, and applies the appropriate walker. When an AST can
+be either an expression or a statement (nodes {\tt`Call} and {\tt`Invoke}), it
+is interpreted as an expression.
+
+\subsection{Examples}
+
+A bit of practice now. Let's build the simplest walker possible, that does
+nothing:
+
+\begin{verbatim}
+cfg = { }
+walker = |ast| walk.block(cfg, ast)
+\end{verbatim}
+
+Now, let's say we want to catch and remove all statement calls to function
+assert(). This can be done by removing its tag and content: an empty list is
+simply ignored in an AST. So we're only interested by `Call nodes, and within
+these nodes, we want the function to be `Id 'assert'. All of this is only
+relevant to stat nodes:
+
+\begin{verbatim}
+function cfg.stat.down (x)
+ match x with
+ | `Call{ `Id 'assert', ... } -> x.tag=nil; x <- { }
+ | _ -> -- not interested by this node, do nothing
+ end
+end
+\end{verbatim}
+
+You'll almost always want to use the 'match' extension to implement visitors.
+The imperative table overrider ({\tt x <- y} a.k.a. {\tt table.override(x, y)}
+also often comes handy to modify an AST.
+
+We'll now remove {\tt assert()} calls in non-statement; we cannot replace an
+expression by nothing, so we'll replace these nodes by these will simply be
+replaced by {\tt nil}:
+
+\begin{verbatim}
+function cfg.expr.down (x)
+ match x with
+ | `Call{ `Id 'assert', ... } -> x <- `Nil
+ | _ -> -- not interested by this node, do nothing
+ end
+end
+\end{verbatim}
+
+
+Here's a remark for functional programmers: this API is very imperative; you
+might cringe at seeing the `Call nodes transformed in-place. Well, I tend to
+agree but it's generally counter-productive to work against the grain of the
+wood: Lua is imperative at heart, and design attempts at doing this functionally
+sucked more than approaches that embraced imperativeness.
+
+\paragraph{Cuts}
+By making down() return 'break', you can prevent the traversal to go further
+down. This might be either because you're not interested by the subtrees, or
+because you want to traverse them in a special way. In that later case, just do
+the traversal by yourself in the down() function, and cut the walking by
+returning 'break', so that nodes aren't re-traversed by the default walking
+algorithm. We'll see that in the next, more complex example, listing of free
+variables.
+
+This example is exclusively there for demonstration purposes. For actual work on
+identifiers that require awareness of an identifier's binder of freedom, there
+is a dedicated {\tt walk.id} library.
+
+We'll progressively build a walker that gathers all global variables used in a
+given AST. This involves keeping, at all times, a set of the identifiers
+currently bound by a "local" declaration, by function parameters, as for loop
+variables etc. Then, every time an identifier is found in the AST, its presence
+is checked in the current set of bound variables. If it isn't in it, then it's a
+free (global) identifier.
+
+The first thing we'll need is a scope handling system: something that keeps
+track of what identifiers are currently in scope. It must also allow to save the
+current scope (e.g. before we enter a new block) and restore it afterwards (e.g.
+after we leave the block). This is quite straightforward and unrelated to code
+walking; here is the code:
+
+\begin{Verbatim}[fontsize=\scriptsize]
+require 'std'
+require 'walk'
+
+-{ extension 'match' }
+
+--------------------------------------------------------------------------------
+-- Scope handling: ':push()' saves the current scope, ':pop()'
+-- restores the previously saved one. ':add(identifiers_list)' adds
+-- identifiers to the current scope. Current scope is stored in
+-- '.current', as a string->boolean hashtable.
+--------------------------------------------------------------------------------
+
+local scope = { }
+scope.__index = scope
+
+function scope:new()
+ local ret = { current = { } }
+ ret.stack = { ret.current }
+ setmetatable (ret, self)
+ return ret
+end
+
+function scope:push()
+ table.insert (self.stack, table.shallow_copy (self.current))
+end
+
+function scope:pop()
+ self.current = table.remove (self.stack)
+end
+
+function scope:add (vars)
+ for id in values (vars) do
+ match id with `Id{ x } -> self.current[x] = true end
+ end
+end
+\end{Verbatim}
+
+(There is an improved version of that class in library {\tt walk.scope}; cf.
+its documentation for details).
+
+Now let's start designing the walker. We'll keep a scope object up to date, as
+well as a set of found free identifiers, gathered every time we find an `Id{ }
+node. To slightly simplify matter, we'll consider that the AST represent a
+block.
+
+\begin{Verbatim}[fontsize=\scriptsize]
+local function fv (term)
+ local freevars = { }
+ local scope = scope:new()
+ local cfg = { expr = { } }
+
+ function cfg.expr.down(x)
+ match x with
+ | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
+ | _ -> -- pass
+ end
+ end
+
+ walk.guess(cfg, term)
+ return freevars
+end
+\end{Verbatim}
+
+Since we don't ever add any identifier to the scope, this will just list all the
+identifiers used in the AST, bound or not. Now let's start doing more
+interesting things:
+
+\begin{itemize}
+\item We'll save the scope's state before entering a new block, and restore it
+ when we leave it. That will be done by providing functions {\tt cfg.block.down()}
+ and {\tt cfg.block.up()}. Saving and restoring will be performed by methods
+ {\tt :push()} and {\tt :pop()}.
+\item Whenever we find a {\tt local} declaration, we'll add the list of
+ identifiers to the current scope, so that they won't be gathered when parsing
+ the {\tt `Id} expression nodes. Notice that the identifiers declared by the
+ 'local' statement only start being valid after the statement, so we'll add
+ them in the {\tt cfg.stat.up()} function rather than {\tt cfg.stat.down()}.
+\end{itemize}
+
+\begin{Verbatim}[fontsize=\scriptsize]
+local cfg = { expr = { },
+ stat = { },
+ block = { } }
+ [...]
+ function cfg.stat.up(x)
+ match x with
+ | `Local{ vars, ... } -> scope:add(vars)
+ | _ -> -- pass
+ end
+ end
+
+ -----------------------------------------------------------------------------
+ -- Create a separate scope for each block, close it when leaving.
+ -----------------------------------------------------------------------------
+ function cfg.block.down() scope:push() end
+ function cfg.block.up() scope:pop() end
+\end{Verbatim}
+
+This starts to be useful. We can also easily add the case for `Localrec{ } nodes
+(the ones generated by {\tt "local function foo() ... end"}), where the variable
+is already bound in the {\tt`Localrec} statement's right-hand side; so we do the
+same as for {\tt`Local}, but we do it in the {\tt down()} function rather than
+in the up() one.
+
+We'll also take care of {\tt`Function}, {\tt`Forin} and {\tt`Fornum} nodes,
+which introduce new bound identifiers as function parameters or loop variables.
+This is quite straightforward; the only thing to take care of is to save the
+scope before entering the function/loop body (in {\tt down()}), and restore it
+when leaving (in {\tt up()}). The code becomes:
+
+
+\begin{Verbatim}[fontsize=\scriptsize]
+local function fv (term)
+ local freevars = { }
+ local scope = scope:new()
+ local cfg = { expr = { },
+ stat = { },
+ block = { } }
+
+ -----------------------------------------------------------------------------
+ -- Check identifiers; add functions parameters to newly created scope.
+ -----------------------------------------------------------------------------
+ function cfg.expr.down(x)
+ match x with
+ | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
+ | `Function{ params, _ } -> scope:push(); scope:add (params)
+ | _ -> -- pass
+ end
+ end
+
+ -----------------------------------------------------------------------------
+ -- Close the function scope opened by 'down()'.
+ -----------------------------------------------------------------------------
+ function cfg.expr.up(x)
+ match x with
+ | `Function{ ... } -> scope:pop()
+ | _ -> -- pass
+ end
+ end
+
+ -----------------------------------------------------------------------------
+ -- Create a new scope and register loop variable[s] in it
+ -----------------------------------------------------------------------------
+ function cfg.stat.down(x)
+ match x with
+ | `Forin{ vars, ... } -> scope:push(); scope:add(vars)
+ | `Fornum{ var, ... } -> scope:push(); scope:add{var}
+ | `Localrec{ vars, ... } -> scope:add(vars)
+ | `Local{ ... } -> -- pass
+ | _ -> -- pass
+ end
+ end
+
+ -----------------------------------------------------------------------------
+ -- Close the scopes opened by 'up()'
+ -----------------------------------------------------------------------------
+ function cfg.stat.up(x)
+ match x with
+ | `Forin{ ... } | `Fornum{ ... } -> scope:pop()
+ | `Local{ vars, ... } -> scope:add(vars)
+ | _ -> -- pass
+ end
+ end
+
+ -----------------------------------------------------------------------------
+ -- Create a separate scope for each block, close it when leaving.
+ -----------------------------------------------------------------------------
+ function cfg.block.down() scope:push() end
+ function cfg.block.up() scope:pop() end
+
+ walk.guess(cfg, term)
+ return freevars
+end
+\end{Verbatim}
+
+This is almost correct now. There's one last tricky point of Lua's semantics
+that we need to address: in {\tt repeat foo until bar} loops, "bar" is included
+in "foo"'s scope. For instance, if we write {\tt repeat local x=foo() until
+ x>3}, the "x" in the condition is the local variable "x" declared inside the
+body. This violates our way of handling block scopes: the scope must be kept
+alive after the block is finished. We'll fix this by providing a custom walking
+for the block inside `Repeat, and preventing the normal walking to happen by
+returning 'break':
+
+\begin{Verbatim}[fontsize=\scriptsize]
+ -----------------------------------------------------------------------------
+ -- Create a new scope and register loop variable[s] in it
+ -----------------------------------------------------------------------------
+ function cfg.stat.down(x)
+ match x with
+ | `Forin{ vars, ... } -> scope:push(); scope:add(vars)
+ | `Fornum{ var, ... } -> scope:push(); scope:add{var}
+ | `Localrec{ vars, ... } -> scope:add(vars)
+ | `Local{ ... } -> -- pass
+ | `Repeat{ block, cond } -> -- 'cond' is in the scope of 'block'
+ scope:push()
+ for s in values (block) do walk.stat(cfg)(s) end -- no new scope
+ walk.expr(cfg)(cond)
+ scope:pop()
+ return 'break' -- No automatic walking of subparts 'cond' and 'body'
+ | _ -> -- pass
+ end
+ end
+\end{Verbatim}
+
+That's it, we've now got a full free variables lister, and have demonstrated
+most APIs offered by the basic 'walk' library. If you want to walk through
+identifiers in a scope-aware way, though, you'll want to look at the {\tt
+ walk.id} library.
+
+\subsection{Library {\tt walk.id}, the scope-aware walker}
+
+This library walks AST to gather information about the identifiers in it. It
+call distinct visitor functions depending on whether an identifier is bound or
+free; moreover, when an identifier is bound, the visitor also receives its
+binder node as a parameter. For instance, in {\tt +\{function(x) print(x)
+ end\}}, the bound identifier walker will be called on the \verb|+{x}| in the
+\verb|print| call, and the visitor's second parameter will be the {\tt`Function}
+node which created the local variable {\tt x}.
+
+\paragraph{API}
+The library is loaded with \verb|require 'walk.id'|. The walkers provided are:
+\begin{itemize}
+\item {\tt walk\_id.expr()};
+\item {\tt walk\_id.stat()};
+\item {\tt walk\_id.block()};
+\item {\tt walk\_id.guess()}.
+\end{itemize}
+
+They take the same config tables as regular walkers, except that they also
+recognize the following entries:
+\begin{itemize}
+\item {\tt cfg.id.free(identifier, parent, grandparent...)}, which is run on
+ free variables;
+\item {\tt cfg.id.bound(identifier, binder, parent, grandparent...)}, which is
+ run on bound variables. The statement or expression which created this bound
+ veriable's scope is passed as a second parameter, before the parent nodes.
+\end{itemize}
+
+\paragraph{Examples}
+Let's rewrite the free variables walker above, with the id walker:
+
+\begin{Verbatim}[fontsize=\scriptsize]
+function fv (term)
+ local cfg = { id = { } }
+ local freevars = { }
+ function cfg.id.free(id)
+ local id_name = id[1]
+ freevars[id_name] = true
+ end
+ walk_id.guess (cfg, term)
+ return freevars
+end
+\end{Verbatim}
+
+Now, let's $\alpha$-rename all bound variables in a term. This is slightly
+trickier than one could think: we need to first walk the whole tree, then
+perform all the replacement. If we renamed binders as we went, they would stop
+binding their variables, and something messy would happen. For instance, if we
+took {\tt +\{function(x) print(x) end\}} and renamed the binder {\tt x} into
+{\tt foo}, we'd then start walking the function body on the tree {\tt
+ +\{function(foo) print(x) end\}}, where {\tt x} isn't bound anymore.
+
+\begin{Verbatim}[fontsize=\scriptsize]
+--------------------------------------------------------------------------------
+-- bound_vars keeps a binder node -> old_name -> new_name dictionary.
+-- It allows to associate all instances of an identifier together,
+-- with the binder that created them
+--------------------------------------------------------------------------------
+local bound_vars = { }
+
+--------------------------------------------------------------------------------
+-- local_renames will keep all identifier nodes to rename as keys,
+-- and their new name as values. The renaming must happen after
+-- the whole tree has been visited, in order to avoid breaking captures.
+--------------------------------------------------------------------------------
+local local_renames = { }
+
+--------------------------------------------------------------------------------
+-- Associate a new name in bound_vars when a local variable is created.
+--------------------------------------------------------------------------------
+function cfg.binder (id, binder)
+ local old_name = id[1]
+ local binder_table = bound_vars[binder]
+ if not binder_table then
+ -- Create a new entry for this binder:
+ binder_table = { }
+ bound_vars[binder] = binder_table
+ end
+ local new_name = mlp.gensym(old_name)[1] -- generate a new name
+ binder_table[old_name] = new_name -- remember name for next instances
+ local_renames[id] = new_name -- add to the rename todo-list
+end
+
+--------------------------------------------------------------------------------
+-- Add a bound variable the the rename todo-list
+--------------------------------------------------------------------------------
+function cfg.id.bound (id, binder)
+ local old_name = id[1]
+ local new_name = bound_vars[binder][old_name]
+ local_renames[id] = new_name
+end
+
+-- walk the tree and fill laocal_renames:
+walk_id.guess(cfg, ast)
+
+-- perform the renaming actions:
+for id, new_name in pairs(local_renames) do id[1] = new_name end
+\end{Verbatim}
+
+\subsection{Library {\tt walk.scope}, the scope helper}
+This library allows to easily store data, in an AST walking function, in a scope
+aware way. Cf. comments in the sources for details.