1 \section{{\tt walk}, the code walker}
\r
3 When you write advanced macros, or when you're analyzing some code to check for
\r
4 some property, you often need to design a function that walks through an
\r
5 arbitrary piece of code, and does complex stuff on it. Such a function is called
\r
6 a code walker. Code walkers can be used for some punctual adjustments, e.g.
\r
7 changing a function's {\tt return} statements into something else, or checking
\r
8 that a loop performs no {\tt break}, up to pretty advanced transformations, such
\r
9 as CPS transformation (a way to encode full continuations into a language taht
\r
10 doesn't support them natively; see Paul Graham's On Lisp for an accessible
\r
11 description of how it works), lazy semantics...
\r
13 Anyway, code walkers are tricky to write, can involve a lot of boilerplate code,
\r
14 and are generally brittle. To ease things as much as possible, Metalua comes
\r
15 with a walk library, which intends to accelerate code walker implementation.
\r
16 Since code walking is intrinsically tricky, the lib won't magically make it
\r
17 trivial, but at least it will save you a lot of time and code, when compared to
\r
18 writing all walkers from scratch. Moreover, other people who took the time to
\r
19 learn the walker generator's API will enter into your code much faster.
\r
21 \subsection{Principles}
\r
23 Code walking is about traversing a tree, first from root to leaves, then from
\r
24 leaves back to the root. This tree is not uniform: some nodes are expressions,
\r
25 some others statements, some others blocks; and each of these node kinds is
\r
26 subdivided in several sub-cases (addition, numeric for loop...). The basic code
\r
27 walker just goes from root to leaves and back to root without doing anything.
\r
28 Then it's up to you to plug some action callbacks in that walker, so that it
\r
29 does interesting things for you.
\r
31 Without entering into the details of AST structure, here is a simplified version
\r
32 of the walking algorithm, pretending to work on a generic tree:
\r
35 function traverse(node)
\r
36 local down_result = down(node)
\r
37 if down_result ~= 'break' then
\r
38 for c in children(node) do
\r
46 The algorithm behind 'walk' is almost as simple as the one above, except that
\r
47 it's specialized for AST trees. You can essentially specialize it by providing
\r
48 your own up() and down() functions. These visitor functions perform whatever
\r
49 action you want on the AST; moreover, down() has the option to return 'break':
\r
50 in that case, the sub-nodes are not traversed. It allows the user to shun parts
\r
51 of the tree, or to provide his own special traversal method in the down()
\r
54 The real walker generator is only marginally more complex than that:
\r
56 \item It lets you define your up() and down(), and down() can return 'break' to
\r
57 cut the tree traversal; however, these up() and down() functions are
\r
58 specialized by node kind: there can be separate callbacks for expr nodes, stat
\r
60 \item There's also a binder() visitor for identifier binders. Binders are
\r
61 variables which declare a new local variable; you'll find them in nodes
\r
62 `Local, `Localrec, `Forin, `Fornum, `Function. The binders are visited just
\r
63 before the variable's scope begins, i.e. before traversing a loop or a
\r
64 function's body, after traversing a `Local's right-hand side, before
\r
65 traversing a `Localrec's right-hand side. \\
\r
66 Notice that separate up() and down() visitors wouldn't make sense for
\r
67 binders, since they're leave nodes.
\r
68 \item Visitor functions don't only take the visited node as parameter: they also
\r
69 take the list of all expr, stat and block nodes above it, up to the AST's
\r
70 root. This allows a child node to act on its parent nodes.
\r
74 There are 3 main tree walkers: {\tt walk.expr()}, {\tt walk.stat()} and {\tt
\r
75 walk.block()}, to walk through the corresponding kinds of ASTs. Each of these
\r
76 walker take as parameters a table {\tt cfg} containing the various visitor
\r
77 functions, and the AST to walk throuhg. the configuration table {\tt cfg} can
\r
80 \item {\tt cfg.stat.down(node, parent, grandparent...)}, which applies when
\r
81 traversing a statement down, i.e. before its children nodes are parsed, and
\r
82 can modify the tree, and return {\tt nil} or {\tt'break'}. The way children
\r
83 are traversed is decided {\em after} the {\tt down()} visitor has been run:
\r
84 this point matters when the visitor modifies its children nodes.
\r
85 \item {\tt cfg.stat.up(node, parent, grandparent...)}, which is applies on the
\r
86 way back up. It is applied even if {\tt cfg.stat.down()} returned
\r
87 {\tt'break'}, but in that case, the children have not been (and will not be)
\r
89 \item {\tt cfg.expr.down()} and {\tt cfg.expr.up()}, which work just as their
\r
90 {\tt stat} equivalent, but apply to expression nodes.\\
\r
91 Notice that in Lua, function calls and method invocations can be used as
\r
92 statements as well as as espressions: in such cases, they are visited only by
\r
93 the statement visitor, not by the expression visitor.
\r
94 \item {\tt cfg.block.down()} and {\tt cfg.block.up()} do the same for statements
\r
95 blocks: loops, conditional and function bodies.
\r
96 \item {\tt cfg.binder(identifier, id\_parent, id\_grandparent...)}: this
\r
97 is run on identifiers which create a new local variable, jsut before that
\r
98 variable's scope begins.
\r
101 Moreover, there is a {\tt walk.guess(cfg, ast)} walker which tries to guess the
\r
102 type of the AST it receives, and applies the appropriate walker. When an AST can
\r
103 be either an expression or a statement (nodes {\tt`Call} and {\tt`Invoke}), it
\r
104 is interpreted as an expression.
\r
106 \subsection{Examples}
\r
108 A bit of practice now. Let's build the simplest walker possible, that does
\r
113 walker = |ast| walk.block(cfg, ast)
\r
116 Now, let's say we want to catch and remove all statement calls to function
\r
117 assert(). This can be done by removing its tag and content: an empty list is
\r
118 simply ignored in an AST. So we're only interested by `Call nodes, and within
\r
119 these nodes, we want the function to be `Id 'assert'. All of this is only
\r
120 relevant to stat nodes:
\r
123 function cfg.stat.down (x)
\r
125 | `Call{ `Id 'assert', ... } -> x.tag=nil; x <- { }
\r
126 | _ -> -- not interested by this node, do nothing
\r
131 You'll almost always want to use the 'match' extension to implement visitors.
\r
132 The imperative table overrider ({\tt x <- y} a.k.a. {\tt table.override(x, y)}
\r
133 also often comes handy to modify an AST.
\r
135 We'll now remove {\tt assert()} calls in non-statement; we cannot replace an
\r
136 expression by nothing, so we'll replace these nodes by these will simply be
\r
137 replaced by {\tt nil}:
\r
140 function cfg.expr.down (x)
\r
142 | `Call{ `Id 'assert', ... } -> x <- `Nil
\r
143 | _ -> -- not interested by this node, do nothing
\r
149 Here's a remark for functional programmers: this API is very imperative; you
\r
150 might cringe at seeing the `Call nodes transformed in-place. Well, I tend to
\r
151 agree but it's generally counter-productive to work against the grain of the
\r
152 wood: Lua is imperative at heart, and design attempts at doing this functionally
\r
153 sucked more than approaches that embraced imperativeness.
\r
156 By making down() return 'break', you can prevent the traversal to go further
\r
157 down. This might be either because you're not interested by the subtrees, or
\r
158 because you want to traverse them in a special way. In that later case, just do
\r
159 the traversal by yourself in the down() function, and cut the walking by
\r
160 returning 'break', so that nodes aren't re-traversed by the default walking
\r
161 algorithm. We'll see that in the next, more complex example, listing of free
\r
164 This example is exclusively there for demonstration purposes. For actual work on
\r
165 identifiers that require awareness of an identifier's binder of freedom, there
\r
166 is a dedicated {\tt walk.id} library.
\r
168 We'll progressively build a walker that gathers all global variables used in a
\r
169 given AST. This involves keeping, at all times, a set of the identifiers
\r
170 currently bound by a "local" declaration, by function parameters, as for loop
\r
171 variables etc. Then, every time an identifier is found in the AST, its presence
\r
172 is checked in the current set of bound variables. If it isn't in it, then it's a
\r
173 free (global) identifier.
\r
175 The first thing we'll need is a scope handling system: something that keeps
\r
176 track of what identifiers are currently in scope. It must also allow to save the
\r
177 current scope (e.g. before we enter a new block) and restore it afterwards (e.g.
\r
178 after we leave the block). This is quite straightforward and unrelated to code
\r
179 walking; here is the code:
\r
181 \begin{Verbatim}[fontsize=\scriptsize]
\r
185 -{ extension 'match' }
\r
187 --------------------------------------------------------------------------------
\r
188 -- Scope handling: ':push()' saves the current scope, ':pop()'
\r
189 -- restores the previously saved one. ':add(identifiers_list)' adds
\r
190 -- identifiers to the current scope. Current scope is stored in
\r
191 -- '.current', as a string->boolean hashtable.
\r
192 --------------------------------------------------------------------------------
\r
195 scope.__index = scope
\r
197 function scope:new()
\r
198 local ret = { current = { } }
\r
199 ret.stack = { ret.current }
\r
200 setmetatable (ret, self)
\r
204 function scope:push()
\r
205 table.insert (self.stack, table.shallow_copy (self.current))
\r
208 function scope:pop()
\r
209 self.current = table.remove (self.stack)
\r
212 function scope:add (vars)
\r
213 for id in values (vars) do
\r
214 match id with `Id{ x } -> self.current[x] = true end
\r
219 (There is an improved version of that class in library {\tt walk.scope}; cf.
\r
220 its documentation for details).
\r
222 Now let's start designing the walker. We'll keep a scope object up to date, as
\r
223 well as a set of found free identifiers, gathered every time we find an `Id{ }
\r
224 node. To slightly simplify matter, we'll consider that the AST represent a
\r
227 \begin{Verbatim}[fontsize=\scriptsize]
\r
228 local function fv (term)
\r
229 local freevars = { }
\r
230 local scope = scope:new()
\r
231 local cfg = { expr = { } }
\r
233 function cfg.expr.down(x)
\r
235 | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
\r
240 walk.guess(cfg, term)
\r
245 Since we don't ever add any identifier to the scope, this will just list all the
\r
246 identifiers used in the AST, bound or not. Now let's start doing more
\r
247 interesting things:
\r
250 \item We'll save the scope's state before entering a new block, and restore it
\r
251 when we leave it. That will be done by providing functions {\tt cfg.block.down()}
\r
252 and {\tt cfg.block.up()}. Saving and restoring will be performed by methods
\r
253 {\tt :push()} and {\tt :pop()}.
\r
254 \item Whenever we find a {\tt local} declaration, we'll add the list of
\r
255 identifiers to the current scope, so that they won't be gathered when parsing
\r
256 the {\tt `Id} expression nodes. Notice that the identifiers declared by the
\r
257 'local' statement only start being valid after the statement, so we'll add
\r
258 them in the {\tt cfg.stat.up()} function rather than {\tt cfg.stat.down()}.
\r
261 \begin{Verbatim}[fontsize=\scriptsize]
\r
262 local cfg = { expr = { },
\r
266 function cfg.stat.up(x)
\r
268 | `Local{ vars, ... } -> scope:add(vars)
\r
273 -----------------------------------------------------------------------------
\r
274 -- Create a separate scope for each block, close it when leaving.
\r
275 -----------------------------------------------------------------------------
\r
276 function cfg.block.down() scope:push() end
\r
277 function cfg.block.up() scope:pop() end
\r
280 This starts to be useful. We can also easily add the case for `Localrec{ } nodes
\r
281 (the ones generated by {\tt "local function foo() ... end"}), where the variable
\r
282 is already bound in the {\tt`Localrec} statement's right-hand side; so we do the
\r
283 same as for {\tt`Local}, but we do it in the {\tt down()} function rather than
\r
286 We'll also take care of {\tt`Function}, {\tt`Forin} and {\tt`Fornum} nodes,
\r
287 which introduce new bound identifiers as function parameters or loop variables.
\r
288 This is quite straightforward; the only thing to take care of is to save the
\r
289 scope before entering the function/loop body (in {\tt down()}), and restore it
\r
290 when leaving (in {\tt up()}). The code becomes:
\r
293 \begin{Verbatim}[fontsize=\scriptsize]
\r
294 local function fv (term)
\r
295 local freevars = { }
\r
296 local scope = scope:new()
\r
297 local cfg = { expr = { },
\r
301 -----------------------------------------------------------------------------
\r
302 -- Check identifiers; add functions parameters to newly created scope.
\r
303 -----------------------------------------------------------------------------
\r
304 function cfg.expr.down(x)
\r
306 | `Id{ name } -> if not scope.current[name] then freevars[name] = true end
\r
307 | `Function{ params, _ } -> scope:push(); scope:add (params)
\r
312 -----------------------------------------------------------------------------
\r
313 -- Close the function scope opened by 'down()'.
\r
314 -----------------------------------------------------------------------------
\r
315 function cfg.expr.up(x)
\r
317 | `Function{ ... } -> scope:pop()
\r
322 -----------------------------------------------------------------------------
\r
323 -- Create a new scope and register loop variable[s] in it
\r
324 -----------------------------------------------------------------------------
\r
325 function cfg.stat.down(x)
\r
327 | `Forin{ vars, ... } -> scope:push(); scope:add(vars)
\r
328 | `Fornum{ var, ... } -> scope:push(); scope:add{var}
\r
329 | `Localrec{ vars, ... } -> scope:add(vars)
\r
330 | `Local{ ... } -> -- pass
\r
335 -----------------------------------------------------------------------------
\r
336 -- Close the scopes opened by 'up()'
\r
337 -----------------------------------------------------------------------------
\r
338 function cfg.stat.up(x)
\r
340 | `Forin{ ... } | `Fornum{ ... } -> scope:pop()
\r
341 | `Local{ vars, ... } -> scope:add(vars)
\r
346 -----------------------------------------------------------------------------
\r
347 -- Create a separate scope for each block, close it when leaving.
\r
348 -----------------------------------------------------------------------------
\r
349 function cfg.block.down() scope:push() end
\r
350 function cfg.block.up() scope:pop() end
\r
352 walk.guess(cfg, term)
\r
357 This is almost correct now. There's one last tricky point of Lua's semantics
\r
358 that we need to address: in {\tt repeat foo until bar} loops, "bar" is included
\r
359 in "foo"'s scope. For instance, if we write {\tt repeat local x=foo() until
\r
360 x>3}, the "x" in the condition is the local variable "x" declared inside the
\r
361 body. This violates our way of handling block scopes: the scope must be kept
\r
362 alive after the block is finished. We'll fix this by providing a custom walking
\r
363 for the block inside `Repeat, and preventing the normal walking to happen by
\r
366 \begin{Verbatim}[fontsize=\scriptsize]
\r
367 -----------------------------------------------------------------------------
\r
368 -- Create a new scope and register loop variable[s] in it
\r
369 -----------------------------------------------------------------------------
\r
370 function cfg.stat.down(x)
\r
372 | `Forin{ vars, ... } -> scope:push(); scope:add(vars)
\r
373 | `Fornum{ var, ... } -> scope:push(); scope:add{var}
\r
374 | `Localrec{ vars, ... } -> scope:add(vars)
\r
375 | `Local{ ... } -> -- pass
\r
376 | `Repeat{ block, cond } -> -- 'cond' is in the scope of 'block'
\r
378 for s in values (block) do walk.stat(cfg)(s) end -- no new scope
\r
379 walk.expr(cfg)(cond)
\r
381 return 'break' -- No automatic walking of subparts 'cond' and 'body'
\r
387 That's it, we've now got a full free variables lister, and have demonstrated
\r
388 most APIs offered by the basic 'walk' library. If you want to walk through
\r
389 identifiers in a scope-aware way, though, you'll want to look at the {\tt
\r
392 \subsection{Library {\tt walk.id}, the scope-aware walker}
\r
394 This library walks AST to gather information about the identifiers in it. It
\r
395 call distinct visitor functions depending on whether an identifier is bound or
\r
396 free; moreover, when an identifier is bound, the visitor also receives its
\r
397 binder node as a parameter. For instance, in {\tt +\{function(x) print(x)
\r
398 end\}}, the bound identifier walker will be called on the \verb|+{x}| in the
\r
399 \verb|print| call, and the visitor's second parameter will be the {\tt`Function}
\r
400 node which created the local variable {\tt x}.
\r
403 The library is loaded with \verb|require 'walk.id'|. The walkers provided are:
\r
405 \item {\tt walk\_id.expr()};
\r
406 \item {\tt walk\_id.stat()};
\r
407 \item {\tt walk\_id.block()};
\r
408 \item {\tt walk\_id.guess()}.
\r
411 They take the same config tables as regular walkers, except that they also
\r
412 recognize the following entries:
\r
414 \item {\tt cfg.id.free(identifier, parent, grandparent...)}, which is run on
\r
416 \item {\tt cfg.id.bound(identifier, binder, parent, grandparent...)}, which is
\r
417 run on bound variables. The statement or expression which created this bound
\r
418 veriable's scope is passed as a second parameter, before the parent nodes.
\r
421 \paragraph{Examples}
\r
422 Let's rewrite the free variables walker above, with the id walker:
\r
424 \begin{Verbatim}[fontsize=\scriptsize]
\r
426 local cfg = { id = { } }
\r
427 local freevars = { }
\r
428 function cfg.id.free(id)
\r
429 local id_name = id[1]
\r
430 freevars[id_name] = true
\r
432 walk_id.guess (cfg, term)
\r
437 Now, let's $\alpha$-rename all bound variables in a term. This is slightly
\r
438 trickier than one could think: we need to first walk the whole tree, then
\r
439 perform all the replacement. If we renamed binders as we went, they would stop
\r
440 binding their variables, and something messy would happen. For instance, if we
\r
441 took {\tt +\{function(x) print(x) end\}} and renamed the binder {\tt x} into
\r
442 {\tt foo}, we'd then start walking the function body on the tree {\tt
\r
443 +\{function(foo) print(x) end\}}, where {\tt x} isn't bound anymore.
\r
445 \begin{Verbatim}[fontsize=\scriptsize]
\r
446 --------------------------------------------------------------------------------
\r
447 -- bound_vars keeps a binder node -> old_name -> new_name dictionary.
\r
448 -- It allows to associate all instances of an identifier together,
\r
449 -- with the binder that created them
\r
450 --------------------------------------------------------------------------------
\r
451 local bound_vars = { }
\r
453 --------------------------------------------------------------------------------
\r
454 -- local_renames will keep all identifier nodes to rename as keys,
\r
455 -- and their new name as values. The renaming must happen after
\r
456 -- the whole tree has been visited, in order to avoid breaking captures.
\r
457 --------------------------------------------------------------------------------
\r
458 local local_renames = { }
\r
460 --------------------------------------------------------------------------------
\r
461 -- Associate a new name in bound_vars when a local variable is created.
\r
462 --------------------------------------------------------------------------------
\r
463 function cfg.binder (id, binder)
\r
464 local old_name = id[1]
\r
465 local binder_table = bound_vars[binder]
\r
466 if not binder_table then
\r
467 -- Create a new entry for this binder:
\r
469 bound_vars[binder] = binder_table
\r
471 local new_name = mlp.gensym(old_name)[1] -- generate a new name
\r
472 binder_table[old_name] = new_name -- remember name for next instances
\r
473 local_renames[id] = new_name -- add to the rename todo-list
\r
476 --------------------------------------------------------------------------------
\r
477 -- Add a bound variable the the rename todo-list
\r
478 --------------------------------------------------------------------------------
\r
479 function cfg.id.bound (id, binder)
\r
480 local old_name = id[1]
\r
481 local new_name = bound_vars[binder][old_name]
\r
482 local_renames[id] = new_name
\r
485 -- walk the tree and fill laocal_renames:
\r
486 walk_id.guess(cfg, ast)
\r
488 -- perform the renaming actions:
\r
489 for id, new_name in pairs(local_renames) do id[1] = new_name end
\r
492 \subsection{Library {\tt walk.scope}, the scope helper}
\r
493 This library allows to easily store data, in an AST walking function, in a scope
\r
494 aware way. Cf. comments in the sources for details.
\r