1 \section{{\tt gg}, the grammar generator}
3 \verb|gg| is the grammar generator, the library with which Metalua
4 parser is built. Knowing it allows you to easily write your own
5 parsers, and to plug them into mlp, the existing Metalua source
6 parser. It defines a couple of generators, which take parsers as
7 parameters, and return a more complex parser as a result by combining
10 Notice that \verb|gg| sources are thoroughly commented, and try to be
11 readable as a secondary source of documentation.
13 There are four main classes in gg, which allow to generate:
15 \item sequences of keywords and subparsers;
16 \item keyword-driven sequence sets, i.e. parsers which select a
17 sequence parser depending on an initial keyword;
18 \item lists, i.e. series of an undetermined number of elements of the
19 same type, optionnaly separated by a keyword (typically ``{\tt,}'');
20 \item expressions, built with infix, prefix and suffix operators
21 around a primary expression element;
26 \subsection{Sequences}
28 A sequence parser combines sub-parsers together, calling them one after
29 the other. A lot of these sub-parsers will simply read a keyword, and
30 do nothing of it except making sure that it is indeed here: these can
31 be specified by a simple string. For instance, the following
32 declarations create parsers that read function declarations, thanks
35 \item \verb|func_stat_name| reads a function name (a list of
36 identifiers separated by dots, plus optionally a semicolon and a
38 \item \verb|func_params_content| reads a possibly empty list of
39 identifiers separated with commas, plus an optional ``\verb|...|'';
40 \item \verb|mlp.block| reads a block of statements (here the function
42 \item \verb|mlp.id| reads an identifier.
45 \begin{Verbatim}[fontsize=\scriptsize]
46 -- Read a function definition statement
47 func_stat = gg.sequence{ "function", func_stat_name, "(",
48 func_params_content, ")", mlp.block, "end" }
50 -- Read a local function definition statement
51 func_stat = gg.sequence{ "local", "function", mlp.id, "(",
52 func_params_content, ")", mlp.block, "end" }
56 \subsubsection{Constructor {\tt gg.sequence (config\_table) }}
58 This function returns a sequence parser. \verb|config_table| contains,
59 in its array part, a sequence of string representing keyword parsers,
60 and arbitrary sub-parsers. Moreover, the following fields are allowed
61 in the hash part of the table:
64 \item\verb|name = <string>|: the parser's name, used to generate error
66 \item\verb|builder = <function>|: if present, whenever the parser is
67 called, the list of the sub-parsers results is passed to this
68 function, and the function's return value is returned as the
69 parser's result. If absent, the list of sub-parser results is simply
70 returned. It can be updated at anytime with:\\
71 \verb|x.builder = <newval>|.
72 \item\verb|builder = <string>|: the string is added as a tag to the
73 list of sub-parser results. \verb|builder = "foobar"| is equivalent
75 \verb|builder = function(x) x.tag="foobar"; return x end|
76 \item\verb|transformers = <function list>|: applies all the functions
77 of the list, in sequence, to the result of the parsing: these
78 functions must be of type AST$\rightarrow$AST. For instance, if the
79 transformers list is {\tt\{f1, f2, f3\}}, and the builder returns
80 {\tt x}, then the whole parser returns {\tt f3(f2(f1(x)))}.
83 \subsubsection{Method {\tt :parse(lexstream)}}
85 Read a sequence from the lexstream. If the sequence can't be entirely
86 read, an error occurs. The result is either the list of results of
87 sub-parsers, or the result of \verb|builder| if it is non-nil. In the
88 \verb|func_stat| example above, the result would be a list of 3
89 elements: the results of \verb|func_stat_name|,
90 \verb|func_params_content| and \verb|mlp.block|.
92 It can also be directly called as simply \verb|x(lexstream)| instead of
93 \verb|x:parse(lexstream)|.
95 \subsubsection{Method {\tt .transformers:add(f)}}
96 Adds a function at the end of the transformers list.
98 \subsection{Sequence sets}
100 In many cases, several sequence parsers can be applied at a given
101 point, and the choice of the right parser is determined by the next
102 keyword in the lexstream. This is typically the case for Metalua's
103 statement parser. All the sequence parsers must start with a keyword
104 rather than a sub-parser, and that initial keyword must be different
105 for each sequence parser in the sequence set parser. The sequence set
106 parser takes care of selecting the appropriate sequence
107 parser. Moreover, if the next token in the lexstream is not a keyword, or
108 if it is a keyword but no sequence parser starts with it, the sequence
109 set parser can have a default parser which is used as a fallback.
111 For instance, the declaration of \verb|mlp.stat| the Metalua statement
115 mlp.stat = gg.multisequence{
116 mlp.do_stat, mlp.while_stat, mlp.repeat_stat, mlp.if_stat... }
119 \subsubsection{Constructor {\tt gg.multisequence (config\_table)}}
122 This function returns a sequence set parser. The array part of
123 \verb|config_table| contains a list of parsers. It also accepts tables
124 instead of sequence parsers: in this case, these tables are supposed
125 to be config tables for \verb|gg.sequence| constructor, and are
126 converted into sequence parsers on-the-fly by calling
127 \verb|gg.sequence| on them. Each of these sequence parsers has to
128 start with a keyword, distinct from all other initial keywords,
129 e.g. it's illegal for two sequences in the same multisequence to start
130 with keyword {\tt do}; it's also illegal for any parser but the
131 default one to start with a subparser, e.g. {\tt mlp.id}.
133 It also accepts the following fields in the table's hash part:
135 \item\verb|default = <parser>|: if no sequence can be chosen, because
136 the next token is not a keyword, or no sequence parser in the set
137 starts with that keyword, then the default parser is run instead; if
138 no default parser is provided and no sequence parser can be chosen,
139 an error is generated when parsing.
140 \item\verb|name = <string>|: the parser's name, used to generate arror
142 \item\verb|builder = <function>|: if present, whenever the parser is
143 called, the selected parser's result is passed to this
144 function, and the function's return value is returned as the
145 parser's result. If absent, the selected parser's result is simply
146 returned. It can be updated at anytime with
147 \verb|x.builder = <newval>|.
148 \item\verb|builder = <string>|: the string is added as a tag to the
149 list of sub-parser results. \verb|builder = "foobar"| is equivalent
151 \verb|builder = function(x) return { tag = "foobar"; unpack (x) } end|
152 \item\verb|transformers = <function list>|: applies all the functions
153 of the list, in sequence, to the result of the parsing: these
154 functions must be of type AST$\rightarrow$AST.
157 \subsubsection{Method {\tt :parse(lexstream)}}
159 Read from the lexstream. The result returned is the result of the selected
160 parser (one of the sequence parsers, or the default parser). That
161 result is either returned directly, or passed through \verb|builder|
162 if this field is non-nil.
164 It can also be directly called as simply \verb|x(lexstream)| instead of
165 \verb|x:parse(lexstream)|.
167 \subsubsection{Method {\tt :add(sequence\_parser)}}
169 Take a sequence parser, or a config table that would be accepted by
170 \verb|gg.sequence| to build a sequence parser. Add that parser to the
171 set of sequence parsers handled by x. Cause an error if the parser
172 doesn't start with a keyword, or if that initial keyword is already
173 reserved by a registered sequence parser, or if the parser is not a
176 \subsubsection{Field {\tt .default}}
177 This field contains the default parser, and can be set to another
180 \subsubsection{Method {\tt .transformers:add}}
181 Adds a function at the end of the transformers list.
183 \subsubsection{Method {\tt :get(keyword)}}
184 Takes a keyword (as a string), and returns the sequence in the set
185 starting with that keyword, or nil if there is no such sequence.
187 \subsubsection{Method {\tt :del(keyword)}}
188 Removes the sequence parser starting with keyword {\tt kw}.
190 \subsection{List parser}
192 Sequence parsers allow to chain several different sub-parser. Another
193 common need is to read a series of identical elements into a list, but
194 without knowing in advance how many of such elements will be
195 found. This allows to read lists of arguments in a call, lists of
196 parameters in a function definition, lists of statements in a
199 Another common feature of such lists is that elements of the list are
200 separated by keywords, typically semicolons or commas. These are
201 handled by the list parser generator.
203 A list parser needs a way to know when the list is finished. When
204 elements are separated by keyword separators, this is easy to
205 determine: the list stops when an element is not followed by a
206 separator. But two cases remain unsolved:
208 \item When a list is allowed to be empty, no separator keyword allows
209 the parser to realize that it is in front of an empty list: it would
210 call the element parser, and that parser would fail;
211 \item when there are no separator keyword separator specified, they
212 can't be used to determine the end of the list.
215 For these two cases, a list parser can specify a set of terminator
216 keywords: in separator-less lists, the parser returns as soon as a
217 terminator keyword is found where an element would otherwise have been
218 read. In lists with separators, if terminators are specified, and such
219 a terminator is found at the beginning of the list, then no element is
220 parsed, and an empty list is returned. For instance, for argument
221 lists, ``\verb|)|'' would be specified as a terminator, so that empty
222 argument lists ``\verb|()|'' are handled properly.
224 Beware that separators are consumed from the lexstream stream, but
227 %\caveat{FIXME: check that it works as advertized (for list parserd
228 % with separators AND terminators; it's related to the trailing
229 % separator issue in block and table\_content parsers). There still is
230 % a design issue to settle here.}
232 \subsubsection{Constructor {\tt gg.list (config\_table)}}
234 This function returns a list parser. \verb|config_table| can contain
235 the following fields:
238 \item\verb|primary = <parser>| (mandatory): the parser used to read
239 elemetns of the list;
241 \item\verb|separators = <list> |: list of strings representing the
242 keywords accepted as element separators. If only one separator is
243 allowed, then the string can be passed outside the list:\\
244 \verb|separators = "foo"| is the same as
245 \verb|separators = { "foo" }|.
247 \item\verb|terminators = <list> |: list of strings representing the
248 keywords accepted as list terminators. If only one separator is
249 allowed, then the string can be passed outside the list.
251 \item\verb|name = <string>|: the parser's name, used to generate arror
254 \item\verb|builder = <function>|: if present, whenever the parser is
255 called, the list of primary parser results is passed to this
256 function, and the function's return value is returned as the
257 parser's result. If absent, the list of sub-parser results is simply
258 returned. It can be updated at anytime with
259 \verb|x.builder = <newval>|.
261 \item\verb|builder = <string>|: the string is added as a tag to the
262 list of sub-parser results. \verb|builder = "foobar"| is equivalent
264 \verb|builder = function(x) return { tag = "foobar"; unpack (x) } end|
266 \item\verb|transformers = <function list>|: applies all the functions
267 of the list, in sequence, to the result of the parsing: these
268 functions must be of type AST$\rightarrow$AST.
270 \item if keyless element is found in \verb|config_table|, and there is
271 no \verb|primary| key in the table, then it is expected to be a
272 parser, and it is considered to be the primary parser.
275 \subsubsection{Method {\tt :parse (lexstream)}}
277 Read a list from the lexstream. The result is either the list of elements
278 as read by the primary parser, or the result of that list passed
279 through \verb|builder| if it is specified.
281 It can also be directly called as simply \verb|x(lexstream)| instead of
282 \verb|x:parse(lexstream)|.
284 \subsubsection{Method {\tt .transformers:add}}
285 Adds a function at the end of the transformers list.
287 \subsection{Method {\tt .separators:add}}
288 Adds a string to the list of separators.
290 \subsection{Method {\tt .terminators:add}}
291 Adds a string to the list of terminators.
293 \subsection{Expression parser}
295 This is a very powerfull parser generator, but it ensues that its API
296 is quite large. An expression parser relies on a primary parser, and
297 the elements read by this parser can be:
299 \item combined pairwise by infix operators;
300 \item modified by prefix operators;
301 \item modified by suffix operators.
304 All operators are described in a way analoguous to sequence config
305 tables: a sequence of keywords-as-strings and subparsers in a table,
306 plus the usual \verb|builder| and \verb|transformers|
307 fields. Each kind of operator has its own signature for \verb|builder|
308 functions, and some specific additional information such as precedence
309 or associativity. As in multisequences, the choice among operators is
310 determined by the initial keyword. Therefore, it is illegal to have
311 two operator sequences which start with the same keyword (for
312 instance, two infix sequences starting with keyword ``\$'').
314 Most of the time, the sequences representing operators will have a
315 single keyword, and no subparser. For instance, the addition is
317 \verb~{ "+", prec=60, assoc="left", builder= |a, _, b| `Op{ `Add, a, b } }~
320 \paragraph{Infix operators}
321 Infix operators are described by a table whose array-part works as for
322 sequence parsers. Besides this array-part and the usual {\tt
323 transformers} list, the table accepts the following fields in its hash-part:
326 \item\verb|prec = <number>| its precedence. The higher the precedence,
327 the tighter the operator bind with respect to other operators. For
328 instance, in Metalua, addition precedence is 60, whereas
329 multiplication precedence is 70.
331 \item\verb|assoc = <string>| is one of \verb|"left"|, \verb|"right"|,
332 \verb|"flat"| or \verb|"none"|, and specifies how an operator
333 associates. If not specified, the default associativity is
336 Left and right describe how to read sequences of operators with the
337 same precedence, e.g. addition is left associative ({\tt 1+2+3} reads as
338 {\tt(1+2)+3}), whereas exponentiation is right-associative (\verb|1^2^3|
339 reads as \verb|1^(2^3)|).
341 If an operator is non-associative and an ambiguity is found, a
342 parsing error occurs.
344 Finally, flat operators get series of them collected in a list,
345 which is passed to the corresponding builder as a single
346 parameter. For instance, if \verb|++| is declared as flat and its
347 builder is \verb|f|, then whenever {\tt 1++2++3++4} is parsed, the
348 result returned is {\tt f\{1, 2, 3, 4\}}.
350 \item\verb|builder = <function>| the usual result transformer. The
351 function takes as parameters the left operand, the result of the
352 sequence parser (i.e. \verb|{ }| if the sequence contains no
353 subparser), and the right operand; it must return the resulting AST.
357 %For instance, Metalua's addition parser could be described by:
360 %{ "+", assoc = "left", prec = 60, builder = |a,_,b| `Op{ `Add, a, b } }
363 \paragraph{Prefix operators}
364 These operators are placed before the sub-expression they modify. They
365 have the same properties as infix operators, except that they don't
366 have an \verb|assoc| field, and \verb|builder| takes {\tt|operator,
367 operand|} instead of {\tt|left\_operand, operator, right\_operand|}.
369 \paragraph{Suffix operators}
370 Same as prefix operators, except that \verb|builder| takes
371 {\tt|operand, operator|} instead of {\tt|operator, operand|}.
373 \subsubsection{Constructor {\tt gg.expr (config\_table)}}
375 This function returns an expression parser. \verb|config_table|
376 is a table of fields which describes the kind of expression to be
377 read by the parser. The following fields can appear in the table:
380 \item\verb|primary| (mandatory): the primary parser, which reads the
381 primary elements linked by operators. In Metalua expression parsers,
382 that would be numbers, strings, identifiers\ldots It is often a
383 multisequence parser, although that's not mandatory.
384 \item\verb|prefix|: a list of tables representing prefix operator
385 sequences, as described above. It supports a {\tt default} parser:
386 this parser is considered to have succesfully parsed a prefix
387 operator if it returns a non-false result.
388 \item\verb|infix|: a list of tables representing infix operator
389 sequences, as described above. Supports a {\tt default} parser.
390 \item\verb|suffix|: a list of tables representing suffix operator
391 sequences, as described above. Supports a {\tt default} parser.
394 \subsubsection{Methods {\tt .prefix:add()}, {\tt .infix:add()}, {\tt
396 Add an operator in the relevant table. The argument must be an
397 operator sequence table, as described above.
399 \subsubsection{Method {\tt :add()}}
400 This is just a shortcut for {\tt primary.add}. Unspecified behavior if
401 {\tt primary} doesn't support method {\tt add}.
404 \subsubsection{Method {\tt :parse (lexstream)}}
405 Read a list from the lexstream. The result is built by \verb|builder1| calls.
407 It can also be directly called as simply \verb|x(lexstream)| instead of
408 \verb|x:parse(lexstream)|.
410 \subsubsection{Method {\tt :tostring()}}
412 Returns a string representing the parser. Mainly useful for error
415 \subsection{{\tt onkeyword} parser}
417 Takes a list of keywords and a parser: if the next token is one of the
418 keywords in the list, runs the parser; if not, simply returns
421 Notice that by default, the keyword is consumed by the
422 \verb|onkeyword| parser. If you want it not to be consumed, but
423 instead passed to the internal parser, add a \verb|peek=true| entry in
426 \subsubsection{Constructor {\tt gg.onkeyword (config\_table)}}
428 Create a keyword-conditionnal parser. \verb|config_table| can contain:
431 \item strings, representing the triggerring keywords (at least one);
432 \item the parser to run if one of the keywords is found (exactly one);
433 \item \verb|peek=<boolean>| to indicate whether recognized keywords
434 must be consumed or passed to the inner parser.
436 The order of elements in the list is not relevant.
438 \subsubsection{Method {\tt :parse (lexstream)}}
440 Run the parser. The result is the internal parser's result, or
441 \verb|false| if the next token in the lexstream wasn't one of the
444 It can also be directly called as simply \verb|x(lexstream)| instead of
445 \verb|x:parse(lexstream)|.
447 \subsection{{\tt optkeyword} parser}
449 Watch for optional keywords: an \verb|optkeyword| parser has a list of
450 keyword strings as a configuration. If such a keyword is found as the
451 nex lexstream element upon parsing, the keyword is consumed and that
452 string is returned. If not, \verb|false| is returned.
454 \subsubsection{Constructor {\tt gg.optkeyword (keyword1, keyword2, ...)}}
456 Return a \verb|gg.optkeyword| parser, which accepts all of the
457 keywords given as parameters, and returns either the found keyword, or
458 \verb|false| if none is found.