% Rust Macros Tutorial

# Introduction

Functions are the primary tool that programmers can use to build abstractions.
Sometimes, however, programmers want to abstract over compile-time syntax
rather than run-time values.
Macros provide syntactic abstraction.
For an example of how this can be useful, consider the following two code fragments,
which both pattern-match on their input and both return early in one case,
doing nothing otherwise:

~~~~
# enum t { special_a(uint), special_b(uint) };
# fn f() -> uint {
# let input_1 = special_a(0);
# let input_2 = special_a(0);
match input_1 {
    special_a(x) => { return x; }
    _ => {}
}
// ...
match input_2 {
    special_b(x) => { return x; }
    _ => {}
}
# return 0u;
# }
~~~~

This code could become tiresome if repeated many times.
No function can abstract this repetition away, because what varies between
the copies is part of a pattern (the variant `special_a` versus `special_b`),
not a run-time value.
Rust's macro system, however, can eliminate the repetition. Macros are
lightweight custom syntax extensions, themselves defined using the
`macro_rules!` syntax extension. The following `early_return` macro captures
the pattern in the above code:

~~~~
# enum t { special_a(uint), special_b(uint) };
# fn f() -> uint {
# let input_1 = special_a(0);
# let input_2 = special_a(0);
macro_rules! early_return(
    ($inp:expr $sp:ident) => ( // invoke it like `(input_5 special_e)`
        match $inp {
            $sp(x) => { return x; }
            _ => {}
        }
    );
)
// ...
early_return!(input_1 special_a);
// ...
early_return!(input_2 special_b);
# return 0;
# }
~~~~

Macros are defined in pattern-matching style: in the above example, the text
`($inp:expr $sp:ident)` that appears on the left-hand side of the `=>` is the
*macro invocation syntax*, a pattern denoting how to write a call to the
macro. The text on the right-hand side of the `=>`, beginning with `match
$inp`, is the *macro transcription syntax*: what the macro expands to.
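
For comparison, here is a sketch of `early_return` in current `macro_rules!` syntax, which differs from the syntax used throughout this tutorial: the rules sit inside braces and end in `;`, `u32` replaces `uint`, and an `expr` fragment must now be followed by `,`, `;`, or `=>`, so a comma separates the two arguments. The enum `T`, its variants, and the function `check_inputs` are illustrative names only.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
enum T {
    SpecialA(u32),
    SpecialB(u32),
}

macro_rules! early_return {
    // Invocation syntax: an expression, a comma, then a variant name.
    ($inp:expr, $sp:ident) => {
        // Transcription syntax: the code each call expands into.
        match $inp {
            $sp(x) => return x,
            _ => {}
        }
    };
}

fn check_inputs(input_1: T, input_2: T) -> u32 {
    // Bring the variant names into scope so they can be passed as `ident`s.
    use T::{SpecialA, SpecialB};
    early_return!(input_1, SpecialA);
    early_return!(input_2, SpecialB);
    0
}
~~~~

Calling `check_inputs(T::SpecialA(5), T::SpecialA(0))` returns `5` from the first expansion; inputs that match neither special case fall through to `0`.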

# Invocation syntax

The macro invocation syntax specifies the syntax for the arguments to the
macro. It appears on the left-hand side of the `=>` in a macro definition. It
conforms to the following rules:

1. It must be surrounded by parentheses.
2. `$` has special meaning (described below).
3. The `()`s, `[]`s, and `{}`s it contains must balance. For example, `([)` is
forbidden.

Otherwise, the invocation syntax is free-form.

To take as an argument a fragment of Rust code, write `$` followed by a name
(for use on the right-hand side), followed by a `:`, followed by a *fragment
specifier*. The fragment specifier denotes the sort of fragment to match. The
most common fragment specifiers are:

* `ident` (an identifier, referring to a variable or item. Examples: `f`, `x`,
  `foo`.)
* `expr` (an expression. Examples: `2 + 2`; `if true { 1 } else { 2 }`;
  `f(42)`.)
* `ty` (a type. Examples: `int`, `~[(char, ~str)]`, `&T`.)
* `pat` (a pattern, usually appearing in a `match` or on the left-hand side of
  a declaration. Examples: `Some(t)`; `(17, 'a')`; `_`.)
* `block` (a sequence of actions. Example: `{ log(error, "hi"); return 12; }`)
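
The sketch below, in current `macro_rules!` syntax, takes one argument of each of the `ident`, `ty`, `pat`, `expr`, and `block` kinds and builds a function from them. The macro `define_checker` and the generated function `classify` are hypothetical, and the sketch uses current types such as `Option<i32>` rather than the older ones in the list.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! define_checker {
    ($name:ident, $t:ty, $p:pat, $on_match:expr, $otherwise:block) => {
        // `$name` becomes a function name and `$t` its parameter type.
        fn $name(v: $t) -> i32 {
            match v {
                // `$p` is a pattern; `$on_match` may use names it binds.
                $p => $on_match,
                // `$otherwise` is a block used as the fallback arm.
                _ => $otherwise,
            }
        }
    };
}

// Expands to an item: a function named `classify`.
define_checker!(classify, Option<i32>, Some(n), n * 2, { -1 });
~~~~

Because the pattern `Some(n)` and the expression `n * 2` both come from the call site, the binding `n` is visible to the expression: `classify(Some(21))` gives `42`, and `classify(None)` runs the block and gives `-1`.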

The parser interprets any token that's not preceded by a `$` literally, and
Rust's usual rules of tokenization apply. So `($x:ident -> (($e:expr)))`,
though excessively fancy, would designate a macro that could be invoked like
`my_macro!(i->(( 2+2 )))`.
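
The tutorial does not say what `my_macro!` expands to, so the sketch below assumes, purely for illustration, that it binds `$x` to `$e` with a `let`. It is written in current `macro_rules!` syntax; the point is that `->` and the doubled parentheses are matched literally.

~~~~rust
// Sketch in current `macro_rules!` syntax; the expansion is hypothetical.
macro_rules! my_macro {
    // `->` and both pairs of parentheses are literal tokens: a call must
    // contain them, although whitespace between tokens is free.
    ($x:ident -> (($e:expr))) => {
        // Hypothetical expansion: bind the identifier to the expression.
        let $x = $e;
    };
}

fn four() -> i32 {
    my_macro!(i->(( 2+2 )));
    i
}
~~~~

A call that drops one pair of parentheses, such as `my_macro!(i -> (2 + 2))`, does not match the rule, because the matcher expects a parenthesized group inside the outer parentheses.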

## Invocation location

A macro invocation may take the place of (and therefore expand to)
an expression, an item, or a statement.
The Rust parser will parse the macro invocation as a "placeholder"
for whichever of those three nonterminals is appropriate for the location.

At expansion time, the output of the macro will be parsed as whichever of the
three nonterminals it stands in for. This means that a single macro might,
for example, expand to an item or an expression, depending on its arguments
(and cause a syntax error if it is called with the wrong argument for its
location). Although this behavior sounds excessively dynamic, it is known to
be useful under some circumstances.
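
Here is a sketch, in current `macro_rules!` syntax, of one macro whose expansion is an item for one form of argument and an expression for another. The names `make`, `seven`, and `seventy` are hypothetical.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! make {
    // With a leading `fn`, the expansion is an item: a function definition.
    (fn $name:ident = $val:expr) => {
        fn $name() -> i32 {
            $val
        }
    };
    // Otherwise the expansion is an expression.
    ($val:expr) => {
        $val * 10
    };
}

// Item position: defines a function named `seven`.
make!(fn seven = 7);

fn seventy() -> i32 {
    // Expression position: expands to `seven() * 10`.
    make!(seven())
}
~~~~

Using the item form where an expression is expected, for example `let f = make!(fn eight = 8);`, is rejected, because a function definition is not an expression.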

# Transcription syntax

The right-hand side of the `=>` follows the same rules as the left-hand side,
except that a `$` need only be followed by the name of the syntactic fragment
to transcribe into the macro expansion; its fragment specifier need not be repeated.

The right-hand side must be enclosed by delimiters, which the transcriber ignores.
Therefore `() => ((1,2,3))` is a macro that expands to a tuple expression,
`($x:ident, $val:expr) => (let $x=$val)` is a macro that expands to a statement,
and `() => (1,2,3)` is a macro that expands to a syntax error
(since the transcriber interprets the parentheses on the right-hand side as delimiters,
and `1,2,3` is not a valid Rust expression on its own).
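
The sketch below, in current `macro_rules!` syntax, checks the first two cases: the outer parentheses of each right-hand side are dropped, leaving a tuple expression in one case and a `let` statement in the other. The macro names `triple` and `bind` are hypothetical; `bind` takes its name and value as `$x:ident` and `$val:expr`.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! triple {
    // The outer `( )` of the right-hand side are delimiters and are dropped,
    // so the expansion is the tuple expression `(1, 2, 3)`.
    () => ((1, 2, 3));
}

macro_rules! bind {
    // Expands to a `let` statement (the `;` is part of the transcription).
    // `$x` comes from the call site, so the caller can use the new binding.
    ($x:ident, $val:expr) => (let $x = $val;);
}

fn sum_triple() -> i32 {
    let (a, b, c) = triple!();
    bind!(total, a + b + c);
    total
}
~~~~

Changing `triple` to `() => (1, 2, 3)` makes `triple!()` fail to compile in expression position in current Rust as well.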

Except for permissibility of `$name` (and `$(...)*`, discussed below), the
right-hand side of a macro definition is ordinary Rust syntax. In particular,
macro invocations (including invocations of the macro currently being defined)
are permitted in expression, statement, and item locations. However, nothing
else about the code is examined or executed by the macro system; execution
still has to wait until run-time.
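
A sketch, in current `macro_rules!` syntax, of a right-hand side that invokes another macro. The names `square`, `sum_of_squares`, and `hypot_sq` are hypothetical.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! square {
    // Pure substitution: `$e` appears twice, so it is evaluated twice.
    ($e:expr) => { $e * $e };
}

macro_rules! sum_of_squares {
    // The right-hand side invokes another macro; each `square!` call is
    // expanded after `sum_of_squares!` has been transcribed.
    ($a:expr, $b:expr) => { square!($a) + square!($b) };
}

fn hypot_sq(x: i32, y: i32) -> i32 {
    // Expands to the equivalent of `x * x + y * y`; the arithmetic
    // happens at run time, not during expansion.
    sum_of_squares!(x, y)
}
~~~~

Because an `expr` fragment keeps its grouping, `square!(1 + 1)` computes `(1 + 1) * (1 + 1)`, which is `4` rather than `3`.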

## Interpolation location

The interpolation `$argument_name` may appear in any location consistent with
its fragment specifier (i.e., if it is specified as `ident`, it may be used
anywhere an identifier is permitted).
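
For example, one `ident` argument can be interpolated as a field name, a method name, and a field access within a single expansion. A sketch in current `macro_rules!` syntax; `getter`, `Meters`, and `value` are hypothetical names.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! getter {
    // `$field` is an `ident`, so it may appear wherever an identifier is
    // allowed: a field name, a method name, and a field access below.
    ($strukt:ident, $field:ident, $t:ty) => {
        struct $strukt {
            $field: $t,
        }

        impl $strukt {
            fn $field(&self) -> $t {
                self.$field
            }
        }
    };
}

// Defines `struct Meters { value: f64 }` plus a `value()` accessor.
getter!(Meters, value, f64);
~~~~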

# Multiplicity

## Invocation

Going back to the motivating example, recall that `early_return` expanded into
a `match` that would `return` if the `match`'s scrutinee matched the
"special case" identifier provided as the second argument to `early_return`,
and do nothing otherwise. Now suppose that we wanted to write a
version of `early_return` that could handle a variable number of "special"
cases.

The syntax `$(...)*` on the left-hand side of the `=>` in a macro definition
accepts zero or more occurrences of its contents. It works much
like the `*` operator in regular expressions. It also supports a
separator token (a comma-separated list could be written `$(...),*`), and `+`
instead of `*` to mean "at least one".

~~~~
# enum t { special_a(uint),special_b(uint),special_c(uint),special_d(uint)};
# fn f() -> uint {
# let input_1 = special_a(0);
# let input_2 = special_a(0);
macro_rules! early_return(
    ($inp:expr, [ $($sp:ident)|+ ]) => (
        match $inp {
            $(
                $sp(x) => { return x; }
            )+
            _ => {}
        }
    );
)
// ...
early_return!(input_1, [special_a|special_c|special_d]);
// ...
early_return!(input_2, [special_b]);
# return 0;
# }
~~~~
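
The same idea in current `macro_rules!` syntax, as a sketch: `early_return_any` transcribes one `return` arm per `|`-separated variant name, and `sum` shows a comma separator combined with `*`. All names here (`Case`, `pick`, `early_return_any`, `sum`) are hypothetical.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
#[allow(dead_code)]
enum Case {
    SpecialA(u32),
    SpecialB(u32),
    SpecialC(u32),
    SpecialD(u32),
}

macro_rules! early_return_any {
    // `$($sp:ident)|+`: one or more identifiers separated by `|`.
    ($inp:expr, [ $($sp:ident)|+ ]) => {
        match $inp {
            // One `return` arm is transcribed for each matched `$sp`.
            $( $sp(x) => return x, )+
            _ => {}
        }
    };
}

macro_rules! sum {
    // `$(...),*`: zero or more expressions separated by commas.
    ($($n:expr),*) => { 0 $(+ $n)* };
}

fn pick(input: Case) -> u32 {
    use Case::*; // bring the variant names into scope for the macro
    early_return_any!(input, [SpecialA | SpecialC | SpecialD]);
    0
}
~~~~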

## Transcription

As the above example demonstrates, `$(...)*` is also valid on the right-hand
side of a macro definition. The behavior of `*` in transcription,
especially in cases where multiple `*`s are nested, and multiple different
names are involved, can seem somewhat magical and unintuitive at first. The
system that interprets them is called "Macro By Example". The two rules to
keep in mind are (1) the behavior of `$(...)*` is to walk through one "layer"
of repetitions for all of the `$name`s it contains in lockstep, and (2) each
`$name` must be under at least as many `$(...)*`s as it was matched against.
If it is under more, it'll be repeated, as appropriate.
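
A sketch of both rules in current `macro_rules!` syntax: `$a` and `$b` are matched at the same depth and walked in lockstep, while `$f` is matched once and, because it appears inside a `$(...),*` in the transcription, is repeated for each pair. The names `zip_with`, `add`, and `demo` are hypothetical.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! zip_with {
    // `$f` is matched once (depth 0); `$a` and `$b` each sit under one
    // `$(...),*` (depth 1).
    ($f:ident; $($a:expr),* ; $($b:expr),*) => {
        // This single `$(...),*` layer walks `$a` and `$b` in lockstep.
        // `$f` is under more repetitions than it was matched against,
        // so it is repeated once per pair.
        vec![ $( $f($a, $b) ),* ]
    };
}

fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn demo() -> Vec<i32> {
    zip_with!(add; 1, 2, 3; 10, 20, 30)
}
~~~~

If the two lists have different lengths, for example `zip_with!(add; 1, 2; 10)`, expansion fails at compile time because `$a` and `$b` cannot be walked in lockstep.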

## Parsing limitations

For technical reasons, there are two limitations to the treatment of syntax
fragments by the macro parser:

1. The parser will always parse as much as possible of a Rust syntactic
fragment. For example, if the comma were omitted from the syntax of
`early_return!` above, `input_1 [` would've been interpreted as the beginning
of an array index. In fact, invoking the macro would have been impossible.
2. The parser must have eliminated all ambiguity by the time it reaches a
`$name:fragment_specifier` declaration. This limitation can result in parse
errors when declarations occur at the beginning of, or immediately after,
a `$(...)*`. For example, the grammar `$($t:ty)* $e:expr` will always fail to
parse: many token sequences (a bare identifier, for instance) could begin
either a type or an expression, so the parser would be forced to choose
between parsing another `$t` and parsing `$e`. Changing the invocation syntax
to require a distinctive token in front can solve the problem. In the above
example, `$(T $t:ty)* E $e:expr` solves the problem.
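
Current Rust adds a related rule that is checked when the macro is defined: a `ty` fragment may only be followed by certain tokens (such as `,`, `;`, `=>`, or `=`), so even `$(T $t:ty)* E $e:expr` is rejected as written. The sketch below keeps the distinctive `T` and `E` tokens and adds a `;` after each type. The macro `sizes_plus` and function `total` are hypothetical.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! sizes_plus {
    // Each type is introduced by the literal token `T` and ended by `;`;
    // the final expression is introduced by `E`. At every step one literal
    // token tells the parser which fragment comes next.
    ($(T $t:ty;)* E $e:expr) => {
        0 $(+ ::std::mem::size_of::<$t>())* + $e
    };
}

fn total() -> usize {
    // size_of::<u8>() is 1 and size_of::<u32>() is 4, so this is 1 + 4 + 10.
    sizes_plus!(T u8; T u32; E 10)
}
~~~~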

# Macro argument pattern matching

## Motivation

Now consider code like the following:

~~~~
# enum t1 { good_1(t2, uint), bad_1 };
# pub struct t2 { body: t3 }
# enum t3 { good_2(uint), bad_2};
# fn f(x: t1) -> uint {
match x {
    good_1(g1, val) => {
        match g1.body {
            good_2(result) => {
                // complicated stuff goes here
                return result + val;
            },
            _ => fail!("Didn't get good_2")
        }
    }
    _ => return 0 // default value
}
# }
~~~~

All the complicated stuff is deeply indented, and the error-handling code is
separated from matches that fail. We'd like to write a macro that performs
a match, but with a syntax that suits the problem better. The following macro
can solve the problem:

~~~~
macro_rules! biased_match (
    // special case: `let (x) = ...` is illegal, so use `let x = ...` instead
    ( ($e:expr) ~ ($p:pat) else $err:stmt ;
      binds $bind_res:ident
    ) => (
        let $bind_res = match $e {
            $p => ( $bind_res ),
            _ => { $err }
        };
    );
    // more than one name; use a tuple
    ( ($e:expr) ~ ($p:pat) else $err:stmt ;
      binds $( $bind_res:ident ),*
    ) => (
        let ( $( $bind_res ),* ) = match $e {
            $p => ( $( $bind_res ),* ),
            _ => { $err }
        };
    )
)

# enum t1 { good_1(t2, uint), bad_1 };
# pub struct t2 { body: t3 }
# enum t3 { good_2(uint), bad_2};
# fn f(x: t1) -> uint {
biased_match!((x)       ~ (good_1(g1, val)) else { return 0 };
              binds g1, val )
biased_match!((g1.body) ~ (good_2(result) )
                  else { fail!("Didn't get good_2") };
              binds result )
// complicated stuff goes here
return result + val;
# }
~~~~

This solves the indentation problem. But if we have a lot of chained matches
like this, we might prefer to write a single macro invocation. The input
pattern we want is clear:

~~~~
# macro_rules! b(
    ( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
      binds $( $bind_res:ident ),*
    )
# => (0))
~~~~

However, it's not possible to expand this pattern directly into nested `match`
expressions: transcribing `$(...)*` places the repeated pieces side by side,
whereas each `match` must sit inside an arm of the one before it. But there is
a solution.

## The recursive approach to macro writing

A macro may accept multiple different input grammars. The first one to
successfully match the actual argument to a macro invocation is the one that
"wins".
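
A small sketch of rule ordering in current `macro_rules!` syntax: the rule for the literal `0` is listed first, so it wins for `describe!(0)`, and any other expression falls through to the general rule. The names `describe` and `descriptions` are hypothetical.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! describe {
    // Rules are tried from top to bottom; the first one that matches wins.
    (0) => { "the literal zero" };
    ($e:expr) => { "some other expression" };
}

fn descriptions() -> [&'static str; 2] {
    [describe!(0), describe!(1 + 1)]
}
~~~~

Swapping the two rules would make `describe!(0)` produce the general description, because `$e:expr` also matches `0`.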

In the case of the example above, we want to write a recursive macro to
process the semicolon-terminated lines, one-by-one. So, we want the following
input patterns:

~~~~
# macro_rules! b(
    ( binds $( $bind_res:ident ),* )
# => (0))
~~~~
...and:

~~~~
# macro_rules! b(
    (    ($e     :expr) ~ ($p     :pat) else $err     :stmt ;
      $( ($e_rest:expr) ~ ($p_rest:pat) else $err_rest:stmt ; )*
      binds  $( $bind_res:ident ),*
    )
# => (0))
~~~~

The resulting macro looks like this. Note that the separation into
`biased_match!` and `biased_match_rec!` occurs only because we have an outer
piece of syntax (the `let`) which we only want to transcribe once.

~~~~

macro_rules! biased_match_rec (
    // Handle the first layer
    (   ($e     :expr) ~ ($p     :pat) else $err     :stmt ;
     $( ($e_rest:expr) ~ ($p_rest:pat) else $err_rest:stmt ; )*
     binds $( $bind_res:ident ),*
    ) => (
        match $e {
            $p => {
                // Recursively handle the next layer
                biased_match_rec!($( ($e_rest) ~ ($p_rest) else $err_rest ; )*
                                  binds $( $bind_res ),*
                )
            }
            _ => { $err }
        }
    );
    ( binds $( $bind_res:ident ),* ) => ( ($( $bind_res ),*) )
)

// Wrap the whole thing in a `let`.
macro_rules! biased_match (
    // special case: `let (x) = ...` is illegal, so use `let x = ...` instead
    ( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
      binds $bind_res:ident
    ) => (
        let $bind_res = biased_match_rec!(
            $( ($e) ~ ($p) else $err ; )*
            binds $bind_res
        );
    );
    // more than one name: use a tuple
    ( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
      binds  $( $bind_res:ident ),*
    ) => (
        let ( $( $bind_res ),* ) = biased_match_rec!(
            $( ($e) ~ ($p) else $err ; )*
            binds $( $bind_res ),*
        );
    )
)

# enum t1 { good_1(t2, uint), bad_1 };
# pub struct t2 { body: t3 }
# enum t3 { good_2(uint), bad_2};
# fn f(x: t1) -> uint {
biased_match!(
    (x)       ~ (good_1(g1, val)) else { return 0 };
    (g1.body) ~ (good_2(result) ) else { fail!("Didn't get good_2") };
    binds val, result )
// complicated stuff goes here
return result + val;
# }
~~~~
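
For readers on a current compiler, here is a sketch of the same two macros in today's `macro_rules!` syntax. Assumptions: `u32` replaces `uint`, `panic!` replaces `fail!`, type and variant names are CamelCase and written with their enum path, and the example types and the functions `f` and `only` are illustrative. The `only` function exercises the single-name rule.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
#[allow(dead_code)]
enum T1 { Good1(T2, u32), Bad1 }
struct T2 { body: T3 }
#[allow(dead_code)]
enum T3 { Good2(u32), Bad2 }

macro_rules! biased_match_rec {
    // Handle the first layer, then recurse on the remaining layers.
    (   ($e:expr) ~ ($p:pat) else $err:stmt ;
     $( ($e_rest:expr) ~ ($p_rest:pat) else $err_rest:stmt ; )*
     binds $( $bind_res:ident ),*
    ) => {
        match $e {
            $p => {
                biased_match_rec!($( ($e_rest) ~ ($p_rest) else $err_rest ; )*
                                  binds $( $bind_res ),*)
            }
            _ => { $err }
        }
    };
    // Base case: no layers left, so produce the bound names.
    ( binds $( $bind_res:ident ),* ) => { ( $( $bind_res ),* ) };
}

macro_rules! biased_match {
    // A single name: `let x = ...`, not `let (x) = ...`.
    ( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )* binds $bind_res:ident ) => {
        let $bind_res =
            biased_match_rec!($( ($e) ~ ($p) else $err ; )* binds $bind_res);
    };
    // Several names: destructure a tuple.
    ( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )* binds $( $bind_res:ident ),* ) => {
        let ( $( $bind_res ),* ) =
            biased_match_rec!($( ($e) ~ ($p) else $err ; )* binds $( $bind_res ),*);
    };
}

fn f(x: T1) -> u32 {
    biased_match!(
        (x)       ~ (T1::Good1(g1, val)) else { return 0 };
        (g1.body) ~ (T3::Good2(result)) else { panic!("Didn't get Good2") };
        binds val, result
    );
    // complicated stuff goes here
    result + val
}

fn only(x: T3) -> u32 {
    biased_match!((x) ~ (T3::Good2(r)) else { return 99 }; binds r);
    r
}
~~~~

With these definitions, `f(T1::Good1(T2 { body: T3::Good2(3) }, 4))` returns `7`, and `f(T1::Bad1)` takes the first `else` branch and returns `0`.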

This technique applies to many cases where transcribing a result all at once is not possible.
The resulting code resembles ordinary functional programming in some respects,
but has some important differences from functional programming.

The first difference is important, but also easy to forget: the transcription
(right-hand) side of a `macro_rules!` rule is literal syntax, which can only
be executed at run-time. If a piece of transcription syntax does not itself
appear inside another macro invocation, it will become part of the final
program. If it is inside a macro invocation (for example, the recursive
invocation of `biased_match_rec!`), it does have the opportunity to affect
transcription, but only through the process of attempted pattern matching.
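
The `count` sketch below, in current `macro_rules!` syntax, shows "computation" happening only through pattern matching: each recursive step peels off one token tree, and the transcription is plain syntax left for the compiled program. The names `count` and `three` are hypothetical.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! count {
    // Base case: nothing left to count.
    () => { 0usize };
    // Match one token tree, then recurse on the rest. `$head` is matched
    // only so that one token is consumed; it is never transcribed.
    ($head:tt $($tail:tt)*) => { 1usize + count!($($tail)*) };
}

fn three() -> usize {
    // The expander never produces the number 3. It rewrites the call into
    // the equivalent of `1usize + (1usize + (1usize + 0usize))`, which is
    // then compiled like any hand-written expression.
    count!(a b c)
}
~~~~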

The second, related, difference is that the evaluation order of macros feels
"backwards" compared to ordinary programming. Given an invocation
`m1!(m2!())`, the expander first expands `m1!`, giving it as input the literal
syntax `m2!()`. If it transcribes its argument unchanged into an appropriate
position (in particular, not as an argument to yet another macro invocation),
the expander will then proceed to evaluate `m2!()` (along with any other macro
invocations `m1!(m2!())` produced).
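
A sketch of this ordering in current `macro_rules!` syntax: `show_tokens` hands its argument straight to the built-in `stringify!`, which sees the unexpanded tokens of `two!()`, whereas `double` places its argument in expression position, where the expander later expands `two!()`. The names `two`, `show_tokens`, `double`, and `demo_order` are hypothetical.

~~~~rust
// Sketch in current `macro_rules!` syntax (assumes a recent stable rustc).
macro_rules! two {
    () => { 2 };
}

macro_rules! show_tokens {
    // Passes its argument, unexpanded, to the built-in `stringify!`.
    ($($t:tt)*) => { stringify!($($t)*) };
}

macro_rules! double {
    // Places `$e` in expression position; a macro call inside it
    // (such as `two!()`) is expanded only after this transcription.
    ($e:expr) => { $e * 2 };
}

fn demo_order() -> (&'static str, i32) {
    (show_tokens!(two!()), double!(two!()))
}
~~~~

The string produced by `show_tokens!(two!())` contains the text of the call itself (its exact spacing may vary), while `double!(two!())` evaluates to `4`.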

# A final note

Macros, as currently implemented, are not for the faint of heart. Even
ordinary syntax errors can be more difficult to debug when they occur inside a
macro, and errors caused by parse problems in generated code can be very
tricky. Invoking the `log_syntax!` macro can help elucidate intermediate
states, invoking `trace_macros!(true)` will automatically print those
intermediate states out, and passing the flag `--pretty expanded` as a
command-line argument to the compiler will show the result of expansion.