src/doc/guide-macros.md

   1 % The Rust Macros Guide
   2
   3 <div class="unstable-feature">
   4 <b>Warning:</b> There are currently various problems with invoking macros, how
   5 they interact with their environment, and how they are used outside of the
   6 location in which they are defined. Macro definitions are likely to change
   7 slightly in the future. For this reason, they are hidden behind the
   8 <code>macro_rules</code> <a href="reference.html#compiler-features">feature
   9 attribute</a>.
  10 </div>
  11
  12 # Introduction
  13
  14 Functions are the primary tool that programmers can use to build abstractions.
  15 Sometimes, however, programmers want to abstract over compile-time syntax
  16 rather than run-time values.
  17 Macros provide syntactic abstraction.
  18 For an example of how this can be useful, consider the following two code fragments,
  19 which both pattern-match on their input and both return early in one case,
  20 doing nothing otherwise:
  21
  22 ~~~~
  23 # enum T { SpecialA(uint), SpecialB(uint) }
  24 # fn f() -> uint {
  25 # let input_1 = T::SpecialA(0);
  26 # let input_2 = T::SpecialA(0);
  27 match input_1 {
  28     T::SpecialA(x) => { return x; }
  29     _ => {}
  30 }
  31 // ...
  32 match input_2 {
  33     T::SpecialB(x) => { return x; }
  34     _ => {}
  35 }
  36 # return 0u;
  37 # }
  38 ~~~~
  39
  40 This code could become tiresome if repeated many times.
  41 No function can abstract the repetition away, because the early `return`
  42 has to execute in the calling function.
  43 Rust's macro system, however, can eliminate the repetition. Macros are
  44 lightweight custom syntax extensions, themselves defined using the
  45 `macro_rules!` syntax extension. The following `early_return` macro captures
  46 the pattern in the above code:
  47
  48 ~~~~
  49 # #![feature(macro_rules)]
  50 # enum T { SpecialA(uint), SpecialB(uint) }
  51 # fn f() -> uint {
  52 # let input_1 = T::SpecialA(0);
  53 # let input_2 = T::SpecialA(0);
  54 macro_rules! early_return(
  55     ($inp:expr $sp:path) => ( // invoke it like `(input_5 T::SpecialE)`
  56         match $inp {
  57             $sp(x) => { return x; }
  58             _ => {}
  59         }
  60     );
  61 );
  62 // ...
  63 early_return!(input_1 T::SpecialA);
  64 // ...
  65 early_return!(input_2 T::SpecialB);
  66 # return 0;
  67 # }
  68 # fn main() {}
  69 ~~~~
  70
  71 Macros are defined in pattern-matching style: in the above example, the text
  72 `($inp:expr $sp:path)` that appears on the left-hand side of the `=>` is the
  73 *macro invocation syntax*, a pattern denoting how to write a call to the
  74 macro. The text on the right-hand side of the `=>`, beginning with `match
  75 $inp`, is the *macro transcription syntax*: what the macro expands to.
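
For readers on a current compiler, the same idea can be sketched as follows. This is only an illustration under stated assumptions: it uses present-day syntax (braces around the rules, `u32` in place of `uint`, no feature gate), separates the arguments with a comma, and, to keep the matcher simple, takes the variant as an `ident` and hardcodes the enum `T`. The function name `first_special` is hypothetical.

```rust
// Sketch for a current compiler; `T` and `first_special` are illustrative names.
enum T {
    SpecialA(u32),
    SpecialB(u32),
}

macro_rules! early_return {
    // Invocation syntax: an expression, a comma, then the name of a variant of `T`.
    ($inp:expr, $variant:ident) => {
        // Transcription syntax: what each invocation expands to.
        match $inp {
            T::$variant(x) => { return x; }
            _ => {}
        }
    };
}

fn first_special(input_1: T, input_2: T) -> u32 {
    early_return!(input_1, SpecialA);
    early_return!(input_2, SpecialB);
    0
}
```

Because the `return` is transcribed into `first_special` itself, it leaves that function, which is exactly what a helper function could not do.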
  76
  77 # Invocation syntax
  78
  79 The macro invocation syntax specifies the syntax for the arguments to the
  80 macro. It appears on the left-hand side of the `=>` in a macro definition. It
  81 conforms to the following rules:
  82
  83 1. It must be surrounded by parentheses.
  84 2. `$` has special meaning (described below).
  85 3. The `()`s, `[]`s, and `{}`s it contains must balance. For example, `([)` is
  86 forbidden.
  87
  88 Otherwise, the invocation syntax is free-form.
  89
  90 To take a fragment of Rust code as an argument, write `$` followed by a name
  91 (for use on the right-hand side), followed by a `:`, followed by a *fragment
  92 specifier*. The fragment specifier denotes the sort of fragment to match. The
  93 most common fragment specifiers are:
  94
  95 * `ident` (an identifier, referring to a variable or item. Examples: `f`, `x`,
  96   `foo`.)
  97 * `expr` (an expression. Examples: `2 + 2`; `if true { 1 } else { 2 }`;
  98   `f(42)`.)
  99 * `ty` (a type. Examples: `int`, `Vec<(char, String)>`, `&T`.)
 100 * `pat` (a pattern, usually appearing in a `match` or on the left-hand side of
 101   a declaration. Examples: `Some(t)`; `(17, 'a')`; `_`.)
 102 * `block` (a sequence of actions. Example: `{ log(error, "hi"); return 12; }`)
 103
 104 The parser interprets any token that's not preceded by a `$` literally. Rust's usual
 105 rules of tokenization apply.
 106
 107 For example, `($x:ident -> (($e:expr)))`, though excessively fancy, would designate a macro
 108 that could be invoked like: `my_macro!(i->(( 2+2 )))`.
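
A minimal sketch of that matcher, assuming a current compiler; the macro name `bind`, its expansion to a `let`, and the function `literal_tokens_demo` are invented for illustration:

```rust
macro_rules! bind {
    // `->` and both layers of parentheses are literal tokens the caller must write.
    ($x:ident -> (($e:expr))) => {
        let $x = $e;
    };
}

fn literal_tokens_demo() -> i32 {
    // Whitespace between tokens does not matter to the matcher.
    bind!(i->(( 2+2 )));
    i
}
```

Calling `literal_tokens_demo()` yields `4`: the identifier `i` comes from the call site, so the binding it introduces is usable after the invocation.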
 109
 110 ## Invocation location
 111
 112 A macro invocation may take the place of (and therefore expand to)
 113 an expression, an item, or a statement.
 114 The Rust parser will parse the macro invocation as a "placeholder"
 115 for whichever of those three nonterminals is appropriate for the location.
 116
 117 At expansion time, the output of the macro will be parsed as whichever of the
 118 three nonterminals it stands in for. This means that a single macro might,
 119 for example, expand to an item or an expression, depending on its arguments
 120 (and cause a syntax error if it is called with the wrong argument for its
 121 location). Although this behavior sounds excessively dynamic, it is known to
 122 be useful under some circumstances.
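
As a sketch of this behavior on a current compiler, the hypothetical macro below expands to an item for one form of argument and to an expression for another. The names `item_or_expr`, `one`, and `location_demo` are made up for the example.

```rust
macro_rules! item_or_expr {
    // `fn name` expands to an item: a function definition.
    (fn $name:ident) => {
        fn $name() -> i32 { 1 }
    };
    // Anything else is treated as an expression and incremented.
    ($e:expr) => {
        $e + 1
    };
}

// Item position: the invocation is parsed as a placeholder for an item.
item_or_expr!(fn one);

fn location_demo() -> i32 {
    // Expression position: the invocation is parsed as a placeholder for an expression.
    let n = item_or_expr!(one());
    n
}
```

Writing `item_or_expr!(fn one)` where an expression is expected would be a syntax error, since the expansion is an item.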
 123
 124
 125 # Transcription syntax
 126
 127 The right-hand side of the `=>` follows the same rules as the left-hand side,
 128 except that a `$` need only be followed by the name of the syntactic fragment
 129 to transcribe into the macro expansion; its type need not be repeated.
 130
 131 The right-hand side must be enclosed by delimiters, which the transcriber ignores.
 132 Therefore `() => ((1,2,3))` is a macro that expands to a tuple expression,
 133 `($x:ident, $val:expr) => (let $x=$val)` is a macro that expands to a statement,
 134 and `() => (1,2,3)` is a macro that expands to a syntax error
 135 (since the transcriber interprets the parentheses on the right-hand side as delimiters,
 136 and `1,2,3` is not a valid Rust expression on its own).
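
The delimiter rule can be sketched like this on a current compiler. The macro names `triple` and `declare` and the function `transcription_demo` are hypothetical, and `declare` takes its two fragments as arguments so that both names are bound.

```rust
macro_rules! triple {
    // Outer parentheses are the delimiters; the inner ones form the tuple.
    () => ((1, 2, 3));
}

macro_rules! declare {
    // Expands to a `let` statement binding the caller's identifier.
    ($x:ident, $val:expr) => (let $x = $val;);
}

fn transcription_demo() -> (i32, i32, i32, i32) {
    declare!(n, 4);
    let (a, b, c) = triple!();
    (a, b, c, n)
}
```

Dropping the inner parentheses in `triple` would transcribe the bare tokens `1, 2, 3`, which is not an expression.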
 137
 138 Except for permissibility of `$name` (and `$(...)*`, discussed below), the
 139 right-hand side of a macro definition is ordinary Rust syntax. In particular,
 140 macro invocations (including invocations of the macro currently being defined)
 141 are permitted in expression, statement, and item locations. However, nothing
 142 else about the code is examined or executed by the macro system; execution
 143 still has to wait until run-time.
 144
 145 ## Interpolation location
 146
 147 The interpolation `$argument_name` may appear in any location consistent with
 148 its fragment specifier (i.e., if it is specified as `ident`, it may be used
 149 anywhere an identifier is permitted).
 150
 151 # Multiplicity
 152
 153 ## Invocation
 154
 155 Going back to the motivating example, recall that `early_return` expanded into
 156 a `match` that would `return` if the `match`'s scrutinee matched the
 157 "special case" identifier provided as the second argument to `early_return`,
 158 and do nothing otherwise. Now suppose that we wanted to write a
 159 version of `early_return` that could handle a variable number of "special"
 160 cases.
 161
 162 The syntax `$(...)*` on the left-hand side of the `=>` in a macro definition
 163 accepts zero or more occurrences of its contents. It works much
 164 like the `*` operator in regular expressions. It also supports a
 165 separator token (a comma-separated list could be written `$(...),*`), and `+`
 166 instead of `*` to mean "at least one".
 167
 168 ~~~~
 169 # #![feature(macro_rules)]
 170 # enum T { SpecialA(uint),SpecialB(uint),SpecialC(uint),SpecialD(uint)}
 171 # fn f() -> uint {
 172 # let input_1 = T::SpecialA(0);
 173 # let input_2 = T::SpecialA(0);
 174 macro_rules! early_return(
 175     ($inp:expr, [ $($sp:path)|+ ]) => (
 176         match $inp {
 177             $(
 178                 $sp(x) => { return x; }
 179             )+
 180             _ => {}
 181         }
 182     )
 183 );
 184 // ...
 185 early_return!(input_1, [T::SpecialA|T::SpecialC|T::SpecialD]);
 186 // ...
 187 early_return!(input_2, [T::SpecialB]);
 188 # return 0;
 189 # }
 190 # fn main() {}
 191 ~~~~
 192
 193 ## Transcription
 194
 195 As the above example demonstrates, `$(...)*` is also valid on the right-hand
 196 side of a macro definition. The behavior of `*` in transcription,
 197 especially in cases where multiple `*`s are nested, and multiple different
 198 names are involved, can seem somewhat magical and unintuitive at first. The
 199 system that interprets them is called "Macro By Example". The two rules to
 200 keep in mind are (1) the behavior of `$(...)*` is to walk through one "layer"
 201 of repetitions for all of the `$name`s it contains in lockstep, and (2) each
 202 `$name` must be under at least as many `$(...)*`s as it was matched against.
 203 If it is under more, it'll be repeated, as appropriate.
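
To make the lockstep rule concrete, here is a small sketch for a current compiler. The macro `pairs` and the function `lockstep_demo` are invented for the example. Both `$k` and `$v` are matched under one `$(...),*`, so the transcriber walks them together, one pair per repetition.

```rust
macro_rules! pairs {
    ($($k:ident => $v:expr),*) => {
        // `$k` and `$v` sit under the same repetition, so they advance in lockstep.
        vec![$((stringify!($k), $v)),*]
    };
}

fn lockstep_demo() -> Vec<(&'static str, i32)> {
    pairs!(a => 1, b => 2)
}
```

Using `$k` or `$v` outside any `$(...)` on the right-hand side would be rejected, because each was matched at a repetition depth of one.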
 204
 205 ## Parsing limitations
 206
 207
 208 For technical reasons, there are two limitations to the treatment of syntax
 209 fragments by the macro parser:
 210
 211 1. The parser will always parse as much as possible of a Rust syntactic
 212 fragment. For example, if the comma were omitted from the syntax of
 213 `early_return!` above, `input_1 [` would've been interpreted as the beginning
 214 of an array index. In fact, invoking the macro would have been impossible.
 215 2. The parser must have eliminated all ambiguity by the time it reaches a
 216 `$name:fragment_specifier` declaration. This limitation can result in parse
 217 errors when declarations occur at the beginning of, or immediately after,
 218 a `$(...)*`. For example, the grammar `$($t:ty)* $e:expr` will always fail to
 219 parse because the parser would be forced to choose between parsing `t` and
 220 parsing `e`. Changing the invocation syntax to require a distinctive token in
 221 front can solve the problem. In the above example, `$(T $t:ty)* E $e:expr`
 222 solves the problem.
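
A sketch of the marker-token approach, written for a current compiler. Note one assumption that goes beyond the text above: current compilers restrict which tokens may follow a `ty` fragment, so this version adds a `;` after each type. The names `sizes_then` and `marker_demo` are hypothetical.

```rust
macro_rules! sizes_then {
    // `T` introduces each type and `E` introduces the final expression,
    // so the parser always knows which fragment comes next.
    ($(T $t:ty;)* E $e:expr) => {
        (vec![$(std::mem::size_of::<$t>()),*], $e)
    };
}

fn marker_demo() -> (Vec<usize>, i32) {
    sizes_then!(T u8; T u32; E 1 + 1)
}
```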
 223
 224 # Macro argument pattern matching
 225
 226 ## Motivation
 227
 228 Now consider code like the following:
 229
 230 ~~~~
 231 # #![feature(macro_rules)]
 232 # enum T1 { Good1(T2, uint), Bad1}
 233 # struct T2 { body: T3 }
 234 # enum T3 { Good2(uint), Bad2}
 235 # fn f(x: T1) -> uint {
 236 match x {
 237     T1::Good1(g1, val) => {
 238         match g1.body {
 239             T3::Good2(result) => {
 240                 // complicated stuff goes here
 241                 return result + val;
 242             },
 243             _ => panic!("Didn't get good_2")
 244         }
 245     }
 246     _ => return 0 // default value
 247 }
 248 # }
 249 # fn main() {}
 250 ~~~~
 251
 252 All the complicated stuff is deeply indented, and the error-handling code is
 253 separated from matches that fail. We'd like to write a macro that performs
 254 a match, but with a syntax that suits the problem better. The following macro
 255 can solve the problem:
 256
 257 ~~~~
 258 # #![feature(macro_rules)]
 259 macro_rules! biased_match (
 260     // special case: `let (x) = ...` is illegal, so use `let x = ...` instead
 261     ( ($e:expr) ~ ($p:pat) else $err:stmt ;
 262       binds $bind_res:ident
 263     ) => (
 264         let $bind_res = match $e {
 265             $p => ( $bind_res ),
 266             _ => { $err }
 267         };
 268     );
 269     // more than one name; use a tuple
 270     ( ($e:expr) ~ ($p:pat) else $err:stmt ;
 271       binds $( $bind_res:ident ),*
 272     ) => (
 273         let ( $( $bind_res ),* ) = match $e {
 274             $p => ( $( $bind_res ),* ),
 275             _ => { $err }
 276         };
 277     )
 278 );
 279
 280 # enum T1 { Good1(T2, uint), Bad1}
 281 # struct T2 { body: T3 }
 282 # enum T3 { Good2(uint), Bad2}
 283 # fn f(x: T1) -> uint {
 284 biased_match!((x)       ~ (T1::Good1(g1, val)) else { return 0 };
 285               binds g1, val );
 286 biased_match!((g1.body) ~ (T3::Good2(result) )
 287                   else { panic!("Didn't get good_2") };
 288               binds result );
 289 // complicated stuff goes here
 290 return result + val;
 291 # }
 292 # fn main() {}
 293 ~~~~
 294
 295 This solves the indentation problem. But if we have a lot of chained matches
 296 like this, we might prefer to write a single macro invocation. The input
 297 pattern we want is clear:
 298
 299 ~~~~
 300 # #![feature(macro_rules)]
 301 # fn main() {}
 302 # macro_rules! b(
 303     ( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
 304       binds $( $bind_res:ident ),*
 305     )
 306 # => (0));
 307 ~~~~
 308
 309 However, it's not possible to directly expand to nested match statements. But
 310 there is a solution.
 311
 312 ## The recursive approach to macro writing
 313
 314 A macro may accept multiple different input grammars. The first one to
 315 successfully match the actual argument to a macro invocation is the one that
 316 "wins".
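
A minimal sketch of rule ordering on a current compiler, using a made-up macro `describe` and function `first_match_demo`. The literal `0` satisfies both rules, but the first rule is tried first and therefore wins.

```rust
macro_rules! describe {
    // Tried first: matches only the literal token `0`.
    (0) => { "zero" };
    // Tried second: matches any expression.
    ($e:expr) => { "some expression" };
}

fn first_match_demo() -> (&'static str, &'static str) {
    (describe!(0), describe!(1 + 1))
}
```

Swapping the two rules would make the `(0)` rule unreachable, since `0` is also a valid `expr`.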
 317
 318 In the case of the example above, we want to write a recursive macro to
 319 process the semicolon-terminated lines, one-by-one. So, we want the following
 320 input patterns:
 321
 322 ~~~~
 323 # #![feature(macro_rules)]
 324 # macro_rules! b(
 325     ( binds $( $bind_res:ident ),* )
 326 # => (0));
 327 # fn main() {}
 328 ~~~~
 329
 330 ...and:
 331
 332 ~~~~
 333 # #![feature(macro_rules)]
 334 # fn main() {}
 335 # macro_rules! b(
 336     (    ($e     :expr) ~ ($p     :pat) else $err     :stmt ;
 337       $( ($e_rest:expr) ~ ($p_rest:pat) else $err_rest:stmt ; )*
 338       binds  $( $bind_res:ident ),*
 339     )
 340 # => (0));
 341 ~~~~
 342
 343 The resulting macro looks like this. Note that the separation into
 344 `biased_match!` and `biased_match_rec!` occurs only because we have an outer
 345 piece of syntax (the `let`) which we only want to transcribe once.
 346
 347 ~~~~
 348 # #![feature(macro_rules)]
 349 # fn main() {
 350
 351 macro_rules! biased_match_rec (
 352     // Handle the first layer
 353     (   ($e     :expr) ~ ($p     :pat) else $err     :stmt ;
 354      $( ($e_rest:expr) ~ ($p_rest:pat) else $err_rest:stmt ; )*
 355      binds $( $bind_res:ident ),*
 356     ) => (
 357         match $e {
 358             $p => {
 359                 // Recursively handle the next layer
 360                 biased_match_rec!($( ($e_rest) ~ ($p_rest) else $err_rest ; )*
 361                                   binds $( $bind_res ),*
 362                 )
 363             }
 364             _ => { $err }
 365         }
 366     );
 367     // Produce the requested values
 368     ( binds $( $bind_res:ident ),* ) => ( ($( $bind_res ),*) )
 369 );
 370
 371 // Wrap the whole thing in a `let`.
 372 macro_rules! biased_match (
 373     // special case: `let (x) = ...` is illegal, so use `let x = ...` instead
 374     ( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
 375       binds $bind_res:ident
 376     ) => (
 377         let $bind_res = biased_match_rec!(
 378             $( ($e) ~ ($p) else $err ; )*
 379             binds $bind_res
 380         );
 381     );
 382     // more than one name: use a tuple
 383     ( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
 384       binds  $( $bind_res:ident ),*
 385     ) => (
 386         let ( $( $bind_res ),* ) = biased_match_rec!(
 387             $( ($e) ~ ($p) else $err ; )*
 388             binds $( $bind_res ),*
 389         );
 390     )
 391 );
 392
 393
 394 # enum T1 { Good1(T2, uint), Bad1}
 395 # struct T2 { body: T3 }
 396 # enum T3 { Good2(uint), Bad2}
 397 # fn f(x: T1) -> uint {
 398 biased_match!(
 399     (x)       ~ (T1::Good1(g1, val)) else { return 0 };
 400     (g1.body) ~ (T3::Good2(result) ) else { panic!("Didn't get Good2") };
 401     binds val, result );
 402 // complicated stuff goes here
 403 return result + val;
 404 # }
 405 # }
 406 ~~~~
 407
 408 This technique applies to many cases where transcribing a result all at once is not possible.
 409 The resulting code resembles ordinary functional programming in some respects,
 410 but differs from it in two important ways.
 411
 412 The first difference is important, but also easy to forget: the transcription
 413 (right-hand) side of a `macro_rules!` rule is literal syntax, which can only
 414 be executed at run-time. If a piece of transcription syntax does not itself
 415 appear inside another macro invocation, it will become part of the final
 416 program. If it is inside a macro invocation (for example, the recursive
 417 invocation of `biased_match_rec!`), it does have the opportunity to affect
 418 transcription, but only through the process of attempted pattern matching.
 419
 420 The second, related, difference is that the evaluation order of macros feels
 421 "backwards" compared to ordinary programming. Given an invocation
 422 `m1!(m2!())`, the expander first expands `m1!`, giving it as input the literal
 423 syntax `m2!()`. If it transcribes its argument unchanged into an appropriate
 424 position (in particular, not as an argument to yet another macro invocation),
 425 the expander will then proceed to evaluate `m2!()` (along with any other macro
 426 invocations `m1!(m2!())` produced).
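
The "backwards" order can be observed with a short sketch on a current compiler. The macros `two`, `as_text`, and `pass_through`, and the function `order_demo`, are hypothetical. `as_text!` hands its argument to the built-in `stringify!`, so the inner invocation is never expanded. `pass_through!` transcribes its argument into expression position, where it is expanded afterwards.

```rust
macro_rules! two {
    () => { 2 };
}

macro_rules! as_text {
    // Receives the inner invocation as raw tokens and turns them into a string.
    ($($t:tt)*) => { stringify!($($t)*) };
}

macro_rules! pass_through {
    // Transcribes its argument unchanged into expression position.
    ($e:expr) => { $e };
}

fn order_demo() -> (&'static str, i32) {
    (as_text!(two!()), pass_through!(two!()))
}
```

The first element of the result is the text of the invocation `two!()` rather than `"2"`, while the second is the number `2`.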
 427
 428 # Hygiene
 429
 430 To prevent clashes, Rust implements
 431 [hygienic macros](http://en.wikipedia.org/wiki/Hygienic_macro).
 432
 433 As an example, `loop` and `for-loop` labels (discussed in the lifetimes guide)
 434 will not clash. The following code will print "Hello!" only once:
 435
 436 ~~~
 437 #![feature(macro_rules)]
 438
 439 macro_rules! loop_x (
 440     ($e: expr) => (
 441         // $e will not interact with this 'x
 442         'x: loop {
 443             println!("Hello!");
 444             $e
 445         }
 446     );
 447 );
 448
 449 fn main() {
 450     'x: loop {
 451         loop_x!(break 'x);
 452         println!("I am never printed.");
 453     }
 454 }
 455 ~~~
 456
 457 The two `'x` names did not clash. Had they clashed, `break 'x` would have exited only
 458 the macro's inner loop, so the outer loop would print "I am never printed" and run forever.
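
Hygiene applies to local variables as well as labels. The following sketch, for a current compiler, uses a made-up macro `add_ten` and function `hygiene_demo`. The `x` introduced inside the macro and the `x` at the call site are distinct bindings, even though they share a name.

```rust
macro_rules! add_ten {
    ($e:expr) => {{
        // This `x` belongs to the macro and is invisible to `$e`.
        let x = 10;
        x + $e
    }};
}

fn hygiene_demo() -> i32 {
    let x = 1;
    // `$e` is the caller's `x`, so the result is 10 + 1.
    add_ten!(x)
}
```

Without hygiene, the argument `x` would have been captured by the macro's own binding and the result would have been `20`.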
 459
 460 # Scoping and macro import/export
 461
 462 Macros occupy a single global namespace. The interaction with Rust's system of
 463 modules and crates is somewhat complex.
 464
 465 Definition and expansion of macros both happen in a single depth-first,
 466 lexical-order traversal of a crate's source. So a macro defined at module scope
 467 is visible to any subsequent code in the same module, which includes the body
 468 of any subsequent child `mod` items.
 469
 470 If a module has the `macro_escape` attribute, its macros are also visible in
 471 its parent module after the child's `mod` item. If the parent also has
 472 `macro_escape` then the macros will be visible in the grandparent after the
 473 parent's `mod` item, and so forth.
 474
 475 Independent of `macro_escape`, the `macro_export` attribute controls visibility
 476 between crates.  Any `macro_rules!` definition with the `macro_export`
 477 attribute will be visible to other crates that have loaded this crate with
 478 `phase(plugin)`. There is currently no way for the importing crate to control
 479 which macros are imported.
 480
 481 An example:
 482
 483 ```rust
 484 # #![feature(macro_rules)]
 485 macro_rules! m1 (() => (()));
 486
 487 // visible here: m1
 488
 489 mod foo {
 490     // visible here: m1
 491
 492     #[macro_export]
 493     macro_rules! m2 (() => (()));
 494
 495     // visible here: m1, m2
 496 }
 497
 498 // visible here: m1
 499
 500 macro_rules! m3 (() => (()));
 501
 502 // visible here: m1, m3
 503
 504 #[macro_escape]
 505 mod bar {
 506     // visible here: m1, m3
 507
 508     macro_rules! m4 (() => (()));
 509
 510     // visible here: m1, m3, m4
 511 }
 512
 513 // visible here: m1, m3, m4
 514 # fn main() { }
 515 ```
 516
 517 When this library is loaded with `#[phase(plugin)] extern crate`, only `m2`
 518 will be imported.
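
The lexical-order rule can be sketched on a current compiler as follows. One assumption differs from the text above: in current Rust the attribute that lets macros escape a module is spelled `#[macro_use]` rather than `macro_escape`. The function `scoping_demo` and the module `child` are illustrative names.

```rust
macro_rules! m1 {
    () => { 1 };
}

mod child {
    // `m1` was defined before this `mod` item, so it is visible here.
    pub fn get() -> i32 {
        m1!()
    }
}

#[macro_use]
mod bar {
    macro_rules! m4 {
        () => { 4 };
    }
}

fn scoping_demo() -> i32 {
    // `m4` is visible after `bar` because the module carries `#[macro_use]`.
    child::get() + m4!()
}
```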
 519
 520 # A final note
 521
 522 Macros, as currently implemented, are not for the faint of heart. Even
 523 ordinary syntax errors can be more difficult to debug when they occur inside a
 524 macro, and errors caused by parse problems in generated code can be very
 525 tricky. Invoking the `log_syntax!` macro can help elucidate intermediate
 526 states, invoking `trace_macros!(true)` will automatically print those
 527 intermediate states out, and passing the flag `--pretty expanded` as a
 528 command-line argument to the compiler will show the result of expansion.
 529
 530 If Rust's macro system can't do what you need, you may want to write a
 531 [compiler plugin](guide-plugin.html) instead. Compared to `macro_rules!`
 532 macros, this is significantly more work, the interfaces are much less stable,
 533 and the warnings about debugging apply ten-fold. In exchange you get the
 534 flexibility of running arbitrary Rust code within the compiler. Syntax
 535 extension plugins are sometimes called "procedural macros" for this reason.