src/doc/guide-macros.md

   1 % The Rust Macros Guide
   2
   3 # Introduction
   4
   5 Functions are the primary tool that programmers can use to build abstractions.
   6 Sometimes, however, programmers want to abstract over compile-time syntax
   7 rather than run-time values.
   8 Macros provide syntactic abstraction.
   9 For an example of how this can be useful, consider the following two code fragments,
  10 which both pattern-match on their input and both return early in one case,
  11 doing nothing otherwise:
  12
  13 ~~~~
  14 # enum T { SpecialA(uint), SpecialB(uint) }
  15 # fn f() -> uint {
  16 # let input_1 = T::SpecialA(0);
  17 # let input_2 = T::SpecialA(0);
  18 match input_1 {
  19     T::SpecialA(x) => { return x; }
  20     _ => {}
  21 }
  22 // ...
  23 match input_2 {
  24     T::SpecialB(x) => { return x; }
  25     _ => {}
  26 }
  27 # return 0u;
  28 # }
  29 ~~~~
  30
  31 This code could become tiresome if repeated many times.
  32 No function can abstract the repetition away, because a `return` inside a
  33 helper function would return from the helper, not from its caller.
  34 Rust's macro system, however, can eliminate the repetition. Macros are
  35 lightweight custom syntax extensions, themselves defined using the
  36 `macro_rules!` syntax extension. The following `early_return` macro captures
  37 the pattern in the above code:
  38
  39 ~~~~
  40 # enum T { SpecialA(uint), SpecialB(uint) }
  41 # fn f() -> uint {
  42 # let input_1 = T::SpecialA(0);
  43 # let input_2 = T::SpecialA(0);
  44 macro_rules! early_return {
  45     ($inp:expr, $sp:path) => ( // invoke it like `(input_5, T::SpecialE)`
  46         match $inp {
  47             $sp(x) => { return x; }
  48             _ => {}
  49         }
  50     );
  51 }
  52 // ...
  53 early_return!(input_1, T::SpecialA);
  54 // ...
  55 early_return!(input_2, T::SpecialB);
  56 # return 0;
  57 # }
  58 # fn main() {}
  59 ~~~~
  60
  61 Macros are defined in pattern-matching style: in the above example, the text
  62 `($inp:expr, $sp:path)` that appears on the left-hand side of the `=>` is the
  63 *macro invocation syntax*, a pattern denoting how to write a call to the
  64 macro. The text on the right-hand side of the `=>`, beginning with `match
  65 $inp`, is the *macro transcription syntax*: what the macro expands to.
  66
  67 # Invocation syntax
  68
  69 The macro invocation syntax specifies the syntax for the arguments to the
  70 macro. It appears on the left-hand side of the `=>` in a macro definition. It
  71 conforms to the following rules:
  72
  73 1. It must be surrounded by parentheses.
  74 2. `$` has special meaning (described below).
  75 3. The `()`s, `[]`s, and `{}`s it contains must balance. For example, `([)` is
  76 forbidden.
  77
  78 Otherwise, the invocation syntax is free-form.
  79
  80 To take a fragment of Rust code as an argument, write `$` followed by a name
  81 (for use on the right-hand side), followed by a `:`, followed by a *fragment
  82 specifier*. The fragment specifier denotes the sort of fragment to match. The
  83 most common fragment specifiers are:
  84
  85 * `ident` (an identifier, referring to a variable or item. Examples: `f`, `x`,
  86   `foo`.)
  87 * `expr` (an expression. Examples: `2 + 2`; `if true { 1 } else { 2 }`;
  88   `f(42)`.)
  89 * `ty` (a type. Examples: `int`, `Vec<(char, String)>`, `&T`.)
  90 * `pat` (a pattern, usually appearing in a `match` or on the left-hand side of
  91   a declaration. Examples: `Some(t)`; `(17, 'a')`; `_`.)
  92 * `block` (a sequence of actions. Example: `{ log(error, "hi"); return 12; }`.)
  93
  94 The parser interprets any token that's not preceded by a `$` literally. Rust's usual
  95 rules of tokenization apply.
  96
  97 So `($x:ident -> (($e:expr)))`, though excessively fancy, would designate a macro
  98 that could be invoked like: `my_macro!(i->(( 2+2 )))`.
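
As a rough illustration of this pattern, the sketch below defines a hypothetical macro named `bind!` (the name and its expansion to a `let` are assumptions, not part of the guide). It is written in current Rust syntax.

```rust
// Hypothetical macro for illustration. The tokens `->`, `((` and `))` must
// appear verbatim in the invocation; `$x` and `$e` capture fragments.
macro_rules! bind {
    ($x:ident -> (($e:expr))) => {
        let $x = $e;
    };
}

fn bound_value() -> i32 {
    // Whitespace is irrelevant; only the token sequence has to match.
    bind!(i->(( 2+2 )));
    i
}
```

Calling `bound_value()` evaluates the `let i = 2+2;` that the macro produced and returns `i`.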
  99
 100 ## Invocation location
 101
 102 A macro invocation may take the place of (and therefore expand to) an
 103 expression, item, statement, or pattern.  The Rust parser will parse the macro
 104 invocation as a "placeholder" for whichever syntactic form is appropriate for
 105 the location.
 106
 107 At expansion time, the output of the macro will be parsed as whichever of the
 108 four nonterminals it stands in for. This means that a single macro might,
 109 for example, expand to an item or an expression, depending on its arguments
 110 (and cause a syntax error if it is called with the wrong argument for its
 111 location). Although this behavior sounds excessively dynamic, it is known to
 112 be useful under some circumstances.
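
A minimal sketch of such a macro, in current Rust syntax, might look like the following. The macro `answer!` and the constant `ANSWER` are hypothetical names chosen for illustration.

```rust
// Hypothetical macro: the first rule expands to an item, the second to an
// expression. Which rule applies depends on the arguments.
macro_rules! answer {
    (const $name:ident) => {
        const $name: u32 = 42;
    };
    (value) => {
        42u32
    };
}

// Item position: expands to `const ANSWER: u32 = 42;`.
answer!(const ANSWER);

fn doubled() -> u32 {
    // Expression position: expands to `42u32`.
    let v = answer!(value);
    v + ANSWER
}
```

Writing `answer!(const X)` where an expression is expected would be a syntax error, because the expansion is an item.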
 113
 114
 115 # Transcription syntax
 116
 117 The right-hand side of the `=>` follows the same rules as the left-hand side,
 118 except that a `$` need only be followed by the name of the syntactic fragment
 119 to transcribe into the macro expansion; its type need not be repeated.
 120
 121 The right-hand side must be enclosed by delimiters, which the transcriber ignores.
 122 Therefore `() => ((1,2,3))` is a macro that expands to a tuple expression,
 123 `($x:ident, $val:expr) => (let $x=$val)` is a macro that expands to a statement,
 124 and `() => (1,2,3)` is a macro that expands to a syntax error
 125 (since the transcriber interprets the parentheses on the right-hand side as delimiters,
 126 and `1,2,3` is not a valid Rust expression on its own).
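
The delimiter rule can be sketched in current Rust as follows. The macro names `triple!` and `declare!` are hypothetical, and this sketch terminates the `let` with a `;` inside the transcriber.

```rust
// The outer parentheses are delimiters; the inner pair forms the tuple.
macro_rules! triple {
    () => ((1, 2, 3))
}

// Expands to a `let` statement that binds the caller-supplied name.
macro_rules! declare {
    ($x:ident, $val:expr) => (let $x = $val;)
}

fn sum() -> i32 {
    let (a, b, c) = triple!();
    declare!(d, 4);
    a + b + c + d
}
```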
 127
 128 Except for permissibility of `$name` (and `$(...)*`, discussed below), the
 129 right-hand side of a macro definition is ordinary Rust syntax. In particular,
 130 macro invocations (including invocations of the macro currently being defined)
 131 are permitted in expression, statement, and item locations. However, nothing
 132 else about the code is examined or executed by the macro system; execution
 133 still has to wait until run-time.
 134
 135 ## Interpolation location
 136
 137 The interpolation `$argument_name` may appear in any location consistent with
 138 its fragment specifier (i.e., if it is specified as `ident`, it may be used
 139 anywhere an identifier is permitted).
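
For instance, a single `ident` fragment can be transcribed as a field name and as a function name. The sketch below uses current Rust syntax; `make_counter!` and `Counter` are hypothetical names.

```rust
// Hypothetical macro: `$name` is an `ident`, so it may appear wherever an
// identifier is allowed, here as a struct field and as a function name.
macro_rules! make_counter {
    ($name:ident, $ty:ty) => {
        struct Counter {
            $name: $ty,
        }

        fn $name(c: &Counter) -> $ty {
            c.$name
        }
    };
}

make_counter!(hits, u64);

fn read_hits() -> u64 {
    let c = Counter { hits: 3 };
    hits(&c)
}
```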
 140
 141 # Multiplicity
 142
 143 ## Invocation
 144
 145 Going back to the motivating example, recall that `early_return` expanded into
 146 a `match` that would `return` if the `match`'s scrutinee matched the
 147 "special case" path provided as the second argument to `early_return`,
 148 and do nothing otherwise. Now suppose that we wanted to write a
 149 version of `early_return` that could handle a variable number of "special"
 150 cases.
 151
 152 The syntax `$(...)*` on the left-hand side of the `=>` in a macro definition
 153 accepts zero or more occurrences of its contents. It works much
 154 like the `*` operator in regular expressions. It also supports a
 155 separator token (a comma-separated list could be written `$(...),*`), and `+`
 156 instead of `*` to mean "at least one".
 157
 158 ~~~~
 159 # enum T { SpecialA(uint),SpecialB(uint),SpecialC(uint),SpecialD(uint)}
 160 # fn f() -> uint {
 161 # let input_1 = T::SpecialA(0);
 162 # let input_2 = T::SpecialA(0);
 163 macro_rules! early_return {
 164     ($inp:expr, [ $($sp:path),+ ]) => (
 165         match $inp {
 166             $(
 167                 $sp(x) => { return x; }
 168             )+
 169             _ => {}
 170         }
 171     )
 172 }
 173 // ...
 174 early_return!(input_1, [T::SpecialA,T::SpecialC,T::SpecialD]);
 175 // ...
 176 early_return!(input_2, [T::SpecialB]);
 177 # return 0;
 178 # }
 179 # fn main() {}
 180 ~~~~
 181
 182 ## Transcription
 183
 184 As the above example demonstrates, `$(...)*` is also valid on the right-hand
 185 side of a macro definition. The behavior of `*` in transcription,
 186 especially in cases where multiple `*`s are nested, and multiple different
 187 names are involved, can seem somewhat magical and unintuitive at first. The
 188 system that interprets them is called "Macro By Example". The two rules to
 189 keep in mind are (1) the behavior of `$(...)*` is to walk through one "layer"
 190 of repetitions for all of the `$name`s it contains in lockstep, and (2) each
 191 `$name` must be under at least as many `$(...)*`s as it was matched against.
 192 If it is under more, it'll be repeated, as appropriate.
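
The two rules may be easier to see in small examples. The three macros below are hypothetical sketches in current Rust syntax, not part of the guide.

```rust
// Rule (1): `$k` and `$v` were matched in the same repetition, so the
// transcriber walks through them in lockstep.
macro_rules! pairs {
    ($($k:expr => $v:expr),*) => {
        vec![$(($k, $v)),*]
    };
}

// Nested repetitions: `$cell` was matched two layers deep, so it is
// transcribed under two layers of `$(...)*`.
macro_rules! table {
    ($([$($cell:expr),*]),*) => {
        vec![$(vec![$($cell),*]),*]
    };
}

// Rule (2): `$base` was matched outside any repetition but is used inside
// one, so it is repeated for every `$n`.
macro_rules! add_to_each {
    ($base:expr; $($n:expr),*) => {
        vec![$($base + $n),*]
    };
}
```

With these definitions, `add_to_each!(10; 1, 2, 3)` expands to `vec![10 + 1, 10 + 2, 10 + 3]`.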
 193
 194 ## Parsing limitations
 195
 196
 197 For technical reasons, there are two limitations to the treatment of syntax
 198 fragments by the macro parser:
 199
 200 1. The parser will always parse as much as possible of a Rust syntactic
 201 fragment. For example, if the comma were omitted from the syntax of
 202 `early_return!` above, `input_1 [` would've been interpreted as the beginning
 203 of an array index. In fact, invoking the macro would have been impossible.
 204 2. The parser must have eliminated all ambiguity by the time it reaches a
 205 `$name:fragment_specifier` declaration. This limitation can result in parse
 206 errors when declarations occur at the beginning of, or immediately after,
 207 a `$(...)*`. For example, the grammar `$($t:ty)* $e:expr` will always fail to
 208 parse because the parser would be forced to choose between parsing `t` and
 209 parsing `e`. Changing the invocation syntax to require a distinctive token in
 210 front can solve the problem. In the above example, `$(T $t:ty)* E $e:expr`
 211 solves the problem.
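
The distinctive-token technique from the second limitation can be sketched as below. This is an assumption-laden variant rather than the guide's exact grammar: current Rust restricts which tokens may follow a `ty` fragment, so the sketch adds a `;` after each type. The macro name `sizes_then_expr!` is hypothetical.

```rust
// Hypothetical macro: the literal tokens `T` and `E` tell the parser which
// fragment comes next. The `;` after each type is an addition made for this
// sketch.
macro_rules! sizes_then_expr {
    ($(T $t:ty;)* E $e:expr) => {
        // Sum the sizes of all listed types, then evaluate the expression.
        (0usize $(+ ::std::mem::size_of::<$t>())*, $e)
    };
}

fn demo() -> (usize, i32) {
    sizes_then_expr!(T u8; T u32; E 2 + 2)
}
```

Because every type is introduced by `T` and the final expression by `E`, the parser never has to guess whether the next fragment is a type or an expression.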
 212
 213 # Macro argument pattern matching
 214
 215 ## Motivation
 216
 217 Now consider code like the following:
 218
 219 ~~~~
 220 # enum T1 { Good1(T2, uint), Bad1}
 221 # struct T2 { body: T3 }
 222 # enum T3 { Good2(uint), Bad2}
 223 # fn f(x: T1) -> uint {
 224 match x {
 225     T1::Good1(g1, val) => {
 226         match g1.body {
 227             T3::Good2(result) => {
 228                 // complicated stuff goes here
 229                 return result + val;
 230             },
 231             _ => panic!("Didn't get good_2")
 232         }
 233     }
 234     _ => return 0 // default value
 235 }
 236 # }
 237 # fn main() {}
 238 ~~~~
 239
 240 All the complicated stuff is deeply indented, and the error-handling code is
 241 separated from matches that fail. We'd like to write a macro that performs
 242 a match, but with a syntax that suits the problem better. The following macro
 243 can solve the problem:
 244
 245 ~~~~
 246 macro_rules! biased_match {
 247     // special case: `let (x) = ...` is illegal, so use `let x = ...` instead
 248     ( ($e:expr) -> ($p:pat) else $err:stmt ;
 249       binds $bind_res:ident
 250     ) => (
 251         let $bind_res = match $e {
 252             $p => ( $bind_res ),
 253             _ => { $err }
 254         };
 255     );
 256     // more than one name; use a tuple
 257     ( ($e:expr) -> ($p:pat) else $err:stmt ;
 258       binds $( $bind_res:ident ),*
 259     ) => (
 260         let ( $( $bind_res ),* ) = match $e {
 261             $p => ( $( $bind_res ),* ),
 262             _ => { $err }
 263         };
 264     )
 265 }
 266
 267 # enum T1 { Good1(T2, uint), Bad1}
 268 # struct T2 { body: T3 }
 269 # enum T3 { Good2(uint), Bad2}
 270 # fn f(x: T1) -> uint {
 271 biased_match!((x)       -> (T1::Good1(g1, val)) else { return 0 };
 272               binds g1, val );
 273 biased_match!((g1.body) -> (T3::Good2(result) )
 274                   else { panic!("Didn't get good_2") };
 275               binds result );
 276 // complicated stuff goes here
 277 return result + val;
 278 # }
 279 # fn main() {}
 280 ~~~~
 281
 282 This solves the indentation problem. But if we have a lot of chained matches
 283 like this, we might prefer to write a single macro invocation. The input
 284 pattern we want is clear:
 285
 286 ~~~~
 287 # fn main() {}
 288 # macro_rules! b {
 289     ( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
 290       binds $( $bind_res:ident ),*
 291     )
 292 # => (0) }
 293 ~~~~
 294
 295 However, it's not possible to directly expand to nested match statements. But
 296 there is a solution.
 297
 298 ## The recursive approach to macro writing
 299
 300 A macro may accept multiple different input grammars. The first one to
 301 successfully match the actual argument to a macro invocation is the one that
 302 "wins".
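
As a small illustration of rule ordering combined with recursion, consider the hypothetical macro `max_of!` below, written in current Rust syntax. A single expression matches the first rule; a longer list fails the first rule and falls through to the second, which recurses on the remainder.

```rust
macro_rules! max_of {
    // Base case: a single expression.
    ($x:expr) => {
        $x
    };
    // Recursive case: compare the first expression with the max of the rest.
    ($x:expr, $($rest:expr),+) => {{
        let first = $x;
        let rest = max_of!($($rest),+);
        if first > rest { first } else { rest }
    }};
}

fn largest() -> i32 {
    max_of!(3, 9, 4)
}
```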
 303
 304 In the case of the example above, we want to write a recursive macro to
 305 process the semicolon-terminated lines, one-by-one. So, we want the following
 306 input patterns:
 307
 308 ~~~~
 309 # macro_rules! b {
 310     ( binds $( $bind_res:ident ),* )
 311 # => (0) }
 312 # fn main() {}
 313 ~~~~
 314
 315 ...and:
 316
 317 ~~~~
 318 # fn main() {}
 319 # macro_rules! b {
 320     (    ($e     :expr) -> ($p     :pat) else $err     :stmt ;
 321       $( ($e_rest:expr) -> ($p_rest:pat) else $err_rest:stmt ; )*
 322       binds  $( $bind_res:ident ),*
 323     )
 324 # => (0) }
 325 ~~~~
 326
 327 The resulting macro looks like this. Note that the separation into
 328 `biased_match!` and `biased_match_rec!` occurs only because we have an outer
 329 piece of syntax (the `let`) which we only want to transcribe once.
 330
 331 ~~~~
 332 # fn main() {
 333
 334 macro_rules! biased_match_rec {
 335     // Handle the first layer
 336     (   ($e     :expr) -> ($p     :pat) else $err     :stmt ;
 337      $( ($e_rest:expr) -> ($p_rest:pat) else $err_rest:stmt ; )*
 338      binds $( $bind_res:ident ),*
 339     ) => (
 340         match $e {
 341             $p => {
 342                 // Recursively handle the next layer
 343                 biased_match_rec!($( ($e_rest) -> ($p_rest) else $err_rest ; )*
 344                                   binds $( $bind_res ),*
 345                 )
 346             }
 347             _ => { $err }
 348         }
 349     );
 350     // Produce the requested values
 351     ( binds $( $bind_res:ident ),* ) => ( ($( $bind_res ),*) )
 352 }
 353
 354 // Wrap the whole thing in a `let`.
 355 macro_rules! biased_match {
 356     // special case: `let (x) = ...` is illegal, so use `let x = ...` instead
 357     ( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
 358       binds $bind_res:ident
 359     ) => (
 360         let $bind_res = biased_match_rec!(
 361             $( ($e) -> ($p) else $err ; )*
 362             binds $bind_res
 363         );
 364     );
 365     // more than one name: use a tuple
 366     ( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
 367       binds  $( $bind_res:ident ),*
 368     ) => (
 369         let ( $( $bind_res ),* ) = biased_match_rec!(
 370             $( ($e) -> ($p) else $err ; )*
 371             binds $( $bind_res ),*
 372         );
 373     )
 374 }
 375
 376
 377 # enum T1 { Good1(T2, uint), Bad1}
 378 # struct T2 { body: T3 }
 379 # enum T3 { Good2(uint), Bad2}
 380 # fn f(x: T1) -> uint {
 381 biased_match!(
 382     (x)       -> (T1::Good1(g1, val)) else { return 0 };
 383     (g1.body) -> (T3::Good2(result) ) else { panic!("Didn't get Good2") };
 384     binds val, result );
 385 // complicated stuff goes here
 386 return result + val;
 387 # }
 388 # }
 389 ~~~~
 390
 391 This technique applies to many cases where transcribing a result all at once is not possible.
 392 The resulting code resembles ordinary functional programming in some respects,
 393 but differs from it in two important ways.
 394
 395 The first difference is important, but also easy to forget: the transcription
 396 (right-hand) side of a `macro_rules!` rule is literal syntax, which can only
 397 be executed at run-time. If a piece of transcription syntax does not itself
 398 appear inside another macro invocation, it will become part of the final
 399 program. If it is inside a macro invocation (for example, the recursive
 400 invocation of `biased_match_rec!`), it does have the opportunity to affect
 401 transcription, but only through the process of attempted pattern matching.
 402
 403 The second, related, difference is that the evaluation order of macros feels
 404 "backwards" compared to ordinary programming. Given an invocation
 405 `m1!(m2!())`, the expander first expands `m1!`, giving it as input the literal
 406 syntax `m2!()`. If it transcribes its argument unchanged into an appropriate
 407 position (in particular, not as an argument to yet another macro invocation),
 408 the expander will then proceed to evaluate `m2!()` (along with any other macro
 409 invocations `m1!(m2!())` produced).
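
The expansion order can be observed with a sketch like the following, in current Rust syntax. Here `m1!` has one rule that would match only an already-expanded value and another that matches the raw tokens `m2!()`. The macro `pass_through!` and the function names are hypothetical.

```rust
macro_rules! m2 {
    () => { 2 };
}

// The first rule would match only if `m1!` received the expanded value;
// the second matches the unexpanded tokens `m2!()`.
macro_rules! m1 {
    (2) => { "expanded value" };
    (m2!()) => { "literal syntax" };
}

// Transcribes its argument unchanged into expression position.
macro_rules! pass_through {
    ($e:expr) => { $e };
}

fn what_m1_saw() -> &'static str {
    m1!(m2!())
}

fn passed_through() -> i32 {
    pass_through!(m2!())
}
```

`what_m1_saw()` selects the second rule, since `m1!` is expanded first and sees only tokens. In `passed_through()`, the transcribed `m2!()` lands in expression position and is expanded afterwards.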
 410
 411 # Hygiene
 412
 413 To prevent clashes, Rust implements
 414 [hygienic macros](http://en.wikipedia.org/wiki/Hygienic_macro).
 415
 416 As an example, `loop` and `for` loop labels (discussed in the lifetimes guide)
 417 will not clash. The following code will print "Hello!" only once:
 418
 419 ~~~
 420 macro_rules! loop_x {
 421     ($e: expr) => (
 422         // $e will not interact with this 'x
 423         'x: loop {
 424             println!("Hello!");
 425             $e
 426         }
 427     );
 428 }
 429
 430 fn main() {
 431     'x: loop {
 432         loop_x!(break 'x);
 433         println!("I am never printed.");
 434     }
 435 }
 436 ~~~
 437
 438 The two `'x` names did not clash. Had they clashed, `break 'x` would have exited only
 439 the macro's inner loop, and the program would print "I am never printed." and run forever.
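
The same behavior can be checked without reading printed output. The variant below is a sketch in current Rust syntax that counts iterations instead of printing. The extra `$count` parameter, the function `run`, and the `allow` attribute (which silences lints about the deliberately unreachable line) are assumptions added for illustration.

```rust
macro_rules! loop_x {
    ($count:ident, $e:expr) => {
        // `$e` will not interact with this 'x.
        'x: loop {
            $count += 1;
            $e
        }
    };
}

#[allow(unreachable_code, unused_labels, unused_assignments)]
fn run() -> (u32, bool) {
    let mut iterations = 0;
    let mut reached_after = false;
    'x: loop {
        // `break 'x` names the label written here, not the macro's label.
        loop_x!(iterations, break 'x);
        reached_after = true;
    }
    (iterations, reached_after)
}
```

The inner loop body runs once, and the statement after the macro invocation is never reached.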
 440
 441 # Scoping and macro import/export
 442
 443 Macros occupy a single global namespace. The interaction with Rust's system of
 444 modules and crates is somewhat complex.
 445
 446 Definition and expansion of macros both happen in a single depth-first,
 447 lexical-order traversal of a crate's source. So a macro defined at module scope
 448 is visible to any subsequent code in the same module, which includes the body
 449 of any subsequent child `mod` items.
 450
 451 If a module has the `macro_use` attribute, its macros are also visible in its
 452 parent module after the child's `mod` item. If the parent also has `macro_use`
 453 then the macros will be visible in the grandparent after the parent's `mod`
 454 item, and so forth.
 455
 456 The `macro_use` attribute can also appear on `extern crate`.  In this context
 457 it controls which macros are loaded from the external crate, e.g.
 458
 459 ```rust,ignore
 460 #[macro_use(foo, bar)]
 461 extern crate baz;
 462 ```
 463
 464 If the attribute is given simply as `#[macro_use]`, all macros are loaded.  If
 465 there is no `#[macro_use]` attribute then no macros are loaded.  Only macros
 466 defined with the `#[macro_export]` attribute may be loaded.
 467
 468 To load a crate's macros *without* linking it into the output, use `#[no_link]`
 469 as well.
 470
 471 An example:
 472
 473 ```rust
 474 macro_rules! m1 { () => (()) }
 475
 476 // visible here: m1
 477
 478 mod foo {
 479     // visible here: m1
 480
 481     #[macro_export]
 482     macro_rules! m2 { () => (()) }
 483
 484     // visible here: m1, m2
 485 }
 486
 487 // visible here: m1
 488
 489 macro_rules! m3 { () => (()) }
 490
 491 // visible here: m1, m3
 492
 493 #[macro_use]
 494 mod bar {
 495     // visible here: m1, m3
 496
 497     macro_rules! m4 { () => (()) }
 498
 499     // visible here: m1, m3, m4
 500 }
 501
 502 // visible here: m1, m3, m4
 503 # fn main() { }
 504 ```
 505
 506 When this library is loaded with `#[macro_use] extern crate`, only `m2` will
 507 be imported.
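
The in-crate visibility rules can also be exercised by actually invoking the macros. The following is a minimal sketch in current Rust syntax; the functions `one` and `four` are hypothetical.

```rust
macro_rules! m1 {
    () => { 1 };
}

mod foo {
    // `m1` is visible here because it was defined earlier in the parent.
    pub fn one() -> i32 {
        m1!()
    }
}

#[macro_use]
mod bar {
    macro_rules! m4 {
        () => { 4 };
    }
}

// `m4` is visible here only because `bar` carries `#[macro_use]`.
pub fn four() -> i32 {
    m4!()
}
```

Removing `#[macro_use]` from `bar` should make the call to `m4!` in `four` fail to resolve.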
 508
 509 The Rust Reference has a [listing of macro-related
 510 attributes](reference.html#macro--and-plugin-related-attributes).
 511
 512 # The variable `$crate`
 513
 514 A further difficulty occurs when a macro is used in multiple crates.  Say that
 515 `mylib` defines
 516
 517 ```rust
 518 pub fn increment(x: uint) -> uint {
 519     x + 1
 520 }
 521
 522 #[macro_export]
 523 macro_rules! inc_a {
 524     ($x:expr) => ( ::increment($x) )
 525 }
 526
 527 #[macro_export]
 528 macro_rules! inc_b {
 529     ($x:expr) => ( ::mylib::increment($x) )
 530 }
 531 # fn main() { }
 532 ```
 533
 534 `inc_a` only works within `mylib`, while `inc_b` only works outside the
 535 library.  Furthermore, `inc_b` will break if the user imports `mylib` under
 536 another name.
 537
 538 Rust does not (yet) have a hygiene system for crate references, but it does
 539 provide a simple workaround for this problem.  Within a macro imported from a
 540 crate named `foo`, the special macro variable `$crate` will expand to `::foo`.
 541 By contrast, when a macro is defined and then used in the same crate, `$crate`
 542 will expand to nothing.  This means we can write
 543
 544 ```rust
 545 #[macro_export]
 546 macro_rules! inc {
 547     ($x:expr) => ( $crate::increment($x) )
 548 }
 549 # fn main() { }
 550 ```
 551
 552 to define a single macro that works both inside and outside our library.  The
 553 function name will expand to either `::increment` or `::mylib::increment`.
 554
 555 To keep this system simple and correct, `#[macro_use] extern crate ...` may
 556 only appear at the root of your crate, not inside `mod`.  This ensures that
 557 `$crate` is a single identifier.
 558
 559 # A final note
 560
 561 Macros, as currently implemented, are not for the faint of heart. Even
 562 ordinary syntax errors can be more difficult to debug when they occur inside a
 563 macro, and errors caused by parse problems in generated code can be very
 564 tricky. Invoking the `log_syntax!` macro can help elucidate intermediate
 565 states, invoking `trace_macros!(true)` will automatically print those
 566 intermediate states out, and passing the flag `--pretty expanded` as a
 567 command-line argument to the compiler will show the result of expansion.
 568
 569 If Rust's macro system can't do what you need, you may want to write a
 570 [compiler plugin](guide-plugin.html) instead. Compared to `macro_rules!`
 571 macros, this is significantly more work, the interfaces are much less stable,
 572 and the warnings about debugging apply ten-fold. In exchange you get the
 573 flexibility of running arbitrary Rust code within the compiler. Syntax
 574 extension plugins are sometimes called "procedural macros" for this reason.