# **This is a work in progress**

This document is the primary reference for the Rust programming language grammar. It
provides only one kind of material:

- Chapters that formally define the language grammar and, for each

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background familiarity.

This document also does not serve as a reference to the [standard] library
included in the language distribution. That library is documented
separately by extracting documentation attributes from its source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[standard]: std/index.html

Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

grammar : rule + ;
rule : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;

Where:

- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeats` forms apply to the adjacent `element`, and are as follows:
  - `?` means zero or one repetition
  - `*` means zero or more repetitions
  - `+` means one or more repetitions
  - `NUMBER` trailing a repeat symbol gives a maximum repetition count
  - `NUMBER` on its own gives an exact repetition count

This EBNF dialect should hopefully be familiar to many readers.
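The repetition forms listed above can be modelled as a small data type. The following is only a sketch written against current stable Rust; the `Repeats` type and its `allows` method are hypothetical names introduced for illustration and are not part of any grammar tooling.

```rust
/// The repetition forms of the EBNF dialect described above.
/// All names here are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repeats {
    /// `?`: zero or one repetition.
    Optional,
    /// `*`, optionally followed by NUMBER as a maximum count.
    ZeroOrMore(Option<u32>),
    /// `+`, optionally followed by NUMBER as a maximum count.
    OneOrMore(Option<u32>),
    /// NUMBER on its own: exactly that many repetitions.
    Exactly(u32),
}

impl Repeats {
    /// Returns true if `count` repetitions of the adjacent element are allowed.
    fn allows(self, count: u32) -> bool {
        match self {
            Repeats::Optional => count <= 1,
            Repeats::ZeroOrMore(max) => max.map_or(true, |m| count <= m),
            Repeats::OneOrMore(max) => count >= 1 && max.map_or(true, |m| count <= m),
            Repeats::Exactly(n) => count == n,
        }
    }
}
```

Under this model, a term such as `hex_digit 4` corresponds to `Repeats::Exactly(4)`, so `Repeats::Exactly(4).allows(3)` is false.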

## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.

## String table productions

Some rules in the grammar, notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords), are
given in a simplified form: as a table of unquoted, printable,
whitespace-separated strings. These cases form a subset of the rules for
the [token](#tokens) rule, and are assumed to be produced by a
lexical-analysis phase that feeds the parser. That phase is driven by a DFA
operating over the disjunction of all such string table entries.

When such a string, enclosed in double quotes (`"`), occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.

Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
provided to the grammar verifier, restricted to ASCII range, when verifying the
grammar in this document.
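As a rough illustration of the input model, the sketch below decodes UTF-8 text into codepoints and prints each one in the `U+XXXX` notation used in this document. It assumes current stable Rust, where a `&str` is always valid UTF-8 and `chars()` yields one `char` per codepoint; the function names are hypothetical.

```rust
/// Formats a codepoint in the conventional `U+XXXX` notation,
/// using at least 4 hexadecimal digits.
fn codepoint_notation(c: char) -> String {
    format!("U+{:04X}", c as u32)
}

/// Decodes UTF-8 source text into the notation of each of its codepoints.
fn codepoints(source: &str) -> Vec<String> {
    source.chars().map(codepoint_notation).collect()
}
```

For example, `codepoints("fn")` yields `U+0066` followed by `U+006E`, and a codepoint outside the Basic Multilingual Plane is printed with more than 4 digits.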

## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_star`, `non_eol`, `non_slash_or_star`,
`non_single_quote` and `non_double_quote`.

The `ident` production is any nonempty Unicode string of the following form:

- The first character has property `XID_start`
- The remaining characters have property `XID_continue`

that does _not_ occur in the set of [keywords](#keywords).
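The `ident` rule can be sketched as a predicate. The version below is an ASCII-only approximation, in the spirit of the ASCII-restricted substitute definitions mentioned for the grammar verifier: within ASCII, `XID_start` covers only letters, and `XID_continue` covers letters, digits and underscore. The function names and the abbreviated `SAMPLE_KEYWORDS` list are assumptions made for illustration; a complete check would use the full keyword table and real Unicode property data.

```rust
/// A small sample of the keyword table, for illustration only.
const SAMPLE_KEYWORDS: &[&str] = &["as", "fn", "if", "let", "match", "while"];

/// ASCII-only stand-in for `XID_start`: letters.
fn is_ascii_xid_start(c: char) -> bool {
    c.is_ascii_alphabetic()
}

/// ASCII-only stand-in for `XID_continue`: letters, digits and underscore.
fn is_ascii_xid_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Returns true if `s` matches the ASCII approximation of the `ident` rule.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // The first character must be a start character, the rest
        // continue characters, and the whole string must not be a keyword.
        Some(first) if is_ascii_xid_start(first) => {
            chars.all(is_ascii_xid_continue) && !SAMPLE_KEYWORDS.contains(&s)
        }
        // The empty string, or a string with a bad first character.
        _ => false,
    }
}
```

With these definitions, `is_ident("foo_1")` holds, while `is_ident("1foo")` and `is_ident("match")` do not.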

> **Note**: `XID_start` and `XID_continue` as character properties cover the
> character ranges used to form the more familiar C and Java language-family
> identifiers.

### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
- `non_star` is `non_null` restricted to exclude `U+002A` (`*`)
- `non_slash_or_star` is `non_null` restricted to exclude `U+002F` (`/`) and `U+002A` (`*`)
- `non_single_quote` is `non_null` restricted to exclude `U+0027` (`'`)
- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)
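Each of these productions matches a single character, so each can be written as a one-line predicate. The sketch below mirrors the list above directly; the function names simply reuse the production names and are not taken from any existing library.

```rust
/// Any single character aside from U+0000.
fn non_null(c: char) -> bool {
    c != '\u{0000}'
}

/// `non_null`, excluding U+000A (line feed).
fn non_eol(c: char) -> bool {
    non_null(c) && c != '\u{000A}'
}

/// `non_null`, excluding U+002A (`*`).
fn non_star(c: char) -> bool {
    non_null(c) && c != '\u{002A}'
}

/// `non_null`, excluding U+002F (`/`) and U+002A (`*`).
fn non_slash_or_star(c: char) -> bool {
    non_null(c) && c != '\u{002F}' && c != '\u{002A}'
}

/// `non_null`, excluding U+0027 (`'`).
fn non_single_quote(c: char) -> bool {
    non_null(c) && c != '\u{0027}'
}

/// `non_null`, excluding U+0022 (`"`).
fn non_double_quote(c: char) -> bool {
    non_null(c) && c != '\u{0022}'
}
```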

comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
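Because `block_comment_body` refers back to `block_comment`, block comments nest. One way to recognize them is to track nesting depth while scanning, as in the sketch below. This is an illustrative hand-written scanner, not the lexer used by any Rust compiler, and the function names are assumptions.

```rust
/// Returns the byte length of the block comment at the start of `src`,
/// or `None` if `src` does not begin with a complete block comment.
fn block_comment_len(src: &str) -> Option<usize> {
    if !src.starts_with("/*") {
        return None;
    }
    // Scanning bytes is safe here: both delimiters are ASCII, and ASCII
    // bytes never occur inside a multi-byte UTF-8 sequence.
    let bytes = src.as_bytes();
    let mut depth = 0usize;
    let mut i = 0usize;
    while i < bytes.len() {
        if bytes[i..].starts_with(b"/*") {
            depth += 1; // entering a (possibly nested) comment
            i += 2;
        } else if bytes[i..].starts_with(b"*/") {
            depth -= 1; // depth is at least 1 here, since src starts with "/*"
            i += 2;
            if depth == 0 {
                return Some(i);
            }
        } else {
            i += 1;
        }
    }
    None // unterminated comment
}

/// Returns the byte length of the line comment at the start of `src`:
/// "//" followed by any run of `non_eol` characters.
fn line_comment_len(src: &str) -> Option<usize> {
    if !src.starts_with("//") {
        return None;
    }
    let end = src.find(|c: char| c == '\n' || c == '\0');
    Some(end.unwrap_or(src.len()))
}
```

Applied to `/* a /* b */ c */ x`, the block scanner consumes everything up to and including the second `*/`, leaving ` x` as the remaining input.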

**FIXME:** add doc grammar?

whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;

simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;

<p id="keyword-table-marker"></p>

|          |          |          |          |          |
|----------|----------|----------|----------|----------|
| abstract | alignof  | as       | be       | box      |
| break    | const    | continue | crate    | do       |
| else     | enum     | extern   | false    | final    |
| fn       | for      | if       | impl     | in       |
| let      | loop     | match    | mod      | move     |
| mut      | offsetof | once     | override | priv     |
| proc     | pub      | pure     | ref      | return   |
| sizeof   | static   | self     | struct   | super    |
| true     | trait    | type     | typeof   | unsafe   |
| unsized  | use      | virtual  | where    | while    |

Each of these keywords has special meaning in its grammar, and all of them are
excluded from the `ident` rule.

literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit ] lit_suffix ?;

#### Character and string literals

char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
              | 'n' | 'r' | 't' | '0'

unicode_escape : 'u' hex_digit 4

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'

oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
           | '5' | '6' | '7' | '8' | '9' ;

#### Byte and byte string literals

byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
          | '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
                 | '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;

num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
        | '0' [ [ dec_digit | '_' ] * float_suffix ?
              | 'b' [ '1' | '0' | '_' ] +
              | 'o' [ oct_digit | '_' ] +
              | 'x' [ hex_digit | '_' ] + ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;

#### Boolean literals

**FIXME:** write grammar

The two values of the boolean type are written `true` and `false`.

| '#' | '[' | ']' | '(' | ')' | '{' | '}'

Symbols are a general class of printable [token](#tokens) that play structural
roles in a variety of grammar productions. They are catalogued here for
completeness as the set of remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).

expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'

expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
        | '{' matcher * '}' | '$' ident ':' ident
        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
        | non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
            | '{' transcriber * '}' | '$' ident
            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
            | non_special_token ;

# Crates and source files

**FIXME:** grammar? What production covers `#![crate_id = "foo"]`?

# Items and attributes

item : mod_item | fn_item | type_item | struct_item | enum_item
     | static_item | trait_item | impl_item | extern_block ;

mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;

view_item : extern_crate_decl | use_decl ;

##### Extern crate declarations

extern_crate_decl : "extern" "crate" crate_name ;
crate_name : ident | [ string_lit "as" ident ] ;

##### Use declarations

use_decl : "pub" ? "use" [ path "as" ident

path_glob : ident [ "::" [ path_glob

          | '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "mod" ;

#### Generic functions

##### Unsafe functions

#### Diverging functions

const_item : "const" ident ':' type '=' expr ';' ;

static_item : "static" ident ':' type '=' expr ';' ;

extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;

## Visibility and Privacy

### Re-exporting and Visibility

attribute : "#!" ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
                  | '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;

# Statements and expressions

### Declaration statements

A _declaration statement_ is one that introduces one or more *names* into the
enclosing statement block. The declared names may denote new slots or new
items.

#### Item declarations

An _item declaration statement_ has a syntactic form identical to an
[item](#items) declaration within a module. Declaring an item (a
function, enumeration, structure, type, static, trait, implementation or module)
locally within a statement block is simply a way of restricting its
scope to a narrow region containing all of its uses; it is otherwise identical
in meaning to declaring the item outside the statement block.
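As an illustration of item declaration statements, the sketch below declares a helper function and a structure inside a function body. It is written against current stable Rust, which may differ in detail from the work-in-progress grammar described here; all names are hypothetical.

```rust
/// Sums the squares of `values` using items declared inside the body.
fn sum_of_squares(values: &[i32]) -> i32 {
    // Item declaration statement: `square` is visible only in this block.
    fn square(x: i32) -> i32 {
        x * x
    }

    // A locally scoped structure works the same way.
    struct Acc {
        total: i32,
    }

    let mut acc = Acc { total: 0 };
    for &v in values {
        acc.total += square(v);
    }
    acc.total
}
```

Neither `square` nor `Acc` can be named outside `sum_of_squares`, which is the scope restriction the paragraph above describes.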

#### Slot declarations

let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;
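The optional parts of `let_decl` combine as in the following sketch, which assumes current stable Rust. The function name and values are chosen only for illustration.

```rust
/// Exercises the optional type annotation and initializer of `let`.
fn let_forms() -> (i32, i32, i32) {
    // Pattern with an initializer; the type is inferred.
    let a = 1;
    // Pattern, type annotation and initializer.
    let b: i32 = 2;
    // No initializer: the slot must be assigned before it is read.
    let c;
    c = a + b;
    // The pattern need not be a single identifier.
    let (x, y) = (a + c, b * c);
    (c, x, y)
}
```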

### Expression statements

#### Lvalues, rvalues and temporaries

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

### Tuple expressions

### Structure expressions

struct_expr : expr_path '{' ident ':' expr
              [ ',' ident ':' expr ] *

### Block expressions

block_expr : '{' [ view_item ] *
                 [ stmt ';' | item ] *

### Method-call expressions

method_call_expr : expr '.' ident paren_expr_list ;

### Field expressions

field_expr : expr '.' ident ;

### Array expressions

array_expr : '[' "mut" ? array_elems ? ']' ;

array_elems : [expr [',' expr]*] | [expr ',' ".." expr] ;

### Index expressions

idx_expr : expr '[' expr ']' ;

### Unary operator expressions

### Binary operator expressions

binop_expr : expr binop expr ;

#### Arithmetic operators

#### Bitwise operators

#### Lazy boolean operators

#### Comparison operators

#### Type cast expressions

#### Assignment expressions

#### Compound assignment expressions

#### Operator precedence

The precedence of Rust binary operators is ordered as follows, going from

Operators at the same precedence level are evaluated left-to-right. All [unary
operators](#unary-operator-expressions) share a single precedence level, which
binds more tightly than any binary operator.
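A few expressions make these rules concrete. The sketch below assumes current stable Rust, in which `*` binds more tightly than `+`; each value would differ under the alternative grouping. The function name is hypothetical.

```rust
/// Returns values whose results depend on precedence and associativity.
fn precedence_examples() -> (i32, i32, i32, bool) {
    // Same precedence level, evaluated left-to-right: (10 - 4) - 3.
    let left_to_right = 10 - 4 - 3;
    // `*` binds more tightly than `+`: 2 + (3 * 4).
    let mul_before_add = 2 + 3 * 4;
    // Unary `-` binds more tightly than binary `+`: (-2) + 3.
    let unary_minus_first = -2 + 3;
    // Unary `!` binds more tightly than `||`: (!true) || true.
    let unary_not_first = !true || true;
    (left_to_right, mul_before_add, unary_minus_first, unary_not_first)
}
```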

### Grouped expressions

paren_expr : '(' expr ')' ;

expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;

### Lambda expressions

ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;

while_expr : "while" no_struct_literal_expr '{' block '}' ;

loop_expr : [ lifetime ':' ] "loop" '{' block '}';

### Break expressions

break_expr : "break" [ lifetime ];

### Continue expressions

continue_expr : "continue" [ lifetime ];
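The optional `lifetime` in `loop_expr`, `break_expr` and `continue_expr` acts as a loop label. The sketch below, written against current stable Rust, shows a labelled outer loop controlled from inside an inner loop; the function and its parameters are hypothetical.

```rust
/// Finds the first pair (i, j) with i * j == target, trying 1..=limit for each.
fn find_factor_pair(target: u32, limit: u32) -> Option<(u32, u32)> {
    let mut found = None;
    let mut i = 0;
    'outer: loop {
        i += 1;
        if i > limit {
            break; // unlabelled: leaves the innermost enclosing loop
        }
        let mut j = 0;
        loop {
            j += 1;
            if j > limit {
                continue 'outer; // start the next iteration of the outer loop
            }
            if i * j == target {
                found = Some((i, j));
                break 'outer; // leave both loops at once
            }
        }
    }
    found
}
```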

for_expr : "for" pat "in" no_struct_literal_expr '{' block '}' ;

if_expr : "if" no_struct_literal_expr '{' block '}'

else_tail : "else" [ if_expr | if_let_expr

### Match expressions

match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
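The pieces of `match_arm` and `match_pat` (pattern alternation with `|`, an optional `if` guard, and expression or block bodies) appear together in the sketch below. It assumes current stable Rust, and the function name is hypothetical.

```rust
/// Classifies an integer using alternation, guards and a block-bodied arm.
fn classify(n: i32) -> &'static str {
    match n {
        // Alternation: `pat | pat`.
        0 | 1 => "tiny",
        // Guard: `pat if expr`.
        x if x < 0 => "negative",
        // An arm whose body is a block rather than `expr ,`.
        x if x % 2 == 0 => {
            "even"
        }
        // Arms are tried in order, so this catches everything left over.
        _ => "odd",
    }
}
```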

### If let expressions

if_let_expr : "if" "let" pat '=' expr '{' block '}'

else_tail : "else" [ if_expr | if_let_expr | '{' block '}' ] ;

while_let_expr : "while" "let" pat '=' expr '{' block '}' ;
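Both forms test whether a value matches a pattern and bind names only when it does. The following sketch, assuming current stable Rust and using hypothetical function names, shows one use of each.

```rust
/// `if let` with an `else` tail: runs the first block only on a match.
fn describe(value: Option<i32>) -> String {
    if let Some(n) = value {
        format!("got {}", n)
    } else {
        String::from("nothing")
    }
}

/// `while let`: repeats the block for as long as the pattern matches.
fn drain_sum(mut stack: Vec<i32>) -> i32 {
    let mut total = 0;
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}
```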

### Return expressions

return_expr : "return" expr ? ;

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

#### Machine-dependent integer types

### Array and Slice types

closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
                [ ':' bound-list ] [ '->' type ]
procedure_type := 'proc' [ '<' lifetime-list '>' ] '(' arg-list ')'
                  [ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list
bound-list := bound | bound '+' bound-list
bound := path | lifetime

**FIXME:** this is probably not relevant to the grammar...

# Memory and concurrency models

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

### Memory allocation and lifetime

### Communication between tasks