This document is the primary reference for the grammar of the Rust programming
language. It provides only one kind of material:

  - Chapters that formally define the language grammar.

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background.

This document also does not serve as a reference to the [standard] library
included in the language distribution. That library is documented
separately by extracting documentation attributes from its source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[standard]: std/index.html
Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

rule : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
Where:

- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeats` forms apply to the adjacent `element`, and are as follows:
  - `?` means zero or one repetition
  - `*` means zero or more repetitions
  - `+` means one or more repetitions
  - NUMBER trailing a repeat symbol gives a maximum repetition count
  - NUMBER on its own gives an exact repetition count

This EBNF dialect should be familiar to many readers.
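The repetition forms listed above can be modeled as a small data type. The following is only a sketch: the `Repeat` type and its `accepts` method are hypothetical names introduced for illustration and are not part of any grammar tool.

```rust
/// Hypothetical model of the `repeats` suffix of the EBNF dialect.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Repeat {
    /// `?`: zero or one repetition.
    Optional,
    /// `*`, with an optional trailing NUMBER giving a maximum count.
    ZeroOrMore(Option<usize>),
    /// `+`, with an optional trailing NUMBER giving a maximum count.
    OneOrMore(Option<usize>),
    /// NUMBER on its own: an exact repetition count.
    Exactly(usize),
}

impl Repeat {
    /// Returns true if `count` repetitions of the adjacent element are allowed.
    pub fn accepts(self, count: usize) -> bool {
        match self {
            Repeat::Optional => count <= 1,
            Repeat::ZeroOrMore(max) => max.map_or(true, |m| count <= m),
            Repeat::OneOrMore(max) => count >= 1 && max.map_or(true, |m| count <= m),
            Repeat::Exactly(n) => count == n,
        }
    }
}
```

Under this model, a form such as `hex_digit+ 6` would correspond to `Repeat::OneOrMore(Some(6))`.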
## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.
## String table productions

Some rules in the grammar (notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords)) are
given in a simplified form: as a table of unquoted, printable,
whitespace-separated strings. These cases form a subset of the rules regarding
the [token](#tokens) rule, and are assumed to be the result of a
lexical-analysis phase feeding the parser, driven by a DFA, operating over the
disjunction of all such string table entries.

When such a string, enclosed in double quotes (`"`), occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.
Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
  provided to the grammar verifier, restricted to ASCII range, when verifying the
  grammar in this document.
## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
`non_double_quote`.

The `ident` production is any nonempty Unicode string of
the following form:

- The first character is in one of the following ranges: `U+0041` to `U+005A`
  ("A" to "Z"), `U+0061` to `U+007A` ("a" to "z"), or `U+005F` ("\_").
- The remaining characters are in the range `U+0030` to `U+0039` ("0" to "9"),
  or any of the prior valid initial characters.

as long as the identifier does _not_ occur in the set of [keywords](#keywords).
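As an illustration of the `ident` rule exactly as stated here, the check can be sketched as a small predicate. The names `is_ident` and `SAMPLE_KEYWORDS` are hypothetical, and the keyword list is a deliberately incomplete sample rather than the full keyword table.

```rust
/// A small, incomplete sample of the keyword table (assumption: a real
/// checker would list every entry from the keywords section).
const SAMPLE_KEYWORDS: &[&str] = &["_", "as", "fn", "let", "match", "self", "while"];

/// First character: "A" to "Z", "a" to "z", or "_".
fn is_ident_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

/// Remaining characters: "0" to "9", or any valid initial character.
fn is_ident_continue(c: char) -> bool {
    is_ident_start(c) || c.is_ascii_digit()
}

/// Returns true if `s` matches the `ident` rule and is not a sampled keyword.
pub fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) if is_ident_start(first) => {}
        _ => return false, // empty string or invalid first character
    }
    chars.all(is_ident_continue) && !SAMPLE_KEYWORDS.contains(&s)
}
```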
### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is any single Unicode character aside from `U+000A` (`'\n'`)
- `non_single_quote` is any single Unicode character aside from `U+0027` (`'`)
- `non_double_quote` is any single Unicode character aside from `U+0022` (`"`)

comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
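Because `block_comment_body` may itself contain a `block_comment`, block comments nest. One way to recognize them is to track a nesting depth, as in the sketch below. The function name `block_comment_len` is hypothetical; this is not how any particular lexer is implemented.

```rust
/// If `src` starts with a block comment, returns the comment's length in
/// bytes, counting nested `/* ... */` pairs. Returns `None` if `src` does not
/// start with `/*` or if the comment is never closed.
pub fn block_comment_len(src: &str) -> Option<usize> {
    if !src.starts_with("/*") {
        return None;
    }
    let bytes = src.as_bytes();
    let mut depth = 0usize;
    let mut i = 0usize;
    while i < bytes.len() {
        if bytes[i..].starts_with(b"/*") {
            depth += 1; // opening delimiter: one level deeper
            i += 2;
        } else if bytes[i..].starts_with(b"*/") {
            depth -= 1; // closing delimiter: depth is at least 1 here
            i += 2;
            if depth == 0 {
                return Some(i);
            }
        } else {
            i += 1; // any other byte belongs to the comment body
        }
    }
    None
}
```

Scanning bytes is safe here because both delimiters are ASCII and so cannot occur inside a multi-byte UTF-8 sequence.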
**FIXME:** add doc grammar?

whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;

simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;

<p id="keyword-table-marker"></p>

|          |          |          |          |          |
|----------|----------|----------|----------|----------|
| _        | abstract | alignof  | as       | become   |
| box      | break    | const    | continue | crate    |
| do       | else     | enum     | extern   | false    |
| final    | fn       | for      | if       | impl     |
| in       | let      | loop     | macro    | match    |
| mod      | move     | mut      | offsetof | override |
| priv     | proc     | pub      | pure     | ref      |
| return   | Self     | self     | sizeof   | static   |
| struct   | super    | trait    | true     | type     |
| typeof   | unsafe   | unsized  | use      | virtual  |
| where    | while    | yield    |          |          |

Each of these keywords has special meaning in its grammar, and all of them are
excluded from the `ident` rule.

Not all of these keywords are used by the language. Some of them were used
before Rust 1.0, and were left reserved once their implementations were
removed. Some of them were reserved before 1.0 to make space for possible
future features.

literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;

The optional `lit_suffix` production is only used for certain numeric literals,
but is reserved for future extension. That is, the above gives the lexical
grammar, but a Rust parser will reject everything but the 12 special cases
mentioned in [Number literals](reference/tokens.html#number-literals) in the
reference.
#### Character and string literals

char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
    | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
    | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
    | 'n' | 'r' | 't' | '0'

unicode_escape : 'u' '{' hex_digit+ 6 '}';

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
    | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
    | dec_digit ;
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
    | '5' | '6' | '7' | '8' | '9' ;
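The `unicode_escape` rule allows between one and six hexadecimal digits inside the braces. A minimal sketch of decoding such an escape (the part after the backslash) follows. The function name `parse_unicode_escape` is hypothetical, and rejecting values that are not valid Unicode scalar values is an extra check beyond what the grammar itself states.

```rust
/// Decodes the text of a `unicode_escape`, for example `u{1F600}`, into a
/// `char`. Returns `None` if the shape is wrong or the value is not a valid
/// Unicode scalar value (the latter check is an assumption of this sketch).
pub fn parse_unicode_escape(s: &str) -> Option<char> {
    let digits = s.strip_prefix("u{")?.strip_suffix('}')?;
    // `hex_digit+ 6`: at least one and at most six hexadecimal digits.
    if digits.is_empty()
        || digits.len() > 6
        || !digits.chars().all(|c| c.is_ascii_hexdigit())
    {
        return None;
    }
    let value = u32::from_str_radix(digits, 16).ok()?;
    char::from_u32(value)
}
```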
#### Byte and byte string literals

byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
    | '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
    | '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;

num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
    | '0' [ [ dec_digit | '_' ] * float_suffix ?
    | 'b' [ '1' | '0' | '_' ] +
    | 'o' [ oct_digit | '_' ] +
    | 'x' [ hex_digit | '_' ] + ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
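To make the integer forms of `num_lit` concrete, the sketch below converts a literal with an optional `0b`, `0o`, or `0x` prefix and `_` separators into a `u64`. The function name `parse_int_lit` is hypothetical, and the sketch ignores `float_suffix` and `lit_suffix` entirely.

```rust
/// Parses the integer forms of `num_lit` into a `u64`. Floating-point and
/// suffixed literals are out of scope for this sketch.
pub fn parse_int_lit(s: &str) -> Option<u64> {
    let (radix, digits) = if let Some(rest) = s.strip_prefix("0b") {
        (2, rest)
    } else if let Some(rest) = s.strip_prefix("0o") {
        (8, rest)
    } else if let Some(rest) = s.strip_prefix("0x") {
        (16, rest)
    } else {
        // Decimal: the literal must start with a digit, never with '_'.
        if !s.starts_with(|c: char| c.is_ascii_digit()) {
            return None;
        }
        (10, s)
    };
    // Every remaining character must be '_' or a digit of the chosen radix.
    if digits.is_empty() || !digits.chars().all(|c| c == '_' || c.is_digit(radix)) {
        return None;
    }
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();
    // Fails on overflow, or when only underscores followed the prefix.
    u64::from_str_radix(&cleaned, radix).ok()
}
```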
#### Boolean literals

bool_lit : [ "true" | "false" ] ;

The two values of the boolean type are written `true` and `false`.

| '#' | '[' | ']' | '(' | ')' | '{' | '}'

Symbols are a general class of printable [tokens](#tokens) that play structural
roles in a variety of grammar productions. They are cataloged here for
completeness as the set of remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).

expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'

expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
    | "macro_rules" '!' ident '{' macro_rule * '}' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
    | '{' matcher * '}' | '$' ident ':' ident
    | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
    | non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
    | '{' transcriber * '}' | '$' ident
    | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
    | non_special_token ;
# Crates and source files

**FIXME:** grammar? What production covers `#![crate_id = "foo"]`?

# Items and attributes

item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
    | const_item | static_item | trait_item | impl_item | extern_block_item ;

mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;

view_item : extern_crate_decl | use_decl ';' ;

##### Extern crate declarations

extern_crate_decl : "extern" "crate" crate_name
crate_name: ident | [ ident "as" ident ]

##### Use declarations

use_decl : vis ? "use" [ path "as" ident

path_glob : ident [ "::" [ path_glob
    | '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "self" ;

#### Generic functions

##### Unsafe functions

#### Diverging functions

const_item : "const" ident ':' type '=' expr ';' ;

static_item : "static" ident ':' type '=' expr ';' ;

extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;
## Visibility and Privacy

### Re-exporting and Visibility

See [Use declarations](#use-declarations).

attribute : '#' '!' ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
    | '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
# Statements and expressions

stmt : decl_stmt | expr_stmt | ';' ;

### Declaration statements

decl_stmt : item | let_decl ;

#### Item declarations

#### Variable declarations

let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;

### Expression statements

expr_stmt : expr ';' ;

expr : literal | path | tuple_expr | unit_expr | struct_expr
    | block_expr | method_call_expr | field_expr | array_expr
    | idx_expr | range_expr | unop_expr | binop_expr
    | paren_expr | call_expr | lambda_expr | while_expr
    | loop_expr | break_expr | continue_expr | for_expr
    | if_expr | match_expr | if_let_expr | while_let_expr

#### Lvalues, rvalues and temporaries

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

See [Literals](#literals).

### Tuple expressions

tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
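For illustration, the forms admitted by `tuple_expr` correspond to source expressions like the ones below. The function `tuple_forms` is a hypothetical wrapper used only to make the snippet self-contained.

```rust
/// Builds one tuple for each shape in `tuple_expr`: empty, a single
/// expression with a trailing comma, and a comma-separated list.
pub fn tuple_forms() -> ((), (i32,), (i32, &'static str)) {
    let unit = ();          // '(' ')'
    let single = (1,);      // '(' expr ',' ')'
    let pair = (1, "two");  // '(' expr ',' expr ')'
    (unit, single, pair)
}
```

The trailing comma is what distinguishes the one-element tuple `(1,)` from the grouped expression `(1)`.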
### Structure expressions

struct_expr_field_init : ident | ident ':' expr ;
struct_expr : expr_path '{' struct_expr_field_init
    [ ',' struct_expr_field_init ] *

### Block expressions

block_expr : '{' [ stmt | item ] *

### Method-call expressions

method_call_expr : expr '.' ident paren_expr_list ;

### Field expressions

field_expr : expr '.' ident ;

### Array expressions

array_expr : '[' "mut" ? array_elems? ']' ;

array_elems : [expr [',' expr]*] | [expr ';' expr] ;

### Index expressions

idx_expr : expr '[' expr ']' ;

### Range expressions

range_expr : expr ".." expr |
### Unary operator expressions

unop_expr : unop expr ;
unop : '-' | '*' | '!' ;

### Binary operator expressions

binop_expr : expr binop expr | type_cast_expr
    | assignment_expr | compound_assignment_expr ;
binop : arith_op | bitwise_op | lazy_bool_op | comp_op

#### Arithmetic operators

arith_op : '+' | '-' | '*' | '/' | '%' ;

#### Bitwise operators

bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;

#### Lazy boolean operators

lazy_bool_op : "&&" | "||" ;

#### Comparison operators

comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;

#### Type cast expressions

type_cast_expr : value "as" type ;

#### Assignment expressions

assignment_expr : expr '=' expr ;

#### Compound assignment expressions

compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
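As a short illustration of `compound_assignment_expr`, each statement below combines an `arith_op` or `bitwise_op` with `=`. The function `compound_ops` is a hypothetical name used only for this example.

```rust
/// Applies three compound assignments to `x` and returns the result.
pub fn compound_ops(mut x: u32) -> u32 {
    x += 3;  // arith_op '+' followed by '='
    x <<= 2; // bitwise_op "<<" followed by '='
    x ^= 1;  // bitwise_op '^' followed by '='
    x
}
```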
### Grouped expressions

paren_expr : '(' expr ')' ;

expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;

### Lambda expressions

ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
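A minimal example of the `lambda_expr` shape, with an identifier list between the bars followed by a body expression, might look as follows. The function `apply_lambda` is a hypothetical wrapper for illustration.

```rust
/// Defines a two-parameter lambda and calls it through `call_expr` syntax.
pub fn apply_lambda(x: i32, y: i32) -> i32 {
    // '|' ident_list '|' expr, with parameter types left to inference
    let add = |a, b| a + b;
    add(x, y)
}
```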
while_expr : [ lifetime ':' ] ? "while" no_struct_literal_expr '{' block '}' ;

loop_expr : [ lifetime ':' ] ? "loop" '{' block '}';

### Break expressions

break_expr : "break" [ lifetime ] ?;

### Continue expressions

continue_expr : "continue" [ lifetime ] ?;

for_expr : [ lifetime ':' ] ? "for" pat "in" no_struct_literal_expr '{' block '}' ;
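The optional `lifetime ':'` prefix on loops and the optional `lifetime` after `break` and `continue` are what make labeled loops possible. The sketch below uses a hypothetical function, `first_pair_with_sum`, to show a labeled `for` loop, an unlabeled `continue`, and a labeled `break`.

```rust
/// Finds the first pair `(a, b)` with `a <= b`, both below `limit`, whose
/// sum equals `target`.
pub fn first_pair_with_sum(limit: u32, target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for a in 0..limit {
        for b in 0..limit {
            if b < a {
                continue; // no label: applies to the innermost loop
            }
            if a + b == target {
                found = Some((a, b));
                break 'outer; // labeled: leaves both loops at once
            }
        }
    }
    found
}
```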
if_expr : "if" no_struct_literal_expr '{' block '}'

else_tail : "else" [ if_expr | if_let_expr

### Match expressions

match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
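The following sketch exercises the pieces of `match_arm` and `match_pat`: alternative patterns joined by `|`, an `if` guard, and both the `expr ","` and `'{' block '}'` arm bodies. The function `classify` is a hypothetical example.

```rust
/// Classifies an integer using the different match arm shapes.
pub fn classify(n: i32) -> &'static str {
    match n {
        0 => "zero",               // single pattern, expression body
        1 | 2 | 3 => "small",      // pat [ '|' pat ] *
        x if x < 0 => "negative",  // pattern with an "if" guard
        _ => { "large" }           // block body, no trailing comma needed
    }
}
```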
### If let expressions

if_let_expr : "if" "let" pat '=' expr '{' block '}'

while_let_expr : [ lifetime ':' ] ? "while" "let" pat '=' expr '{' block '}' ;
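Both forms test a pattern against an expression and run the block only on a match. A small sketch, using the hypothetical functions `drain_sum` and `first_or_zero`:

```rust
/// Sums a stack by popping until `pop` stops matching `Some(..)`.
pub fn drain_sum(mut stack: Vec<i32>) -> i32 {
    let mut total = 0;
    // "while" "let" pat '=' expr '{' block '}'
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}

/// Returns the first element, or zero when the slice is empty.
pub fn first_or_zero(items: &[i32]) -> i32 {
    // "if" "let" pat '=' expr '{' block '}', followed by an else tail
    if let Some(first) = items.first() {
        *first
    } else {
        0
    }
}
```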
### Return expressions

return_expr : "return" expr ? ;

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

#### Machine-dependent integer types

### Array and Slice types

closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
    [ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list

### Type parameter bounds

bound-list := bound | bound '+' bound-list '+' ?
bound := ty_bound | lt_bound

ty_bound := ty_bound_noparen | (ty_bound_noparen)
ty_bound_noparen := [?] [ for<lt_param_defs> ] simple_path

**FIXME:** this is probably not relevant to the grammar...

# Memory and concurrency models

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

### Memory allocation and lifetime

### Communication between threads