This document is the primary reference for the Rust programming language grammar. It
provides only one kind of material:

- Chapters that formally define the language grammar.

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background.

This document also does not serve as a reference to the [standard] library
included in the language distribution. That library is documented
separately by extracting documentation attributes from its source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[standard]: std/index.html
Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

rule : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeats` forms apply to the adjacent `element`, and are as follows:
  - `?` means zero or one repetition
  - `*` means zero or more repetitions
  - `+` means one or more repetitions
  - `NUMBER` trailing a repeat symbol gives a maximum repetition count
  - `NUMBER` on its own gives an exact repetition count

This EBNF dialect should be familiar to many readers.
## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.
## String table productions

Some rules in the grammar, notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords), are
given in a simplified form: as a listing of a table of unquoted, printable
whitespace-separated strings. These cases form a subset of the rules regarding
the [token](#tokens) rule, and are assumed to be the result of a
lexical-analysis phase feeding the parser, driven by a deterministic finite
automaton (DFA), operating over the disjunction of all such string table entries.

When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.
Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
provided to the grammar verifier, restricted to ASCII range, when verifying the
grammar in this document.
## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
`non_double_quote`.

The `ident` production is any nonempty Unicode[^non_ascii_idents] string of
the following form:

[^non_ascii_idents]: Non-ASCII characters in identifiers are currently feature
gated. This is expected to improve soon.

- The first character has property `XID_start`
- The remaining characters have property `XID_continue`

that does _not_ occur in the set of [keywords](#keywords).

> **Note**: `XID_start` and `XID_continue` as character properties cover the
> character ranges used to form the more familiar C and Java language-family
> identifiers.
### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
- `non_single_quote` is `non_null` restricted to exclude `U+0027` (`'`)
- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)
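
The four exclusion rules above translate directly into predicates over `char`. The following is a minimal sketch; the function names are hypothetical and chosen only to mirror the production names.

```rust
/// `non_null`: any single Unicode character other than U+0000.
fn is_non_null(c: char) -> bool {
    c != '\u{0000}'
}

/// `non_eol`: `non_null` further excluding U+000A (line feed).
fn is_non_eol(c: char) -> bool {
    is_non_null(c) && c != '\u{000A}'
}

/// `non_single_quote`: `non_null` further excluding U+0027.
fn is_non_single_quote(c: char) -> bool {
    is_non_null(c) && c != '\u{0027}'
}

/// `non_double_quote`: `non_null` further excluding U+0022.
fn is_non_double_quote(c: char) -> bool {
    is_non_null(c) && c != '\u{0022}'
}
```

Note that a carriage return (`U+000D`) still satisfies `non_eol` under this definition, since only `U+000A` is excluded.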
comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
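
Because `block_comment_body` refers back to `block_comment`, block comments nest. One way to recognize them is to track nesting depth, as in the sketch below. The function name `block_comment_len` is hypothetical, and this is an illustration of the production rather than the compiler's actual lexer.

```rust
/// Returns the byte length of the block comment at the start of `src`,
/// or `None` if `src` does not begin with a properly closed block comment.
fn block_comment_len(src: &str) -> Option<usize> {
    if !src.starts_with("/*") {
        return None;
    }
    let bytes = src.as_bytes();
    let mut depth = 0usize; // number of currently open "/*" delimiters
    let mut i = 0usize;
    while i < bytes.len() {
        if bytes[i..].starts_with(b"/*") {
            depth += 1;
            i += 2;
        } else if bytes[i..].starts_with(b"*/") {
            depth -= 1;
            i += 2;
            if depth == 0 {
                // The outermost comment has just been closed.
                return Some(i);
            }
        } else {
            i += 1;
        }
    }
    None // input ended while a comment was still open
}
```

Scanning bytes is safe here because `/` and `*` are ASCII and never occur inside a multi-byte UTF-8 sequence.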
**FIXME:** add doc grammar?

whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;
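
As a small illustration, the `whitespace_char` production corresponds to the following predicate. The function name is hypothetical; the four escapes are copied from the rule above.

```rust
/// `whitespace_char`: space, horizontal tab, line feed, or carriage return.
fn is_whitespace_char(c: char) -> bool {
    matches!(c, '\x20' | '\x09' | '\x0a' | '\x0d')
}
```

This set is narrower than `char::is_whitespace` in the standard library, which also accepts other Unicode whitespace.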
simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;
<p id="keyword-table-marker"></p>

|          |          |          |          |         |
|----------|----------|----------|----------|---------|
| abstract | alignof  | as       | become   | box     |
| break    | const    | continue | crate    | do      |
| else     | enum     | extern   | false    | final   |
| fn       | for      | if       | impl     | in      |
| let      | loop     | macro    | match    | mod     |
| move     | mut      | offsetof | override | priv    |
| proc     | pub      | pure     | ref      | return  |
| Self     | self     | sizeof   | static   | struct  |
| super    | trait    | true     | type     | typeof  |
| unsafe   | unsized  | use      | virtual  | where   |
| while    | yield    |          |          |         |

Each of these keywords has special meaning in its grammar, and all of them are
excluded from the `ident` rule.
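
The interaction between the keyword table and the `ident` rule can be sketched as follows. This is an ASCII-only approximation, on the assumption that within ASCII `XID_start` covers the letters and `XID_continue` covers letters, digits, and the underscore. The names `KEYWORDS` and `is_ascii_ident` are hypothetical.

```rust
/// The keyword table above, flattened into a list.
const KEYWORDS: &[&str] = &[
    "abstract", "alignof", "as", "become", "box", "break", "const",
    "continue", "crate", "do", "else", "enum", "extern", "false", "final",
    "fn", "for", "if", "impl", "in", "let", "loop", "macro", "match", "mod",
    "move", "mut", "offsetof", "override", "priv", "proc", "pub", "pure",
    "ref", "return", "Self", "self", "sizeof", "static", "struct", "super",
    "trait", "true", "type", "typeof", "unsafe", "unsized", "use", "virtual",
    "where", "while", "yield",
];

/// ASCII-only approximation of the `ident` production:
/// first character in XID_start, the rest in XID_continue, and not a keyword.
fn is_ascii_ident(s: &str) -> bool {
    let mut chars = s.chars();
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_alphabetic(),
        None => false, // the production requires a nonempty string
    };
    first_ok
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        && !KEYWORDS.contains(&s)
}
```

Under this reading a leading underscore is rejected, because `_` is in `XID_continue` but not `XID_start`. The compiler does accept identifiers such as `_x`, so treat this as a sketch of the rule as written here, not of the compiler's behavior.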
Not all of these keywords are used by the language. Some of them were used
before Rust 1.0, and were left reserved once their implementations were
removed. Some of them were reserved before 1.0 to make space for possible

literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;

The optional `lit_suffix` production is only used for certain numeric literals,
but is reserved for future extension. That is, the above gives the lexical
grammar, but a Rust parser will reject everything but the 12 special cases
mentioned in [Number literals](reference/tokens.html#number-literals) in the
reference.
#### Character and string literals

char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
| '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
| '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
| 'n' | 'r' | 't' | '0'

unicode_escape : 'u' '{' hex_digit+ 6 '}';

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
| 'A' | 'B' | 'C' | 'D' | 'E' | 'F'

oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
| '5' | '6' | '7' | '8' | '9' ;
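
In `unicode_escape`, the `hex_digit+ 6` form means one or more hexadecimal digits with a maximum of six. The sketch below decodes the part of the escape that follows the backslash. The function name is hypothetical, and the final `char::from_u32` step rejects values that are not Unicode scalar values, which is a property of Rust's `char` type rather than something the grammar expresses.

```rust
/// Decodes the text after the backslash of a `unicode_escape`, e.g. `u{41}`.
/// Returns `None` if the shape is wrong or the value is not a valid `char`.
fn parse_unicode_escape(s: &str) -> Option<char> {
    let digits = s.strip_prefix("u{")?.strip_suffix('}')?;
    // `hex_digit+ 6`: at least one digit, at most six.
    if digits.is_empty() || digits.len() > 6 {
        return None;
    }
    if !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let value = u32::from_str_radix(digits, 16).ok()?;
    char::from_u32(value)
}
```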
#### Byte and byte string literals

byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
| '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
| '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
| '0' [ [ dec_digit | '_' ] * float_suffix ?
| 'b' [ '1' | '0' | '_' ] +
| 'o' [ oct_digit | '_' ] +
| 'x' [ hex_digit | '_' ] + ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
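
A few literals that exercise the branches of `num_lit` may help when reading the rule: a decimal literal with `_` separators, the `0b`, `0o`, and `0x` forms, and the two `float_suffix` shapes. The function below is only a hypothetical container for the examples.

```rust
/// Returns one value per `num_lit` branch illustrated here.
fn number_literal_examples() -> (u32, u32, u32, u32, f64, f64) {
    let decimal = 1_000_000; // nonzero_dec followed by digits and '_'
    let binary = 0b1010_1010; // '0' 'b' ...
    let octal = 0o777; // '0' 'o' ...
    let hex = 0xDEAD_beef; // '0' 'x' ..., hex digits in either case
    let fractional = 12.5; // '.' dec_lit
    let with_exponent = 1.5e3; // '.' dec_lit exponent
    (decimal, binary, octal, hex, fractional, with_exponent)
}
```

The underscores are purely visual separators and do not change the value.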
#### Boolean literals

bool_lit : [ "true" | "false" ] ;

The two values of the boolean type are written `true` and `false`.

| '#' | '[' | ']' | '(' | ')' | '{' | '}'

Symbols are a general class of printable [tokens](#tokens) that play structural
roles in a variety of grammar productions. They are cataloged here for
completeness as the set of remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'

expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
| "macro_rules" '!' ident '{' macro_rule * '}' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
| '{' matcher * '}' | '$' ident ':' ident
| '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
| non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
| '{' transcriber * '}' | '$' ident
| '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
| non_special_token ;
# Crates and source files

**FIXME:** grammar? What production covers `#![crate_id = "foo"]`?

# Items and attributes

item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
| const_item | static_item | trait_item | impl_item | extern_block_item ;

mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;

view_item : extern_crate_decl | use_decl ';' ;

##### Extern crate declarations

extern_crate_decl : "extern" "crate" crate_name ;
crate_name : ident | [ ident "as" ident ] ;

##### Use declarations

use_decl : vis ? "use" [ path "as" ident
path_glob : ident [ "::" [ path_glob
| '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "self" ;
#### Generic functions

##### Unsafe functions

#### Diverging functions

const_item : "const" ident ':' type '=' expr ';' ;

static_item : "static" ident ':' type '=' expr ';' ;

extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;

## Visibility and Privacy

### Re-exporting and Visibility

See [Use declarations](#use-declarations).

attribute : '#' '!' ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
| '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
# Statements and expressions

stmt : decl_stmt | expr_stmt | ';' ;

### Declaration statements

decl_stmt : item | let_decl ;

#### Item declarations

#### Variable declarations

let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;

### Expression statements

expr_stmt : expr ';' ;

expr : literal | path | tuple_expr | unit_expr | struct_expr
| block_expr | method_call_expr | field_expr | array_expr
| idx_expr | range_expr | unop_expr | binop_expr
| paren_expr | call_expr | lambda_expr | while_expr
| loop_expr | break_expr | continue_expr | for_expr
| if_expr | match_expr | if_let_expr | while_let_expr

#### Lvalues, rvalues and temporaries

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

See [Literals](#literals).

### Tuple expressions

tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
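
The alternatives in `tuple_expr` correspond to the empty tuple, a one-element tuple written with a trailing comma, and a comma-separated list. A short sketch follows, with a hypothetical function used only to hold the examples.

```rust
/// Returns an empty tuple, a one-element tuple, and a pair.
fn tuple_examples() -> ((), (i32,), (i32, &'static str)) {
    let unit = (); // nothing between the parentheses
    let single = (1,); // `expr ','` marks a one-element tuple
    let pair = (2, "two"); // `expr [ ',' expr ] *`
    (unit, single, pair)
}
```

In the language itself, `(1)` without the comma is a grouped expression of type `i32` rather than a tuple, which is why the trailing-comma form exists.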
### Structure expressions

struct_expr_field_init : ident | ident ':' expr ;
struct_expr : expr_path '{' struct_expr_field_init
[ ',' struct_expr_field_init ] *

### Block expressions

block_expr : '{' [ stmt | item ] *

### Method-call expressions

method_call_expr : expr '.' ident paren_expr_list ;

### Field expressions

field_expr : expr '.' ident ;

### Array expressions

array_expr : '[' "mut" ? array_elems? ']' ;

array_elems : [expr [',' expr]*] | [expr ';' expr] ;
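
The two branches of `array_elems` are the element list and the repeat form `expr ';' expr`. A minimal sketch, with a hypothetical function name:

```rust
/// Returns one array built from a list and one built with the repeat form.
fn array_examples() -> ([i32; 3], [u8; 4]) {
    let listed = [1, 2, 3]; // expr [ ',' expr ] *
    let repeated = [0; 4]; // expr ';' expr: the value 0, four times
    (listed, repeated)
}
```

The example leaves out the optional `"mut"` that the `array_expr` rule allows.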
### Index expressions

idx_expr : expr '[' expr ']' ;

### Range expressions

range_expr : expr ".." expr |

### Unary operator expressions

unop_expr : unop expr ;
unop : '-' | '*' | '!' ;

### Binary operator expressions

binop_expr : expr binop expr | type_cast_expr
| assignment_expr | compound_assignment_expr ;
binop : arith_op | bitwise_op | lazy_bool_op | comp_op

#### Arithmetic operators

arith_op : '+' | '-' | '*' | '/' | '%' ;

#### Bitwise operators

bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;

#### Lazy boolean operators

lazy_bool_op : "&&" | "||" ;

#### Comparison operators

comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;

#### Type cast expressions

type_cast_expr : value "as" type ;

#### Assignment expressions

assignment_expr : expr '=' expr ;

#### Compound assignment expressions

compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;

### Grouped expressions

paren_expr : '(' expr ')' ;

expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;

### Lambda expressions

ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
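
As an illustration of `lambda_expr`, the sketch below shows one closure with a two-element `ident_list` and one with an empty list. The function name is hypothetical, and the parameter types are left to inference to match the shape of the rule as written.

```rust
/// Builds and calls two closures: one with parameters, one without.
fn lambda_examples() -> (i32, i32) {
    let add = |x, y| x + y; // '|' ident_list '|' expr
    let forty_two = || 42; // empty ident_list
    (add(1, 2), forty_two())
}
```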
while_expr : [ lifetime ':' ] ? "while" no_struct_literal_expr '{' block '}' ;

loop_expr : [ lifetime ':' ] ? "loop" '{' block '}';

### Break expressions

break_expr : "break" [ lifetime ] ?;

### Continue expressions

continue_expr : "continue" [ lifetime ] ?;

for_expr : [ lifetime ':' ] ? "for" pat "in" no_struct_literal_expr '{' block '}' ;
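
The optional `lifetime ':'` prefix on the loop productions is a loop label, and the optional `lifetime` after `break` or `continue` names the loop to act on. A sketch with a hypothetical function and label name:

```rust
/// Finds the first pair (a, b) with a <= b < limit and a + b == target.
fn first_pair_with_sum(limit: u32, target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for a in 0..limit {
        for b in 0..limit {
            if b < a {
                continue; // no label: applies to the innermost loop
            }
            if a + b == target {
                found = Some((a, b));
                break 'outer; // label: leaves both loops at once
            }
        }
    }
    found
}
```

Without the label, the `break` would leave only the inner loop and the search would continue with the next value of `a`.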
if_expr : "if" no_struct_literal_expr '{' block '}'

else_tail : "else" [ if_expr | if_let_expr

### Match expressions

match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
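
The pieces of `match_pat` and `match_arm` can be seen together in a small example: alternatives joined by `|`, an `if` guard, and both arm body forms. The function name `classify` is hypothetical.

```rust
/// Classifies an integer using each `match_pat` feature once.
fn classify(n: i32) -> &'static str {
    match n {
        0 => "zero",                // single pattern, `expr ","` body
        1 | 2 | 3 => "small",       // pat [ '|' pat ] *
        x if x < 0 => "negative",   // pattern followed by an "if" guard
        _ => {
            "large"                 // '{' block '}' body, no trailing comma
        }
    }
}
```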
### If let expressions

if_let_expr : "if" "let" pat '=' expr '{' block '}'

while_let_expr : [ lifetime ':' ] ? "while" "let" pat '=' expr '{' block '}' ;
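
Both productions bind a pattern against an expression and run the block only when the pattern matches. A brief sketch with hypothetical function names:

```rust
/// `if let`: runs the block once if the pattern matches.
fn first_or_zero(values: &[i32]) -> i32 {
    if let Some(first) = values.first() {
        *first
    } else {
        0
    }
}

/// `while let`: repeats the block for as long as the pattern matches.
fn sum_stack(mut stack: Vec<i32>) -> i32 {
    let mut total = 0;
    while let Some(top) = stack.pop() {
        total += top;
    }
    total
}
```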
### Return expressions

return_expr : "return" expr ? ;

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

#### Machine-dependent integer types

### Array and Slice types

closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
[ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list

### Type parameter bounds

bound-list := bound | bound '+' bound-list '+' ?
bound := ty_bound | lt_bound

ty_bound := ty_bound_noparen | (ty_bound_noparen)
ty_bound_noparen := [?] [ for<lt_param_defs> ] simple_path

**FIXME:** this is probably not relevant to the grammar...

# Memory and concurrency models

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

### Memory allocation and lifetime

### Communication between threads