1 # **This is a work in progress**
7 This document is the primary reference for the Rust programming language grammar. It
8 provides only one kind of material:
10 - Chapters that formally define the language grammar and, for each
13 This document does not serve as an introduction to the language. Background
14 familiarity with the language is assumed. A separate [guide] is available to
15 help acquire such background familiarity.
17 This document also does not serve as a reference to the [standard] library
18 included in the language distribution. Those libraries are documented
19 separately by extracting documentation attributes from their source code. Many
20 of the features that one might expect to be language features are library
21 features in Rust, so what you're looking for may be there, not here.
24 [standard]: std/index.html
28 Rust's grammar is defined over Unicode codepoints, each conventionally denoted
29 `U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
30 confined to the ASCII range of Unicode, and is described in this document by a
31 dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
32 supported by common automated LL(k) parsing tools such as `llgen`, rather than
33 the dialect given in ISO 14977. The dialect can be defined self-referentially
38 rule : nonterminal ':' productionrule ';' ;
39 productionrule : production [ '|' production ] * ;
41 term : element repeats ;
42 element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
43 repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
48 - Whitespace in the grammar is ignored.
49 - Square brackets are used to group rules.
50 - `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
51 ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
52 Unicode codepoint `U+00QQ`.
53 - `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
54 - The `repeat` forms apply to the adjacent `element`, and are as follows:
55 - `?` means zero or one repetition
56 - `*` means zero or more repetitions
57 - `+` means one or more repetitions
58 - NUMBER trailing a repeat symbol gives a maximum repetition count
59 - NUMBER on its own gives an exact repetition count
61 This EBNF dialect should hopefully be familiar to many readers.
63 ## Unicode productions
65 A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
66 range. We define these productions in terms of character properties specified
67 in the Unicode standard, rather than in terms of ASCII-range codepoints. The
68 section [Special Unicode Productions](#special-unicode-productions) lists these
71 ## String table productions
73 Some rules in the grammar — notably [unary
74 operators](#unary-operator-expressions), [binary
75 operators](#binary-operator-expressions), and [keywords](#keywords) — are
76 given in a simplified form: as a listing of a table of unquoted, printable
77 whitespace-separated strings. These cases form a subset of the rules regarding
78 the [token](#tokens) rule, and are assumed to be the result of a
79 lexical-analysis phase feeding the parser, driven by a DFA, operating over the
80 disjunction of all such string table entries.
82 When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
83 it is an implicit reference to a single member of such a string table
84 production. See [tokens](#tokens) for more information.
90 Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
91 Most Rust grammar rules are defined in terms of printable ASCII-range
92 codepoints, but a small number are defined in terms of Unicode properties or
93 explicit codepoint lists. [^inputformat]
95 [^inputformat]: Substitute definitions for the special Unicode productions are
96 provided to the grammar verifier, restricted to ASCII range, when verifying the
97 grammar in this document.
99 ## Special Unicode Productions
101 The following productions in the Rust grammar are defined in terms of Unicode
102 properties: `ident`, `non_null`, `non_star`, `non_eol`, `non_slash_or_star`,
103 `non_single_quote` and `non_double_quote`.
107 The `ident` production is any nonempty Unicode string of the following form:
109 - The first character has property `XID_start`
110 - The remaining characters have property `XID_continue`
112 that does _not_ occur in the set of [keywords](#keywords).
114 > **Note**: `XID_start` and `XID_continue` as character properties cover the
115 > character ranges used to form the more familiar C and Java language-family
118 ### Delimiter-restricted productions
120 Some productions are defined by exclusion of particular Unicode characters:
122 - `non_null` is any single Unicode character aside from `U+0000` (null)
123 - `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
124 - `non_star` is `non_null` restricted to exclude `U+002A` (`*`)
125 - `non_slash_or_star` is `non_null` restricted to exclude `U+002F` (`/`) and `U+002A` (`*`)
126 - `non_single_quote` is `non_null` restricted to exclude `U+0027` (`'`)
127 - `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)
132 comment : block_comment | line_comment ;
133 block_comment : "/*" block_comment_body * "*/" ;
134 block_comment_body : [block_comment | character] * ;
135 line_comment : "//" non_eol * ;
138 **FIXME:** add doc grammar?
143 whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
144 whitespace : [ whitespace_char | comment ] + ;
150 simple_token : keyword | unop | binop ;
151 token : simple_token | ident | literal | symbol | whitespace token ;
156 <p id="keyword-table-marker"></p>
159 |----------|----------|----------|----------|--------|
160 | abstract | alignof | as | be | box |
161 | break | const | continue | crate | do |
162 | else | enum | extern | false | final |
163 | fn | for | if | impl | in |
164 | let | loop | match | mod | move |
165 | mut | offsetof | once | override | priv |
166 | proc | pub | pure | ref | return |
167 | sizeof | static | self | struct | super |
168 | true | trait | type | typeof | unsafe |
169 | unsized | use | virtual | where | while |
173 Each of these keywords has special meaning in its grammar, and all of them are
174 excluded from the `ident` rule.
180 literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit ] lit_suffix ?;
183 #### Character and string literals
186 char_lit : '\x27' char_body '\x27' ;
187 string_lit : '"' string_body * '"' | 'r' raw_string ;
189 char_body : non_single_quote
190 | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;
192 string_body : non_double_quote
193 | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
194 raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;
196 common_escape : '\x5c'
197 | 'n' | 'r' | 't' | '0'
199 unicode_escape : 'u' hex_digit 4
202 hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
203 | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
205 oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
206 dec_digit : '0' | nonzero_dec ;
207 nonzero_dec: '1' | '2' | '3' | '4'
208 | '5' | '6' | '7' | '8' | '9' ;
211 #### Byte and byte string literals
214 byte_lit : "b\x27" byte_body '\x27' ;
215 byte_string_lit : "b\x22" string_body * '\x22' | "br" raw_byte_string ;
217 byte_body : ascii_non_single_quote
218 | '\x5c' [ '\x27' | common_escape ] ;
220 byte_string_body : ascii_non_double_quote
221 | '\x5c' [ '\x22' | common_escape ] ;
222 raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
229 num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
230 | '0' [ [ dec_digit | '_' ] * float_suffix ?
231 | 'b' [ '1' | '0' | '_' ] +
232 | 'o' [ oct_digit | '_' ] +
233 | 'x' [ hex_digit | '_' ] + ] ;
235 float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;
237 exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
238 dec_lit : [ dec_digit | '_' ] + ;
241 #### Boolean literals
243 **FIXME:** write grammar
245 The two values of the boolean type are written `true` and `false`.
251 | '#' | '[' | ']' | '(' | ')' | '{' | '}'
255 Symbols are a general class of printable [token](#tokens) that play structural
256 roles in a variety of grammar productions. They are catalogued here for
257 completeness as the set of remaining miscellaneous printable tokens that do not
258 otherwise appear as [unary operators](#unary-operator-expressions), [binary
259 operators](#binary-operator-expressions), or [keywords](#keywords).
264 expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
265 expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
268 type_path : ident [ type_path_tail ] + ;
269 type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
278 expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ;
279 macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
280 matcher : '(' matcher * ')' | '[' matcher * ']'
281 | '{' matcher * '}' | '$' ident ':' ident
282 | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
283 | non_special_token ;
284 transcriber : '(' transcriber * ')' | '[' transcriber * ']'
285 | '{' transcriber * '}' | '$' ident
286 | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
287 | non_special_token ;
290 # Crates and source files
292 **FIXME:** grammar? What production covers #![crate_id = "foo"] ?
294 # Items and attributes
301 item : mod_item | fn_item | type_item | struct_item | enum_item
302 | static_item | trait_item | impl_item | extern_block ;
312 mod_item : "mod" ident ( ';' | '{' mod '}' );
313 mod : [ view_item | item ] * ;
319 view_item : extern_crate_decl | use_decl ;
322 ##### Extern crate declarations
325 extern_crate_decl : "extern" "crate" crate_name
326 crate_name: ident | ( string_lit as ident )
329 ##### Use declarations
332 use_decl : "pub" ? "use" [ path "as" ident
335 path_glob : ident [ "::" [ path_glob
337 | '{' path_item [ ',' path_item ] * '}' ;
339 path_item : ident | "mod" ;
346 #### Generic functions
354 ##### Unsafe functions
362 #### Diverging functions
377 const_item : "const" ident ':' type '=' expr ';' ;
383 static_item : "static" ident ':' type '=' expr ';' ;
401 extern_block_item : "extern" '{' extern_block '}' ;
402 extern_block : [ foreign_fn ] * ;
405 ## Visibility and Privacy
409 ### Re-exporting and Visibility
416 attribute : "#!" ? '[' meta_item ']' ;
417 meta_item : ident [ '=' literal
418 | '(' meta_seq ')' ] ? ;
419 meta_seq : meta_item [ ',' meta_seq ] ? ;
422 # Statements and expressions
428 ### Declaration statements
432 A _declaration statement_ is one that introduces one or more *names* into the
433 enclosing statement block. The declared names may denote new slots or new
436 #### Item declarations
440 An _item declaration statement_ has a syntactic form identical to an
441 [item](#items) declaration within a module. Declaring an item — a
442 function, enumeration, structure, type, static, trait, implementation or module
443 — locally within a statement block is simply a way of restricting its
444 scope to a narrow region containing all of its uses; it is otherwise identical
445 in meaning to declaring the item outside the statement block.
447 #### Slot declarations
450 let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
451 init : [ '=' ] expr ;
454 ### Expression statements
462 #### Lvalues, rvalues and temporaries
466 #### Moved and copied types
468 **FIXME:** Do we want to capture this in the grammar as different productions?
470 ### Literal expressions
478 ### Tuple expressions
486 ### Structure expressions
489 struct_expr : expr_path '{' ident ':' expr
490 [ ',' ident ':' expr ] *
497 ### Block expressions
500 block_expr : '{' [ view_item ] *
501 [ stmt ';' | item ] *
505 ### Method-call expressions
508 method_call_expr : expr '.' ident paren_expr_list ;
511 ### Field expressions
514 field_expr : expr '.' ident ;
517 ### Array expressions
520 array_expr : '[' "mut" ? vec_elems? ']' ;
522 array_elems : [expr [',' expr]*] | [expr ',' ".." expr] ;
525 ### Index expressions
528 idx_expr : expr '[' expr ']' ;
531 ### Unary operator expressions
535 ### Binary operator expressions
538 binop_expr : expr binop expr ;
541 #### Arithmetic operators
545 #### Bitwise operators
549 #### Lazy boolean operators
553 #### Comparison operators
557 #### Type cast expressions
561 #### Assignment expressions
565 #### Compound assignment expressions
569 #### Operator precedence
571 The precedence of Rust binary operators is ordered as follows, going from
589 Operators at the same precedence level are evaluated left-to-right. [Unary
590 operators](#unary-operator-expressions) have the same precedence level and it
591 is stronger than any of the binary operators'.
593 ### Grouped expressions
596 paren_expr : '(' expr ')' ;
602 expr_list : [ expr [ ',' expr ]* ] ? ;
603 paren_expr_list : '(' expr_list ')' ;
604 call_expr : expr paren_expr_list ;
607 ### Lambda expressions
610 ident_list : [ ident [ ',' ident ]* ] ? ;
611 lambda_expr : '|' ident_list '|' expr ;
617 while_expr : "while" no_struct_literal_expr '{' block '}' ;
623 loop_expr : [ lifetime ':' ] "loop" '{' block '}';
626 ### Break expressions
629 break_expr : "break" [ lifetime ];
632 ### Continue expressions
635 continue_expr : "continue" [ lifetime ];
641 for_expr : "for" pat "in" no_struct_literal_expr '{' block '}' ;
647 if_expr : "if" no_struct_literal_expr '{' block '}'
650 else_tail : "else" [ if_expr | if_let_expr
654 ### Match expressions
657 match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;
659 match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;
661 match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
664 ### If let expressions
667 if_let_expr : "if" "let" pat '=' expr '{' block '}'
669 else_tail : "else" [ if_expr | if_let_expr | '{' block '}' ] ;
675 while_let_expr : "while" "let" pat '=' expr '{' block '}' ;
678 ### Return expressions
681 return_expr : "return" expr ? ;
688 Every slot, item and value in a Rust program has a type. The _type_ of a
689 *value* defines the interpretation of the memory holding it.
691 Built-in types and type-constructors are tightly integrated into the language,
692 in nontrivial ways that are not possible to emulate in user-defined types.
693 User-defined types have limited capabilities.
697 The primitive types are the following:
699 * The "unit" type `()`, having the single "unit" value `()` (occasionally called
701 * The boolean type `bool` with values `true` and `false`.
703 * The machine-dependent integer and floating-point types.
705 [^unittype]: The "unit" value `()` is *not* a sentinel "null pointer" value for
706 reference slots; the "unit" type is the implicit return type from functions
707 otherwise lacking a return type, and can be used in other contexts (such as
708 message-sending or type-parametric code) as a zero-size type.]
712 The machine types are the following:
714 * The unsigned word types `u8`, `u16`, `u32` and `u64`, with values drawn from
715 the integer intervals [0, 2^8 - 1], [0, 2^16 - 1], [0, 2^32 - 1] and
716 [0, 2^64 - 1] respectively.
718 * The signed two's complement word types `i8`, `i16`, `i32` and `i64`, with
719 values drawn from the integer intervals [-(2^(7)), 2^7 - 1],
720 [-(2^(15)), 2^15 - 1], [-(2^(31)), 2^31 - 1], [-(2^(63)), 2^63 - 1]
723 * The IEEE 754-2008 `binary32` and `binary64` floating-point types: `f32` and
726 #### Machine-dependent integer types
728 The `uint` type is an unsigned integer type with the same number of bits as the
729 platform's pointer type. It can represent every memory address in the process.
731 The `int` type is a signed integer type with the same number of bits as the
732 platform's pointer type. The theoretical upper bound on object and array size
733 is the maximum `int` value. This ensures that `int` can be used to calculate
734 differences between pointers into an object or array and can address every byte
735 within an object along with one byte past the end.
739 The types `char` and `str` hold textual data.
741 A value of type `char` is a [Unicode scalar value](
742 http://www.unicode.org/glossary/#unicode_scalar_value) (ie. a code point that
743 is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to
744 0xD7FF or 0xE000 to 0x10FFFF range. A `[char]` array is effectively an UCS-4 /
747 A value of type `str` is a Unicode string, represented as an array of 8-bit
748 unsigned bytes holding a sequence of UTF-8 codepoints. Since `str` is of
749 unknown size, it is not a _first class_ type, but can only be instantiated
750 through a pointer type, such as `&str` or `String`.
754 A tuple *type* is a heterogeneous product of other types, called the *elements*
755 of the tuple. It has no nominal name and is instead structurally typed.
757 Tuple types and values are denoted by listing the types or values of their
758 elements, respectively, in a parenthesized, comma-separated list.
760 Because tuple elements don't have a name, they can only be accessed by
763 The members of a tuple are laid out in memory contiguously, in order specified
766 An example of a tuple type and its use:
769 type Pair<'a> = (int, &'a str);
770 let p: Pair<'static> = (10, "hello");
772 assert!(b != "world");
775 ### Array, and Slice types
777 Rust has two different types for a list of items:
779 * `[T ..N]`, an 'array'
782 An array has a fixed size, and can be allocated on either the stack or the
785 A slice is a 'view' into an array. It doesn't own the data it points
788 An example of each kind:
791 let vec: Vec<int> = vec![1, 2, 3];
792 let arr: [int, ..3] = [1, 2, 3];
793 let s: &[int] = vec.as_slice();
796 As you can see, the `vec!` macro allows you to create a `Vec<T>` easily. The
797 `vec!` macro is also part of the standard library, rather than the language.
799 All in-bounds elements of arrays, and slices are always initialized, and access
800 to an array or slice is always bounds-checked.
804 A `struct` *type* is a heterogeneous product of other types, called the
805 *fields* of the type.[^structtype]
807 [^structtype]: `struct` types are analogous `struct` types in C,
808 the *record* types of the ML family,
809 or the *structure* types of the Lisp family.
811 New instances of a `struct` can be constructed with a [struct
812 expression](#structure-expressions).
814 The memory layout of a `struct` is undefined by default to allow for compiler
815 optimizations like field reordering, but it can be fixed with the
816 `#[repr(...)]` attribute. In either case, fields may be given in any order in
817 a corresponding struct *expression*; the resulting `struct` value will always
818 have the same memory layout.
820 The fields of a `struct` may be qualified by [visibility
821 modifiers](#re-exporting-and-visibility), to allow access to data in a
822 structure outside a module.
824 A _tuple struct_ type is just like a structure type, except that the fields are
827 A _unit-like struct_ type is like a structure type, except that it has no
828 fields. The one value constructed by the associated [structure
829 expression](#structure-expressions) is the only value that inhabits such a
834 An *enumerated type* is a nominal, heterogeneous disjoint union type, denoted
835 by the name of an [`enum` item](#enumerations). [^enumtype]
837 [^enumtype]: The `enum` type is analogous to a `data` constructor declaration in
838 ML, or a *pick ADT* in Limbo.
840 An [`enum` item](#enumerations) declares both the type and a number of *variant
841 constructors*, each of which is independently named and takes an optional tuple
844 New instances of an `enum` can be constructed by calling one of the variant
845 constructors, in a [call expression](#call-expressions).
847 Any `enum` value consumes as much memory as the largest variant constructor for
848 its corresponding `enum` type.
850 Enum types cannot be denoted *structurally* as types, but must be denoted by
851 named reference to an [`enum` item](#enumerations).
855 Nominal types — [enumerations](#enumerated-types) and
856 [structures](#structure-types) — may be recursive. That is, each `enum`
857 constructor or `struct` field may refer, directly or indirectly, to the
858 enclosing `enum` or `struct` type itself. Such recursion has restrictions:
860 * Recursive types must include a nominal type in the recursion
861 (not mere [type definitions](#type-definitions),
862 or other structural types such as [arrays](#array,-and-slice-types) or [tuples](#tuple-types)).
863 * A recursive `enum` item must have at least one non-recursive constructor
864 (in order to give the recursion a basis case).
865 * The size of a recursive type must be finite;
866 in other words the recursive fields of the type must be [pointer types](#pointer-types).
867 * Recursive type definitions can cross module boundaries, but not module *visibility* boundaries,
868 or crate boundaries (in order to simplify the module system and type checker).
870 An example of a *recursive* type and its use:
875 Cons(T, Box<List<T>>)
878 let a: List<int> = List::Cons(7, box List::Cons(13, box List::Nil));
883 All pointers in Rust are explicit first-class values. They can be copied,
884 stored into data structures, and returned from functions. There are two
885 varieties of pointer in Rust:
888 : These point to memory _owned by some other value_.
889 A reference type is written `&type` for some lifetime-variable `f`,
890 or just `&'a type` when you need an explicit lifetime.
891 Copying a reference is a "shallow" operation:
892 it involves only copying the pointer itself.
893 Releasing a reference typically has no effect on the value it points to,
894 with the exception of temporary values, which are released when the last
895 reference to them is released.
898 : Raw pointers are pointers without safety or liveness guarantees.
899 Raw pointers are written as `*const T` or `*mut T`,
900 for example `*const int` means a raw pointer to an integer.
901 Copying or dropping a raw pointer has no effect on the lifecycle of any
902 other value. Dereferencing a raw pointer or converting it to any other
903 pointer type is an [`unsafe` operation](#unsafe-functions).
904 Raw pointers are generally discouraged in Rust code;
905 they exist to support interoperability with foreign code,
906 and writing performance-critical or low-level functions.
908 The standard library contains additional 'smart pointer' types beyond references
913 The function type constructor `fn` forms new function types. A function type
914 consists of a possibly-empty set of function-type modifiers (such as `unsafe`
915 or `extern`), a sequence of input types and an output type.
917 An example of a `fn` type:
920 fn add(x: int, y: int) -> int {
924 let mut x = add(5,7);
926 type Binop<'a> = |int,int|: 'a -> int;
934 closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
935 [ ':' bound-list ] [ '->' type ]
936 procedure_type := 'proc' [ '<' lifetime-list '>' ] '(' arg-list ')'
937 [ ':' bound-list ] [ '->' type ]
938 lifetime-list := lifetime | lifetime ',' lifetime-list
939 arg-list := ident ':' type | ident ':' type ',' arg-list
940 bound-list := bound | bound '+' bound-list
941 bound := path | lifetime
944 The type of a closure mapping an input of type `A` to an output of type `B` is
945 `|A| -> B`. A closure with no arguments or return values has type `||`.
946 Similarly, a procedure mapping `A` to `B` is `proc(A) -> B` and a no-argument
947 and no-return value closure has type `proc()`.
949 An example of creating and calling a closure:
952 let captured_var = 10i;
954 let closure_no_args = || println!("captured_var={}", captured_var);
956 let closure_args = |arg: int| -> int {
957 println!("captured_var={}, arg={}", captured_var, arg);
958 arg // Note lack of semicolon after 'arg'
961 fn call_closure(c1: ||, c2: |int| -> int) {
966 call_closure(closure_no_args, closure_args);
970 Unlike closures, procedures may only be invoked once, but own their
971 environment, and are allowed to move out of their environment. Procedures are
972 allocated on the heap (unlike closures). An example of creating and calling a
976 let string = "Hello".to_string();
978 // Creates a new procedure, passing it to the `spawn` function.
980 println!("{} world!", string);
983 // the variable `string` has been moved into the previous procedure, so it is
987 // Create an invoke a procedure. Note that the procedure is *moved* when
988 // invoked, so it cannot be invoked again.
989 let f = proc(n: int) { n + 22 };
990 println!("answer: {}", f(20));
996 Every trait item (see [traits](#traits)) defines a type with the same name as
997 the trait. This type is called the _object type_ of the trait. Object types
998 permit "late binding" of methods, dispatched using _virtual method tables_
999 ("vtables"). Whereas most calls to trait methods are "early bound" (statically
1000 resolved) to specific implementations at compile time, a call to a method on an
1001 object type is only resolved to a vtable entry at compile time. The actual
1002 implementation for each vtable entry can vary on an object-by-object basis.
1004 Given a pointer-typed expression `E` of type `&T` or `Box<T>`, where `T`
1005 implements trait `R`, casting `E` to the corresponding pointer type `&R` or
1006 `Box<R>` results in a value of the _object type_ `R`. This result is
1007 represented as a pair of pointers: the vtable pointer for the `T`
1008 implementation of `R`, and the pointer value of `E`.
1010 An example of an object type:
1014 fn stringify(&self) -> String;
1017 impl Printable for int {
1018 fn stringify(&self) -> String { self.to_string() }
1021 fn print(a: Box<Printable>) {
1022 println!("{}", a.stringify());
1026 print(box 10i as Box<Printable>);
1030 In this example, the trait `Printable` occurs as an object type in both the
1031 type signature of `print`, and the cast expression in `main`.
1035 Within the body of an item that has type parameter declarations, the names of
1036 its type parameters are types:
1039 fn map<A: Clone, B: Clone>(f: |A| -> B, xs: &[A]) -> Vec<B> {
1043 let first: B = f(xs[0].clone());
1044 let mut rest: Vec<B> = map(f, xs.slice(1, xs.len()));
1045 rest.insert(0, first);
1050 Here, `first` has type `B`, referring to `map`'s `B` type parameter; and `rest`
1051 has type `Vec<B>`, a vector type with element type `B`.
1055 The special type `self` has a meaning within methods inside an impl item. It
1056 refers to the type of the implicit `self` argument. For example, in:
1060 fn make_string(&self) -> String;
1063 impl Printable for String {
1064 fn make_string(&self) -> String {
1070 `self` refers to the value of type `String` that is the receiver for a call to
1071 the method `make_string`.
1075 Types in Rust are categorized into kinds, based on various properties of the
1076 components of the type. The kinds are:
1079 : Types of this kind can be safely sent between tasks.
1080 This kind includes scalars, boxes, procs, and
1081 structural types containing only other owned types.
1082 All `Send` types are `'static`.
1084 : Types of this kind consist of "Plain Old Data"
1085 which can be copied by simply moving bits.
1086 All values of this kind can be implicitly copied.
1087 This kind includes scalars and immutable references,
1088 as well as structural types containing other `Copy` types.
1090 : Types of this kind do not contain any references (except for
1091 references with the `static` lifetime, which are allowed).
1092 This can be a useful guarantee for code
1093 that breaks borrowing assumptions
1094 using [`unsafe` operations](#unsafe-functions).
1096 : This is not strictly a kind,
1097 but its presence interacts with kinds:
1098 the `Drop` trait provides a single method `drop`
1099 that takes no parameters,
1100 and is run when values of the type are dropped.
1101 Such a method is called a "destructor",
1102 and are always executed in "top-down" order:
1103 a value is completely destroyed
1104 before any of the values it owns run their destructors.
1105 Only `Send` types can implement `Drop`.
1108 : Types with destructors, closure environments,
1109 and various other _non-first-class_ types,
1110 are not copyable at all.
1111 Such types can usually only be accessed through pointers,
1112 or in some cases, moved between mutable locations.
1114 Kinds can be supplied as _bounds_ on type parameters, like traits, in which
1115 case the parameter is constrained to types satisfying that kind.
1117 By default, type parameters do not carry any assumed kind-bounds at all. When
1118 instantiating a type parameter, the kind bounds on the parameter are checked to
1119 be the same or narrower than the kind of the type that it is instantiated with.
1121 Sending operations are not part of the Rust language, but are implemented in
1122 the library. Generic functions that send values bound the kind of these values
1125 # Memory and concurrency models
1127 Rust has a memory model centered around concurrently-executing _tasks_. Thus
1128 its memory model and its concurrency model are best discussed simultaneously,
1129 as parts of each only make sense when considered from the perspective of the
1132 When reading about the memory model, keep in mind that it is partitioned in
1133 order to support tasks; and when reading about tasks, keep in mind that their
1134 isolation and communication mechanisms are only possible due to the ownership
1135 and lifetime semantics of the memory model.
1139 A Rust program's memory consists of a static set of *items*, a set of
1140 [tasks](#tasks) each with its own *stack*, and a *heap*. Immutable portions of
1141 the heap may be shared between tasks, mutable portions may not.
1143 Allocations in the stack consist of *slots*, and allocations in the heap
1146 ### Memory allocation and lifetime
1148 The _items_ of a program are those functions, modules and types that have their
1149 value calculated at compile-time and stored uniquely in the memory image of the
1150 rust process. Items are neither dynamically allocated nor freed.
1152 A task's _stack_ consists of activation frames automatically allocated on entry
1153 to each function as the task executes. A stack allocation is reclaimed when
1154 control leaves the frame containing it.
1156 The _heap_ is a general term that describes boxes. The lifetime of an
1157 allocation in the heap depends on the lifetime of the box values pointing to
1158 it. Since box values may themselves be passed in and out of frames, or stored
1159 in the heap, heap allocations may outlive the frame they are allocated within.
1161 ### Memory ownership
1163 A task owns all memory it can *safely* reach through local variables, as well
1164 as boxes and references.
1166 When a task sends a value that has the `Send` trait to another task, it loses
1167 ownership of the value sent and can no longer refer to it. This is statically
1168 guaranteed by the combined use of "move semantics", and the compiler-checked
1169 _meaning_ of the `Send` trait: it is only instantiated for (transitively)
1170 sendable kinds of data constructor and pointers, never including references.
1172 When a stack frame is exited, its local allocations are all released, and its
1173 references to boxes are dropped.
1175 When a task finishes, its stack is necessarily empty and it therefore has no
1176 references to any boxes; the remainder of its heap is immediately freed.
1180 A task's stack contains slots.
1182 A _slot_ is a component of a stack frame, either a function parameter, a
1183 [temporary](#lvalues,-rvalues-and-temporaries), or a local variable.
1185 A _local variable_ (or *stack-local* allocation) holds a value directly,
1186 allocated within the stack's memory. The value is a part of the stack frame.
1188 Local variables are immutable unless declared otherwise like: `let mut x = ...`.
1190 Function parameters are immutable unless declared with `mut`. The `mut` keyword
1191 applies only to the following parameter (so `|mut x, y|` and `fn f(mut x:
1192 Box<int>, y: Box<int>)` declare one mutable variable `x` and one immutable
1195 Methods that take either `self` or `Box<Self>` can optionally place them in a
1196 mutable slot by prefixing them with `mut` (similar to regular arguments):
1200 fn change(mut self) -> Self;
1201 fn modify(mut self: Box<Self>) -> Box<Self>;
1205 Local variables are not initialized when allocated; the entire frame worth of
1206 local variables are allocated at once, on frame-entry, in an uninitialized
1207 state. Subsequent statements within a function may or may not initialize the
1208 local variables. Local variables can be used only after they have been
1209 initialized; this is enforced by the compiler.
1213 A _box_ is a reference to a heap allocation holding another value, which is
1214 constructed by the prefix operator `box`. When the standard library is in use,
1215 the type of a box is `std::owned::Box<T>`.
1217 An example of a box type and value:
1220 let x: Box<int> = box 10;
1223 Box values exist in 1:1 correspondence with their heap allocation, copying a
1224 box value makes a shallow copy of the pointer. Rust will consider a shallow
1225 copy of a box to move ownership of the value. After a value has been moved,
1226 the source location cannot be used unless it is reinitialized.
1229 let x: Box<int> = box 10;
1231 // attempting to use `x` will result in an error here
1236 An executing Rust program consists of a tree of tasks. A Rust _task_ consists
1237 of an entry function, a stack, a set of outgoing communication channels and
1238 incoming communication ports, and ownership of some portion of the heap of a
1239 single operating-system process.
1241 ### Communication between tasks
1243 Rust tasks are isolated and generally unable to interfere with one another's
1244 memory directly, except through [`unsafe` code](#unsafe-functions). All
1245 contact between tasks is mediated by safe forms of ownership transfer, and data
1246 races on memory are prohibited by the type system.
1248 When you wish to send data between tasks, the values are restricted to the
1249 [`Send` type-kind](#type-kinds). Restricting communication interfaces to this
1250 kind ensures that no references move between tasks. Thus access to an entire
1251 data structure can be mediated through its owning "root" value; no further
1252 locking or copying is required to avoid data races within the substructure of
1257 The _lifecycle_ of a task consists of a finite set of states and events that
1258 cause transitions between the states. The lifecycle states of a task are:
1265 A task begins its lifecycle — once it has been spawned — in the
1266 *running* state. In this state it executes the statements of its entry
1267 function, and any functions called by the entry function.
1269 A task may transition from the *running* state to the *blocked* state any time
1270 it makes a blocking communication call. When the call can be completed —
1271 when a message arrives at a sender, or a buffer opens to receive a message
1272 — then the blocked task will unblock and transition back to *running*.
1274 A task may transition to the *panicked* state at any time, due being killed by
1275 some external event or internally, from the evaluation of a `panic!()` macro.
1276 Once *panicking*, a task unwinds its stack and transitions to the *dead* state.
1277 Unwinding the stack of a task is done by the task itself, on its own control
1278 stack. If a value with a destructor is freed during unwinding, the code for the
1279 destructor is run, also on the task's control stack. Running the destructor
1280 code causes a temporary transition to a *running* state, and allows the
1281 destructor code to cause any subsequent state transitions. The original task
1282 of unwinding and panicking thereby may suspend temporarily, and may involve
1283 (recursive) unwinding of the stack of a failed destructor. Nonetheless, the
1284 outermost unwinding activity will continue until the stack is unwound and the
1285 task transitions to the *dead* state. There is no way to "recover" from task
1286 panics. Once a task has temporarily suspended its unwinding in the *panicking*
1287 state, a panic occurring from within this destructor results in *hard* panic.
1288 A hard panic currently results in the process aborting.
1290 A task in the *dead* state cannot transition to other states; it exists only to
1291 have its termination status inspected by other tasks, and/or to await
1292 reclamation when the last reference to it drops.
1294 # Runtime services, linkage and debugging
1296 The Rust _runtime_ is a relatively compact collection of Rust code that
1297 provides fundamental services and datatypes to all Rust tasks at run-time. It
1298 is smaller and simpler than many modern language runtimes. It is tightly
1299 integrated into the language's execution model of memory, tasks, communication
1302 ### Memory allocation
1304 The runtime memory-management system is based on a _service-provider
1305 interface_, through which the runtime requests blocks of memory from its
1306 environment and releases them back to its environment when they are no longer
1307 needed. The default implementation of the service-provider interface consists
1308 of the C runtime functions `malloc` and `free`.
1310 The runtime memory-management system, in turn, supplies Rust tasks with
1311 facilities for allocating releasing stacks, as well as allocating and freeing
1316 The runtime provides C and Rust code to assist with various built-in types,
1317 such as arrays, strings, and the low level communication system (ports,
1320 Support for other built-in types such as simple types, tuples and enums is
1321 open-coded by the Rust compiler.
1323 ### Task scheduling and communication
1325 The runtime provides code to manage inter-task communication. This includes
1326 the system of task-lifecycle state transitions depending on the contents of
1327 queues, as well as code to copy values between queues and their recipients and
1328 to serialize values for transmission over operating-system inter-process
1329 communication facilities.
1333 The Rust compiler supports various methods to link crates together both
1334 statically and dynamically. This section will explore the various methods to
1335 link Rust crates together, and more information about native libraries can be
1336 found in the [ffi guide][ffi].
1338 In one session of compilation, the compiler can generate multiple artifacts
1339 through the usage of either command line flags or the `crate_type` attribute.
1340 If one or more command line flag is specified, all `crate_type` attributes will
1341 be ignored in favor of only building the artifacts specified by command line.
1343 * `--crate-type=bin`, `#[crate_type = "bin"]` - A runnable executable will be
1344 produced. This requires that there is a `main` function in the crate which
1345 will be run when the program begins executing. This will link in all Rust and
1346 native dependencies, producing a distributable binary.
1348 * `--crate-type=lib`, `#[crate_type = "lib"]` - A Rust library will be produced.
1349 This is an ambiguous concept as to what exactly is produced because a library
1350 can manifest itself in several forms. The purpose of this generic `lib` option
1351 is to generate the "compiler recommended" style of library. The output library
1352 will always be usable by rustc, but the actual type of library may change from
1353 time-to-time. The remaining output types are all different flavors of
1354 libraries, and the `lib` type can be seen as an alias for one of them (but the
1355 actual one is compiler-defined).
1357 * `--crate-type=dylib`, `#[crate_type = "dylib"]` - A dynamic Rust library will
1358 be produced. This is different from the `lib` output type in that this forces
1359 dynamic library generation. The resulting dynamic library can be used as a
1360 dependency for other libraries and/or executables. This output type will
1361 create `*.so` files on linux, `*.dylib` files on osx, and `*.dll` files on
1364 * `--crate-type=staticlib`, `#[crate_type = "staticlib"]` - A static system
1365 library will be produced. This is different from other library outputs in that
1366 the Rust compiler will never attempt to link to `staticlib` outputs. The
1367 purpose of this output type is to create a static library containing all of
1368 the local crate's code along with all upstream dependencies. The static
1369 library is actually a `*.a` archive on linux and osx and a `*.lib` file on
1370 windows. This format is recommended for use in situations such as linking
1371 Rust code into an existing non-Rust application because it will not have
1372 dynamic dependencies on other Rust code.
1374 * `--crate-type=rlib`, `#[crate_type = "rlib"]` - A "Rust library" file will be
1375 produced. This is used as an intermediate artifact and can be thought of as a
1376 "static Rust library". These `rlib` files, unlike `staticlib` files, are
1377 interpreted by the Rust compiler in future linkage. This essentially means
1378 that `rustc` will look for metadata in `rlib` files like it looks for metadata
1379 in dynamic libraries. This form of output is used to produce statically linked
1380 executables as well as `staticlib` outputs.
1382 Note that these outputs are stackable in the sense that if multiple are
1383 specified, then the compiler will produce each form of output at once without
1384 having to recompile. However, this only applies for outputs specified by the
1385 same method. If only `crate_type` attributes are specified, then they will all
1386 be built, but if one or more `--crate-type` command line flag is specified,
1387 then only those outputs will be built.
1389 With all these different kinds of outputs, if crate A depends on crate B, then
1390 the compiler could find B in various different forms throughout the system. The
1391 only forms looked for by the compiler, however, are the `rlib` format and the
1392 dynamic library format. With these two options for a dependent library, the
1393 compiler must at some point make a choice between these two formats. With this
1394 in mind, the compiler follows these rules when determining what format of
1395 dependencies will be used:
1397 1. If a static library is being produced, all upstream dependencies are
1398 required to be available in `rlib` formats. This requirement stems from the
1399 reason that a dynamic library cannot be converted into a static format.
1401 Note that it is impossible to link in native dynamic dependencies to a static
1402 library, and in this case warnings will be printed about all unlinked native
1403 dynamic dependencies.
1405 2. If an `rlib` file is being produced, then there are no restrictions on what
1406 format the upstream dependencies are available in. It is simply required that
1407 all upstream dependencies be available for reading metadata from.
1409 The reason for this is that `rlib` files do not contain any of their upstream
1410 dependencies. It wouldn't be very efficient for all `rlib` files to contain a
1411 copy of `libstd.rlib`!
1413 3. If an executable is being produced and the `-C prefer-dynamic` flag is not
1414 specified, then dependencies are first attempted to be found in the `rlib`
1415 format. If some dependencies are not available in an rlib format, then
1416 dynamic linking is attempted (see below).
1418 4. If a dynamic library or an executable that is being dynamically linked is
1419 being produced, then the compiler will attempt to reconcile the available
1420 dependencies in either the rlib or dylib format to create a final product.
1422 A major goal of the compiler is to ensure that a library never appears more
1423 than once in any artifact. For example, if dynamic libraries B and C were
1424 each statically linked to library A, then a crate could not link to B and C
1425 together because there would be two copies of A. The compiler allows mixing
1426 the rlib and dylib formats, but this restriction must be satisfied.
1428 The compiler currently implements no method of hinting what format a library
1429 should be linked with. When dynamically linking, the compiler will attempt to
1430 maximize dynamic dependencies while still allowing some dependencies to be
1431 linked in via an rlib.
1433 For most situations, having all libraries available as a dylib is recommended
1434 if dynamically linking. For other situations, the compiler will emit a
1435 warning if it is unable to determine which formats to link each library with.
1437 In general, `--crate-type=bin` or `--crate-type=lib` should be sufficient for
1438 all compilation needs, and the other options are just available if more
1439 fine-grained control is desired over the output format of a Rust crate.
1441 # Appendix: Rationales and design tradeoffs
1445 # Appendix: Influences and further references
1449 > The essential problem that must be solved in making a fault-tolerant
1450 > software system is therefore that of fault-isolation. Different programmers
1451 > will write different modules, some modules will be correct, others will have
1452 > errors. We do not want the errors in one module to adversely affect the
1453 > behaviour of a module which does not have any errors.
1455 > — Joe Armstrong
1457 > In our approach, all data is private to some process, and processes can
1458 > only communicate through communications channels. *Security*, as used
1459 > in this paper, is the property which guarantees that processes in a system
1460 > cannot affect each other except by explicit communication.
1462 > When security is absent, nothing which can be proven about a single module
1463 > in isolation can be guaranteed to hold when that module is embedded in a
1466 > — Robert Strom and Shaula Yemini
1468 > Concurrent and applicative programming complement each other. The
1469 > ability to send messages on channels provides I/O without side effects,
1470 > while the avoidance of shared data helps keep concurrent processes from
1475 Rust is not a particularly original language. It may however appear unusual by
1476 contemporary standards, as its design elements are drawn from a number of
1477 "historical" languages that have, with a few exceptions, fallen out of favour.
1478 Five prominent lineages contribute the most, though their influences have come
1479 and gone during the course of Rust's development:
1481 * The NIL (1981) and Hermes (1990) family. These languages were developed by
1482 Robert Strom, Shaula Yemini, David Bacon and others in their group at IBM
1483 Watson Research Center (Yorktown Heights, NY, USA).
1485 * The Erlang (1987) language, developed by Joe Armstrong, Robert Virding, Claes
1486 Wikström, Mike Williams and others in their group at the Ericsson Computer
1487 Science Laboratory (Älvsjö, Stockholm, Sweden) .
1489 * The Sather (1990) language, developed by Stephen Omohundro, Chu-Cheow Lim,
1490 Heinz Schmidt and others in their group at The International Computer
1491 Science Institute of the University of California, Berkeley (Berkeley, CA,
1494 * The Newsqueak (1988), Alef (1995), and Limbo (1996) family. These
1495 languages were developed by Rob Pike, Phil Winterbottom, Sean Dorward and
1496 others in their group at Bell Labs Computing Sciences Research Center
1497 (Murray Hill, NJ, USA).
1499 * The Napier (1985) and Napier88 (1988) family. These languages were
1500 developed by Malcolm Atkinson, Ron Morrison and others in their group at
1501 the University of St. Andrews (St. Andrews, Fife, UK).
1503 Additional specific influences can be seen from the following languages:
1505 * The structural algebraic types and compilation manager of SML.
1506 * The attribute and assembly systems of C#.
1507 * The references and deterministic destructor system of C++.
1508 * The memory region systems of the ML Kit and Cyclone.
1509 * The typeclass system of Haskell.
1510 * The lexical identifier rule of Python.
1511 * The block syntax of Ruby.
1513 [ffi]: guide-ffi.html
1514 [plugin]: guide-plugin.html