# Notation
-Rust's grammar is defined over Unicode codepoints, each conventionally denoted
+Rust's grammar is defined over Unicode code points, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
- Unicode codepoint `U+00QQ`.
+ Unicode code point `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeat` forms apply to the adjacent `element`, and are as follows:
- `?` means zero or one repetition
## Unicode productions
-A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
+A few productions in Rust's grammar permit Unicode code points outside the ASCII
range. We define these productions in terms of character properties specified
-in the Unicode standard, rather than in terms of ASCII-range codepoints. The
+in the Unicode standard, rather than in terms of ASCII-range code points. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.
## Input format
-Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
+Rust input is interpreted as a sequence of Unicode code points encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
-codepoints, but a small number are defined in terms of Unicode properties or
-explicit codepoint lists. [^inputformat]
+code points, but a small number are defined in terms of Unicode properties or
+explicit code point lists. [^inputformat]
[^inputformat]: Substitute definitions for the special Unicode productions are
provided to the grammar verifier, restricted to ASCII range, when verifying the
sequence (`/**`), are interpreted as a special syntax for `doc`
[attributes](#attributes). That is, they are equivalent to writing
`#[doc="..."]` around the body of the comment (this includes the comment
-characters themselves, ie `/// Foo` turns into `#[doc="/// Foo"]`).
+characters themselves, i.e. `/// Foo` turns into `#[doc="/// Foo"]`).
-Line comments beginning with `//!` are doc comments that apply to the parent
-of the comment, rather than the item that follows. That is, they are
-equivalent to writing `#![doc="..."]` around the body of the comment. `//!`
-comments are usually used to display information on the crate index page.
+Line comments beginning with `//!` and block comments beginning with `/*!` are
+doc comments that apply to the parent of the comment, rather than the item
+that follows. That is, they are equivalent to writing `#![doc="..."]` around
+the body of the comment. `//!` comments are usually used to display
+information on the crate index page.
Non-doc comments are interpreted as a form of whitespace.
| fn | for | if | impl | in |
| let | loop | macro | match | mod |
| move | mut | offsetof | override | priv |
-| pub | pure | ref | return | sizeof |
-| static | self | struct | super | true |
-| trait | type | typeof | unsafe | unsized |
-| use | virtual | where | while | yield |
+| proc | pub | pure | ref | return |
+| Self | self | sizeof | static | struct |
+| super | trait | true | type | typeof |
+| unsafe | unsized | use | virtual | where |
+| while | yield | | | |
Each of these keywords has special meaning in its grammar, and all of them are
##### Suffixes
| Integer | Floating-point |
|---------|----------------|
-| `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `is` (`isize`), `us` (`usize`) | `f32`, `f64` |
+| `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `isize`, `usize` | `f32`, `f64` |
#### Character and string literals
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:
-* An _8-bit codepoint escape_ escape starts with `U+0078` (`x`) and is
- followed by exactly two _hex digits_. It denotes the Unicode codepoint
+* An _8-bit code point escape_ starts with `U+0078` (`x`) and is
+ followed by exactly two _hex digits_. It denotes the Unicode code point
equal to the provided hex value.
-* A _24-bit codepoint escape_ starts with `U+0075` (`u`) and is followed
+* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
- (`}`). It denotes the Unicode codepoint equal to the provided hex value.
+ (`}`). It denotes the Unicode code point equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
- (`r`), or `U+0074` (`t`), denoting the unicode values `U+000A` (LF),
+ (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF),
`U+000D` (CR) or `U+0009` (HT) respectively.
* The _backslash escape_ is the character `U+005C` (`\`) which must be
escaped in order to denote *itself*.
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:
-* An _byte escape_ escape starts with `U+0078` (`x`) and is
+* A _byte escape_ escape starts with `U+0078` (`x`) and is
followed by exactly two _hex digits_. It denotes the byte
equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
only the name of a matched nonterminal comes after the dollar sign.
In both the matcher and transcriber, the Kleene star-like operator indicates
-repetition. The Kleene star operator consists of `$` and parens, optionally
+repetition. The Kleene star operator consists of `$` and parentheses, optionally
followed by a separator token, followed by `*` or `+`. `*` means zero or more
-repetitions, `+` means at least one repetition. The parens are not matched or
+repetitions, `+` means at least one repetition. The parentheses are not matched or
transcribed. On the matcher side, a name is bound to _all_ of the names it
matches, in a structure that mimics the structure of the repetition encountered
on a successful match. The job of the transcriber is to sort that structure
# Crates and source files
-Rust is a *compiled* language. Its semantics obey a *phase distinction* between
-compile-time and run-time. Those semantic rules that have a *static
+Although Rust, like any other language, can be implemented by an interpreter as
+well as a compiler, the only existing implementation is a compiler —
+from now on referred to as *the* Rust compiler — and the language has
+always been designed to be compiled. For these reasons, this section assumes a
+compiler.
+
+Rust's semantics obey a *phase distinction* between compile-time and
+run-time.[^phase-distinction] Those semantic rules that have a *static
interpretation* govern the success or failure of compilation. Those semantics
that have a *dynamic interpretation* govern the behavior of the program at
run-time.
+[^phase-distinction]: This distinction would also exist in an interpreter.
+ Static checks like syntactic analysis, type checking, and lints should
+ happen before the program is executed regardless of when it is executed.
+
The compilation model centers on artifacts called _crates_. Each compilation
processes a single crate in source form, and if successful, produces a single
-crate in binary form: either an executable or a library.[^cratesourcefile]
+crate in binary form: either an executable or some sort of
+library.[^cratesourcefile]
[^cratesourcefile]: A crate is somewhat analogous to an *assembly* in the
ECMA-335 CLI model, a *library* in the SML/NJ Compilation Manager, a *unit*
A Rust source file describes a module, the name and location of which —
in the module tree of the current crate — are defined from outside the
source file: either by an explicit `mod_item` in a referencing source file, or
-by the name of the crate itself.
+by the name of the crate itself. Every source file is a module, but not every
+module needs its own source file: [module definitions](#modules) can be nested
+within one file.
Each source file contains a sequence of zero or more `item` definitions, and
-may optionally begin with any number of `attributes` that apply to the
-containing module. Attributes on the anonymous crate module define important
-metadata that influences the behavior of the compiler.
+may optionally begin with any number of [attributes](#Items and attributes)
+that apply to the containing module, most of which influence the behavior of
+the compiler. The anonymous crate module can have additional attributes that
+apply to the crate as a whole.
```no_run
-// Crate name
+// Specify the crate name.
#![crate_name = "projx"]
-// Specify the output type
+// Specify the type of output artifact.
#![crate_type = "lib"]
-// Turn on a warning
+// Turn on a warning.
+// This can be done in any module, not just the anonymous crate module.
#![warn(non_camel_case_types)]
```
angle-bracket-enclosed, comma-separated list following the function name.
```{.ignore}
-fn iter<T>(seq: &[T], f: |T|) {
- for elt in seq.iter() { f(elt); }
+fn iter<T, F>(seq: &[T], f: F) where T: Copy, F: Fn(T) {
+ for elt in seq { f(*elt); }
}
-fn map<T, U>(seq: &[T], f: |T| -> U) -> Vec<U> {
+fn map<T, U, F>(seq: &[T], f: F) -> Vec<U> where T: Copy, U: Copy, F: Fn(T) -> U {
let mut acc = vec![];
- for elt in seq.iter() { acc.push(f(elt)); }
+ for elt in seq { acc.push(f(*elt)); }
acc
}
```
Inside the function signature and body, the name of the type parameter can be
-used as a type name.
+used as a type name. [Trait](#traits) bounds can be specified for type parameters
+to allow methods with that trait to be called on values of that type. This is
+specified using the `where` syntax, as in the above example.
When a generic function is referenced, its type is instantiated based on the
context of the reference. For example, calling the `iter` function defined
above on `[1, 2]` will instantiate type parameter `T` with `i32`, and require
-the closure parameter to have type `fn(i32)`.
+the closure parameter to have type `Fn(i32)`.
The type parameters can also be explicitly supplied in a trailing
[path](#paths) component after the function name. This might be necessary if
there is not sufficient context to determine the type parameters. For example,
`mem::size_of::<u32>() == 4`.
-Since a parameter type is opaque to the generic function, the set of operations
-that can be performed on it is limited. Values of parameter type can only be
-moved, not copied.
-
-```
-fn id<T>(x: T) -> T { x }
-```
-
-Similarly, [trait](#traits) bounds can be specified for type parameters to
-allow methods with that trait to be called on values of that type.
-
#### Unsafety
Unsafe operations are those that potentially violate the memory-safety
[noalias]: http://llvm.org/docs/LangRef.html#noalias
-##### Behaviour not considered unsafe
+##### Behavior not considered unsafe
-This is a list of behaviour not considered *unsafe* in Rust terms, but that may
+This is a list of behavior not considered *unsafe* in Rust terms, but that may
be undesired.
* Deadlocks
several different type constraints.
For example, the following defines the type `Point` as a synonym for the type
-`(u8, u8)`, the type of pairs of unsigned 8 bit integers.:
+`(u8, u8)`, the type of pairs of unsigned 8 bit integers:
```
type Point = (u8, u8);
}
```
-Traits also define an [object type](#object-types) with the same name as the
+Traits also define an [trait object](#trait-objects) with the same name as the
trait. Values of this type are created by [casting](#type-cast-expressions)
pointer values (pointing to a type for which an implementation of the given
trait is in scope) to pointers to the trait name, used as a type.
### Crate-only attributes
-- `crate_name` - specify the this crate's crate name.
+- `crate_name` - specify the crate's crate name.
- `crate_type` - see [linkage](#linkage).
- `feature` - see [compiler features](#compiler-features).
- `no_builtins` - disable optimizing certain code patterns to invocations of
`"unix"` or `"windows"`. The value of this configuration option is defined
as a configuration itself, like `unix` or `windows`.
* `target_os = "..."`. Operating system of the target, examples include
- `"win32"`, `"macos"`, `"linux"`, `"android"`, `"freebsd"`, `"dragonfly"`,
+ `"windows"`, `"macos"`, `"ios"`, `"linux"`, `"android"`, `"freebsd"`, `"dragonfly"`,
`"bitrig"` or `"openbsd"`.
* `target_pointer_width = "..."`. Target pointer width in bits. This is set
to `"32"` for targets with 32-bit pointers, and likewise set to `"64"` for
terms of encapsulation).
If a feature is promoted to a language feature, then all existing programs will
-start to receive compilation warnings about #[feature] directives which enabled
+start to receive compilation warnings about `#![feature]` directives which enabled
the new feature (because the directive is no longer necessary). However, if a
feature is decided to be removed from the language, errors will be issued (if
there isn't a parser error first). The directive in this case is no longer
identifier, and a parenthesized expression-list. Method calls are resolved to
methods on specific traits, either statically dispatching to a method if the
exact `self`-type of the left-hand-side is known, or dynamically dispatching if
-the left-hand-side expression is an indirect [object type](#object-types).
+the left-hand-side expression is an indirect [trait object](#trait-objects).
### Field expressions
(["a", "b"])[10]; // panics
```
+### Range expressions
+
+```{.ebnf .gram}
+range_expr : expr ".." expr |
+ expr ".." |
+ ".." expr |
+ ".." ;
+```
+
+The `..` operator will construct an object of one of the `std::ops::Range` variants.
+
+```
+1..2; // std::ops::Range
+3..; // std::ops::RangeFrom
+..4; // std::ops::RangeTo
+..; // std::ops::RangeFull
+```
+
+The following expressions are equivalent.
+
+```
+let x = std::ops::Range {start: 0, end: 10};
+let y = 0..10;
+
+assert_eq!(x,y);
+```
+
### Unary operator expressions
Rust defines three unary operators. They are all written as prefix operators,
ten_times(|j| println!("hello, {}", j));
```
-### While loops
-
-```{.ebnf .gram}
-while_expr : [ lifetime ':' ] "while" no_struct_literal_expr '{' block '}' ;
-```
-
-A `while` loop begins by evaluating the boolean loop conditional expression.
-If the loop conditional expression evaluates to `true`, the loop body block
-executes and control returns to the loop conditional expression. If the loop
-conditional expression evaluates to `false`, the `while` expression completes.
-
-An example:
-
-```
-let mut i = 0;
-
-while i < 10 {
- println!("hello");
- i = i + 1;
-}
-```
-
### Infinite loops
A `loop` expression denotes an infinite loop.
loop_expr : [ lifetime ':' ] "loop" '{' block '}';
```
-A `loop` expression may optionally have a _label_. If a label is present, then
-labeled `break` and `continue` expressions nested within this loop may exit out
-of this loop or return control to its head. See [Break
-expressions](#break-expressions) and [Continue
+A `loop` expression may optionally have a _label_. The label is written as
+a lifetime preceding the loop expression, as in `'foo: loop{ }`. If a
+label is present, then labeled `break` and `continue` expressions nested
+within this loop may exit out of this loop or return control to its head.
+See [Break expressions](#break-expressions) and [Continue
expressions](#continue-expressions).
### Break expressions
A `break` expression has an optional _label_. If the label is absent, then
executing a `break` expression immediately terminates the innermost loop
enclosing it. It is only permitted in the body of a loop. If the label is
-present, then `break foo` terminates the loop with label `foo`, which need not
+present, then `break 'foo` terminates the loop with label `'foo`, which need not
be the innermost label enclosing the `break` expression, but must enclose it.
### Continue expressions
of the innermost loop enclosing it, returning control to the loop *head*. In
the case of a `while` loop, the head is the conditional expression controlling
the loop. In the case of a `for` loop, the head is the call-expression
-controlling the loop. If the label is present, then `continue foo` returns
-control to the head of the loop with label `foo`, which need not be the
+controlling the loop. If the label is present, then `continue 'foo` returns
+control to the head of the loop with label `'foo`, which need not be the
innermost label enclosing the `break` expression, but must enclose it.
A `continue` expression is only permitted in the body of a loop.
+### While loops
+
+```{.ebnf .gram}
+while_expr : [ lifetime ':' ] "while" no_struct_literal_expr '{' block '}' ;
+```
+
+A `while` loop begins by evaluating the boolean loop conditional expression.
+If the loop conditional expression evaluates to `true`, the loop body block
+executes and control returns to the loop conditional expression. If the loop
+conditional expression evaluates to `false`, the `while` expression completes.
+
+An example:
+
+```
+let mut i = 0;
+
+while i < 10 {
+ println!("hello");
+ i = i + 1;
+}
+```
+
+Like `loop` expressions, `while` loops can be controlled with `break` or
+`continue`, and may optionally have a _label_. See [infinite
+loops](#infinite-loops), [break expressions](#break-expressions), and
+[continue expressions](#continue-expressions) for more information.
+
### For expressions
```{.ebnf .gram}
}
```
+Like `loop` expressions, `for` loops can be controlled with `break` or
+`continue`, and may optionally have a _label_. See [infinite
+loops](#infinite-loops), [break expressions](#break-expressions), and
+[continue expressions](#continue-expressions) for more information.
+
### If expressions
```{.ebnf .gram}
UTF-32 string.
A value of type `str` is a Unicode string, represented as an array of 8-bit
-unsigned bytes holding a sequence of UTF-8 codepoints. Since `str` is of
+unsigned bytes holding a sequence of UTF-8 code points. Since `str` is of
unknown size, it is not a _first-class_ type, but can only be instantiated
through a pointer type, such as `&str` or `String`.
```
-### Object types
+### Trait objects
Every trait item (see [traits](#traits)) defines a type with the same name as
-the trait. This type is called the _object type_ of the trait. Object types
+the trait. This type is called the _trait object_ of the trait. Trait objects
permit "late binding" of methods, dispatched using _virtual method tables_
("vtables"). Whereas most calls to trait methods are "early bound" (statically
resolved) to specific implementations at compile time, a call to a method on an
-object type is only resolved to a vtable entry at compile time. The actual
+trait objects is only resolved to a vtable entry at compile time. The actual
implementation for each vtable entry can vary on an object-by-object basis.
Given a pointer-typed expression `E` of type `&T` or `Box<T>`, where `T`
implements trait `R`, casting `E` to the corresponding pointer type `&R` or
-`Box<R>` results in a value of the _object type_ `R`. This result is
+`Box<R>` results in a value of the _trait object_ `R`. This result is
represented as a pair of pointers: the vtable pointer for the `T`
implementation of `R`, and the pointer value of `E`.
-An example of an object type:
+An example of a trait object:
```
trait Printable {
}
```
-In this example, the trait `Printable` occurs as an object type in both the
+In this example, the trait `Printable` occurs as a trait object in both the
type signature of `print`, and the cast expression in `main`.
### Type parameters