Tokens are primitive productions in the grammar defined by regular
(non-recursive) languages. "Simple" tokens are given in [string table
production](#string-table-productions) form, and occur in the rest of the
grammar as double-quoted strings. Other tokens have exact rules given.
A literal is an expression consisting of a single token, rather than a sequence
of tokens, that immediately and directly denotes the value it evaluates to,
rather than referring to it by name or some other evaluation rule. A literal is
a form of constant expression, so it is evaluated (primarily) at compile time.
#### Characters and strings

|                                              | Example         | `#` sets   | Characters  | Escapes             |
|----------------------------------------------|-----------------|------------|-------------|---------------------|
| [Character](#character-literals)             | `'H'`           | `N/A`      | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
| [String](#string-literals)                   | `"hello"`       | `N/A`      | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
| [Raw](#raw-string-literals)                  | `r#"hello"#`    | `0...`     | All Unicode | `N/A`               |
| [Byte](#byte-literals)                       | `b'H'`          | `N/A`      | All ASCII   | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Byte string](#byte-string-literals)         | `b"hello"`      | `N/A`      | All ASCII   | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Raw byte string](#raw-byte-string-literals) | `br#"hello"#`   | `0...`     | All ASCII   | `N/A`               |
32 | `\x7F` | 8-bit character code (exactly 2 digits) |
34 | `\r` | Carriage return |
43 | `\u{7FFF}` | 24-bit Unicode character code (up to 6 digits) |
49 | `\'` | Single quote |
50 | `\"` | Double quote |
| [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
|----------------------------------------|---------|----------------|----------|
| Decimal integer | `98_222` | `N/A` | Integer suffixes |
| Hex integer | `0xff` | `N/A` | Integer suffixes |
| Octal integer | `0o77` | `N/A` | Integer suffixes |
| Binary integer | `0b1111_0000` | `N/A` | Integer suffixes |
| Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes |

`*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`
| Integer | Floating-point |
|---------|----------------|
| `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `isize`, `usize` | `f32`, `f64` |
### Character and string literals

#### Character literals

A _character literal_ is a single Unicode character enclosed within two
`U+0027` (single-quote) characters, with the exception of `U+0027` itself,
which must be _escaped_ by a preceding `U+005C` character (`\`).
A _string literal_ is a sequence of any Unicode characters enclosed within two
`U+0022` (double-quote) characters, with the exception of `U+0022` itself,
which must be _escaped_ by a preceding `U+005C` character (`\`).

Line-break characters are allowed in string literals. Normally they represent
themselves (i.e. no translation), but as a special exception, when an unescaped
`U+005C` character (`\`) occurs immediately before the newline (`U+000A`), the
`U+005C` character, the newline, and all whitespace at the beginning of the
next line are ignored. Thus `a` and `b` are equal:
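As a minimal sketch of the line-continuation rule, the following wraps the two spellings in a hypothetical helper, `continuation_pair` (the name is an assumption made for illustration), so that the equality can be checked:

```rust
/// Hypothetical helper: returns the same string written two ways.
fn continuation_pair() -> (&'static str, &'static str) {
    let a = "foobar";
    // The backslash, the newline after it, and the leading whitespace on
    // the following line are all dropped from the literal.
    let b = "foo\
             bar";
    (a, b)
}
```

Calling `continuation_pair()` should yield two strings that compare equal, each six bytes long.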
#### Character escapes

Some additional _escapes_ are available in either character or non-raw string
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:

* An _8-bit code point escape_ starts with `U+0078` (`x`) and is
  followed by exactly two _hex digits_. It denotes the Unicode code point
  equal to the provided hex value.
* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
  by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
  (`}`). It denotes the Unicode code point equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
  (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF),
  `U+000D` (CR) or `U+0009` (HT) respectively.
* The _null escape_ is the character `U+0030` (`0`) and denotes the Unicode
  value `U+0000` (NUL).
* The _backslash escape_ is the character `U+005C` (`\`) which must be
  escaped in order to denote *itself*.
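The escape forms listed above can be sketched as follows. The helper names `escape_code_points` and `escaped_string` are assumptions made for this illustration. The `\x` example stays at or below `\x7F`, which recent Rust compilers require in character and string literals.

```rust
/// Hypothetical helper: the code point denoted by each escape form.
fn escape_code_points() -> [u32; 7] {
    [
        '\x41' as u32,      // 8-bit escape: exactly two hex digits
        '\u{1F600}' as u32, // braced escape: up to six hex digits
        '\n' as u32,        // whitespace escape, LF
        '\r' as u32,        // whitespace escape, CR
        '\t' as u32,        // whitespace escape, HT
        '\0' as u32,        // null escape
        '\\' as u32,        // backslash escape
    ]
}

/// Hypothetical helper: the same escapes are accepted in non-raw strings.
fn escaped_string() -> &'static str {
    "\x52\u{52}R"
}
```

Under these assumptions, `escaped_string()` is three copies of `R`, since `\x52`, `\u{52}` and a plain `R` all denote `U+0052`.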
#### Raw string literals

Raw string literals do not process any escapes. They start with the character
`U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
`U+0022` (double-quote) character. The _raw string body_ can contain any sequence
of Unicode characters and is terminated only by another `U+0022` (double-quote)
character, followed by the same number of `U+0023` (`#`) characters that preceded
the opening `U+0022` (double-quote) character.

All Unicode characters contained in the raw string body represent themselves;
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) do not have any special meaning.
Examples for string literals:

```rust
"foo"; r"foo";                     // foo
"\"foo\""; r#""foo""#;             // "foo"

r##"foo #"# bar"##;                // foo #"# bar

"\x52"; "R"; r"R";                 // R
"\\x52"; r"\x52";                  // \x52
```
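The equivalences in these examples can be checked mechanically. The sketch below pairs each escaped spelling with its raw counterpart; the helper name `raw_string_pairs` is an assumption made for illustration.

```rust
/// Hypothetical helper: (escaped spelling, raw spelling) of the same text.
fn raw_string_pairs() -> Vec<(&'static str, &'static str)> {
    vec![
        ("foo", r"foo"),
        ("\"foo\"", r#""foo""#),
        // Inside r##"..."##, the sequence `"#` does not end the literal,
        // because it is followed by only one `#`.
        ("foo #\"# bar", r##"foo #"# bar"##),
        // A raw string keeps the backslash: four characters, not one.
        ("\\x52", r"\x52"),
    ]
}
```

Each pair returned by `raw_string_pairs()` should compare equal, and `r"\x52"` has length 4 because the escape is not processed.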
### Byte and byte string literals

A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
range) or a single _escape_ preceded by the characters `U+0062` (`b`) and
`U+0027` (single-quote), and followed by the character `U+0027`. If the character
`U+0027` is present within the literal, it must be _escaped_ by a preceding
`U+005C` (`\`) character. It is equivalent to a `u8` unsigned 8-bit integer
#### Byte string literals

A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_,
preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and
followed by the character `U+0022`. If the character `U+0022` is present within
the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
Alternatively, a byte string literal can be a _raw byte string literal_, defined
below. A byte string literal of length `n` is equivalent to a
`&'static [u8; n]` borrowed fixed-sized array of unsigned 8-bit integers.
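A short sketch of the types involved, assuming a hypothetical helper named `byte_literal_types`: a byte literal is a `u8`, and a byte string literal of length 5 is a `&'static [u8; 5]`.

```rust
/// Hypothetical helper: a byte literal and a byte string literal,
/// each bound with an explicit type annotation.
fn byte_literal_types() -> (u8, &'static [u8; 5]) {
    // A byte literal is a plain `u8` holding the ASCII code.
    let h: u8 = b'H';
    // A byte string literal of length 5 borrows a fixed-size array.
    let hello: &'static [u8; 5] = b"hello";
    (h, hello)
}
```

The annotations compile only because the literals already have those types; changing `5` to another length would be rejected by the compiler.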
Some additional _escapes_ are available in either byte or non-raw byte string
literals. An escape starts with a `U+005C` (`\`) and continues with one of the
following forms:

* A _byte escape_ starts with `U+0078` (`x`) and is
  followed by exactly two _hex digits_. It denotes the byte
  equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
  (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
  `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
* The _null escape_ is the character `U+0030` (`0`) and denotes the byte
  value `0x00` (ASCII NUL).
* The _backslash escape_ is the character `U+005C` (`\`) which must be
  escaped in order to denote its ASCII encoding `0x5C`.
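The byte escapes above can be sketched as follows; `byte_escapes` is a hypothetical helper name used only for this illustration. Unlike in character literals, a byte escape may denote any value up to `0xFF`.

```rust
/// Hypothetical helper: the byte value denoted by each escape form.
fn byte_escapes() -> [u8; 7] {
    [
        b'\x52', // byte escape: exactly two hex digits
        b'\xFF', // byte escapes cover the full 8-bit range
        b'\n',   // whitespace escape, ASCII LF
        b'\r',   // whitespace escape, ASCII CR
        b'\t',   // whitespace escape, ASCII HT
        b'\0',   // null escape
        b'\\',   // backslash escape
    ]
}
```

The same escapes work inside non-raw byte strings, so `b"\x52"` and `b"R"` denote the same one-byte array.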
#### Raw byte string literals

Raw byte string literals do not process any escapes. They start with the
character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
_raw string body_ can contain any sequence of ASCII characters and is terminated
only by another `U+0022` (double-quote) character, followed by the same number of
`U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote)
character. A raw byte string literal cannot contain any non-ASCII byte.

All characters contained in the raw string body represent their ASCII encoding;
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) do not have any special meaning.
Examples for byte string literals:

```rust
b"foo"; br"foo";                     // foo
b"\"foo\""; br#""foo""#;             // "foo"

br##"foo #"# bar"##;                 // foo #"# bar

b"\x52"; b"R"; br"R";                // R
b"\\x52"; br"\x52";                  // \x52
```
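As a rough check of these examples, the sketch below compares each escaped byte string with its raw counterpart. The function name `raw_byte_strings_match` is an assumption made for illustration.

```rust
/// Hypothetical helper: true when every escaped byte string equals its
/// raw spelling from the examples.
fn raw_byte_strings_match() -> bool {
    b"foo" == br"foo"
        && b"\"foo\"" == br#""foo""#
        && b"foo #\"# bar" == br##"foo #"# bar"##
        && b"R" == br"R"
        // The raw form keeps the backslash, so it matches the escaped
        // backslash spelling rather than the single byte `R`.
        && b"\\x52" == br"\x52"
}
```

Note that `br"\x52"` is four bytes long, while `b"\x52"` is the single byte `0x52`.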
A _number literal_ is either an _integer literal_ or a _floating-point
literal_. The grammar for recognizing the two kinds of literals is mixed.
#### Integer literals

An _integer literal_ has one of four forms:

* A _decimal literal_ starts with a *decimal digit* and continues with any
  mixture of *decimal digits* and _underscores_.
* A _hex literal_ starts with the character sequence `U+0030` `U+0078`
  (`0x`) and continues as any mixture of hex digits and underscores.
* An _octal literal_ starts with the character sequence `U+0030` `U+006F`
  (`0o`) and continues as any mixture of octal digits and underscores.
* A _binary literal_ starts with the character sequence `U+0030` `U+0062`
  (`0b`) and continues as any mixture of binary digits and underscores.
Like any literal, an integer literal may be followed (immediately,
without any spaces) by an _integer suffix_, which forcibly sets the
type of the literal. The integer suffix must be the name of one of the
integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`,
`isize`, or `usize`.
The type of an _unsuffixed_ integer literal is determined by type inference:

* If an integer type can be _uniquely_ determined from the surrounding
  program context, the unsuffixed integer literal has that type.

* If the program context under-constrains the type, it defaults to the
  signed 32-bit integer `i32`.

* If the program context over-constrains the type, it is considered a
Examples of integer literals of various forms:

```rust
0o70_i16;                          // type i16
0b1111_1111_1001_0000_i32;         // type i32
0usize;                            // type usize
```
Note that the Rust syntax considers `-1i8` as an application of the [unary minus
operator](#unary-operator-expressions) to an integer literal `1i8`, rather than
a single integer literal.
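The integer forms, suffixes, and inference rules can be sketched together. The helpers `type_name_of`, `literal_values`, `literal_types`, and `negated` are hypothetical names introduced for this illustration. The strings returned by `std::any::type_name` are intended for diagnostics, so treat them as an aid rather than a stable contract.

```rust
use std::any::type_name;

/// Hypothetical helper: reports the static type of a value as text.
fn type_name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

/// The four integer literal forms: decimal, hex, octal, binary.
fn literal_values() -> [i32; 4] {
    [98_222, 0xff, 0o77, 0b1111_0000]
}

/// How a suffix, an annotation, or the default decides the type.
fn literal_types() -> [&'static str; 4] {
    let suffixed = 0o70_i16;    // the suffix sets the type
    let constrained: u64 = 123; // the context determines the type
    let defaulted = 123;        // under-constrained: falls back to i32
    let index = 0usize;         // the suffix sets the type
    [
        type_name_of(&suffixed),
        type_name_of(&constrained),
        type_name_of(&defaulted),
        type_name_of(&index),
    ]
}

/// `-1i8` is the unary minus operator applied to the literal `1i8`.
fn negated() -> i8 {
    -1i8
}
```

Under these assumptions, `literal_values()` evaluates to `[98222, 255, 63, 240]`.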
#### Floating-point literals

A _floating-point literal_ has one of two forms:

* A _decimal literal_ followed by a period character `U+002E` (`.`). This is
  optionally followed by another decimal literal, with an optional _exponent_.
* A single _decimal literal_ followed by an _exponent_.

Like integer literals, a floating-point literal may be followed by a
suffix, so long as the pre-suffix part does not end with `U+002E` (`.`).
The suffix forcibly sets the type of the literal. The two valid
_floating-point suffixes_ are `f32` and `f64` (the 32-bit and 64-bit
floating-point types).
The type of an _unsuffixed_ floating-point literal is determined by
type inference:

* If a floating-point type can be _uniquely_ determined from the
  surrounding program context, the unsuffixed floating-point literal
  has that type.

* If the program context under-constrains the type, it defaults to `f64`.

* If the program context over-constrains the type, it is considered a
Examples of floating-point literals of various forms:

```rust
123.0f64;        // type f64
12E+99_f64;      // type f64
let x: f64 = 2.; // type f64
```
This last example is different because it is not possible to use the suffix
syntax with a floating-point literal ending in a period. `2.f64` would attempt
to call a method named `f64` on `2`.

The representation semantics of floating-point numbers are described in
["Machine Types"](#machine-types).
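A minimal sketch of the floating-point forms and their types follows. The helpers `type_name_of`, `float_literal_types`, and `trailing_period` are hypothetical names used only here, and the strings from `std::any::type_name` are a diagnostic aid rather than a guaranteed format.

```rust
use std::any::type_name;

/// Hypothetical helper: reports the static type of a value as text.
fn type_name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

/// How a suffix or the default decides the type of a float literal.
fn float_literal_types() -> [&'static str; 4] {
    let suffixed = 123.0f64;        // period form with a suffix
    let exponent_only = 12E+99_f64; // exponent form with a suffix
    let single = 0.1f32;            // 32-bit suffix
    let defaulted = 2.;             // under-constrained: falls back to f64
    [
        type_name_of(&suffixed),
        type_name_of(&exponent_only),
        type_name_of(&single),
        type_name_of(&defaulted),
    ]
}

/// A literal ending in a period takes its type from an annotation,
/// because `2.f64` would be parsed as a method call on `2`.
fn trailing_period() -> f64 {
    let x: f64 = 2.;
    x
}
```

The annotation in `trailing_period` plays the role that a suffix cannot play for a literal ending in `.`.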
The two values of the boolean type are written `true` and `false`.
Symbols are a general class of printable [tokens](#tokens) that play structural
roles in a variety of grammar productions. They are the
remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords][keywords].
They are catalogued in [the Symbols section][symbols] of the Grammar document.

[symbols]: grammar.html#symbols