Strings are an important concept to master in any programming language. If you
come from a managed language background, you may be surprised at the complexity
of string handling in a systems programming language. Efficient access and
allocation of memory for a dynamically sized structure involves a lot of
details. Luckily, Rust has lots of tools to help us here.

A **string** is a sequence of Unicode scalar values encoded as a stream of
UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8 sequences.
Additionally, strings are not null-terminated and can contain null bytes.

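To make the last two points concrete, here is a minimal sketch that uses only the standard library. The helper name `byte_len` is hypothetical and exists only for illustration. It shows that an interior null byte is ordinary data that counts toward the length, and that a non-ASCII scalar value can occupy more than one byte.

```rust
// Hypothetical helper (not part of any library): length of a slice in bytes.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    // "\0" is an ordinary one-byte character inside a Rust string.
    let with_nul = "a\0b";
    println!("{}", byte_len(with_nul)); // 3 bytes: 'a', NUL, 'b'

    // U+00E9 (é) is a single scalar value encoded as two UTF-8 bytes.
    println!("{}", byte_len("\u{e9}")); // 2
}
```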
Rust has two main types of strings: `&str` and `String`.

The first kind is a `&str`. This is pronounced a 'string slice'.
String literals are of the type `&str`:

```rust
let string = "Hello there.";
```

Like any Rust reference, string slices have an associated lifetime. A string
literal is a `&'static str`. A string slice can be written without an explicit
lifetime in many cases, such as in function arguments. In these cases the
lifetime will be inferred:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}
```

Like vector slices, string slices are simply a pointer plus a length. This
means that they're a 'view' into an already-allocated string, such as a
string literal or a `String`.

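The "pointer plus a length" description can be checked directly. The following is a sketch under the assumption of a hypothetical helper named `prefix`; it borrows part of an existing `String` and confirms that no bytes are copied.

```rust
use std::mem::size_of;

// Hypothetical helper: borrows the first `n` bytes of `s` as a new slice.
// It panics if `n` does not fall on a character boundary, so it is only a sketch.
fn prefix(s: &str, n: usize) -> &str {
    &s[..n]
}

fn main() {
    let owned = String::from("Hello there.");
    let view = prefix(&owned, 5);
    println!("{}", view); // Hello

    // The slice points into the buffer owned by `owned`; nothing is copied.
    assert_eq!(view.as_ptr(), owned.as_ptr());

    // A `&str` is a pointer plus a length: two machine words.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
}
```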
You may occasionally see references to a `str` type, without the `&`. While
this type does exist, it’s not something you want to use yourself. Sometimes,
people confuse `str` for `String`, and declare a `struct` field with the type
`str`.

This leads to ugly errors:

```text
error: the trait `core::marker::Sized` is not implemented for the type `str` [E0277]
note: `str` does not have a constant size known at compile-time
```

Instead, the field in this `struct` should be a `String`.

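As a minimal sketch of the mistake and its fix, assume a hypothetical `Person` struct (the struct and its field names are illustrative, not taken from the text). The rejected form is kept in a comment so that the block still compiles.

```rust
// Does not compile: `str` has no size known at compile time, and only the
// last field of a struct may be unsized.
//
//     struct Person {
//         name: str,
//         age: u32,
//     }

// Owning the text with a `String` gives the struct a known size.
struct Person {
    name: String,
    age: u32,
}

fn main() {
    let p = Person { name: "Steve".to_string(), age: 30 };
    println!("{} is {}", p.name, p.age);
}
```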
So let’s talk about `String`s.

A `String` is a heap-allocated string. This string is growable, and is
also guaranteed to be UTF-8. `String`s are commonly created by
converting from a string slice using the `to_string` method.

```rust
let mut s = "Hello".to_string();
s.push_str(", world.");
```

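To illustrate what "growable" means in practice, here is a small sketch that builds a `String` piece by piece. The function name `build_greeting` is a hypothetical helper introduced only for this example.

```rust
// Hypothetical helper: builds a greeting by appending to an owned `String`.
fn build_greeting(name: &str) -> String {
    let mut s = String::new(); // starts empty
    s.push_str("Hello, ");     // append a string slice
    s.push_str(name);
    s.push('!');               // append a single `char`
    s
}

fn main() {
    let greeting = build_greeting("world");
    println!("{} ({} bytes)", greeting, greeting.len());
}
```

The buffer is reallocated as needed while the string grows, so the caller never has to size it up front.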
A reference to a `String` will automatically coerce to a string slice:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

let s = "Hello".to_string();
takes_slice(&s);
```

You can also get a `&str` from a stack-allocated array of bytes:

```rust
use std::str;

let x: &[u8] = &[b'a', b'b'];
let stack_str: &str = str::from_utf8(x).unwrap();
```

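Because every `&str` must be valid UTF-8, `from_utf8` returns a `Result`, and `unwrap` will panic on invalid input. A sketch of handling the failure case instead, using a hypothetical helper named `decode_or_placeholder`:

```rust
// Hypothetical helper: returns the text if `bytes` is valid UTF-8,
// or a fixed placeholder otherwise.
fn decode_or_placeholder(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(text) => text,
        Err(_) => "<invalid UTF-8>",
    }
}

fn main() {
    let valid: &[u8] = &[b'a', b'b'];
    let invalid: &[u8] = &[0xff, 0xfe]; // 0xff never appears in valid UTF-8

    println!("{}", decode_or_placeholder(valid));   // ab
    println!("{}", decode_or_placeholder(invalid)); // <invalid UTF-8>
}
```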
## `String` vs. `&str`

In general, you should prefer `String` when you need ownership, and `&str` when
you just need to borrow a string. This is very similar to using `Vec<T>` vs. `&[T]`,
and `T` vs. `&T` in general.

This means starting off with a function parameter of type `&str`, and only
moving to a parameter of type `String` if you have good reason. It's not polite
to hold on to ownership you don't need, and it can make your lifetimes more
complex.

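The borrow-first guideline can be sketched with two hypothetical functions, `shout` and `with_suffix`. Both names are illustrative assumptions. The first only reads its argument, so it borrows; the second modifies and returns the data, which is one plausible reason to take ownership.

```rust
// Borrowing: the caller keeps ownership and can keep using the string.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Owning: the function consumes the string and hands back the modified value.
fn with_suffix(mut s: String) -> String {
    s.push_str("!");
    s
}

fn main() {
    let name = "steve".to_string();

    let loud = shout(&name); // `name` is only borrowed here
    println!("{} {}", name, loud);

    let done = with_suffix(name); // `name` is moved and unusable afterwards
    println!("{}", done);
}
```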
To write a function that's generic over types of strings, use `&str`.

```rust
fn some_string_length(x: &str) -> usize {
    x.len()
}

let s = "Hello, world";

println!("{}", some_string_length(s));

let s = "Hello, world".to_string();

println!("{}", some_string_length(&s));
```

Both of these lines will print `12`.

You may be tempted to try to access a certain character of a `String`, like
this:

```rust
let s = "hello".to_string();

println!("{}", s[0]); // error: a `String` cannot be indexed by an integer
```

This does not compile. This is on purpose. In the world of UTF-8, direct
indexing is basically never what you want to do. The reason is that each
character can be a variable number of bytes. This means that you have to iterate
through the characters anyway, which is an O(n) operation.

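If a character at a given position really is needed, the iteration has to be written out explicitly. A minimal sketch, assuming a hypothetical helper `nth_char`, which also shows how byte offsets and character positions drift apart once a multi-byte character appears:

```rust
// Hypothetical helper: returns the `n`-th Unicode scalar value, if any.
// It walks the string from the start, so the cost grows with `n`.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let s = "h\u{e9}llo"; // U+00E9 (é) takes two bytes in UTF-8

    println!("{:?}", nth_char(s, 2));  // Some('l')
    println!("{:?}", nth_char(s, 10)); // None

    // `char_indices` pairs each character with its starting byte offset.
    for (offset, c) in s.char_indices() {
        println!("{} -> {}", offset, c);
    }
}
```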
There are three basic levels of Unicode (and its encodings):

- code units, the underlying data type used to store everything
- code points/Unicode scalar values (`char`)
- graphemes (visible characters)

Rust provides iterators for each of these situations:

- `.bytes()` will iterate over the underlying bytes
- `.chars()` will iterate over the code points
- `.graphemes()` will iterate over each grapheme (in current Rust this method comes from the external `unicode-segmentation` crate rather than the standard library)

Usually, the `graphemes()` method on `&str` is what you want:

```rust
// `graphemes` comes from the `UnicodeSegmentation` trait of the external `unicode-segmentation` crate.
use unicode_segmentation::UnicodeSegmentation;
let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
for l in s.graphemes(true) {
    println!("{}", l);
}
```

Note that `l` has the type `&str` here, since a single grapheme can consist of
multiple codepoints, so a `char` wouldn't be appropriate.

This will print out each visible character in turn, as you'd expect: first `u͔`, then
`n͈̰̎`, etc. If you want each individual codepoint of each grapheme, you can use `.chars()`:

```rust
let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
for c in s.chars() { println!("{}", c); }
```

You can see how some of them are combining characters, and therefore the output
looks a bit odd.

If you want the individual byte representation of each codepoint, you can use
`.bytes()`:

```rust
let s = "u͔n͈̰̎i̙̮͚̦c͚̉o̼̩̰͗d͔̆̓ͥé";
for b in s.bytes() { println!("{}", b); }
```

Many more bytes than graphemes!

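The gap between the levels can be measured on a much smaller input. The sketch below assumes a hypothetical helper `byte_and_char_counts` and uses only the standard library, so it compares bytes with code points. The input is a single visible character built from a base letter plus one combining mark.

```rust
// Hypothetical helper: returns (number of bytes, number of code points).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.bytes().count(), s.chars().count())
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
    let s = "e\u{301}";
    let (bytes, chars) = byte_and_char_counts(s);
    println!("bytes: {}, code points: {}", bytes, chars); // bytes: 3, code points: 2
}
```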
References to `String`s will automatically coerce into `&str`s. Like this:

```rust
fn hello(s: &str) { println!("Hello, {}!", s); } // `hello` is an assumed name
let string = "Steve".to_string();
hello(&string);
```