src/doc/guide-ownership.md

   1 % The Rust Ownership Guide
   2
   3 This guide presents Rust's ownership system. This is one of Rust's most distinctive
   4 and compelling features, with which Rust developers should become quite
   5 acquainted. Ownership is how Rust achieves its largest goal, memory safety.
   6 The ownership system has a few distinct concepts: **ownership**, **borrowing**,
   7 and **lifetimes**. We'll talk about each one in turn.
   8
   9 # Meta
  10
  11 Before we get to the details, two important notes about the ownership system.
  12
  13 Rust has a focus on safety and speed. It accomplishes these goals through many
  14 "zero cost abstractions," which means that in Rust, abstractions cost as little
  15 as possible in order to make them work. The ownership system is a prime example
  16 of a zero cost abstraction. All of the analysis we'll talk about in this guide
  17 is _done at compile time_. You do not pay any run-time cost for any of these
  18 features.
  19
  20 However, this system does have a certain cost: a learning curve. Many new users
  21 to Rust experience something we like to call "fighting with the borrow
  22 checker," where the Rust compiler refuses to compile a program that the author
  23 thinks is valid. This often happens because the programmer's mental model of
  24 how ownership should work doesn't match the actual rules that Rust implements.
  25 You probably will experience similar things at first. There is good news,
  26 however: more experienced Rust developers report that once they work with the
  27 rules of the ownership system for a period of time, they fight the borrow
  28 checker less and less.
  29
  30 With that in mind, let's learn about ownership.
  31
  32 # Ownership
  33
  34 At its core, ownership is about 'resources.' For the purposes of the vast
  35 majority of this guide, we will talk about a specific resource: memory. The
  36 concept generalizes to any kind of resource, like a file handle, but to make it
  37 more concrete, we'll focus on memory.
  38
  39 When your program allocates some memory, it needs some way to deallocate that
  40 memory. Imagine a function `foo` that allocates four bytes of memory, and then
  41 never deallocates that memory. We call this problem 'leaking' memory, because
  42 each time we call `foo`, we're allocating another four bytes. Eventually, with
  43 enough calls to `foo`, we will run our system out of memory. That's no good. So
  44 we need some way for `foo` to deallocate those four bytes. It's also important
  45 that we don't deallocate too many times, either. Without getting into the
  46 details, attempting to deallocate memory multiple times can lead to problems.
  47 In other words, any time some memory is allocated, we need to make sure that we
  48 deallocate that memory once and only once. Too many times is bad, not enough
  49 times is bad. The counts must match.
  50
  51 There's one other important detail with regard to allocating memory. Whenever
  52 we request some amount of memory, what we are given is a handle to that memory.
  53 This handle (often called a 'pointer', when we're referring to memory) is how
  54 we interact with the allocated memory. As long as we have that handle, we can
  55 do something with the memory. Once we're done with the handle, we're also done
  56 with the memory, as we can't do anything useful without a handle to it.
  57
  58 Historically, systems programming languages require you to track these
  59 allocations, deallocations, and handles yourself. For example, if we want some
  60 memory from the heap in a language like C, we do this:
  61
  62 ```c
  63 {
  64     int *x = malloc(sizeof(int));
  65
  66     // we can now do stuff with our handle x
  67     *x = 5;
  68
  69     free(x);
  70 }
  71 ```
  72
  73 The call to `malloc` allocates some memory. The call to `free` deallocates the
  74 memory. There's also bookkeeping about allocating the correct amount of memory.
  75
  76 Rust combines these two aspects of allocating memory (and other resources) into
  77 a concept called 'ownership.' Whenever we request some memory, that handle we
  78 receive is called the 'owning handle.' Whenever that handle goes out of scope,
  79 Rust knows that you cannot do anything with the memory anymore, and so it
  80 deallocates the memory for you. Here's the equivalent example in
  81 Rust:
  82
  83 ```rust
  84 {
  85     let x = box 5i;
  86 }
  87 ```
  88
  89 The `box` keyword creates a `Box<T>` (specifically `Box<int>` in this case) by
  90 allocating a small segment of memory on the heap with enough space to fit an
  91 `int`. But where in the code is the box deallocated? We said before that we
  92 must have a deallocation for each allocation. Rust handles this for you. It
  93 knows that `x` is the owning handle for our box. Rust knows that
  94 `x` will go out of scope at the end of the block, and so it inserts a call to
  95 deallocate the memory at the end of the scope. Because the compiler does this
  96 for us, it's impossible to forget. We always have exactly one deallocation paired
  97 with each of our allocations.
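
As a rough illustration of the "exactly one deallocation" claim, here is a
sketch written in current stable Rust syntax (`Box::new` rather than the `box`
keyword used in this guide). The `Tracked` type and `count_drops` function are
hypothetical names introduced only for this example; `Tracked` bumps a counter
when it is dropped, so we can observe that the boxed value is cleaned up once,
at the end of the block:

```rust
use std::cell::Cell;

// Hypothetical type (not part of the guide) that records when it is dropped.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns how many times the boxed value was dropped.
fn count_drops() -> u32 {
    let drops = Cell::new(0);
    {
        // `_x` is the owning handle for the heap allocation.
        let _x = Box::new(Tracked { drops: &drops });
        // Nothing has been dropped while `_x` is still in scope.
        assert_eq!(drops.get(), 0);
    } // `_x` goes out of scope here: the value is dropped and the box freed.
    drops.get()
}

fn main() {
    println!("drops: {}", count_drops()); // prints "drops: 1"
}
```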
  98
  99 This is pretty straightforward, but what happens when we want to pass our box
 100 to a function? Let's look at some code:
 101
 102 ```rust
 103 fn main() {
 104     let x = box 5i;
 105
 106     add_one(x);
 107 }
 108
 109 fn add_one(mut num: Box<int>) {
 110     *num += 1;
 111 }
 112 ```
 113
 114 This code works, but it's not ideal. For example, let's add one more line of
 115 code, where we print out the value of `x`:
 116
 117 ```{rust,ignore}
 118 fn main() {
 119     let x = box 5i;
 120
 121     add_one(x);
 122
 123     println!("{}", x);
 124 }
 125
 126 fn add_one(mut num: Box<int>) {
 127     *num += 1;
 128 }
 129 ```
 130
 131 This does not compile, and gives us an error:
 132
 133 ```text
 134 error: use of moved value: `x`
 135    println!("{}", x);
 136                   ^
 137 ```
 138
 139 Remember, we need one deallocation for every allocation. When we try to pass
 140 our box to `add_one`, we would have two handles to the memory: `x` in `main`,
 141 and `num` in `add_one`. If we deallocated the memory when each handle went out
 142 of scope, we would have two deallocations and one allocation, and that's wrong.
 143 So when we call `add_one`, Rust defines `num` as the owner of the handle. And
 144 so, now that we've given ownership to `num`, `x` is invalid. `x`'s value has
 145 "moved" from `x` to `num`. Hence the error: use of moved value `x`.
 146
 147 To fix this, we can have `add_one` give ownership back when it's done with the
 148 box:
 149
 150 ```rust
 151 fn main() {
 152     let x = box 5i;
 153
 154     let y = add_one(x);
 155
 156     println!("{}", y);
 157 }
 158
 159 fn add_one(mut num: Box<int>) -> Box<int> {
 160     *num += 1;
 161
 162     num
 163 }
 164 ```
 165
 166 This code will compile and run just fine. Now, we return a `Box`, and so the
 167 ownership is transferred back to `y` in `main`. We only have ownership for the
 168 duration of our function before giving it back. This pattern is very common,
 169 and so Rust introduces a concept to describe a handle which temporarily refers
 170 to something another handle owns. It's called "borrowing," and it's done with
 171 "references", designated by the `&` symbol.
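
For readers following along with a current compiler, the hand-off described
above can be sketched in today's syntax. This assumes `Box::new` and `i32` in
place of the `box` keyword and `int` type used in this guide; the behavior
being illustrated (ownership moves in, then moves back out) is the same:

```rust
// Takes ownership of the box, increments the value, and hands ownership back.
fn add_one(mut num: Box<i32>) -> Box<i32> {
    *num += 1;
    num
}

fn main() {
    let x = Box::new(5);

    // `x` is moved into `add_one`; the returned box is now owned by `y`.
    let y = add_one(x);

    // println!("{}", x); // would not compile: `x` has been moved

    println!("{}", y); // prints "6"
}
```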
 172
 173 # Borrowing
 174
 175 Here's the current state of our `add_one` function:
 176
 177 ```rust
 178 fn add_one(mut num: Box<int>) -> Box<int> {
 179     *num += 1;
 180
 181     num
 182 }
 183 ```
 184
 185 This function takes ownership, because it takes a `Box`, which owns its
 186 contents. But then we give ownership right back.
 187
 188 In the physical world, you can give one of your possessions to someone for a
 189 short period of time. You still own your possession, you're just letting someone
 190 else use it for a while. We call that 'lending' something to someone, and that
 191 person is said to be 'borrowing' that something from you.
 192
 193 Rust's ownership system also allows an owner to lend out a handle for a limited
 194 period. This is also called 'borrowing.' Here's a version of `add_one` which
 195 borrows its argument rather than taking ownership:
 196
 197 ```rust
 198 fn add_one(num: &mut int) {
 199     *num += 1;
 200 }
 201 ```
 202
 203 This function borrows an `int` from its caller, and then increments it. When
 204 the function is over, and `num` goes out of scope, the borrow is over.
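
Here is a small sketch of what the caller's side might look like, again in
current syntax (`i32` instead of `int`). Because `add_one` only borrows `x`,
the caller still owns it after each call and can keep using it:

```rust
// Borrows an integer mutably and increments it in place.
fn add_one(num: &mut i32) {
    *num += 1;
}

fn main() {
    let mut x = 5;

    add_one(&mut x); // the borrow lasts only for the duration of the call
    add_one(&mut x); // so we are free to lend `x` out again

    println!("{}", x); // prints "7"
}
```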
 205
 206 # Lifetimes
 207
 208 Lending out a reference to a resource that someone else owns can be
 209 complicated, however. For example, imagine this set of operations:
 210
 211 1. I acquire a handle to some kind of resource.
 212 2. I lend you a reference to the resource.
 213 3. I decide I'm done with the resource, and deallocate it, while you still have
 214    your reference.
 215 4. You decide to use the resource.
 216
 217 Uh oh! Your reference is pointing to an invalid resource. This is called a
 218 "dangling pointer" or "use after free," when the resource is memory.
 219
 220 To fix this, we have to make sure that step four never happens after step
 221 three. The ownership system in Rust does this through a concept called
 222 "lifetimes," which describe the scope that a reference is valid for.
 223
 224 Let's look at a function which borrows an `int` again, this time immutably:
 225
 226 ```rust
 227 fn add_one(num: &int) -> int {
 228     *num + 1
 229 }
 230 ```
 231
 232 Rust has a feature called 'lifetime elision,' which allows you to not write
 233 lifetime annotations in certain circumstances. This is one of them. We will
 234 cover the others later. Without eliding the lifetimes, `add_one` looks like
 235 this:
 236
 237 ```rust
 238 fn add_one<'a>(num: &'a int) -> int {
 239     *num + 1
 240 }
 241 ```
 242
 243 The `'a` is called a **lifetime**. Most lifetimes are used in places where
 244 short names like `'a`, `'b` and `'c` are clearest, but it's often useful to
 245 have more descriptive names. Let's dig into the syntax in a bit more detail:
 246
 247 ```{rust,ignore}
 248 fn add_one<'a>(...)
 249 ```
 250
 251 This part _declares_ our lifetimes. This says that `add_one` has one lifetime,
 252 `'a`. If we had two, it would look like this:
 253
 254 ```{rust,ignore}
 255 fn add_two<'a, 'b>(...)
 256 ```
 257
 258 Then in our parameter list, we use the lifetimes we've named:
 259
 260 ```{rust,ignore}
 261 ...(num: &'a int) -> ...
 262 ```
 263
 264 If you compare `&int` to `&'a int`, they're the same; it's just that the
 265 lifetime `'a` has snuck in between the `&` and the `int`. We read `&int` as "a
 266 reference to an int" and `&'a int` as "a reference to an int with the lifetime `'a`."
 267
 268 Why do lifetimes matter? Well, for example, here's some code:
 269
 270 ```rust
 271 struct Foo<'a> {
 272     x: &'a int,
 273 }
 274
 275 fn main() {
 276     let y = &5i; // this is the same as `let _y = 5; let y = &_y;`
 277     let f = Foo { x: y };
 278
 279     println!("{}", f.x);
 280 }
 281 ```
 282
 283 As you can see, `struct`s can also have lifetimes. In a similar way to functions,
 284
 285 ```{rust}
 286 struct Foo<'a> {
 287 # x: &'a int,
 288 # }
 289 ```
 290
 291 declares a lifetime, and
 292
 293 ```rust
 294 # struct Foo<'a> {
 295 x: &'a int,
 296 # }
 297 ```
 298
 299 uses it. So why do we need a lifetime here? We need to ensure that any reference
 300 to a `Foo` cannot outlive the reference to an `int` it contains.
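
One way to see what the lifetime parameter buys us is the sketch below, written
in current syntax (`i32` instead of `int`). The helper `inner` is a
hypothetical function added for illustration. Its signature says the returned
reference lives as long as `'a`, the lifetime of the borrowed integer, rather
than as long as the `Foo` it was read from:

```rust
struct Foo<'a> {
    x: &'a i32,
}

// Hypothetical helper: the result is tied to `'a` (the integer's lifetime),
// not to the shorter borrow of the `Foo` itself.
fn inner<'a>(f: &Foo<'a>) -> &'a i32 {
    f.x
}

fn main() {
    let value = 5;
    let r;
    {
        let f = Foo { x: &value };
        r = inner(&f);
    } // `f` is gone here, but `value` is still alive, so `r` stays valid.

    println!("{}", r); // prints "5"
}
```

The compiler accepts this because `r` never outlives `value`. If `value` were
declared inside the inner block instead, the same code would be rejected.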
 301
 302 ## Thinking in scopes
 303
 304 A way to think about lifetimes is to visualize the scope that a reference is
 305 valid for. For example:
 306
 307 ```rust
 308 fn main() {
 309     let y = &5i;    // -+ y goes into scope
 310                     //  |
 311     // stuff        //  |
 312                     //  |
 313 }                   // -+ y goes out of scope
 314 ```
 315
 316 Adding in our `Foo`:
 317
 318 ```rust
 319 struct Foo<'a> {
 320     x: &'a int,
 321 }
 322
 323 fn main() {
 324     let y = &5i;          // -+ y goes into scope
 325     let f = Foo { x: y }; // -+ f goes into scope
 326     // stuff              //  |
 327                           //  |
 328 }                         // -+ f and y go out of scope
 329 ```
 330
 331 Our `f` lives within the scope of `y`, so everything works. What if it didn't?
 332 This code won't work:
 333
 334 ```{rust,ignore}
 335 struct Foo<'a> {
 336     x: &'a int,
 337 }
 338
 339 fn main() {
 340     let x;                    // -+ x goes into scope
 341                               //  |
 342     {                         //  |
 343         let y = &5i;          // ---+ y goes into scope
 344         let f = Foo { x: y }; // ---+ f goes into scope
 345         x = &f.x;             //  | | error here
 346     }                         // ---+ f and y go out of scope
 347                               //  |
 348     println!("{}", x);        //  |
 349 }                             // -+ x goes out of scope
 350 ```
 351
 352 Whew! As you can see here, the scopes of `f` and `y` are smaller than the scope
 353 of `x`. But when we do `x = &f.x`, we make `x` a reference to something that's
 354 about to go out of scope.
 355
 356 Named lifetimes are a way of giving these scopes a name. Giving something a
 357 name is the first step towards being able to talk about it.
 358
 359 ## 'static
 360
 361 The lifetime named `'static` is a special lifetime. It signals that something
 362 has the lifetime of the entire program. Most Rust programmers first come across
 363 `'static` when dealing with strings:
 364
 365 ```rust
 366 let x: &'static str = "Hello, world.";
 367 ```
 368
 369 String literals have the type `&'static str` because the reference is always
 370 alive: they are baked into the data segment of the final binary. Another
 371 example is globals:
 372
 373 ```rust
 374 static FOO: int = 5i;
 375 let x: &'static int = &FOO;
 376 ```
 377
 378 This adds an `int` to the data segment of the binary, and `x` is a reference to
 379 it.
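
Because `'static` data lives for the whole program, references to it can be
returned from functions that take no inputs at all. The sketch below uses
current syntax (`i32` instead of `int`); `greeting` and `foo_ref` are
hypothetical names chosen for this example:

```rust
static FOO: i32 = 5;

// A string literal is stored in the binary, so it can be returned as `'static`.
fn greeting() -> &'static str {
    "Hello, world."
}

// A reference to a `static` item is also valid for the entire program.
fn foo_ref() -> &'static i32 {
    &FOO
}

fn main() {
    let s = greeting();
    let n = foo_ref();
    println!("{} {}", s, n); // prints "Hello, world. 5"
}
```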
 380
 381 # Shared Ownership
 382
 383 In all the examples we've considered so far, we've assumed that each handle has
 384 a single owner. But sometimes, this doesn't work. Consider a car. Cars have
 385 four wheels. We would want a wheel to know which car it was attached to. But
 386 this won't work:
 387
 388 ```{rust,ignore}
 389 struct Car {
 390     name: String,
 391 }
 392
 393 struct Wheel {
 394     size: int,
 395     owner: Car,
 396 }
 397
 398 fn main() {
 399     let car = Car { name: "DeLorean".to_string() };
 400
 401     for _ in range(0u, 4) {
 402         Wheel { size: 360, owner: car };
 403     }
 404 }
 405 ```
 406
 407 We try to make four `Wheel`s, each with a `Car` that it's attached to. But the
 408 compiler knows that on the second iteration of the loop, there's a problem:
 409
 410 ```text
 411 error: use of moved value: `car`
 412     Wheel { size: 360, owner: car };
 413                               ^~~
 414 note: `car` moved here because it has type `Car`, which is non-copyable
 415     Wheel { size: 360, owner: car };
 416                               ^~~
 417 ```
 418
 419 We need our `Car` to be pointed to by multiple `Wheel`s. We can't do that with
 420 `Box<T>`, because it has a single owner. We can do it with `Rc<T>` instead:
 421
 422 ```rust
 423 use std::rc::Rc;
 424
 425 struct Car {
 426     name: String,
 427 }
 428
 429 struct Wheel {
 430     size: int,
 431     owner: Rc<Car>,
 432 }
 433
 434 fn main() {
 435     let car = Car { name: "DeLorean".to_string() };
 436
 437     let car_owner = Rc::new(car);
 438
 439     for _ in range(0u, 4) {
 440         Wheel { size: 360, owner: car_owner.clone() };
 441     }
 442 }
 443 ```
 444
 445 We wrap our `Car` in an `Rc<T>`, getting an `Rc<Car>`, and then use the
 446 `clone()` method to make new references. We've also changed our `Wheel` to have
 447 an `Rc<Car>` rather than just a `Car`.
 448
 449 This is the simplest kind of multiple ownership possible. There are other
 450 kinds, too: `Arc<T>`, for example, uses more expensive atomic instructions to
 451 be the thread-safe counterpart of `Rc<T>`.
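
To make the sharing visible, here is a sketch in current syntax (`i32`, `0..4`
ranges, `Rc::clone`) that keeps the wheels around and asks the `Rc` how many
owners it has. `build_wheels` is a hypothetical helper introduced for this
example; `Rc::strong_count` is the standard library function that reports the
number of owning handles:

```rust
use std::rc::Rc;

struct Car {
    name: String,
}

struct Wheel {
    size: i32,
    owner: Rc<Car>,
}

// Hypothetical helper: builds four wheels that all share one `Car`.
fn build_wheels(car: Car) -> Vec<Wheel> {
    let car_owner = Rc::new(car);
    (0..4)
        .map(|_| Wheel { size: 360, owner: Rc::clone(&car_owner) })
        .collect()
    // `car_owner` is dropped here; the four wheels keep the car alive.
}

fn main() {
    let wheels = build_wheels(Car { name: "DeLorean".to_string() });

    println!("wheel size: {}", wheels[0].size);
    println!("car: {}", wheels[0].owner.name);
    println!("owners: {}", Rc::strong_count(&wheels[0].owner)); // prints "owners: 4"
}
```

The `Car` is deallocated only when the last of the four wheels is dropped.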
 452
 453 ## Lifetime Elision
 454
 455 Earlier, we mentioned 'lifetime elision,' a feature of Rust which allows you to
 456 not write lifetime annotations in certain circumstances. All references have a
 457 lifetime, and so if you elide a lifetime (like `&T` instead of `&'a T`), Rust
 458 will do three things to determine what those lifetimes should be.
 459
 460 When talking about lifetime elision, we use the terms 'input lifetime' and
 461 'output lifetime'. An 'input lifetime' is a lifetime associated with a parameter
 462 of a function, and an 'output lifetime' is a lifetime associated with the return
 463 value of a function. For example, this function has an input lifetime:
 464
 465 ```{rust,ignore}
 466 fn foo<'a>(bar: &'a str)
 467 ```
 468
 469 This one has an output lifetime:
 470
 471 ```{rust,ignore}
 472 fn foo<'a>() -> &'a str
 473 ```
 474
 475 This one has a lifetime in both positions:
 476
 477 ```{rust,ignore}
 478 fn foo<'a>(bar: &'a str) -> &'a str
 479 ```
 480
 481 Here are the three rules:
 482
 483 * Each elided lifetime in a function's arguments becomes a distinct lifetime
 484   parameter.
 485
 486 * If there is exactly one input lifetime, elided or not, that lifetime is
 487   assigned to all elided lifetimes in the return values of that function.
 488
 489 * If there are multiple input lifetimes, but one of them is `&self` or `&mut
 490   self`, the lifetime of `self` is assigned to all elided output lifetimes.
 491
 492 Otherwise, it is an error to elide an output lifetime.
 493
 494 ### Examples
 495
 496 Here are some examples of functions with elided lifetimes, and what the
 497 elided lifetimes expand to:
 498
 499 ```{rust,ignore}
 500 fn print(s: &str);                                      // elided
 501 fn print<'a>(s: &'a str);                               // expanded
 502
 503 fn debug(lvl: uint, s: &str);                           // elided
 504 fn debug<'a>(lvl: uint, s: &'a str);                    // expanded
 505
 506 // In the preceding example, `lvl` doesn't need a lifetime because it's not a
 507 // reference (`&`). Only things relating to references (such as a `struct`
 508 // which contains a reference) need lifetimes.
 509
 510 fn substr(s: &str, until: uint) -> &str;                // elided
 511 fn substr<'a>(s: &'a str, until: uint) -> &'a str;      // expanded
 512
 513 fn get_str() -> &str;                                   // ILLEGAL, no inputs
 514
 515 fn frob(s: &str, t: &str) -> &str;                      // ILLEGAL, two inputs
 516
 517 fn get_mut(&mut self) -> &mut T;                        // elided
 518 fn get_mut<'a>(&'a mut self) -> &'a mut T;              // expanded
 519
 520 fn args<T:ToCStr>(&mut self, args: &[T]) -> &mut Command;                  // elided
 521 fn args<'a, 'b, T:ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command; // expanded
 522
 523 fn new(buf: &mut [u8]) -> BufWriter;                    // elided
 524 fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a>;         // expanded
 525 ```
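
The signatures above are declarations only. As a runnable sketch in current
syntax (`usize` instead of `uint`), the following gives `substr` a body in both
its elided and expanded forms, and adds a hypothetical `Holder` type with a
`get` method to exercise the `&self` rule. The bodies are assumptions made for
illustration; only the signatures of `substr` come from the list above:

```rust
// One input lifetime, so it is assigned to the elided output lifetime.
// Note: slicing panics if `until` is out of range or not on a char boundary.
fn substr(s: &str, until: usize) -> &str {
    &s[..until]
}

// The same function with the lifetime written out explicitly.
fn substr_explicit<'a>(s: &'a str, until: usize) -> &'a str {
    &s[..until]
}

// Hypothetical type used to demonstrate the `&self` rule.
struct Holder {
    text: String,
}

impl Holder {
    // Two input lifetimes, but one belongs to `&self`, so the output
    // borrows from `self` and not from `_other`.
    fn get(&self, _other: &str) -> &str {
        &self.text
    }
}

fn main() {
    println!("{}", substr("hello world", 5)); // prints "hello"
    println!("{}", substr_explicit("hello world", 5)); // prints "hello"

    let h = Holder { text: "held".to_string() };
    println!("{}", h.get("ignored")); // prints "held"
}
```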
 526
 527 # Related Resources
 528
 529 Coming Soon.