src/doc/guide-lifetimes.md

   1 % The Rust References and Lifetimes Guide
   2
   3 # Introduction
   4
   5 References are one of the more flexible and powerful tools available in
   6 Rust. They can point anywhere: into the heap, stack, and even into the
   7 interior of another data structure. A reference is as flexible as a C pointer
   8 or C++ reference.
   9
  10 Unlike C and C++ compilers, the Rust compiler includes special static
  11 checks that ensure that programs use references safely.
  12
  13 Despite their complete safety, a reference's representation at runtime
  14 is the same as that of an ordinary pointer in a C program. They introduce zero
  15 overhead. The compiler does all safety checks at compile time.
  16
  17 Although references have rather elaborate theoretical underpinnings
  18 (e.g. region pointers), the core concepts will be familiar to anyone
  19 who has worked with C or C++. The best way to explain how they are
  20 used—and their limitations—is probably just to work through several examples.
  21
  22 # By example
  23
  24 References, sometimes known as *borrowed pointers*, are only valid for
  25 a limited duration. References never claim any kind of ownership
  26 over the data that they point to. Instead, they are used for cases
  27 where you would like to use data for a short time.
  28
  29 Consider a simple struct type `Point`:
  30
  31 ~~~
  32 struct Point {x: f64, y: f64}
  33 ~~~
  34
  35 We can use this simple definition to allocate points in many different ways. For
  36 example, in this code, each of these local variables contains a point,
  37 but allocated in a different place:
  38
  39 ~~~
  40 # struct Point {x: f64, y: f64}
  41 let on_the_stack : Point      =     Point {x: 3.0, y: 4.0};
  42 let on_the_heap  : Box<Point> = box Point {x: 7.0, y: 9.0};
  43 ~~~
  44
  45 Suppose we wanted to write a procedure that computed the distance between any
  46 two points, no matter where they were stored. One option is to define a function
  47 that takes two arguments of type `Point`—that is, it takes the points by value.
  48 But if we define it this way, calling the function will cause the points to be
  49 copied. For points, this is probably not so bad, but often copies are
  50 expensive. So we'd like to define a function that takes the points just as
  51 a reference.
  52
  53 ~~~
  54 # struct Point {x: f64, y: f64}
  55 # fn sqrt(f: f64) -> f64 { 0.0 }
  56 fn compute_distance(p1: &Point, p2: &Point) -> f64 {
  57     let x_d = p1.x - p2.x;
  58     let y_d = p1.y - p2.y;
  59     sqrt(x_d * x_d + y_d * y_d)
  60 }
  61 ~~~
  62
  63 Now we can call `compute_distance()`:
  64
  65 ~~~
  66 # struct Point {x: f64, y: f64}
  67 # let on_the_stack :     Point  =     Point{x: 3.0, y: 4.0};
  68 # let on_the_heap  : Box<Point> = box Point{x: 7.0, y: 9.0};
  69 # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 }
  70 compute_distance(&on_the_stack, on_the_heap);
  71 ~~~
  72
  73 Here, the `&` operator takes the address of the variable
  74 `on_the_stack`; this is because `on_the_stack` has the type `Point`
  75 (that is, a struct value) and we have to take its address to get a
   76 reference. We also call this _borrowing_ the local variable
  77 `on_the_stack`, because we have created an alias: that is, another
  78 name for the same data.
  79
  80 In the case of `on_the_heap`, however, no explicit action is necessary.
   81 The compiler will automatically convert a box of type `Box<Point>` to a reference of type `&Point`.
  82 This is another form of borrowing; in this case, the contents of the owned box
  83 are being lent out.
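
As a rough sketch of the two kinds of borrow described above, here is the same call restated in current stable Rust. That choice of toolchain is our assumption, since the rest of this guide uses the older `box` syntax: `Box::new` stands in for `box`, the method `f64::sqrt` replaces the stubbed `sqrt()`, and `distance_stack_to_heap()` is a hypothetical helper introduced only for illustration. In current Rust the box is borrowed with an explicit `&`, and the compiler then coerces the resulting `&Box<Point>` to `&Point`.

```rust
struct Point { x: f64, y: f64 }

fn compute_distance(p1: &Point, p2: &Point) -> f64 {
    let x_d = p1.x - p2.x;
    let y_d = p1.y - p2.y;
    (x_d * x_d + y_d * y_d).sqrt()
}

// Hypothetical helper: borrows one point from the stack and one from the heap.
fn distance_stack_to_heap() -> f64 {
    let on_the_stack: Point = Point { x: 3.0, y: 4.0 };
    let on_the_heap: Box<Point> = Box::new(Point { x: 7.0, y: 9.0 });

    // `&on_the_stack` borrows the local variable directly.
    // `&on_the_heap` is a `&Box<Point>`, which is coerced to `&Point`.
    compute_distance(&on_the_stack, &on_the_heap)
}
```

Neither point is copied or moved by the call, so both variables remain usable afterwards.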
  84
  85 Whenever a caller lends data to a callee, there are some limitations on what
  86 the caller can do with the original. For example, if the contents of a
  87 variable have been lent out, you cannot send that variable to another task. In
  88 addition, the compiler will reject any code that might cause the borrowed
  89 value to be freed or overwrite its component fields with values of different
   90 types (we'll get into what kinds of actions those are shortly). This rule
  91 should make intuitive sense: you must wait for a borrower to return the value
  92 that you lent it (that is, wait for the reference to go out of scope)
  93 before you can make full use of it again.
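
The lending rule can be sketched as follows, assuming current stable Rust and a `Vec<i32>` as the lent value (both are our choices for illustration, and `lend_then_reuse()` is a hypothetical name). While the reference `first` is live, the owner may not modify the vector; once the reference goes out of scope, the owner regains full use of it.

```rust
// Hypothetical helper: lends `data` out briefly, then uses it fully again.
fn lend_then_reuse() -> Vec<i32> {
    let mut data = vec![1, 2, 3];
    {
        let first = &data[0]; // `data` is lent out from here...
        // data.push(4);      // ...so this line would be rejected if uncommented.
        assert_eq!(*first, 1);
    } // ...until the reference goes out of scope here.

    data.push(4); // The owner can modify the value again.
    data          // Moving the value out is allowed as well.
}
```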
  94
  95 # Other uses for the & operator
  96
  97 In the previous example, the value `on_the_stack` was defined like so:
  98
  99 ~~~
 100 # struct Point {x: f64, y: f64}
 101 let on_the_stack: Point = Point {x: 3.0, y: 4.0};
 102 ~~~
 103
 104 This declaration means that code can only pass `Point` by value to other
 105 functions. As a consequence, we had to explicitly take the address of
  106 `on_the_stack` to get a reference. Sometimes, however, it is more
  107 convenient to move the `&` operator into the definition of `on_the_stack`:
 108
 109 ~~~
 110 # struct Point {x: f64, y: f64}
 111 let on_the_stack2: &Point = &Point {x: 3.0, y: 4.0};
 112 ~~~
 113
 114 Applying `&` to an rvalue (non-assignable location) is just a convenient
 115 shorthand for creating a temporary and taking its address. A more verbose
 116 way to write the same code is:
 117
 118 ~~~
 119 # struct Point {x: f64, y: f64}
 120 let tmp = Point {x: 3.0, y: 4.0};
 121 let on_the_stack2 : &Point = &tmp;
 122 ~~~
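
To check that the two spellings behave alike, here is a small sketch in current stable Rust (an assumption on our part; the syntax of these two `let` forms happens to be unchanged). The helper `shorthand_matches_verbose()` and the name `on_the_stack3` are hypothetical.

```rust
struct Point { x: f64, y: f64 }

// Hypothetical helper comparing the shorthand with the verbose spelling.
fn shorthand_matches_verbose() -> bool {
    // Shorthand: borrow a temporary directly.
    let on_the_stack2: &Point = &Point { x: 3.0, y: 4.0 };

    // Verbose: name the temporary, then take its address.
    let tmp = Point { x: 3.0, y: 4.0 };
    let on_the_stack3: &Point = &tmp;

    on_the_stack2.x == on_the_stack3.x && on_the_stack2.y == on_the_stack3.y
}
```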
 123
 124 # Taking the address of fields
 125
 126 The `&` operator is not limited to taking the address of
 127 local variables. It can also take the address of fields or
 128 individual array elements. For example, consider this type definition
 129 for `Rectangle`:
 130
 131 ~~~
 132 struct Point {x: f64, y: f64} // as before
  133 struct Size {w: f64, h: f64} // new: width and height
 134 struct Rectangle {origin: Point, size: Size}
 135 ~~~
 136
 137 Now, as before, we can define rectangles in a few different ways:
 138
 139 ~~~
 140 # struct Point {x: f64, y: f64}
 141 # struct Size {w: f64, h: f64} // as before
 142 # struct Rectangle {origin: Point, size: Size}
 143 let rect_stack   =    &Rectangle {origin: Point {x: 1.0, y: 2.0},
 144                                   size: Size {w: 3.0, h: 4.0}};
 145 let rect_heap    = box Rectangle {origin: Point {x: 5.0, y: 6.0},
 146                                   size: Size {w: 3.0, h: 4.0}};
 147 ~~~
 148
  149 In each case, we can borrow individual subcomponents with the `&`
  150 operator. For example, we could write:
 151
 152 ~~~
 153 # struct Point {x: f64, y: f64} // as before
 154 # struct Size {w: f64, h: f64} // as before
 155 # struct Rectangle {origin: Point, size: Size}
 156 # let rect_stack  = &Rectangle {origin: Point {x: 1.0, y: 2.0}, size: Size {w: 3.0, h: 4.0}};
 157 # let rect_heap   = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}};
 158 # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 }
 159 compute_distance(&rect_stack.origin, &rect_heap.origin);
 160 ~~~
 161
 162 which would borrow the field `origin` from the rectangle on the stack
 163 as well as from the owned box, and then compute the distance between them.
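
A self-contained version of this field borrow might look like the following sketch. It assumes current stable Rust, so `Box::new` replaces `box` and `f64::sqrt` replaces the stubbed `sqrt()`; `distance_between_origins()` is a hypothetical helper.

```rust
struct Point { x: f64, y: f64 }
struct Size { w: f64, h: f64 }
struct Rectangle { origin: Point, size: Size }

fn compute_distance(p1: &Point, p2: &Point) -> f64 {
    let x_d = p1.x - p2.x;
    let y_d = p1.y - p2.y;
    (x_d * x_d + y_d * y_d).sqrt()
}

// Hypothetical helper: borrows only the `origin` field of each rectangle.
fn distance_between_origins() -> f64 {
    let rect_stack = &Rectangle {
        origin: Point { x: 1.0, y: 2.0 },
        size: Size { w: 3.0, h: 4.0 },
    };
    let rect_heap = Box::new(Rectangle {
        origin: Point { x: 5.0, y: 6.0 },
        size: Size { w: 3.0, h: 4.0 },
    });

    // Each argument points into the interior of a larger structure.
    compute_distance(&rect_stack.origin, &rect_heap.origin)
}
```

The `size` fields are never read here, so a current compiler is likely to emit an unused-field warning; that is harmless for the purposes of the sketch.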
 164
 165 # Lifetimes
 166
 167 We’ve seen a few examples of borrowing data. To this point, we’ve glossed
 168 over issues of safety. As stated in the introduction, at runtime a reference
 169 is simply a pointer, nothing more. Therefore, avoiding C's problems with
 170 dangling pointers requires a compile-time safety check.
 171
 172 The basis for the check is the notion of _lifetimes_. A lifetime is a
 173 static approximation of the span of execution during which the pointer
 174 is valid: it always corresponds to some expression or block within the
 175 program.
 176
 177 The compiler will only allow a borrow *if it can guarantee that the data will
 178 not be reassigned or moved for the lifetime of the pointer*. This does not
 179 necessarily mean that the data is stored in immutable memory. For example,
 180 the following function is legal:
 181
 182 ~~~
 183 # fn some_condition() -> bool { true }
 184 # struct Foo { f: int }
 185 fn example3() -> int {
 186     let mut x = box Foo {f: 3};
 187     if some_condition() {
 188         let y = &x.f;      // -+ L
 189         return *y;         //  |
 190     }                      // -+
 191     x = box Foo {f: 4};
 192     // ...
 193 # return 0;
 194 }
 195 ~~~
 196
 197 Here, the interior of the variable `x` is being borrowed
 198 and `x` is declared as mutable. However, the compiler can prove that
 199 `x` is not assigned anywhere in the lifetime L of the variable
 200 `y`. Therefore, it accepts the function, even though `x` is mutable
 201 and in fact is mutated later in the function.
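
For readers who want to run this, here is a sketch of the same function in current stable Rust (our assumption): `i32` replaces `int`, `Box::new` replaces `box`, and the hidden `some_condition()` stub becomes a hypothetical `cond` parameter so that both paths can be exercised.

```rust
struct Foo { f: i32 }

fn example3(cond: bool) -> i32 {
    let mut x = Box::new(Foo { f: 3 });
    if cond {
        let y = &x.f; // the borrow of `x.f` starts here
        return *y;    // and ends with this block
    }
    // No borrow is live here, so reassigning `x` is accepted.
    x = Box::new(Foo { f: 4 });
    x.f
}
```

Calling `example3(true)` takes the borrowing path and `example3(false)` takes the reassignment path.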
 202
 203 It may not be clear why we are so concerned about mutating a borrowed
 204 variable. The reason is that the runtime system frees any box
 205 _as soon as its owning reference changes or goes out of
 206 scope_. Therefore, a program like this is illegal (and would be
 207 rejected by the compiler):
 208
 209 ~~~ {.ignore}
 210 fn example3() -> int {
  211     let mut x = box Foo {f: 3};
  212     let y = &x.f;
  213     x = box Foo {f: 4};  // Error reported here.
 214     *y
 215 }
 216 ~~~
 217
 218 To make this clearer, consider this diagram showing the state of
 219 memory immediately before the re-assignment of `x`:
 220
 221 ~~~ {.text}
 222     Stack               Exchange Heap
 223
 224   x +-------------+
 225     | box {f:int} | ----+
 226   y +-------------+     |
 227     | &int        | ----+
 228     +-------------+     |    +---------+
 229                         +--> |  f: 3   |
 230                              +---------+
 231 ~~~
 232
 233 Once the reassignment occurs, the memory will look like this:
 234
 235 ~~~ {.text}
 236     Stack               Exchange Heap
 237
 238   x +-------------+          +---------+
 239     | box {f:int} | -------> |  f: 4   |
 240   y +-------------+          +---------+
 241     | &int        | ----+
 242     +-------------+     |    +---------+
 243                         +--> | (freed) |
 244                              +---------+
 245 ~~~
 246
  247 Here you can see that the variable `y` still points at the `f` field
  248 of the old `Foo`, whose memory has been freed.
 249
 250 In fact, the compiler can apply the same kind of reasoning to any
 251 memory that is (uniquely) owned by the stack frame. So we could
 252 modify the previous example to introduce additional owned pointers
 253 and structs, and the compiler will still be able to detect possible
 254 mutations. This time, we'll use an analogy to illustrate the concept.
 255
 256 ~~~ {.ignore}
 257 fn example3() -> int {
 258     struct House { owner: Box<Person> }
 259     struct Person { age: int }
 260
 261     let mut house = box House {
 262         owner: box Person {age: 30}
 263     };
 264
 265     let owner_age = &house.owner.age;
 266     house = box House {owner: box Person {age: 40}};  // Error reported here.
 267     house.owner = box Person {age: 50};               // Error reported here.
 268     *owner_age
 269 }
 270 ~~~
 271
 272 In this case, two errors are reported, one when the variable `house` is
 273 modified and another when `house.owner` is modified. Either modification would
 274 invalidate the pointer `owner_age`.
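
One way to make such code acceptable is to finish using the borrowed value before modifying its owner. The sketch below assumes current stable Rust, where the compiler treats a borrow as finished after its last use; that is a property of today's toolchain and not something this guide promises. The helper name `read_then_replace_owner()` is hypothetical.

```rust
struct Person { age: i32 }
struct House { owner: Box<Person> }

// Hypothetical helper: reads through the borrow first, then replaces the owner.
fn read_then_replace_owner() -> (i32, i32) {
    let mut house = Box::new(House { owner: Box::new(Person { age: 30 }) });

    let owner_age = &house.owner.age;
    let age_before = *owner_age; // last use of the borrow

    // `owner_age` is not used again, so this modification is accepted.
    house.owner = Box::new(Person { age: 50 });

    (age_before, house.owner.age)
}
```

Using `*owner_age` after the assignment would bring back the error described above.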
 275
 276 # Borrowing and enums
 277
 278 The previous example showed that the type system forbids any mutations
 279 of owned boxed values while they are being borrowed. In general, the type
 280 system also forbids borrowing a value as mutable if it is already being
  281 borrowed, either as a mutable reference or an immutable one. This restriction
 282 prevents pointers from pointing into freed memory. There is one other
 283 case where the compiler must be very careful to ensure that pointers
 284 remain valid: pointers into the interior of an `enum`.
 285
  286 Let’s look at the following `Shape` type that can represent both rectangles
 287 and circles:
 288
 289 ~~~
 290 struct Point {x: f64, y: f64}; // as before
 291 struct Size {w: f64, h: f64}; // as before
 292 enum Shape {
 293     Circle(Point, f64),   // origin, radius
 294     Rectangle(Point, Size)  // upper-left, dimensions
 295 }
 296 ~~~
 297
 298 Now we might write a function to compute the area of a shape. This
 299 function takes a reference to a shape, to avoid the need for
 300 copying.
 301
 302 ~~~
 303 # struct Point {x: f64, y: f64}; // as before
 304 # struct Size {w: f64, h: f64}; // as before
 305 # enum Shape {
 306 #     Circle(Point, f64),   // origin, radius
 307 #     Rectangle(Point, Size)  // upper-left, dimensions
 308 # }
 309 # static tau: f64 = 6.28;
 310 fn compute_area(shape: &Shape) -> f64 {
 311     match *shape {
 312         Circle(_, radius) => 0.5 * tau * radius * radius,
 313         Rectangle(_, ref size) => size.w * size.h
 314     }
 315 }
 316 ~~~
 317
 318 The first case matches against circles. Here, the pattern extracts the
 319 radius from the shape variant and the action uses it to compute the
 320 area of the circle. (Like any up-to-date engineer, we use the [tau
 321 circle constant][tau] and not that dreadfully outdated notion of pi).
 322
 323 [tau]: http://www.math.utah.edu/~palais/pi.html
 324
 325 The second match is more interesting. Here we match against a
 326 rectangle and extract its size: but rather than copy the `size`
 327 struct, we use a by-reference binding to create a pointer to it. In
 328 other words, a pattern binding like `ref size` binds the name `size`
  329 to a pointer of type `&Size` into the _interior of the enum_.
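
The following is a runnable sketch of `compute_area()` under the assumption of current stable Rust. There, enum variants are written with their `Shape::` prefix, and the standard library's `std::f64::consts::TAU` can replace the hand-written `tau` constant.

```rust
use std::f64::consts::TAU;

struct Point { x: f64, y: f64 }
struct Size { w: f64, h: f64 }

enum Shape {
    Circle(Point, f64),    // origin, radius
    Rectangle(Point, Size) // upper-left, dimensions
}

fn compute_area(shape: &Shape) -> f64 {
    match *shape {
        // `radius` is copied out of the enum.
        Shape::Circle(_, radius) => 0.5 * TAU * radius * radius,
        // `ref size` binds `size` to a `&Size` pointing into the enum.
        Shape::Rectangle(_, ref size) => size.w * size.h,
    }
}
```

A 3 by 4 rectangle yields an area of 12, and a circle of radius 1 yields half of tau.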
 330
 331 To make this more clear, let's look at a diagram of memory layout in
 332 the case where `shape` points at a rectangle:
 333
 334 ~~~ {.text}
 335 Stack             Memory
 336
 337 +-------+         +---------------+
 338 | shape | ------> | rectangle(    |
 339 +-------+         |   {x: f64,    |
 340 | size  | -+      |    y: f64},   |
 341 +-------+  +----> |   {w: f64,    |
 342                   |    h: f64})   |
 343                   +---------------+
 344 ~~~
 345
 346 Here you can see that rectangular shapes are composed of five words of
 347 memory. The first is a tag indicating which variant this enum is
  348 (`Rectangle`, in this case). The next two words are the `x` and `y`
 349 fields for the point and the remaining two are the `w` and `h` fields
 350 for the size. The binding `size` is then a pointer into the inside of
 351 the shape.
 352
 353 Perhaps you can see where the danger lies: if the shape were somehow
 354 to be reassigned, perhaps to a circle, then although the memory used
 355 to store that shape value would still be valid, _it would have a
 356 different type_! The following diagram shows what memory would look
 357 like if code overwrote `shape` with a circle:
 358
 359 ~~~ {.text}
 360 Stack             Memory
 361
 362 +-------+         +---------------+
 363 | shape | ------> | circle(       |
 364 +-------+         |   {x: f64,    |
 365 | size  | -+      |    y: f64},   |
 366 +-------+  +----> |   f64)        |
 367                   |               |
 368                   +---------------+
 369 ~~~
 370
  371 As you can see, the `size` pointer would be pointing at an `f64`
  372 instead of a struct. This is not good: reading the second field
  373 through that pointer, as if the `f64` were a struct with two fields,
  374 would be a memory safety violation.
 375
 376 So, in fact, for every `ref` binding, the compiler will impose the
 377 same rules as the ones we saw for borrowing the interior of an owned
 378 box: it must be able to guarantee that the `enum` will not be
  379 overwritten for the duration of the borrow. Under these rules, the compiler
  380 accepts the `compute_area()` example above. The example is safe because
  381 the shape pointer has type `&Shape`, which means "reference to
  382 immutable memory containing a `Shape`". If, however, the type of that
 383 pointer were `&mut Shape`, then the ref binding would be ill-typed.
 384 Just as with owned boxes, the compiler will permit `ref` bindings
 385 into data owned by the stack frame even if the data are mutable,
 386 but otherwise it requires that the data reside in immutable memory.
 387
 388 # Returning references
 389
  390 So far, all of the examples we have looked at use references in a
 391 “downward” direction. That is, a method or code block creates a
 392 reference, then uses it within the same scope. It is also
 393 possible to return references as the result of a function, but
 394 as we'll see, doing so requires some explicit annotation.
 395
 396 We could write a subroutine like this:
 397
 398 ~~~
 399 struct Point {x: f64, y: f64}
 400 fn get_x<'r>(p: &'r Point) -> &'r f64 { &p.x }
 401 ~~~
 402
 403 Here, the function `get_x()` returns a pointer into the structure it
 404 was given. The type of the parameter (`&'r Point`) and return type
 405 (`&'r f64`) both use a new syntactic form that we have not seen so
 406 far.  Here the identifier `r` names the lifetime of the pointer
 407 explicitly. So in effect, this function declares that it takes a
 408 pointer with lifetime `r` and returns a pointer with that same
 409 lifetime.
 410
 411 In general, it is only possible to return references if they
 412 are derived from a parameter to the procedure. In that case, the
 413 pointer result will always have the same lifetime as one of the
 414 parameters; named lifetimes indicate which parameter that
 415 is.
 416
 417 In the previous code samples, function parameter types did not include a
 418 lifetime name. The compiler simply creates a fresh name for the lifetime
 419 automatically: that is, the lifetime name is guaranteed to refer to a distinct
 420 lifetime from the lifetimes of all other parameters.
 421
 422 Named lifetimes that appear in function signatures are conceptually
 423 the same as the other lifetimes we have seen before, but they are a bit
 424 abstract: they don’t refer to a specific expression within `get_x()`,
 425 but rather to some expression within the *caller of `get_x()`*.  The
 426 lifetime `r` is actually a kind of *lifetime parameter*: it is defined
 427 by the caller to `get_x()`, just as the value for the parameter `p` is
 428 defined by that caller.
 429
 430 In any case, whatever the lifetime of `r` is, the pointer produced by
 431 `&p.x` always has the same lifetime as `p` itself: a pointer to a
 432 field of a struct is valid as long as the struct is valid. Therefore,
 433 the compiler accepts the function `get_x()`.
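
Here is a small sketch of a caller of `get_x()`, assuming current stable Rust (the signature itself reads the same there). The helper `x_plus_y()` is hypothetical and only shows that the returned pointer is usable for as long as the `Point` it came from.

```rust
struct Point { x: f64, y: f64 }

fn get_x<'r>(p: &'r Point) -> &'r f64 { &p.x }

// Hypothetical caller: the lifetime `'r` is chosen here, not inside `get_x()`.
fn x_plus_y() -> f64 {
    let p = Point { x: 1.5, y: 2.5 };
    let x_ref: &f64 = get_x(&p); // valid for as long as `p` is
    *x_ref + p.y
}
```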
 434
 435 To emphasize this point, let’s look at a variation on the example, this
 436 time one that does not compile:
 437
 438 ~~~ {.ignore}
 439 struct Point {x: f64, y: f64}
 440 fn get_x_sh(p: &Point) -> &f64 {
 441     &p.x // Error reported here
 442 }
 443 ~~~
 444
 445 Here, the function `get_x_sh()` takes a reference as input and
 446 returns a reference. As before, the lifetime of the reference
 447 that will be returned is a parameter (specified by the
 448 caller). That means that `get_x_sh()` promises to return a reference
 449 that is valid for as long as the caller would like: this is
 450 subtly different from the first example, which promised to return a
 451 pointer that was valid for as long as its pointer argument was valid.
 452
 453 Within `get_x_sh()`, we see the expression `&p.x` which takes the
 454 address of a field of a Point. The presence of this expression
  455 implies that the compiler must guarantee that, so long as the
 456 resulting pointer is valid, the original Point won't be moved or changed.
 457
 458 But recall that `get_x_sh()` also promised to
 459 return a pointer that was valid for as long as the caller wanted it to
 460 be. Clearly, `get_x_sh()` is not in a position to make both of these
 461 guarantees; in fact, it cannot guarantee that the pointer will remain
 462 valid at all once it returns, as the parameter `p` may or may not be
 463 live in the caller. Therefore, the compiler will report an error here.
 464
 465 In general, if you borrow a struct or box to create a
 466 reference, it will only be valid within the function
 467 and cannot be returned. This is why the typical way to return references
 468 is to take references as input (the only other case in
 469 which it can be legal to return a reference is if it
 470 points at a static constant).
 471
 472 # Named lifetimes
 473
 474 Lifetimes can be named and referenced. For example, the special lifetime
 475 `'static`, which does not go out of scope, can be used to create global
 476 variables and communicate between tasks (see the manual for use cases).
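
As a minimal sketch of `'static` in a signature, assuming current stable Rust: the global `ORIGIN_LABEL` and the function `origin_label()` are hypothetical names chosen for illustration. Because the string lives for the whole program, the function can return a reference without deriving it from any parameter.

```rust
// Hypothetical global: a string literal lives for the whole program.
static ORIGIN_LABEL: &'static str = "origin";

// No reference parameter is needed to justify the returned lifetime.
fn origin_label() -> &'static str {
    ORIGIN_LABEL
}
```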
 477
 478 ## Parameter Lifetimes
 479
 480 Named lifetimes allow for grouping of parameters by lifetime.
 481 For example, consider this function:
 482
 483 ~~~
 484 # struct Point {x: f64, y: f64}; // as before
 485 # struct Size {w: f64, h: f64}; // as before
 486 # enum Shape {
 487 #     Circle(Point, f64),   // origin, radius
 488 #     Rectangle(Point, Size)  // upper-left, dimensions
 489 # }
 490 # fn compute_area(shape: &Shape) -> f64 { 0.0 }
 491 fn select<'r, T>(shape: &'r Shape, threshold: f64,
 492                  a: &'r T, b: &'r T) -> &'r T {
 493     if compute_area(shape) > threshold {a} else {b}
 494 }
 495 ~~~
 496
 497 This function takes three references and assigns each the same
 498 lifetime `r`.  In practice, this means that, in the caller, the
  499 lifetime `r` will be the *intersection of the lifetimes of the three
  500 reference parameters*. This may be overly conservative, as in this
 501 example:
 502
 503 ~~~
 504 # struct Point {x: f64, y: f64}; // as before
 505 # struct Size {w: f64, h: f64}; // as before
 506 # enum Shape {
 507 #     Circle(Point, f64),   // origin, radius
 508 #     Rectangle(Point, Size)  // upper-left, dimensions
 509 # }
 510 # fn compute_area(shape: &Shape) -> f64 { 0.0 }
 511 # fn select<'r, T>(shape: &Shape, threshold: f64,
 512 #                  a: &'r T, b: &'r T) -> &'r T {
 513 #     if compute_area(shape) > threshold {a} else {b}
 514 # }
 515                                                      // -+ r
 516 fn select_based_on_unit_circle<'r, T>(               //  |-+ B
 517     threshold: f64, a: &'r T, b: &'r T) -> &'r T {   //  | |
 518                                                      //  | |
 519     let shape = Circle(Point {x: 0., y: 0.}, 1.);    //  | |
 520     select(&shape, threshold, a, b)                  //  | |
 521 }                                                    //  |-+
 522                                                      // -+
 523 ~~~
 524
  525 In this call to `select()`, the lifetime of the first parameter, `shape`,
  526 is B, the function body. The other two parameters, `a` and `b`,
 527 share the same lifetime, `r`, which is a lifetime parameter of
 528 `select_based_on_unit_circle()`. The caller will infer the
 529 intersection of these two lifetimes as the lifetime of the returned
 530 value, and hence the return value of `select()` will be assigned a
 531 lifetime of B. This will in turn lead to a compilation error, because
 532 `select_based_on_unit_circle()` is supposed to return a value with the
 533 lifetime `r`.
 534
 535 To address this, we can modify the definition of `select()` to
 536 distinguish the lifetime of the first parameter from the lifetime of
 537 the latter two. After all, the first parameter is not being
 538 returned. Here is how the new `select()` might look:
 539
 540 ~~~
 541 # struct Point {x: f64, y: f64}; // as before
 542 # struct Size {w: f64, h: f64}; // as before
 543 # enum Shape {
 544 #     Circle(Point, f64),   // origin, radius
 545 #     Rectangle(Point, Size)  // upper-left, dimensions
 546 # }
 547 # fn compute_area(shape: &Shape) -> f64 { 0.0 }
 548 fn select<'r, 'tmp, T>(shape: &'tmp Shape, threshold: f64,
 549                        a: &'r T, b: &'r T) -> &'r T {
 550     if compute_area(shape) > threshold {a} else {b}
 551 }
 552 ~~~
 553
 554 Here you can see that `shape`'s lifetime is now named `tmp`. The
 555 parameters `a`, `b`, and the return value all have the lifetime `r`.
 556 However, since the lifetime `tmp` is not returned, it would be more
 557 concise to just omit the named lifetime for `shape` altogether:
 558
 559 ~~~
 560 # struct Point {x: f64, y: f64}; // as before
 561 # struct Size {w: f64, h: f64}; // as before
 562 # enum Shape {
 563 #     Circle(Point, f64),   // origin, radius
 564 #     Rectangle(Point, Size)  // upper-left, dimensions
 565 # }
 566 # fn compute_area(shape: &Shape) -> f64 { 0.0 }
 567 fn select<'r, T>(shape: &Shape, threshold: f64,
 568                  a: &'r T, b: &'r T) -> &'r T {
 569     if compute_area(shape) > threshold {a} else {b}
 570 }
 571 ~~~
 572
 573 This is equivalent to the previous definition.
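
Putting the pieces together, the sketch below shows the final `select()` alongside `select_based_on_unit_circle()`. It assumes current stable Rust, hence the `Shape::` prefixes and the standard `TAU` constant, and it fills in a real `compute_area()` body instead of the stub. With `shape` given its own lifetime, the local `shape` in the caller no longer shortens the lifetime of the result.

```rust
use std::f64::consts::TAU;

struct Point { x: f64, y: f64 }
struct Size { w: f64, h: f64 }
enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Size),
}

fn compute_area(shape: &Shape) -> f64 {
    match *shape {
        Shape::Circle(_, radius) => 0.5 * TAU * radius * radius,
        Shape::Rectangle(_, ref size) => size.w * size.h,
    }
}

// Only `a`, `b`, and the result share the lifetime `'r`.
fn select<'r, T>(shape: &Shape, threshold: f64,
                 a: &'r T, b: &'r T) -> &'r T {
    if compute_area(shape) > threshold { a } else { b }
}

fn select_based_on_unit_circle<'r, T>(
    threshold: f64, a: &'r T, b: &'r T) -> &'r T {
    // `shape` lives only for this function body, which is now acceptable.
    let shape = Shape::Circle(Point { x: 0.0, y: 0.0 }, 1.0);
    select(&shape, threshold, a, b)
}
```

The unit circle has an area of roughly 3.14, so a threshold of 3.0 selects `a` and a threshold of 4.0 selects `b`.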
 574
 575 ## Labeled Control Structures
 576
 577 Named lifetime notation can also be used to control the flow of execution:
 578
 579 ~~~
 580 'h: for i in range(0,10) {
 581     'g: loop {
 582         if i % 2 == 0 { continue 'h; }
 583         if i == 9 { break 'h; }
 584         break 'g;
 585     }
 586 }
 587 ~~~
 588
  589 > *Note:* Labeled breaks are not currently supported within `while` loops.
 590
 591 Named labels are hygienic and can be used safely within macros.
 592 See the macros guide section on hygiene for more details.
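
To make the control flow observable, here is a sketch of the labeled loops in current stable Rust (our assumption), where the range is written `0..10` instead of `range(0,10)`. The helper `odd_numbers_below_nine()` is hypothetical and records each value of `i` that reaches `break 'g`.

```rust
// Hypothetical helper: collects the values of `i` that reach `break 'g`.
fn odd_numbers_below_nine() -> Vec<i32> {
    let mut seen = Vec::new();
    'h: for i in 0..10 {
        'g: loop {
            if i % 2 == 0 { continue 'h; } // skip to the next `i`
            if i == 9 { break 'h; }        // leave the outer loop entirely
            seen.push(i);
            break 'g;                      // leave only the inner loop
        }
    }
    seen
}
```

Even values skip ahead through `continue 'h`, and `i == 9` ends the outer loop, so only 1, 3, 5, and 7 are recorded.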
 593
 594 # Conclusion
 595
 596 So there you have it: a (relatively) brief tour of the lifetime
 597 system. For more details, we refer to the (yet to be written) reference
 598 document on references, which will explain the full notation
 599 and give more examples.