src/doc/guide-lifetimes.md

   1 % The Rust References and Lifetimes Guide
   2
   3 # Introduction
   4
   5 References are one of the more flexible and powerful tools available in
   6 Rust. They can point anywhere: into the heap, stack, and even into the
   7 interior of another data structure. A reference is as flexible as a C pointer
   8 or C++ reference.
   9
  10 Unlike C and C++ compilers, the Rust compiler includes special static
  11 checks that ensure that programs use references safely.
  12
  13 Despite their complete safety, a reference's representation at runtime
  14 is the same as that of an ordinary pointer in a C program. They introduce zero
  15 overhead. The compiler does all safety checks at compile time.
  16
  17 Although references have rather elaborate theoretical underpinnings
  18 (e.g. region pointers), the core concepts will be familiar to
  19 anyone who has worked with C or C++. The best way to explain how they are
  20 used—and their limitations—is probably just to work through several examples.
  21
  22 # By example
  23
  24 References, sometimes known as *borrowed pointers*, are only valid for
  25 a limited duration. References never claim any kind of ownership
  26 over the data that they point to. Instead, they are used for cases
  27 where you would like to use data for a short time.
  28
  29 Consider a simple struct type `Point`:
  30
  31 ~~~
  32 struct Point {x: f64, y: f64}
  33 ~~~
  34
  35 We can use this simple definition to allocate points in many different ways. For
  36 example, in this code, each of these local variables contains a point,
  37 but allocated in a different place:
  38
  39 ~~~
  40 # struct Point {x: f64, y: f64}
  41 let on_the_stack : Point      =     Point {x: 3.0, y: 4.0};
  42 let on_the_heap  : Box<Point> = box Point {x: 7.0, y: 9.0};
  43 ~~~
  44
  45 Suppose we wanted to write a procedure that computed the distance between any
  46 two points, no matter where they were stored. One option is to define a function
  47 that takes two arguments of type `Point`—that is, it takes the points by value.
  48 But if we define it this way, calling the function will cause the points to be
  49 copied. For points, this is probably not so bad, but often copies are
  50 expensive. So we'd like to define a function that takes the points just as
  51 a reference.
  52
  53 ~~~
  54 # struct Point {x: f64, y: f64}
  55 # fn sqrt(f: f64) -> f64 { 0.0 }
  56 fn compute_distance(p1: &Point, p2: &Point) -> f64 {
  57     let x_d = p1.x - p2.x;
  58     let y_d = p1.y - p2.y;
  59     sqrt(x_d * x_d + y_d * y_d)
  60 }
  61 ~~~
  62
  63 Now we can call `compute_distance()`:
  64
  65 ~~~
  66 # struct Point {x: f64, y: f64}
  67 # let on_the_stack :     Point  =     Point{x: 3.0, y: 4.0};
  68 # let on_the_heap  : Box<Point> = box Point{x: 7.0, y: 9.0};
  69 # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 }
  70 compute_distance(&on_the_stack, &*on_the_heap);
  71 ~~~
  72
  73 Here, the `&` operator takes the address of the variable
  74 `on_the_stack`; this is because `on_the_stack` has the type `Point`
  75 (that is, a struct value) and we have to take its address to get a
  76 reference. We also call this _borrowing_ the local variable
  77 `on_the_stack`, because we have created an alias: that is, another
  78 name for the same data.
  79
  80 For the second argument, we need to extract the contents of `on_the_heap`
  81 by dereferencing it with the `*` operator. Now that we have the data, we need
  82 to create a reference to it with the `&` operator.
  83
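To see both kinds of borrow in one place, here is a minimal, self-contained sketch of the call above. It is written in current Rust syntax, which is an assumption on our part: `Box::new` stands in for this guide's `box` keyword, and the `sqrt` method on `f64` replaces the stubbed-out `sqrt` function.

```rust
struct Point {
    x: f64,
    y: f64,
}

// Takes both points by reference, so neither argument is copied or moved.
fn compute_distance(p1: &Point, p2: &Point) -> f64 {
    let x_d = p1.x - p2.x;
    let y_d = p1.y - p2.y;
    (x_d * x_d + y_d * y_d).sqrt()
}

fn main() {
    let on_the_stack: Point = Point { x: 3.0, y: 4.0 };
    // Assumption: `Box::new` is the modern spelling of `box Point {..}`.
    let on_the_heap: Box<Point> = Box::new(Point { x: 7.0, y: 9.0 });

    // `&on_the_stack` borrows the local variable directly;
    // `&*on_the_heap` dereferences the box, then borrows its contents.
    let d = compute_distance(&on_the_stack, &*on_the_heap);
    println!("distance = {}", d);

    // Both values are still owned by `main` and remain usable afterwards.
    println!("stack x = {}, heap x = {}", on_the_stack.x, on_the_heap.x);
}
```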
  84 Whenever a caller lends data to a callee, there are some limitations on what
  85 the caller can do with the original. For example, if the contents of a
  86 variable have been lent out, you cannot send that variable to another task. In
  87 addition, the compiler will reject any code that might cause the borrowed
  88 value to be freed or its component fields to be overwritten with values of
  89 different types (we'll get into what kinds of actions those are shortly). This rule
  90 should make intuitive sense: you must wait for a borrower to return the value
  91 that you lent it (that is, wait for the reference to go out of scope)
  92 before you can make full use of it again.
  93
  94 # Other uses for the & operator
  95
  96 In the previous example, the value `on_the_stack` was defined like so:
  97
  98 ~~~
  99 # struct Point {x: f64, y: f64}
 100 let on_the_stack: Point = Point {x: 3.0, y: 4.0};
 101 ~~~
 102
 103 This declaration means that code can only pass `Point` by value to other
 104 functions. As a consequence, we had to explicitly take the address of
 105 `on_the_stack` to get a reference. Sometimes, however, it is more
 106 convenient to move the `&` operator into the definition of `on_the_stack`:
 107
 108 ~~~
 109 # struct Point {x: f64, y: f64}
 110 let on_the_stack2: &Point = &Point {x: 3.0, y: 4.0};
 111 ~~~
 112
 113 Applying `&` to an rvalue (non-assignable location) is just a convenient
 114 shorthand for creating a temporary and taking its address. A more verbose
 115 way to write the same code is:
 116
 117 ~~~
 118 # struct Point {x: f64, y: f64}
 119 let tmp = Point {x: 3.0, y: 4.0};
 120 let on_the_stack2 : &Point = &tmp;
 121 ~~~
 122
 123 # Taking the address of fields
 124
 125 The `&` operator is not limited to taking the address of
 126 local variables. It can also take the address of fields or
 127 individual array elements. For example, consider this type definition
 128 for `Rectangle`:
 129
 130 ~~~
 131 struct Point {x: f64, y: f64} // as before
 132 struct Size {w: f64, h: f64} // new: width and height
 133 struct Rectangle {origin: Point, size: Size}
 134 ~~~
 135
 136 Now, as before, we can define rectangles in a few different ways:
 137
 138 ~~~
 139 # struct Point {x: f64, y: f64}
 140 # struct Size {w: f64, h: f64} // as before
 141 # struct Rectangle {origin: Point, size: Size}
 142 let rect_stack   =    &Rectangle {origin: Point {x: 1.0, y: 2.0},
 143                                   size: Size {w: 3.0, h: 4.0}};
 144 let rect_heap    = box Rectangle {origin: Point {x: 5.0, y: 6.0},
 145                                   size: Size {w: 3.0, h: 4.0}};
 146 ~~~
 147
 148 In each case, we can borrow individual subcomponents with the `&`
 149 operator. For example, we could write:
 150
 151 ~~~
 152 # struct Point {x: f64, y: f64} // as before
 153 # struct Size {w: f64, h: f64} // as before
 154 # struct Rectangle {origin: Point, size: Size}
 155 # let rect_stack  = &Rectangle {origin: Point {x: 1.0, y: 2.0}, size: Size {w: 3.0, h: 4.0}};
 156 # let rect_heap   = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}};
 157 # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 }
 158 compute_distance(&rect_stack.origin, &rect_heap.origin);
 159 ~~~
 160
 161 which would borrow the field `origin` from the rectangle on the stack
 162 as well as from the owned box, and then compute the distance between them.
 163
 164 # Lifetimes
 165
 166 We’ve seen a few examples of borrowing data. To this point, we’ve glossed
 167 over issues of safety. As stated in the introduction, at runtime a reference
 168 is simply a pointer, nothing more. Therefore, avoiding C's problems with
 169 dangling pointers requires a compile-time safety check.
 170
 171 The basis for the check is the notion of _lifetimes_. A lifetime is a
 172 static approximation of the span of execution during which the pointer
 173 is valid: it always corresponds to some expression or block within the
 174 program.
 175
 176 The compiler will only allow a borrow *if it can guarantee that the data will
 177 not be reassigned or moved for the lifetime of the pointer*. This does not
 178 necessarily mean that the data is stored in immutable memory. For example,
 179 the following function is legal:
 180
 181 ~~~
 182 # fn some_condition() -> bool { true }
 183 # struct Foo { f: int }
 184 fn example3() -> int {
 185     let mut x = box Foo {f: 3};
 186     if some_condition() {
 187         let y = &x.f;      // -+ L
 188         return *y;         //  |
 189     }                      // -+
 190     x = box Foo {f: 4};
 191     // ...
 192 # return 0;
 193 }
 194 ~~~
 195
 196 Here, the interior of the variable `x` is being borrowed
 197 and `x` is declared as mutable. However, the compiler can prove that
 198 `x` is not assigned anywhere in the lifetime L of the variable
 199 `y`. Therefore, it accepts the function, even though `x` is mutable
 200 and in fact is mutated later in the function.
 201
 202 It may not be clear why we are so concerned about mutating a borrowed
 203 variable. The reason is that the runtime system frees any box
 204 _as soon as its owning reference changes or goes out of
 205 scope_. Therefore, a program like this is illegal (and would be
 206 rejected by the compiler):
 207
 208 ~~~ {.ignore}
 209 fn example3() -> int {
 210     let mut x = box Foo {f: 3};
 211     let y = &x.f;
 212     x = box Foo {f: 4};  // Error reported here.
 213     *y
 214 }
 215 ~~~
 216
 217 To make this clearer, consider this diagram showing the state of
 218 memory immediately before the re-assignment of `x`:
 219
 220 ~~~ {.notrust}
 221     Stack               Exchange Heap
 222
 223   x +-------------+
 224     | box {f:int} | ----+
 225   y +-------------+     |
 226     | &int        | ----+
 227     +-------------+     |    +---------+
 228                         +--> |  f: 3   |
 229                              +---------+
 230 ~~~
 231
 232 Once the reassignment occurs, the memory will look like this:
 233
 234 ~~~ {.notrust}
 235     Stack               Exchange Heap
 236
 237   x +-------------+          +---------+
 238     | box {f:int} | -------> |  f: 4   |
 239   y +-------------+          +---------+
 240     | &int        | ----+
 241     +-------------+     |    +---------+
 242                         +--> | (freed) |
 243                              +---------+
 244 ~~~
 245
 246 Here you can see that the variable `y` still points at the `f` field
 247 of the old `Foo`, which has been freed.
 248
 249 In fact, the compiler can apply the same kind of reasoning to any
 250 memory that is (uniquely) owned by the stack frame. So we could
 251 modify the previous example to introduce additional owned pointers
 252 and structs, and the compiler will still be able to detect possible
 253 mutations. This time, we'll use an analogy to illustrate the concept.
 254
 255 ~~~ {.ignore}
 256 fn example3() -> int {
 257     struct House { owner: Box<Person> }
 258     struct Person { age: int }
 259
 260     let mut house = box House {
 261         owner: box Person {age: 30}
 262     };
 263
 264     let owner_age = &house.owner.age;
 265     house = box House {owner: box Person {age: 40}};  // Error reported here.
 266     house.owner = box Person {age: 50};               // Error reported here.
 267     *owner_age
 268 }
 269 ~~~
 270
 271 In this case, two errors are reported, one when the variable `house` is
 272 modified and another when `house.owner` is modified. Either modification would
 273 invalidate the pointer `owner_age`.
 274
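For contrast, here is one way the same function might be rearranged so that the compiler accepts it. This is only a sketch under stated assumptions: it uses current Rust syntax (`Box::new`, `i32`), and the name `replace_owner` is hypothetical. The idea is to copy the integer out rather than hold a reference across the modifications.

```rust
struct Person { age: i32 }
struct House { owner: Box<Person> }

// Hypothetical variant of the rejected example that the compiler accepts.
fn replace_owner() -> (i32, i32) {
    let mut house = Box::new(House { owner: Box::new(Person { age: 30 }) });

    // Copy the integer out instead of keeping a reference to it, so no
    // borrow of `house` is still live when `house` is modified below.
    let owner_age: i32 = house.owner.age;

    // Both modifications are now allowed: nothing points into the old data.
    house = Box::new(House { owner: Box::new(Person { age: 40 }) });
    house.owner = Box::new(Person { age: 50 });

    (owner_age, house.owner.age)
}

fn main() {
    let (before, after) = replace_owner();
    println!("before = {}, after = {}", before, after);
}
```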
 275 # Borrowing and enums
 276
 277 The previous example showed that the type system forbids any borrowing
 278 of owned boxes found in aliasable, mutable memory. This restriction
 279 prevents pointers from pointing into freed memory. There is one other
 280 case where the compiler must be very careful to ensure that pointers
 281 remain valid: pointers into the interior of an `enum`.
 282
 283 Let’s look at the following `Shape` type that can represent both rectangles
 284 and circles:
 285
 286 ~~~
 287 struct Point {x: f64, y: f64} // as before
 288 struct Size {w: f64, h: f64} // as before
 289 enum Shape {
 290     Circle(Point, f64),   // origin, radius
 291     Rectangle(Point, Size)  // upper-left, dimensions
 292 }
 293 ~~~
 294
 295 Now we might write a function to compute the area of a shape. This
 296 function takes a reference to a shape, to avoid the need for
 297 copying.
 298
 299 ~~~
 300 # struct Point {x: f64, y: f64}; // as before
 301 # struct Size {w: f64, h: f64}; // as before
 302 # enum Shape {
 303 #     Circle(Point, f64),   // origin, radius
 304 #     Rectangle(Point, Size)  // upper-left, dimensions
 305 # }
 306 # static tau: f64 = 6.28;
 307 fn compute_area(shape: &Shape) -> f64 {
 308     match *shape {
 309         Circle(_, radius) => 0.5 * tau * radius * radius,
 310         Rectangle(_, ref size) => size.w * size.h
 311     }
 312 }
 313 ~~~
 314
 315 The first case matches against circles. Here, the pattern extracts the
 316 radius from the shape variant and the action uses it to compute the
 317 area of the circle. (Like any up-to-date engineer, we use the [tau
 318 circle constant][tau] and not that dreadfully outdated notion of pi).
 319
 320 [tau]: http://www.math.utah.edu/~palais/pi.html
 321
 322 The second match is more interesting. Here we match against a
 323 rectangle and extract its size: but rather than copy the `size`
 324 struct, we use a by-reference binding to create a pointer to it. In
 325 other words, a pattern binding like `ref size` binds the name `size`
 326 to a pointer of type `&Size` into the _interior of the enum_.
 327
 328 To make this more clear, let's look at a diagram of memory layout in
 329 the case where `shape` points at a rectangle:
 330
 331 ~~~ {.notrust}
 332 Stack             Memory
 333
 334 +-------+         +---------------+
 335 | shape | ------> | Rectangle(    |
 336 +-------+         |   {x: f64,    |
 337 | size  | -+      |    y: f64},   |
 338 +-------+  +----> |   {w: f64,    |
 339                   |    h: f64})   |
 340                   +---------------+
 341 ~~~
 342
 343 Here you can see that rectangular shapes are composed of five words of
 344 memory. The first is a tag indicating which variant this enum is
 345 (`Rectangle`, in this case). The next two words are the `x` and `y`
 346 fields for the point and the remaining two are the `w` and `h` fields
 347 for the size. The binding `size` is then a pointer into the inside of
 348 the shape.
 349
 350 Perhaps you can see where the danger lies: if the shape were somehow
 351 to be reassigned, perhaps to a circle, then although the memory used
 352 to store that shape value would still be valid, _it would have a
 353 different type_! The following diagram shows what memory would look
 354 like if code overwrote `shape` with a circle:
 355
 356 ~~~ {.notrust}
 357 Stack             Memory
 358
 359 +-------+         +---------------+
 360 | shape | ------> | Circle(       |
 361 +-------+         |   {x: f64,    |
 362 | size  | -+      |    y: f64},   |
 363 +-------+  +----> |   f64)        |
 364                   |               |
 365                   +---------------+
 366 ~~~
 367
 368 As you can see, the `size` pointer would be pointing at an `f64`
 369 instead of a struct. This is not good: accessing the second field
 370 of an `f64` as if it were a struct with two fields would be a memory
 371 safety violation.
 372
 373 So, in fact, for every `ref` binding, the compiler will impose the
 374 same rules as the ones we saw for borrowing the interior of an owned
 375 box: it must be able to guarantee that the `enum` will not be
 376 overwritten for the duration of the borrow. The compiler does
 377 accept the example we gave earlier. The example is safe because
 378 the shape pointer has type `&Shape`, which means "reference to
 379 immutable memory containing a `Shape`". If, however, the type of that
 380 pointer were `&mut Shape`, then the ref binding would be ill-typed.
 381 Just as with owned boxes, the compiler will permit `ref` bindings
 382 into data owned by the stack frame even if the data are mutable,
 383 but otherwise it requires that the data reside in immutable memory.
 384
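As a rough, runnable restatement of `compute_area()`, the sketch below uses current Rust syntax, which is an assumption: variants are written with the `Shape::` prefix, and the standard library constant `std::f64::consts::TAU` replaces the hand-written `tau`.

```rust
use std::f64::consts::TAU;

struct Point { x: f64, y: f64 }
struct Size { w: f64, h: f64 }

enum Shape {
    Circle(Point, f64),    // origin, radius
    Rectangle(Point, Size) // upper-left, dimensions
}

fn compute_area(shape: &Shape) -> f64 {
    match *shape {
        // `radius` is copied out of the enum; an `f64` is cheap to copy.
        Shape::Circle(_, radius) => 0.5 * TAU * radius * radius,
        // `ref size` borrows the `Size` in place, inside the enum.
        Shape::Rectangle(_, ref size) => size.w * size.h,
    }
}

fn main() {
    let circle = Shape::Circle(Point { x: 0.0, y: 0.0 }, 1.0);
    let rect = Shape::Rectangle(Point { x: 1.0, y: 2.0 }, Size { w: 3.0, h: 4.0 });
    println!("circle area = {}", compute_area(&circle));
    println!("rectangle area = {}", compute_area(&rect));
}
```

Because `shape` has type `&Shape`, nothing can overwrite the enum while `size` points into it.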
 385 # Returning references
 386
 387 So far, all of the examples we have looked at use references in a
 388 “downward” direction. That is, a method or code block creates a
 389 reference, then uses it within the same scope. It is also
 390 possible to return references as the result of a function, but
 391 as we'll see, doing so requires some explicit annotation.
 392
 393 We could write a subroutine like this:
 394
 395 ~~~
 396 struct Point {x: f64, y: f64}
 397 fn get_x<'r>(p: &'r Point) -> &'r f64 { &p.x }
 398 ~~~
 399
 400 Here, the function `get_x()` returns a pointer into the structure it
 401 was given. The type of the parameter (`&'r Point`) and return type
 402 (`&'r f64`) both use a new syntactic form that we have not seen so
 403 far.  Here the identifier `r` names the lifetime of the pointer
 404 explicitly. So in effect, this function declares that it takes a
 405 pointer with lifetime `r` and returns a pointer with that same
 406 lifetime.
 407
 408 In general, it is only possible to return references if they
 409 are derived from a parameter to the procedure. In that case, the
 410 pointer result will always have the same lifetime as one of the
 411 parameters; named lifetimes indicate which parameter that
 412 is.
 413
 414 In the previous code samples, function parameter types did not include a
 415 lifetime name. The compiler simply creates a fresh name for the lifetime
 416 automatically: that is, the lifetime name is guaranteed to refer to a distinct
 417 lifetime from the lifetimes of all other parameters.
 418
 419 Named lifetimes that appear in function signatures are conceptually
 420 the same as the other lifetimes we have seen before, but they are a bit
 421 abstract: they don’t refer to a specific expression within `get_x()`,
 422 but rather to some expression within the *caller of `get_x()`*.  The
 423 lifetime `r` is actually a kind of *lifetime parameter*: it is defined
 424 by the caller to `get_x()`, just as the value for the parameter `p` is
 425 defined by that caller.
 426
 427 In any case, whatever the lifetime of `r` is, the pointer produced by
 428 `&p.x` always has the same lifetime as `p` itself: a pointer to a
 429 field of a struct is valid as long as the struct is valid. Therefore,
 430 the compiler accepts the function `get_x()`.
 431
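From the caller's side, the lifetime `r` is simply however long the borrowed `Point` stays alive. The following sketch shows one possible caller; the `main` function and its variable names are hypothetical.

```rust
struct Point { x: f64, y: f64 }

// The returned reference lives exactly as long as the borrowed `Point`.
fn get_x<'r>(p: &'r Point) -> &'r f64 { &p.x }

fn main() {
    let p = Point { x: 3.0, y: 4.0 };

    // Here the caller picks `'r`: the span during which `p` is borrowed.
    let x_ref: &f64 = get_x(&p);

    // `x_ref` points into `p`, so it is usable for as long as `p` is alive.
    println!("x = {}, y = {}", *x_ref, p.y);
}
```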
 432 To emphasize this point, let’s look at a variation on the example, this
 433 time one that does not compile:
 434
 435 ~~~ {.ignore}
 436 struct Point {x: f64, y: f64}
 437 fn get_x_sh(p: &Point) -> &f64 {
 438     &p.x // Error reported here
 439 }
 440 ~~~
 441
 442 Here, the function `get_x_sh()` takes a reference as input and
 443 returns a reference. As before, the lifetime of the reference
 444 that will be returned is a parameter (specified by the
 445 caller). That means that `get_x_sh()` promises to return a reference
 446 that is valid for as long as the caller would like: this is
 447 subtly different from the first example, which promised to return a
 448 pointer that was valid for as long as its pointer argument was valid.
 449
 450 Within `get_x_sh()`, we see the expression `&p.x` which takes the
 451 address of a field of a `Point`. The presence of this expression
 452 implies that the compiler must guarantee that, so long as the
 453 resulting pointer is valid, the original `Point` won't be moved or changed.
 454
 455 But recall that `get_x_sh()` also promised to
 456 return a pointer that was valid for as long as the caller wanted it to
 457 be. Clearly, `get_x_sh()` is not in a position to make both of these
 458 guarantees; in fact, it cannot guarantee that the pointer will remain
 459 valid at all once it returns, as the parameter `p` may or may not be
 460 live in the caller. Therefore, the compiler will report an error here.
 461
 462 In general, if you borrow a struct or box to create a
 463 reference, it will only be valid within the function
 464 and cannot be returned. This is why the typical way to return references
 465 is to take references as input (the only other case in
 466 which it can be legal to return a reference is if it
 467 points at a static constant).
 468
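The static-constant case mentioned above can be sketched as follows. The names `ORIGIN_X` and `origin_x` are hypothetical, chosen only for this illustration.

```rust
// A static lives for the whole program, so a reference to it never dangles.
static ORIGIN_X: f64 = 0.0;

// No reference parameter is needed: the result has the `'static` lifetime.
fn origin_x() -> &'static f64 {
    &ORIGIN_X
}

fn main() {
    let x: &'static f64 = origin_x();
    println!("origin x = {}", x);
}
```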
 469 # Named lifetimes
 470
 471 Lifetimes can be named and referenced. For example, the special lifetime
 472 `'static`, which does not go out of scope, can be used to create global
 473 variables and communicate between tasks (see the manual for use cases).
 474
 475 ## Parameter Lifetimes
 476
 477 Named lifetimes allow for grouping of parameters by lifetime.
 478 For example, consider this function:
 479
 480 ~~~
 481 # struct Point {x: f64, y: f64}; // as before
 482 # struct Size {w: f64, h: f64}; // as before
 483 # enum Shape {
 484 #     Circle(Point, f64),   // origin, radius
 485 #     Rectangle(Point, Size)  // upper-left, dimensions
 486 # }
 487 # fn compute_area(shape: &Shape) -> f64 { 0.0 }
 488 fn select<'r, T>(shape: &'r Shape, threshold: f64,
 489                  a: &'r T, b: &'r T) -> &'r T {
 490     if compute_area(shape) > threshold {a} else {b}
 491 }
 492 ~~~
 493
 494 This function takes three references and assigns each the same
 495 lifetime `r`.  In practice, this means that, in the caller, the
 496 lifetime `r` will be the *intersection of the lifetimes of the three
 497 reference parameters*. This may be overly conservative, as in this
 498 example:
 499
 500 ~~~
 501 # struct Point {x: f64, y: f64}; // as before
 502 # struct Size {w: f64, h: f64}; // as before
 503 # enum Shape {
 504 #     Circle(Point, f64),   // origin, radius
 505 #     Rectangle(Point, Size)  // upper-left, dimensions
 506 # }
 507 # fn compute_area(shape: &Shape) -> f64 { 0.0 }
 508 # fn select<'r, T>(shape: &Shape, threshold: f64,
 509 #                  a: &'r T, b: &'r T) -> &'r T {
 510 #     if compute_area(shape) > threshold {a} else {b}
 511 # }
 512                                                      // -+ r
 513 fn select_based_on_unit_circle<'r, T>(               //  |-+ B
 514     threshold: f64, a: &'r T, b: &'r T) -> &'r T {   //  | |
 515                                                      //  | |
 516     let shape = Circle(Point {x: 0., y: 0.}, 1.);    //  | |
 517     select(&shape, threshold, a, b)                  //  | |
 518 }                                                    //  |-+
 519                                                      // -+
 520 ~~~
 521
 522 In this call to `select()`, the lifetime of the first parameter, `shape`,
 523 is B, the function body. The other two parameters, `a` and `b`,
 524 share the same lifetime, `r`, which is a lifetime parameter of
 525 `select_based_on_unit_circle()`. The caller will infer the
 526 intersection of these two lifetimes as the lifetime of the returned
 527 value, and hence the return value of `select()` will be assigned a
 528 lifetime of B. This will in turn lead to a compilation error, because
 529 `select_based_on_unit_circle()` is supposed to return a value with the
 530 lifetime `r`.
 531
 532 To address this, we can modify the definition of `select()` to
 533 distinguish the lifetime of the first parameter from the lifetime of
 534 the latter two. After all, the first parameter is not being
 535 returned. Here is how the new `select()` might look:
 536
 537 ~~~
 538 # struct Point {x: f64, y: f64}; // as before
 539 # struct Size {w: f64, h: f64}; // as before
 540 # enum Shape {
 541 #     Circle(Point, f64),   // origin, radius
 542 #     Rectangle(Point, Size)  // upper-left, dimensions
 543 # }
 544 # fn compute_area(shape: &Shape) -> f64 { 0.0 }
 545 fn select<'r, 'tmp, T>(shape: &'tmp Shape, threshold: f64,
 546                        a: &'r T, b: &'r T) -> &'r T {
 547     if compute_area(shape) > threshold {a} else {b}
 548 }
 549 ~~~
 550
 551 Here you can see that `shape`'s lifetime is now named `tmp`. The
 552 parameters `a`, `b`, and the return value all have the lifetime `r`.
 553 However, since the lifetime `tmp` is not returned, it would be more
 554 concise to just omit the named lifetime for `shape` altogether:
 555
 556 ~~~
 557 # struct Point {x: f64, y: f64}; // as before
 558 # struct Size {w: f64, h: f64}; // as before
 559 # enum Shape {
 560 #     Circle(Point, f64),   // origin, radius
 561 #     Rectangle(Point, Size)  // upper-left, dimensions
 562 # }
 563 # fn compute_area(shape: &Shape) -> f64 { 0.0 }
 564 fn select<'r, T>(shape: &Shape, threshold: f64,
 565                  a: &'r T, b: &'r T) -> &'r T {
 566     if compute_area(shape) > threshold {a} else {b}
 567 }
 568 ~~~
 569
 570 This is equivalent to the previous definition.
 571
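Putting the pieces together, here is a sketch of the revised `select()` being called with a shape that lives only inside the calling function. It assumes current Rust syntax (`Shape::` prefixes on variants and the standard `TAU` constant).

```rust
use std::f64::consts::TAU;

struct Point { x: f64, y: f64 }
struct Size { w: f64, h: f64 }

enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Size),
}

fn compute_area(shape: &Shape) -> f64 {
    match *shape {
        Shape::Circle(_, radius) => 0.5 * TAU * radius * radius,
        Shape::Rectangle(_, ref size) => size.w * size.h,
    }
}

// `shape` has its own anonymous lifetime; only `a`, `b`, and the result share `'r`.
fn select<'r, T>(shape: &Shape, threshold: f64,
                 a: &'r T, b: &'r T) -> &'r T {
    if compute_area(shape) > threshold { a } else { b }
}

fn select_based_on_unit_circle<'r, T>(
    threshold: f64, a: &'r T, b: &'r T) -> &'r T {
    // `shape` lives only for this function body, which is fine because
    // its lifetime no longer constrains the returned reference.
    let shape = Shape::Circle(Point { x: 0.0, y: 0.0 }, 1.0);
    select(&shape, threshold, a, b)
}

fn main() {
    let (small, large) = (1, 2);
    // The unit circle has area pi, so a threshold of 3.0 is exceeded.
    let picked = select_based_on_unit_circle(3.0, &small, &large);
    println!("picked = {}", picked);
}
```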
 572 ## Labeled Control Structures
 573
 574 Named lifetime notation can also be used to control the flow of execution:
 575
 576 ~~~
 577 'h: for i in range(0,10) {
 578     'g: loop {
 579         if i % 2 == 0 { continue 'h; }
 580         if i == 9 { break 'h; }
 581         break 'g;
 582     }
 583 }
 584 ~~~
 585
 586 > *Note:* Labeled breaks are not currently supported within `while` loops.
 587
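To make the control flow of the loop above observable, the sketch below records which iterations get past the inner loop. It assumes current Rust syntax (`0..10` instead of `range(0,10)`), and the function name `odd_numbers_below_nine` is hypothetical.

```rust
// Collects the values of `i` that reach the end of the outer loop body.
fn odd_numbers_below_nine() -> Vec<i32> {
    let mut visited = Vec::new();
    'h: for i in 0..10 {
        'g: loop {
            // Even numbers skip straight to the next outer iteration.
            if i % 2 == 0 { continue 'h; }
            // Reaching 9 ends the outer loop entirely.
            if i == 9 { break 'h; }
            // Otherwise leave only the inner loop.
            break 'g;
        }
        visited.push(i);
    }
    visited
}

fn main() {
    println!("{:?}", odd_numbers_below_nine());
}
```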
 588 Named labels are hygienic and can be used safely within macros.
 589 See the macros guide section on hygiene for more details.
 590
 591 # Conclusion
 592
 593 So there you have it: a (relatively) brief tour of the lifetime
 594 system. For more details, we refer to the (yet to be written) reference
 595 document on references, which will explain the full notation
 596 and give more examples.