3 This pass has the job of enforcing memory safety. This is a subtle
4 topic. This docs aim to explain both the practice and the theory
5 behind the borrow checker. They start with a high-level overview of
6 how it works, and then proceed to dive into the theoretical
7 background. Finally, they go into detail on some of the more subtle
12 These docs are long. Search for the section you are interested in.
17 - Moves and initialization
18 - Drop flags and structural fragments
23 The borrow checker checks one function at a time. It operates in two
24 passes. The first pass, called `gather_loans`, walks over the function
25 and identifies all of the places where borrows (e.g., `&` expressions
26 and `ref` bindings) and moves (copies or captures of a linear value)
27 occur. It also tracks initialization sites. For each borrow and move,
28 it checks various basic safety conditions at this time (for example,
29 that the lifetime of the borrow doesn't exceed the lifetime of the
30 value being borrowed, or that there is no move out of an `&T`
33 It then uses the dataflow module to propagate which of those borrows
34 may be in scope at each point in the procedure. A loan is considered
35 to come into scope at the expression that caused it and to go out of
36 scope when the lifetime of the resulting reference expires.
38 Once the in-scope loans are known for each point in the program, the
39 borrow checker walks the IR again in a second pass called
40 `check_loans`. This pass examines each statement and makes sure that
41 it is safe with respect to the in-scope loans.
45 Throughout the docs we'll consider a simple subset of Rust in which
46 you can only borrow from lvalues, defined like so:
52 Here `x` represents some variable, `LV.f` is a field reference,
53 and `*LV` is a pointer dereference. There is no auto-deref or other
54 niceties. This means that if you have a type like:
60 and a variable `a: Box<S>`, then the rust expression `a.f` would correspond
61 to an `LV` of `(*a).f`.
63 Here is the formal grammar for the types we'll consider:
66 TY = i32 | bool | S<'LT...> | Box<TY> | & 'LT MQ TY
70 Most of these types should be pretty self explanatory. Here `S` is a
71 struct name and we assume structs are declared like so:
74 SD = struct S<'LT...> { (f: TY)... }
79 ## An intuitive explanation
83 Now, imagine we had a program like this:
86 struct Foo { f: i32, g: i32 }
89 let mut x: Box<Foo> = ...;
95 This is of course dangerous because mutating `x` will free the old
96 value and hence invalidate `y`. The borrow checker aims to prevent
99 #### Loans and restrictions
101 The way the borrow checker works is that it analyzes each borrow
102 expression (in our simple model, that's stuff like `&LV`, though in
103 real life there are a few other cases to consider). For each borrow
104 expression, it computes a `Loan`, which is a data structure that
105 records (1) the value being borrowed, (2) the mutability and scope of
106 the borrow, and (3) a set of restrictions. In the code, `Loan` is a
107 struct defined in `middle::borrowck`. Formally, we define `LOAN` as
111 LOAN = (LV, LT, MQ, RESTRICTION*)
112 RESTRICTION = (LV, ACTION*)
113 ACTION = MUTATE | CLAIM | FREEZE
116 Here the `LOAN` tuple defines the lvalue `LV` being borrowed; the
117 lifetime `LT` of that borrow; the mutability `MQ` of the borrow; and a
118 list of restrictions. The restrictions indicate actions which, if
119 taken, could invalidate the loan and lead to type safety violations.
121 Each `RESTRICTION` is a pair of a restrictive lvalue `LV` (which will
122 either be the path that was borrowed or some prefix of the path that
123 was borrowed) and a set of restricted actions. There are three kinds
124 of actions that may be restricted for the path `LV`:
126 - `MUTATE` means that `LV` cannot be assigned to;
127 - `CLAIM` means that the `LV` cannot be borrowed mutably;
128 - `FREEZE` means that the `LV` cannot be borrowed immutably;
130 Finally, it is never possible to move from an lvalue that appears in a
131 restriction. This implies that the "empty restriction" `(LV, [])`,
132 which contains an empty set of actions, still has a purpose---it
133 prevents moves from `LV`. I chose not to make `MOVE` a fourth kind of
134 action because that would imply that sometimes moves are permitted
135 from restricted values, which is not the case.
139 To give you a better feeling for what kind of restrictions derived
140 from a loan, let's look at the loan `L` that would be issued as a
141 result of the borrow `&mut (*x).f` in the example above:
144 L = ((*x).f, 'a, mut, RS) where
145 RS = [((*x).f, [MUTATE, CLAIM, FREEZE]),
146 (*x, [MUTATE, CLAIM, FREEZE]),
147 (x, [MUTATE, CLAIM, FREEZE])]
150 The loan states that the expression `(*x).f` has been loaned as
151 mutable for the lifetime `'a`. Because the loan is mutable, that means
152 that the value `(*x).f` may be mutated via the newly created reference
153 (and *only* via that pointer). This is reflected in the
154 restrictions `RS` that accompany the loan.
156 The first restriction `((*x).f, [MUTATE, CLAIM, FREEZE])` states that
157 the lender may not mutate, freeze, nor alias `(*x).f`. Mutation is
158 illegal because `(*x).f` is only supposed to be mutated via the new
159 reference, not by mutating the original path `(*x).f`. Freezing is
160 illegal because the path now has an `&mut` alias; so even if we the
161 lender were to consider `(*x).f` to be immutable, it might be mutated
162 via this alias. They will be enforced for the lifetime `'a` of the
163 loan. After the loan expires, the restrictions no longer apply.
165 The second restriction on `*x` is interesting because it does not
166 apply to the path that was lent (`(*x).f`) but rather to a prefix of
167 the borrowed path. This is due to the rules of inherited mutability:
168 if the user were to assign to (or freeze) `*x`, they would indirectly
169 overwrite (or freeze) `(*x).f`, and thus invalidate the reference
170 that was created. In general it holds that when a path is
171 lent, restrictions are issued for all the owning prefixes of that
172 path. In this case, the path `*x` owns the path `(*x).f` and,
173 because `x` has ownership, the path `x` owns the path `*x`.
174 Therefore, borrowing `(*x).f` yields restrictions on both
177 ### Checking for illegal assignments, moves, and reborrows
179 Once we have computed the loans introduced by each borrow, the borrow
180 checker uses a data flow propagation to compute the full set of loans
181 in scope at each expression and then uses that set to decide whether
182 that expression is legal. Remember that the scope of loan is defined
183 by its lifetime LT. We sometimes say that a loan which is in-scope at
184 a particular point is an "outstanding loan", and the set of
185 restrictions included in those loans as the "outstanding
188 The kinds of expressions which in-scope loans can render illegal are:
189 - *assignments* (`lv = v`): illegal if there is an in-scope restriction
190 against mutating `lv`;
191 - *moves*: illegal if there is any in-scope restriction on `lv` at all;
192 - *mutable borrows* (`&mut lv`): illegal there is an in-scope restriction
193 against claiming `lv`;
194 - *immutable borrows* (`&lv`): illegal there is an in-scope restriction
195 against freezing `lv`.
199 Now that we hopefully have some kind of intuitive feeling for how the
200 borrow checker works, let's look a bit more closely now at the precise
201 conditions that it uses.
203 I will present the rules in a modified form of standard inference
204 rules, which looks as follows:
207 PREDICATE(X, Y, Z) // Rule-Name
213 The initial line states the predicate that is to be satisfied. The
214 indented lines indicate the conditions that must be met for the
215 predicate to be satisfied. The right-justified comment states the name
216 of this rule: there are comments in the borrowck source referencing
217 these names, so that you can cross reference to find the actual code
218 that corresponds to the formal rule.
222 I want to collect, at a high-level, the invariants the borrow checker
223 maintains. I will give them names and refer to them throughout the
224 text. Together these invariants are crucial for the overall soundness
227 **Mutability requires uniqueness.** To mutate a path
229 **Unique mutability.** There is only one *usable* mutable path to any
230 given memory at any given time. This implies that when claiming memory
231 with an expression like `p = &mut x`, the compiler must guarantee that
232 the borrowed value `x` can no longer be mutated so long as `p` is
233 live. (This is done via restrictions, read on.)
238 ### The `gather_loans` pass
240 We start with the `gather_loans` pass, which walks the AST looking for
241 borrows. For each borrow, there are three bits of information: the
242 lvalue `LV` being borrowed and the mutability `MQ` and lifetime `LT`
243 of the resulting pointer. Given those, `gather_loans` applies four
246 1. `MUTABILITY(LV, MQ)`: The mutability of the reference is
247 compatible with the mutability of `LV` (i.e., not borrowing immutable
250 2. `ALIASABLE(LV, MQ)`: The aliasability of the reference is
251 compatible with the aliasability of `LV`. The goal is to prevent
252 `&mut` borrows of aliasability data.
254 3. `LIFETIME(LV, LT, MQ)`: The lifetime of the borrow does not exceed
255 the lifetime of the value being borrowed.
257 4. `RESTRICTIONS(LV, LT, ACTIONS) = RS`: This pass checks and computes the
258 restrictions to maintain memory safety. These are the restrictions
259 that will go into the final loan. We'll discuss in more detail below.
261 ## Checking mutability
263 Checking mutability is fairly straightforward. We just want to prevent
264 immutable data from being borrowed as mutable. Note that it is ok to borrow
265 mutable data as immutable, since that is simply a freeze. The judgement
266 `MUTABILITY(LV, MQ)` means the mutability of `LV` is compatible with a borrow
267 of mutability `MQ`. The Rust code corresponding to this predicate is the
268 function `check_mutability` in `middle::borrowck::gather_loans`.
270 ### Checking mutability of variables
272 *Code pointer:* Function `check_mutability()` in `gather_loans/mod.rs`,
273 but also the code in `mem_categorization`.
275 Let's begin with the rules for variables, which state that if a
276 variable is declared as mutable, it may be borrowed any which way, but
277 otherwise the variable must be borrowed as immutable:
280 MUTABILITY(X, MQ) // M-Var-Mut
283 MUTABILITY(X, imm) // M-Var-Imm
287 ### Checking mutability of owned content
289 Fields and boxes inherit their mutability from
290 their base expressions, so both of their rules basically
291 delegate the check to the base expression `LV`:
294 MUTABILITY(LV.f, MQ) // M-Field
297 MUTABILITY(*LV, MQ) // M-Deref-Unique
302 ### Checking mutability of immutable pointer types
304 Immutable pointer types like `&T` can only
305 be borrowed if MQ is immutable:
308 MUTABILITY(*LV, imm) // M-Deref-Borrowed-Imm
312 ### Checking mutability of mutable pointer types
314 `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut:
317 MUTABILITY(*LV, MQ) // M-Deref-Borrowed-Mut
321 ## Checking aliasability
323 The goal of the aliasability check is to ensure that we never permit `&mut`
324 borrows of aliasable data. The judgement `ALIASABLE(LV, MQ)` means the
325 aliasability of `LV` is compatible with a borrow of mutability `MQ`. The Rust
326 code corresponding to this predicate is the function `check_aliasability()` in
327 `middle::borrowck::gather_loans`.
329 ### Checking aliasability of variables
331 Local variables are never aliasable as they are accessible only within
335 ALIASABLE(X, MQ) // M-Var-Mut
338 ### Checking aliasable of owned content
340 Owned content is aliasable if it is found in an aliasable location:
343 ALIASABLE(LV.f, MQ) // M-Field
346 ALIASABLE(*LV, MQ) // M-Deref-Unique
350 ### Checking aliasability of immutable pointer types
352 Immutable pointer types like `&T` are aliasable, and hence can only be
356 ALIASABLE(*LV, imm) // M-Deref-Borrowed-Imm
360 ### Checking aliasability of mutable pointer types
362 `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut:
365 ALIASABLE(*LV, MQ) // M-Deref-Borrowed-Mut
371 These rules aim to ensure that no data is borrowed for a scope that exceeds
372 its lifetime. These two computations wind up being intimately related.
373 Formally, we define a predicate `LIFETIME(LV, LT, MQ)`, which states that
374 "the lvalue `LV` can be safely borrowed for the lifetime `LT` with mutability
375 `MQ`". The Rust code corresponding to this predicate is the module
376 `middle::borrowck::gather_loans::lifetime`.
378 ### Checking lifetime of variables
380 The rule for variables states that a variable can only be borrowed a
381 lifetime `LT` that is a subregion of the variable's scope:
384 LIFETIME(X, LT, MQ) // L-Local
385 LT <= block where X is declared
388 ### Checking lifetime for owned content
390 The lifetime of a field or box is the same as the lifetime
394 LIFETIME(LV.f, LT, MQ) // L-Field
397 LIFETIME(*LV, LT, MQ) // L-Deref-Send
402 ### Checking lifetime for derefs of references
404 References have a lifetime `LT'` associated with them. The
405 data they point at has been guaranteed to be valid for at least this
406 lifetime. Therefore, the borrow is valid so long as the lifetime `LT`
407 of the borrow is shorter than the lifetime `LT'` of the pointer
411 LIFETIME(*LV, LT, MQ) // L-Deref-Borrowed
412 TYPE(LV) = <' Ty OR <' mut Ty
416 ## Computing the restrictions
418 The final rules govern the computation of *restrictions*, meaning that
419 we compute the set of actions that will be illegal for the life of the
420 loan. The predicate is written `RESTRICTIONS(LV, LT, ACTIONS) =
421 RESTRICTION*`, which can be read "in order to prevent `ACTIONS` from
422 occurring on `LV`, the restrictions `RESTRICTION*` must be respected
423 for the lifetime of the loan".
425 Note that there is an initial set of restrictions: these restrictions
426 are computed based on the kind of borrow:
429 &mut LV => RESTRICTIONS(LV, LT, MUTATE|CLAIM|FREEZE)
430 &LV => RESTRICTIONS(LV, LT, MUTATE|CLAIM)
433 The reasoning here is that a mutable borrow must be the only writer,
434 therefore it prevents other writes (`MUTATE`), mutable borrows
435 (`CLAIM`), and immutable borrows (`FREEZE`). An immutable borrow
436 permits other immutable borrows but forbids writes and mutable borrows.
438 ### Restrictions for loans of a local variable
440 The simplest case is a borrow of a local variable `X`:
443 RESTRICTIONS(X, LT, ACTIONS) = (X, ACTIONS) // R-Variable
446 In such cases we just record the actions that are not permitted.
448 ### Restrictions for loans of fields
450 Restricting a field is the same as restricting the owner of that
454 RESTRICTIONS(LV.f, LT, ACTIONS) = RS, (LV.f, ACTIONS) // R-Field
455 RESTRICTIONS(LV, LT, ACTIONS) = RS
458 The reasoning here is as follows. If the field must not be mutated,
459 then you must not mutate the owner of the field either, since that
460 would indirectly modify the field. Similarly, if the field cannot be
461 frozen or aliased, we cannot allow the owner to be frozen or aliased,
462 since doing so indirectly freezes/aliases the field. This is the
463 origin of inherited mutability.
465 ### Restrictions for loans of owned referents
467 Because the mutability of owned referents is inherited, restricting an
468 owned referent is similar to restricting a field, in that it implies
469 restrictions on the pointer. However, boxes have an important
470 twist: if the owner `LV` is mutated, that causes the owned referent
471 `*LV` to be freed! So whenever an owned referent `*LV` is borrowed, we
472 must prevent the box `LV` from being mutated, which means
473 that we always add `MUTATE` and `CLAIM` to the restriction set imposed
477 RESTRICTIONS(*LV, LT, ACTIONS) = RS, (*LV, ACTIONS) // R-Deref-Send-Pointer
479 RESTRICTIONS(LV, LT, ACTIONS|MUTATE|CLAIM) = RS
482 ### Restrictions for loans of immutable borrowed referents
484 Immutable borrowed referents are freely aliasable, meaning that
485 the compiler does not prevent you from copying the pointer. This
486 implies that issuing restrictions is useless. We might prevent the
487 user from acting on `*LV` itself, but there could be another path
488 `*LV1` that refers to the exact same memory, and we would not be
489 restricting that path. Therefore, the rule for `&Ty` pointers
490 always returns an empty set of restrictions, and it only permits
491 restricting `MUTATE` and `CLAIM` actions:
494 RESTRICTIONS(*LV, LT, ACTIONS) = [] // R-Deref-Imm-Borrowed
497 ACTIONS subset of [MUTATE, CLAIM]
500 The reason that we can restrict `MUTATE` and `CLAIM` actions even
501 without a restrictions list is that it is never legal to mutate nor to
502 borrow mutably the contents of a `&Ty` pointer. In other words,
503 those restrictions are already inherent in the type.
505 Clause (1) in the rule for `&Ty` deserves mention. Here I
506 specify that the lifetime of the loan must be less than the lifetime
507 of the `&Ty` pointer. In simple cases, this clause is redundant, since
508 the `LIFETIME()` function will already enforce the required rule:
511 fn foo(point: &'a Point) -> &'static i32 {
516 The above example fails to compile both because of clause (1) above
517 but also by the basic `LIFETIME()` check. However, in more advanced
518 examples involving multiple nested pointers, clause (1) is needed:
521 fn foo(point: &'a &'b mut Point) -> &'b i32 {
526 The `LIFETIME` rule here would accept `'b` because, in fact, the
527 *memory is* guaranteed to remain valid (i.e., not be freed) for the
528 lifetime `'b`, since the `&mut` pointer is valid for `'b`. However, we
529 are returning an immutable reference, so we need the memory to be both
530 valid and immutable. Even though `point.x` is referenced by an `&mut`
531 pointer, it can still be considered immutable so long as that `&mut`
532 pointer is found in an aliased location. That means the memory is
533 guaranteed to be *immutable* for the lifetime of the `&` pointer,
534 which is only `'a`, not `'b`. Hence this example yields an error.
536 As a final twist, consider the case of two nested *immutable*
537 pointers, rather than a mutable pointer within an immutable one:
540 fn foo(point: &'a &'b Point) -> &'b i32 {
545 This function is legal. The reason for this is that the inner pointer
546 (`*point : &'b Point`) is enough to guarantee the memory is immutable
547 and valid for the lifetime `'b`. This is reflected in
548 `RESTRICTIONS()` by the fact that we do not recurse (i.e., we impose
549 no restrictions on `LV`, which in this particular case is the pointer
550 `point : &'a &'b Point`).
552 #### Why both `LIFETIME()` and `RESTRICTIONS()`?
554 Given the previous text, it might seem that `LIFETIME` and
555 `RESTRICTIONS` should be folded together into one check, but there is
556 a reason that they are separated. They answer separate concerns.
557 The rules pertaining to `LIFETIME` exist to ensure that we don't
558 create a borrowed pointer that outlives the memory it points at. So
559 `LIFETIME` prevents a function like this:
562 fn get_1<'a>() -> &'a i32 {
568 Here we would be returning a pointer into the stack. Clearly bad.
570 However, the `RESTRICTIONS` rules are more concerned with how memory
571 is used. The example above doesn't generate an error according to
572 `RESTRICTIONS` because, for local variables, we don't require that the
573 loan lifetime be a subset of the local variable lifetime. The idea
574 here is that we *can* guarantee that `x` is not (e.g.) mutated for the
575 lifetime `'a`, even though `'a` exceeds the function body and thus
576 involves unknown code in the caller -- after all, `x` ceases to exist
577 after we return and hence the remaining code in `'a` cannot possibly
578 mutate it. This distinction is important for type checking functions
582 fn inc_and_get<'a>(p: &'a mut Point) -> &'a i32 {
588 In this case, we take in a `&mut` and return a frozen borrowed pointer
589 with the same lifetime. So long as the lifetime of the returned value
590 doesn't exceed the lifetime of the `&mut` we receive as input, this is
591 fine, though it may seem surprising at first (it surprised me when I
592 first worked it through). After all, we're guaranteeing that `*p`
593 won't be mutated for the lifetime `'a`, even though we can't "see" the
594 entirety of the code during that lifetime, since some of it occurs in
595 our caller. But we *do* know that nobody can mutate `*p` except
596 through `p`. So if we don't mutate `*p` and we don't return `p`, then
597 we know that the right to mutate `*p` has been lost to our caller --
598 in terms of capability, the caller passed in the ability to mutate
599 `*p`, and we never gave it back. (Note that we can't return `p` while
600 `*p` is borrowed since that would be a move of `p`, as `&mut` pointers
603 ### Restrictions for loans of mutable borrowed referents
605 Mutable borrowed pointers are guaranteed to be the only way to mutate
606 their referent. This permits us to take greater license with them; for
607 example, the referent can be frozen simply be ensuring that we do not
608 use the original pointer to perform mutate. Similarly, we can allow
609 the referent to be claimed, so long as the original pointer is unused
610 while the new claimant is live.
612 The rule for mutable borrowed pointers is as follows:
615 RESTRICTIONS(*LV, LT, ACTIONS) = RS, (*LV, ACTIONS) // R-Deref-Mut-Borrowed
616 TYPE(LV) = <' mut Ty
618 RESTRICTIONS(LV, LT, ACTIONS) = RS // (2)
621 Let's examine the two numbered clauses:
623 Clause (1) specifies that the lifetime of the loan (`LT`) cannot
624 exceed the lifetime of the `&mut` pointer (`LT'`). The reason for this
625 is that the `&mut` pointer is guaranteed to be the only legal way to
626 mutate its referent -- but only for the lifetime `LT'`. After that
627 lifetime, the loan on the referent expires and hence the data may be
628 modified by its owner again. This implies that we are only able to
629 guarantee that the referent will not be modified or aliased for a
632 Here is a concrete example of a bug this rule prevents:
635 // Test region-reborrow-from-shorter-mut-ref.rs:
636 fn copy_borrowed_ptr<'a,'b,T>(x: &'a mut &'b mut T) -> &'b mut T {
637 &mut **p // ERROR due to clause (1)
641 let mut y = &mut x; // <-'b-----------------------------+
642 // +-'a--------------------+ |
644 let z = copy_borrowed_ptr(&mut y); // y is lent |
645 *y += 1; // Here y==z, so both should not be usable... |
646 *z += 1; // ...and yet they would be, but for clause 1. |
647 } // <------------------------------------------------------+
650 Clause (2) propagates the restrictions on the referent to the pointer
651 itself. This is the same as with an box, though the
652 reasoning is mildly different. The basic goal in all cases is to
653 prevent the user from establishing another route to the same data. To
654 see what I mean, let's examine various cases of what can go wrong and
655 show how it is prevented.
657 **Example danger 1: Moving the base pointer.** One of the simplest
658 ways to violate the rules is to move the base pointer to a new name
659 and access it via that new name, thus bypassing the restrictions on
660 the old name. Here is an example:
663 // src/test/compile-fail/borrowck-move-mut-base-ptr.rs
664 fn foo(t0: &mut i32) {
665 let p: &i32 = &*t0; // Freezes `*t0`
666 let t1 = t0; //~ ERROR cannot move out of `t0`
667 *t1 = 22; // OK, not a write through `*t0`
671 Remember that `&mut` pointers are linear, and hence `let t1 = t0` is a
672 move of `t0` -- or would be, if it were legal. Instead, we get an
673 error, because clause (2) imposes restrictions on `LV` (`t0`, here),
674 and any restrictions on a path make it impossible to move from that
677 **Example danger 2: Claiming the base pointer.** Another possible
678 danger is to mutably borrow the base path. This can lead to two bad
679 scenarios. The most obvious is that the mutable borrow itself becomes
680 another path to access the same data, as shown here:
683 // src/test/compile-fail/borrowck-mut-borrow-of-mut-base-ptr.rs
684 fn foo<'a>(mut t0: &'a mut i32,
685 mut t1: &'a mut i32) {
686 let p: &i32 = &*t0; // Freezes `*t0`
687 let mut t2 = &mut t0; //~ ERROR cannot borrow `t0`
688 **t2 += 1; // Mutates `*t0`
692 In this example, `**t2` is the same memory as `*t0`. Because `t2` is
693 an `&mut` pointer, `**t2` is a unique path and hence it would be
694 possible to mutate `**t2` even though that memory was supposed to be
695 frozen by the creation of `p`. However, an error is reported -- the
696 reason is that the freeze `&*t0` will restrict claims and mutation
697 against `*t0` which, by clause 2, in turn prevents claims and mutation
698 of `t0`. Hence the claim `&mut t0` is illegal.
700 Another danger with an `&mut` pointer is that we could swap the `t0`
701 value away to create a new path:
704 // src/test/compile-fail/borrowck-swap-mut-base-ptr.rs
705 fn foo<'a>(mut t0: &'a mut i32,
706 mut t1: &'a mut i32) {
707 let p: &i32 = &*t0; // Freezes `*t0`
708 swap(&mut t0, &mut t1); //~ ERROR cannot borrow `t0`
713 This is illegal for the same reason as above. Note that if we added
714 back a swap operator -- as we used to have -- we would want to be very
715 careful to ensure this example is still illegal.
717 **Example danger 3: Freeze the base pointer.** In the case where the
718 referent is claimed, even freezing the base pointer can be dangerous,
719 as shown in the following example:
722 // src/test/compile-fail/borrowck-borrow-of-mut-base-ptr.rs
723 fn foo<'a>(mut t0: &'a mut i32,
724 mut t1: &'a mut i32) {
725 let p: &mut i32 = &mut *t0; // Claims `*t0`
726 let mut t2 = &t0; //~ ERROR cannot borrow `t0`
727 let q: &i32 = &*t2; // Freezes `*t0` but not through `*p`
728 *p += 1; // violates type of `*q`
732 Here the problem is that `*t0` is claimed by `p`, and hence `p` wants
733 to be the controlling pointer through which mutation or freezes occur.
734 But `t2` would -- if it were legal -- have the type `& &mut i32`, and
735 hence would be a mutable pointer in an aliasable location, which is
736 considered frozen (since no one can write to `**t2` as it is not a
737 unique path). Therefore, we could reasonably create a frozen `&i32`
738 pointer pointing at `*t0` that coexists with the mutable pointer `p`,
739 which is clearly unsound.
741 However, it is not always unsafe to freeze the base pointer. In
742 particular, if the referent is frozen, there is no harm in it:
745 // src/test/run-pass/borrowck-borrow-of-mut-base-ptr-safe.rs
746 fn foo<'a>(mut t0: &'a mut i32,
747 mut t1: &'a mut i32) {
748 let p: &i32 = &*t0; // Freezes `*t0`
750 let q: &i32 = &*t2; // Freezes `*t0`, but that's ok...
751 let r: &i32 = &*t0; // ...after all, could do same thing directly.
755 In this case, creating the alias `t2` of `t0` is safe because the only
756 thing `t2` can be used for is to further freeze `*t0`, which is
757 already frozen. In particular, we cannot assign to `*t0` through the
758 new alias `t2`, as demonstrated in this test case:
761 // src/test/run-pass/borrowck-borrow-mut-base-ptr-in-aliasable-loc.rs
762 fn foo(t0: & &mut i32) {
765 **t1 = 22; //~ ERROR cannot assign
769 This distinction is reflected in the rules. When doing an `&mut`
770 borrow -- as in the first example -- the set `ACTIONS` will be
771 `CLAIM|MUTATE|FREEZE`, because claiming the referent implies that it
772 cannot be claimed, mutated, or frozen by anyone else. These
773 restrictions are propagated back to the base path and hence the base
774 path is considered unfreezable.
776 In contrast, when the referent is merely frozen -- as in the second
777 example -- the set `ACTIONS` will be `CLAIM|MUTATE`, because freezing
778 the referent implies that it cannot be claimed or mutated but permits
779 others to freeze. Hence when these restrictions are propagated back to
780 the base path, it will still be considered freezable.
784 **FIXME #10520: Restrictions against mutating the base pointer.** When
785 an `&mut` pointer is frozen or claimed, we currently pass along the
786 restriction against MUTATE to the base pointer. I do not believe this
787 restriction is needed. It dates from the days when we had a way to
788 mutate that preserved the value being mutated (i.e., swap). Nowadays
789 the only form of mutation is assignment, which destroys the pointer
790 being mutated -- therefore, a mutation cannot create a new path to the
791 same data. Rather, it removes an existing path. This implies that not
792 only can we permit mutation, we can have mutation kill restrictions in
795 **WARNING:** We do not currently have `const` borrows in the
796 language. If they are added back in, we must ensure that they are
797 consistent with all of these examples. The crucial question will be
798 what sorts of actions are permitted with a `&const &mut` pointer. I
799 would suggest that an `&mut` referent found in an `&const` location be
800 prohibited from both freezes and claims. This would avoid the need to
801 prevent `const` borrows of the base pointer when the referent is
804 [ Previous revisions of this document discussed `&const` in more detail.
805 See the revision history. ]
807 # Moves and initialization
809 The borrow checker is also in charge of ensuring that:
811 - all memory which is accessed is initialized
812 - immutable local variables are assigned at most once.
814 These are two separate dataflow analyses built on the same
815 framework. Let's look at checking that memory is initialized first;
816 the checking of immutable local variable assignments works in a very
819 To track the initialization of memory, we actually track all the
820 points in the program that *create uninitialized memory*, meaning
821 moves and the declaration of uninitialized variables. For each of
822 these points, we create a bit in the dataflow set. Assignments to a
823 variable `x` or path `a.b.c` kill the move/uninitialization bits for
824 those paths and any subpaths (e.g., `x`, `x.y`, `a.b.c`, `*a.b.c`).
825 Bits are unioned when two control-flow paths join. Thus, the
826 presence of a bit indicates that the move may have occurred without an
827 intervening assignment to the same memory. At each use of a variable,
828 we examine the bits in scope, and check that none of them are
829 moves/uninitializations of the variable that is being used.
831 Let's look at a simple example:
834 fn foo(a: Box<i32>) {
835 let b: Box<i32>; // Gen bit 0.
839 b = a; // Gen bit 1, kill bit 0.
852 In this example, the variable `b` is created uninitialized. In one
853 branch of an `if`, we then move the variable `a` into `b`. Once we
854 exit the `if`, therefore, it is an error to use `a` or `b` since both
855 are only conditionally initialized. I have annotated the dataflow
856 state using comments. There are two dataflow bits, with bit 0
857 corresponding to the creation of `b` without an initializer, and bit 1
858 corresponding to the move of `a`. The assignment `b = a` both
859 generates bit 1, because it is a move of `a`, and kills bit 0, because
860 `b` is now initialized. On the else branch, though, `b` is never
861 initialized, and so bit 0 remains untouched. When the two flows of
862 control join, we union the bits from both sides, resulting in both
863 bits 0 and 1 being set. Thus any attempt to use `a` uncovers the bit 1
864 from the "then" branch, showing that `a` may be moved, and any attempt
865 to use `b` uncovers bit 0, from the "else" branch, showing that `b`
866 may not be initialized.
868 ## Initialization of immutable variables
870 Initialization of immutable variables works in a very similar way,
873 1. we generate bits for each assignment to a variable;
874 2. the bits are never killed except when the variable goes out of scope.
876 Thus the presence of an assignment bit indicates that the assignment
877 may have occurred. Note that assignments are only killed when the
878 variable goes out of scope, as it is not relevant whether or not there
879 has been a move in the meantime. Using these bits, we can declare that
880 an assignment to an immutable variable is legal iff there is no other
881 assignment bit to that same variable in scope.
883 ## Why is the design made this way?
885 It may seem surprising that we assign dataflow bits to *each move*
886 rather than *each path being moved*. This is somewhat less efficient,
887 since on each use, we must iterate through all moves and check whether
888 any of them correspond to the path in question. Similar concerns apply
889 to the analysis for double assignments to immutable variables. The
890 main reason to do it this way is that it allows us to print better
891 error messages, because when a use occurs, we can print out the
892 precise move that may be in scope, rather than simply having to say
893 "the variable may not be initialized".
895 ## Data structures used in the move analysis
897 The move analysis maintains several data structures that enable it to
898 cross-reference moves and assignments to determine when they may be
899 moving/assigning the same memory. These are all collected into the
900 `MoveData` and `FlowedMoveData` structs. The former represents the set
901 of move paths, moves, and assignments, and the latter adds in the
902 results of a dataflow computation.
906 The `MovePath` tree tracks every path that is moved or assigned to.
907 These paths have the same form as the `LoanPath` data structure, which
908 in turn is the "real world version of the lvalues `LV` that we
909 introduced earlier. The difference between a `MovePath` and a `LoanPath`
910 is that move paths are:
912 1. Canonicalized, so that we have exactly one copy of each, and
913 we can refer to move paths by index;
914 2. Cross-referenced with other paths into a tree, so that given a move
915 path we can efficiently find all parent move paths and all
916 extensions (e.g., given the `a.b` move path, we can easily find the
917 move path `a` and also the move paths `a.b.c`)
918 3. Cross-referenced with moves and assignments, so that we can
919 easily find all moves and assignments to a given path.
921 The mechanism that we use is to create a `MovePath` record for each
922 move path. These are arranged in an array and are referenced using
923 `MovePathIndex` values, which are newtype'd indices. The `MovePath`
924 structs are arranged into a tree, representing using the standard
925 Knuth representation where each node has a child 'pointer' and a "next
926 sibling" 'pointer'. In addition, each `MovePath` has a parent
927 'pointer'. In this case, the 'pointers' are just `MovePathIndex`
930 In this way, if we want to find all base paths of a given move path,
931 we can just iterate up the parent pointers (see `each_base_path()` in
932 the `move_data` module). If we want to find all extensions, we can
933 iterate through the subtree (see `each_extending_path()`).
935 ### Moves and assignments
937 There are structs to represent moves (`Move`) and assignments
938 (`Assignment`), and these are also placed into arrays and referenced
939 by index. All moves of a particular path are arranged into a linked
940 lists, beginning with `MovePath.first_move` and continuing through
943 We distinguish between "var" assignments, which are assignments to a
944 variable like `x = foo`, and "path" assignments (`x.f = foo`). This
945 is because we need to assign dataflows to the former, but not the
946 latter, so as to check for double initialization of immutable
949 ### Gathering and checking moves
951 Like loans, we distinguish two phases. The first, gathering, is where
952 we uncover all the moves and assignments. As with loans, we do some
953 basic sanity checking in this phase, so we'll report errors if you
954 attempt to move out of a borrowed pointer etc. Then we do the dataflow
955 (see `FlowedMoveData::new`). Finally, in the `check_loans.rs` code, we
956 walk back over, identify all uses, assignments, and captures, and
957 check that they are legal given the set of dataflow bits we have
958 computed for that program point.
960 # Drop flags and structural fragments
962 In addition to the job of enforcing memory safety, the borrow checker
963 code is also responsible for identifying the *structural fragments* of
964 data in the function, to support out-of-band dynamic drop flags
965 allocated on the stack. (For background, see [RFC PR #320].)
967 [RFC PR #320]: https://github.com/rust-lang/rfcs/pull/320
969 Semantically, each piece of data that has a destructor may need a
970 boolean flag to indicate whether or not its destructor has been run
971 yet. However, in many cases there is no need to actually maintain such
972 a flag: It can be apparent from the code itself that a given path is
973 always initialized (or always deinitialized) when control reaches the
974 end of its owner's scope, and thus we can unconditionally emit (or
975 not) the destructor invocation for that path.
977 A simple example of this is the following:
981 impl D { fn new(x: i32) -> D { ... }
982 impl Drop for D { ... }
984 fn foo(a: D, b: D, t: || -> bool) {
991 At the end of the body of `foo`, the compiler knows that `a` is
992 initialized, introducing a drop obligation (deallocating the boxed
993 integer) for the end of `a`'s scope that is run unconditionally.
994 Likewise the compiler knows that `d` is not initialized, and thus it
995 leave out the drop code for `d`.
997 The compiler cannot statically know the drop-state of `b` nor `c` at
998 the end of their scope, since that depends on the value of
999 `t`. Therefore, we need to insert boolean flags to track whether we
1000 need to drop `b` and `c`.
1002 However, the matter is not as simple as just mapping local variables
1003 to their corresponding drop flags when necessary. In particular, in
1004 addition to being able to move data out of local variables, Rust
1005 allows one to move values in and out of structured data.
1007 Consider the following:
1010 struct S { x: D, y: D, z: D }
1012 fn foo(a: S, mut b: S, t: || -> bool) {
1015 let e: S = a.clone();
1020 if t() { c.y = D::new(4); }
1024 As before, the drop obligations of `a` and `d` can be statically
1025 determined, and again the state of `b` and `c` depend on dynamic
1026 state. But additionally, the dynamic drop obligations introduced by
1027 `b` and `c` are not just per-local boolean flags. For example, if the
1028 first call to `t` returns `false` and the second call `true`, then at
1029 the end of their scope, `b` will be completely initialized, but only
1030 `c.y` in `c` will be initialized. If both calls to `t` return `true`,
1031 then at the end of their scope, `c` will be completely initialized,
1032 but only `b.x` will be initialized in `b`, and only `e.x` and `e.z`
1033 will be initialized in `e`.
1035 Note that we need to cover the `z` field in each case in some way,
1036 since it may (or may not) need to be dropped, even though `z` is never
1037 directly mentioned in the body of the `foo` function. We call a path
1038 like `b.z` a *fragment sibling* of `b.x`, since the field `z` comes
1039 from the same structure `S` that declared the field `x` in `b.x`.
1041 In general we need to maintain boolean flags that match the
1042 `S`-structure of both `b` and `c`. In addition, we need to consult
1043 such a flag when doing an assignment (such as `c.y = D::new(4);`
1044 above), in order to know whether or not there is a previous value that
1045 needs to be dropped before we do the assignment.
1047 So for any given function, we need to determine what flags are needed
1048 to track its drop obligations. Our strategy for determining the set of
1049 flags is to represent the fragmentation of the structure explicitly:
1050 by starting initially from the paths that are explicitly mentioned in
1051 moves and assignments (such as `b.x` and `c.y` above), and then
1052 traversing the structure of the path's type to identify leftover
1053 *unmoved fragments*: assigning into `c.y` means that `c.x` and `c.z`
1054 are leftover unmoved fragments. Each fragment represents a drop
1055 obligation that may need to be tracked. Paths that are only moved or
1056 assigned in their entirety (like `a` and `d`) are treated as a single
1059 The fragment construction process works by piggy-backing on the
1060 existing `move_data` module. We already have callbacks that visit each
1061 direct move and assignment; these form the basis for the sets of
1062 moved_leaf_paths and assigned_leaf_paths. From these leaves, we can
1063 walk up their parent chain to identify all of their parent paths.
1064 We need to identify the parents because of cases like the following:
1067 struct Pair<X,Y>{ x: X, y: Y }
1068 fn foo(dd_d_d: Pair<Pair<Pair<D, D>, D>, D>) {
1069 other_function(dd_d_d.x.y);
1073 In this code, the move of the path `dd_d.x.y` leaves behind not only
1074 the fragment drop-obligation `dd_d.x.x` but also `dd_d.y` as well.
1076 Once we have identified the directly-referenced leaves and their
1077 parents, we compute the left-over fragments, in the function
1078 `fragments::add_fragment_siblings`. As of this writing this works by
1079 looking at each directly-moved or assigned path P, and blindly
1080 gathering all sibling fields of P (as well as siblings for the parents
1081 of P, etc). After accumulating all such siblings, we filter out the
1082 entries added as siblings of P that turned out to be
1083 directly-referenced paths (or parents of directly referenced paths)
1084 themselves, thus leaving the never-referenced "left-overs" as the only
1085 thing left from the gathering step.
1087 ## Array structural fragments
1089 A special case of the structural fragments discussed above are
1090 the elements of an array that has been passed by value, such as
1094 fn foo(a: [D; 10], i: i32) -> D {
1099 The above code moves a single element out of the input array `a`.
1100 The remainder of the array still needs to be dropped; i.e., it
1101 is a structural fragment. Note that after performing such a move,
1102 it is not legal to read from the array `a`. There are a number of
1103 ways to deal with this, but the important thing to note is that
1104 the semantics needs to distinguish in some manner between a
1105 fragment that is the *entire* array versus a fragment that represents
1106 all-but-one element of the array. A place where that distinction
1107 would arise is the following:
1110 fn foo(a: [D; 10], b: [D; 10], i: i32, t: bool) -> D {
1117 // When control exits, we will need either to drop all of `a`
1118 // and all-but-one of `b`, or to drop all of `b` and all-but-one
1123 There are a number of ways that the trans backend could choose to
1124 compile this (e.g. a `[bool; 10]` array for each such moved array;
1125 or an `Option<usize>` for each moved array). From the viewpoint of the
1126 borrow-checker, the important thing is to record what kind of fragment
1127 is implied by the relevant moves.
1131 While writing up these docs, I encountered some rules I believe to be
1132 stricter than necessary:
1134 - I think restricting the `&mut` LV against moves and `ALIAS` is sufficient,
1135 `MUTATE` and `CLAIM` are overkill. `MUTATE` was necessary when swap was
1136 a built-in operator, but as it is not, it is implied by `CLAIM`,
1137 and `CLAIM` is implied by `ALIAS`. The only net effect of this is an
1138 extra error message in some cases, though.
1139 - I have not described how closures interact. Current code is unsound.
1140 I am working on describing and implementing the fix.
1141 - If we wish, we can easily extend the move checking to allow finer-grained
1142 tracking of what is initialized and what is not, enabling code like
1145 a = x.f.g; // x.f.g is now uninitialized
1146 // here, x and x.f are not usable, but x.f.h *is*
1147 x.f.g = b; // x.f.g is not initialized
1148 // now x, x.f, x.f.g, x.f.h are all usable
1150 What needs to change here, most likely, is that the `moves` module
1151 should record not only what paths are moved, but what expressions
1152 are actual *uses*. For example, the reference to `x` in `x.f.g = b`
1153 is not a true *use* in the sense that it requires `x` to be fully
1154 initialized. This is in fact why the above code produces an error
1155 today: the reference to `x` in `x.f.g = b` is considered illegal
1156 because `x` is not fully initialized.
1158 There are also some possible refactorings:
1160 - It might be nice to replace all loan paths with the MovePath mechanism,
1161 since they allow lightweight comparison using an integer.