3 This pass has the job of enforcing memory safety. This is a subtle
4 topic. This docs aim to explain both the practice and the theory
5 behind the borrow checker. They start with a high-level overview of
6 how it works, and then proceed to dive into the theoretical
7 background. Finally, they go into detail on some of the more subtle
12 These docs are long. Search for the section you are interested in.
17 - Moves and initialization
18 - Drop flags and structural fragments
23 The borrow checker checks one function at a time. It operates in two
24 passes. The first pass, called `gather_loans`, walks over the function
25 and identifies all of the places where borrows (e.g., `&` expressions
26 and `ref` bindings) and moves (copies or captures of a linear value)
27 occur. It also tracks initialization sites. For each borrow and move,
28 it checks various basic safety conditions at this time (for example,
29 that the lifetime of the borrow doesn't exceed the lifetime of the
30 value being borrowed, or that there is no move out of an `&T`
33 It then uses the dataflow module to propagate which of those borrows
34 may be in scope at each point in the procedure. A loan is considered
35 to come into scope at the expression that caused it and to go out of
36 scope when the lifetime of the resulting reference expires.
38 Once the in-scope loans are known for each point in the program, the
39 borrow checker walks the IR again in a second pass called
40 `check_loans`. This pass examines each statement and makes sure that
41 it is safe with respect to the in-scope loans.
45 Throughout the docs we'll consider a simple subset of Rust in which
46 you can only borrow from lvalues, defined like so:
52 Here `x` represents some variable, `LV.f` is a field reference,
53 and `*LV` is a pointer dereference. There is no auto-deref or other
54 niceties. This means that if you have a type like:
60 and a variable `a: Box<S>`, then the rust expression `a.f` would correspond
61 to an `LV` of `(*a).f`.
63 Here is the formal grammar for the types we'll consider:
66 TY = i32 | bool | S<'LT...> | Box<TY> | & 'LT MQ TY
70 Most of these types should be pretty self explanatory. Here `S` is a
71 struct name and we assume structs are declared like so:
74 SD = struct S<'LT...> { (f: TY)... }
79 ## An intuitive explanation
83 Now, imagine we had a program like this:
86 struct Foo { f: i32, g: i32 }
89 let mut x: Box<Foo> = ...;
95 This is of course dangerous because mutating `x` will free the old
96 value and hence invalidate `y`. The borrow checker aims to prevent
99 #### Loans and restrictions
101 The way the borrow checker works is that it analyzes each borrow
102 expression (in our simple model, that's stuff like `&LV`, though in
103 real life there are a few other cases to consider). For each borrow
104 expression, it computes a `Loan`, which is a data structure that
105 records (1) the value being borrowed, (2) the mutability and scope of
106 the borrow, and (3) a set of restrictions. In the code, `Loan` is a
107 struct defined in `middle::borrowck`. Formally, we define `LOAN` as
111 LOAN = (LV, LT, MQ, RESTRICTION*)
112 RESTRICTION = (LV, ACTION*)
113 ACTION = MUTATE | CLAIM | FREEZE
116 Here the `LOAN` tuple defines the lvalue `LV` being borrowed; the
117 lifetime `LT` of that borrow; the mutability `MQ` of the borrow; and a
118 list of restrictions. The restrictions indicate actions which, if
119 taken, could invalidate the loan and lead to type safety violations.
121 Each `RESTRICTION` is a pair of a restrictive lvalue `LV` (which will
122 either be the path that was borrowed or some prefix of the path that
123 was borrowed) and a set of restricted actions. There are three kinds
124 of actions that may be restricted for the path `LV`:
126 - `MUTATE` means that `LV` cannot be assigned to;
127 - `CLAIM` means that the `LV` cannot be borrowed mutably;
128 - `FREEZE` means that the `LV` cannot be borrowed immutably;
130 Finally, it is never possible to move from an lvalue that appears in a
131 restriction. This implies that the "empty restriction" `(LV, [])`,
132 which contains an empty set of actions, still has a purpose---it
133 prevents moves from `LV`. I chose not to make `MOVE` a fourth kind of
134 action because that would imply that sometimes moves are permitted
135 from restricted values, which is not the case.
139 To give you a better feeling for what kind of restrictions derived
140 from a loan, let's look at the loan `L` that would be issued as a
141 result of the borrow `&mut (*x).f` in the example above:
144 L = ((*x).f, 'a, mut, RS) where
145 RS = [((*x).f, [MUTATE, CLAIM, FREEZE]),
146 (*x, [MUTATE, CLAIM, FREEZE]),
147 (x, [MUTATE, CLAIM, FREEZE])]
150 The loan states that the expression `(*x).f` has been loaned as
151 mutable for the lifetime `'a`. Because the loan is mutable, that means
152 that the value `(*x).f` may be mutated via the newly created reference
153 (and *only* via that pointer). This is reflected in the
154 restrictions `RS` that accompany the loan.
156 The first restriction `((*x).f, [MUTATE, CLAIM, FREEZE])` states that
157 the lender may not mutate, freeze, nor alias `(*x).f`. Mutation is
158 illegal because `(*x).f` is only supposed to be mutated via the new
159 reference, not by mutating the original path `(*x).f`. Freezing is
160 illegal because the path now has an `&mut` alias; so even if we the
161 lender were to consider `(*x).f` to be immutable, it might be mutated
162 via this alias. They will be enforced for the lifetime `'a` of the
163 loan. After the loan expires, the restrictions no longer apply.
165 The second restriction on `*x` is interesting because it does not
166 apply to the path that was lent (`(*x).f`) but rather to a prefix of
167 the borrowed path. This is due to the rules of inherited mutability:
168 if the user were to assign to (or freeze) `*x`, they would indirectly
169 overwrite (or freeze) `(*x).f`, and thus invalidate the reference
170 that was created. In general it holds that when a path is
171 lent, restrictions are issued for all the owning prefixes of that
172 path. In this case, the path `*x` owns the path `(*x).f` and,
173 because `x` has ownership, the path `x` owns the path `*x`.
174 Therefore, borrowing `(*x).f` yields restrictions on both
177 ### Checking for illegal assignments, moves, and reborrows
179 Once we have computed the loans introduced by each borrow, the borrow
180 checker uses a data flow propagation to compute the full set of loans
181 in scope at each expression and then uses that set to decide whether
182 that expression is legal. Remember that the scope of loan is defined
183 by its lifetime LT. We sometimes say that a loan which is in-scope at
184 a particular point is an "outstanding loan", and the set of
185 restrictions included in those loans as the "outstanding
188 The kinds of expressions which in-scope loans can render illegal are:
189 - *assignments* (`lv = v`): illegal if there is an in-scope restriction
190 against mutating `lv`;
191 - *moves*: illegal if there is any in-scope restriction on `lv` at all;
192 - *mutable borrows* (`&mut lv`): illegal there is an in-scope restriction
193 against claiming `lv`;
194 - *immutable borrows* (`&lv`): illegal there is an in-scope restriction
195 against freezing `lv`.
199 Now that we hopefully have some kind of intuitive feeling for how the
200 borrow checker works, let's look a bit more closely now at the precise
201 conditions that it uses.
203 I will present the rules in a modified form of standard inference
204 rules, which looks as follows:
207 PREDICATE(X, Y, Z) // Rule-Name
213 The initial line states the predicate that is to be satisfied. The
214 indented lines indicate the conditions that must be met for the
215 predicate to be satisfied. The right-justified comment states the name
216 of this rule: there are comments in the borrowck source referencing
217 these names, so that you can cross reference to find the actual code
218 that corresponds to the formal rule.
222 I want to collect, at a high-level, the invariants the borrow checker
223 maintains. I will give them names and refer to them throughout the
224 text. Together these invariants are crucial for the overall soundness
227 **Mutability requires uniqueness.** To mutate a path
229 **Unique mutability.** There is only one *usable* mutable path to any
230 given memory at any given time. This implies that when claiming memory
231 with an expression like `p = &mut x`, the compiler must guarantee that
232 the borrowed value `x` can no longer be mutated so long as `p` is
233 live. (This is done via restrictions, read on.)
238 ### The `gather_loans` pass
240 We start with the `gather_loans` pass, which walks the AST looking for
241 borrows. For each borrow, there are three bits of information: the
242 lvalue `LV` being borrowed and the mutability `MQ` and lifetime `LT`
243 of the resulting pointer. Given those, `gather_loans` applies four
246 1. `MUTABILITY(LV, MQ)`: The mutability of the reference is
247 compatible with the mutability of `LV` (i.e., not borrowing immutable
250 2. `ALIASABLE(LV, MQ)`: The aliasability of the reference is
251 compatible with the aliasability of `LV`. The goal is to prevent
252 `&mut` borrows of aliasability data.
254 3. `LIFETIME(LV, LT, MQ)`: The lifetime of the borrow does not exceed
255 the lifetime of the value being borrowed.
257 4. `RESTRICTIONS(LV, LT, ACTIONS) = RS`: This pass checks and computes the
258 restrictions to maintain memory safety. These are the restrictions
259 that will go into the final loan. We'll discuss in more detail below.
261 ## Checking mutability
263 Checking mutability is fairly straightforward. We just want to prevent
264 immutable data from being borrowed as mutable. Note that it is ok to borrow
265 mutable data as immutable, since that is simply a freeze. The judgement
266 `MUTABILITY(LV, MQ)` means the mutability of `LV` is compatible with a borrow
267 of mutability `MQ`. The Rust code corresponding to this predicate is the
268 function `check_mutability` in `middle::borrowck::gather_loans`.
270 ### Checking mutability of variables
272 *Code pointer:* Function `check_mutability()` in `gather_loans/mod.rs`,
273 but also the code in `mem_categorization`.
275 Let's begin with the rules for variables, which state that if a
276 variable is declared as mutable, it may be borrowed any which way, but
277 otherwise the variable must be borrowed as immutable:
280 MUTABILITY(X, MQ) // M-Var-Mut
283 MUTABILITY(X, imm) // M-Var-Imm
287 ### Checking mutability of owned content
289 Fields and boxes inherit their mutability from
290 their base expressions, so both of their rules basically
291 delegate the check to the base expression `LV`:
294 MUTABILITY(LV.f, MQ) // M-Field
297 MUTABILITY(*LV, MQ) // M-Deref-Unique
302 ### Checking mutability of immutable pointer types
304 Immutable pointer types like `&T` can only
305 be borrowed if MQ is immutable:
308 MUTABILITY(*LV, imm) // M-Deref-Borrowed-Imm
312 ### Checking mutability of mutable pointer types
314 `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut:
317 MUTABILITY(*LV, MQ) // M-Deref-Borrowed-Mut
321 ## Checking aliasability
323 The goal of the aliasability check is to ensure that we never permit `&mut`
324 borrows of aliasable data. The judgement `ALIASABLE(LV, MQ)` means the
325 aliasability of `LV` is compatible with a borrow of mutability `MQ`. The Rust
326 code corresponding to this predicate is the function `check_aliasability()` in
327 `middle::borrowck::gather_loans`.
329 ### Checking aliasability of variables
331 Local variables are never aliasable as they are accessible only within
335 ALIASABLE(X, MQ) // M-Var-Mut
338 ### Checking aliasable of owned content
340 Owned content is aliasable if it is found in an aliasable location:
343 ALIASABLE(LV.f, MQ) // M-Field
346 ALIASABLE(*LV, MQ) // M-Deref-Unique
350 ### Checking aliasability of immutable pointer types
352 Immutable pointer types like `&T` are aliasable, and hence can only be
356 ALIASABLE(*LV, imm) // M-Deref-Borrowed-Imm
360 ### Checking aliasability of mutable pointer types
362 `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut:
365 ALIASABLE(*LV, MQ) // M-Deref-Borrowed-Mut
371 These rules aim to ensure that no data is borrowed for a scope that exceeds
372 its lifetime. These two computations wind up being intimately related.
373 Formally, we define a predicate `LIFETIME(LV, LT, MQ)`, which states that
374 "the lvalue `LV` can be safely borrowed for the lifetime `LT` with mutability
375 `MQ`". The Rust code corresponding to this predicate is the module
376 `middle::borrowck::gather_loans::lifetime`.
378 ### Checking lifetime of variables
380 The rule for variables states that a variable can only be borrowed a
381 lifetime `LT` that is a subregion of the variable's scope:
384 LIFETIME(X, LT, MQ) // L-Local
385 LT <= block where X is declared
388 ### Checking lifetime for owned content
390 The lifetime of a field or box is the same as the lifetime
394 LIFETIME(LV.f, LT, MQ) // L-Field
397 LIFETIME(*LV, LT, MQ) // L-Deref-Send
402 ### Checking lifetime for derefs of references
404 References have a lifetime `LT'` associated with them. The
405 data they point at has been guaranteed to be valid for at least this
406 lifetime. Therefore, the borrow is valid so long as the lifetime `LT`
407 of the borrow is shorter than the lifetime `LT'` of the pointer
411 LIFETIME(*LV, LT, MQ) // L-Deref-Borrowed
412 TYPE(LV) = <' Ty OR <' mut Ty
416 ## Computing the restrictions
418 The final rules govern the computation of *restrictions*, meaning that
419 we compute the set of actions that will be illegal for the life of the
420 loan. The predicate is written `RESTRICTIONS(LV, LT, ACTIONS) =
421 RESTRICTION*`, which can be read "in order to prevent `ACTIONS` from
422 occurring on `LV`, the restrictions `RESTRICTION*` must be respected
423 for the lifetime of the loan".
425 Note that there is an initial set of restrictions: these restrictions
426 are computed based on the kind of borrow:
429 &mut LV => RESTRICTIONS(LV, LT, MUTATE|CLAIM|FREEZE)
430 &LV => RESTRICTIONS(LV, LT, MUTATE|CLAIM)
433 The reasoning here is that a mutable borrow must be the only writer,
434 therefore it prevents other writes (`MUTATE`), mutable borrows
435 (`CLAIM`), and immutable borrows (`FREEZE`). An immutable borrow
436 permits other immutable borrows but forbids writes and mutable borrows.
438 ### Restrictions for loans of a local variable
440 The simplest case is a borrow of a local variable `X`:
443 RESTRICTIONS(X, LT, ACTIONS) = (X, ACTIONS) // R-Variable
446 In such cases we just record the actions that are not permitted.
448 ### Restrictions for loans of fields
450 Restricting a field is the same as restricting the owner of that
454 RESTRICTIONS(LV.f, LT, ACTIONS) = RS, (LV.f, ACTIONS) // R-Field
455 RESTRICTIONS(LV, LT, ACTIONS) = RS
458 The reasoning here is as follows. If the field must not be mutated,
459 then you must not mutate the owner of the field either, since that
460 would indirectly modify the field. Similarly, if the field cannot be
461 frozen or aliased, we cannot allow the owner to be frozen or aliased,
462 since doing so indirectly freezes/aliases the field. This is the
463 origin of inherited mutability.
465 ### Restrictions for loans of owned referents
467 Because the mutability of owned referents is inherited, restricting an
468 owned referent is similar to restricting a field, in that it implies
469 restrictions on the pointer. However, boxes have an important
470 twist: if the owner `LV` is mutated, that causes the owned referent
471 `*LV` to be freed! So whenever an owned referent `*LV` is borrowed, we
472 must prevent the box `LV` from being mutated, which means
473 that we always add `MUTATE` and `CLAIM` to the restriction set imposed
477 RESTRICTIONS(*LV, LT, ACTIONS) = RS, (*LV, ACTIONS) // R-Deref-Send-Pointer
479 RESTRICTIONS(LV, LT, ACTIONS|MUTATE|CLAIM) = RS
482 ### Restrictions for loans of immutable borrowed referents
484 Immutable borrowed referents are freely aliasable, meaning that
485 the compiler does not prevent you from copying the pointer. This
486 implies that issuing restrictions is useless. We might prevent the
487 user from acting on `*LV` itself, but there could be another path
488 `*LV1` that refers to the exact same memory, and we would not be
489 restricting that path. Therefore, the rule for `&Ty` pointers
490 always returns an empty set of restrictions, and it only permits
491 restricting `MUTATE` and `CLAIM` actions:
494 RESTRICTIONS(*LV, LT, ACTIONS) = [] // R-Deref-Imm-Borrowed
497 ACTIONS subset of [MUTATE, CLAIM]
500 The reason that we can restrict `MUTATE` and `CLAIM` actions even
501 without a restrictions list is that it is never legal to mutate nor to
502 borrow mutably the contents of a `&Ty` pointer. In other words,
503 those restrictions are already inherent in the type.
505 Clause (1) in the rule for `&Ty` deserves mention. Here I
506 specify that the lifetime of the loan must be less than the lifetime
507 of the `&Ty` pointer. In simple cases, this clause is redundant, since
508 the `LIFETIME()` function will already enforce the required rule:
511 fn foo(point: &'a Point) -> &'static i32 {
516 The above example fails to compile both because of clause (1) above
517 but also by the basic `LIFETIME()` check. However, in more advanced
518 examples involving multiple nested pointers, clause (1) is needed:
521 fn foo(point: &'a &'b mut Point) -> &'b i32 {
526 The `LIFETIME` rule here would accept `'b` because, in fact, the
527 *memory is* guaranteed to remain valid (i.e., not be freed) for the
528 lifetime `'b`, since the `&mut` pointer is valid for `'b`. However, we
529 are returning an immutable reference, so we need the memory to be both
530 valid and immutable. Even though `point.x` is referenced by an `&mut`
531 pointer, it can still be considered immutable so long as that `&mut`
532 pointer is found in an aliased location. That means the memory is
533 guaranteed to be *immutable* for the lifetime of the `&` pointer,
534 which is only `'a`, not `'b`. Hence this example yields an error.
536 As a final twist, consider the case of two nested *immutable*
537 pointers, rather than a mutable pointer within an immutable one:
540 fn foo(point: &'a &'b Point) -> &'b i32 {
545 This function is legal. The reason for this is that the inner pointer
546 (`*point : &'b Point`) is enough to guarantee the memory is immutable
547 and valid for the lifetime `'b`. This is reflected in
548 `RESTRICTIONS()` by the fact that we do not recurse (i.e., we impose
549 no restrictions on `LV`, which in this particular case is the pointer
550 `point : &'a &'b Point`).
552 #### Why both `LIFETIME()` and `RESTRICTIONS()`?
554 Given the previous text, it might seem that `LIFETIME` and
555 `RESTRICTIONS` should be folded together into one check, but there is
556 a reason that they are separated. They answer separate concerns.
557 The rules pertaining to `LIFETIME` exist to ensure that we don't
558 create a borrowed pointer that outlives the memory it points at. So
559 `LIFETIME` prevents a function like this:
562 fn get_1<'a>() -> &'a i32 {
568 Here we would be returning a pointer into the stack. Clearly bad.
570 However, the `RESTRICTIONS` rules are more concerned with how memory
571 is used. The example above doesn't generate an error according to
572 `RESTRICTIONS` because, for local variables, we don't require that the
573 loan lifetime be a subset of the local variable lifetime. The idea
574 here is that we *can* guarantee that `x` is not (e.g.) mutated for the
575 lifetime `'a`, even though `'a` exceeds the function body and thus
576 involves unknown code in the caller -- after all, `x` ceases to exist
577 after we return and hence the remaining code in `'a` cannot possibly
578 mutate it. This distinction is important for type checking functions
582 fn inc_and_get<'a>(p: &'a mut Point) -> &'a i32 {
588 In this case, we take in a `&mut` and return a frozen borrowed pointer
589 with the same lifetime. So long as the lifetime of the returned value
590 doesn't exceed the lifetime of the `&mut` we receive as input, this is
591 fine, though it may seem surprising at first (it surprised me when I
592 first worked it through). After all, we're guaranteeing that `*p`
593 won't be mutated for the lifetime `'a`, even though we can't "see" the
594 entirety of the code during that lifetime, since some of it occurs in
595 our caller. But we *do* know that nobody can mutate `*p` except
596 through `p`. So if we don't mutate `*p` and we don't return `p`, then
597 we know that the right to mutate `*p` has been lost to our caller --
598 in terms of capability, the caller passed in the ability to mutate
599 `*p`, and we never gave it back. (Note that we can't return `p` while
600 `*p` is borrowed since that would be a move of `p`, as `&mut` pointers
603 ### Restrictions for loans of mutable borrowed referents
605 Mutable borrowed pointers are guaranteed to be the only way to mutate
606 their referent. This permits us to take greater license with them; for
607 example, the referent can be frozen simply be ensuring that we do not
608 use the original pointer to perform mutate. Similarly, we can allow
609 the referent to be claimed, so long as the original pointer is unused
610 while the new claimant is live.
612 The rule for mutable borrowed pointers is as follows:
615 RESTRICTIONS(*LV, LT, ACTIONS) = RS, (*LV, ACTIONS) // R-Deref-Mut-Borrowed
616 TYPE(LV) = <' mut Ty
618 RESTRICTIONS(LV, LT, ACTIONS) = RS // (2)
621 Let's examine the two numbered clauses:
623 Clause (1) specifies that the lifetime of the loan (`LT`) cannot
624 exceed the lifetime of the `&mut` pointer (`LT'`). The reason for this
625 is that the `&mut` pointer is guaranteed to be the only legal way to
626 mutate its referent -- but only for the lifetime `LT'`. After that
627 lifetime, the loan on the referent expires and hence the data may be
628 modified by its owner again. This implies that we are only able to
629 guarantee that the referent will not be modified or aliased for a
632 Here is a concrete example of a bug this rule prevents:
635 // Test region-reborrow-from-shorter-mut-ref.rs:
636 fn copy_borrowed_ptr<'a,'b,T>(x: &'a mut &'b mut T) -> &'b mut T {
637 &mut **p // ERROR due to clause (1)
641 let mut y = &mut x; // <-'b-----------------------------+
642 // +-'a--------------------+ |
644 let z = copy_borrowed_ptr(&mut y); // y is lent |
645 *y += 1; // Here y==z, so both should not be usable... |
646 *z += 1; // ...and yet they would be, but for clause 1. |
647 } // <------------------------------------------------------+
650 Clause (2) propagates the restrictions on the referent to the pointer
651 itself. This is the same as with an box, though the
652 reasoning is mildly different. The basic goal in all cases is to
653 prevent the user from establishing another route to the same data. To
654 see what I mean, let's examine various cases of what can go wrong and
655 show how it is prevented.
657 **Example danger 1: Moving the base pointer.** One of the simplest
658 ways to violate the rules is to move the base pointer to a new name
659 and access it via that new name, thus bypassing the restrictions on
660 the old name. Here is an example:
663 // src/test/compile-fail/borrowck-move-mut-base-ptr.rs
664 fn foo(t0: &mut i32) {
665 let p: &i32 = &*t0; // Freezes `*t0`
666 let t1 = t0; //~ ERROR cannot move out of `t0`
667 *t1 = 22; // OK, not a write through `*t0`
671 Remember that `&mut` pointers are linear, and hence `let t1 = t0` is a
672 move of `t0` -- or would be, if it were legal. Instead, we get an
673 error, because clause (2) imposes restrictions on `LV` (`t0`, here),
674 and any restrictions on a path make it impossible to move from that
677 **Example danger 2: Claiming the base pointer.** Another possible
678 danger is to mutably borrow the base path. This can lead to two bad
679 scenarios. The most obvious is that the mutable borrow itself becomes
680 another path to access the same data, as shown here:
683 // src/test/compile-fail/borrowck-mut-borrow-of-mut-base-ptr.rs
684 fn foo<'a>(mut t0: &'a mut i32,
685 mut t1: &'a mut i32) {
686 let p: &i32 = &*t0; // Freezes `*t0`
687 let mut t2 = &mut t0; //~ ERROR cannot borrow `t0`
688 **t2 += 1; // Mutates `*t0`
692 In this example, `**t2` is the same memory as `*t0`. Because `t2` is
693 an `&mut` pointer, `**t2` is a unique path and hence it would be
694 possible to mutate `**t2` even though that memory was supposed to be
695 frozen by the creation of `p`. However, an error is reported -- the
696 reason is that the freeze `&*t0` will restrict claims and mutation
697 against `*t0` which, by clause 2, in turn prevents claims and mutation
698 of `t0`. Hence the claim `&mut t0` is illegal.
700 Another danger with an `&mut` pointer is that we could swap the `t0`
701 value away to create a new path:
704 // src/test/compile-fail/borrowck-swap-mut-base-ptr.rs
705 fn foo<'a>(mut t0: &'a mut i32,
706 mut t1: &'a mut i32) {
707 let p: &i32 = &*t0; // Freezes `*t0`
708 swap(&mut t0, &mut t1); //~ ERROR cannot borrow `t0`
713 This is illegal for the same reason as above. Note that if we added
714 back a swap operator -- as we used to have -- we would want to be very
715 careful to ensure this example is still illegal.
717 **Example danger 3: Freeze the base pointer.** In the case where the
718 referent is claimed, even freezing the base pointer can be dangerous,
719 as shown in the following example:
722 // src/test/compile-fail/borrowck-borrow-of-mut-base-ptr.rs
723 fn foo<'a>(mut t0: &'a mut i32,
724 mut t1: &'a mut i32) {
725 let p: &mut i32 = &mut *t0; // Claims `*t0`
726 let mut t2 = &t0; //~ ERROR cannot borrow `t0`
727 let q: &i32 = &*t2; // Freezes `*t0` but not through `*p`
728 *p += 1; // violates type of `*q`
732 Here the problem is that `*t0` is claimed by `p`, and hence `p` wants
733 to be the controlling pointer through which mutation or freezes occur.
734 But `t2` would -- if it were legal -- have the type `& &mut i32`, and
735 hence would be a mutable pointer in an aliasable location, which is
736 considered frozen (since no one can write to `**t2` as it is not a
737 unique path). Therefore, we could reasonably create a frozen `&i32`
738 pointer pointing at `*t0` that coexists with the mutable pointer `p`,
739 which is clearly unsound.
741 However, it is not always unsafe to freeze the base pointer. In
742 particular, if the referent is frozen, there is no harm in it:
745 // src/test/run-pass/borrowck-borrow-of-mut-base-ptr-safe.rs
746 fn foo<'a>(mut t0: &'a mut i32,
747 mut t1: &'a mut i32) {
748 let p: &i32 = &*t0; // Freezes `*t0`
750 let q: &i32 = &*t2; // Freezes `*t0`, but that's ok...
751 let r: &i32 = &*t0; // ...after all, could do same thing directly.
755 In this case, creating the alias `t2` of `t0` is safe because the only
756 thing `t2` can be used for is to further freeze `*t0`, which is
757 already frozen. In particular, we cannot assign to `*t0` through the
758 new alias `t2`, as demonstrated in this test case:
761 // src/test/run-pass/borrowck-borrow-mut-base-ptr-in-aliasable-loc.rs
762 fn foo(t0: & &mut i32) {
765 **t1 = 22; //~ ERROR cannot assign
769 This distinction is reflected in the rules. When doing an `&mut`
770 borrow -- as in the first example -- the set `ACTIONS` will be
771 `CLAIM|MUTATE|FREEZE`, because claiming the referent implies that it
772 cannot be claimed, mutated, or frozen by anyone else. These
773 restrictions are propagated back to the base path and hence the base
774 path is considered unfreezable.
776 In contrast, when the referent is merely frozen -- as in the second
777 example -- the set `ACTIONS` will be `CLAIM|MUTATE`, because freezing
778 the referent implies that it cannot be claimed or mutated but permits
779 others to freeze. Hence when these restrictions are propagated back to
780 the base path, it will still be considered freezable.
784 **FIXME [RFC 1751](https://github.com/rust-lang/rfcs/issues/1751)
785 Restrictions against mutating the base pointer.**
786 When an `&mut` pointer is frozen or claimed, we currently pass along the
787 restriction against MUTATE to the base pointer. I do not believe this
788 restriction is needed. It dates from the days when we had a way to
789 mutate that preserved the value being mutated (i.e., swap). Nowadays
790 the only form of mutation is assignment, which destroys the pointer
791 being mutated -- therefore, a mutation cannot create a new path to the
792 same data. Rather, it removes an existing path. This implies that not
793 only can we permit mutation, we can have mutation kill restrictions in
796 **WARNING:** We do not currently have `const` borrows in the
797 language. If they are added back in, we must ensure that they are
798 consistent with all of these examples. The crucial question will be
799 what sorts of actions are permitted with a `&const &mut` pointer. I
800 would suggest that an `&mut` referent found in an `&const` location be
801 prohibited from both freezes and claims. This would avoid the need to
802 prevent `const` borrows of the base pointer when the referent is
805 [ Previous revisions of this document discussed `&const` in more detail.
806 See the revision history. ]
808 # Moves and initialization
810 The borrow checker is also in charge of ensuring that:
812 - all memory which is accessed is initialized
813 - immutable local variables are assigned at most once.
815 These are two separate dataflow analyses built on the same
816 framework. Let's look at checking that memory is initialized first;
817 the checking of immutable local variable assignments works in a very
820 To track the initialization of memory, we actually track all the
821 points in the program that *create uninitialized memory*, meaning
822 moves and the declaration of uninitialized variables. For each of
823 these points, we create a bit in the dataflow set. Assignments to a
824 variable `x` or path `a.b.c` kill the move/uninitialization bits for
825 those paths and any subpaths (e.g., `x`, `x.y`, `a.b.c`, `*a.b.c`).
826 Bits are unioned when two control-flow paths join. Thus, the
827 presence of a bit indicates that the move may have occurred without an
828 intervening assignment to the same memory. At each use of a variable,
829 we examine the bits in scope, and check that none of them are
830 moves/uninitializations of the variable that is being used.
832 Let's look at a simple example:
835 fn foo(a: Box<i32>) {
836 let b: Box<i32>; // Gen bit 0.
840 b = a; // Gen bit 1, kill bit 0.
853 In this example, the variable `b` is created uninitialized. In one
854 branch of an `if`, we then move the variable `a` into `b`. Once we
855 exit the `if`, therefore, it is an error to use `a` or `b` since both
856 are only conditionally initialized. I have annotated the dataflow
857 state using comments. There are two dataflow bits, with bit 0
858 corresponding to the creation of `b` without an initializer, and bit 1
859 corresponding to the move of `a`. The assignment `b = a` both
860 generates bit 1, because it is a move of `a`, and kills bit 0, because
861 `b` is now initialized. On the else branch, though, `b` is never
862 initialized, and so bit 0 remains untouched. When the two flows of
863 control join, we union the bits from both sides, resulting in both
864 bits 0 and 1 being set. Thus any attempt to use `a` uncovers the bit 1
865 from the "then" branch, showing that `a` may be moved, and any attempt
866 to use `b` uncovers bit 0, from the "else" branch, showing that `b`
867 may not be initialized.
869 ## Initialization of immutable variables
871 Initialization of immutable variables works in a very similar way,
874 1. we generate bits for each assignment to a variable;
875 2. the bits are never killed except when the variable goes out of scope.
877 Thus the presence of an assignment bit indicates that the assignment
878 may have occurred. Note that assignments are only killed when the
879 variable goes out of scope, as it is not relevant whether or not there
880 has been a move in the meantime. Using these bits, we can declare that
881 an assignment to an immutable variable is legal iff there is no other
882 assignment bit to that same variable in scope.
884 ## Why is the design made this way?
886 It may seem surprising that we assign dataflow bits to *each move*
887 rather than *each path being moved*. This is somewhat less efficient,
888 since on each use, we must iterate through all moves and check whether
889 any of them correspond to the path in question. Similar concerns apply
890 to the analysis for double assignments to immutable variables. The
891 main reason to do it this way is that it allows us to print better
892 error messages, because when a use occurs, we can print out the
893 precise move that may be in scope, rather than simply having to say
894 "the variable may not be initialized".
896 ## Data structures used in the move analysis
898 The move analysis maintains several data structures that enable it to
899 cross-reference moves and assignments to determine when they may be
900 moving/assigning the same memory. These are all collected into the
901 `MoveData` and `FlowedMoveData` structs. The former represents the set
902 of move paths, moves, and assignments, and the latter adds in the
903 results of a dataflow computation.
907 The `MovePath` tree tracks every path that is moved or assigned to.
908 These paths have the same form as the `LoanPath` data structure, which
909 in turn is the "real world version of the lvalues `LV` that we
910 introduced earlier. The difference between a `MovePath` and a `LoanPath`
911 is that move paths are:
913 1. Canonicalized, so that we have exactly one copy of each, and
914 we can refer to move paths by index;
915 2. Cross-referenced with other paths into a tree, so that given a move
916 path we can efficiently find all parent move paths and all
917 extensions (e.g., given the `a.b` move path, we can easily find the
918 move path `a` and also the move paths `a.b.c`)
919 3. Cross-referenced with moves and assignments, so that we can
920 easily find all moves and assignments to a given path.
922 The mechanism that we use is to create a `MovePath` record for each
923 move path. These are arranged in an array and are referenced using
924 `MovePathIndex` values, which are newtype'd indices. The `MovePath`
925 structs are arranged into a tree, representing using the standard
926 Knuth representation where each node has a child 'pointer' and a "next
927 sibling" 'pointer'. In addition, each `MovePath` has a parent
928 'pointer'. In this case, the 'pointers' are just `MovePathIndex`
931 In this way, if we want to find all base paths of a given move path,
932 we can just iterate up the parent pointers (see `each_base_path()` in
933 the `move_data` module). If we want to find all extensions, we can
934 iterate through the subtree (see `each_extending_path()`).
936 ### Moves and assignments
938 There are structs to represent moves (`Move`) and assignments
939 (`Assignment`), and these are also placed into arrays and referenced
940 by index. All moves of a particular path are arranged into a linked
941 lists, beginning with `MovePath.first_move` and continuing through
944 We distinguish between "var" assignments, which are assignments to a
945 variable like `x = foo`, and "path" assignments (`x.f = foo`). This
946 is because we need to assign dataflows to the former, but not the
947 latter, so as to check for double initialization of immutable
950 ### Gathering and checking moves
952 Like loans, we distinguish two phases. The first, gathering, is where
953 we uncover all the moves and assignments. As with loans, we do some
954 basic sanity checking in this phase, so we'll report errors if you
955 attempt to move out of a borrowed pointer etc. Then we do the dataflow
956 (see `FlowedMoveData::new`). Finally, in the `check_loans.rs` code, we
957 walk back over, identify all uses, assignments, and captures, and
958 check that they are legal given the set of dataflow bits we have
959 computed for that program point.
961 # Drop flags and structural fragments
963 In addition to the job of enforcing memory safety, the borrow checker
964 code is also responsible for identifying the *structural fragments* of
965 data in the function, to support out-of-band dynamic drop flags
966 allocated on the stack. (For background, see [RFC PR #320].)
968 [RFC PR #320]: https://github.com/rust-lang/rfcs/pull/320
970 Semantically, each piece of data that has a destructor may need a
971 boolean flag to indicate whether or not its destructor has been run
972 yet. However, in many cases there is no need to actually maintain such
973 a flag: It can be apparent from the code itself that a given path is
974 always initialized (or always deinitialized) when control reaches the
975 end of its owner's scope, and thus we can unconditionally emit (or
976 not) the destructor invocation for that path.
978 A simple example of this is the following:
982 impl D { fn new(x: i32) -> D { ... }
983 impl Drop for D { ... }
985 fn foo(a: D, b: D, t: || -> bool) {
992 At the end of the body of `foo`, the compiler knows that `a` is
993 initialized, introducing a drop obligation (deallocating the boxed
994 integer) for the end of `a`'s scope that is run unconditionally.
995 Likewise the compiler knows that `d` is not initialized, and thus it
996 leave out the drop code for `d`.
998 The compiler cannot statically know the drop-state of `b` nor `c` at
999 the end of their scope, since that depends on the value of
1000 `t`. Therefore, we need to insert boolean flags to track whether we
1001 need to drop `b` and `c`.
1003 However, the matter is not as simple as just mapping local variables
1004 to their corresponding drop flags when necessary. In particular, in
1005 addition to being able to move data out of local variables, Rust
1006 allows one to move values in and out of structured data.
1008 Consider the following:
1011 struct S { x: D, y: D, z: D }
1013 fn foo(a: S, mut b: S, t: || -> bool) {
1016 let e: S = a.clone();
1021 if t() { c.y = D::new(4); }
1025 As before, the drop obligations of `a` and `d` can be statically
1026 determined, and again the state of `b` and `c` depend on dynamic
1027 state. But additionally, the dynamic drop obligations introduced by
1028 `b` and `c` are not just per-local boolean flags. For example, if the
1029 first call to `t` returns `false` and the second call `true`, then at
1030 the end of their scope, `b` will be completely initialized, but only
1031 `c.y` in `c` will be initialized. If both calls to `t` return `true`,
1032 then at the end of their scope, `c` will be completely initialized,
1033 but only `b.x` will be initialized in `b`, and only `e.x` and `e.z`
1034 will be initialized in `e`.
1036 Note that we need to cover the `z` field in each case in some way,
1037 since it may (or may not) need to be dropped, even though `z` is never
1038 directly mentioned in the body of the `foo` function. We call a path
1039 like `b.z` a *fragment sibling* of `b.x`, since the field `z` comes
1040 from the same structure `S` that declared the field `x` in `b.x`.
1042 In general we need to maintain boolean flags that match the
1043 `S`-structure of both `b` and `c`. In addition, we need to consult
1044 such a flag when doing an assignment (such as `c.y = D::new(4);`
1045 above), in order to know whether or not there is a previous value that
1046 needs to be dropped before we do the assignment.
1048 So for any given function, we need to determine what flags are needed
1049 to track its drop obligations. Our strategy for determining the set of
1050 flags is to represent the fragmentation of the structure explicitly:
1051 by starting initially from the paths that are explicitly mentioned in
1052 moves and assignments (such as `b.x` and `c.y` above), and then
1053 traversing the structure of the path's type to identify leftover
1054 *unmoved fragments*: assigning into `c.y` means that `c.x` and `c.z`
1055 are leftover unmoved fragments. Each fragment represents a drop
1056 obligation that may need to be tracked. Paths that are only moved or
1057 assigned in their entirety (like `a` and `d`) are treated as a single
1060 The fragment construction process works by piggy-backing on the
1061 existing `move_data` module. We already have callbacks that visit each
1062 direct move and assignment; these form the basis for the sets of
1063 moved_leaf_paths and assigned_leaf_paths. From these leaves, we can
1064 walk up their parent chain to identify all of their parent paths.
1065 We need to identify the parents because of cases like the following:
1068 struct Pair<X,Y>{ x: X, y: Y }
1069 fn foo(dd_d_d: Pair<Pair<Pair<D, D>, D>, D>) {
1070 other_function(dd_d_d.x.y);
1074 In this code, the move of the path `dd_d.x.y` leaves behind not only
1075 the fragment drop-obligation `dd_d.x.x` but also `dd_d.y` as well.
1077 Once we have identified the directly-referenced leaves and their
1078 parents, we compute the left-over fragments, in the function
1079 `fragments::add_fragment_siblings`. As of this writing this works by
1080 looking at each directly-moved or assigned path P, and blindly
1081 gathering all sibling fields of P (as well as siblings for the parents
1082 of P, etc). After accumulating all such siblings, we filter out the
1083 entries added as siblings of P that turned out to be
1084 directly-referenced paths (or parents of directly referenced paths)
1085 themselves, thus leaving the never-referenced "left-overs" as the only
1086 thing left from the gathering step.
1088 ## Array structural fragments
1090 A special case of the structural fragments discussed above are
1091 the elements of an array that has been passed by value, such as
1095 fn foo(a: [D; 10], i: i32) -> D {
1100 The above code moves a single element out of the input array `a`.
1101 The remainder of the array still needs to be dropped; i.e., it
1102 is a structural fragment. Note that after performing such a move,
1103 it is not legal to read from the array `a`. There are a number of
1104 ways to deal with this, but the important thing to note is that
1105 the semantics needs to distinguish in some manner between a
1106 fragment that is the *entire* array versus a fragment that represents
1107 all-but-one element of the array. A place where that distinction
1108 would arise is the following:
1111 fn foo(a: [D; 10], b: [D; 10], i: i32, t: bool) -> D {
1118 // When control exits, we will need either to drop all of `a`
1119 // and all-but-one of `b`, or to drop all of `b` and all-but-one
1124 There are a number of ways that the trans backend could choose to
1125 compile this (e.g. a `[bool; 10]` array for each such moved array;
1126 or an `Option<usize>` for each moved array). From the viewpoint of the
1127 borrow-checker, the important thing is to record what kind of fragment
1128 is implied by the relevant moves.
1132 While writing up these docs, I encountered some rules I believe to be
1133 stricter than necessary:
1135 - I think restricting the `&mut` LV against moves and `ALIAS` is sufficient,
1136 `MUTATE` and `CLAIM` are overkill. `MUTATE` was necessary when swap was
1137 a built-in operator, but as it is not, it is implied by `CLAIM`,
1138 and `CLAIM` is implied by `ALIAS`. The only net effect of this is an
1139 extra error message in some cases, though.
1140 - I have not described how closures interact. Current code is unsound.
1141 I am working on describing and implementing the fix.
1142 - If we wish, we can easily extend the move checking to allow finer-grained
1143 tracking of what is initialized and what is not, enabling code like
1146 a = x.f.g; // x.f.g is now uninitialized
1147 // here, x and x.f are not usable, but x.f.h *is*
1148 x.f.g = b; // x.f.g is not initialized
1149 // now x, x.f, x.f.g, x.f.h are all usable
1151 What needs to change here, most likely, is that the `moves` module
1152 should record not only what paths are moved, but what expressions
1153 are actual *uses*. For example, the reference to `x` in `x.f.g = b`
1154 is not a true *use* in the sense that it requires `x` to be fully
1155 initialized. This is in fact why the above code produces an error
1156 today: the reference to `x` in `x.f.g = b` is considered illegal
1157 because `x` is not fully initialized.
1159 There are also some possible refactorings:
1161 - It might be nice to replace all loan paths with the MovePath mechanism,
1162 since they allow lightweight comparison using an integer.