1 // Copyright 2012 The Rust Project Developers. See the COPYRIGHT
2 // file at the top-level directory of this distribution and at
3 // http://rust-lang.org/COPYRIGHT.
5 // Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
6 // http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
7 // <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
8 // option. This file may not be copied, modified, or distributed
9 // except according to those terms.
11 //! # The Borrow Checker
13 //! This pass has the job of enforcing memory safety. This is a subtle
14 //! topic. This docs aim to explain both the practice and the theory
15 //! behind the borrow checker. They start with a high-level overview of
16 //! how it works, and then proceed to dive into the theoretical
17 //! background. Finally, they go into detail on some of the more subtle
20 //! # Table of contents
22 //! These docs are long. Search for the section you are interested in.
26 //! - Borrowing and loans
27 //! - Moves and initialization
28 //! - Drop flags and structural fragments
33 //! The borrow checker checks one function at a time. It operates in two
34 //! passes. The first pass, called `gather_loans`, walks over the function
35 //! and identifies all of the places where borrows (e.g., `&` expressions
36 //! and `ref` bindings) and moves (copies or captures of a linear value)
37 //! occur. It also tracks initialization sites. For each borrow and move,
38 //! it checks various basic safety conditions at this time (for example,
39 //! that the lifetime of the borrow doesn't exceed the lifetime of the
40 //! value being borrowed, or that there is no move out of an `&T`
43 //! It then uses the dataflow module to propagate which of those borrows
44 //! may be in scope at each point in the procedure. A loan is considered
45 //! to come into scope at the expression that caused it and to go out of
46 //! scope when the lifetime of the resulting reference expires.
48 //! Once the in-scope loans are known for each point in the program, the
49 //! borrow checker walks the IR again in a second pass called
50 //! `check_loans`. This pass examines each statement and makes sure that
51 //! it is safe with respect to the in-scope loans.
55 //! Throughout the docs we'll consider a simple subset of Rust in which
56 //! you can only borrow from lvalues, defined like so:
59 //! LV = x | LV.f | *LV
62 //! Here `x` represents some variable, `LV.f` is a field reference,
63 //! and `*LV` is a pointer dereference. There is no auto-deref or other
64 //! niceties. This means that if you have a type like:
67 //! struct S { f: uint }
70 //! and a variable `a: Box<S>`, then the rust expression `a.f` would correspond
71 //! to an `LV` of `(*a).f`.
73 //! Here is the formal grammar for the types we'll consider:
76 //! TY = () | S<'LT...> | Box<TY> | & 'LT MQ TY
77 //! MQ = mut | imm | const
80 //! Most of these types should be pretty self explanatory. Here `S` is a
81 //! struct name and we assume structs are declared like so:
84 //! SD = struct S<'LT...> { (f: TY)... }
87 //! # Borrowing and loans
89 //! ## An intuitive explanation
93 //! Now, imagine we had a program like this:
96 //! struct Foo { f: uint, g: uint }
99 //! let mut x: Box<Foo> = ...;
100 //! let y = &mut (*x).f;
105 //! This is of course dangerous because mutating `x` will free the old
106 //! value and hence invalidate `y`. The borrow checker aims to prevent
107 //! this sort of thing.
109 //! #### Loans and restrictions
111 //! The way the borrow checker works is that it analyzes each borrow
112 //! expression (in our simple model, that's stuff like `&LV`, though in
113 //! real life there are a few other cases to consider). For each borrow
114 //! expression, it computes a `Loan`, which is a data structure that
115 //! records (1) the value being borrowed, (2) the mutability and scope of
116 //! the borrow, and (3) a set of restrictions. In the code, `Loan` is a
117 //! struct defined in `middle::borrowck`. Formally, we define `LOAN` as
121 //! LOAN = (LV, LT, MQ, RESTRICTION*)
122 //! RESTRICTION = (LV, ACTION*)
123 //! ACTION = MUTATE | CLAIM | FREEZE
126 //! Here the `LOAN` tuple defines the lvalue `LV` being borrowed; the
127 //! lifetime `LT` of that borrow; the mutability `MQ` of the borrow; and a
128 //! list of restrictions. The restrictions indicate actions which, if
129 //! taken, could invalidate the loan and lead to type safety violations.
131 //! Each `RESTRICTION` is a pair of a restrictive lvalue `LV` (which will
132 //! either be the path that was borrowed or some prefix of the path that
133 //! was borrowed) and a set of restricted actions. There are three kinds
134 //! of actions that may be restricted for the path `LV`:
136 //! - `MUTATE` means that `LV` cannot be assigned to;
137 //! - `CLAIM` means that the `LV` cannot be borrowed mutably;
138 //! - `FREEZE` means that the `LV` cannot be borrowed immutably;
140 //! Finally, it is never possible to move from an lvalue that appears in a
141 //! restriction. This implies that the "empty restriction" `(LV, [])`,
142 //! which contains an empty set of actions, still has a purpose---it
143 //! prevents moves from `LV`. I chose not to make `MOVE` a fourth kind of
144 //! action because that would imply that sometimes moves are permitted
145 //! from restrictived values, which is not the case.
149 //! To give you a better feeling for what kind of restrictions derived
150 //! from a loan, let's look at the loan `L` that would be issued as a
151 //! result of the borrow `&mut (*x).f` in the example above:
154 //! L = ((*x).f, 'a, mut, RS) where
155 //! RS = [((*x).f, [MUTATE, CLAIM, FREEZE]),
156 //! (*x, [MUTATE, CLAIM, FREEZE]),
157 //! (x, [MUTATE, CLAIM, FREEZE])]
160 //! The loan states that the expression `(*x).f` has been loaned as
161 //! mutable for the lifetime `'a`. Because the loan is mutable, that means
162 //! that the value `(*x).f` may be mutated via the newly created reference
163 //! (and *only* via that pointer). This is reflected in the
164 //! restrictions `RS` that accompany the loan.
166 //! The first restriction `((*x).f, [MUTATE, CLAIM, FREEZE])` states that
167 //! the lender may not mutate, freeze, nor alias `(*x).f`. Mutation is
168 //! illegal because `(*x).f` is only supposed to be mutated via the new
169 //! reference, not by mutating the original path `(*x).f`. Freezing is
170 //! illegal because the path now has an `&mut` alias; so even if we the
171 //! lender were to consider `(*x).f` to be immutable, it might be mutated
172 //! via this alias. They will be enforced for the lifetime `'a` of the
173 //! loan. After the loan expires, the restrictions no longer apply.
175 //! The second restriction on `*x` is interesting because it does not
176 //! apply to the path that was lent (`(*x).f`) but rather to a prefix of
177 //! the borrowed path. This is due to the rules of inherited mutability:
178 //! if the user were to assign to (or freeze) `*x`, they would indirectly
179 //! overwrite (or freeze) `(*x).f`, and thus invalidate the reference
180 //! that was created. In general it holds that when a path is
181 //! lent, restrictions are issued for all the owning prefixes of that
182 //! path. In this case, the path `*x` owns the path `(*x).f` and,
183 //! because `x` is an owned pointer, the path `x` owns the path `*x`.
184 //! Therefore, borrowing `(*x).f` yields restrictions on both
187 //! ### Checking for illegal assignments, moves, and reborrows
189 //! Once we have computed the loans introduced by each borrow, the borrow
190 //! checker uses a data flow propagation to compute the full set of loans
191 //! in scope at each expression and then uses that set to decide whether
192 //! that expression is legal. Remember that the scope of loan is defined
193 //! by its lifetime LT. We sometimes say that a loan which is in-scope at
194 //! a particular point is an "outstanding loan", and the set of
195 //! restrictions included in those loans as the "outstanding
198 //! The kinds of expressions which in-scope loans can render illegal are:
199 //! - *assignments* (`lv = v`): illegal if there is an in-scope restriction
200 //! against mutating `lv`;
201 //! - *moves*: illegal if there is any in-scope restriction on `lv` at all;
202 //! - *mutable borrows* (`&mut lv`): illegal there is an in-scope restriction
203 //! against claiming `lv`;
204 //! - *immutable borrows* (`&lv`): illegal there is an in-scope restriction
205 //! against freezing `lv`.
209 //! Now that we hopefully have some kind of intuitive feeling for how the
210 //! borrow checker works, let's look a bit more closely now at the precise
211 //! conditions that it uses. For simplicity I will ignore const loans.
213 //! I will present the rules in a modified form of standard inference
214 //! rules, which looks as follows:
217 //! PREDICATE(X, Y, Z) // Rule-Name
223 //! The initial line states the predicate that is to be satisfied. The
224 //! indented lines indicate the conditions that must be met for the
225 //! predicate to be satisfied. The right-justified comment states the name
226 //! of this rule: there are comments in the borrowck source referencing
227 //! these names, so that you can cross reference to find the actual code
228 //! that corresponds to the formal rule.
232 //! I want to collect, at a high-level, the invariants the borrow checker
233 //! maintains. I will give them names and refer to them throughout the
234 //! text. Together these invariants are crucial for the overall soundness
237 //! **Mutability requires uniqueness.** To mutate a path
239 //! **Unique mutability.** There is only one *usable* mutable path to any
240 //! given memory at any given time. This implies that when claiming memory
241 //! with an expression like `p = &mut x`, the compiler must guarantee that
242 //! the borrowed value `x` can no longer be mutated so long as `p` is
243 //! live. (This is done via restrictions, read on.)
248 //! ### The `gather_loans` pass
250 //! We start with the `gather_loans` pass, which walks the AST looking for
251 //! borrows. For each borrow, there are three bits of information: the
252 //! lvalue `LV` being borrowed and the mutability `MQ` and lifetime `LT`
253 //! of the resulting pointer. Given those, `gather_loans` applies four
256 //! 1. `MUTABILITY(LV, MQ)`: The mutability of the reference is
257 //! compatible with the mutability of `LV` (i.e., not borrowing immutable
258 //! data as mutable).
260 //! 2. `ALIASABLE(LV, MQ)`: The aliasability of the reference is
261 //! compatible with the aliasability of `LV`. The goal is to prevent
262 //! `&mut` borrows of aliasability data.
264 //! 3. `LIFETIME(LV, LT, MQ)`: The lifetime of the borrow does not exceed
265 //! the lifetime of the value being borrowed.
267 //! 4. `RESTRICTIONS(LV, LT, ACTIONS) = RS`: This pass checks and computes the
268 //! restrictions to maintain memory safety. These are the restrictions
269 //! that will go into the final loan. We'll discuss in more detail below.
271 //! ## Checking mutability
273 //! Checking mutability is fairly straightforward. We just want to prevent
274 //! immutable data from being borrowed as mutable. Note that it is ok to
275 //! borrow mutable data as immutable, since that is simply a
276 //! freeze. Formally we define a predicate `MUTABLE(LV, MQ)` which, if
277 //! defined, means that "borrowing `LV` with mutability `MQ` is ok. The
278 //! Rust code corresponding to this predicate is the function
279 //! `check_mutability` in `middle::borrowck::gather_loans`.
281 //! ### Checking mutability of variables
283 //! *Code pointer:* Function `check_mutability()` in `gather_loans/mod.rs`,
284 //! but also the code in `mem_categorization`.
286 //! Let's begin with the rules for variables, which state that if a
287 //! variable is declared as mutable, it may be borrowed any which way, but
288 //! otherwise the variable must be borrowed as immutable or const:
291 //! MUTABILITY(X, MQ) // M-Var-Mut
294 //! MUTABILITY(X, MQ) // M-Var-Imm
299 //! ### Checking mutability of owned content
301 //! Fields and owned pointers inherit their mutability from
302 //! their base expressions, so both of their rules basically
303 //! delegate the check to the base expression `LV`:
306 //! MUTABILITY(LV.f, MQ) // M-Field
307 //! MUTABILITY(LV, MQ)
309 //! MUTABILITY(*LV, MQ) // M-Deref-Unique
310 //! TYPE(LV) = Box<Ty>
311 //! MUTABILITY(LV, MQ)
314 //! ### Checking mutability of immutable pointer types
316 //! Immutable pointer types like `&T` can only
317 //! be borrowed if MQ is immutable or const:
320 //! MUTABILITY(*LV, MQ) // M-Deref-Borrowed-Imm
322 //! MQ == imm | const
325 //! ### Checking mutability of mutable pointer types
327 //! `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut:
330 //! MUTABILITY(*LV, MQ) // M-Deref-Borrowed-Mut
331 //! TYPE(LV) = &mut Ty
334 //! ## Checking aliasability
336 //! The goal of the aliasability check is to ensure that we never permit
337 //! `&mut` borrows of aliasable data. Formally we define a predicate
338 //! `ALIASABLE(LV, MQ)` which if defined means that
339 //! "borrowing `LV` with mutability `MQ` is ok". The
340 //! Rust code corresponding to this predicate is the function
341 //! `check_aliasability()` in `middle::borrowck::gather_loans`.
343 //! ### Checking aliasability of variables
345 //! Local variables are never aliasable as they are accessible only within
349 //! ALIASABLE(X, MQ) // M-Var-Mut
352 //! ### Checking aliasable of owned content
354 //! Owned content is aliasable if it is found in an aliasable location:
357 //! ALIASABLE(LV.f, MQ) // M-Field
358 //! ALIASABLE(LV, MQ)
360 //! ALIASABLE(*LV, MQ) // M-Deref-Unique
361 //! ALIASABLE(LV, MQ)
364 //! ### Checking mutability of immutable pointer types
366 //! Immutable pointer types like `&T` are aliasable, and hence can only be
367 //! borrowed immutably:
370 //! ALIASABLE(*LV, imm) // M-Deref-Borrowed-Imm
374 //! ### Checking mutability of mutable pointer types
376 //! `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut:
379 //! ALIASABLE(*LV, MQ) // M-Deref-Borrowed-Mut
380 //! TYPE(LV) = &mut Ty
383 //! ## Checking lifetime
385 //! These rules aim to ensure that no data is borrowed for a scope that exceeds
386 //! its lifetime. These two computations wind up being intimately related.
387 //! Formally, we define a predicate `LIFETIME(LV, LT, MQ)`, which states that
388 //! "the lvalue `LV` can be safely borrowed for the lifetime `LT` with mutability
389 //! `MQ`". The Rust code corresponding to this predicate is the module
390 //! `middle::borrowck::gather_loans::lifetime`.
392 //! ### The Scope function
394 //! Several of the rules refer to a helper function `SCOPE(LV)=LT`. The
395 //! `SCOPE(LV)` yields the lifetime `LT` for which the lvalue `LV` is
396 //! guaranteed to exist, presuming that no mutations occur.
398 //! The scope of a local variable is the block where it is declared:
401 //! SCOPE(X) = block where X is declared
404 //! The scope of a field is the scope of the struct:
407 //! SCOPE(LV.f) = SCOPE(LV)
410 //! The scope of a unique referent is the scope of the pointer, since
411 //! (barring mutation or moves) the pointer will not be freed until
412 //! the pointer itself `LV` goes out of scope:
415 //! SCOPE(*LV) = SCOPE(LV) if LV has type Box<T>
418 //! The scope of a borrowed referent is the scope associated with the
419 //! pointer. This is a conservative approximation, since the data that
420 //! the pointer points at may actually live longer:
423 //! SCOPE(*LV) = LT if LV has type &'LT T or &'LT mut T
426 //! ### Checking lifetime of variables
428 //! The rule for variables states that a variable can only be borrowed a
429 //! lifetime `LT` that is a subregion of the variable's scope:
432 //! LIFETIME(X, LT, MQ) // L-Local
436 //! ### Checking lifetime for owned content
438 //! The lifetime of a field or owned pointer is the same as the lifetime
442 //! LIFETIME(LV.f, LT, MQ) // L-Field
443 //! LIFETIME(LV, LT, MQ)
445 //! LIFETIME(*LV, LT, MQ) // L-Deref-Send
446 //! TYPE(LV) = Box<Ty>
447 //! LIFETIME(LV, LT, MQ)
450 //! ### Checking lifetime for derefs of references
452 //! References have a lifetime `LT'` associated with them. The
453 //! data they point at has been guaranteed to be valid for at least this
454 //! lifetime. Therefore, the borrow is valid so long as the lifetime `LT`
455 //! of the borrow is shorter than the lifetime `LT'` of the pointer
459 //! LIFETIME(*LV, LT, MQ) // L-Deref-Borrowed
460 //! TYPE(LV) = <' Ty OR <' mut Ty
464 //! ## Computing the restrictions
466 //! The final rules govern the computation of *restrictions*, meaning that
467 //! we compute the set of actions that will be illegal for the life of the
468 //! loan. The predicate is written `RESTRICTIONS(LV, LT, ACTIONS) =
469 //! RESTRICTION*`, which can be read "in order to prevent `ACTIONS` from
470 //! occurring on `LV`, the restrictions `RESTRICTION*` must be respected
471 //! for the lifetime of the loan".
473 //! Note that there is an initial set of restrictions: these restrictions
474 //! are computed based on the kind of borrow:
477 //! &mut LV => RESTRICTIONS(LV, LT, MUTATE|CLAIM|FREEZE)
478 //! &LV => RESTRICTIONS(LV, LT, MUTATE|CLAIM)
479 //! &const LV => RESTRICTIONS(LV, LT, [])
482 //! The reasoning here is that a mutable borrow must be the only writer,
483 //! therefore it prevents other writes (`MUTATE`), mutable borrows
484 //! (`CLAIM`), and immutable borrows (`FREEZE`). An immutable borrow
485 //! permits other immutable borrows but forbids writes and mutable borrows.
486 //! Finally, a const borrow just wants to be sure that the value is not
487 //! moved out from under it, so no actions are forbidden.
489 //! ### Restrictions for loans of a local variable
491 //! The simplest case is a borrow of a local variable `X`:
494 //! RESTRICTIONS(X, LT, ACTIONS) = (X, ACTIONS) // R-Variable
497 //! In such cases we just record the actions that are not permitted.
499 //! ### Restrictions for loans of fields
501 //! Restricting a field is the same as restricting the owner of that
505 //! RESTRICTIONS(LV.f, LT, ACTIONS) = RS, (LV.f, ACTIONS) // R-Field
506 //! RESTRICTIONS(LV, LT, ACTIONS) = RS
509 //! The reasoning here is as follows. If the field must not be mutated,
510 //! then you must not mutate the owner of the field either, since that
511 //! would indirectly modify the field. Similarly, if the field cannot be
512 //! frozen or aliased, we cannot allow the owner to be frozen or aliased,
513 //! since doing so indirectly freezes/aliases the field. This is the
514 //! origin of inherited mutability.
516 //! ### Restrictions for loans of owned referents
518 //! Because the mutability of owned referents is inherited, restricting an
519 //! owned referent is similar to restricting a field, in that it implies
520 //! restrictions on the pointer. However, owned pointers have an important
521 //! twist: if the owner `LV` is mutated, that causes the owned referent
522 //! `*LV` to be freed! So whenever an owned referent `*LV` is borrowed, we
523 //! must prevent the owned pointer `LV` from being mutated, which means
524 //! that we always add `MUTATE` and `CLAIM` to the restriction set imposed
528 //! RESTRICTIONS(*LV, LT, ACTIONS) = RS, (*LV, ACTIONS) // R-Deref-Send-Pointer
529 //! TYPE(LV) = Box<Ty>
530 //! RESTRICTIONS(LV, LT, ACTIONS|MUTATE|CLAIM) = RS
533 //! ### Restrictions for loans of immutable borrowed referents
535 //! Immutable borrowed referents are freely aliasable, meaning that
536 //! the compiler does not prevent you from copying the pointer. This
537 //! implies that issuing restrictions is useless. We might prevent the
538 //! user from acting on `*LV` itself, but there could be another path
539 //! `*LV1` that refers to the exact same memory, and we would not be
540 //! restricting that path. Therefore, the rule for `&Ty` pointers
541 //! always returns an empty set of restrictions, and it only permits
542 //! restricting `MUTATE` and `CLAIM` actions:
545 //! RESTRICTIONS(*LV, LT, ACTIONS) = [] // R-Deref-Imm-Borrowed
546 //! TYPE(LV) = <' Ty
548 //! ACTIONS subset of [MUTATE, CLAIM]
551 //! The reason that we can restrict `MUTATE` and `CLAIM` actions even
552 //! without a restrictions list is that it is never legal to mutate nor to
553 //! borrow mutably the contents of a `&Ty` pointer. In other words,
554 //! those restrictions are already inherent in the type.
556 //! Clause (1) in the rule for `&Ty` deserves mention. Here I
557 //! specify that the lifetime of the loan must be less than the lifetime
558 //! of the `&Ty` pointer. In simple cases, this clause is redundant, since
559 //! the `LIFETIME()` function will already enforce the required rule:
562 //! fn foo(point: &'a Point) -> &'static f32 {
563 //! &point.x // Error
567 //! The above example fails to compile both because of clause (1) above
568 //! but also by the basic `LIFETIME()` check. However, in more advanced
569 //! examples involving multiple nested pointers, clause (1) is needed:
572 //! fn foo(point: &'a &'b mut Point) -> &'b f32 {
573 //! &point.x // Error
577 //! The `LIFETIME` rule here would accept `'b` because, in fact, the
578 //! *memory is* guaranteed to remain valid (i.e., not be freed) for the
579 //! lifetime `'b`, since the `&mut` pointer is valid for `'b`. However, we
580 //! are returning an immutable reference, so we need the memory to be both
581 //! valid and immutable. Even though `point.x` is referenced by an `&mut`
582 //! pointer, it can still be considered immutable so long as that `&mut`
583 //! pointer is found in an aliased location. That means the memory is
584 //! guaranteed to be *immutable* for the lifetime of the `&` pointer,
585 //! which is only `'a`, not `'b`. Hence this example yields an error.
587 //! As a final twist, consider the case of two nested *immutable*
588 //! pointers, rather than a mutable pointer within an immutable one:
591 //! fn foo(point: &'a &'b Point) -> &'b f32 {
596 //! This function is legal. The reason for this is that the inner pointer
597 //! (`*point : &'b Point`) is enough to guarantee the memory is immutable
598 //! and valid for the lifetime `'b`. This is reflected in
599 //! `RESTRICTIONS()` by the fact that we do not recurse (i.e., we impose
600 //! no restrictions on `LV`, which in this particular case is the pointer
601 //! `point : &'a &'b Point`).
603 //! #### Why both `LIFETIME()` and `RESTRICTIONS()`?
605 //! Given the previous text, it might seem that `LIFETIME` and
606 //! `RESTRICTIONS` should be folded together into one check, but there is
607 //! a reason that they are separated. They answer separate concerns.
608 //! The rules pertaining to `LIFETIME` exist to ensure that we don't
609 //! create a borrowed pointer that outlives the memory it points at. So
610 //! `LIFETIME` prevents a function like this:
613 //! fn get_1<'a>() -> &'a int {
619 //! Here we would be returning a pointer into the stack. Clearly bad.
621 //! However, the `RESTRICTIONS` rules are more concerned with how memory
622 //! is used. The example above doesn't generate an error according to
623 //! `RESTRICTIONS` because, for local variables, we don't require that the
624 //! loan lifetime be a subset of the local variable lifetime. The idea
625 //! here is that we *can* guarantee that `x` is not (e.g.) mutated for the
626 //! lifetime `'a`, even though `'a` exceeds the function body and thus
627 //! involves unknown code in the caller -- after all, `x` ceases to exist
628 //! after we return and hence the remaining code in `'a` cannot possibly
629 //! mutate it. This distinction is important for type checking functions
633 //! fn inc_and_get<'a>(p: &'a mut Point) -> &'a int {
639 //! In this case, we take in a `&mut` and return a frozen borrowed pointer
640 //! with the same lifetime. So long as the lifetime of the returned value
641 //! doesn't exceed the lifetime of the `&mut` we receive as input, this is
642 //! fine, though it may seem surprising at first (it surprised me when I
643 //! first worked it through). After all, we're guaranteeing that `*p`
644 //! won't be mutated for the lifetime `'a`, even though we can't "see" the
645 //! entirety of the code during that lifetime, since some of it occurs in
646 //! our caller. But we *do* know that nobody can mutate `*p` except
647 //! through `p`. So if we don't mutate `*p` and we don't return `p`, then
648 //! we know that the right to mutate `*p` has been lost to our caller --
649 //! in terms of capability, the caller passed in the ability to mutate
650 //! `*p`, and we never gave it back. (Note that we can't return `p` while
651 //! `*p` is borrowed since that would be a move of `p`, as `&mut` pointers
654 //! ### Restrictions for loans of const aliasable referents
656 //! Freeze pointers are read-only. There may be `&mut` or `&` aliases, and
657 //! we can not prevent *anything* but moves in that case. So the
658 //! `RESTRICTIONS` function is only defined if `ACTIONS` is the empty set.
659 //! Because moves from a `&const` lvalue are never legal, it is not
660 //! necessary to add any restrictions at all to the final result.
663 //! RESTRICTIONS(*LV, LT, []) = [] // R-Deref-Freeze-Borrowed
664 //! TYPE(LV) = &const Ty
667 //! ### Restrictions for loans of mutable borrowed referents
669 //! Mutable borrowed pointers are guaranteed to be the only way to mutate
670 //! their referent. This permits us to take greater license with them; for
671 //! example, the referent can be frozen simply be ensuring that we do not
672 //! use the original pointer to perform mutate. Similarly, we can allow
673 //! the referent to be claimed, so long as the original pointer is unused
674 //! while the new claimant is live.
676 //! The rule for mutable borrowed pointers is as follows:
679 //! RESTRICTIONS(*LV, LT, ACTIONS) = RS, (*LV, ACTIONS) // R-Deref-Mut-Borrowed
680 //! TYPE(LV) = <' mut Ty
682 //! RESTRICTIONS(LV, LT, ACTIONS) = RS // (2)
685 //! Let's examine the two numbered clauses:
687 //! Clause (1) specifies that the lifetime of the loan (`LT`) cannot
688 //! exceed the lifetime of the `&mut` pointer (`LT'`). The reason for this
689 //! is that the `&mut` pointer is guaranteed to be the only legal way to
690 //! mutate its referent -- but only for the lifetime `LT'`. After that
691 //! lifetime, the loan on the referent expires and hence the data may be
692 //! modified by its owner again. This implies that we are only able to
693 //! guarantee that the referent will not be modified or aliased for a
694 //! maximum of `LT'`.
696 //! Here is a concrete example of a bug this rule prevents:
699 //! // Test region-reborrow-from-shorter-mut-ref.rs:
700 //! fn copy_pointer<'a,'b,T>(x: &'a mut &'b mut T) -> &'b mut T {
701 //! &mut **p // ERROR due to clause (1)
705 //! let mut y = &mut x; // <-'b-----------------------------+
706 //! // +-'a--------------------+ |
708 //! let z = copy_borrowed_ptr(&mut y); // y is lent |
709 //! *y += 1; // Here y==z, so both should not be usable... |
710 //! *z += 1; // ...and yet they would be, but for clause 1. |
711 //! } // <------------------------------------------------------+
714 //! Clause (2) propagates the restrictions on the referent to the pointer
715 //! itself. This is the same as with an owned pointer, though the
716 //! reasoning is mildly different. The basic goal in all cases is to
717 //! prevent the user from establishing another route to the same data. To
718 //! see what I mean, let's examine various cases of what can go wrong and
719 //! show how it is prevented.
721 //! **Example danger 1: Moving the base pointer.** One of the simplest
722 //! ways to violate the rules is to move the base pointer to a new name
723 //! and access it via that new name, thus bypassing the restrictions on
724 //! the old name. Here is an example:
727 //! // src/test/compile-fail/borrowck-move-mut-base-ptr.rs
728 //! fn foo(t0: &mut int) {
729 //! let p: &int = &*t0; // Freezes `*t0`
730 //! let t1 = t0; //~ ERROR cannot move out of `t0`
731 //! *t1 = 22; // OK, not a write through `*t0`
735 //! Remember that `&mut` pointers are linear, and hence `let t1 = t0` is a
736 //! move of `t0` -- or would be, if it were legal. Instead, we get an
737 //! error, because clause (2) imposes restrictions on `LV` (`t0`, here),
738 //! and any restrictions on a path make it impossible to move from that
741 //! **Example danger 2: Claiming the base pointer.** Another possible
742 //! danger is to mutably borrow the base path. This can lead to two bad
743 //! scenarios. The most obvious is that the mutable borrow itself becomes
744 //! another path to access the same data, as shown here:
747 //! // src/test/compile-fail/borrowck-mut-borrow-of-mut-base-ptr.rs
748 //! fn foo<'a>(mut t0: &'a mut int,
749 //! mut t1: &'a mut int) {
750 //! let p: &int = &*t0; // Freezes `*t0`
751 //! let mut t2 = &mut t0; //~ ERROR cannot borrow `t0`
752 //! **t2 += 1; // Mutates `*t0`
756 //! In this example, `**t2` is the same memory as `*t0`. Because `t2` is
757 //! an `&mut` pointer, `**t2` is a unique path and hence it would be
758 //! possible to mutate `**t2` even though that memory was supposed to be
759 //! frozen by the creation of `p`. However, an error is reported -- the
760 //! reason is that the freeze `&*t0` will restrict claims and mutation
761 //! against `*t0` which, by clause 2, in turn prevents claims and mutation
762 //! of `t0`. Hence the claim `&mut t0` is illegal.
764 //! Another danger with an `&mut` pointer is that we could swap the `t0`
765 //! value away to create a new path:
768 //! // src/test/compile-fail/borrowck-swap-mut-base-ptr.rs
769 //! fn foo<'a>(mut t0: &'a mut int,
770 //! mut t1: &'a mut int) {
771 //! let p: &int = &*t0; // Freezes `*t0`
772 //! swap(&mut t0, &mut t1); //~ ERROR cannot borrow `t0`
777 //! This is illegal for the same reason as above. Note that if we added
778 //! back a swap operator -- as we used to have -- we would want to be very
779 //! careful to ensure this example is still illegal.
781 //! **Example danger 3: Freeze the base pointer.** In the case where the
782 //! referent is claimed, even freezing the base pointer can be dangerous,
783 //! as shown in the following example:
786 //! // src/test/compile-fail/borrowck-borrow-of-mut-base-ptr.rs
787 //! fn foo<'a>(mut t0: &'a mut int,
788 //! mut t1: &'a mut int) {
789 //! let p: &mut int = &mut *t0; // Claims `*t0`
790 //! let mut t2 = &t0; //~ ERROR cannot borrow `t0`
791 //! let q: &int = &*t2; // Freezes `*t0` but not through `*p`
792 //! *p += 1; // violates type of `*q`
796 //! Here the problem is that `*t0` is claimed by `p`, and hence `p` wants
797 //! to be the controlling pointer through which mutation or freezes occur.
798 //! But `t2` would -- if it were legal -- have the type `& &mut int`, and
799 //! hence would be a mutable pointer in an aliasable location, which is
800 //! considered frozen (since no one can write to `**t2` as it is not a
801 //! unique path). Therefore, we could reasonably create a frozen `&int`
802 //! pointer pointing at `*t0` that coexists with the mutable pointer `p`,
803 //! which is clearly unsound.
805 //! However, it is not always unsafe to freeze the base pointer. In
806 //! particular, if the referent is frozen, there is no harm in it:
809 //! // src/test/run-pass/borrowck-borrow-of-mut-base-ptr-safe.rs
810 //! fn foo<'a>(mut t0: &'a mut int,
811 //! mut t1: &'a mut int) {
812 //! let p: &int = &*t0; // Freezes `*t0`
813 //! let mut t2 = &t0;
814 //! let q: &int = &*t2; // Freezes `*t0`, but that's ok...
815 //! let r: &int = &*t0; // ...after all, could do same thing directly.
819 //! In this case, creating the alias `t2` of `t0` is safe because the only
820 //! thing `t2` can be used for is to further freeze `*t0`, which is
821 //! already frozen. In particular, we cannot assign to `*t0` through the
822 //! new alias `t2`, as demonstrated in this test case:
825 //! // src/test/run-pass/borrowck-borrow-mut-base-ptr-in-aliasable-loc.rs
826 //! fn foo(t0: & &mut int) {
828 //! let p: &int = &**t0;
829 //! **t1 = 22; //~ ERROR cannot assign
833 //! This distinction is reflected in the rules. When doing an `&mut`
834 //! borrow -- as in the first example -- the set `ACTIONS` will be
835 //! `CLAIM|MUTATE|FREEZE`, because claiming the referent implies that it
836 //! cannot be claimed, mutated, or frozen by anyone else. These
837 //! restrictions are propagated back to the base path and hence the base
838 //! path is considered unfreezable.
840 //! In contrast, when the referent is merely frozen -- as in the second
841 //! example -- the set `ACTIONS` will be `CLAIM|MUTATE`, because freezing
842 //! the referent implies that it cannot be claimed or mutated but permits
843 //! others to freeze. Hence when these restrictions are propagated back to
844 //! the base path, it will still be considered freezable.
848 //! **FIXME #10520: Restrictions against mutating the base pointer.** When
849 //! an `&mut` pointer is frozen or claimed, we currently pass along the
850 //! restriction against MUTATE to the base pointer. I do not believe this
851 //! restriction is needed. It dates from the days when we had a way to
852 //! mutate that preserved the value being mutated (i.e., swap). Nowadays
853 //! the only form of mutation is assignment, which destroys the pointer
854 //! being mutated -- therefore, a mutation cannot create a new path to the
855 //! same data. Rather, it removes an existing path. This implies that not
856 //! only can we permit mutation, we can have mutation kill restrictions in
857 //! the dataflow sense.
859 //! **WARNING:** We do not currently have `const` borrows in the
860 //! language. If they are added back in, we must ensure that they are
861 //! consistent with all of these examples. The crucial question will be
862 //! what sorts of actions are permitted with a `&const &mut` pointer. I
863 //! would suggest that an `&mut` referent found in an `&const` location be
864 //! prohibited from both freezes and claims. This would avoid the need to
865 //! prevent `const` borrows of the base pointer when the referent is
868 //! # Moves and initialization
870 //! The borrow checker is also in charge of ensuring that:
872 //! - all memory which is accessed is initialized
873 //! - immutable local variables are assigned at most once.
875 //! These are two separate dataflow analyses built on the same
876 //! framework. Let's look at checking that memory is initialized first;
877 //! the checking of immutable local variable assignments works in a very
880 //! To track the initialization of memory, we actually track all the
881 //! points in the program that *create uninitialized memory*, meaning
882 //! moves and the declaration of uninitialized variables. For each of
883 //! these points, we create a bit in the dataflow set. Assignments to a
884 //! variable `x` or path `a.b.c` kill the move/uninitialization bits for
885 //! those paths and any subpaths (e.g., `x`, `x.y`, `a.b.c`, `*a.b.c`).
886 //! Bits are unioned when two control-flow paths join. Thus, the
887 //! presence of a bit indicates that the move may have occurred without an
888 //! intervening assignment to the same memory. At each use of a variable,
889 //! we examine the bits in scope, and check that none of them are
890 //! moves/uninitializations of the variable that is being used.
892 //! Let's look at a simple example:
895 //! fn foo(a: Box<int>) {
896 //! let b: Box<int>; // Gen bit 0.
898 //! if cond { // Bits: 0
900 //! b = a; // Gen bit 1, kill bit 0.
906 //! use(&*a); // Error.
907 //! use(&*b); // Error.
910 //! fn use(a: &int) { }
913 //! In this example, the variable `b` is created uninitialized. In one
914 //! branch of an `if`, we then move the variable `a` into `b`. Once we
915 //! exit the `if`, therefore, it is an error to use `a` or `b` since both
916 //! are only conditionally initialized. I have annotated the dataflow
917 //! state using comments. There are two dataflow bits, with bit 0
918 //! corresponding to the creation of `b` without an initializer, and bit 1
919 //! corresponding to the move of `a`. The assignment `b = a` both
920 //! generates bit 1, because it is a move of `a`, and kills bit 0, because
921 //! `b` is now initialized. On the else branch, though, `b` is never
922 //! initialized, and so bit 0 remains untouched. When the two flows of
923 //! control join, we union the bits from both sides, resulting in both
924 //! bits 0 and 1 being set. Thus any attempt to use `a` uncovers the bit 1
925 //! from the "then" branch, showing that `a` may be moved, and any attempt
926 //! to use `b` uncovers bit 0, from the "else" branch, showing that `b`
927 //! may not be initialized.
929 //! ## Initialization of immutable variables
931 //! Initialization of immutable variables works in a very similar way,
934 //! 1. we generate bits for each assignment to a variable;
935 //! 2. the bits are never killed except when the variable goes out of scope.
937 //! Thus the presence of an assignment bit indicates that the assignment
938 //! may have occurred. Note that assignments are only killed when the
939 //! variable goes out of scope, as it is not relevant whether or not there
940 //! has been a move in the meantime. Using these bits, we can declare that
941 //! an assignment to an immutable variable is legal iff there is no other
942 //! assignment bit to that same variable in scope.
944 //! ## Why is the design made this way?
946 //! It may seem surprising that we assign dataflow bits to *each move*
947 //! rather than *each path being moved*. This is somewhat less efficient,
948 //! since on each use, we must iterate through all moves and check whether
949 //! any of them correspond to the path in question. Similar concerns apply
950 //! to the analysis for double assignments to immutable variables. The
951 //! main reason to do it this way is that it allows us to print better
952 //! error messages, because when a use occurs, we can print out the
953 //! precise move that may be in scope, rather than simply having to say
954 //! "the variable may not be initialized".
956 //! ## Data structures used in the move analysis
958 //! The move analysis maintains several data structures that enable it to
959 //! cross-reference moves and assignments to determine when they may be
960 //! moving/assigning the same memory. These are all collected into the
961 //! `MoveData` and `FlowedMoveData` structs. The former represents the set
962 //! of move paths, moves, and assignments, and the latter adds in the
963 //! results of a dataflow computation.
967 //! The `MovePath` tree tracks every path that is moved or assigned to.
968 //! These paths have the same form as the `LoanPath` data structure, which
969 //! in turn is the "real world version of the lvalues `LV` that we
970 //! introduced earlier. The difference between a `MovePath` and a `LoanPath`
971 //! is that move paths are:
973 //! 1. Canonicalized, so that we have exactly one copy of each, and
974 //! we can refer to move paths by index;
975 //! 2. Cross-referenced with other paths into a tree, so that given a move
976 //! path we can efficiently find all parent move paths and all
977 //! extensions (e.g., given the `a.b` move path, we can easily find the
978 //! move path `a` and also the move paths `a.b.c`)
979 //! 3. Cross-referenced with moves and assignments, so that we can
980 //! easily find all moves and assignments to a given path.
982 //! The mechanism that we use is to create a `MovePath` record for each
983 //! move path. These are arranged in an array and are referenced using
984 //! `MovePathIndex` values, which are newtype'd indices. The `MovePath`
985 //! structs are arranged into a tree, representing using the standard
986 //! Knuth representation where each node has a child 'pointer' and a "next
987 //! sibling" 'pointer'. In addition, each `MovePath` has a parent
988 //! 'pointer'. In this case, the 'pointers' are just `MovePathIndex`
991 //! In this way, if we want to find all base paths of a given move path,
992 //! we can just iterate up the parent pointers (see `each_base_path()` in
993 //! the `move_data` module). If we want to find all extensions, we can
994 //! iterate through the subtree (see `each_extending_path()`).
996 //! ### Moves and assignments
998 //! There are structs to represent moves (`Move`) and assignments
999 //! (`Assignment`), and these are also placed into arrays and referenced
1000 //! by index. All moves of a particular path are arranged into a linked
1001 //! lists, beginning with `MovePath.first_move` and continuing through
1002 //! `Move.next_move`.
1004 //! We distinguish between "var" assignments, which are assignments to a
1005 //! variable like `x = foo`, and "path" assignments (`x.f = foo`). This
1006 //! is because we need to assign dataflows to the former, but not the
1007 //! latter, so as to check for double initialization of immutable
1010 //! ### Gathering and checking moves
1012 //! Like loans, we distinguish two phases. The first, gathering, is where
1013 //! we uncover all the moves and assignments. As with loans, we do some
1014 //! basic sanity checking in this phase, so we'll report errors if you
1015 //! attempt to move out of a borrowed pointer etc. Then we do the dataflow
1016 //! (see `FlowedMoveData::new`). Finally, in the `check_loans.rs` code, we
1017 //! walk back over, identify all uses, assignments, and captures, and
1018 //! check that they are legal given the set of dataflow bits we have
1019 //! computed for that program point.
1021 //! # Drop flags and structural fragments
1023 //! In addition to the job of enforcing memory safety, the borrow checker
1024 //! code is also responsible for identifying the *structural fragments* of
1025 //! data in the function, to support out-of-band dynamic drop flags
1026 //! allocated on the stack. (For background, see [RFC PR #320].)
1028 //! [RFC PR #320]: https://github.com/rust-lang/rfcs/pull/320
1030 //! Semantically, each piece of data that has a destructor may need a
1031 //! boolean flag to indicate whether or not its destructor has been run
1032 //! yet. However, in many cases there is no need to actually maintain such
1033 //! a flag: It can be apparent from the code itself that a given path is
1034 //! always initialized (or always deinitialized) when control reaches the
1035 //! end of its owner's scope, and thus we can unconditionally emit (or
1036 //! not) the destructor invocation for that path.
1038 //! A simple example of this is the following:
1041 //! struct D { p: int }
1042 //! impl D { fn new(x: int) -> D { ... }
1043 //! impl Drop for D { ... }
1045 //! fn foo(a: D, b: D, t: || -> bool) {
1048 //! if t() { c = b; }
1052 //! At the end of the body of `foo`, the compiler knows that `a` is
1053 //! initialized, introducing a drop obligation (deallocating the boxed
1054 //! integer) for the end of `a`'s scope that is run unconditionally.
1055 //! Likewise the compiler knows that `d` is not initialized, and thus it
1056 //! leave out the drop code for `d`.
1058 //! The compiler cannot statically know the drop-state of `b` nor `c` at
1059 //! the end of their scope, since that depends on the value of
1060 //! `t`. Therefore, we need to insert boolean flags to track whether we
1061 //! need to drop `b` and `c`.
1063 //! However, the matter is not as simple as just mapping local variables
1064 //! to their corresponding drop flags when necessary. In particular, in
1065 //! addition to being able to move data out of local variables, Rust
1066 //! allows one to move values in and out of structured data.
1068 //! Consider the following:
1071 //! struct S { x: D, y: D, z: D }
1073 //! fn foo(a: S, mut b: S, t: || -> bool) {
1076 //! let e: S = a.clone();
1081 //! if t() { c.y = D::new(4); }
1085 //! As before, the drop obligations of `a` and `d` can be statically
1086 //! determined, and again the state of `b` and `c` depend on dynamic
1087 //! state. But additionally, the dynamic drop obligations introduced by
1088 //! `b` and `c` are not just per-local boolean flags. For example, if the
1089 //! first call to `t` returns `false` and the second call `true`, then at
1090 //! the end of their scope, `b` will be completely initialized, but only
1091 //! `c.y` in `c` will be initialized. If both calls to `t` return `true`,
1092 //! then at the end of their scope, `c` will be completely initialized,
1093 //! but only `b.x` will be initialized in `b`, and only `e.x` and `e.z`
1094 //! will be initialized in `e`.
1096 //! Note that we need to cover the `z` field in each case in some way,
1097 //! since it may (or may not) need to be dropped, even though `z` is never
1098 //! directly mentioned in the body of the `foo` function. We call a path
1099 //! like `b.z` a *fragment sibling* of `b.x`, since the field `z` comes
1100 //! from the same structure `S` that declared the field `x` in `b.x`.
1102 //! In general we need to maintain boolean flags that match the
1103 //! `S`-structure of both `b` and `c`. In addition, we need to consult
1104 //! such a flag when doing an assignment (such as `c.y = D::new(4);`
1105 //! above), in order to know whether or not there is a previous value that
1106 //! needs to be dropped before we do the assignment.
1108 //! So for any given function, we need to determine what flags are needed
1109 //! to track its drop obligations. Our strategy for determining the set of
1110 //! flags is to represent the fragmentation of the structure explicitly:
1111 //! by starting initially from the paths that are explicitly mentioned in
1112 //! moves and assignments (such as `b.x` and `c.y` above), and then
1113 //! traversing the structure of the path's type to identify leftover
1114 //! *unmoved fragments*: assigning into `c.y` means that `c.x` and `c.z`
1115 //! are leftover unmoved fragments. Each fragment represents a drop
1116 //! obligation that may need to be tracked. Paths that are only moved or
1117 //! assigned in their entirety (like `a` and `d`) are treated as a single
1118 //! drop obligation.
1120 //! The fragment construction process works by piggy-backing on the
1121 //! existing `move_data` module. We already have callbacks that visit each
1122 //! direct move and assignment; these form the basis for the sets of
1123 //! moved_leaf_paths and assigned_leaf_paths. From these leaves, we can
1124 //! walk up their parent chain to identify all of their parent paths.
1125 //! We need to identify the parents because of cases like the following:
1128 //! struct Pair<X,Y>{ x: X, y: Y }
1129 //! fn foo(dd_d_d: Pair<Pair<Pair<D, D>, D>, D>) {
1130 //! other_function(dd_d_d.x.y);
1134 //! In this code, the move of the path `dd_d.x.y` leaves behind not only
1135 //! the fragment drop-obligation `dd_d.x.x` but also `dd_d.y` as well.
1137 //! Once we have identified the directly-referenced leaves and their
1138 //! parents, we compute the left-over fragments, in the function
1139 //! `fragments::add_fragment_siblings`. As of this writing this works by
1140 //! looking at each directly-moved or assigned path P, and blindly
1141 //! gathering all sibling fields of P (as well as siblings for the parents
1142 //! of P, etc). After accumulating all such siblings, we filter out the
1143 //! entries added as siblings of P that turned out to be
1144 //! directly-referenced paths (or parents of directly referenced paths)
1145 //! themselves, thus leaving the never-referenced "left-overs" as the only
1146 //! thing left from the gathering step.
1148 //! ## Array structural fragments
1150 //! A special case of the structural fragments discussed above are
1151 //! the elements of an array that has been passed by value, such as
1155 //! fn foo(a: [D, ..10], i: uint) -> D {
1160 //! The above code moves a single element out of the input array `a`.
1161 //! The remainder of the array still needs to be dropped; i.e., it
1162 //! is a structural fragment. Note that after performing such a move,
1163 //! it is not legal to read from the array `a`. There are a number of
1164 //! ways to deal with this, but the important thing to note is that
1165 //! the semantics needs to distinguish in some manner between a
1166 //! fragment that is the *entire* array versus a fragment that represents
1167 //! all-but-one element of the array. A place where that distinction
1168 //! would arise is the following:
1171 //! fn foo(a: [D, ..10], b: [D, ..10], i: uint, t: bool) -> D {
1178 //! // When control exits, we will need either to drop all of `a`
1179 //! // and all-but-one of `b`, or to drop all of `b` and all-but-one
1184 //! There are a number of ways that the trans backend could choose to
1185 //! compile this (e.g. a `[bool, ..10]` array for each such moved array;
1186 //! or an `Option<uint>` for each moved array). From the viewpoint of the
1187 //! borrow-checker, the important thing is to record what kind of fragment
1188 //! is implied by the relevant moves.
1192 //! While writing up these docs, I encountered some rules I believe to be
1193 //! stricter than necessary:
1195 //! - I think restricting the `&mut` LV against moves and `ALIAS` is sufficient,
1196 //! `MUTATE` and `CLAIM` are overkill. `MUTATE` was necessary when swap was
1197 //! a built-in operator, but as it is not, it is implied by `CLAIM`,
1198 //! and `CLAIM` is implied by `ALIAS`. The only net effect of this is an
1199 //! extra error message in some cases, though.
1200 //! - I have not described how closures interact. Current code is unsound.
1201 //! I am working on describing and implementing the fix.
1202 //! - If we wish, we can easily extend the move checking to allow finer-grained
1203 //! tracking of what is initialized and what is not, enabling code like
1206 //! a = x.f.g; // x.f.g is now uninitialized
1207 //! // here, x and x.f are not usable, but x.f.h *is*
1208 //! x.f.g = b; // x.f.g is not initialized
1209 //! // now x, x.f, x.f.g, x.f.h are all usable
1211 //! What needs to change here, most likely, is that the `moves` module
1212 //! should record not only what paths are moved, but what expressions
1213 //! are actual *uses*. For example, the reference to `x` in `x.f.g = b`
1214 //! is not a true *use* in the sense that it requires `x` to be fully
1215 //! initialized. This is in fact why the above code produces an error
1216 //! today: the reference to `x` in `x.f.g = b` is considered illegal
1217 //! because `x` is not fully initialized.
1219 //! There are also some possible refactorings:
1221 //! - It might be nice to replace all loan paths with the MovePath mechanism,
1222 //! since they allow lightweight comparison using an integer.