# Dependency graph for incremental compilation

This module contains the infrastructure for managing the incremental
compilation dependency graph. This README aims to explain how it ought
to be used. In this document, we'll first explain the overall
strategy, and then share some tips for handling specific scenarios.

The high-level idea is that we want to instrument the compiler to
track which parts of the AST and other IR are read/written by what.
This way, when we come back later, we can look at this graph and
determine what work needs to be redone.

### The dependency graph

The nodes of the graph are defined by the enum `DepNode`. They represent
one of three things:

1. HIR nodes (like `Hir(DefId)`) represent the HIR input itself.
2. Data nodes (like `TypeOfItem(DefId)`) represent some computed
   information about a particular item.
3. Procedure nodes (like `CoherenceCheckTrait(DefId)`) represent some
   procedure that is executing. Usually this procedure is
   performing some kind of check for errors. You can think of them as
   computed values where the value being computed is `()` (and the
   value may fail to be computed, if an error results).
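
To make the three categories concrete, here is a minimal, self-contained sketch. `DefId` is a hypothetical stand-in (a plain `u32` wrapper), the enum carries only the three example variants named above, and the `NodeKind` classification is invented for illustration; the real `DepNode` has many more variants and a richer `DefId`.

```rust
/// Hypothetical stand-in for rustc's `DefId` (the real type is richer).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

/// Simplified sketch of `DepNode` with one example variant per category.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum DepNode {
    /// The HIR input itself.
    Hir(DefId),
    /// Computed information about an item.
    TypeOfItem(DefId),
    /// A procedure, viewed as a value of type `()` that may fail.
    CoherenceCheckTrait(DefId),
}

/// Hypothetical classification of a node into the three categories.
#[derive(Debug, PartialEq, Eq)]
pub enum NodeKind {
    Input,
    Data,
    Procedure,
}

impl DepNode {
    pub fn kind(&self) -> NodeKind {
        match self {
            DepNode::Hir(_) => NodeKind::Input,
            DepNode::TypeOfItem(_) => NodeKind::Data,
            DepNode::CoherenceCheckTrait(_) => NodeKind::Procedure,
        }
    }
}
```

Keying every variant by the same `DefId` is what lets the graph distinguish, say, the HIR of an item from the computed type of that same item.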

An edge `N1 -> N2` is added between two nodes if any of the following holds:

- the value of `N1` is used to compute `N2`;
- `N1` is read by the procedure `N2`;
- the procedure `N1` writes the value `N2`.

The latter two conditions are equivalent to the first one if you think
of procedures as values.

### Basic tracking

There is a very general strategy to ensure that you have a correct, if
sometimes overconservative, dependency graph. The two main things you have
to do are (a) identify shared state and (b) identify the current tasks.
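
The edge rules above can be illustrated with a toy tracker that keeps a stack of current tasks: reading shared state adds `node -> current task`, and writing it adds `current task -> node`. This is a sketch under simplifying assumptions (string node labels, a hypothetical `ToyDepGraph` type), not rustc's `DepGraph` API. As with the real graph, reading or writing with no current task panics.

```rust
use std::collections::HashSet;

/// Hypothetical toy tracker: string node labels, a stack of current tasks,
/// and a set of (from, to) edges. Not rustc's `DepGraph` API.
#[derive(Default)]
pub struct ToyDepGraph {
    task_stack: Vec<String>,
    pub edges: HashSet<(String, String)>,
}

impl ToyDepGraph {
    /// Run `f` with `task` pushed as the current task.
    pub fn with_task<R>(&mut self, task: &str, f: impl FnOnce(&mut Self) -> R) -> R {
        self.task_stack.push(task.to_string());
        let result = f(self);
        self.task_stack.pop();
        result
    }

    fn current_task(&self) -> String {
        self.task_stack
            .last()
            .expect("read/write of shared state with no current task")
            .clone()
    }

    /// Reading shared state `node` adds the edge `node -> current task`.
    pub fn read(&mut self, node: &str) {
        let task = self.current_task();
        self.edges.insert((node.to_string(), task));
    }

    /// Writing shared state `node` adds the edge `current task -> node`.
    pub fn write(&mut self, node: &str) {
        let task = self.current_task();
        self.edges.insert((task, node.to_string()));
    }
}
```

For example, running a `CoherenceCheckTrait(Foo)` task that reads `Hir(Foo)` and writes `TypeOfItem(Foo)` yields exactly the two edges `Hir(Foo) -> CoherenceCheckTrait(Foo)` and `CoherenceCheckTrait(Foo) -> TypeOfItem(Foo)`.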

### Identifying shared state

Identify "shared state" that will be written by one pass and read by
another. In particular, we need to identify shared state that will be
read "across items" -- that is, anything where changes in one item
could invalidate work done for other items. So, for example:

1. The signature for a function is "shared state".
2. The computed type of some expression in the body of a function is
   not shared state, because if it changes it does not itself
   invalidate other functions (it may cause new monomorphizations
   to occur, but those are handled independently).

Put another way: if the HIR for an item changes, we are going to
recompile that item for sure. But we need the dep tracking map to tell
us what *else* we have to recompile. Shared state is anything that is
used to communicate results from one item to another.

### Identifying the current task, tracking reads/writes, etc.

FIXME(#42293). This text needs to be rewritten for the new red-green
system, which doesn't fully exist yet.

#### Dependency tracking map

`DepTrackingMap` is a particularly convenient way to correctly store
shared state. A `DepTrackingMap` is a special hashmap that will add
edges automatically when `get` and `insert` are called. The idea is
that, when you get a value for the key `K`, we add an edge from the
node `DepNode::Variant(K)` to the current task, and when you insert a
value for `K`, we add an edge from the current task to
`DepNode::Variant(K)` (for some variant specific to the map).
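
A minimal sketch of that behavior, assuming a hypothetical `ToyGraph` that only records `(from, to)` edge labels and a single current task: `get` records a read edge and `insert` records a write edge, using a node label built from a variant name and the key. The real `DepTrackingMap` is configured through a type parameter (described next) rather than a runtime variant string.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical toy graph: one current task and a list of (from, to) edges.
#[derive(Default)]
pub struct ToyGraph {
    pub current_task: Option<String>,
    pub edges: Vec<(String, String)>,
}

impl ToyGraph {
    fn task(&self) -> String {
        self.current_task.clone().expect("no current task")
    }
}

/// Toy stand-in for `DepTrackingMap`: `get` records `Variant(key) -> task`,
/// `insert` records `task -> Variant(key)`.
pub struct ToyDepTrackingMap<K, V> {
    variant: &'static str, // hypothetical variant label, e.g. "ItemSignature"
    map: HashMap<K, V>,
}

impl<K: Hash + Eq + Debug, V: Clone> ToyDepTrackingMap<K, V> {
    pub fn new(variant: &'static str) -> Self {
        ToyDepTrackingMap { variant, map: HashMap::new() }
    }

    /// Label of the dep-node for `key`, e.g. `ItemSignature(7)`.
    fn node(&self, key: &K) -> String {
        format!("{}({:?})", self.variant, key)
    }

    pub fn get(&self, graph: &mut ToyGraph, key: &K) -> Option<V> {
        let edge = (self.node(key), graph.task()); // read edge
        graph.edges.push(edge);
        self.map.get(key).cloned()
    }

    pub fn insert(&mut self, graph: &mut ToyGraph, key: K, value: V) {
        let edge = (graph.task(), self.node(&key)); // write edge
        graph.edges.push(edge);
        self.map.insert(key, value);
    }
}
```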

Each `DepTrackingMap` is parameterized by a special type `M` that
implements `DepTrackingMapConfig`; this trait defines the key and value
types of the map, and also defines a fn for converting from the key to
a `DepNode` label. You don't usually have to muck about with this by
hand; there is a macro for creating it. You can see the complete set
of `DepTrackingMap` definitions in `librustc/middle/ty/maps.rs`.

As an example, let's look at the `adt_defs` map. The `adt_defs` map
maps from the def-id of a struct/enum to its `AdtDef`. It is defined
using this macro:

```rust
dep_map_ty! { AdtDefs: ItemSignature(DefId) -> ty::AdtDefMaster<'tcx> }
//            ~~~~~~~  ~~~~~~~~~~~~~ ~~~~~     ~~~~~~~~~~~~~~~~~~~~~~
//               |           |      Key type       Value type
//               |    DepNode variant
//      Name of map id type
```

This indicates that a map id type `AdtDefs` will be created. The key
of the map will be a `DefId` and the value will be
`ty::AdtDefMaster<'tcx>`. The `DepNode` will be created by
`DepNode::ItemSignature(K)` for a given key.
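
The shape of such a macro can be sketched with `macro_rules!`. Everything here is an assumption for illustration, not the real `dep_map_ty!` definition: the `toy_dep_map_ty!` name, the associated-type and method names on the sketched `DepTrackingMapConfig` trait, the one-variant `DepNode`, the `u32`-based `DefId`, and `String` standing in for `ty::AdtDefMaster<'tcx>`. It only shows how `Name: Variant(Key) -> Value` can expand into a config type that names the key and value types and converts a key into a `DepNode`.

```rust
/// Hypothetical stand-in for rustc's `DefId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

/// Simplified `DepNode` with only the variant used below.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DepNode {
    ItemSignature(DefId),
}

/// Sketch of a config trait: key type, value type, and key -> `DepNode`.
pub trait DepTrackingMapConfig {
    type Key;
    type Value;
    fn to_dep_node(key: &Self::Key) -> DepNode;
}

/// Hypothetical macro in the spirit of `dep_map_ty!`:
/// `Name: Variant(Key) -> Value` expands to a unit struct plus an impl.
macro_rules! toy_dep_map_ty {
    ($name:ident : $variant:ident ($key:ty) -> $value:ty) => {
        pub struct $name;

        impl DepTrackingMapConfig for $name {
            type Key = $key;
            type Value = $value;
            fn to_dep_node(key: &$key) -> DepNode {
                DepNode::$variant(*key)
            }
        }
    };
}

// `String` stands in for `ty::AdtDefMaster<'tcx>` in this sketch.
toy_dep_map_ty! { AdtDefs: ItemSignature(DefId) -> String }
```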

Once that is done, you can just use the `DepTrackingMap` like any
other map:

```rust
let mut map: DepTrackingMap<M> = DepTrackingMap::new(dep_graph);
map.insert(key, value); // registers dep_graph.write
map.get(key); // registers dep_graph.read
```

#### Memoization

One particularly interesting case is memoization. If you have some
shared state that you compute in a memoized fashion, the correct thing
to do is to define a `RefCell<DepTrackingMap>` for it and use the
`memoize` helper:

```rust
map.memoize(key, || /* compute value */)
```

This will create a graph that looks like:

    ... -> MapVariant(key) -> CurrentTask

where `MapVariant` is the `DepNode` variant that the map is associated with,
and `...` are whatever edges the `/* compute value */` closure creates.
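
A toy version of `memoize` that produces this shape is sketched below. The `ToyGraph` and `ToyMemoMap` types, the `u32` keys, and the `u64` values are hypothetical. On a cache miss, the closure runs with `Variant(key)` pushed as the current task, so its reads point at the map node; on every call, the caller's task records a read of the map node.

```rust
use std::collections::HashMap;

/// Hypothetical toy graph with a stack of current tasks.
#[derive(Default)]
pub struct ToyGraph {
    stack: Vec<String>,
    pub edges: Vec<(String, String)>,
}

impl ToyGraph {
    pub fn push_task(&mut self, task: &str) {
        self.stack.push(task.to_string());
    }
    pub fn pop_task(&mut self) {
        self.stack.pop();
    }
    /// Reading `node` records `node -> current task`.
    pub fn read(&mut self, node: &str) {
        let task = self.stack.last().expect("no current task").clone();
        self.edges.push((node.to_string(), task));
    }
}

/// Toy memoizing map keyed by `u32` (hypothetical, not rustc's API).
pub struct ToyMemoMap {
    variant: &'static str,
    cache: HashMap<u32, u64>,
}

impl ToyMemoMap {
    pub fn new(variant: &'static str) -> Self {
        ToyMemoMap { variant, cache: HashMap::new() }
    }

    pub fn memoize(
        &mut self,
        graph: &mut ToyGraph,
        key: u32,
        compute: impl FnOnce(&mut ToyGraph) -> u64,
    ) -> u64 {
        let node = format!("{}({})", self.variant, key);
        let cached = self.cache.get(&key).copied();
        let value = match cached {
            Some(v) => v,
            None => {
                // Run the computation with `Variant(key)` as the current task,
                // so the closure's reads become `... -> Variant(key)`.
                graph.push_task(&node);
                let v = compute(graph);
                graph.pop_task();
                self.cache.insert(key, v);
                v
            }
        };
        // The caller's task depends only on the map node.
        graph.read(&node);
        value
    }
}
```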

In particular, using the `memoize` helper is much better than writing
the obvious code yourself:

```rust
if let Some(result) = map.get(key) { // read: MapVariant(key) -> current task
    return result;
}
let value = /* compute value */;
map.insert(key, value); // write: current task -> MapVariant(key)
```

If you write that code manually, the dependency graph you get will
include artificial edges that are not necessary. For example, imagine that
two tasks, A and B, both invoke the manual memoization code, but A happens
to go first. The resulting graph will be:

    ... -> A -> MapVariant(key) -> B
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~       // caused by A writing to MapVariant(key)
                ~~~~~~~~~~~~~~~~~~~~  // caused by B reading from MapVariant(key)

This graph is not *wrong*, but it encodes a path from A to B that
should not exist. In contrast, using the `memoize` helper, you get:

    ... -> MapVariant(key) -> A
                 |
                 +----------> B

which is much cleaner.
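
The difference between the two graphs can be checked mechanically. The sketch below re-implements both patterns on a hypothetical toy graph (static string labels, with `"MapVariant(key)"` as the map node and `"Hir(input)"` as the single input the computation reads) and then asks whether a path from task A to task B exists.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical toy graph: a task stack and a set of (from, to) edges.
#[derive(Default)]
pub struct G {
    stack: Vec<&'static str>,
    pub edges: HashSet<(&'static str, &'static str)>,
}

impl G {
    fn task(&self) -> &'static str {
        *self.stack.last().expect("no current task")
    }
    pub fn read(&mut self, n: &'static str) {
        let t = self.task();
        self.edges.insert((n, t));
    }
    pub fn write(&mut self, n: &'static str) {
        let t = self.task();
        self.edges.insert((t, n));
    }
    pub fn in_task<R>(&mut self, t: &'static str, f: impl FnOnce(&mut G) -> R) -> R {
        self.stack.push(t);
        let r = f(self);
        self.stack.pop();
        r
    }
    /// Depth-first search: is there a path `from -> ... -> to`?
    pub fn has_path(&self, from: &'static str, to: &str) -> bool {
        let mut seen = HashSet::new();
        let mut todo = vec![from];
        while let Some(n) = todo.pop() {
            if n == to {
                return true;
            }
            if seen.insert(n) {
                todo.extend(self.edges.iter().filter(|(a, _)| *a == n).map(|(_, b)| *b));
            }
        }
        false
    }
}

/// Manual pattern: the caller's own task reads and then writes the map node.
pub fn manual(g: &mut G, cache: &mut HashMap<u32, u32>, key: u32) -> u32 {
    g.read("MapVariant(key)");
    if let Some(v) = cache.get(&key).copied() {
        return v;
    }
    g.read("Hir(input)");
    let v = 42;
    g.write("MapVariant(key)");
    cache.insert(key, v);
    v
}

/// `memoize` pattern: compute under `MapVariant(key)`, then record a read.
pub fn helper(g: &mut G, cache: &mut HashMap<u32, u32>, key: u32) -> u32 {
    let v = match cache.get(&key).copied() {
        Some(v) => v,
        None => {
            let v = g.in_task("MapVariant(key)", |g| {
                g.read("Hir(input)");
                42
            });
            cache.insert(key, v);
            v
        }
    };
    g.read("MapVariant(key)");
    v
}
```

With `manual`, A ends up with an outgoing `A -> MapVariant(key)` edge, so B becomes reachable from A; with `helper`, A only has incoming edges.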

**Be aware, though, that the closure is executed with `MapVariant(key)`
pushed onto the stack as the current task!** That means that you must
add explicit `read` calls for any shared state that it accesses
implicitly from its environment. (The section on "explicit calls to
read and write when starting a new subtask" that this advice relies on
is not currently present in this README; see FIXME(#42293) above.)

### How to decide where to introduce a new task

Certainly, you need at least one task on the stack: any attempt to
`read` or `write` shared state will panic if there is no current
task. But where does it make sense to introduce subtasks? The basic
rule is that a subtask makes sense for any discrete unit of work you
may want to skip in the future. Adding a subtask separates the
reads/writes of *that particular subtask* from those of the larger
context. An example: you might have a 'meta' task for all of borrow
checking, and then subtasks for borrow checking individual fns. (Seen
in this light, memoized computations are just a special case where we
may want to avoid redoing the work even within the context of one
compilation.)
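
To see why subtasks sharpen the graph, here is a sketch in which reads are attributed to the innermost task on the stack. The `Tracker` type, `borrowck_all` function, and node names such as `BorrowCheckKrate` and `BorrowCheck(a)` are hypothetical. Without per-fn subtasks, every `Hir` read lands on the single meta task; with them, a change to `Hir(a)` only reaches `BorrowCheck(a)`.

```rust
use std::collections::HashSet;

/// Hypothetical tracker: reads go to the innermost task on the stack.
#[derive(Default)]
pub struct Tracker {
    stack: Vec<String>,
    pub edges: HashSet<(String, String)>,
}

impl Tracker {
    pub fn with_task<R>(&mut self, task: &str, f: impl FnOnce(&mut Self) -> R) -> R {
        self.stack.push(task.to_string());
        let r = f(self);
        self.stack.pop();
        r
    }

    pub fn read(&mut self, node: &str) {
        let task = self.stack.last().expect("no current task").clone();
        self.edges.insert((node.to_string(), task));
    }

    /// Tasks that directly read `input`, i.e. what a change to it invalidates.
    pub fn readers_of(&self, input: &str) -> Vec<String> {
        let mut readers: Vec<String> = self
            .edges
            .iter()
            .filter(|(from, _)| from == input)
            .map(|(_, to)| to.clone())
            .collect();
        readers.sort();
        readers
    }
}

/// Borrow-check each fn, either all under one meta task or with a subtask per fn.
pub fn borrowck_all(t: &mut Tracker, fns: &[&str], subtasks: bool) {
    t.with_task("BorrowCheckKrate", |t| {
        for name in fns {
            let hir = format!("Hir({})", name);
            if subtasks {
                t.with_task(&format!("BorrowCheck({})", name), |t| t.read(&hir));
            } else {
                t.read(&hir);
            }
        }
    });
}
```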

The other case where you might want a subtask is to help with refining
the reads/writes for some later bit of work that needs to be memoized.
For example, we create a subtask for type-checking the body of each
fn. However, in the initial version of incremental compilation at
least, we do not expect to actually *SKIP* type-checking -- we only
expect to skip trans. Still, it's useful to create subtasks for
type-checking individual items, because, otherwise, if a fn sig
changes, we won't know which callers are affected -- in fact, because
the graph would be so coarse, we'd just have to re-run trans on
everything, since we can't distinguish which fns used which fn sigs.

### Testing the dependency graph

There are various ways to write tests against the dependency graph.
The simplest mechanism is a pair of annotations,
`#[rustc_if_this_changed]` and `#[rustc_then_this_would_need]`.
These are used in compile-fail tests to test whether the
expected set of paths exists in the dependency graph. As an example,
see `src/test/compile-fail/dep-graph-caller-callee.rs`.

The idea is that you can annotate a test like:

```rust
#[rustc_if_this_changed]
fn foo() { }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR OK
fn bar() { foo(); }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR no path
fn baz() { }
```

This will check whether there is a path in the dependency graph from
`Hir(foo)` to `TypeckTables(bar)` (and likewise to `TypeckTables(baz)`).
For each `#[rustc_then_this_would_need]` annotation, an error is
reported that indicates whether a path exists. `//~ ERROR` annotations
can then be used to test if a path is found (as demonstrated above).
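
Conceptually, each `#[rustc_then_this_would_need]` annotation asks a reachability question. A minimal sketch of that check is a breadth-first search from the `#[rustc_if_this_changed]` node. The `check_paths` function, its edge-list input, and the intermediate `ItemSignature(foo)` node in the example are hypothetical; this is not how the compiler's test harness is implemented.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// For each target, report "OK" if it is reachable from `source`,
/// otherwise "no path" (mirroring the messages in the test above).
pub fn check_paths(edges: &[(&str, &str)], source: &str, targets: &[&str]) -> Vec<String> {
    // Adjacency list: node -> direct successors.
    let mut succ: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        succ.entry(from).or_default().push(to);
    }

    // Breadth-first search from the source node.
    let mut reached: HashSet<&str> = HashSet::new();
    let mut queue = VecDeque::new();
    reached.insert(source);
    queue.push_back(source);
    while let Some(n) = queue.pop_front() {
        if let Some(next) = succ.get(n) {
            for &m in next {
                if reached.insert(m) {
                    queue.push_back(m);
                }
            }
        }
    }

    targets
        .iter()
        .map(|t| if reached.contains(t) { "OK".to_string() } else { "no path".to_string() })
        .collect()
}
```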

### Debugging the dependency graph

#### Dumping the graph

The compiler is also capable of dumping the dependency graph for your
debugging pleasure. To do so, pass the `-Z dump-dep-graph` flag. The
graph will be dumped to `dep_graph.{txt,dot}` in the current
directory. You can override the filename with the `RUST_DEP_GRAPH`
environment variable.
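
As a rough illustration of the two output forms, the sketch below renders an edge list as `A -> B` lines (the form shown in the `Hir(foo) -> Collect(bar)` example under "Tracking down incorrect edges") and as a Graphviz digraph. The `render` function and its exact formatting are assumptions; the compiler's real output may differ.

```rust
use std::fmt::Write;

/// Render edges as plain `A -> B` lines and as a Graphviz DOT digraph.
/// Toy formatting only; not necessarily byte-for-byte what rustc emits.
pub fn render(edges: &[(&str, &str)]) -> (String, String) {
    let mut txt = String::new();
    let mut dot = String::from("digraph dep_graph {\n");
    for (from, to) in edges {
        writeln!(txt, "{} -> {}", from, to).unwrap();
        // Quote DOT node names, since labels contain parentheses.
        writeln!(dot, "    \"{}\" -> \"{}\";", from, to).unwrap();
    }
    dot.push_str("}\n");
    (txt, dot)
}
```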

Frequently, though, the full dep graph is quite overwhelming and not
particularly helpful. Therefore, the compiler also allows you to filter
the graph. You can filter in three ways:

1. All edges originating in a particular set of nodes (usually a single node).
2. All edges reaching a particular set of nodes.
3. All edges that lie between given start and end nodes.

To filter, use the `RUST_DEP_GRAPH_FILTER` environment variable, which should
look like one of the following:

```
source_filter     // nodes originating from source_filter
-> target_filter  // nodes that can reach target_filter
source_filter -> target_filter // nodes in between source_filter and target_filter
```

`source_filter` and `target_filter` are each a `&`-separated list of strings.
A node is considered to match a filter if all of those strings appear in its
label. So, for example:

```
RUST_DEP_GRAPH_FILTER='-> TypeckTables'
```

would select the predecessors of all `TypeckTables` nodes. Usually, though, you
want the `TypeckTables` node for some particular fn, so you might write:

```
RUST_DEP_GRAPH_FILTER='-> TypeckTables & bar'
```

This will select only the `TypeckTables` nodes for fns with `bar` in their name.
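
The matching rule can be sketched as follows. The `NodeFilter` type and `parse_filter` function are hypothetical, and treating an empty side as "match everything" is an assumption of this sketch (it mirrors the `-> target_filter` form). Only node matching is shown, not the selection of paths between source and target nodes.

```rust
/// One side of a filter: a label matches if it contains every part.
pub struct NodeFilter {
    parts: Vec<String>,
}

impl NodeFilter {
    /// Parse a `&`-separated list; whitespace around parts is ignored.
    pub fn new(text: &str) -> Self {
        let parts = text
            .split('&')
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect();
        NodeFilter { parts }
    }

    /// With no parts, `all` is vacuously true, so the filter matches everything.
    pub fn matches(&self, label: &str) -> bool {
        self.parts.iter().all(|p| label.contains(p.as_str()))
    }
}

/// Split `source -> target`; a missing side becomes a match-all filter.
pub fn parse_filter(text: &str) -> (NodeFilter, NodeFilter) {
    match text.split_once("->") {
        Some((src, tgt)) => (NodeFilter::new(src), NodeFilter::new(tgt)),
        None => (NodeFilter::new(text), NodeFilter::new("")),
    }
}
```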

Perhaps you are finding that when you change `foo` you need to re-type-check `bar`,
but you don't think you should have to. In that case, you might do:

```
RUST_DEP_GRAPH_FILTER='Hir&foo -> TypeckTables & bar'
```

This will dump out all the nodes that lead from `Hir(foo)` to
`TypeckTables(bar)`, from which you can (hopefully) see the source
of the erroneous edge.

#### Tracking down incorrect edges

Sometimes, after you dump the dependency graph, you will find some
path that should not exist, but you will not be quite sure how it came
to be. **When the compiler is built with debug assertions,** it can
help you track that down. Simply set the `RUST_FORBID_DEP_GRAPH_EDGE`
environment variable to a filter. Every edge created in the dep-graph
will be tested against that filter -- if it matches, a `bug!` is
reported, so you can easily see the backtrace (`RUST_BACKTRACE=1`).

The syntax for these filters is the same as described in the previous
section. However, note that this filter is applied to every **edge**
and doesn't handle longer paths in the graph, unlike the
`RUST_DEP_GRAPH_FILTER` filtering described there.
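
The per-edge check can be sketched as an edge recorder that tests every new edge against a `source -> target` filter and panics on a match, standing in for `bug!`. The `EdgeRecorder` type and helper functions are hypothetical, and the filter is passed in directly instead of being read from `RUST_FORBID_DEP_GRAPH_EDGE`. Only single edges are checked, never longer paths.

```rust
/// Split one side of a filter into its `&`-separated parts (empty = match all).
fn split_side(side: &str) -> Vec<String> {
    side.split('&')
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}

/// A label matches if it contains every part.
fn side_matches(parts: &[String], label: &str) -> bool {
    parts.iter().all(|p| label.contains(p.as_str()))
}

/// Hypothetical edge recorder with an optional forbidden-edge filter.
pub struct EdgeRecorder {
    forbidden: Option<(Vec<String>, Vec<String>)>,
    pub edges: Vec<(String, String)>,
}

impl EdgeRecorder {
    pub fn new(filter: Option<&str>) -> Self {
        let forbidden = filter.map(|f| {
            let (src, tgt) = f.split_once("->").unwrap_or((f, ""));
            (split_side(src), split_side(tgt))
        });
        EdgeRecorder { forbidden, edges: Vec::new() }
    }

    /// Check each new edge on its own; longer paths are never considered.
    pub fn add_edge(&mut self, from: &str, to: &str) {
        if let Some((src, tgt)) = &self.forbidden {
            if side_matches(src, from) && side_matches(tgt, to) {
                panic!("forbidden dep-graph edge: {} -> {}", from, to);
            }
        }
        self.edges.push((from.to_string(), to.to_string()));
    }
}
```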

Example:

You find that there is a path from the `Hir` of `foo` to the type
check of `bar` and you don't think there should be. You dump the
dep-graph as described under "Dumping the graph" and open `dep_graph.txt`
to see something like:

    Hir(foo) -> Collect(bar)
    Collect(bar) -> TypeckTables(bar)

That first edge looks suspicious to you. So you set
`RUST_FORBID_DEP_GRAPH_EDGE` to `Hir&foo -> Collect&bar`, re-run, and
then observe the backtrace. Voilà, bug fixed!