src/librustc/ty/query/README.md

   1 # The Rust Compiler Query System
   2
   3 The Compiler Query System is the key to our new demand-driven
   4 organization.  The idea is pretty simple. You have various queries
   5 that compute things about the input -- for example, there is a query
   6 called `type_of(def_id)` that, given the def-id of some item, will
   7 compute the type of that item and return it to you.
   8
   9 Query execution is **memoized** -- so the first time you invoke a
  10 query, it will go do the computation, but the next time, the result is
  11 returned from a hashtable. Moreover, query execution fits nicely into
  12 **incremental computation**; the idea is roughly that, when you do a
  13 query, the result **may** be returned to you by loading stored data
  14 from disk (but that's a separate topic we won't discuss further here).
  15
  16 The overall vision is that, eventually, the entire compiler
  17 control-flow will be query driven. There will effectively be one
  18 top-level query ("compile") that will run compilation on a crate; this
  19 will in turn demand information about that crate, starting from the
  20 *end*.  For example:
  21
  22 - This "compile" query might demand to get a list of codegen-units
  23   (i.e., modules that need to be compiled by LLVM).
  24 - But computing the list of codegen-units would invoke some subquery
  25   that returns the list of all modules defined in the Rust source.
  26 - That query in turn would invoke something asking for the HIR.
  27 - This keeps going further and further back until we wind up doing the
  28   actual parsing.
  29
  30 However, that vision is not fully realized. Still, big chunks of the
  31 compiler (for example, generating MIR) work exactly like this.
  32
  33 ### Invoking queries
  34
  35 To invoke a query is simple. The tcx ("type context") offers a method
  36 for each defined query. So, for example, to invoke the `type_of`
  37 query, you would just do this:
  38
  39 ```rust
  40 let ty = tcx.type_of(some_def_id);
  41 ```
  42
  43 ### Cycles between queries
  44
  45 Currently, cycles during query execution should always result in a
  46 compilation error. Typically, they arise because of illegal programs
  47 that contain cyclic references they shouldn't (though sometimes they
  48 arise because of compiler bugs, in which case we need to factor our
  49 queries in a more fine-grained fashion to avoid them).
  50
  51 However, it is nonetheless often useful to *recover* from a cycle
  52 (after reporting an error, say) and try to soldier on, so as to give a
  53 better user experience. In order to recover from a cycle, you don't
  54 get to use the nice method-call-style syntax. Instead, you invoke
  55 using the `try_get` method, which looks roughly like this:
  56
  57 ```rust
  58 use ty::query::queries;
  59 ...
  60 match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
  61   Ok(result) => {
  62     // no cycle occurred! You can use `result`
  63   }
  64   Err(err) => {
  65     // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
  66     // meaning essentially an "in-progress", not-yet-reported error message.
  67     // See below for more details on what to do here.
  68   }
  69 }
  70 ```
  71
  72 So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that
  73 you must ensure that a compiler error message is reported. You can do that in two ways:
  74
  75 The simplest is to invoke `err.emit()`. This will emit the cycle error to the user.
  76
  77 However, often cycles happen because of an illegal program, and you
  78 know at that point that an error either already has been reported or
  79 will be reported due to this cycle by some other bit of code. In that
  80 case, you can invoke `err.cancel()` to not emit any error. It is
  81 traditional to then invoke:
  82
  83 ```
  84 tcx.sess.delay_span_bug(some_span, "some message")
  85 ```
  86
  87 `delay_span_bug()` is a helper that says: we expect a compilation
  88 error to have happened or to happen in the future; so, if compilation
  89 ultimately succeeds, make an ICE with the message `"some
  90 message"`. This is basically just a precaution in case you are wrong.
  91
  92 ### How the compiler executes a query
  93
  94 So you may be wondering what happens when you invoke a query
  95 method. The answer is that, for each query, the compiler maintains a
  96 cache -- if your query has already been executed, then, the answer is
  97 simple: we clone the return value out of the cache and return it
  98 (therefore, you should try to ensure that the return types of queries
  99 are cheaply cloneable; insert a `Rc` if necessary).
 100
 101 #### Providers
 102
 103 If, however, the query is *not* in the cache, then the compiler will
 104 try to find a suitable **provider**. A provider is a function that has
 105 been defined and linked into the compiler somewhere that contains the
 106 code to compute the result of the query.
 107
 108 **Providers are defined per-crate.** The compiler maintains,
 109 internally, a table of providers for every crate, at least
 110 conceptually. Right now, there are really two sets: the providers for
 111 queries about the **local crate** (that is, the one being compiled)
 112 and providers for queries about **external crates** (that is,
 113 dependencies of the local crate). Note that what determines the crate
 114 that a query is targeting is not the *kind* of query, but the *key*.
 115 For example, when you invoke `tcx.type_of(def_id)`, that could be a
 116 local query or an external query, depending on what crate the `def_id`
 117 is referring to (see the `self::keys::Key` trait for more information
 118 on how that works).
 119
 120 Providers always have the same signature:
 121
 122 ```rust
 123 fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
 124                        key: QUERY_KEY)
 125                        -> QUERY_RESULT
 126 {
 127     ...
 128 }
 129 ```
 130
 131 Providers take two arguments: the `tcx` and the query key. Note also
 132 that they take the *global* tcx (i.e., they use the `'tcx` lifetime
 133 twice), rather than taking a tcx with some active inference context.
 134 They return the result of the query.
 135
 136 ####  How providers are setup
 137
 138 When the tcx is created, it is given the providers by its creator using
 139 the `Providers` struct. This struct is generate by the macros here, but it
 140 is basically a big list of function pointers:
 141
 142 ```rust
 143 struct Providers {
 144     type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
 145     ...
 146 }
 147 ```
 148
 149 At present, we have one copy of the struct for local crates, and one
 150 for external crates, though the plan is that we may eventually have
 151 one per crate.
 152
 153 These `Provider` structs are ultimately created and populated by
 154 `librustc_driver`, but it does this by distributing the work
 155 throughout the other `rustc_*` crates. This is done by invoking
 156 various `provide` functions. These functions tend to look something
 157 like this:
 158
 159 ```rust
 160 pub fn provide(providers: &mut Providers) {
 161     *providers = Providers {
 162         type_of,
 163         ..*providers
 164     };
 165 }
 166 ```
 167
 168 That is, they take an `&mut Providers` and mutate it in place. Usually
 169 we use the formulation above just because it looks nice, but you could
 170 as well do `providers.type_of = type_of`, which would be equivalent.
 171 (Here, `type_of` would be a top-level function, defined as we saw
 172 before.) So, if we want to add a provider for some other query,
 173 let's call it `fubar`, into the crate above, we might modify the `provide()`
 174 function like so:
 175
 176 ```rust
 177 pub fn provide(providers: &mut Providers) {
 178     *providers = Providers {
 179         type_of,
 180         fubar,
 181         ..*providers
 182     };
 183 }
 184
 185 fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx>, key: DefId) -> Fubar<'tcx> { .. }
 186 ```
 187
 188 NB. Most of the `rustc_*` crates only provide **local
 189 providers**. Almost all **extern providers** wind up going through the
 190 `rustc_metadata` crate, which loads the information from the crate
 191 metadata.  But in some cases there are crates that provide queries for
 192 *both* local and external crates, in which case they define both a
 193 `provide` and a `provide_extern` function that `rustc_driver` can
 194 invoke.
 195
 196 ### Adding a new kind of query
 197
 198 So suppose you want to add a new kind of query, how do you do so?
 199 Well, defining a query takes place in two steps:
 200
 201 1. first, you have to specify the query name and arguments; and then,
 202 2. you have to supply query providers where needed.
 203
 204 To specify the query name and arguments, you simply add an entry
 205 to the big macro invocation in `mod.rs`. This will probably have changed
 206 by the time you read this README, but at present it looks something
 207 like:
 208
 209 ```
 210 define_queries! { <'tcx>
 211     /// Records the type of every item.
 212     [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
 213
 214     ...
 215 }
 216 ```
 217
 218 Each line of the macro defines one query. The name is broken up like this:
 219
 220 ```
 221 [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
 222 ^^    ^^^^^^^  ^^^^^^^^^^ ^^^^^     ^^^^^^^^
 223 |     |        |          |         |
 224 |     |        |          |         result type of query
 225 |     |        |          query key type
 226 |     |        dep-node constructor
 227 |     name of query
 228 query flags
 229 ```
 230
 231 Let's go over them one by one:
 232
 233 - **Query flags:** these are largely unused right now, but the intention
 234   is that we'll be able to customize various aspects of how the query is
 235   processed.
 236 - **Name of query:** the name of the query method
 237   (`tcx.type_of(..)`). Also used as the name of a struct
 238   (`ty::query::queries::type_of`) that will be generated to represent
 239   this query.
 240 - **Dep-node constructor:** indicates the constructor function that
 241   connects this query to incremental compilation. Typically, this is a
 242   `DepNode` variant, which can be added by modifying the
 243   `define_dep_nodes!` macro invocation in
 244   `librustc/dep_graph/dep_node.rs`.
 245   - However, sometimes we use a custom function, in which case the
 246     name will be in snake case and the function will be defined at the
 247     bottom of the file. This is typically used when the query key is
 248     not a def-id, or just not the type that the dep-node expects.
 249 - **Query key type:** the type of the argument to this query.
 250   This type must implement the `ty::query::keys::Key` trait, which
 251   defines (for example) how to map it to a crate, and so forth.
 252 - **Result type of query:** the type produced by this query. This type
 253   should (a) not use `RefCell` or other interior mutability and (b) be
 254   cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
 255   non-trivial data types.
 256   - The one exception to those rules is the `ty::steal::Steal` type,
 257     which is used to cheaply modify MIR in place. See the definition
 258     of `Steal` for more details. New uses of `Steal` should **not** be
 259     added without alerting `@rust-lang/compiler`.
 260
 261 So, to add a query:
 262
 263 - Add an entry to `define_queries!` using the format above.
 264 - Possibly add a corresponding entry to the dep-node macro.
 265 - Link the provider by modifying the appropriate `provide` method;
 266   or add a new one if needed and ensure that `rustc_driver` is invoking it.
 267
 268 #### Query structs and descriptions
 269
 270 For each kind, the `define_queries` macro will generate a "query struct"
 271 named after the query. This struct is a kind of a place-holder
 272 describing the query. Each such struct implements the
 273 `self::config::QueryConfig` trait, which has associated types for the
 274 key/value of that particular query. Basically the code generated looks something
 275 like this:
 276
 277 ```rust
 278 // Dummy struct representing a particular kind of query:
 279 pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> }
 280
 281 impl<'tcx> QueryConfig for type_of<'tcx> {
 282   type Key = DefId;
 283   type Value = Ty<'tcx>;
 284 }
 285 ```
 286
 287 There is an additional trait that you may wish to implement called
 288 `self::config::QueryDescription`. This trait is used during cycle
 289 errors to give a "human readable" name for the query, so that we can
 290 summarize what was happening when the cycle occurred. Implementing
 291 this trait is optional if the query key is `DefId`, but if you *don't*
 292 implement it, you get a pretty generic error ("processing `foo`...").
 293 You can put new impls into the `config` module. They look something like this:
 294
 295 ```rust
 296 impl<'tcx> QueryDescription for queries::type_of<'tcx> {
 297     fn describe(tcx: TyCtxt<'_, '_, '_>, key: DefId) -> String {
 298         format!("computing the type of `{}`", tcx.item_path_str(key))
 299     }
 300 }
 301 ```
 302