# Architecture

This document describes the high-level architecture of rust-analyzer.
If you want to familiarize yourself with the code base, you are in just
the right place!

## The Big Picture

![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png)

On the highest level, rust-analyzer is a thing which accepts input source code
from the client and produces a structured semantic model of the code.

More specifically, input data consists of a set of text files (`(PathBuf,
String)` pairs) and information about project structure, captured in the so-called
`CrateGraph`. The crate graph specifies which files are crate roots, which cfg
flags are specified for each crate (TODO: actually implement this) and what
dependencies exist between the crates. The analyzer keeps all this input data in
memory and never does any IO. Because the input data is source code, which
typically measures in tens of megabytes at most, keeping all input data in
memory is OK.
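To make the shape of this input concrete, here is a minimal sketch of what it might look like as plain Rust data. Only `CrateGraph` and the `(PathBuf, String)` pairs come from the description above; `CrateId`, `CrateData`, `AnalyzerInput` and every method name are hypothetical, and the real types may well look different.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical identifier of a crate inside the graph.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CrateId(pub u32);

/// Hypothetical per-crate data: root file, cfg flags and dependencies.
#[derive(Debug, Default)]
pub struct CrateData {
    pub root: PathBuf,
    pub cfg_flags: Vec<String>,
    pub dependencies: Vec<CrateId>,
}

/// Illustrative stand-in for the crate graph described above.
#[derive(Debug, Default)]
pub struct CrateGraph {
    crates: HashMap<CrateId, CrateData>,
}

impl CrateGraph {
    /// Registers a crate by its root file and returns a fresh id.
    pub fn add_crate(&mut self, root: PathBuf, cfg_flags: Vec<String>) -> CrateId {
        let id = CrateId(self.crates.len() as u32);
        self.crates.insert(id, CrateData { root, cfg_flags, dependencies: Vec::new() });
        id
    }

    /// Records that `from` depends on `to`.
    pub fn add_dep(&mut self, from: CrateId, to: CrateId) {
        if let Some(data) = self.crates.get_mut(&from) {
            data.dependencies.push(to);
        }
    }

    /// Direct dependencies of a crate (empty for unknown ids).
    pub fn dependencies(&self, id: CrateId) -> &[CrateId] {
        self.crates.get(&id).map(|d| d.dependencies.as_slice()).unwrap_or(&[])
    }

    /// Whether the given file is the root of some crate.
    pub fn is_crate_root(&self, path: &Path) -> bool {
        self.crates.values().any(|c| c.root.as_path() == path)
    }
}

/// Everything the analyzer is given: file texts plus project structure.
/// All of it lives in memory; nothing here touches the file system.
pub struct AnalyzerInput {
    pub files: Vec<(PathBuf, String)>,
    pub crate_graph: CrateGraph,
}
```

Note that nothing in this sketch reads from disk: the caller is expected to hand over file contents as strings, which matches the "never does any IO" rule.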

A "structured semantic model" is basically an object-oriented representation of
modules, functions and types which appear in the source code. This representation
is fully "resolved": all expressions have types, all references are bound to
declarations, etc.

The client can submit a small delta of input data (typically, a change to a
single file) and get a fresh code model which accounts for changes.

The underlying engine makes sure that the model is computed lazily (on-demand) and
can be quickly updated for small modifications.


## Code generation

Some of the components of this repository are generated through automatic
processes. These are outlined below:

- `gen-syntax`: The kinds of tokens are reused in several places, so a generator
  is used. We use [tera] templates to generate the files listed below, based on
  the grammar described in [grammar.ron]:
  - [ast/generated.rs][ast generated] in `ra_syntax` based on
    [ast/generated.rs.tera][ast source]
  - [syntax_kinds/generated.rs][syntax_kinds generated] in `ra_syntax` based on
    [syntax_kinds/generated.rs.tera][syntax_kinds source]

[tera]: https://tera.netlify.com/
[grammar.ron]: ./crates/ra_syntax/src/grammar.ron
[ast generated]: ./crates/ra_syntax/src/ast/generated.rs
[ast source]: ./crates/ra_syntax/src/ast/generated.rs.tera
[syntax_kinds generated]: ./crates/ra_syntax/src/syntax_kinds/generated.rs
[syntax_kinds source]: ./crates/ra_syntax/src/syntax_kinds/generated.rs.tera


## Code Walk-Through

### `crates/ra_syntax`

Rust syntax tree structure and parser. See the
[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes.

- The [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees.
- `grammar` module is the actual parser. It is a hand-written recursive descent parser, which
  produces a sequence of events like "start node X", "finish node Y". It works similarly to [Kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java),
  which is a good source of inspiration for dealing with syntax errors and incomplete input. The original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs)
  is what we use for the definition of the Rust language.
- `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees.
  This is the thing that turns a flat list of events into a tree (see `EventProcessor`).
- `ast` provides a type-safe API on top of the raw `rowan` tree.
- `grammar.ron`: RON description of the grammar, which is used to
  generate `syntax_kinds` and `ast` modules, using the `cargo gen-syntax` command.
- `algo`: generic tree algorithms, including `walk` for O(1) stack
  space tree traversal (this is cool) and `visit` for type-driven
  visiting the nodes (this is double plus cool, if you understand how
  `Visitor` works, you understand the design of syntax trees).
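The "flat list of events into a tree" step mentioned above can be pictured with a small stack-based builder. This is only a sketch of the general technique, not the actual `EventProcessor`: the `Event` and `Tree` types and the `build_tree` function are hypothetical, node kinds are plain strings, and the finish event carries no node name for simplicity.

```rust
/// Hypothetical parser events, in the spirit of "start node X" / "finish node".
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(&'static str),
    Token(&'static str),
    Finish,
}

/// A deliberately simple tree; the real trees are `rowan` trees.
#[derive(Debug, PartialEq)]
pub enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Token(&'static str),
}

/// Folds a flat event list into a tree.
/// Returns `None` if the events are unbalanced or a token appears outside a node.
pub fn build_tree(events: &[Event]) -> Option<Tree> {
    // Nodes that have been started but not yet finished, innermost last.
    let mut stack: Vec<(&'static str, Vec<Tree>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            Event::Start(kind) => stack.push((*kind, Vec::new())),
            Event::Token(text) => stack.last_mut()?.1.push(Tree::Token(*text)),
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Tree::Node { kind, children };
                match stack.last_mut() {
                    // Attach the finished node to its parent...
                    Some(parent) => parent.1.push(node),
                    // ...or remember it as the root if there is no parent.
                    None => root = Some(node),
                }
            }
        }
    }
    if stack.is_empty() { root } else { None }
}
```

The parser itself never needs to know what kind of tree is being built, which is what makes it "tree-agnostic": a different consumer could fold the same events into a different representation.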

Tests for ra_syntax are mostly data-driven: `tests/data/parser` contains a bunch of `.rs`
(test vectors) and `.txt` files with corresponding syntax trees. During testing, we check
`.rs` against `.txt`. If the `.txt` file is missing, it is created (this is how you update
tests). Additionally, running `cargo gen-tests` will walk the grammar module and collect
all `//test test_name` comments into files inside the `tests/data` directory.
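The "check `.rs` against `.txt`, create it if missing" workflow is a common golden-file pattern. Here is a rough sketch of it, assuming a hypothetical helper `check_or_create_expected` and a caller-supplied `dump` function standing in for "parse and print the syntax tree"; the actual test harness is likely organized differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Compares `dump(source)` with the `.txt` file that sits next to `input`.
/// If the `.txt` file does not exist yet, it is created from the current
/// output and the check passes. Returns `Ok(true)` when the check passes.
pub fn check_or_create_expected(
    input: &Path,
    dump: impl Fn(&str) -> String,
) -> io::Result<bool> {
    let source = fs::read_to_string(input)?;
    let actual = dump(&source);
    // `foo.rs` -> `foo.txt`
    let expected_path = input.with_extension("txt");
    if !expected_path.exists() {
        fs::write(&expected_path, &actual)?;
        return Ok(true);
    }
    let expected = fs::read_to_string(&expected_path)?;
    Ok(expected == actual)
}
```

With a helper like this, updating a test after an intentional change boils down to deleting the stale `.txt` file and re-running the tests.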

See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which
fixes a bug in the grammar.

### `crates/ra_db`

We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and
on-demand computation. Roughly, you can think of salsa as a key-value store, but
it can also compute derived values using specified functions. The `ra_db` crate
provides basic infrastructure for interacting with salsa. Crucially, it
defines most of the "input" queries: facts supplied by the client of the
analyzer. Reading the docs of the `ra_db::input` module should be useful:
everything else is strictly derived from those inputs.
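To get a feel for the "key-value store which can also compute derived values" idea, here is a tiny hand-rolled sketch. It is deliberately not salsa's API: the `Database` struct, its fields and its methods are all invented for illustration, and there is just one input (`file_text`) and one derived value (`line_count`).

```rust
use std::collections::HashMap;

/// Hypothetical miniature database with one input and one derived query.
#[derive(Default)]
pub struct Database {
    /// Bumped on every input change.
    revision: u64,
    /// Input: file id -> (text, revision at which it was last set).
    file_text: HashMap<u32, (String, u64)>,
    /// Memoized derived value: file id -> (line count, input revision used).
    line_count_memo: HashMap<u32, (usize, u64)>,
    /// Counts how often the derived value was actually recomputed.
    pub recomputations: usize,
}

impl Database {
    /// Sets an input. Nothing is recomputed here; derived values are lazy.
    pub fn set_file_text(&mut self, file: u32, text: &str) {
        self.revision += 1;
        self.file_text.insert(file, (text.to_string(), self.revision));
    }

    /// Derived query: computed on demand and reused while the input is unchanged.
    pub fn line_count(&mut self, file: u32) -> usize {
        let (text, changed_at) = match self.file_text.get(&file) {
            Some(entry) => entry,
            None => return 0,
        };
        if let Some(&(value, computed_from)) = self.line_count_memo.get(&file) {
            if computed_from == *changed_at {
                // The input has not changed since the memo was stored.
                return value;
            }
        }
        let value = text.lines().count();
        self.recomputations += 1;
        self.line_count_memo.insert(file, (value, *changed_at));
        value
    }
}
```

Asking for `line_count` twice in a row does the work once; changing the file text makes only the next request for that file recompute. Salsa generalizes this to arbitrary graphs of queries that depend on each other.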

### `crates/ra_hir`

HIR provides high-level "object oriented" access to Rust code.

The principal difference between HIR and syntax trees is that HIR is bound to a
particular crate instance. That is, it has cfg flags and features applied (in
theory; in practice, this is yet to be implemented). So, the relation between
syntax and HIR is many-to-one. The `source_binder` module is responsible for
guessing a HIR for a particular source position.

Underneath, HIR works on top of salsa, using a `HirDatabase` trait.

### `crates/ra_ide_api`

A stateful library for analyzing many Rust files as they change. `AnalysisHost`
is a mutable entity (like Clojure's atom) which holds the current state, incorporates
changes and hands out `Analysis`, an immutable and consistent snapshot of
the world state at a point in time, which actually powers analysis.

One interesting aspect of analysis is its support for cancellation. When a
change is applied to `AnalysisHost`, first all currently active snapshots are
canceled. Only after all snapshots are dropped does the change actually affect the
database.
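The host/snapshot split and the cancellation idea can be sketched with standard library primitives. Everything below is an illustrative assumption: the field layout, the `file_len` query, the `Canceled` type and the use of an `AtomicBool` are made up, and unlike the real thing this sketch does not wait for old snapshots to be dropped but simply copies the state when a snapshot still holds it.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Returned by queries on a snapshot that has been canceled.
#[derive(Debug, PartialEq)]
pub struct Canceled;

/// Immutable snapshot of the world; cheap to hand out.
pub struct Analysis {
    files: Arc<HashMap<u32, String>>,
    canceled: Arc<AtomicBool>,
}

impl Analysis {
    /// A stand-in query: length of a file's text, if the file exists.
    pub fn file_len(&self, file: u32) -> Result<Option<usize>, Canceled> {
        if self.canceled.load(Ordering::SeqCst) {
            return Err(Canceled);
        }
        Ok(self.files.get(&file).map(|text| text.len()))
    }
}

/// Mutable owner of the current state.
#[derive(Default)]
pub struct AnalysisHost {
    files: Arc<HashMap<u32, String>>,
    canceled: Arc<AtomicBool>,
}

impl AnalysisHost {
    /// Hands out a snapshot sharing the current state.
    pub fn analysis(&self) -> Analysis {
        Analysis { files: Arc::clone(&self.files), canceled: Arc::clone(&self.canceled) }
    }

    /// Cancels outstanding snapshots, then applies the change.
    pub fn apply_change(&mut self, file: u32, text: &str) {
        // Tell every snapshot handed out so far to stop working.
        self.canceled.store(true, Ordering::SeqCst);
        // Future snapshots get a fresh, untriggered flag.
        self.canceled = Arc::new(AtomicBool::new(false));
        // Copy-on-write: clones the map only if a snapshot still holds it.
        Arc::make_mut(&mut self.files).insert(file, text.to_string());
    }
}
```

The point of canceling first is that long-running work on a stale snapshot stops early instead of competing with the work the client actually wants after the edit.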

APIs in this crate are IDE centric: they take text offsets as input and produce
offsets and strings as output. This works on top of the rich code model powered by
`hir`.

### `crates/ra_ide_api_light`

All IDE features which can be implemented if you only have access to a single
file. `ra_ide_api_light` could be used to enhance editing of Rust code without
the need to fiddle with build-systems, file synchronization and such.

In a sense, `ra_ide_api_light` is just a bunch of pure functions which take a
syntax tree as input.

The tests for `ra_ide_api_light` are `#[cfg(test)] mod tests` unit-tests spread
throughout its modules.


### `crates/ra_lsp_server`

An LSP implementation which wraps `ra_ide_api` into a language server.

### `crates/ra_vfs`

Although `hir` and `ra_ide_api` don't do any IO, we need to be able to read
files from disk at the end of the day. This is what `ra_vfs` does. It also
manages overlays: "dirty" files in the editor, whose "true" contents are
different from the data on disk.
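The overlay idea is easy to show in miniature: keep two maps and let the editor's version win. This is a sketch under assumptions, not the `ra_vfs` API. The `Vfs` struct and its methods are hypothetical, and the disk contents are passed in by the caller so that the example itself does no IO.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical miniature VFS: disk contents plus editor overlays.
#[derive(Default)]
pub struct Vfs {
    /// What was last read from disk.
    disk: HashMap<PathBuf, String>,
    /// Unsaved editor buffers, which shadow the disk contents.
    overlays: HashMap<PathBuf, String>,
}

impl Vfs {
    /// Records the contents of a file as found on disk.
    pub fn set_disk_contents(&mut self, path: &Path, text: &str) {
        self.disk.insert(path.to_path_buf(), text.to_string());
    }

    /// Called when the editor has a "dirty" version of the file.
    pub fn add_overlay(&mut self, path: &Path, text: &str) {
        self.overlays.insert(path.to_path_buf(), text.to_string());
    }

    /// Called when the editor closes or saves the file.
    pub fn remove_overlay(&mut self, path: &Path) {
        self.overlays.remove(path);
    }

    /// The "true" contents: the overlay if there is one, else the disk version.
    pub fn contents(&self, path: &Path) -> Option<&str> {
        self.overlays
            .get(path)
            .or_else(|| self.disk.get(path))
            .map(String::as_str)
    }
}
```

The analyzer only ever sees the result of something like `contents`, so it does not need to care whether a file was saved or not.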

### `crates/gen_lsp_server`

A language server scaffold, exposing a synchronous crossbeam-channel based API.
This crate handles protocol handshaking and parsing messages, while you
control the message dispatch loop yourself.

Run with `RUST_LOG=gen_lsp_server=debug` to see all the messages.
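"You control the message dispatch loop yourself" means the server is, at its core, a plain loop over a channel of incoming messages. The sketch below shows that shape using `std::sync::mpsc` as a stand-in for crossbeam-channel; the `Message` and `Response` types and the `main_loop` function are hypothetical and much simpler than real LSP messages.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical, heavily simplified incoming message.
#[derive(Debug, PartialEq)]
pub enum Message {
    Request { id: u64, method: String },
    Notification { method: String },
}

/// Hypothetical response to a request.
#[derive(Debug, PartialEq)]
pub struct Response {
    pub id: u64,
    pub result: String,
}

/// A synchronous dispatch loop: blocks on the receiver, handles one
/// message at a time, and stops on an "exit" notification or when the
/// sending side of either channel goes away.
pub fn main_loop(receiver: Receiver<Message>, sender: Sender<Response>) {
    for message in receiver {
        match message {
            Message::Notification { method } if method == "exit" => break,
            // Other notifications need no reply in this sketch.
            Message::Notification { .. } => {}
            Message::Request { id, method } => {
                let result = format!("handled {}", method);
                if sender.send(Response { id, result }).is_err() {
                    break;
                }
            }
        }
    }
}
```

Because the loop is ordinary user code, it is the natural place to decide things like which requests to run on a thread pool and which to answer inline.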

### `crates/ra_cli`

A command-line interface to rust-analyzer.

### `crates/tools`

Custom Cargo tasks used to develop rust-analyzer:

- `cargo gen-syntax` -- generate `ast` and `syntax_kinds`
- `cargo gen-tests` -- collect inline tests from grammar
- `cargo install-code` -- build and install VS Code extension and server

### `editors/code`

VS Code plugin.


## Common workflows

To try out the VS Code extension, run `cargo install-code`. This installs both the
`ra_lsp_server` binary and the VS Code extension. To install only the binary, use
`cargo install --path crates/ra_lsp_server --force`.

To see logs from the language server, set the `RUST_LOG=info` env variable. To see
all communication between the server and the client, use
`RUST_LOG=gen_lsp_server=debug` (this will print quite a bit of stuff).

To run tests, just run `cargo test`.

To work on the VS Code extension, launch VS Code inside `editors/code` and use `F5` to
launch/debug. To automatically apply formatter and linter suggestions, use `npm
run fix`.