docs/dev/architecture.md

# Architecture

This document describes the high-level architecture of rust-analyzer.
If you want to familiarize yourself with the code base, you are in just
the right place!

See also the [guide](./guide.md), which walks through a particular snapshot of
the rust-analyzer code base.

Yet another resource is this playlist with videos about various parts of the
analyzer:

https://www.youtube.com/playlist?list=PL85XCvVPmGQho7MZkdW-wtPtuJcFpzycE

## The Big Picture

![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png)

On the highest level, rust-analyzer is a thing which accepts input source code
from the client and produces a structured semantic model of the code.

More specifically, input data consists of a set of text files (`(PathBuf,
String)` pairs) and information about project structure, captured in the so-called
`CrateGraph`. The crate graph specifies which files are crate roots, which cfg
flags are specified for each crate (TODO: actually implement this) and what
dependencies exist between the crates. The analyzer keeps all this input data in
memory and never does any IO. Because the input data is source code, which
typically measures in tens of megabytes at most, keeping all input data in
memory is OK.
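
To make the shape of this input concrete, here is a minimal sketch. The types `CrateData`, `CrateGraphSketch` and `AnalysisInput` are hypothetical stand-ins invented for illustration; only the `(PathBuf, String)` pairs and the roots, cfg flags and dependencies come from the description above, and the real `CrateGraph` may look quite different.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for one crate in the crate graph.
#[derive(Debug, Default)]
struct CrateData {
    /// File that serves as the crate root.
    root: PathBuf,
    /// cfg flags specified for this crate.
    cfg_flags: Vec<String>,
    /// Indices of the crates this crate depends on.
    dependencies: Vec<usize>,
}

/// Hypothetical stand-in for `CrateGraph` (not the real type).
#[derive(Debug, Default)]
struct CrateGraphSketch {
    crates: Vec<CrateData>,
}

impl CrateGraphSketch {
    /// Registers a crate and returns its index in the graph.
    fn add_crate(&mut self, root: PathBuf, cfg_flags: Vec<String>) -> usize {
        self.crates.push(CrateData { root, cfg_flags, dependencies: Vec::new() });
        self.crates.len() - 1
    }

    /// Records that crate `from` depends on crate `to`.
    fn add_dep(&mut self, from: usize, to: usize) {
        self.crates[from].dependencies.push(to);
    }
}

/// All input lives in memory: file texts plus project structure.
#[derive(Debug, Default)]
struct AnalysisInput {
    files: HashMap<PathBuf, String>,
    crate_graph: CrateGraphSketch,
}

impl AnalysisInput {
    /// Replaces the in-memory text of a single file; no IO happens here.
    fn set_file_text(&mut self, path: PathBuf, text: String) {
        self.files.insert(path, text);
    }
}
```

Nothing in this sketch touches the file system: whoever owns `AnalysisInput` is responsible for reading files and handing their text over.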

A "structured semantic model" is basically an object-oriented representation of
modules, functions and types which appear in the source code. This representation
is fully "resolved": all expressions have types, all references are bound to
declarations, etc.

The client can submit a small delta of input data (typically, a change to a
single file) and get a fresh code model which accounts for changes.

The underlying engine makes sure that the model is computed lazily (on-demand) and
can be quickly updated for small modifications.


## Code generation

Some of the components of this repository are generated through automatic
processes. These are outlined below:

- `gen-syntax`: The kinds of tokens are reused in several places, so a generator
  is used. We use [tera] templates to generate the files listed below, based on
  the grammar described in [grammar.ron]:
  - [ast/generated.rs][ast generated] in `ra_syntax` based on
    [ast/generated.rs.tera][ast source]
  - [syntax_kind/generated.rs][syntax_kind generated] in `ra_parser` based on
    [syntax_kind/generated.rs.tera][syntax_kind source]

[tera]: https://tera.netlify.com/
[grammar.ron]: ../../crates/ra_syntax/src/grammar.ron
[ast generated]: ../../crates/ra_syntax/src/ast/generated.rs
[ast source]: ../../crates/ra_syntax/src/ast/generated.rs.tera
[syntax_kind generated]: ../../crates/ra_parser/src/syntax_kind/generated.rs
[syntax_kind source]: ../../crates/ra_parser/src/syntax_kind/generated.rs.tera


## Code Walk-Through

### `crates/ra_syntax`, `crates/ra_parser`

Rust syntax tree structure and parser. See
[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes.

- The [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees.
- The `grammar` module is the actual parser. It is a hand-written recursive descent parser, which
  produces a sequence of events like "start node X", "finish node Y". It works similarly to [Kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java),
  which is a good source of inspiration for dealing with syntax errors and incomplete input. The original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs)
  is what we use for the definition of the Rust language.
- `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees.
  This is the thing that turns a flat list of events into a tree (see `EventProcessor`).
- `ast` provides a type-safe API on top of the raw `rowan` tree.
- `grammar.ron`: a RON description of the grammar, which is used to
  generate the `syntax_kinds` and `ast` modules via the `cargo gen-syntax` command.
- `algo`: generic tree algorithms, including `walk` for O(1) stack
  space tree traversal (this is cool) and `visit` for type-driven
  visiting of the nodes (this is double plus cool, if you understand how
  `Visitor` works, you understand the design of syntax trees).
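
The step of turning a flat list of events into a tree can be sketched as follows. This is a simplified illustration, not the actual `EventProcessor`: the `Event` and `Tree` types and the `build_tree` function are hypothetical, and the real events carry more information.

```rust
/// Hypothetical parser event; the real event type differs in detail.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    /// "start node X"
    Start(&'static str),
    /// A leaf token with its text.
    Token(&'static str),
    /// "finish node": closes the most recently started node.
    Finish,
}

/// Hypothetical tree type used only for this sketch.
#[derive(Debug, PartialEq)]
enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Token(&'static str),
}

/// Builds a tree from a flat event list using an explicit stack.
/// Returns `None` if the events are unbalanced or do not form exactly one root.
fn build_tree(events: &[Event]) -> Option<Tree> {
    // Each entry is a node that has been started but not yet finished.
    let mut stack: Vec<(&'static str, Vec<Tree>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            Event::Start(kind) => stack.push((*kind, Vec::new())),
            // A token always belongs to the innermost open node.
            Event::Token(text) => stack.last_mut()?.1.push(Tree::Token(*text)),
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Tree::Node { kind, children };
                match stack.last_mut() {
                    // Attach the finished node to its parent.
                    Some(parent) => parent.1.push(node),
                    // No parent left: this must be the single root.
                    None => {
                        if root.is_some() {
                            return None;
                        }
                        root = Some(node);
                    }
                }
            }
        }
    }
    if stack.is_empty() { root } else { None }
}
```

Because the parser only emits events, it does not need to know which tree representation is eventually built from them.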

Tests for ra_syntax are mostly data-driven: `tests/data/parser` contains a bunch of `.rs`
files (test vectors) and `.txt` files with corresponding syntax trees. During testing, we check
`.rs` against `.txt`. If the `.txt` file is missing, it is created (this is how you update
tests). Additionally, running `cargo gen-tests` will walk the grammar module and collect
all `//test test_name` comments into files inside the `tests/data` directory.
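
The "check against the `.txt` file, create it if missing" behaviour could look roughly like the sketch below. The `check_or_create` helper is hypothetical and only illustrates the idea; the actual test harness may be organised differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: compares `actual` with the expectation file.
/// If the file does not exist yet, it is created from `actual`.
/// Returns `Ok(true)` when the file matched or was just created.
fn check_or_create(expected_path: &Path, actual: &str) -> io::Result<bool> {
    if !expected_path.exists() {
        // A missing expectation is written out rather than treated as a failure.
        fs::write(expected_path, actual)?;
        return Ok(true);
    }
    let expected = fs::read_to_string(expected_path)?;
    Ok(expected == actual)
}
```

Under this scheme, deleting a stale `.txt` file and re-running the tests regenerates it from the current parser output.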

See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which
fixes a bug in the grammar.

### `crates/ra_db`

We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and
on-demand computation. Roughly, you can think of salsa as a key-value store, but
it can also compute derived values using specified functions. The `ra_db` crate
provides basic infrastructure for interacting with salsa. Crucially, it
defines most of the "input" queries: facts supplied by the client of the
analyzer. Reading the docs of the `ra_db::input` module should be useful:
everything else is strictly derived from those inputs.
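
The "key-value store that can also compute derived values" idea can be illustrated with a hand-written miniature. This is not the salsa API: `MiniDb` and its methods are hypothetical, and the sketch invalidates every cached value on any input change, whereas salsa is considerably more precise.

```rust
use std::collections::HashMap;

/// Hypothetical miniature of "inputs plus memoized derived values".
#[derive(Default)]
struct MiniDb {
    /// Bumped on every input change.
    revision: u64,
    /// Input query: file name -> text, supplied by the client.
    file_text: HashMap<String, String>,
    /// Cache for the derived query: file name -> (revision, line count).
    line_count_cache: HashMap<String, (u64, usize)>,
    /// Counts how often the derived value was actually recomputed.
    recomputations: usize,
}

impl MiniDb {
    /// Sets an input and starts a new revision.
    fn set_file_text(&mut self, file: &str, text: &str) {
        self.revision += 1;
        self.file_text.insert(file.to_string(), text.to_string());
    }

    /// Derived query: computed on demand, reused while inputs are unchanged.
    fn line_count(&mut self, file: &str) -> usize {
        if let Some(&(rev, count)) = self.line_count_cache.get(file) {
            if rev == self.revision {
                return count;
            }
        }
        let count = self.file_text.get(file).map_or(0, |text| text.lines().count());
        self.recomputations += 1;
        self.line_count_cache.insert(file.to_string(), (self.revision, count));
        count
    }
}
```

Asking for `line_count` twice in a row computes it once; changing the input forces a recomputation on the next request, and nothing is computed until someone asks.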

### `crates/ra_hir`

HIR provides high-level "object oriented" access to Rust code.

The principal difference between HIR and syntax trees is that HIR is bound to a
particular crate instance. That is, it has cfg flags and features applied (in
theory; in practice this is yet to be implemented). So, several HIR instances
can correspond to the same piece of syntax: the relation from HIR to syntax is
many-to-one. The `source_binder` module is responsible for guessing a HIR for a
particular source position.

Underneath, HIR works on top of salsa, using a `HirDatabase` trait.

### `crates/ra_ide_api`

A stateful library for analyzing many Rust files as they change. `AnalysisHost`
is a mutable entity (like Clojure's atom) which holds the current state, incorporates
changes and hands out `Analysis`, an immutable and consistent snapshot of
the world state at a point in time, which actually powers analysis.

One interesting aspect of analysis is its support for cancellation. When a
change is applied to `AnalysisHost`, first all currently active snapshots are
canceled. Only after all snapshots are dropped does the change actually affect the
database.
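
The interplay between the host, snapshots and cancellation can be sketched in a few lines. Everything here is hypothetical (`Host`, `Snapshot`, `Cancelled`, `try_apply_change`) and deliberately simplified: the real implementation builds on salsa and blocks until snapshots are dropped, whereas this sketch just reports whether the change could be applied.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Error returned by queries on a snapshot that has been cancelled.
#[derive(Debug, PartialEq)]
struct Cancelled;

/// Hypothetical immutable view of the state at a point in time.
struct Snapshot {
    text: Arc<String>,
    cancelled: Arc<AtomicBool>,
}

impl Snapshot {
    /// Example query: fails once the snapshot has been cancelled.
    fn line_count(&self) -> Result<usize, Cancelled> {
        if self.cancelled.load(Ordering::SeqCst) {
            return Err(Cancelled);
        }
        Ok(self.text.lines().count())
    }
}

/// Hypothetical mutable owner of the current state.
struct Host {
    text: Arc<String>,
    cancelled: Arc<AtomicBool>,
}

impl Host {
    fn new(text: &str) -> Host {
        Host { text: Arc::new(text.to_string()), cancelled: Arc::new(AtomicBool::new(false)) }
    }

    /// Hands out a snapshot sharing the current state and cancellation flag.
    fn snapshot(&self) -> Snapshot {
        Snapshot { text: Arc::clone(&self.text), cancelled: Arc::clone(&self.cancelled) }
    }

    /// Cancels live snapshots, then applies the change only if none remain.
    /// Returns `false` while snapshots are still alive.
    fn try_apply_change(&mut self, new_text: &str) -> bool {
        self.cancelled.store(true, Ordering::SeqCst);
        // `Arc::get_mut` succeeds only when no snapshot still holds the state.
        match Arc::get_mut(&mut self.text) {
            Some(text) => {
                *text = new_text.to_string();
                // Future snapshots get a fresh, un-cancelled flag.
                self.cancelled = Arc::new(AtomicBool::new(false));
                true
            }
            None => false,
        }
    }
}
```

A long-running query that checks the flag periodically can bail out early with `Cancelled`, which lets its snapshot be dropped so the pending change can go through.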

APIs in this crate are IDE centric: they take text offsets as input and produce
offsets and strings as output. This works on top of the rich code model powered by
`hir`.

### `crates/ra_lsp_server`

An LSP implementation which exposes `ra_ide_api` as a language server.

### `ra_vfs`

Although `hir` and `ra_ide_api` don't do any IO, we need to be able to read
files from disk at the end of the day. This is what `ra_vfs` does. It also
manages overlays: "dirty" files in the editor, whose "true" contents differ
from the data on disk. This is more or less the single really
platform-dependent component, so it lives in a separate repository and has
extensive cross-platform CI testing.
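
The overlay idea can be sketched as two maps, where unsaved editor contents shadow what was last read from disk. `OverlayVfs` and its methods are hypothetical and do not reflect the actual `ra_vfs` API; disk contents are simulated with a map so that the sketch itself does no IO.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical sketch of a VFS with editor overlays.
#[derive(Default)]
struct OverlayVfs {
    /// Contents as last read from disk (simulated here).
    on_disk: HashMap<PathBuf, String>,
    /// "Dirty" contents of files currently open in the editor.
    overlays: HashMap<PathBuf, String>,
}

impl OverlayVfs {
    fn set_disk_contents(&mut self, path: &Path, text: &str) {
        self.on_disk.insert(path.to_path_buf(), text.to_string());
    }

    fn set_overlay(&mut self, path: &Path, text: &str) {
        self.overlays.insert(path.to_path_buf(), text.to_string());
    }

    /// Called when the editor closes or saves the file.
    fn remove_overlay(&mut self, path: &Path) {
        self.overlays.remove(path);
    }

    /// The overlay wins over the on-disk contents if both exist.
    fn contents(&self, path: &Path) -> Option<&str> {
        self.overlays
            .get(path)
            .or_else(|| self.on_disk.get(path))
            .map(String::as_str)
    }
}
```

With this layering, analysis always sees what the user sees in the editor, even before the file is saved.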

### `crates/gen_lsp_server`

A language server scaffold, exposing a synchronous crossbeam-channel based API.
This crate handles protocol handshaking and parsing messages, while you
control the message dispatch loop yourself.

Run with `RUST_LOG=gen_lsp_server=debug` to see all the messages.
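
A caller-controlled dispatch loop over channels might look like the sketch below. It uses `std::sync::mpsc` as a stand-in for crossbeam-channel to stay dependency-free, and the `Message` enum and `main_loop` function are hypothetical, not the `gen_lsp_server` API.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical, heavily simplified protocol message.
#[derive(Debug, PartialEq)]
enum Message {
    Request { id: u64, method: String },
    Response { id: u64, result: String },
    Shutdown,
}

/// The caller owns the loop: receive a message, decide what to do, reply.
fn main_loop(incoming: &Receiver<Message>, outgoing: &Sender<Message>) {
    // Iterating a receiver blocks until a message arrives or all senders are gone.
    for msg in incoming {
        match msg {
            Message::Request { id, method } => {
                let result = format!("handled {}", method);
                // Ignore send errors: the other side may already have hung up.
                let _ = outgoing.send(Message::Response { id, result });
            }
            // This sketch never sends requests itself, so responses are ignored.
            Message::Response { .. } => {}
            Message::Shutdown => break,
        }
    }
}
```

Keeping the loop in user code means the scaffold does not dictate threading or scheduling; the server decides how each request is handled.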

### `crates/ra_cli`

A command-line interface to rust-analyzer.


## Testing Infrastructure

rust-analyzer has three interesting [system
boundaries](https://www.tedinski.com/2018/04/10/making-tests-a-positive-influence-on-design.html)
to concentrate tests on.

The outermost boundary is the `ra_lsp_server` crate, which defines an LSP
interface in terms of stdio. We do integration testing of this component by
feeding it a stream of LSP requests and checking responses. These tests are
known as "heavy", because they interact with Cargo and read real files from
disk. For this reason, we try to avoid writing too many tests on this boundary:
in a statically typed language, it's hard to make an error in the protocol
itself if messages are themselves typed.

The middle, and most important, boundary is `ra_ide_api`. Unlike
`ra_lsp_server`, which exposes an LSP API, `ra_ide_api` exposes a Rust API and is
intended for use by various tools. A typical test creates an `AnalysisHost`, calls
some `Analysis` functions and compares the results against expectations.

The innermost and most elaborate boundary is `hir`. It has a much richer
vocabulary of types than `ra_ide_api`, but the basic testing setup is the same: we
create a database, run some queries, and assert on the results.

For comparisons, we use the [insta](https://github.com/mitsuhiko/insta/) library for
snapshot testing.

To test various analysis corner cases and avoid forgetting about old tests, we
use so-called marks. See the `marks` module in the `test_utils` crate for more.
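
The general idea behind marks can be sketched with a plain atomic counter: the code path for a corner case bumps a counter, and the test asserts that the counter moved. The names below (`EMPTY_INPUT_MARK`, `first_word`, `covers_empty_input`) are hypothetical; the actual `marks` module may expose a different API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical mark: a named counter bumped by the code path of interest.
static EMPTY_INPUT_MARK: AtomicUsize = AtomicUsize::new(0);

/// Code under test, with one corner case worth guarding.
fn first_word(text: &str) -> Option<&str> {
    let word = text.split_whitespace().next();
    if word.is_none() {
        // The corner case announces itself so a test can verify it really ran.
        EMPTY_INPUT_MARK.fetch_add(1, Ordering::SeqCst);
    }
    word
}

/// Runs `f` and reports whether the mark was hit during the call.
fn covers_empty_input(f: impl FnOnce()) -> bool {
    let before = EMPTY_INPUT_MARK.load(Ordering::SeqCst);
    f();
    EMPTY_INPUT_MARK.load(Ordering::SeqCst) > before
}
```

If a later refactoring makes the test stop exercising the corner case, the mark is no longer hit and the test fails, even when the final result still happens to be correct.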