Merge #939

[rust.git] / ARCHITECTURE.md
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md

index 3b200bbc857917144d2a2ce64e467481d9684589..57f76ebaecf409c08ca9fab320c67301aa64bb2c 100644 (file)
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -1,9 +1,15 @@
  # Architecture
  
-This document describes high-level architecture of rust-analyzer.
+This document describes the high-level architecture of rust-analyzer.
  If you want to familiarize yourself with the code base, you are just
  in the right place!
  
+See also the [guide](./guide.md), which walks through a particular snapshot of
+rust-analyzer code base.
+
+For syntax-trees specifically, there's a [video walk
+through](https://youtu.be/DGAuLWdCCAI) as well.
+
  ## The Big Picture
  
  ![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png)
@@ -12,15 +18,15 @@ On the highest level, rust-analyzer is a thing which accepts input source code
  from the client and produces a structured semantic model of the code.
  
  More specifically, input data consists of a set of test files (`(PathBuf,
-String)` pairs) and an information about project structure, the so called
-`CrateGraph`. Crate graph specifies which files are crate roots, which cfg flags
-are specified for each crate (TODO: actually implement this) and what are
-dependencies between the crate. The analyzer keeps all these input data in
+String)` pairs) and information about project structure, captured in the so called
+`CrateGraph`. The crate graph specifies which files are crate roots, which cfg
+flags are specified for each crate (TODO: actually implement this) and what
+dependencies exist between the crates. The analyzer keeps all this input data in
  memory and never does any IO. Because the input data is source code, which
  typically measures in tens of megabytes at most, keeping all input data in
  memory is OK.
  
-A "structured semantic model" is basically an object-oriented representations of
+A "structured semantic model" is basically an object-oriented representation of
  modules, functions and types which appear in the source code. This representation
  is fully "resolved": all expressions have types, all references are bound to
  declarations, etc.
@@ -28,8 +34,8 @@ declarations, etc.
  The client can submit a small delta of input data (typically, a change to a
  single file) and get a fresh code model which accounts for changes.
  
-Underlying engine makes sure that model is computed lazily (on-demand) and can
-be quickly updated for small modifications.
+The underlying engine makes sure that model is computed lazily (on-demand) and
+can be quickly updated for small modifications.
  
  
  ## Code generation
@@ -37,7 +43,7 @@ be quickly updated for small modifications.
  Some of the components of this repository are generated through automatic
  processes. These are outlined below:
  
-- `gen-syntax`: The kinds of tokens are reused in several places, so a generator
+- `gen-syntax`: The kinds of tokens that are reused in several places, so a generator
    is used. We use tera templates to generate the files listed below, based on
    the grammar described in [grammar.ron]:
    - [ast/generated.rs][ast generated] in `ra_syntax` based on
@@ -58,25 +64,24 @@ processes. These are outlined below:
  ### `crates/ra_syntax`
  
  Rust syntax tree structure and parser. See
-[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design
-notes.
+[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes.
  
  - [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees.
-- `grammar` module is the actual parser. It is a hand-written recursive descent parsers, which
-  produces a sequence of events like "start node X", "finish not Y". It works similarly to  [kotlin parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java),
-  which is a good source for inspiration for dealing with syntax errors and incomplete input. Original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs)
+- `grammar` module is the actual parser. It is a hand-written recursive descent parser, which
+  produces a sequence of events like "start node X", "finish not Y". It works similarly to [kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java),
+  which is a good source of inspiration for dealing with syntax errors and incomplete input. Original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs)
    is what we use for the definition of the Rust language.
  - `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees.
    This is the thing that turns a flat list of events into a tree (see `EventProcessor`)
-- `ast` a type safe API on top of the raw `rowan` tree.
+- `ast` provides a type safe API on top of the raw `rowan` tree.
  - `grammar.ron` RON description of the grammar, which is used to
    generate `syntax_kinds` and `ast` modules, using `cargo gen-syntax` command.
  - `algo`: generic tree algorithms, including `walk` for O(1) stack
    space tree traversal (this is cool) and `visit` for type-driven
    visiting the nodes (this is double plus cool, if you understand how
-  `Visitor` works, you understand rust-analyzer).
+  `Visitor` works, you understand the design of syntax trees).
  
-Test for ra_syntax are mostly data-driven: `tests/data/parser` contains a bunch of `.rs`
+Tests for ra_syntax are mostly data-driven: `tests/data/parser` contains a bunch of `.rs`
  (test vectors) and `.txt` files with corresponding syntax trees. During testing, we check
  `.rs` against `.txt`. If the `.txt` file is missing, it is created (this is how you update
  tests). Additionally, running `cargo gen-tests` will walk the grammar module and collect
@@ -87,58 +92,65 @@ fixes a bug in the grammar.
  
  ### `crates/ra_db`
  
-We use [salsa][https://github.com/salsa-rs/salsa] crate for incremental and
+We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and
  on-demand computation. Roughly, you can think of salsa as a key-value store, but
  it also can compute derived values using specified functions. The `ra_db` crate
-provides a basic infrastructure for interracting with salsa. Crucially, it
-defines most of the "input" queries: facts supplied by the client of the analyzer.
+provides basic infrastructure for interacting with salsa. Crucially, it
+defines most of the "input" queries: facts supplied by the client of the
+analyzer. Reading the docs of the `ra_db::input` module should be useful:
+everything else is strictly derived from those inputs.
  
  ### `crates/ra_hir`
  
-HIR provides a high-level "object oriented" acess to Rust code.
+HIR provides high-level "object oriented" access to Rust code.
  
  The principal difference between HIR and syntax trees is that HIR is bound to a
  particular crate instance. That is, it has cfg flags and features applied (in
-theory, in practice this is to be implemented). So, there relation between
-syntax and HIR is many-to-one. The `source_binder` modules is responsible for
-guessing a hir for a particular source position.
+theory, in practice this is to be implemented). So, the relation between
+syntax and HIR is many-to-one. The `source_binder` module is responsible for
+guessing a HIR for a particular source position.
  
-Underneath, hir works on top of salsa, using a `HirDatabase` trait.
+Underneath, HIR works on top of salsa, using a `HirDatabase` trait.
  
-### `crates/ra_analysis`
+### `crates/ra_ide_api`
  
-A stateful library for analyzing many Rust files as they change.
-`AnalysisHost` is a mutable entity (clojure's atom) which holds
-current state, incorporates changes and handles out `Analysis` --- an
-immutable consistent snapshot of world state at a point in time, which
-actually powers analysis.
+A stateful library for analyzing many Rust files as they change. `AnalysisHost`
+is a mutable entity (clojure's atom) which holds the current state, incorporates
+changes and hands out `Analysis` --- an immutable and consistent snapshot of
+the world state at a point in time, which actually powers analysis.
  
-One interesting aspect of analysis is its support for cancellation. When a change
-is applied to `AnalysisHost`, first all currently active snapshots are
-cancelled. Only after all snapshots are dropped the change actually affects the
+One interesting aspect of analysis is its support for cancellation. When a
+change is applied to `AnalysisHost`, first all currently active snapshots are
+canceled. Only after all snapshots are dropped the change actually affects the
  database.
  
-### `crates/ra_lsp_server`
+APIs in this crate are IDE centric: they take text offsets as input and produce
+offsets and strings as output. This works on top of rich code model powered by
+`hir`.
  
-An LSP implementation which uses `ra_analysis` for managing state and
-`ra_editor` for actually doing useful stuff.
+### `crates/ra_ide_api_light`
  
-See [#79](https://github.com/rust-analyzer/rust-analyzer/pull/79/) as an
-example of PR which adds a new feature to `ra_editor` and exposes it
-to `ra_lsp_server`.
+All IDE features which can be implemented if you only have access to a single
+file. `ra_ide_api_light` could be used to enhance editing of Rust code without
+the need to fiddle with build-systems, file synchronization and such.
  
-### `crates/ra_editor`
+In a sense, `ra_ide_api_light` is just a bunch of pure functions which take a
+syntax tree as input.
  
-All IDE features which can be implemented if you only have access to a
-single file. `ra_editor` could be used to enhance editing of Rust code
-without the need to fiddle with build-systems, file
-synchronization and such.
+The tests for `ra_ide_api_light` are `#[cfg(test)] mod tests` unit-tests spread
+throughout its modules.
  
-In a sense, `ra_editor` is just a bunch of pure functions which take a
-syntax tree as an input.
  
-The tests for `ra_editor` are `#[cfg(test)] mod tests` unit-tests spread
-throughout its modules.
+### `crates/ra_lsp_server`
+
+An LSP implementation which wraps `ra_ide_api` into a langauge server protocol.
+
+### `crates/ra_vfs`
+
+Although `hir` and `ra_ide_api` don't do any IO, we need to be able to read
+files from disk at the end of the day. This is what `ra_vfs` does. It also
+manages overlays: "dirty" files in the editor, whose "true" contents is
+different from data on disk.
  
  ### `crates/gen_lsp_server`
  
@@ -168,16 +180,21 @@ VS Code plugin
  ## Common workflows
  
  To try out VS Code extensions, run `cargo install-code`.  This installs both the
-`ra_lsp_server` binary and VS Code extension. To install only the binary, use
-`cargo install --path crates/ra_lsp_server --force`
+`ra_lsp_server` binary and the VS Code extension. To install only the binary, use
+`cargo install-lsp` (shorthand for `cargo install --path crates/ra_lsp_server --force`)
  
  To see logs from the language server, set `RUST_LOG=info` env variable. To see
  all communication between the server and the client, use
-`RUST_LOG=gen_lsp_server=debug` (will print quite a bit of stuff).
+`RUST_LOG=gen_lsp_server=debug` (this will print quite a bit of stuff).
+
+There's `rust-analyzer: status` command which prints common high-level debug
+info. In particular, it prints info about memory usage of various data
+structures, and, if compiled with jemalloc support (`cargo jinstall-lsp` or 
+`cargo install --path crates/ra_lsp_server --force --features jemalloc`), includes
+ statistic about the heap.
  
  To run tests, just `cargo test`.
  
-To work on VS Code extension, launch code inside `editors/code` and use `F5` to
+To work on the VS Code extension, launch code inside `editors/code` and use `F5` to
  launch/debug. To automatically apply formatter and linter suggestions, use `npm
  run fix`.
-