## Source Code
-The things described are implemented in two places
+The things described are implemented in three places
* [rowan](https://github.com/rust-analyzer/rowan/tree/v0.9.0) -- a generic library for rowan syntax trees.
* [ra_syntax](https://github.com/rust-analyzer/rust-analyzer/tree/cf5bdf464cad7ceb9a67e07985a3f4d3799ec0b6/crates/ra_syntax) crate inside rust-analyzer which wraps `rowan` into rust-analyzer specific API.
Nothing in rust-analyzer except this crate knows about `rowan`.
-* [ra_parser](https://github.com/rust-analyzer/rust-analyzer/tree/cf5bdf464cad7ceb9a67e07985a3f4d3799ec0b6/crates/ra_parser) crate parses input tokens into an `ra_syntax` tree
+* [parser](https://github.com/rust-analyzer/rust-analyzer/tree/cf5bdf464cad7ceb9a67e07985a3f4d3799ec0b6/crates/parser) crate parses input tokens into an `ra_syntax` tree
## Design Goals
-* Syntax trees are lossless, or full fidelity. All comments and whitespace are preserved.
+* Syntax trees are lossless, or full fidelity. All comments and whitespace get preserved.
* Syntax trees are semantic-less. They describe *strictly* the structure of a sequence of characters, they don't have hygiene, name resolution or type information attached.
-* Syntax trees are simple value type. It is possible to create trees for a syntax without any external context.
+* Syntax trees are simple value types. It is possible to create trees for a syntax without any external context.
* Syntax trees have intuitive traversal API (parent, children, siblings, etc).
* Parsing is lossless (even if the input is invalid, the tree produced by the parser represents it exactly).
* Parsing is resilient (even if the input is invalid, parser tries to see as much syntax tree fragments in the input as it can).
* Trivia and non-trivia tokens are not distinguished on the type level.
* Each token carries its full text.
* The original text can be recovered by concatenating the texts of all tokens in order.
-* Accessing a child of particular type (for example, parameter list of a function) generally involves linerary traversing the children, looking for a specific `kind`.
+* Accessing a child of particular type (for example, parameter list of a function) generally involves linearly traversing the children, looking for a specific `kind`.
* Modifying the tree is roughly `O(depth)`.
- We don't make special efforts to guarantree that the depth is not liner, but, in practice, syntax trees are branchy and shallow.
+ We don't make special efforts to guarantee that the depth is not linear, but, in practice, syntax trees are branchy and shallow.
* If mandatory (grammar wise) node is missing from the input, it's just missing from the tree.
* If an extra erroneous input is present, it is wrapped into a node with `ERROR` kind, and treated just like any other node.
* Parser errors are not a part of syntax tree.
An input like `fn f() { 90 + 2 }` might be parsed as
```
-FN_DEF@0..17
+FN@0..17
FN_KW@0..2 "fn"
WHITESPACE@2..3 " "
NAME@3..4
R_PAREN@5..6 ")"
WHITESPACE@6..7 " "
BLOCK_EXPR@7..17
- BLOCK@7..17
- L_CURLY@7..8 "{"
- WHITESPACE@8..9 " "
- BIN_EXPR@9..15
- LITERAL@9..11
- INT_NUMBER@9..11 "90"
- WHITESPACE@11..12 " "
- PLUS@12..13 "+"
- WHITESPACE@13..14 " "
- LITERAL@14..15
- INT_NUMBER@14..15 "2"
- WHITESPACE@15..16 " "
- R_CURLY@16..17 "}"
+ L_CURLY@7..8 "{"
+ WHITESPACE@8..9 " "
+ BIN_EXPR@9..15
+ LITERAL@9..11
+ INT_NUMBER@9..11 "90"
+ WHITESPACE@11..12 " "
+ PLUS@12..13 "+"
+ WHITESPACE@13..14 " "
+ LITERAL@14..15
+ INT_NUMBER@14..15 "2"
+ WHITESPACE@15..16 " "
+ R_CURLY@16..17 "}"
```
#### Optimizations
(significant amount of implementation work here was done by [CAD97](https://github.com/cad97)).
-To reduce the amount of allocations, the GreenNode is a DST, which uses a single allocation for header and children. Thus, it is only usable behind a pointer
+To reduce the amount of allocations, the GreenNode is a [DST](https://doc.rust-lang.org/reference/dynamically-sized-types.html), which uses a single allocation for header and children. Thus, it is only usable behind a pointer.
```
*-----------+------+----------+------------+--------+--------+-----+--------*
```rust
struct Token {
- kind: NonTriviaTokenKind
+ kind: NonTriviaTokenKind,
text: String,
leading_trivia: Vec<TriviaToken>,
trailing_trivia: Vec<TriviaToken>,
A function green tree is not super-convenient to use.
The biggest problem is accessing parents (there are no parent pointers!).
But there are also "identify" issues.
-Let's say you want to write a code which builds a list of expressions in a file: `fn collect_exrepssions(file: GreenNode) -> HashSet<GreenNode>`.
+Let's say you want to write a code which builds a list of expressions in a file: `fn collect_expressions(file: GreenNode) -> HashSet<GreenNode>`.
For the input like
```rust
self.parent.clone()
}
fn children(&self) -> impl Iterator<Item = SyntaxNode> {
- let mut offset = self.offset
+ let mut offset = self.offset;
self.green.children().map(|green_child| {
let child_offset = offset;
offset += green_child.text_len;
Arc::new(SyntaxData {
- offset: child_offset;
+ offset: child_offset,
parent: Some(Arc::clone(self)),
green: Arc::clone(green_child),
})
}
impl PartialEq for SyntaxNode {
- fn eq(&self, other: &SyntaxNode) {
+ fn eq(&self, other: &SyntaxNode) -> bool {
self.offset == other.offset
&& Arc::ptr_eq(&self.green, &other.green)
}
The other thread can restore the `SyntaxNode` by traversing from the root green node and looking for a node with specified range.
You can also use the similar trick to store a `SyntaxNode`.
That is, a data structure that holds a `(GreenNode, Range<usize>)` will be `Sync`.
-However rust-analyzer goes even further.
+However, rust-analyzer goes even further.
It treats trees as semi-transient and instead of storing a `GreenNode`, it generally stores just the id of the file from which the tree originated: `(FileId, Range<usize>)`.
The `SyntaxNode` is the restored by reparsing the file and traversing it from root.
With this trick, rust-analyzer holds only a small amount of trees in memory at the same time, which reduces memory usage.
To get rid of allocations, `rowan` takes advantage of `SyntaxNode: !Sync` and uses a thread-local free list of `SyntaxNode`s.
In a typical traversal, you only directly hold a few `SyntaxNode`s at a time (and their ancestors indirectly), so a free list proportional to the depth of the tree removes all allocations in a typical case.
-So, while traversal is not exactly incrementing a pointer, it's still pretty cheep: tls + rc bump!
+So, while traversal is not exactly incrementing a pointer, it's still pretty cheap: TLS + rc bump!
Traversal also yields (cheap) owned nodes, which improves ergonomics quite a bit.
```
This allows using true pointer equality for comparison of identities of `SyntaxNodes`.
-rust-analyzer used to have this design as well, but since we've switch to cursors.
+rust-analyzer used to have this design as well, but we've since switched to cursors.
The main problem with memoizing the red nodes is that it more than doubles the memory requirements for fully realized syntax trees.
In contrast, cursors generally retain only a path to the root.
C# combats increased memory usage by using weak references.
impl AstNode for FnDef {
fn cast(syntax: SyntaxNode) -> Option<Self> {
match kind {
- FN_DEF => Some(FnDef { syntax }),
+ FN => Some(FnDef { syntax }),
_ => None,
}
}
fn visibility(&self) -> Option<Visibility>;
}
-impl HasVisbility for FnDef {
+impl HasVisibility for FnDef {
fn visibility(&self) -> Option<Visibility> {
self.syntax.children().find_map(Visibility::cast)
}
### Parsing Algorithm
-We use a boring hand-crafted recursive descent + pratt combination, with a special effort of continuting the parsing if an error is detected.
+We use a boring hand-crafted recursive descent + pratt combination, with a special effort of continuing the parsing if an error is detected.
### Parser Recap