src/doc/intro.md

   1 % A 30-minute Introduction to Rust
   2
   3 Rust is a modern systems programming language focusing on safety and speed. It
   4 accomplishes these goals by being memory safe without using garbage collection.
   5
   6 This introduction will give you a rough idea of what Rust is like, eliding many
   7 details. It does not require prior experience with systems programming, but you
   8 may find the syntax easier if you've used a "curly brace" programming language
   9 before, like C or JavaScript. The concepts are more important than the syntax,
  10 so don't worry if you don't get every last detail: you can read [The
  11 Rust Programming Language](book/index.html) to get a more complete explanation.
  12
  13 Because this is about high-level concepts, you don't need to actually install
  14 Rust to follow along. If you'd like to anyway, check out [the
  15 homepage](http://rust-lang.org) for installation instructions.
  16
  17 To show off Rust, let's start with how easy it is to begin a new project.
  18 Then, we'll talk about Rust's most interesting feature, *ownership*, and
  19 then discuss how it makes concurrency easier to reason about. Finally,
  20 we'll talk about how Rust breaks down the perceived dichotomy between speed
  21 and safety.
  22
  23 # Tools
  24
  25 Getting started on a new Rust project is incredibly easy, thanks to Rust's
  26 package manager, [Cargo](https://crates.io/).
  27
  28 To start a new project with Cargo, use `cargo new`:
  29
  30 ```{bash}
  31 $ cargo new hello_world --bin
  32 ```
  33
  34 We're passing `--bin` because we're making a binary program: if we
  35 were making a library, we'd leave it off.
  36
  37 Let's check out what Cargo has generated for us:
  38
  39 ```{bash}
  40 $ cd hello_world
  41 $ tree .
  42 .
  43 ├── Cargo.toml
  44 └── src
  45     └── main.rs
  46
  47 1 directory, 2 files
  48 ```
  49
  50 This is all we need to get started. First, let's check out `Cargo.toml`:
  51
  52 ```{toml}
  53 [package]
  54
  55 name = "hello_world"
  56 version = "0.0.1"
  57 authors = ["Your Name <you@example.com>"]
  58 ```
  59
  60 This is called a *manifest*, and it contains all of the metadata that Cargo
  61 needs to compile your project.
  62
  63 Here's what's in `src/main.rs`:
  64
  65 ```{rust}
  66 fn main() {
  67     println!("Hello, world!");
  68 }
  69 ```
  70
  71 Cargo generated a "Hello World" for us. We'll talk more about the syntax here
  72 later, but that's what Rust code looks like! Let's compile and run it:
  73
  74 ```{bash}
  75 $ cargo run
  76    Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
  77      Running `target/hello_world`
  78 Hello, world!
  79 ```
  80
  81 Using an external dependency in Rust is incredibly easy. You add a line to
  82 your `Cargo.toml`:
  83
  84 ```{toml}
  85 [package]
  86
  87 name = "hello_world"
  88 version = "0.0.1"
  89 authors = ["Your Name <you@example.com>"]
  90
  91 [dependencies.semver]
  92
  93 git = "https://github.com/rust-lang/semver.git"
  94 ```
  95
  96 You added the `semver` library, which parses version numbers and compares them
  97 according to the [SemVer specification](http://semver.org/).
  98
  99 Now, you can pull in that library using `extern crate` in
 100 `main.rs`.
 101
 102 ```{rust,ignore}
 103 extern crate semver;
 104
 105 use semver::Version;
 106
 107 fn main() {
 108     assert!(Version::parse("1.2.3") == Ok(Version {
 109         major: 1u64,
 110         minor: 2u64,
 111         patch: 3u64,
 112         pre: vec![],
 113         build: vec![],
 114     }));
 115
 116     println!("Versions compared successfully!");
 117 }
 118 ```
 119
 120 Again, we'll discuss the exact details of all of this syntax soon. For now,
 121 let's compile and run it:
 122
 123 ```{bash}
 124 $ cargo run
 125     Updating git repository `https://github.com/rust-lang/semver.git`
 126    Compiling semver v0.0.1 (https://github.com/rust-lang/semver.git#bf739419)
 127    Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
 128      Running `target/hello_world`
 129 Versions compared successfully!
 130 ```
 131
 132 Because we only specified a repository without a version, if someone else were
 133 to try out our project at a later date, when `semver` was updated, they would
 134 get a different, possibly incompatible version. To solve this problem, Cargo
 135 produces a file, `Cargo.lock`, which records the versions of any dependencies.
 136 This gives us repeatable builds.
 137
 138 There is a lot more here, and this is a whirlwind tour, but you should feel
 139 right at home if you've used tools like [Bundler](http://bundler.io/),
 140 [npm](https://www.npmjs.org/), or [pip](https://pip.pypa.io/en/latest/).
 141 There are no `Makefile`s or endless `autotools` output here. (Rust's tooling does
 142 [play nicely with external libraries built with those
 143 tools](http://doc.crates.io/build-script.html), if you need it to.)
 144
 145 Enough about tools, let's talk code!
 146
 147 # Ownership
 148
 149 Rust's defining feature is "memory safety without garbage collection". Let's
 150 take a moment to talk about what that means. *Memory safety* means that the
 151 programming language eliminates certain kinds of bugs, such as [buffer
 152 overflows](https://en.wikipedia.org/wiki/Buffer_overflow) and [dangling
 153 pointers](https://en.wikipedia.org/wiki/Dangling_pointer). These problems occur
 154 when you have unrestricted access to memory. As an example, here's some Ruby
 155 code:
 156
 157 ```{ruby}
 158 v = []
 159
 160 v.push("Hello")
 161
 162 x = v[0]
 163
 164 v.push("world")
 165
 166 puts x
 167 ```
 168
 169 We make an array, `v`, and then call `push` on it. `push` is a method which
 170 adds an element to the end of an array.
 171
 172 Next, we make a new variable, `x`, that's equal to the first element of
 173 the array. Simple, but this is where the "bug" will appear.
 174
 175 Let's keep going. We then call `push` again, pushing "world" onto the
 176 end of the array. `v` is now `["Hello", "world"]`.
 177
 178 Finally, we print `x` with the `puts` method. This prints `Hello`.
 179
 180 All good? Let's go over a similar, but subtly different example, in C++:
 181
 182 ```{cpp}
 183 #include <iostream>
 184 #include <vector>
 185 #include <string>
 186
 187 int main() {
 188     std::vector<std::string> v;
 189
 190     v.push_back("Hello");
 191
 192     std::string& x = v[0];
 193
 194     v.push_back("world");
 195
 196     std::cout << x;
 197 }
 198 ```
 199
 200 It's a little more verbose due to the static typing, but it's almost the same
 201 thing. We make a `std::vector` of `std::string`s, we call `push_back` (same as
 202 `push`) on it, take a reference to the first element of the vector, call
 203 `push_back` again, and then print out the reference.
 204
 205 There are two big differences here: one, they're not _exactly_ the same thing,
 206 and two...
 207
 208 ```{bash}
 209 $ g++ hello.cpp -Wall -Werror
 210 $ ./a.out
 211 Segmentation fault (core dumped)
 212 ```
 213
 214 A crash! (Note that this is actually system-dependent. Because using an
 215 invalidated reference is undefined behavior, the program may do anything,
 216 including appearing to work!) Even though we compiled with flags to enable
 217 many common warnings, and to treat those warnings as errors, we got no
 218 errors. When we ran the program, it crashed.
 219
 220 Why does this happen? When we append to an array, its length changes. Since
 221 its length changes, we may need to allocate more memory. In Ruby, this happens
 222 as well; we just don't think about it very often. So why does the C++ version
 223 segfault when we allocate more memory?
 224
 225 The answer is that in the C++ version, `x` is a *reference* to the memory
 226 location where the first element of the array is stored. But in Ruby, `x` is a
 227 standalone value, not connected to the underlying array at all. Let's dig into
 228 the details for a moment. Your program has access to memory, provided to it by
 229 the operating system. Each location in memory has an address.  So when we make
 230 our vector, `v`, it's stored in a memory location somewhere:
 231
 232 | location | name | value |
 233 |----------|------|-------|
 234 | 0x30     | v    |       |
 235
 236 (The addresses are made up, and written in hexadecimal. For those of you with
 237 deep C++ knowledge: there are some simplifications here, like the lack of an
 238 allocated capacity for the vector. This is an introduction.)
 239
 240 When we push our first string onto the array, we allocate some memory,
 241 and `v` refers to it:
 242
 243 | location | name | value    |
 244 |----------|------|----------|
 245 | 0x30     | v    | 0x18     |
 246 | 0x18     |      | "Hello"  |
 247
 248 We then make a reference to that first element. A reference is a variable
 249 that points to a memory location, so its value is the memory location of
 250 the `"Hello"` string:
 251
 252 | location | name | value    |
 253 |----------|------|----------|
 254 | 0x30     | v    | 0x18     |
 255 | 0x18     |      | "Hello"  |
 256 | 0x14     | x    | 0x18     |
 257
 258 When we push `"world"` onto the vector with `push_back`, there's no room:
 259 we only allocated space for one element. So, we need to allocate space for two
 260 elements, copy the `"Hello"` string over, and update `v` to point to it. Like this:
 261
 262 | location | name | value    |
 263 |----------|------|----------|
 264 | 0x30     | v    | 0x08     |
 265 | 0x18     |      | GARBAGE  |
 266 | 0x14     | x    | 0x18     |
 267 | 0x08     |      | "Hello"  |
 268 | 0x04     |      | "world"  |
 269
 270 Note that `v` now refers to the new allocation, which has two elements. It's all
 271 good. But our `x` didn't get updated! It still points at the old location,
 272 which isn't valid anymore. In fact, [the documentation for `push_back` mentions
 273 this](http://en.cppreference.com/w/cpp/container/vector/push_back):
 274
 275 > If the new `size()` is greater than `capacity()` then all iterators and
 276 > references (including the past-the-end iterator) are invalidated.
 277
 278 Finding where these iterators and references are is a difficult problem, and
 279 even in this simple case, `g++` can't help us here. While the bug is obvious in
 280 this case, in real code, it can be difficult to track down the source of the
 281 error.
 282
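The same kind of reallocation happens inside Rust's `Vec`, and it can be observed safely. The following is a minimal sketch rather than part of the original example; the function name `grow_past_capacity` is made up for illustration. It fills a vector to its capacity, pushes one more element, and reports the capacity before and after. It also reports whether the buffer's address changed, which depends on the allocator, so the sketch makes no assumption about that value.

```rust
/// Fills a vector to capacity, pushes one more element, and returns
/// (capacity before, capacity after, whether the buffer address changed).
fn grow_past_capacity() -> (usize, usize, bool) {
    let mut v: Vec<String> = Vec::with_capacity(1);

    // `with_capacity` may reserve more than requested, so fill whatever we got.
    while v.len() < v.capacity() {
        v.push(String::from("Hello"));
    }
    let old_capacity = v.capacity();
    let old_address = v.as_ptr(); // a raw address, not a borrow of `v`

    // There is no room left, so this push forces the vector to grow.
    v.push(String::from("world"));

    let moved = old_address != v.as_ptr();
    (old_capacity, v.capacity(), moved)
}

fn main() {
    let (before, after, moved) = grow_past_capacity();
    println!("capacity {} -> {}, buffer moved: {}", before, after, moved);
}
```

If the buffer did move, any reference taken before the push would now point at the old location, which is exactly the situation `x` is in above.
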
 283 Before we talk about the solution, why didn't our Ruby code have this problem?
 284 The semantics are a little more complicated, and explaining Ruby's internals is
 285 out of the scope of a guide to Rust. But in a nutshell, Ruby's garbage
 286 collector keeps track of references, and makes sure that everything works as
 287 you might expect. This comes at an efficiency cost, and the internals are more
 288 complex.  If you'd really like to dig into the details, [this
 289 article](http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values)
 290 can give you more information.
 291
 292 Garbage collection is a valid approach to memory safety, but Rust chooses a
 293 different path.  Let's examine what the Rust version of this looks like:
 294
 295 ```{rust,ignore}
 296 fn main() {
 297     let mut v = vec![];
 298
 299     v.push("Hello");
 300
 301     let x = &v[0];
 302
 303     v.push("world");
 304
 305     println!("{}", x);
 306 }
 307 ```
 308
 309 This looks like a bit of both: fewer type annotations, but we do create new
 310 variables with `let`. The method name is `push`, some other stuff is different,
 311 but it's pretty close. So what happens when we compile this code?  Does Rust
 312 print `"Hello"`, or does Rust crash?
 313
 314 Neither. It refuses to compile:
 315
 316 ```{bash}
 317 $ cargo run
 318    Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
 319 main.rs:8:5: 8:6 error: cannot borrow `v` as mutable because it is also borrowed as immutable
 320 main.rs:8     v.push("world");
 321               ^
 322 main.rs:6:14: 6:15 note: previous borrow of `v` occurs here; the immutable borrow prevents subsequent moves or mutable borrows of `v` until the borrow ends
 323 main.rs:6     let x = &v[0];
 324                        ^
 325 main.rs:11:2: 11:2 note: previous borrow ends here
 326 main.rs:1 fn main() {
 327 ...
 328 main.rs:11 }
 329            ^
 330 error: aborting due to previous error
 331 ```
 332
 333 When we try to mutate the vector by `push`ing onto it a second time, Rust reports
 334 an error. It says that we "cannot borrow `v` as mutable because it is also
 335 borrowed as immutable." What does it mean by "borrowed"?
 336
 337 In Rust, the type system encodes the notion of *ownership*. The variable `v`
 338 is the *owner* of the vector. When we make a reference into `v`, we let another
 339 variable (in this case, `x`) *borrow* it for a while. Just like if you own a
 340 book, and you lend it to me, I'm borrowing the book.
 341
 342 So, when we try to modify the vector with the second call to `push`, we need
 343 exclusive access to it. But `x` is still borrowing it. You can't modify something
 344 that you've lent to someone. And so Rust reports an error.
 345
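To make the lending analogy concrete, here is a small sketch (the function name `borrow_then_push` is invented for this illustration) in which the borrow is confined to an inner block. Once the block ends, the loan is over, and the owner can be modified again.

```rust
/// Reads through a borrow, lets the borrow end, then mutates the owner.
fn borrow_then_push() -> (usize, Vec<String>) {
    let mut v = vec![String::from("Hello")];

    let first_len = {
        let x = &v[0]; // `x` borrows from `v`
        x.len() // use the borrow while it is alive
    }; // `x` goes out of scope here, so the loan is returned

    v.push(String::from("world")); // `v` is free to be modified again
    (first_len, v)
}

fn main() {
    let (len, v) = borrow_then_push();
    println!("{} {:?}", len, v);
}
```

The difference from the failing program is only in timing: there, `x` is still needed by `println!` after the second `push`, so the borrow cannot end early.
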
 346 So how do we fix this problem? Well, we can make a copy of the element:
 347
 348
 349 ```{rust}
 350 fn main() {
 351     let mut v = vec![];
 352
 353     v.push("Hello");
 354
 355     let x = v[0].clone();
 356
 357     v.push("world");
 358
 359     println!("{}", x);
 360 }
 361 ```
 362
 363 Note the addition of `clone()`. This creates a copy of the element, leaving
 364 the original untouched. Now, we no longer have two references to the same
 365 memory, and so the compiler is happy. Let's give that a try:
 366
 367 ```{bash}
 368 $ cargo run
 369    Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
 370      Running `target/hello_world`
 371 Hello
 372 ```
 373
 374 Same result. Now, making a copy can be inefficient, so this solution may not be
 375 acceptable. There are other ways to get around this problem, but this is a toy
 376 example, and because we're in an introduction, we'll leave that for later.
 377
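As one illustration of those other ways, and only a sketch, we can remember *where* the element is rather than holding a reference to it. The function name `first_after_push` is hypothetical. Because no borrow is outstanding during the second `push`, the compiler accepts it, and nothing is copied except a small integer.

```rust
/// Remembers the position of the first element instead of borrowing it.
fn first_after_push() -> &'static str {
    let mut v = vec![];

    v.push("Hello");

    let first = 0; // an index, not a reference into `v`

    v.push("world"); // no outstanding borrow, so this is allowed

    v[first] // look the element up only when it is needed
}

fn main() {
    println!("{}", first_after_push());
}
```

The trade-off is that an index is only checked when it is used, so it says nothing about whether the element is still there.
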
 378 The point is, the Rust compiler and its notion of ownership has saved us from a
 379 bug that would crash the program. We've achieved safety, at compile time,
 380 without needing to rely on a garbage collector to handle our memory.
 381
 382 # Concurrency
 383
 384 Rust's ownership model can help in other ways, as well. For example, take
 385 concurrency. Concurrency is a big topic, and an important one for any modern
 386 programming language. Let's take a look at how ownership can help you write
 387 safe concurrent programs.
 388
 389 Here's an example of a concurrent Rust program:
 390
 391 ```{rust}
 392 # #![feature(scoped)]
 393 use std::thread;
 394
 395 fn main() {
 396     let guards: Vec<_> = (0..10).map(|_| {
 397         thread::scoped(|| {
 398             println!("Hello, world!");
 399         })
 400     }).collect();
 401 }
 402 ```
 403
 404 This program creates ten threads, which all print `Hello, world!`. The `scoped`
 405 function takes one argument, a closure, indicated by the double bars `||`. This
 406 closure is executed in a new thread created by `scoped`. The method is called
 407 `scoped` because it returns a 'join guard', which will automatically join the
 408 child thread when it goes out of scope. Because we `collect` these guards into
 409 a `Vec<T>`, and that vector goes out of scope at the end of our program, our
 410 program will wait for every thread to finish before finishing.
 411
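The `thread::scoped` function used above was an unstable API, as the feature attribute hints. As a hedged sketch of the same idea, assuming a later stable toolchain (Rust 1.63 or newer), the standard library's `std::thread::scope` offers scoped threads in a different shape: threads are spawned inside a scope, and the scope does not return until they have all finished. The function name `run_greeters` is made up for this example.

```rust
use std::thread;

/// Spawns `n` scoped threads that each print a greeting,
/// and returns how many of them were joined.
fn run_greeters(n: usize) -> usize {
    thread::scope(|s| {
        // Spawn every thread first, keeping the join handles.
        let handles: Vec<_> = (0..n)
            .map(|i| s.spawn(move || println!("Hello, world! (thread {})", i)))
            .collect();

        // Join explicitly so we can count. Threads that are not joined
        // by hand are joined automatically when the scope ends.
        let mut joined = 0;
        for handle in handles {
            handle.join().unwrap();
            joined += 1;
        }
        joined
    })
}

fn main() {
    let joined = run_greeters(10);
    println!("joined {} threads", joined);
}
```

The guarantee is the same one described above: the program waits for every thread before moving on.
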
 412 One common form of problem in concurrent programs is a *data race*.
 413 This occurs when two different threads attempt to access the same
 414 location in memory in a non-synchronized way, where at least one of
 415 them is a write. If one thread is attempting to read, and one thread
 416 is attempting to write, you cannot be sure that your data will not be
 417 corrupted. Note the first half of that requirement: two threads that
 418 attempt to access the same location in memory. Rust's ownership model
 419 can track which pointers own which memory locations, which solves this
 420 problem.
 421
 422 Let's see an example. This Rust code will not compile:
 423
 424 ```{rust,ignore}
 425 # #![feature(scoped)]
 426 use std::thread;
 427
 428 fn main() {
 429     let mut numbers = vec![1, 2, 3];
 430
 431     let guards: Vec<_> = (0..3).map(|i| {
 432         thread::scoped(move || {
 433             numbers[i] += 1;
 434             println!("numbers[{}] is {}", i, numbers[i]);
 435         })
 436     }).collect();
 437 }
 438 ```
 439
 440 It gives us this error:
 441
 442 ```text
 443 7:25: 10:6 error: cannot move out of captured outer variable in an `FnMut` closure
 444 7     thread::scoped(move || {
 445 8       numbers[i] += 1;
 446 9       println!("numbers[{}] is {}", i, numbers[i]);
 447 10     })
 448 error: aborting due to previous error
 449 ```
 450
 451 This is a little confusing because there are two closures here: the one passed
 452 to `map`, and the one passed to `thread::scoped`. In this case, the closure for
 453 `thread::scoped` is attempting to reference `numbers`, a `Vec<i32>`. This
 454 closure is a `FnOnce` closure, as that’s what `thread::scoped` takes as an
 455 argument. `FnOnce` closures take ownership of their environment. That’s fine,
 456 but there’s one detail: because of `map`, we’re going to make three of these
 457 closures. And since all three try to take ownership of `numbers`, that would be
 458 a problem. That’s what it means by ‘cannot move out of captured outer
 459 variable’: our `thread::scoped` closure wants to take ownership, and it can’t,
 460 because the closure for `map` won’t let it.
 461
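To see the ownership rule in isolation, without threads, consider this sketch. The helper `call_once` and the function `sums_with_own_copies` are hypothetical names. A `move` closure takes ownership of what it captures, so one vector can be moved into only one closure. Here each closure is given its own clone, which is one (not always efficient) way to satisfy the compiler.

```rust
/// Calls a closure exactly once, taking ownership of it.
fn call_once<F: FnOnce() -> i32>(f: F) -> i32 {
    f()
}

/// Builds three closures, each of which owns its own copy of `numbers`.
fn sums_with_own_copies() -> Vec<i32> {
    let numbers = vec![1, 2, 3];

    (0..3)
        .map(|_| {
            // Moving `numbers` itself here would be rejected: it could
            // only be given away once, and `map` runs this three times.
            let own_copy = numbers.clone();
            call_once(move || own_copy.iter().sum::<i32>())
        })
        .collect()
}

fn main() {
    println!("{:?}", sums_with_own_copies());
}
```

Cloning gives every closure a separate vector, so it would not help with the original goal of having all threads update the *same* `numbers`.
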
 462 What to do here? Rust has a type that helps us: `Mutex<T>`. Because the threads
 463 are scoped, it is possible to use an _immutable_ reference to `numbers` inside
 464 of the closure. However, Rust prevents us from having multiple _mutable_
 465 references to the same object, so we need a `Mutex` to be able to modify what
 466 we're sharing. A `Mutex` will synchronize our accesses, so that we can ensure
 467 that our mutation doesn't cause a data race.
 468
 469 Here's what using a Mutex looks like:
 470
 471 ```{rust}
 472 # #![feature(scoped)]
 473 use std::thread;
 474 use std::sync::Mutex;
 475
 476 fn main() {
 477     let numbers = &Mutex::new(vec![1, 2, 3]);
 478
 479     let guards: Vec<_> = (0..3).map(|i| {
 480         thread::scoped(move || {
 481             let mut array = numbers.lock().unwrap();
 482             array[i] += 1;
 483             println!("numbers[{}] is {}", i, array[i]);
 484         })
 485     }).collect();
 486 }
 487 ```
 488
 489 We first have to `use` the appropriate library, and then we wrap our vector in
 490 a `Mutex` with the call to `Mutex::new()`. Inside of the closure, the `lock()`
 491 call will return a guard that gives us access to the value inside the `Mutex`, and
 492 block any other calls to `lock()` until said guard goes out of scope.
 493
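Here is a sketch of the same pattern that returns the updated vector, so the effect of the locking can be checked rather than only printed. It assumes the stable `std::thread::scope` API (Rust 1.63 or newer) in place of `thread::scoped`, and the function name `increment_all` is invented for the example.

```rust
use std::sync::Mutex;
use std::thread;

/// Increments every element of `values`, one scoped thread per element.
fn increment_all(values: Vec<i32>) -> Vec<i32> {
    let len = values.len();
    let numbers = Mutex::new(values);
    let shared = &numbers; // a shared reference can be copied into every thread

    thread::scope(|s| {
        for i in 0..len {
            s.spawn(move || {
                // Blocks until no other thread holds the lock.
                let mut guard = shared.lock().unwrap();
                guard[i] += 1;
                // `guard` is dropped at the end of the closure, releasing the lock.
            });
        }
    }); // every spawned thread has finished by this point

    // No thread can still be using the mutex, so we can take the vector back out.
    numbers.into_inner().unwrap()
}

fn main() {
    println!("{:?}", increment_all(vec![1, 2, 3]));
}
```

The order in which the threads run may vary, but because each increment happens under the lock, the final contents do not.
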
 494 We can compile and run this program without error, and in fact, see the
 495 non-deterministic aspect:
 496
 497 ```{bash}
 498 $ cargo run
 499    Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
 500      Running `target/hello_world`
 501 numbers[1] is 3
 502 numbers[0] is 2
 503 numbers[2] is 4
 504 $ cargo run
 505      Running `target/hello_world`
 506 numbers[2] is 4
 507 numbers[1] is 3
 508 numbers[0] is 2
 509 ```
 510
 511 Each time, we can get a slightly different output because the threads are not
 512 guaranteed to run in any set order. If you get the same order every time, it is
 513 because each of these threads is very small and completes too fast for the
 514 non-deterministic ordering to surface.
 515
 516 The important part here is that the Rust compiler was able to use ownership to
 517 give us assurance _at compile time_ that we weren't doing something incorrect
 518 with regard to concurrency. In order to share data between threads, we were forced
 519 to be explicit and use a mechanism to ensure that it would be properly handled.
 520
 521 # Safety _and_ Speed
 522
 523 Safety and speed are often presented as a continuum. At one end of the spectrum,
 524 you have maximum speed, but no safety. On the other end, you have absolute safety
 525 with no speed. Rust seeks to break out of this paradigm by introducing safety at
 526 compile time, ensuring that you haven't done anything wrong, while compiling to
 527 the same low-level code you'd expect without the safety.
 528
 529 As an example, Rust's ownership system is enforced _entirely_ at compile time. The
 530 safety check that rejects this code with an error about moved values:
 531
 532 ```{rust,ignore}
 533 # #![feature(scoped)]
 534 use std::thread;
 535
 536 fn main() {
 537     let numbers = vec![1, 2, 3];
 538
 539     let guards: Vec<_> = (0..3).map(|i| {
 540         thread::scoped(move || {
 541             println!("{}", numbers[i]);
 542         })
 543     }).collect();
 544 }
 545 ```
 546
 547 carries no runtime penalty. And while some of Rust's safety features do have
 548 a run-time cost, there's often a way to write your code in such a way that
 549 you can remove it. As an example, this is a poor way to iterate through
 550 a vector:
 551
 552 ```{rust}
 553 let vec = vec![1, 2, 3];
 554
 555 for i in 0..vec.len() {
 556     println!("{}", vec[i]);
 557 }
 558 ```
 559
 560 The reason is that the access of `vec[i]` does bounds checking, to ensure
 561 that we don't try to access an invalid index. However, we can remove this
 562 while retaining safety. The answer is iterators:
 563
 564 ```{rust}
 565 let vec = vec![1, 2, 3];
 566
 567 for x in &vec {
 568     println!("{}", x);
 569 }
 570 ```
 571
 572 This version uses an iterator that yields each element of the vector in turn.
 573 Because we have a reference to the element, rather than an index into the vector,
 574 there are no array bounds to check.
 575
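The two styles can be put side by side. This is a minimal sketch with made-up function names (`sum_by_index`, `sum_by_iterator`, `describe`). Both sums produce the same answer; the difference is only in whether an index has to be validated on each access. An optimizing compiler can sometimes remove the check in a loop this simple, so treat the comments as describing the semantics, not a measured cost.

```rust
/// Sums a slice by index. Each `values[i]` is a bounds-checked access.
fn sum_by_index(values: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

/// Sums the same slice with an iterator. There is no index to check.
fn sum_by_iterator(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// When the position is needed as well, `enumerate` supplies it
/// alongside each element, still without indexing into the slice.
fn describe(values: &[i32]) -> Vec<String> {
    values
        .iter()
        .enumerate()
        .map(|(i, x)| format!("vec[{}] is {}", i, x))
        .collect()
}

fn main() {
    let vec = vec![1, 2, 3];
    println!("{} {}", sum_by_index(&vec), sum_by_iterator(&vec));
    for line in describe(&vec) {
        println!("{}", line);
    }
}
```
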
 576 # Learning More
 577
 578 I hope that this taste of Rust has given you an idea of whether Rust is the right
 579 language for you. We talked about Rust's tooling, how encoding ownership into
 580 the type system helps you find bugs, how Rust can help you write correct
 581 concurrent code, and how you don't have to pay a speed cost for much of this
 582 safety.
 583
 584 To continue your Rustic education, read [The Rust Programming
 585 Language](book/index.html) for a more in-depth exploration of Rust's syntax and
 586 concepts.