src/doc/trpl/testing.md

% Testing

> Program testing can be a very effective way to show the presence of bugs, but
> it is hopelessly inadequate for showing their absence.
>
> Edsger W. Dijkstra, "The Humble Programmer" (1972)

Let's talk about how to test Rust code. What we will not be talking about is
the right way to test Rust code. There are many schools of thought regarding
the right and wrong way to write tests. All of these approaches use the same
basic tools, and so we'll show you the syntax for using them.

# The `test` attribute

At its simplest, a test in Rust is a function that's annotated with the `test`
attribute. Let's make a new project with Cargo called `adder`:

```bash
$ cargo new adder
$ cd adder
```

Cargo will automatically generate a simple test when you make a new project.
Here are the contents of `src/lib.rs`:

```rust
#[test]
fn it_works() {
}
```

Note the `#[test]`. This attribute indicates that this is a test function. It
currently has no body. That's good enough to pass! We can run the tests with
`cargo test`:

```bash
$ cargo test
   Compiling adder v0.0.1 (file:///home/you/projects/adder)
     Running target/adder-91b3e234d4ed382a

running 1 test
test it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests adder

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```

Cargo compiled and ran our tests. There are two sets of output here: one
for the test we wrote, and another for documentation tests. We'll talk about
those later. For now, see this line:

```text
test it_works ... ok
```

Note the `it_works`. This comes from the name of our function:

```rust
fn it_works() {
# }
```

We also get a summary line:

```text
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
```

So why does our do-nothing test pass? Any test which doesn't `panic!` passes,
and any test that does `panic!` fails. Let's make our test fail:

```rust
#[test]
fn it_works() {
    assert!(false);
}
```

`assert!` is a macro provided by Rust which takes one argument: if the argument
is `true`, nothing happens. If the argument is `false`, it `panic!`s. Let's run
our tests again:

```bash
$ cargo test
   Compiling adder v0.0.1 (file:///home/you/projects/adder)
     Running target/adder-91b3e234d4ed382a

running 1 test
test it_works ... FAILED

failures:

---- it_works stdout ----
        thread 'it_works' panicked at 'assertion failed: false', /home/steve/tmp/adder/src/lib.rs:3



failures:
    it_works

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured

thread '<main>' panicked at 'Some tests failed', /home/steve/src/rust/src/libtest/lib.rs:247
```

Rust indicates that our test failed:

```text
test it_works ... FAILED
```

And that's reflected in the summary line:

```text
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured
```

We also get a non-zero status code:

```bash
$ echo $?
101
```

This is useful if you want to integrate `cargo test` into other tooling.

We can invert our test's failure with another attribute, `should_panic`:

```rust
#[test]
#[should_panic]
fn it_works() {
    assert!(false);
}
```

This test will now succeed if we `panic!` and fail if we complete. Let's try it:

```bash
$ cargo test
   Compiling adder v0.0.1 (file:///home/you/projects/adder)
     Running target/adder-91b3e234d4ed382a

running 1 test
test it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests adder

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```

Rust provides another macro, `assert_eq!`, that compares two arguments for
equality:

```rust
#[test]
#[should_panic]
fn it_works() {
    assert_eq!("Hello", "world");
}
```

Does this test pass or fail? Because of the `should_panic` attribute, it
passes:

```bash
$ cargo test
   Compiling adder v0.0.1 (file:///home/you/projects/adder)
     Running target/adder-91b3e234d4ed382a

running 1 test
test it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests adder

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```

`should_panic` tests can be fragile, as it's hard to guarantee that the test
didn't panic for an unexpected reason. To help with this, an optional `expected`
parameter can be added to the `should_panic` attribute. The test harness will
make sure that the panic message contains the provided text. A safer version
of the example above would be:

```rust
#[test]
#[should_panic(expected = "assertion failed")]
fn it_works() {
    assert_eq!("Hello", "world");
}
```

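The `expected` parameter matters most when a function panics with its own
message. Here is a minimal sketch; the `checked_div` function and its message
are made up for this illustration and are not part of `adder`:

```rust
// Hypothetical function, used only to have something that panics
// with a known message.
pub fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero is not allowed");
    }
    a / b
}

#[cfg(test)]
mod tests {
    use super::*;

    // Passes only if the panic message contains the expected text.
    #[test]
    #[should_panic(expected = "division by zero")]
    fn panics_on_zero_divisor() {
        checked_div(1, 0);
    }

    // The non-panicking path is checked by an ordinary test.
    #[test]
    fn divides_normally() {
        assert_eq!(2, checked_div(4, 2));
    }
}
```
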
That's all there is to the basics! Let's write one 'real' test:

```{rust,ignore}
pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[test]
fn it_works() {
    assert_eq!(4, add_two(2));
}
```

This is a very common use of `assert_eq!`: call some function with
some known arguments and compare the result to the expected output.

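The same pattern extends to several known inputs at once. This is only a
sketch: the test name and the input pairs are our own choices for
illustration.

```rust
pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[test]
fn adds_two_to_known_inputs() {
    // (input, expected output) pairs picked by hand for this example.
    let cases = [(0, 2), (2, 4), (-2, 0), (-5, -3)];

    for &(input, expected) in cases.iter() {
        assert_eq!(expected, add_two(input));
    }
}
```
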
# The `test` module

There is one way in which our existing example is not idiomatic: it's
missing the test module. The idiomatic way of writing our example
looks like this:

```{rust,ignore}
pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[cfg(test)]
mod test {
    use super::add_two;

    #[test]
    fn it_works() {
        assert_eq!(4, add_two(2));
    }
}
```

There are a few changes here. The first is the introduction of a `mod test` with
a `cfg` attribute. The module allows us to group all of our tests together, and
to also define helper functions if needed, that don't become a part of the rest
of our crate. The `cfg` attribute only compiles our test code if we're
currently trying to run the tests. This can save compile time, and also ensures
that our tests are entirely left out of a normal build.

The second change is the `use` declaration. Because we're in an inner module,
we need to bring the function under test into scope. This can be annoying if you
have a large module, and so this is a common use of globs. Let's change
our `src/lib.rs` to make use of it:

```{rust,ignore}
pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn it_works() {
        assert_eq!(4, add_two(2));
    }
}
```

Note the different `use` line. Now we run our tests:

```bash
$ cargo test
    Updating registry `https://github.com/rust-lang/crates.io-index`
   Compiling adder v0.0.1 (file:///home/you/projects/adder)
     Running target/adder-91b3e234d4ed382a

running 1 test
test test::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests adder

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```

It works!

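The test module is also where helper functions can live without becoming
part of the crate. A small sketch, assuming a helper named `assert_adds_two`
that we invented for this example:

```rust
pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[cfg(test)]
mod test {
    use super::*;

    // Hypothetical helper: compiled only for test builds, so it never
    // shows up in the crate's normal build.
    fn assert_adds_two(input: i32) {
        assert_eq!(input + 2, add_two(input));
    }

    #[test]
    fn small_values() {
        assert_adds_two(0);
        assert_adds_two(2);
    }

    #[test]
    fn negative_values() {
        assert_adds_two(-10);
    }
}
```
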
The current convention is to use the `test` module to hold your "unit-style"
tests. Anything that just tests one small bit of functionality makes sense to
go here. But what about "integration-style" tests instead? For that, we have
the `tests` directory.

# The `tests` directory

To write an integration test, let's make a `tests` directory, and
put a `tests/lib.rs` file inside, with this as its contents:

```{rust,ignore}
extern crate adder;

#[test]
fn it_works() {
    assert_eq!(4, adder::add_two(2));
}
```

This looks similar to our previous tests, but slightly different. We now have
an `extern crate adder` at the top. This is because the tests in the `tests`
directory are an entirely separate crate, and so we need to import our library.
This is also why `tests` is a suitable place to write integration-style tests:
they use the library like any other consumer of it would.

Let's run them:

```bash
$ cargo test
   Compiling adder v0.0.1 (file:///home/you/projects/adder)
     Running target/adder-91b3e234d4ed382a

running 1 test
test test::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

     Running target/lib-c18e7d3494509e74

running 1 test
test it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests adder

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```

Now we have three sections: our previous test is also run, as well as our new
one.

That's all there is to the `tests` directory. The `test` module isn't needed
here, since the whole thing is focused on tests.

Let's finally check out that third section: documentation tests.

# Documentation tests

Nothing is better than documentation with examples. Nothing is worse than
examples that don't actually work, because the code has changed since the
documentation was written. To this end, Rust supports automatically
running examples in your documentation. Here's a fleshed-out `src/lib.rs`
with examples:

```{rust,ignore}
//! The `adder` crate provides functions that add numbers to other numbers.
//!
//! # Examples
//!
//! ```
//! assert_eq!(4, adder::add_two(2));
//! ```

/// This function adds two to its argument.
///
/// # Examples
///
/// ```
/// use adder::add_two;
///
/// assert_eq!(4, add_two(2));
/// ```
pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn it_works() {
        assert_eq!(4, add_two(2));
    }
}
```

Note the module-level documentation with `//!` and the function-level
documentation with `///`. Rust's documentation supports Markdown in comments,
and so triple backticks mark code blocks. It is conventional to include the
`# Examples` section, exactly like that, with examples following.

Let's run the tests again:

```bash
$ cargo test
   Compiling adder v0.0.1 (file:///home/steve/tmp/adder)
     Running target/adder-91b3e234d4ed382a

running 1 test
test test::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

     Running target/lib-c18e7d3494509e74

running 1 test
test it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests adder

running 2 tests
test add_two_0 ... ok
test _0 ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured
```

Now we have all three kinds of tests running! Note the names of the
documentation tests: the `_0` is generated for the module test, and `add_two_0`
for the function test. These will auto increment with names like `add_two_1` as
you add more examples.

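Each fenced block inside a doc comment is run as its own documentation test,
so one function can carry more than one example. A minimal sketch, using a
hypothetical `add_three` function that is not part of the `adder` crate as
written above:

```rust
/// This function adds three to its argument.
/// (Hypothetical: invented for this illustration.)
///
/// # Examples
///
/// ```
/// assert_eq!(5, adder::add_three(2));
/// ```
///
/// A second example, run as a separate documentation test:
///
/// ```
/// assert_eq!(0, adder::add_three(-3));
/// ```
pub fn add_three(a: i32) -> i32 {
    a + 3
}
```
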
# Benchmark tests

Rust also supports benchmark tests, which can test the performance of your
code. Let's make our `src/lib.rs` look like this (comments elided):

```{rust,ignore}
#![feature(test)]

extern crate test;

pub fn add_two(a: i32) -> i32 {
    a + 2
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[test]
    fn it_works() {
        assert_eq!(4, add_two(2));
    }

    #[bench]
    fn bench_add_two(b: &mut Bencher) {
        b.iter(|| add_two(2));
    }
}
```

We've imported the `test` crate, which contains our benchmarking support; it
is unstable, hence the `feature` attribute at the top. The test module is named
`tests` here because a module named `test` would clash with the `test` crate.
We have a new function as well, with the `bench` attribute. Unlike regular
tests, which take no arguments, benchmark tests take a `&mut Bencher`. This
`Bencher` provides an `iter` method, which takes a closure. This closure
contains the code we'd like to benchmark.

We can run benchmark tests with `cargo bench`:

```bash
$ cargo bench
   Compiling adder v0.0.1 (file:///home/steve/tmp/adder)
     Running target/release/adder-91b3e234d4ed382a

running 2 tests
test tests::it_works ... ignored
test tests::bench_add_two ... bench:         1 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured
```

Our non-benchmark test was ignored. You may have noticed that `cargo bench`
takes a bit longer than `cargo test`. This is because Rust runs our benchmark
a number of times, and then takes the average. Because we're doing so little
work in this example, we have a `1 ns/iter (+/- 0)`, but this would show
the variance if there were one.

Advice on writing benchmarks:

* Move setup code outside the `iter` loop; only put the part you want to measure inside
* Make the code do "the same thing" on each iteration; do not accumulate or change state
* Make the outer function idempotent too; the benchmark runner is likely to run
  it many times
* Make the inner `iter` loop short and fast so benchmark runs are fast and the
  calibrator can adjust the run-length at fine resolution
* Make the code in the `iter` loop do something simple, to assist in pinpointing
  performance improvements (or regressions)

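The first two points can be sketched on stable Rust without `Bencher`. This
is a hand-rolled timing loop, not the harness that `cargo bench` uses:
`build_input` and `time_xor_fold` are hypothetical names, `std::time::Instant`
stands in for the calibrated runner, and `std::hint::black_box` is there so
the optimizer cannot discard the otherwise unused result.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical setup step: builds the input once, outside the timed loop.
pub fn build_input(len: u32) -> Vec<u32> {
    (0..len).collect()
}

// Hypothetical stand-in for `Bencher::iter`: runs the same XOR fold
// `iterations` times and returns the elapsed time plus the last result.
pub fn time_xor_fold(data: &[u32], iterations: u32) -> (Duration, u32) {
    let mut last = 0u32;
    let start = Instant::now();
    for _ in 0..iterations {
        // Only the work we want to measure is inside the loop, and every
        // iteration does the same thing to the same, unmodified input.
        last = black_box(data.iter().fold(0u32, |acc, &x| acc ^ x));
    }
    (start.elapsed(), last)
}
```
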
## Gotcha: optimizations

There's another tricky part to writing benchmarks: benchmarks compiled with
optimizations activated can be dramatically changed by the optimizer so that
the benchmark is no longer benchmarking what one expects. For example, the
compiler might recognize that some calculation has no external effects and
remove it entirely.

```{rust,ignore}
extern crate test;
use test::Bencher;

#[bench]
fn bench_xor_1000_ints(b: &mut Bencher) {
    b.iter(|| {
        (0..1000).fold(0, |old, new| old ^ new);
    });
}
```

gives the following results:

```text
running 1 test
test bench_xor_1000_ints ... bench:         0 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
```

The benchmarking runner offers two ways to avoid this. The first is for the
closure that the `iter` method receives to return an arbitrary value, which
forces the optimizer to consider the result used and ensures it cannot remove
the computation entirely. This could be done for the example above by adjusting
the `b.iter` call to:

```rust
# struct X;
# impl X { fn iter<T, F>(&self, _: F) where F: FnMut() -> T {} } let b = X;
b.iter(|| {
    // note lack of `;` (could also use an explicit `return`).
    (0..1000).fold(0, |old, new| old ^ new)
});
```

The other option is to call the generic `test::black_box` function, which
is an opaque "black box" to the optimizer and so forces it to consider any
argument as used.

```rust
# #![feature(test)]

extern crate test;

# fn main() {
# struct X;
# impl X { fn iter<T, F>(&self, _: F) where F: FnMut() -> T {} } let b = X;
b.iter(|| {
    let n = test::black_box(1000);

    (0..n).fold(0, |a, b| a ^ b)
})
# }
```

Neither of these reads or modifies the value, and both are very cheap for small
values. Larger values can be passed indirectly to reduce overhead (e.g.
`black_box(&huge_struct)`).

Performing either of the above changes gives the following benchmarking results:

```text
running 1 test
test bench_xor_1000_ints ... bench:       131 ns/iter (+/- 3)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
```

However, the optimizer can still modify a test case in an undesirable manner
even when using either of the above.