src/doc/guide-testing.md

% The Rust Testing Guide

# Quick start

To create test functions, add a `#[test]` attribute like this:

~~~test_harness
fn return_two() -> int {
    2
}

#[test]
fn return_two_test() {
    let x = return_two();
    assert!(x == 2);
}
~~~

To run these tests, compile with `rustc --test` and run the resulting
binary:

~~~console
$ rustc --test foo.rs
$ ./foo
running 1 test
test return_two_test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
~~~

`rustc foo.rs` will *not* compile the tests, since `#[test]` implies
`#[cfg(test)]`. The `--test` flag to `rustc` implies `--cfg test`.


# Unit testing in Rust

Rust has built-in support for simple unit testing. Functions can be
marked as unit tests using the `test` attribute.

~~~test_harness
#[test]
fn return_none_if_empty() {
    // ... test code ...
}
~~~

A test function's signature must have no arguments and no return
value. To run the tests in a crate, it must be compiled with the
`--test` flag: `rustc myprogram.rs --test -o myprogram-tests`. Running
the resulting executable will run all the tests in the crate. A test
is considered successful if its function returns. If the task running
the test panics, whether through a call to `panic!`, a failed `assert!`,
or some other means (`assert_eq!`, ...), then the test fails.

When compiling a crate with the `--test` flag, `--cfg test` is also
implied, so that tests can be conditionally compiled.

~~~test_harness
#[cfg(test)]
mod tests {
    #[test]
    fn return_none_if_empty() {
        // ... test code ...
    }
}
~~~
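A slightly fuller version of the module above might look like the following sketch. The function `first` and the second test name are hypothetical, and the code assumes a current Rust toolchain. The point it illustrates is that a `tests` module has to import the items it exercises from its parent module (here with `use super::first;`), and that the whole module is left out of builds made without `--test`.

~~~rust
/// Hypothetical function under test: the first element, or `None` if empty.
pub fn first<T: Copy>(v: &[T]) -> Option<T> {
    v.first().copied()
}

// Compiled only when `--cfg test` is set, which `--test` implies.
#[cfg(test)]
mod tests {
    // Items from the enclosing module must be imported explicitly.
    use super::first;

    #[test]
    fn return_none_if_empty() {
        let empty: [i32; 0] = [];
        assert_eq!(first(&empty), None);
    }

    #[test]
    fn return_first_element() {
        assert_eq!(first(&[3, 1, 2]), Some(3));
    }
}
~~~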

Additionally, `#[test]` items behave as if they also have the
`#[cfg(test)]` attribute, and will not be compiled when the `--test` flag
is not used.

Tests that should not be run can be annotated with the `ignore`
attribute. The existence of these tests will be noted in the test
runner output, but the test will not be run. Tests can also be ignored
conditionally by using the `cfg_attr` attribute; for example, to ignore a
test on Windows you can write `#[cfg_attr(windows, ignore)]`.
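The sketch below shows both ways of ignoring a test. The function `parse_port` and all test names are hypothetical, the Windows condition is purely illustrative, and the code assumes a current Rust toolchain (it uses `u16::MAX`, for example).

~~~rust
/// Hypothetical function under test: parses a port number.
fn parse_port(s: &str) -> Option<u16> {
    s.trim().parse().ok()
}

#[test]
fn parses_plain_port() {
    assert_eq!(parse_port("8080"), Some(8080));
}

// Reported as ignored by the runner; only run when `--ignored` is passed.
#[test]
#[ignore]
fn round_trips_every_port() {
    for p in 0..=u16::MAX {
        assert_eq!(parse_port(&p.to_string()), Some(p));
    }
}

// Ignored only when compiling for Windows (an illustrative condition);
// runs normally on every other platform.
#[test]
#[cfg_attr(windows, ignore)]
fn rejects_out_of_range_port() {
    assert_eq!(parse_port("70000"), None);
}
~~~

Marking the exhaustive check with `ignore` keeps the default run fast while still compiling the test, so it cannot silently rot.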

Tests that are intended to fail can be annotated with the
`should_fail` attribute. The test will be run, and if it causes its
task to panic then the test will be counted as successful; otherwise it
will be counted as a failure. For example:

~~~test_harness
#[test]
#[should_fail]
fn test_out_of_bounds_failure() {
    let v: &[int] = &[];
    v[0];
}
~~~

`#[should_fail]` tests can be fragile, because it is hard to guarantee that
the test did not fail for an unexpected reason. To help with this, an optional
`expected` parameter can be added to the `should_fail` attribute. The test
harness will then check that the failure message contains the provided text.
A safer version of the example above would be:

~~~test_harness
#[test]
#[should_fail(expected = "index out of bounds")]
fn test_out_of_bounds_failure() {
    let v: &[int] = &[];
    v[0];
}
~~~

A test runner built with the `--test` flag supports a limited set of
arguments to control which tests are run:

- the first free argument passed to a test runner is interpreted as a
  regular expression
  ([syntax reference](regex/index.html#syntax))
  and is used to narrow down the set of tests being run. Note: a plain
  string is a valid regular expression that matches itself.
- the `--ignored` flag tells the test runner to run only tests with the
  `ignore` attribute.

## Parallelism

By default, tests are run in parallel, which can make interpreting
failure output difficult. In these cases you can set the
`RUST_TEST_TASKS` environment variable to 1 to make the tests run
sequentially.

## Examples

### Typical test run

~~~console
$ mytests

running 30 tests
test driver::tests::mytest1 ... ok
test driver::tests::mytest2 ... ignored
... snip ...
test driver::tests::mytest30 ... ok

test result: ok. 28 passed; 0 failed; 2 ignored
~~~

### Test run with failures

~~~console
$ mytests

running 30 tests
test driver::tests::mytest1 ... ok
test driver::tests::mytest2 ... ignored
... snip ...
test driver::tests::mytest30 ... FAILED

test result: FAILED. 27 passed; 1 failed; 2 ignored
~~~

### Running ignored tests

~~~console
$ mytests --ignored

running 2 tests
test driver::tests::mytest2 ... FAILED
test driver::tests::mytest10 ... ok

test result: FAILED. 1 passed; 1 failed; 0 ignored
~~~

### Running a subset of tests

Using a plain string:

~~~console
$ mytests mytest23

running 1 test
test driver::tests::mytest23 ... ok

test result: ok. 1 passed; 0 failed; 0 ignored
~~~

Using some regular expression features:

~~~console
$ mytests 'mytest[145]'

running 13 tests
test driver::tests::mytest1 ... ok
test driver::tests::mytest4 ... ok
test driver::tests::mytest5 ... ok
test driver::tests::mytest10 ... ignored
... snip ...
test driver::tests::mytest19 ... ok

test result: ok. 12 passed; 0 failed; 1 ignored
~~~
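As the output shows, the expression is matched anywhere in the full test name rather than against the whole name. That is why `mytest[145]` also selects `mytest10` through `mytest19`. The sketch below reproduces the count of 13 by hand, assuming the 30 tests are named `driver::tests::mytest1` to `driver::tests::mytest30` as in the examples. `matches_mytest_145` is a hypothetical stand-in that hard-codes this one pattern. It is not how the test runner implements matching.

~~~rust
/// Hypothetical stand-in for an unanchored search for the regular expression
/// `mytest[145]`: true if `name` contains "mytest" immediately followed by
/// '1', '4' or '5' anywhere in the string.
fn matches_mytest_145(name: &str) -> bool {
    name.match_indices("mytest").any(|(start, m)| {
        let rest = &name[start + m.len()..];
        matches!(rest.chars().next(), Some('1' | '4' | '5'))
    })
}

/// Returns the names that the filter would select.
fn select<'a>(names: &'a [String]) -> Vec<&'a str> {
    names
        .iter()
        .map(|s| s.as_str())
        .filter(|n| matches_mytest_145(n))
        .collect()
}

/// Builds the 30 test names assumed by the examples above.
fn example_names() -> Vec<String> {
    (1..=30)
        .map(|n| format!("driver::tests::mytest{}", n))
        .collect()
}
~~~

`select(&example_names())` yields 13 names: `mytest1`, `mytest4`, `mytest5`, and the ten names `mytest10` through `mytest19`.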

# Microbenchmarking

The test runner also understands a simple form of benchmark execution.
Benchmark functions are marked with the `#[bench]` attribute, rather
than `#[test]`, and have a different form and meaning. They are
compiled along with `#[test]` functions when a crate is compiled with
`--test`, but they are not run by default. To run the benchmark
component of your test suite, pass `--bench` to the compiled test
runner.

The type signature of a benchmark function differs from a unit test:
it takes a mutable reference to type `test::Bencher`. Inside the
benchmark function, any time-variable or "setup" code should execute
first, followed by a call to `iter` on the benchmark harness, passing a
closure that contains the portion of the benchmark whose per-iteration
speed you actually want to measure.

For benchmarks that process or generate data, you can set the
`bytes` field to the number of bytes consumed or produced in each
iteration; this will be used to show the throughput of the benchmark.
This must be the amount used in each iteration, *not* the total
amount.

For example:

~~~test_harness
extern crate test;

use test::Bencher;

#[bench]
fn bench_sum_1024_ints(b: &mut Bencher) {
    let v = Vec::from_fn(1024, |n| n);
    b.iter(|| v.iter().fold(0, |old, new| old + *new));
}

#[bench]
fn initialise_a_vector(b: &mut Bencher) {
    b.iter(|| Vec::from_elem(1024, 0u64));
    b.bytes = 1024 * 8;
}
~~~

The benchmark runner will calibrate measurement of the benchmark
function to run the `iter` block "enough" times to get a reliable
measure of the per-iteration speed.

Advice on writing benchmarks:

  - Move setup code outside the `iter` loop; only put the part you
    want to measure inside
  - Make the code do "the same thing" on each iteration; do not
    accumulate or change state
  - Make the outer function idempotent too; the benchmark runner is
    likely to run it many times
  - Make the inner `iter` loop short and fast so benchmark runs are
    fast and the calibrator can adjust the run-length at fine
    resolution
  - Make the code in the `iter` loop do something simple, to assist in
    pinpointing performance improvements (or regressions)
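To make the advice concrete outside the harness, here is a hand-rolled sketch in current stable Rust, where `#[bench]` and `test::Bencher` require a nightly compiler. `time_per_iteration` is a hypothetical helper that only mimics the shape of `iter`. Unlike the real runner, it does no calibration and uses a fixed iteration count. It uses `std::time::Instant` for timing and `std::hint::black_box` to keep each result alive. Here `bench_sum_1024_ints` is an ordinary function, not a `#[bench]` one.

~~~rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper shaped like `Bencher::iter`: runs `f` `iters` times
/// (no calibration) and returns the mean wall-clock time per iteration.
/// Panics if `iters` is 0.
fn time_per_iteration<T, F: FnMut() -> T>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // Pass each result through `black_box` so it counts as used.
        black_box(f());
    }
    start.elapsed() / iters
}

/// The work being measured: simple, and identical on every call.
fn sum_of(v: &[u64]) -> u64 {
    v.iter().sum()
}

fn bench_sum_1024_ints() -> Duration {
    // Setup runs once, outside the timed closure.
    let v: Vec<u64> = (0..1024).collect();
    // The closure only reads `v`, so every iteration does the same work.
    time_per_iteration(10_000, || sum_of(&v))
}
~~~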

To run benchmarks, pass the `--bench` flag to the compiled
test runner. Benchmarks are compiled-in but not executed by default.

~~~console
$ rustc mytests.rs -O --test
$ mytests --bench

running 2 tests
test bench_sum_1024_ints ... bench: 709 ns/iter (+/- 82)
test initialise_a_vector ... bench: 424 ns/iter (+/- 99) = 19320 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured
~~~
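The `= 19320 MB/s` figure can be reproduced from two inputs. The first is the `bytes` value set in `initialise_a_vector` (1024 * 8 = 8192 bytes per iteration). The second is the measured 424 ns/iter. This assumes a megabyte is 10^6 bytes and the result is truncated to an integer. The helper below is a hypothetical reconstruction of that arithmetic, not code taken from the test runner.

~~~rust
/// Hypothetical helper: throughput in MB/s (1 MB = 1_000_000 bytes), given the
/// bytes processed per iteration and the measured nanoseconds per iteration.
fn throughput_mb_per_s(bytes_per_iter: u64, ns_per_iter: u64) -> u64 {
    // (bytes / ns) * (1e9 ns / s) / (1e6 bytes / MB) simplifies to bytes * 1000 / ns.
    // Integer division truncates any fractional part.
    bytes_per_iter * 1000 / ns_per_iter
}
~~~

With the numbers above, `throughput_mb_per_s(1024 * 8, 424)` evaluates to 19320.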

## Benchmarks and the optimizer

Benchmarks compiled with optimizations activated can be dramatically
changed by the optimizer so that the benchmark is no longer
benchmarking what one expects. For example, the compiler might
recognize that some calculation has no external effects and remove
it entirely.

~~~test_harness
extern crate test;
use test::Bencher;

#[bench]
fn bench_xor_1000_ints(b: &mut Bencher) {
    b.iter(|| {
        range(0u, 1000).fold(0, |old, new| old ^ new);
    });
}
~~~

gives the following results:

~~~console
running 1 test
test bench_xor_1000_ints ... bench:         0 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
~~~

The benchmarking runner offers two ways to avoid this. First, the
closure that the `iter` method receives can return an arbitrary value,
which forces the optimizer to consider the result used and ensures it
cannot remove the computation entirely. This could be done for the
example above by adjusting the `b.iter` call to

~~~
# struct X; impl X { fn iter<T>(&self, _: || -> T) {} } let b = X;
b.iter(|| {
    // note lack of `;` (could also use an explicit `return`).
    range(0u, 1000).fold(0, |old, new| old ^ new)
});
~~~

The other option is to call the generic `test::black_box`
function, which is an opaque "black box" to the optimizer and so
forces it to consider any argument as used.

~~~
extern crate test;

# fn main() {
# struct X; impl X { fn iter<T>(&self, _: || -> T) {} } let b = X;
b.iter(|| {
    test::black_box(range(0u, 1000).fold(0, |old, new| old ^ new));
});
# }
~~~

Neither approach reads or modifies the value, and both are very cheap
for small values. Larger values can be passed indirectly to reduce
overhead (e.g. `black_box(&huge_struct)`).
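A sketch of the indirect form, assuming a current stable toolchain. Here `std::hint::black_box` stands in for `test::black_box`. It is a different function, which the standard library documents as a best-effort hint to the optimizer. `Huge` and `checksum` are hypothetical names standing in for `huge_struct` and the benchmarked work.

~~~rust
use std::hint::black_box;

/// Hypothetical large value standing in for `huge_struct` (32 KiB of data).
struct Huge {
    data: [u64; 4096],
}

/// The work being benchmarked: XOR of every element.
fn checksum(h: &Huge) -> u64 {
    h.data.iter().fold(0, |acc, &x| acc ^ x)
}

/// One benchmark iteration. Only the reference, not the 32 KiB struct, passes
/// through `black_box`, which discourages the optimizer from assuming anything
/// about the data behind it; the result is black-boxed so it counts as used.
fn one_iteration(h: &Huge) -> u64 {
    let opaque: &Huge = black_box(h);
    black_box(checksum(opaque))
}
~~~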

Performing either of the above changes gives the following
benchmarking results:

~~~console
running 1 test
test bench_xor_1000_ints ... bench:       375 ns/iter (+/- 148)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
~~~

However, the optimizer can still modify a testcase in an undesirable
manner even when using either of the above. Benchmarks can be checked
by hand by looking at the output of the compiler using the `--emit=ir`
(for LLVM IR) or `--emit=asm` (for assembly) flags, or by compiling
normally and using any method for examining object code.

## Saving and ratcheting metrics

When running benchmarks or other tests, the test runner can record
per-test "metrics". Each metric is a scalar `f64` value, plus a noise
value which represents uncertainty in the measurement. By default, all
`#[bench]` benchmarks are recorded as metrics, which can be saved as
JSON in an external file for further reporting.

In addition, the test runner supports _ratcheting_ against a metrics
file. Ratcheting is like saving metrics, except that after each run,
if the output file already exists the results of the current run are
compared against the contents of the existing file, and any regression
_causes the test suite to fail_. If the comparison passes (that is, if all
metrics stayed the same within noise, or improved), then the metrics
file is overwritten with the new values. In this way, a metrics file
in your workspace can be used to ensure your work does not regress
performance.

Test runners take three options that are relevant to metrics:

  - `--save-metrics=<file.json>` will save the metrics from a test run
    to `file.json`
  - `--ratchet-metrics=<file.json>` will ratchet the metrics against
    `file.json`
  - `--ratchet-noise-percent=N` will override the noise measurements
    in `file.json`, and consider a metric change less than `N%` to be
    noise. This can be helpful if you are testing in a noisy
    environment where the benchmark calibration loop cannot acquire a
    clear enough signal.
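The ratcheting rule can be sketched as follows. This illustration rests on several assumptions and is not the test runner's implementation:

- Lower values are better, as with ns/iter.
- The noise used is the one recorded in the saved file, unless a percentage override is given (as with `--ratchet-noise-percent`).
- Metrics missing from the current run are not compared.
- The JSON file is modelled as an in-memory map.

`Metric`, `Metrics`, `regressions` and `ratchet` are hypothetical names.

~~~rust
use std::collections::BTreeMap;

/// Hypothetical metric record: a scalar value plus its noise.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Metric {
    value: f64,
    noise: f64,
}

/// Stand-in for the contents of a metrics file, keyed by test name.
type Metrics = BTreeMap<String, Metric>;

/// Names of metrics that got worse by more than the allowed noise,
/// assuming lower values are better (as with ns/iter).
fn regressions(saved: &Metrics, current: &Metrics, noise_percent: Option<f64>) -> Vec<String> {
    let mut worse = Vec::new();
    for (name, old) in saved {
        // Assumption: metrics absent from the current run are not compared.
        if let Some(new) = current.get(name) {
            let noise = match noise_percent {
                // Override: a change smaller than N% of the saved value is noise.
                Some(pct) => old.value * pct / 100.0,
                None => old.noise,
            };
            if new.value > old.value + noise {
                worse.push(name.clone());
            }
        }
    }
    worse
}

/// One ratchet step: on success the saved metrics are replaced by the
/// current ones; on regression they are left untouched and the names returned.
fn ratchet(saved: &mut Metrics, current: Metrics, noise_percent: Option<f64>) -> Result<(), Vec<String>> {
    let worse = regressions(saved, &current, noise_percent);
    if worse.is_empty() {
        *saved = current;
        Ok(())
    } else {
        Err(worse)
    }
}
~~~

Keeping the comparison (`regressions`) separate from the overwrite (`ratchet`) mirrors the two outcomes described above: a failing run leaves the saved file untouched, and a passing one replaces it.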