src/doc/guide-testing.md

   1 % The Rust Testing Guide
   2
   3 # Quick start
   4
   5 To create test functions, add a `#[test]` attribute like this:
   6
   7 ~~~test_harness
   8 fn return_two() -> int {
   9     2
  10 }
  11
  12 #[test]
  13 fn return_two_test() {
  14     let x = return_two();
  15     assert!(x == 2);
  16 }
  17 ~~~
  18
  19 To run these tests, compile with `rustc --test` and run the resulting
  20 binary:
  21
  22 ~~~console
  23 $ rustc --test foo.rs
  24 $ ./foo
  25 running 1 test
  26 test return_two_test ... ok
  27
  28 test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
  29 ~~~
  30
  31 `rustc foo.rs` will *not* compile the tests, since `#[test]` implies
  32 `#[cfg(test)]`. The `--test` flag to `rustc` implies `--cfg test`.
  33
  34
  35 # Unit testing in Rust
  36
  37 Rust has built-in support for simple unit testing. Functions can be
  38 marked as unit tests using the `test` attribute.
  39
  40 ~~~test_harness
  41 #[test]
  42 fn return_none_if_empty() {
  43     // ... test code ...
  44 }
  45 ~~~
  46
  47 A test function's signature must have no arguments and no return
  48 value. To run the tests in a crate, it must be compiled with the
  49 `--test` flag: `rustc myprogram.rs --test -o myprogram-tests`. Running
  50 the resulting executable will run all the tests in the crate. A test
  51 is considered successful if its function returns; if the task running
  52 the test fails, whether through a call to `fail!`, a failed `assert!`,
  53 or some other means (`assert_eq!`, ...), then the test fails.
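
As a rough illustration of the pass/fail rule above, the sketch below tests a hypothetical helper, `first_or_none`, which is not part of this guide. It assumes a current stable toolchain, so it writes `i32` where the older examples here write `int`.

```rust
// Hypothetical helper under test; not part of the guide's own examples.
fn first_or_none(items: &[i32]) -> Option<i32> {
    items.first().copied()
}

// Passes: the function returns without any assertion failing.
#[test]
fn return_none_if_empty() {
    assert_eq!(first_or_none(&[]), None);
}

// Also passes; changing the expected value to `Some(8)` would make
// `assert_eq!` fail, and the runner would report the test as failed.
#[test]
fn return_first_if_non_empty() {
    assert_eq!(first_or_none(&[7, 8]), Some(7));
}
```

Neither test takes arguments or returns a value, which matches the signature rule stated above.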
  54
  55 When compiling a crate with the `--test` flag, `--cfg test` is also
  56 implied, so that tests can be conditionally compiled.
  57
  58 ~~~test_harness
  59 #[cfg(test)]
  60 mod tests {
  61     #[test]
  62     fn return_none_if_empty() {
  63       // ... test code ...
  64     }
  65 }
  66 ~~~
  67
  68 Additionally, `#[test]` items behave as if they also have the
  69 `#[cfg(test)]` attribute, and will not be compiled when the `--test` flag
  70 is not used.
  71
  72 Tests that should not be run can be annotated with the `ignore`
  73 attribute. The existence of these tests will be noted in the test
  74 runner output, but the test will not be run. Tests can also be ignored
  75 by configuration; for example, to ignore a test on Windows you can
  76 write `#[ignore(cfg(target_os = "win32"))]`.
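
A minimal sketch of both kinds of ignored test follows. It assumes a current stable toolchain, where conditional ignoring is spelled with `cfg_attr` rather than the `#[ignore(cfg(...))]` form shown above; `path_separator` is a hypothetical function invented for the example.

```rust
// Hypothetical function under test, used only for illustration.
fn path_separator() -> char {
    if cfg!(windows) { '\\' } else { '/' }
}

// Skipped by default on every platform; the runner still lists it
// as ignored in its output.
#[test]
#[ignore]
fn expensive_check() {
    assert!(path_separator().is_ascii_punctuation());
}

// Ignored only when compiling for Windows (assumed modern spelling).
#[test]
#[cfg_attr(target_os = "windows", ignore)]
fn uses_forward_slash() {
    assert_eq!(path_separator(), '/');
}
```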
  77
  78 Tests that are intended to fail can be annotated with the
  79 `should_fail` attribute. The test will be run, and if it causes its
  80 task to fail then the test will be counted as successful; otherwise it
  81 will be counted as a failure. For example:
  82
  83 ~~~test_harness
  84 #[test]
  85 #[should_fail]
  86 fn test_out_of_bounds_failure() {
  87     let v: &[int] = [];
  88     v[0];
  89 }
  90 ~~~
  91
  92 A test runner built with the `--test` flag supports a limited set of
  93 arguments to control which tests are run:
  94
  95 - the first free argument passed to a test runner is interpreted as a
  96   regular expression
  97   ([syntax reference](regex/index.html#syntax))
  98   and is used to narrow down the set of tests being run. Note: a plain
  99   string is a valid regular expression that matches itself.
 100 - the `--ignored` flag tells the test runner to run only tests with the
 101   `ignore` attribute.
 102
 103 ## Parallelism
 104
 105 By default, tests are run in parallel, which can make interpreting
 106 failure output difficult. In these cases you can set the
 107 `RUST_TEST_TASKS` environment variable to 1 to make the tests run
 108 sequentially.
 109
 110 ## Examples
 111
 112 ### Typical test run
 113
 114 ~~~console
 115 $ mytests
 116
 117 running 30 tests
 118 running driver::tests::mytest1 ... ok
 119 running driver::tests::mytest2 ... ignored
 120 ... snip ...
 121 running driver::tests::mytest30 ... ok
 122
 123 result: ok. 28 passed; 0 failed; 2 ignored
 124 ~~~
 125
 126 ### Test run with failures
 127
 128 ~~~console
 129 $ mytests
 130
 131 running 30 tests
 132 running driver::tests::mytest1 ... ok
 133 running driver::tests::mytest2 ... ignored
 134 ... snip ...
 135 running driver::tests::mytest30 ... FAILED
 136
 137 result: FAILED. 27 passed; 1 failed; 2 ignored
 138 ~~~
 139
 140 ### Running ignored tests
 141
 142 ~~~console
 143 $ mytests --ignored
 144
 145 running 2 tests
 146 running driver::tests::mytest2 ... failed
 147 running driver::tests::mytest10 ... ok
 148
 149 result: FAILED. 1 passed; 1 failed; 0 ignored
 150 ~~~
 151
 152 ### Running a subset of tests
 153
 154 Using a plain string:
 155
 156 ~~~console
 157 $ mytests mytest23
 158
 159 running 1 test
 160 running driver::tests::mytest23 ... ok
 161
 162 result: ok. 1 passed; 0 failed; 0 ignored
 163 ~~~
 164
 165 Using some regular expression features:
 166
 167 ~~~console
 168 $ mytests 'mytest[145]'
 169
 170 running 13 tests
 171 running driver::tests::mytest1 ... ok
 172 running driver::tests::mytest4 ... ok
 173 running driver::tests::mytest5 ... ok
 174 running driver::tests::mytest10 ... ignored
 175 ... snip ...
 176 running driver::tests::mytest19 ... ok
 177
 178 result: ok. 12 passed; 0 failed; 1 ignored
 179 ~~~
 180
 181 # Microbenchmarking
 182
 183 The test runner also understands a simple form of benchmark execution.
 184 Benchmark functions are marked with the `#[bench]` attribute, rather
 185 than `#[test]`, and have a different form and meaning. They are
 186 compiled along with `#[test]` functions when a crate is compiled with
 187 `--test`, but they are not run by default. To run the benchmark
 188 component of your testsuite, pass `--bench` to the compiled test
 189 runner.
 190
 191 The type signature of a benchmark function differs from a unit test:
 192 it takes a mutable reference to type
 193 `test::Bencher`. Inside the benchmark function, any
 194 time-variable or "setup" code should execute first, followed by a call
 195 to `iter` on the benchmark harness, passing a closure that contains
 196 only the portion of the benchmark whose per-iteration speed you
 197 wish to measure.
 198
 199 For benchmarks relating to processing/generating data, one can set the
 200 `bytes` field to the number of bytes consumed/produced in each
 201 iteration; this will be used to show the throughput of the benchmark.
 202 This must be the amount used in each iteration, *not* the total
 203 amount.
 204
 205 For example:
 206
 207 ~~~test_harness
 208 extern crate test;
 209
 210 use test::Bencher;
 211
 212 #[bench]
 213 fn bench_sum_1024_ints(b: &mut Bencher) {
 214     let v = Vec::from_fn(1024, |n| n);
 215     b.iter(|| v.iter().fold(0, |old, new| old + *new));
 216 }
 217
 218 #[bench]
 219 fn initialise_a_vector(b: &mut Bencher) {
 220     b.iter(|| Vec::from_elem(1024, 0u64));
 221     b.bytes = 1024 * 8;
 222 }
 223 ~~~
 224
 225 The benchmark runner will calibrate measurement of the benchmark
 226 function to run the `iter` block "enough" times to get a reliable
 227 measure of the per-iteration speed.
 228
 229 Advice on writing benchmarks:
 230
 231   - Move setup code outside the `iter` loop; only put the part you
 232     want to measure inside
 233   - Make the code do "the same thing" on each iteration; do not
 234     accumulate or change state
 235   - Make the outer function idempotent too; the benchmark runner is
 236     likely to run it many times
 237   - Make the inner `iter` loop short and fast so benchmark runs are
 238     fast and the calibrator can adjust the run-length at fine
 239     resolution
 240   - Make the code in the `iter` loop do something simple, to assist in
 241     pinpointing performance improvements (or regressions)
 242
 243 To run benchmarks, pass the `--bench` flag to the compiled
 244 test-runner. Benchmarks are compiled-in but not executed by default.
 245
 246 ~~~console
 247 $ rustc mytests.rs -O --test
 248 $ mytests --bench
 249
 250 running 2 tests
 251 test bench_sum_1024_ints ... bench: 709 ns/iter (+/- 82)
 252 test initialise_a_vector ... bench: 424 ns/iter (+/- 99) = 19320 MB/s
 253
 254 test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured
 255 ~~~
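
The throughput figure in the output above can be checked by hand. The sketch below assumes throughput is simply bytes per iteration divided by time per iteration, with 1 MB taken as 1,000,000 bytes and the result truncated; the function name and the formula are assumptions for illustration, not taken from the test runner's source.

```rust
/// Throughput in MB/s for a benchmark that handles `bytes_per_iter`
/// bytes in `ns_per_iter` nanoseconds. Assumes 1 MB = 1_000_000 bytes
/// and truncating integer division. `ns_per_iter` must be non-zero.
fn throughput_mb_per_s(bytes_per_iter: u64, ns_per_iter: u64) -> u64 {
    // (bytes / ns) * (1e9 ns per s) / (1e6 bytes per MB)
    // simplifies to bytes * 1000 / ns.
    bytes_per_iter * 1000 / ns_per_iter
}
```

With `b.bytes = 1024 * 8` and 424 ns per iteration, this gives 8192 * 1000 / 424 = 19320, which matches the `19320 MB/s` reported for `initialise_a_vector`.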
 256
 257 ## Benchmarks and the optimizer
 258
 259 Benchmarks compiled with optimizations activated can be dramatically
 260 changed by the optimizer so that the benchmark is no longer
 261 benchmarking what one expects. For example, the compiler might
 262 recognize that some calculation has no external effects and remove
 263 it entirely.
 264
 265 ~~~test_harness
 266 extern crate test;
 267 use test::Bencher;
 268
 269 #[bench]
 270 fn bench_xor_1000_ints(b: &mut Bencher) {
 271     b.iter(|| {
 272         range(0u, 1000).fold(0, |old, new| old ^ new);
 273     });
 274 }
 275 ~~~
 276
 277 This benchmark gives the following results:
 278
 279 ~~~console
 280 running 1 test
 281 test bench_xor_1000_ints ... bench:         0 ns/iter (+/- 0)
 282
 283 test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
 284 ~~~
 285
 286 The benchmarking runner offers two ways to avoid this. First, the
 287 closure that the `iter` method receives can return an arbitrary value,
 288 which forces the optimizer to consider the result used and ensures it
 289 cannot remove the computation entirely. This could be done for the
 290 example above by adjusting the `b.iter` call to:
 291
 292 ~~~
 293 # struct X; impl X { fn iter<T>(&self, _: || -> T) {} } let b = X;
 294 b.iter(|| {
 295     // note lack of `;` (could also use an explicit `return`).
 296     range(0u, 1000).fold(0, |old, new| old ^ new)
 297 });
 298 ~~~
 299
 300 Alternatively, you can call the generic `test::black_box` function,
 301 which is an opaque "black box" to the optimizer and so forces it to
 302 consider any argument as used.
 303
 304 ~~~
 305 extern crate test;
 306
 307 # fn main() {
 308 # struct X; impl X { fn iter<T>(&self, _: || -> T) {} } let b = X;
 309 b.iter(|| {
 310     test::black_box(range(0u, 1000).fold(0, |old, new| old ^ new));
 311 });
 312 # }
 313 ~~~
 314
 315 Neither of these reads or modifies the value, and both are very cheap
 316 for small values. Larger values can be passed indirectly to reduce
 317 overhead (e.g. `black_box(&huge_struct)`).
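
To make the idea concrete outside the benchmark harness, here is a hand-rolled timing loop. It is a sketch under stated assumptions, not the `test::Bencher` API: `time_iterations` is a hypothetical stand-in for `iter` with a fixed iteration count and no calibration, and it assumes a current stable toolchain, where `std::hint::black_box` plays a role similar to `test::black_box` on a best-effort basis.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for `Bencher::iter`: runs `f` a fixed number
// of times and returns the average time per call.
// `iterations` must be non-zero.
fn time_iterations<T, F: FnMut() -> T>(iterations: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // Passing the closure's result through `black_box` asks the
        // optimizer to treat it as used, so the work is less likely
        // to be removed.
        black_box(f());
    }
    start.elapsed() / iterations
}

// The computation from the example above, returning its result
// instead of discarding it.
fn xor_1000() -> usize {
    (0..1000usize).fold(0, |old, new| old ^ new)
}
```

Calling `time_iterations(1000, xor_1000)` returns the average duration of one call. Any setup would go before the call, so that only the closure body is timed.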
 318
 319 Performing either of the above changes gives the following
 320 benchmarking results:
 321
 322 ~~~console
 323 running 1 test
 324 test bench_xor_1000_ints ... bench:       375 ns/iter (+/- 148)
 325
 326 test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
 327 ~~~
 328
 329 However, the optimizer can still modify a testcase in an undesirable
 330 manner even when using either of the above. Benchmarks can be checked
 331 by hand by inspecting the compiler's output with `--emit=ir` (for LLVM
 332 IR) or `--emit=asm` (for assembly), or by compiling normally and
 333 examining the object code with any suitable tool.
 334
 335 ## Saving and ratcheting metrics
 336
 337 When running benchmarks or other tests, the test runner can record
 338 per-test "metrics". Each metric is a scalar `f64` value, plus a noise
 339 value which represents uncertainty in the measurement. By default, all
 340 `#[bench]` benchmarks are recorded as metrics, which can be saved as
 341 JSON in an external file for further reporting.
 342
 343 In addition, the test runner supports _ratcheting_ against a metrics
 344 file. Ratcheting is like saving metrics, except that after each run,
 345 if the output file already exists the results of the current run are
 346 compared against the contents of the existing file, and any regression
 347 _causes the testsuite to fail_. If the comparison passes -- if all
 348 metrics stayed the same (within noise) or improved -- then the metrics
 349 file is overwritten with the new values. In this way, a metrics file
 350 in your workspace can be used to ensure your work does not regress
 351 performance.
 352
 353 Test runners take three options that are relevant to metrics:
 354
 355   - `--save-metrics=<file.json>` will save the metrics from a test run
 356     to `file.json`
 357   - `--ratchet-metrics=<file.json>` will ratchet the metrics against
 358     the `file.json`
 359   - `--ratchet-noise-percent=N` will override the noise measurements
 360     in `file.json`, and consider a metric change less than `N%` to be
 361     noise. This can be helpful if you are testing in a noisy
 362     environment where the benchmark calibration loop cannot acquire a
 363     clear enough signal.
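
The ratcheting comparison described above can be sketched roughly as follows. Everything here is hypothetical and written for illustration: the `Metric` and `Change` types, the `compare` function, the choice of the larger of the two noise values as the tolerance, and the assumption that lower values are better (as with ns/iter) are not taken from the test runner.

```rust
// A scalar measurement plus its noise, as described above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Metric {
    value: f64,
    noise: f64,
}

#[derive(Debug, PartialEq)]
enum Change {
    Noise,
    Improvement,
    Regression,
}

// Compare a new measurement against a saved one. `noise_percent`
// models `--ratchet-noise-percent=N`: when present, it replaces the
// recorded noise values. Assumes lower values are better.
fn compare(old: Metric, new: Metric, noise_percent: Option<f64>) -> Change {
    let tolerance = match noise_percent {
        Some(pct) => old.value.abs() * pct / 100.0,
        None => old.noise.abs().max(new.noise.abs()),
    };
    let delta = new.value - old.value;
    if delta.abs() <= tolerance {
        Change::Noise
    } else if delta < 0.0 {
        Change::Improvement
    } else {
        Change::Regression
    }
}
```

Under these assumptions, a saved metric of 709 (+/- 82) and a new value of 750 count as noise, whereas the same pair with a 5% noise override counts as a regression, because the allowed change shrinks to about 35.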