Auto merge of #1699 - m-ou-se:panic-format, r=RalfJung

diff --git a/README.md b/README.md

index f755f85b601eddc16003c20cc10cd5dbd682303d..ae70c80d5f0a1e3b5d72757c6447043760b61c04 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,9 @@
-# Miri [![Build Status](https://travis-ci.com/rust-lang/miri.svg?branch=master)](https://travis-ci.com/rust-lang/miri) [![Windows build status](https://ci.appveyor.com/api/projects/status/github/rust-lang/miri?svg=true)](https://ci.appveyor.com/project/rust-lang-libs/miri)
+# Miri
  
+[![Actions build status][actions-badge]][actions-url]
+
+[actions-badge]: https://github.com/rust-lang/miri/workflows/CI/badge.svg?branch=master
+[actions-url]: https://github.com/rust-lang/miri/actions
  
  An experimental interpreter for [Rust][rust]'s
  [mid-level intermediate representation][mir] (MIR).  It can run binaries and
@@ -16,13 +20,24 @@ for example:
    or an invalid enum discriminant)
  * **Experimental**: Violations of the [Stacked Borrows] rules governing aliasing
    for reference types
+* **Experimental**: Data races (but no weak memory effects)
+
+On top of that, Miri will also tell you about memory leaks: when there is memory
+still allocated at the end of the execution, and that memory is not reachable
+from a global `static`, Miri will raise an error.
+
+You can use Miri to emulate programs on other targets, e.g. to ensure that
+byte-level data manipulation works correctly both on little-endian and
+big-endian systems. See
+[cross-interpretation](#cross-interpretation-running-for-different-targets)
+below.
  
  Miri has already discovered some [real-world bugs](#bugs-found-by-miri).  If you
  found a bug with Miri, we'd appreciate it if you tell us and we'll add it to the
  list!
  
-Be aware that Miri will **not catch all cases of undefined behavior** in your
-program, and cannot run all programs:
+However, be aware that Miri will **not catch all cases of undefined behavior**
+in your program, and cannot run all programs:
  
  * There are still plenty of open questions around the basic invariants for some
    types and when these invariants even have to hold. Miri tries to avoid false
@@ -35,14 +50,15 @@ program, and cannot run all programs:
    still run fine in Miri -- but might break (including causing UB) on different
    compiler versions or different platforms.
  * Program execution is non-deterministic when it depends, for example, on where
-  exactly in memory allocations end up. Miri tests one of many possible
-  executions of your program. If your code is sensitive to allocation base
-  addresses or other non-deterministic data, try running Miri with different
-  values for `-Zmiri-seed` to test different executions.
+  exactly in memory allocations end up, or on the exact interleaving of
+  concurrent threads. Miri tests one of many possible executions of your
+  program. You can alleviate this to some extent by running Miri with different
+  values for `-Zmiri-seed`, but even that explores only a small fraction of all
+  possible executions.
  * Miri runs the program as a platform-independent interpreter, so the program
    has no access to most platform-specific APIs or FFI. A few APIs have been
    implemented (such as printing to stdout) but most have not: for example, Miri
-  currently does not support concurrency, or SIMD, or networking.
+  currently does not support SIMD or networking.
  
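A minimal sketch of the kind of address-dependent behavior described above. It is illustrative only and not part of this commit; the helper name `lower_address` is made up. Which reference it returns depends on where the two boxes were placed in memory, so a single Miri run observes only one of the possible outcomes, and a different `-Zmiri-seed` may observe the other.

```rust
/// Hypothetical helper: returns whichever reference has the lower address.
fn lower_address<'a>(a: &'a u32, b: &'a u32) -> &'a u32 {
    let (pa, pb) = (a as *const u32 as usize, b as *const u32 as usize);
    if pa <= pb { a } else { b }
}

fn main() {
    let x = Box::new(1u32);
    let y = Box::new(2u32);
    let picked = *lower_address(&x, &y);
    // Either value is a legitimate result: nothing fixes the relative
    // order of the two heap allocations.
    assert!(picked == 1 || picked == 2);
    println!("picked {}", picked);
}
```

Code whose correctness depends on one particular outcome here is the kind of code that a single execution cannot fully vet.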
  [rust]: https://www.rust-lang.org/
  [mir]: https://github.com/rust-lang/rfcs/blob/master/text/1211-mir.md
@@ -76,18 +92,13 @@ Now you can run your project in Miri:
  The first time you run Miri, it will perform some extra setup and install some
  dependencies.  It will ask you for confirmation before installing anything.
  
-Miri supports cross-execution: if you want to run the program as if it was a
-Linux program, you can do `cargo miri run --target x86_64-unknown-linux-gnu`.
-This is particularly useful if you are using Windows, as the Linux target is
-much better supported than Windows targets.
+`cargo miri run/test` supports the exact same flags as `cargo run/test`.  You
+can pass arguments to Miri via `MIRIFLAGS`. For example,
+`MIRIFLAGS="-Zmiri-disable-stacked-borrows" cargo miri run` runs the program
+without checking the aliasing of references.
  
-You can pass arguments to Miri after the first `--`, and pass arguments to the
-interpreted program or test suite after the second `--`.  For example, `cargo
-miri run -- -Zmiri-disable-validation` runs the program without validation of
-basic type invariants and without checking the aliasing of references.
-
-When compiling code via `cargo miri`, the `miri` config flag is set.  You can
-use this to ignore test cases that will fail under Miri because they do things
+When compiling code via `cargo miri`, the `cfg(miri)` config flag is set.  You
+can use this to ignore test cases that fail under Miri because they do things
  Miri does not support:
  
  ```rust
@@ -100,17 +111,29 @@ fn does_not_work_on_miri() {
  }
  ```
  
-An exhaustive list of what `miri` does not support is not available, as this could be
-an unbounded set with FFI and more. However `miri` will explicitly tell you when it finds
-something unsupported with an error, containing a message such as:
+The set of things Miri cannot do is unbounded, so there is no exhaustive list,
+but the interpreter will explicitly tell you when it finds something unsupported:
  
  ```
-error: unsupported operation: can't call foreign function: mach_timebase_info
+error: unsupported operation: can't call foreign function: bind
      ...
      = help: this is likely not a bug in the program; it indicates that the program \
              performed an operation that the interpreter does not support
  ```
  
+### Cross-interpretation: running for different targets
+
+Miri can not only run a binary or test suite for your host target, it can also
+perform cross-interpretation for arbitrary foreign targets: `cargo miri run
+--target x86_64-unknown-linux-gnu` will run your program as if it was a Linux
+program, no matter your host OS.  This is particularly useful if you are using
+Windows, as the Linux target is much better supported than Windows targets.
+
+You can also use this to test platforms with different properties than your host
+platform.  For example `cargo miri test --target mips64-unknown-linux-gnuabi64`
+will run your test suite on a big-endian target, which is useful for testing
+endian-sensitive code.
+
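As an illustration of endian-sensitive code that a big-endian target can expose, consider the sketch below. It is not part of this commit, and both function names are hypothetical. The first parser decodes a little-endian wire format explicitly; the second assumes the host byte order matches the wire format, which only holds on little-endian targets.

```rust
/// Hypothetical parser: decodes a 4-byte little-endian length prefix.
fn len_prefix_portable(bytes: &[u8]) -> Option<u32> {
    match bytes {
        &[a, b, c, d, ..] => Some(u32::from_le_bytes([a, b, c, d])),
        _ => None,
    }
}

/// Same parser, but it wrongly assumes host byte order equals wire order.
fn len_prefix_host_order(bytes: &[u8]) -> Option<u32> {
    match bytes {
        &[a, b, c, d, ..] => Some(u32::from_ne_bytes([a, b, c, d])),
        _ => None,
    }
}

fn main() {
    let wire = [0x2A, 0x00, 0x00, 0x00, 0xFF];
    assert_eq!(len_prefix_portable(&wire), Some(42));
    // The host-order version agrees only on little-endian targets.
    let agrees = len_prefix_host_order(&wire) == Some(42);
    assert_eq!(agrees, cfg!(target_endian = "little"));
}
```

A test that asserts `len_prefix_host_order(&wire) == Some(42)` would pass on a typical x86-64 host and should fail when run with a big-endian `--target`, which is the kind of discrepancy cross-interpretation is meant to surface.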
  ### Running Miri on CI
  
  To run Miri on CI, make sure that you handle the case where the latest nightly
@@ -123,21 +146,29 @@ MIRI_NIGHTLY=nightly-$(curl -s https://rust-lang.github.io/rustup-components-his
  echo "Installing latest nightly with Miri: $MIRI_NIGHTLY"
  rustup set profile minimal
  rustup default "$MIRI_NIGHTLY"
-
  rustup component add miri
-cargo miri setup
  
  cargo miri test
  ```
  
-We use `cargo miri setup` to avoid getting interactive questions about the extra
-setup needed for Miri.
-
  ### Common Problems
  
  When using the above instructions, you may encounter a number of confusing compiler
  errors.
  
+#### "note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace"
+
+You may see this when trying to get Miri to display a backtrace. By default, Miri
+doesn't expose any environment to the program, so running
+`RUST_BACKTRACE=1 cargo miri test` will not do what you expect.
+
+To get a backtrace, you need to disable isolation
+[using `-Zmiri-disable-isolation`](#miri-flags):
+
+```sh
+RUST_BACKTRACE=1 MIRIFLAGS="-Zmiri-disable-isolation" cargo miri test
+```
+
  #### "found possibly newer version of crate `std` which `<dependency>` depends on"
  
  Your build directory may contain artifacts from an earlier build that have/have
@@ -154,36 +185,79 @@ Try deleting `~/.cache/miri`.
  
  This means the sysroot you are using was not compiled with Miri in mind.  This
  should never happen when you use `cargo miri` because that takes care of setting
-up the sysroot.  If you are using `miri` (the Miri driver) directly, see
-[below][testing-miri] for how to set up the sysroot.
+up the sysroot.  If you are using `miri` (the Miri driver) directly, see the
+[contributors' guide](CONTRIBUTING.md) for how to set up the sysroot with `./miri`.
  
  
  ## Miri `-Z` flags and environment variables
  [miri-flags]: #miri--z-flags-and-environment-variables
  
-Several `-Z` flags are relevant for Miri:
-
-* `-Zmiri-seed=<hex>` is a custom `-Z` flag added by Miri.  It configures the
-  seed of the RNG that Miri uses to resolve non-determinism.  This RNG is used
-  to pick base addresses for allocations.  When isolation is enabled (the default),
-  this is also used to emulate system entropy.  The default seed is 0.
-  **NOTE**: This entropy is not good enough for cryptographic use!  Do not
-  generate secret keys in Miri or perform other kinds of cryptographic
-  operations that rely on proper random numbers.
-* `-Zmiri-disable-validation` disables enforcing validity invariants, which are
-  enforced by default.  This is mostly useful for debugging.  It means Miri will
-  miss bugs in your program.  However, this can also help to make Miri run
-  faster.
+Miri adds its own set of `-Z` flags, which are usually set via the `MIRIFLAGS`
+environment variable:
+
+* `-Zmiri-compare-exchange-weak-failure-rate=<rate>` changes the failure rate of
+  `compare_exchange_weak` operations. The default is `0.8` (so on average 4 out
+  of 5 weak ops will fail).
+  You can change it to any value between `0.0` and `1.0`, where `1.0` means it
+  will always fail and `0.0` means it will never fail.
+* `-Zmiri-disable-alignment-check` disables checking pointer alignment, so you
+  can focus on other failures, but it means Miri can miss bugs in your program.
+  Using this flag is **unsound**.
+* `-Zmiri-disable-data-race-detector` disables checking for data races.  Using
+  this flag is **unsound**.
  * `-Zmiri-disable-stacked-borrows` disables checking the experimental
    [Stacked Borrows] aliasing rules.  This can make Miri run faster, but it also
-  means no aliasing violations will be detected.
+  means no aliasing violations will be detected.  Using this flag is **unsound**
+  (but the affected soundness rules are experimental).
+* `-Zmiri-disable-validation` disables enforcing validity invariants, which are
+  enforced by default.  This is mostly useful to focus on other failures (such
+  as out-of-bounds accesses) first.  Setting this flag means Miri can miss bugs
+  in your program.  However, this can also help to make Miri run faster.  Using
+  this flag is **unsound**.
  * `-Zmiri-disable-isolation` disables host isolation.  As a consequence,
    the program has access to host resources such as environment variables, file
    systems, and randomness.
-* `-Zmiri-ignore-leaks` disables the memory leak checker.
  * `-Zmiri-env-exclude=<var>` keeps the `var` environment variable isolated from
-  the host. Can be used multiple times to exclude several variables. The `TERM`
-  environment variable is excluded by default.
+  the host so that it cannot be accessed by the program.  Can be used multiple
+  times to exclude several variables.  On Windows, the `TERM` environment
+  variable is excluded by default.
+* `-Zmiri-ignore-leaks` disables the memory leak checker.
+* `-Zmiri-seed=<hex>` configures the seed of the RNG that Miri uses to resolve
+  non-determinism.  This RNG is used to pick base addresses for allocations.
+  When isolation is enabled (the default), this is also used to emulate system
+  entropy.  The default seed is 0.  **NOTE**: This entropy is not good enough
+  for cryptographic use!  Do not generate secret keys in Miri or perform other
+  kinds of cryptographic operations that rely on proper random numbers.
+* `-Zmiri-symbolic-alignment-check` makes the alignment check more strict.  By
+  default, alignment is checked by casting the pointer to an integer, and making
+  sure that is a multiple of the alignment.  This can lead to cases where a
+  program passes the alignment check by pure chance, because things "happened to
+  be" sufficiently aligned -- there is no UB in this execution but there would
+  be UB in others.  To avoid such cases, the symbolic alignment check only takes
+  into account the requested alignment of the relevant allocation, and the
+  offset into that allocation.  This avoids missing such bugs, but it also
+  incurs some false positives when the code does manual integer arithmetic to
+  ensure alignment.  (The standard library `align_to` method works fine in both
+  modes; under symbolic alignment it only fills the middle slice when the
+  allocation guarantees sufficient alignment.)
+* `-Zmiri-track-alloc-id=<id>` shows a backtrace when the given allocation is
+  being allocated or freed.  This helps in debugging memory leaks and
+  use after free bugs.
+* `-Zmiri-track-call-id=<id>` shows a backtrace when the given call id is
+  assigned to a stack frame.  This helps in debugging UB related to Stacked
+  Borrows "protectors".
+* `-Zmiri-track-pointer-tag=<tag>` shows a backtrace when the given pointer tag
+  is popped from a borrow stack (which is where the tag becomes invalid and any
+  future use of it will error).  This helps you in finding out why UB is
+  happening and where in your code would be a good place to look for it.
+* `-Zmiri-track-raw-pointers` makes Stacked Borrows track a pointer tag even for
+  raw pointers. This can make valid code fail to pass the checks, but also can
+  help identify latent aliasing issues in code that Miri accepts by default. You
+  can recognize false positives by "<untagged>" occurring in the message -- this
+  indicates a pointer that was cast from an integer, so Miri was unable to track
+  this pointer.
+
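To make the alignment flags more concrete, here is a minimal sketch (not part of this commit; `read_u32_at` is a hypothetical helper). A `[u8]` buffer only guarantees alignment 1, so reading a `u32` out of it with a plain aligned read can pass the default integer-based check whenever the address happens to be a multiple of 4. Using `read_unaligned` avoids relying on that luck and should be accepted in either mode.

```rust
/// Hypothetical helper: reads a native-endian u32 at `offset`
/// without assuming anything about the buffer's alignment.
fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let chunk = bytes.get(offset..end)?;
    // SAFETY: `chunk` is exactly 4 initialized, readable bytes, and
    // `read_unaligned` places no alignment requirement on the pointer.
    Some(unsafe { (chunk.as_ptr() as *const u32).read_unaligned() })
}

fn main() {
    let buf = [0u8, 1, 2, 3, 4, 5, 6, 7];
    // Neither offset 0 nor offset 1 is guaranteed to be 4-aligned.
    assert_eq!(read_u32_at(&buf, 1), Some(u32::from_ne_bytes([1, 2, 3, 4])));
    // Out-of-range reads are rejected instead of touching memory.
    assert_eq!(read_u32_at(&buf, 6), None);
}
```

Replacing `read_unaligned()` with `read()` would be the "passes by pure chance" case: whether it is caught by the default check depends on the buffer's actual address in that execution.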
+Some native rustc `-Z` flags are also very relevant for Miri:
+
  * `-Zmir-opt-level` controls how many MIR optimizations are performed.  Miri
    overrides the default to be `0`; be advised that using any higher level can
    make Miri miss bugs in your program because they got optimized away.
@@ -191,26 +265,92 @@ Several `-Z` flags are relevant for Miri:
    functions.  This is needed so that Miri can execute such functions, so Miri
    sets this flag per default.
  * `-Zmir-emit-retag` controls whether `Retag` statements are emitted. Miri
-  enables this per default because it is needed for validation.
-* `-Zmiri-track-pointer-tag=<tag>` shows a backtrace when the given pointer tag
-  is popped from a borrow stack (which is where the tag becomes invalid and any
-  future use of it will error).  This helps you in finding out why UB is
-  happening and where in your code would be a good place to look for it.
-* `-Zmiri-track-alloc-id=<id>` shows a backtrace when the given allocation is
-  being allocated.  This helps in debugging memory leaks.
+  enables this per default because it is needed for [Stacked Borrows].
  
  Moreover, Miri recognizes some environment variables:
  
  * `MIRI_LOG`, `MIRI_BACKTRACE` control logging and backtrace printing during
-  Miri executions, also [see above][testing-miri].
+  Miri executions, also [see "Testing the Miri driver" in `CONTRIBUTING.md`][testing-miri].
+* `MIRIFLAGS` (recognized by `cargo miri` and the test suite) defines extra
+  flags to be passed to Miri.
  * `MIRI_SYSROOT` (recognized by `cargo miri` and the test suite)
    indicates the sysroot to use.  To do the same thing with `miri`
    directly, use the `--sysroot` flag.
  * `MIRI_TEST_TARGET` (recognized by the test suite) indicates which target
    architecture to test against.  `miri` and `cargo miri` accept the `--target`
    flag for the same purpose.
-* `MIRI_TEST_FLAGS` (recognized by the test suite) defines extra flags to be
-  passed to Miri.
+
+The following environment variables are internal, but used to communicate between
+different Miri binaries, and as such worth documenting:
+
+* `MIRI_BE_RUSTC` when set to any value tells the Miri driver to actually not
+  interpret the code but compile it like rustc would. This is useful to be sure
+  that the compiled `rlib`s are compatible with Miri.
+  When set while running `cargo-miri`, it indicates that we are part of a sysroot
+  build (for which some crates need special treatment).
+* `MIRI_CWD` when set tells the Miri driver to change to the given
+  directory after loading all the source files, but before commencing
+  interpretation. This is useful if the interpreted program wants a different
+  working directory at run-time than at build-time.
+* `MIRI_VERBOSE` when set to any value tells the various `cargo-miri` phases to
+  perform verbose logging.
+  
+[testing-miri]: CONTRIBUTING.md#testing-the-miri-driver
+
+## Miri `extern` functions
+
+Miri provides some `extern` functions that programs can import to access
+Miri-specific functionality:
+
+```rust
+#[cfg(miri)]
+extern "Rust" {
+    /// Miri-provided extern function to mark the block `ptr` points to as a "root"
+    /// for some static memory. This memory and everything reachable by it is not
+    /// considered leaking even if it still exists when the program terminates.
+    ///
+    /// `ptr` has to point to the beginning of an allocated block.
+    fn miri_static_root(ptr: *const u8);
+
+    /// Miri-provided extern function to obtain a backtrace of the current call stack.
+    /// This returns a boxed slice of pointers - each pointer is an opaque value
+    /// that is only useful when passed to `miri_resolve_frame`.
+    /// The `flags` argument must be `0`.
+    fn miri_get_backtrace(flags: u64) -> Box<[*mut ()]>;
+
+    /// Miri-provided extern function to resolve a frame pointer obtained
+    /// from `miri_get_backtrace`. The `flags` argument must be `0`,
+    /// and `MiriFrame` should be declared as follows:
+    ///
+    /// ```rust
+    /// #[repr(C)]
+    /// struct MiriFrame {
+    ///     // The name of the function being executed, encoded in UTF-8
+    ///     name: Box<[u8]>,
+    ///     // The filename of the function being executed, encoded in UTF-8
+    ///     filename: Box<[u8]>,
+    ///     // The line number currently being executed in `filename`, starting from '1'.
+    ///     lineno: u32,
+    ///     // The column number currently being executed in `filename`, starting from '1'.
+    ///     colno: u32,
+    ///     // The function pointer to the function currently being executed.
+    ///     // This can be compared against function pointers obtained by
+    ///     // casting a function (e.g. `my_fn as *mut ()`)
+    ///     fn_ptr: *mut ()
+    /// }
+    /// ```
+    ///
+    /// The fields must be declared in exactly the same order as they appear in `MiriFrame` above.
+    /// This function can be called on any thread (not just the one which obtained `frame`).
+    fn miri_resolve_frame(frame: *mut (), flags: u64) -> MiriFrame;
+
+    /// Miri-provided extern function to begin unwinding with the given payload.
+    ///
+    /// This is internal and unstable and should not be used; we give it here
+    /// just to be complete.
+    fn miri_start_panic(payload: *mut u8) -> !;
+}
+```
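A possible use of `miri_static_root`, sketched under the assumption that the declaration above is accurate; this example is not part of the commit, and `make_registry` is a hypothetical helper. The call is gated on `cfg(miri)`, so the program builds and runs unchanged outside Miri.

```rust
#[cfg(miri)]
extern "Rust" {
    // Declaration copied from the README text in this diff.
    fn miri_static_root(ptr: *const u8);
}

/// Hypothetical helper: creates a value that intentionally lives until
/// program exit and is never freed.
fn make_registry() -> &'static Vec<u32> {
    let raw: *mut Vec<u32> = Box::into_raw(Box::new(vec![1, 2, 3]));
    // Only compiled under Miri: mark the start of the Box allocation as a
    // root, so it and the Vec buffer reachable from it are not reported
    // as leaked at the end of the execution.
    #[cfg(miri)]
    unsafe {
        miri_static_root(raw as *const u8);
    }
    // SAFETY: `raw` came from `Box::into_raw` and is never deallocated.
    unsafe { &*raw }
}

fn main() {
    let registry = make_registry();
    assert_eq!(registry.iter().sum::<u32>(), 6);
}
```

The pointer passed is the one returned by `Box::into_raw`, which points to the beginning of the allocated block as the doc comment requires.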
  
  ## Contributing and getting help
  
@@ -218,9 +358,9 @@ If you want to contribute to Miri, great!  Please check out our
  [contribution guide](CONTRIBUTING.md).
  
  For help with running Miri, you can open an issue here on
-GitHub or contact us (`oli-obk` and `RalfJ`) on the [Rust Zulip].
+GitHub or use the [Miri stream on the Rust Zulip][zulip].
  
-[Rust Zulip]: https://rust-lang.zulipchat.com
+[zulip]: https://rust-lang.zulipchat.com/#narrow/stream/269128-miri
  
  ## History
  
@@ -259,24 +399,43 @@ Definite bugs found:
  * [The Unix allocator calling `posix_memalign` in an invalid way](https://github.com/rust-lang/rust/issues/62251)
  * [`getrandom` calling the `getrandom` syscall in an invalid way](https://github.com/rust-random/getrandom/pull/73)
  * [`Vec`](https://github.com/rust-lang/rust/issues/69770) and [`BTreeMap`](https://github.com/rust-lang/rust/issues/69769) leaking memory under some (panicky) conditions
-* [Memory leak in `beef`](https://github.com/maciejhirsz/beef/issues/12)
+* [`beef` leaking memory](https://github.com/maciejhirsz/beef/issues/12)
+* [`EbrCell` using uninitialized memory incorrectly](https://github.com/Firstyear/concread/commit/b15be53b6ec076acb295a5c0483cdb4bf9be838f#diff-6282b2fc8e98bd089a1f0c86f648157cR229)
+* [TiKV performing an unaligned pointer access](https://github.com/tikv/tikv/issues/7613)
+* [`servo_arc` creating a dangling shared reference](https://github.com/servo/servo/issues/26357)
+* [TiKV constructing out-of-bounds pointers (and overlapping mutable references)](https://github.com/tikv/tikv/pull/7751)
+* [`encoding_rs` doing out-of-bounds pointer arithmetic](https://github.com/hsivonen/encoding_rs/pull/53)
+* [TiKV using `Vec::from_raw_parts` incorrectly](https://github.com/tikv/agatedb/pull/24)
  
  Violations of [Stacked Borrows] found that are likely bugs (but Stacked Borrows is currently just an experiment):
  
-* [`VecDeque` creating overlapping mutable references](https://github.com/rust-lang/rust/pull/56161)
-* [`BTreeMap` creating mutable references that overlap with shared references](https://github.com/rust-lang/rust/pull/58431)
-* [`LinkedList` creating overlapping mutable references](https://github.com/rust-lang/rust/pull/60072)
+* [`VecDeque::drain` creating overlapping mutable references](https://github.com/rust-lang/rust/pull/56161)
+* Various `BTreeMap` problems
+    * [`BTreeMap` iterators creating mutable references that overlap with shared references](https://github.com/rust-lang/rust/pull/58431)
+    * [`BTreeMap::iter_mut` creating overlapping mutable references](https://github.com/rust-lang/rust/issues/73915)
+    * [`BTreeMap` node insertion using raw pointers outside their valid memory area](https://github.com/rust-lang/rust/issues/78477)
+* [`LinkedList` cursor insertion creating overlapping mutable references](https://github.com/rust-lang/rust/pull/60072)
  * [`Vec::push` invalidating existing references into the vector](https://github.com/rust-lang/rust/issues/60847)
  * [`align_to_mut` violating uniqueness of mutable references](https://github.com/rust-lang/rust/issues/68549)
-* [Aliasing mutable references in `sized-chunks`](https://github.com/bodil/sized-chunks/issues/8)
+* [`sized-chunks` creating aliasing mutable references](https://github.com/bodil/sized-chunks/issues/8)
+* [`String::push_str` invalidating existing references into the string](https://github.com/rust-lang/rust/issues/70301)
+* [`ryu` using raw pointers outside their valid memory area](https://github.com/dtolnay/ryu/issues/24)
+* [ink! creating overlapping mutable references](https://github.com/rust-lang/miri/issues/1364)
+* [TiKV creating overlapping mutable reference and raw pointer](https://github.com/tikv/tikv/pull/7709)
+* [Windows `Env` iterator using a raw pointer outside its valid memory area](https://github.com/rust-lang/rust/pull/70479)
+* [`VecDeque::iter_mut` creating overlapping mutable references](https://github.com/rust-lang/rust/issues/74029)
+* [Various standard library aliasing issues involving raw pointers](https://github.com/rust-lang/rust/pull/78602)
  
  ## License
  
  Licensed under either of
+
    * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
      http://www.apache.org/licenses/LICENSE-2.0)
    * MIT license ([LICENSE-MIT](LICENSE-MIT) or
-    http://opensource.org/licenses/MIT) at your option.
+    http://opensource.org/licenses/MIT)
+
+at your option.
  
  ### Contribution