--- /dev/null
+% Writing Safe Unsafe and Low-Level Code
+
+# Introduction
+
+Rust aims to provide safe abstractions over the low-level details of
+the CPU and operating system, but sometimes one is forced to drop down
+and write code at that level (those abstractions have to be created
+somehow). This guide aims to provide an overview of the dangers and
+power one gets with Rust's unsafe subset.
+
+Rust provides an escape hatch in the form of the `unsafe { ... }`
+block which allows the programmer to dodge some of the compilers
+checks and do a wide range of operations, such as:
+
+- dereferencing [raw pointers](#raw-pointers)
+- calling a function via FFI ([covered by the FFI guide](guide-ffi.html))
+- casting between types bitwise (`transmute`, aka "reinterpret cast")
+- [inline assembly](#inline-assembly)
+
+Note that an `unsafe` block does not relax the rules about lifetimes
+of `&` and the freezing of borrowed data, it just allows the use of
+additional techniques for skirting the compiler's watchful eye. Any
+use of `unsafe` is the programmer saying "I know more than you" to the
+compiler, and, as such, the programmer should be very sure that they
+actually do know more about why that piece of code is valid.
+
+In general, one should try to minimize the amount of unsafe code in a
+code base; preferably by using the bare minimum `unsafe` blocks to
+build safe interfaces.
+
+> **Note**: the low-level details of the Rust language are still in
+> flux, and there is no guarantee of stability or backwards
+> compatibility. In particular, there may be changes that do not cause
+> compilation errors, but do cause semantic changes (such as invoking
+> undefined behaviour). As such, extreme care is required.
+
+# Pointers
+
+## References
+
+One of Rust's biggest goals as a language is ensuring memory safety,
+achieved in part via [the lifetime system](guide-lifetimes.html) which
+every `&` references has associated with it. This system is how the
+compiler can guarantee that every `&` reference is always valid, and,
+for example, never pointing to freed memory.
+
+These restrictions on `&` have huge advantages. However, there's no
+free lunch club. For example, `&` isn't a valid replacement for C's
+pointers, and so cannot be used for FFI, in general. Additionally,
+both immutable (`&`) and mutable (`&mut`) references have some
+aliasing and freezing guarantees, required for memory safety.
+
+In particular, if you have an `&T` reference, then the `T` must not be
+modified through that reference or any other reference. There are some
+standard library types, e.g. `Cell` and `RefCell`, that provide inner
+mutability by replacing compile time guarantees with dynamic checks at
+runtime.
+
+An `&mut` reference has a stronger requirement: when a object has an
+`&mut T` pointing into it, then that `&mut` reference must be the only
+such usable path to that object in the whole program. That is, an
+`&mut` cannot alias with any other references.
+
+Using `unsafe` code to incorrectly circumvent and violate these
+restrictions is undefined behaviour. For example, the following
+creates two aliasing `&mut` pointers, and is invalid.
+
+```
+use std::cast;
+let mut x: u8 = 1;
+
+let ref_1: &mut u8 = &mut x;
+let ref_2: &mut u8 = unsafe { cast::transmute_mut_region(ref_1) };
+
+// oops, ref_1 and ref_2 point to the same piece of data (x) and are
+// both usable
+*ref_1 = 10;
+*ref_2 = 20;
+```
+
+## Raw pointers
+
+Rust offers two additional pointer types "raw pointers", written as
+`*T` and `*mut T`. They're an approximation of C's `const T*` and `T*`
+respectively; indeed, one of their most common uses is for FFI,
+interfacing with external C libraries.
+
+Raw pointers have much fewer guarantees than other pointer types
+offered by the Rust language and libraries. For example, they
+
+- are not guaranteed to point to valid memory and are not even
+ guaranteed to be non-null (unlike both `~` and `&`);
+- do not have any automatic clean-up, unlike `~`, and so require
+ manual resource management;
+- are plain-old-data, that is, they don't move ownership, again unlike
+ `~`, hence the Rust compiler cannot protect against bugs like
+ use-after-free;
+- are considered sendable (if their contents is considered sendable),
+ so the compiler offers no assistance with ensuring their use is
+ thread-safe; for example, one can concurrently access a `*mut int`
+ from two threads without synchronization.
+- lack any form of lifetimes, unlike `&`, and so the compiler cannot
+ reason about dangling pointers; and
+- have no guarantees about aliasing or mutability other than mutation
+ not being allowed directly through a `*T`.
+
+Fortunately, they come with a redeeming feature: the weaker guarantees
+mean weaker restrictions. The missing restrictions make raw pointers
+appropriate as a building block for (carefully!) implementing things
+like smart pointers and vectors inside libraries. For example, `*`
+pointers are allowed to alias, allowing them to be used to write
+shared-ownership types like reference counted and garbage collected
+pointers, and even thread-safe shared memory types (`Rc` and the `Arc`
+types are both implemented entirely in Rust).
+
+There are two things that you are required to be careful about
+(i.e. require an `unsafe { ... }` block) with raw pointers:
+
+- dereferencing: they can have any value: so possible results include
+ a crash, a read of uninitialised memory, a use-after-free, or
+ reading data as normal (and one hopes happens).
+- pointer arithmetic via the `offset` [intrinsic](#intrinsics) (or
+ `.offset` method): this intrinsic uses so-called "in-bounds"
+ arithmetic, that is, it is only defined behaviour if the result is
+ inside (or one-byte-past-the-end) of the object from which the
+ original pointer came.
+
+The latter assumption allows the compiler to optimize more
+effectively. As can be seen, actually *creating* a raw pointer is not
+unsafe, and neither is converting to an integer.
+
+### References and raw pointers
+
+At runtime, a raw pointer `*` and a reference pointing to the same
+piece of data have an identical representation. In fact, an `&T`
+reference will implicitly coerce to an `*T` raw pointer in safe code
+and similarly for the `mut` variants (both coercions can be performed
+explicitly with, respectively, `value as *T` and `value as *mut T`).
+
+Going the opposite direction, from `*` to a reference `&`, is not
+safe. A `&T` is always valid, and so, at a minimum, the raw pointer
+`*T` has to be a valid to a valid instance of type `T`. Furthermore,
+the resulting pointer must satisfy the aliasing and mutability laws of
+references. The compiler assumes these properties are true for any
+references, no matter how they are created, and so any conversion from
+raw pointers is asserting that they hold. The programmer *must*
+guarantee this.
+
+The recommended method for the conversion is
+
+```
+let i: u32 = 1;
+// explicit cast
+let p_imm: *u32 = &i as *u32;
+let mut m: u32 = 2;
+// implicit coercion
+let p_mut: *mut u32 = &mut m;
+
+unsafe {
+ let ref_imm: &u32 = &*p_imm;
+ let ref_mut: &mut u32 = &mut *p_mut;
+}
+```
+
+The `&*x` dereferencing style is preferred to using a `transmute`.
+The latter is far more powerful than necessary, and the more
+restricted operation is harder to use incorrectly; for example, it
+requires that `x` is a pointer (unlike `transmute`).
+
+
+
+## Making the unsafe safe(r)
+
+There are various ways to expose a safe interface around some unsafe
+code:
+
+- store pointers privately (i.e. not in public fields of public
+ structs), so that you can see and control all reads and writes to
+ the pointer in one place.
+- use `assert!()` a lot: once you've thrown away the protection of the
+ compiler & type-system via `unsafe { ... }` you're left with just
+ your wits and your `assert!()`s, any bug is potentially exploitable.
+- implement the `Drop` for resource clean-up via a destructor, and use
+ RAII (Resource Acquisition Is Initialization). This reduces the need
+ for any manual memory management by users, and automatically ensures
+ that clean-up is always run, even when the task fails.
+- ensure that any data stored behind a raw pointer is destroyed at the
+ appropriate time.
+
+As an example, we give a reimplementation of owned boxes by wrapping
+`malloc` and `free`. Rust's move semantics and lifetimes mean this
+reimplementation is as safe as the built-in `~` type.
+
+```
+use std::libc::{c_void, size_t, malloc, free};
+use std::mem;
+use std::ptr;
+
+// Define a wrapper around the handle returned by the foreign code.
+// Unique<T> has the same semantics as ~T
+pub struct Unique<T> {
+ // It contains a single raw, mutable pointer to the object in question.
+ priv ptr: *mut T
+}
+
+// Implement methods for creating and using the values in the box.
+// NB: For simplicity and correctness, we require that T has kind Send
+// (owned boxes relax this restriction, and can contain managed (GC) boxes).
+// This is because, as implemented, the garbage collector would not know
+// about any shared boxes stored in the malloc'd region of memory.
+impl<T: Send> Unique<T> {
+ pub fn new(value: T) -> Unique<T> {
+ unsafe {
+ let ptr = malloc(std::mem::size_of::<T>() as size_t) as *mut T;
+ // we *need* valid pointer.
+ assert!(!ptr.is_null());
+ // `*ptr` is uninitialized, and `*ptr = value` would attempt to destroy it
+ // move_val_init moves a value into this memory without
+ // attempting to drop the original value.
+ mem::move_val_init(&mut *ptr, value);
+ Unique{ptr: ptr}
+ }
+ }
+
+ // the 'r lifetime results in the same semantics as `&*x` with ~T
+ pub fn borrow<'r>(&'r self) -> &'r T {
+ // By construction, self.ptr is valid
+ unsafe { &*self.ptr }
+ }
+
+ // the 'r lifetime results in the same semantics as `&mut *x` with ~T
+ pub fn borrow_mut<'r>(&'r mut self) -> &'r mut T {
+ unsafe { &mut*self.ptr }
+ }
+}
+
+// A key ingredient for safety, we associate a destructor with
+// Unique<T>, making the struct manage the raw pointer: when the
+// struct goes out of scope, it will automatically free the raw pointer.
+// NB: This is an unsafe destructor, because rustc will not normally
+// allow destructors to be associated with parametrized types, due to
+// bad interaction with managed boxes. (With the Send restriction,
+// we don't have this problem.)
+#[unsafe_destructor]
+impl<T: Send> Drop for Unique<T> {
+ fn drop(&mut self) {
+ unsafe {
+
+ // Copy the object out from the pointer onto the stack,
+ // where it is covered by normal Rust destructor semantics
+ // and cleans itself up, if necessary
+ ptr::read(self.ptr as *T);
+
+ // clean-up our allocation
+ free(self.ptr as *mut c_void)
+ }
+ }
+}
+
+// A comparison between the built-in ~ and this reimplementation
+fn main() {
+ {
+ let mut x = ~5;
+ *x = 10;
+ } // `x` is freed here
+
+ {
+ let mut y = Unique::new(5);
+ *y.borrow_mut() = 10;
+ } // `y` is freed here
+}
+```
+
+Notably, the only way to construct a `Unique` is via the `new`
+function, and this function ensures that the internal pointer is valid
+and hidden in the private field. The two `borrow` methods are safe
+because the compiler statically guarantees that objects are never used
+before creation or after destruction (unless you use some `unsafe`
+code...).
+
+# Inline assembly
+
+For extremely low-level manipulations and performance reasons, one
+might wish to control the CPU directly. Rust supports using inline
+assembly to do this via the `asm!` macro. The syntax roughly matches
+that of GCC & Clang:
+
+```ignore
+asm!(assembly template
+ : output operands
+ : input operands
+ : clobbers
+ : options
+ );
+```
+
+Any use of `asm` is feature gated (requires `#[feature(asm)];` on the
+crate to allow) and of course requires an `unsafe` block.
+
+> **Note**: the examples here are given in x86/x86-64 assembly, but all
+> platforms are supported.
+
+## Assembly template
+
+The `assembly template` is the only required parameter and must be a
+literal string (i.e `""`)
+
+```
+#[feature(asm)];
+
+#[cfg(target_arch = "x86")]
+#[cfg(target_arch = "x86_64")]
+fn foo() {
+ unsafe {
+ asm!("NOP");
+ }
+}
+
+// other platforms
+#[cfg(not(target_arch = "x86"),
+ not(target_arch = "x86_64"))]
+fn foo() { /* ... */ }
+
+fn main() {
+ // ...
+ foo();
+ // ...
+}
+```
+
+(The `feature(asm)` and `#[cfg]`s are omitted from now on.)
+
+Output operands, input operands, clobbers and options are all optional
+but you must add the right number of `:` if you skip them:
+
+```
+# #[feature(asm)];
+# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
+# fn main() { unsafe {
+asm!("xor %eax, %eax"
+ :
+ :
+ : "eax"
+ );
+# } }
+```
+
+Whitespace also doesn't matter:
+
+```
+# #[feature(asm)];
+# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
+# fn main() { unsafe {
+asm!("xor %eax, %eax" ::: "eax");
+# } }
+```
+
+## Operands
+
+Input and output operands follow the same format: `:
+"constraints1"(expr1), "constraints2"(expr2), ..."`. Output operand
+expressions must be mutable lvalues:
+
+```
+# #[feature(asm)];
+# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
+fn add(a: int, b: int) -> int {
+ let mut c = 0;
+ unsafe {
+ asm!("add $2, $0"
+ : "=r"(c)
+ : "0"(a), "r"(b)
+ );
+ }
+ c
+}
+# #[cfg(not(target_arch = "x86"), not(target_arch = "x86_64"))]
+# fn add(a: int, b: int) -> int { a + b }
+
+fn main() {
+ assert_eq!(add(3, 14159), 14162)
+}
+```
+
+## Clobbers
+
+Some instructions modify registers which might otherwise have held
+different values so we use the clobbers list to indicate to the
+compiler not to assume any values loaded into those registers will
+stay valid.
+
+```
+# #[feature(asm)];
+# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
+# fn main() { unsafe {
+// Put the value 0x200 in eax
+asm!("mov $$0x200, %eax" : /* no outputs */ : /* no inputs */ : "eax");
+# } }
+```
+
+Input and output registers need not be listed since that information
+is already communicated by the given constraints. Otherwise, any other
+registers used either implicitly or explicitly should be listed.
+
+If the assembly changes the condition code register `cc` should be
+specified as one of the clobbers. Similarly, if the assembly modifies
+memory, `memory` should also be specified.
+
+## Options
+
+The last section, `options` is specific to Rust. The format is comma
+separated literal strings (i.e `:"foo", "bar", "baz"`). It's used to
+specify some extra info about the inline assembly:
+
+Current valid options are:
+
+1. **volatile** - specifying this is analogous to `__asm__ __volatile__ (...)` in gcc/clang.
+2. **alignstack** - certain instructions expect the stack to be
+ aligned a certain way (i.e SSE) and specifying this indicates to
+ the compiler to insert its usual stack alignment code
+3. **intel** - use intel syntax instead of the default AT&T.
+
+# Avoiding the standard library
+
+By default, `std` is linked to every Rust crate. In some contexts,
+this is undesirable, and can be avoided with the `#[no_std];`
+attribute attached to the crate.
+
+```ignore
+# // FIXME #12903: linking failures due to no_std
+// the minimal library
+#[crate_type="lib"];
+#[no_std];
+
+# // fn main() {} tricked you, rustdoc!
+```
+
+Obviously there's more to life than just libraries: one can use
+`#[no_std]` with an executable, controlling the entry point is
+possible in two ways: the `#[start]` attribute, or overriding the
+default shim for the C `main` function with your own.
+
+The function marked `#[start]` is passed the command line parameters
+in the same format as a C:
+
+```ignore
+# // FIXME #12903: linking failures due to no_std
+#[no_std];
+
+extern "rust-intrinsic" { fn abort() -> !; }
+#[no_mangle] pub extern fn rust_stack_exhausted() {
+ unsafe { abort() }
+}
+
+#[start]
+fn start(_argc: int, _argv: **u8) -> int {
+ 0
+}
+
+# // fn main() {} tricked you, rustdoc!
+```
+
+To override the compiler-inserted `main` shim, one has to disable it
+with `#[no_main];` and then create the appropriate symbol with the
+correct ABI and the correct name, which requires overriding the
+compiler's name mangling too:
+
+```ignore
+# // FIXME #12903: linking failures due to no_std
+#[no_std];
+#[no_main];
+
+extern "rust-intrinsic" { fn abort() -> !; }
+#[no_mangle] pub extern fn rust_stack_exhausted() {
+ unsafe { abort() }
+}
+
+#[no_mangle] // ensure that this symbol is called `main` in the output
+extern "C" fn main(_argc: int, _argv: **u8) -> int {
+ 0
+}
+
+# // fn main() {} tricked you, rustdoc!
+```
+
+
+Unfortunately the Rust compiler assumes that symbols with certain
+names exist; and these have to be defined (or linked in). This is the
+purpose of the `rust_stack_exhausted`: it is called when a function
+detects that it will overflow its stack. The example above uses the
+`abort` intrinsic which ensures that execution halts.
+
+# Interacting with the compiler internals
+
+> **Note**: this section is specific to the `rustc` compiler; these
+> parts of the language may never be full specified and so details may
+> differ wildly between implementations (and even versions of `rustc`
+> itself).
+>
+> Furthermore, this is just an overview; the best form of
+> documentation for specific instances of these features are their
+> definitions and uses in `std`.
+
+The Rust language currently has two orthogonal mechanisms for allowing
+libraries to interact directly with the compiler and vice versa:
+
+- intrinsics, functions built directly into the compiler providing
+ very basic low-level functionality,
+- lang-items, special functions, types and traits in libraries marked
+ with specific `#[lang]` attributes
+
+## Intrinsics
+
+These are imported as if they were FFI functions, with the special
+`rust-intrinsic` ABI. For example, if one was in a freestanding
+context, but wished to be able to `transmute` between types, and
+perform efficient pointer arithmetic, one would import those functions
+via a declaration like
+
+```
+extern "rust-intrinsic" {
+ fn transmute<T, U>(x: T) -> U;
+
+ fn offset<T>(dst: *T, offset: int) -> *T;
+}
+```
+
+As with any other FFI functions, these are always `unsafe` to call.
+
+## Lang items
+
+The `rustc` compiler has certain pluggable operations, that is,
+functionality that isn't hard-coded into the language, but is
+implemented in libraries, with a special marker to tell the compiler
+it exists. The marker is the attribute `#[lang="..."]` and there are
+various different values of `...`, i.e. various different "lang
+items".
+
+For example, `~` pointers require two lang items, one for allocation
+and one for deallocation. A freestanding program that uses the `~`
+sugar for dynamic allocations via `malloc` and `free`:
+
+```ignore
+# // FIXME #12903: linking failures due to no_std
+#[no_std];
+
+#[allow(ctypes)] // `uint` == `size_t` on Rust's platforms
+extern {
+ fn malloc(size: uint) -> *mut u8;
+ fn free(ptr: *mut u8);
+
+ fn abort() -> !;
+}
+
+#[no_mangle] pub extern fn rust_stack_exhausted() {
+ unsafe { abort() }
+}
+
+#[lang="exchange_malloc"]
+unsafe fn allocate(size: uint) -> *mut u8 {
+ let p = malloc(size);
+
+ // malloc failed
+ if p as uint == 0 {
+ abort();
+ }
+
+ p
+}
+#[lang="exchange_free"]
+unsafe fn deallocate(ptr: *mut u8) {
+ free(ptr)
+}
+
+#[start]
+fn main(_argc: int, _argv: **u8) -> int {
+ let _x = ~1;
+
+ 0
+}
+
+# // fn main() {} tricked you, rustdoc!
+```
+
+Note the use of `abort`: the `exchange_malloc` lang item is assumed to
+return a valid pointer, and so needs to do the check
+internally.
+
+Other features provided by lang items include:
+
+- overloadable operators via traits: the traits corresponding to the
+ `==`, `<`, dereferencing (`*`) and `+` (etc.) operators are all
+ marked with lang items; those specific four are `eq`, `ord`,
+ `deref`, and `add` respectively.
+- stack unwinding and general failure; the `eh_personality`, `fail_`
+ and `fail_bounds_checks` lang items.
+- the traits in `std::kinds` used to indicate types that satisfy
+ various kinds; lang items `send`, `freeze` and `pod`.
+- the marker types and variance indicators found in
+ `std::kinds::markers`; lang items `covariant_type`,
+ `contravariant_lifetime`, `no_freeze_bound`, etc.
+
+Lang items are loaded lazily by the compiler; e.g. if one never uses
+`~` then there is no need to define functions for `exchange_malloc`
+and `exchange_free`. `rustc` will emit an error when an item is needed
+but not found in the current crate or any that it depends on.