% Writing Safe Unsafe and Low-Level Code

Rust aims to provide safe abstractions over the low-level details of
the CPU and operating system, but sometimes one is forced to drop down
and write code at that level (those abstractions have to be created
somehow). This guide aims to provide an overview of the dangers and
power one gets with Rust's unsafe subset.

Rust provides an escape hatch in the form of the `unsafe { ... }`
block, which allows the programmer to dodge some of the compiler's
checks and do a wide range of operations, such as:

- dereferencing [raw pointers](#raw-pointers)
- calling a function via FFI ([covered by the FFI guide](guide-ffi.html))
- casting between types bitwise (`transmute`, aka "reinterpret cast")
- [inline assembly](#inline-assembly)

Note that an `unsafe` block does not relax the rules about lifetimes
of `&` and the freezing of borrowed data; it just allows the use of
additional techniques for skirting the compiler's watchful eye. Any
use of `unsafe` is the programmer saying "I know more than you" to the
compiler, and, as such, the programmer should be very sure that they
actually do know more about why that piece of code is valid.

In general, one should try to minimize the amount of unsafe code in a
code base, preferably by using the bare minimum of `unsafe` blocks to
build safe interfaces.

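
As a rough illustration of keeping `unsafe` small, the sketch below wraps one unchecked operation in a function whose signature is entirely safe. The helper name `first_and_last` is hypothetical, and the code assumes current stable Rust rather than the toolchain this guide was written for.

```rust
/// Returns the first and last elements of `values`, or `None` if it is empty.
///
/// Hypothetical helper: the only `unsafe` block is justified by the
/// emptiness check right above it, so callers never need `unsafe`.
pub fn first_and_last(values: &[i32]) -> Option<(i32, i32)> {
    if values.is_empty() {
        return None;
    }
    let last = values.len() - 1;
    // SAFETY: the slice is non-empty, so indices 0 and `last` are in bounds.
    unsafe { Some((*values.get_unchecked(0), *values.get_unchecked(last))) }
}
```

The bounds reasoning lives next to the `unsafe` block, so a reviewer only has to audit these few lines to trust every caller.
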
> **Note**: the low-level details of the Rust language are still in
> flux, and there is no guarantee of stability or backwards
> compatibility. In particular, there may be changes that do not cause
> compilation errors, but do cause semantic changes (such as invoking
> undefined behaviour). As such, extreme care is required.

One of Rust's biggest goals as a language is ensuring memory safety,
achieved in part via [the lifetime system](guide-lifetimes.html) that
every `&` reference has associated with it. This system is how the
compiler can guarantee that every `&` reference is always valid, and,
for example, never pointing to freed memory.

These restrictions on `&` have huge advantages. However, there is no
free lunch. For example, `&` isn't a valid replacement for C's
pointers, and so cannot be used for FFI, in general. Additionally,
both immutable (`&`) and mutable (`&mut`) references have some
aliasing and freezing guarantees, required for memory safety.

In particular, if you have an `&T` reference, then the `T` must not be
modified through that reference or any other reference. There are some
standard library types, e.g. `Cell` and `RefCell`, that provide inner
mutability by replacing compile time guarantees with dynamic checks at
runtime.

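
The following is a small sketch of what those dynamic checks look like in practice, assuming current stable Rust. The function names `bump` and `second_borrow_is_rejected` are hypothetical; `Cell::get`, `Cell::set`, `RefCell::borrow_mut` and `RefCell::try_borrow_mut` are standard library methods.

```rust
use std::cell::{Cell, RefCell};

/// Increments a counter through a shared (`&`) reference.
pub fn bump(counter: &Cell<u32>) -> u32 {
    counter.set(counter.get() + 1);
    counter.get()
}

/// Returns `true` if a second mutable borrow is rejected while the first
/// one is still alive, i.e. the aliasing check happens at runtime.
pub fn second_borrow_is_rejected(data: &RefCell<Vec<u8>>) -> bool {
    let _first = data.borrow_mut();
    data.try_borrow_mut().is_err()
}
```

Once `_first` goes out of scope the `RefCell` can be mutably borrowed again, so the restriction is enforced per borrow rather than per value.
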
An `&mut` reference has a stronger requirement: when an object has an
`&mut T` pointing into it, then that `&mut` reference must be the only
such usable path to that object in the whole program. That is, an
`&mut` cannot alias with any other references.

Using `unsafe` code to incorrectly circumvent and violate these
restrictions is undefined behaviour. For example, the following
creates two aliasing `&mut` pointers, and is invalid.

```rust
use std::mem;
let mut x: u8 = 1;
let ref_1: &mut u8 = &mut x;
let ref_2: &mut u8 = unsafe { mem::transmute(&mut *ref_1) };
// oops, ref_1 and ref_2 point to the same piece of data (x) and are
// both usable
```

## Raw pointers

Rust offers two additional pointer types, "raw pointers", written as
`*T` and `*mut T`. They're an approximation of C's `const T*` and `T*`
respectively; indeed, one of their most common uses is for FFI,
interfacing with external C libraries.

Raw pointers have far fewer guarantees than other pointer types
offered by the Rust language and libraries. For example, they

- are not guaranteed to point to valid memory and are not even
  guaranteed to be non-null (unlike both `Box` and `&`);
- do not have any automatic clean-up, unlike `Box`, and so require
  manual resource management;
- are plain-old-data, that is, they don't move ownership, again unlike
  `Box`, hence the Rust compiler cannot protect against bugs like
  use-after-free;
- are considered sendable (if their contents are considered sendable),
  so the compiler offers no assistance with ensuring their use is
  thread-safe; for example, one can concurrently access a `*mut int`
  from two threads without synchronization;
- lack any form of lifetimes, unlike `&`, and so the compiler cannot
  reason about dangling pointers; and
- have no guarantees about aliasing or mutability other than mutation
  not being allowed directly through a `*T`.

Fortunately, they come with a redeeming feature: the weaker guarantees
mean weaker restrictions. The missing restrictions make raw pointers
appropriate as a building block for (carefully!) implementing things
like smart pointers and vectors inside libraries. For example, `*`
pointers are allowed to alias, allowing them to be used to write
shared-ownership types like reference counted and garbage collected
pointers, and even thread-safe shared memory types (`Rc` and the `Arc`
types are both implemented entirely in Rust).

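
To make the aliasing point concrete, here is a minimal sketch in which two raw pointers refer to the same integer and are both used to mutate it. It assumes current stable Rust, and `bump_twice` is a hypothetical name chosen for this illustration.

```rust
/// Increments the target twice through two aliasing raw pointers and
/// returns the final value.
pub fn bump_twice(target: &mut u32) -> u32 {
    let p1: *mut u32 = target; // `&mut T` coerces to `*mut T`
    let p2: *mut u32 = p1;     // raw pointers are `Copy` and may alias
    // SAFETY: both pointers come from a live `&mut u32` that is not used
    // again until they are done, and the accesses happen one after another.
    unsafe {
        *p1 += 1;
        *p2 += 1;
        *p1
    }
}
```

The same pattern with two `&mut` references would be rejected by the compiler; with raw pointers the burden of keeping the accesses valid moves to the programmer.
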
There are two operations on raw pointers that require care (that is,
they require an `unsafe { ... }` block):

- dereferencing: a raw pointer can have any value, so possible results
  include a crash, a read of uninitialised memory, a use-after-free, or
  reading data as normal (as one hopes happens).
- pointer arithmetic via the `offset` [intrinsic](#intrinsics) (or
  `.offset` method): this intrinsic uses so-called "in-bounds"
  arithmetic, that is, it is only defined behaviour if the result is
  inside (or one-byte-past-the-end) of the object from which the
  original pointer came.

The latter assumption allows the compiler to optimize more
effectively. As can be seen, actually *creating* a raw pointer is not
unsafe, and neither is converting one to an integer.

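
The split between safe and unsafe operations can be sketched as follows, assuming current stable Rust, where the immutable raw pointer type is spelled `*const T` and `add` is the unsigned counterpart of `offset`. The function name `sum_via_raw` is hypothetical.

```rust
/// Sums a slice by walking a raw pointer over it.
pub fn sum_via_raw(values: &[u32]) -> u32 {
    // Creating the raw pointer and casting it to an integer are both safe.
    let start: *const u32 = values.as_ptr();
    let _address = start as usize;

    let mut total = 0u32;
    for i in 0..values.len() {
        // SAFETY: `i < values.len()`, so `start.add(i)` stays inside the
        // slice and points to an initialized `u32`.
        total = total.wrapping_add(unsafe { *start.add(i) });
    }
    total
}
```

Only the arithmetic and the dereference sit inside `unsafe`; the loop bound is what keeps the pointer "in-bounds" in the sense described above.
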
### References and raw pointers

At runtime, a raw pointer `*` and a reference pointing to the same
piece of data have an identical representation. In fact, an `&T`
reference will implicitly coerce to an `*T` raw pointer in safe code
and similarly for the `mut` variants (both coercions can be performed
explicitly with, respectively, `value as *T` and `value as *mut T`).

Going the opposite direction, from `*` to a reference `&`, is not
safe. A `&T` is always valid, and so, at a minimum, the raw pointer
`*T` has to point to a valid instance of type `T`. Furthermore,
the resulting pointer must satisfy the aliasing and mutability laws of
references. The compiler assumes these properties are true for any
references, no matter how they are created, and so any conversion from
raw pointers is asserting that they hold. The programmer *must*
guarantee this.

The recommended method for the conversion is

```rust
let i: u32 = 1;
let p_imm: *u32 = &i as *u32; // explicit cast
let mut m: u32 = 2;
let p_mut: *mut u32 = &mut m; // implicit coercion
unsafe {
    let ref_imm: &u32 = &*p_imm;
    let ref_mut: &mut u32 = &mut *p_mut;
}
```

The `&*x` dereferencing style is preferred to using a `transmute`.
The latter is far more powerful than necessary, and the more
restricted operation is harder to use incorrectly; for example, it
requires that `x` is a pointer (unlike `transmute`).

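
One of the validity requirements, non-nullness, can be checked at runtime before the conversion. The sketch below assumes current stable Rust and uses the standard `as_ref` method on raw pointers, which returns `None` for a null pointer; the function name `read_or` is hypothetical. All other requirements (validity, aliasing) remain the caller's responsibility.

```rust
/// Reads through a possibly-null raw pointer, falling back to `default`.
///
/// # Safety
/// If `p` is non-null it must point to a valid, initialized `u32` that is
/// not mutated for the duration of the call.
pub unsafe fn read_or(p: *const u32, default: u32) -> u32 {
    // `as_ref` only checks for null; every other requirement is on the caller.
    match unsafe { p.as_ref() } {
        Some(r) => *r,
        None => default,
    }
}
```
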
## Making the unsafe safe(r)

There are various ways to expose a safe interface around some unsafe
code:

- store pointers privately (i.e. not in public fields of public
  structs), so that you can see and control all reads and writes to
  the pointer in one place.
- use `assert!()` a lot: once you've thrown away the protection of the
  compiler & type-system via `unsafe { ... }` you're left with just
  your wits and your `assert!()`s; any bug is potentially exploitable.
- implement the `Drop` trait for resource clean-up via a destructor, and
  use RAII (Resource Acquisition Is Initialization). This reduces the need
  for any manual memory management by users, and automatically ensures
  that clean-up is always run, even when the task fails.
- ensure that any data stored behind a raw pointer is destroyed at the
  appropriate time.

As an example, we give a reimplementation of owned boxes by wrapping
`malloc` and `free`. Rust's move semantics and lifetimes mean this
reimplementation is as safe as the `Box` type.

```rust
extern crate libc;
use libc::{c_void, size_t, malloc, free};
use std::mem;
use std::ptr;

// Define a wrapper around the handle returned by the foreign code.
// Unique<T> has the same semantics as Box<T>
pub struct Unique<T> {
    // It contains a single raw, mutable pointer to the object in question.
    ptr: *mut T
}

// Implement methods for creating and using the values in the box.

// NB: For simplicity and correctness, we require that T has kind Send
// (owned boxes relax this restriction, and can contain managed (GC) boxes).
// This is because, as implemented, the garbage collector would not know
// about any shared boxes stored in the malloc'd region of memory.
impl<T: Send> Unique<T> {
    pub fn new(value: T) -> Unique<T> {
        unsafe {
            let ptr = malloc(std::mem::size_of::<T>() as size_t) as *mut T;
            // we *need* a valid pointer.
            assert!(!ptr.is_null());
            // `*ptr` is uninitialized, and `*ptr = value` would
            // attempt to destroy it; `overwrite` moves a value into
            // this memory without attempting to drop the original
            // value.
            mem::overwrite(&mut *ptr, value);
            Unique { ptr: ptr }
        }
    }

    // the 'r lifetime results in the same semantics as `&*x` with
    // Box<T>
    pub fn borrow<'r>(&'r self) -> &'r T {
        // By construction, self.ptr is valid
        unsafe { &*self.ptr }
    }

    // the 'r lifetime results in the same semantics as `&mut *x` with
    // Box<T>
    pub fn borrow_mut<'r>(&'r mut self) -> &'r mut T {
        unsafe { &mut *self.ptr }
    }
}

// A key ingredient for safety, we associate a destructor with
// Unique<T>, making the struct manage the raw pointer: when the
// struct goes out of scope, it will automatically free the raw pointer.
// NB: This is an unsafe destructor, because rustc will not normally
// allow destructors to be associated with parametrized types, due to
// bad interaction with managed boxes. (With the Send restriction,
// we don't have this problem.)
#[unsafe_destructor]
impl<T: Send> Drop for Unique<T> {
    fn drop(&mut self) {
        unsafe {
            // Copy the object out from the pointer onto the stack,
            // where it is covered by normal Rust destructor semantics
            // and cleans itself up, if necessary
            ptr::read(self.ptr as *T);

            // clean-up our allocation
            free(self.ptr as *mut c_void)
        }
    }
}

// A comparison between the built-in `Box` and this reimplementation
fn main() {
    {
        let mut x = box 5;
        *x = 10;
    } // `x` is freed here

    {
        let mut y = Unique::new(5);
        *y.borrow_mut() = 10;
    } // `y` is freed here
}
```

Notably, the only way to construct a `Unique` is via the `new`
function, and this function ensures that the internal pointer is valid
and hidden in the private field. The two `borrow` methods are safe
because the compiler statically guarantees that objects are never used
before creation or after destruction (unless you use some `unsafe`
code to subvert this).

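
For comparison, the same wrapper can be sketched against current stable Rust. This is an assumption-laden translation, not the guide's own code: it swaps `malloc`/`free` for `std::alloc`, `mem::overwrite` for `ptr::write`, and the copy-out via `ptr::read` for `ptr::drop_in_place`. It also drops the `Send` bound, since managed boxes no longer exist, and rejects zero-sized types to keep the allocator calls valid.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

/// Owns a single heap-allocated `T`, much like `Box<T>`.
pub struct Unique<T> {
    // Private, so every read and write of the pointer is in this module.
    ptr: *mut T,
}

impl<T> Unique<T> {
    pub fn new(value: T) -> Unique<T> {
        let layout = Layout::new::<T>();
        // The global allocator must not be asked for zero bytes.
        assert!(layout.size() != 0, "zero-sized types are not supported");
        unsafe {
            let ptr = alloc(layout) as *mut T;
            if ptr.is_null() {
                handle_alloc_error(layout);
            }
            // `*ptr` is uninitialized; `ptr::write` moves `value` in
            // without dropping whatever bytes were there before.
            ptr::write(ptr, value);
            Unique { ptr }
        }
    }

    pub fn borrow(&self) -> &T {
        // SAFETY: by construction, `self.ptr` is valid and initialized.
        unsafe { &*self.ptr }
    }

    pub fn borrow_mut(&mut self) -> &mut T {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { &mut *self.ptr }
    }
}

impl<T> Drop for Unique<T> {
    fn drop(&mut self) {
        unsafe {
            // Run the destructor of the stored value, then free the memory
            // with the same layout that was used to allocate it.
            ptr::drop_in_place(self.ptr);
            dealloc(self.ptr as *mut u8, Layout::new::<T>());
        }
    }
}
```

Usage mirrors the original: `let mut y = Unique::new(5); *y.borrow_mut() = 10;`, with the allocation released when `y` goes out of scope.
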
# Inline assembly

For extremely low-level manipulations and performance reasons, one
might wish to control the CPU directly. Rust supports using inline
assembly to do this via the `asm!` macro. The syntax roughly matches
that of GCC & Clang:

```text
asm!(assembly template
   : output operands : input operands : clobbers : options);
```

Any use of `asm` is feature gated (requires `#![feature(asm)]` on the
crate to allow) and of course requires an `unsafe` block.

> **Note**: the examples here are given in x86/x86-64 assembly, but all
> platforms are supported.

The `assembly template` is the only required parameter and must be a
literal string (i.e. `""`):

```rust
#![feature(asm)]

#[cfg(target_arch = "x86")]
#[cfg(target_arch = "x86_64")]
fn foo() {
    unsafe { asm!("NOP"); }
}

// other platforms
#[cfg(not(target_arch = "x86"),
      not(target_arch = "x86_64"))]
fn foo() { /* ... */ }
```

(The `feature(asm)` and `#[cfg]`s are omitted from now on.)

Output operands, input operands, clobbers and options are all optional,
but you must add the right number of `:` if you skip them:

```rust
# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
# fn main() { unsafe {
asm!("xor %eax, %eax"
    : /* no outputs */ : /* no inputs */
    : "eax");
# } }
```

Whitespace also doesn't matter:

```rust
# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
# fn main() { unsafe {
asm!("xor %eax, %eax" ::: "eax");
# } }
```

Input and output operands follow the same format: `:
"constraints1"(expr1), "constraints2"(expr2), ..."`. Output operand
expressions must be mutable lvalues:

```rust
# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
fn add(a: int, b: int) -> int {
    let mut c = 0;
    unsafe { asm!("add $2, $0" : "=r"(c) : "0"(a), "r"(b)); }
    c
}
# #[cfg(not(target_arch = "x86"), not(target_arch = "x86_64"))]
# fn add(a: int, b: int) -> int { a + b }
fn main() { assert_eq!(add(3, 14159), 14162) }
```

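
For readers on a current toolchain, the same addition can be sketched with the stable `std::arch::asm!` macro. Note the assumptions: this is not the macro syntax described in this guide (operands are named with `in`/`inout` rather than constraint strings, and the x86 template defaults to Intel operand order), and the non-x86-64 fallback is only there so the sketch compiles everywhere.

```rust
/// Adds two integers with an x86-64 `add` instruction.
#[cfg(target_arch = "x86_64")]
pub fn add(a: u64, b: u64) -> u64 {
    let mut c = a;
    // SAFETY: the instruction only touches the two listed registers
    // (and the flags, which `asm!` treats as clobbered by default).
    unsafe {
        std::arch::asm!(
            "add {0}, {1}",
            inout(reg) c, // read-write operand: the mutable output
            in(reg) b,    // read-only input
            options(pure, nomem, nostack)
        );
    }
    c
}

/// Portable fallback for other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

As in the older syntax, the output operand must be a mutable place (`c` here), and the operand list tells the compiler which registers it may allocate.
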
Some instructions modify registers which might otherwise have held
different values, so we use the clobbers list to indicate to the
compiler not to assume any values loaded into those registers will
stay valid.

```rust
# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
# fn main() { unsafe {
// Put the value 0x200 in eax
asm!("mov $$0x200, %eax" : /* no outputs */ : /* no inputs */ : "eax");
# } }
```

Input and output registers need not be listed since that information
is already communicated by the given constraints. Otherwise, any other
registers used either implicitly or explicitly should be listed.

If the assembly changes the condition code register, `cc` should be
specified as one of the clobbers. Similarly, if the assembly modifies
memory, `memory` should also be specified.

The last section, `options`, is specific to Rust. The format is comma
separated literal strings (e.g. `:"foo", "bar", "baz"`). It's used to
specify some extra info about the inline assembly.

Current valid options are:

1. **volatile** - specifying this is analogous to `__asm__ __volatile__ (...)` in gcc/clang.
2. **alignstack** - certain instructions expect the stack to be
   aligned a certain way (e.g. SSE) and specifying this indicates to
   the compiler to insert its usual stack alignment code.
3. **intel** - use Intel syntax instead of the default AT&T.

# Avoiding the standard library

By default, `std` is linked to every Rust crate. In some contexts,
this is undesirable, and can be avoided with the `#![no_std]`
attribute attached to the crate.

```rust
#![no_std]
# // fn main() {} tricked you, rustdoc!
```

Obviously there's more to life than just libraries: one can use
`#![no_std]` with an executable. Controlling the entry point is
possible in two ways: the `#[start]` attribute, or overriding the
default shim for the C `main` function with your own.

The function marked `#[start]` is passed the command line parameters
in the same format as in C:

```rust
#![no_std]

// Pull in the system libc library for what crt0.o likely requires
extern crate libc;

// Entry point for this program
#[start]
fn start(_argc: int, _argv: **u8) -> int {
    0
}

// These functions are invoked by the compiler, but not
// for a bare-bones hello world. These are normally
// provided by libstd.
#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
# // fn main() {} tricked you, rustdoc!
```

To override the compiler-inserted `main` shim, one has to disable it
with `#![no_main]` and then create the appropriate symbol with the
correct ABI and the correct name, which requires overriding the
compiler's name mangling too:

```rust
#![no_std]
#![no_main]

extern crate libc;

#[no_mangle] // ensure that this symbol is called `main` in the output
pub extern fn main(argc: int, argv: **u8) -> int {
    0
}

#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
# // fn main() {} tricked you, rustdoc!
```

The compiler currently makes a few assumptions about symbols which are available
in the executable to call. Normally these functions are provided by the standard
library, but without it you must define your own.

The first of these two functions, `stack_exhausted`, is invoked whenever stack
overflow is detected. This function has a number of restrictions about how it
can be called and what it must do, but if the stack limit register is not being
maintained then a task always has an "infinite stack" and this function
shouldn't get triggered.

The second of these two functions, `eh_personality`, is used by the failure
mechanisms of the compiler. This is often mapped to GCC's personality function
(see the [libstd implementation](std/rt/unwind/index.html) for more
information), but crates which do not trigger failure can be assured that this
function is never called.

> **Note**: the core library's structure is unstable, and it is recommended to
> use the standard library instead wherever possible.

With the above techniques, we've got a bare-metal executable running some Rust
code. There is a good deal of functionality provided by the standard library,
however, that is necessary to be productive in Rust. If the standard library is
not sufficient, then [libcore](core/index.html) is designed to be used
instead.

The core library has very few dependencies and is much more portable than the
standard library itself. Additionally, the core library has most of the
necessary functionality for writing idiomatic and effective Rust code.

As an example, here is a program that will calculate the dot product of two
vectors provided from C, using idiomatic Rust practices.

```rust
#![no_std]
#![feature(globs)]

extern crate core;

use core::prelude::*;

use core::mem;
use core::raw::Slice;

#[no_mangle]
pub extern fn dot_product(a: *u32, a_len: u32,
                          b: *u32, b_len: u32) -> u32 {
    // Convert the provided arrays into Rust slices.
    // The core::raw module guarantees that the Slice
    // structure has the same memory layout as a &[T]
    // slice.
    //
    // This is an unsafe operation because the compiler
    // cannot tell the pointers are valid.
    let (a_slice, b_slice): (&[u32], &[u32]) = unsafe {
        mem::transmute((
            Slice { data: a, len: a_len as uint },
            Slice { data: b, len: b_len as uint },
        ))
    };

    // Iterate over the slices, collecting the result
    let mut ret = 0;
    for (i, j) in a_slice.iter().zip(b_slice.iter()) {
        ret += (*i) * (*j);
    }
    return ret;
}

#[lang = "begin_unwind"]
extern fn begin_unwind(args: &core::fmt::Arguments,
                       file: &str,
                       line: uint) -> ! {
    loop {}
}

#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
# #[start] fn start(argc: int, argv: **u8) -> int { 0 }
```

Note that there is one extra lang item here which differs from the examples
above, `begin_unwind`. This must be defined by consumers of libcore because the
core library declares failure, but it does not define it. The `begin_unwind`
lang item is this crate's definition of failure, and it must be guaranteed to
never return.

As can be seen in this example, the core library is intended to provide the
power of Rust in all circumstances, regardless of platform requirements. Further
libraries, such as liballoc, add functionality to libcore which make other
platform-specific assumptions, but continue to be more portable than the
standard library itself.

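
The dot product function above can also be sketched against current stable Rust, using only `core`. The assumptions here are that `core::slice::from_raw_parts` replaces the `core::raw::Slice` transmute, that raw pointers are spelled `*const u32`, and that the function is marked `unsafe` because it trusts its caller. An exported build would additionally need a no-mangle attribute, which is left out of this sketch.

```rust
use core::slice;

/// Dot product of two C arrays.
///
/// # Safety
/// `a` must point to `a_len` readable `u32` values and `b` to `b_len`;
/// both pointers must be non-null and aligned even when the length is zero.
pub unsafe extern "C" fn dot_product(a: *const u32, a_len: u32,
                                     b: *const u32, b_len: u32) -> u32 {
    // SAFETY: guaranteed by the caller, see above.
    let (a_slice, b_slice) = unsafe {
        (slice::from_raw_parts(a, a_len as usize),
         slice::from_raw_parts(b, b_len as usize))
    };
    // `zip` stops at the shorter slice; wrapping arithmetic avoids panics.
    a_slice.iter().zip(b_slice.iter())
        .fold(0u32, |acc, (x, y)| acc.wrapping_add(x.wrapping_mul(*y)))
}
```

Everything after the conversion is ordinary safe iterator code, which is the point the guide makes about libcore.
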
# Interacting with the compiler internals

> **Note**: this section is specific to the `rustc` compiler; these
> parts of the language may never be fully specified and so details may
> differ wildly between implementations (and even versions of `rustc`
> itself).
>
> Furthermore, this is just an overview; the best form of
> documentation for specific instances of these features are their
> definitions and uses in `std`.

The Rust language currently has two orthogonal mechanisms for allowing
libraries to interact directly with the compiler and vice versa:

- intrinsics, functions built directly into the compiler providing
  very basic low-level functionality,
- lang-items, special functions, types and traits in libraries marked
  with specific `#[lang]` attributes

## Intrinsics

> **Note**: intrinsics will forever have an unstable interface; it is
> recommended to use the stable interfaces of libcore rather than intrinsics
> directly.

These are imported as if they were FFI functions, with the special
`rust-intrinsic` ABI. For example, if one was in a freestanding
context, but wished to be able to `transmute` between types, and
perform efficient pointer arithmetic, one would import those functions
via a declaration like

```rust
extern "rust-intrinsic" {
    fn transmute<T, U>(x: T) -> U;

    fn offset<T>(dst: *T, offset: int) -> *T;
}
```

As with any other FFI functions, these are always `unsafe` to call.

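
The stable wrappers around these two intrinsics can be used without any `extern` declaration. The sketch below assumes current stable Rust, where they are exposed as `core::mem::transmute` and the `offset` method on raw pointers; the function names `float_bits` and `element_at` are hypothetical.

```rust
use core::mem;

/// Reinterprets the bits of an `f32` as a `u32`.
pub fn float_bits(x: f32) -> u32 {
    // SAFETY: `f32` and `u32` have the same size and every bit pattern
    // is a valid `u32`.
    unsafe { mem::transmute::<f32, u32>(x) }
}

/// Returns the element `index` positions after the start of `values`.
pub fn element_at(values: &[u8], index: usize) -> u8 {
    assert!(index < values.len());
    // SAFETY: the assertion keeps the offset inside the slice.
    unsafe { *values.as_ptr().offset(index as isize) }
}
```

Both calls still require `unsafe`, but their signatures are covered by the library's stability guarantees rather than the compiler's internals.
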
## Lang items

> **Note**: lang items are often provided by crates in the Rust distribution,
> and lang items themselves have an unstable interface. It is recommended to use
> officially distributed crates instead of defining your own lang items.

The `rustc` compiler has certain pluggable operations, that is,
functionality that isn't hard-coded into the language, but is
implemented in libraries, with a special marker to tell the compiler
it exists. The marker is the attribute `#[lang="..."]` and there are
various different values of `...`, i.e. various different "lang
items".

For example, `Box` pointers require two lang items, one for allocation
and one for deallocation. Here is a freestanding program that uses the
`Box` sugar for dynamic allocations via `malloc` and `free`:

```rust
#![no_std]

extern crate libc;

extern { fn abort() -> !; }

#[lang="exchange_malloc"]
unsafe fn allocate(size: uint, _align: uint) -> *mut u8 {
    let p = libc::malloc(size as libc::size_t) as *mut u8;
    if p as uint == 0 { abort(); } // malloc failed
    p
}

#[lang="exchange_free"]
unsafe fn deallocate(ptr: *mut u8, _size: uint, _align: uint) {
    libc::free(ptr as *mut libc::c_void)
}

#[start]
fn main(argc: int, argv: **u8) -> int {
    let x = box 1;
    0
}

#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
```

Note the use of `abort`: the `exchange_malloc` lang item is assumed to
return a valid pointer, and so needs to do the check internally.

Other features provided by lang items include:

- overloadable operators via traits: the traits corresponding to the
  `==`, `<`, dereferencing (`*`) and `+` (etc.) operators are all
  marked with lang items; those specific four are `eq`, `ord`,
  `deref`, and `add` respectively.
- stack unwinding and general failure; the `eh_personality`, `fail_`
  and `fail_bounds_checks` lang items.
- the traits in `std::kinds` used to indicate types that satisfy
  various kinds; lang items `send`, `share` and `copy`.
- the marker types and variance indicators found in
  `std::kinds::markers`; lang items `covariant_type`,
  `contravariant_lifetime`, `no_share_bound`, etc.

Lang items are loaded lazily by the compiler; e.g. if one never uses
`Box` then there is no need to define functions for `exchange_malloc`
and `exchange_free`. `rustc` will emit an error when an item is needed
but not found in the current crate or any that it depends on.

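
From the user's side, the operator lang items listed above surface as ordinary traits. The sketch below assumes current stable Rust, where the relevant traits are `std::ops::Add`, `std::ops::Deref`, `PartialEq` and `PartialOrd`; the `Metres` type is hypothetical and exists only for this illustration.

```rust
use std::ops::{Add, Deref};

/// A length in metres; `==` and `<` come from the derived traits.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Metres(pub f64);

// `a + b` on `Metres` resolves to this implementation.
impl Add for Metres {
    type Output = Metres;
    fn add(self, other: Metres) -> Metres {
        Metres(self.0 + other.0)
    }
}

// `*m` on `Metres` resolves to this implementation.
impl Deref for Metres {
    type Target = f64;
    fn deref(&self) -> &f64 {
        &self.0
    }
}
```

With these implementations, `Metres(1.5) + Metres(2.0)` compares equal to `Metres(3.5)`, and dereferencing the result yields the inner `f64`.
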