% Writing Unsafe and Low-Level Code in Rust

Rust aims to provide safe abstractions over the low-level details of
the CPU and operating system, but sometimes one needs to drop down and
write code at that level. This guide aims to provide an overview of
the dangers and power one gets with Rust's unsafe subset.

Rust provides an escape hatch in the form of the `unsafe { ... }`
block, which allows the programmer to dodge some of the compiler's
checks and do a wide range of operations, such as:

- dereferencing [raw pointers](#raw-pointers)
- calling a function via FFI ([covered by the FFI guide](guide-ffi.html))
- casting between types bitwise (`transmute`, aka "reinterpret cast")
- [inline assembly](#inline-assembly)

Note that an `unsafe` block does not relax the rules about lifetimes
of `&` and the freezing of borrowed data.

Any use of `unsafe` is the programmer saying "I know more than you" to
the compiler, and, as such, the programmer should be very sure that
they actually do know more about why that piece of code is valid. In
general, one should try to minimize the amount of unsafe code in a
code base, preferably by using the bare minimum of `unsafe` blocks to
build safe interfaces.

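
As a minimal sketch of what "a bare minimum of `unsafe` behind a safe
interface" can look like, consider the hypothetical helper below
(`first_and_last_mut` is an illustrative name, not a library function).
It is written for a current stable toolchain, so its syntax differs
slightly from the older dialect used in this guide's listings. The single
`unsafe` block is justified by the length check that precedes it, and
callers never see a raw pointer.

```rust
/// Returns mutable references to the first and last elements of a slice,
/// or `None` if the slice has fewer than two elements.
pub fn first_and_last_mut<T>(slice: &mut [T]) -> Option<(&mut T, &mut T)> {
    let len = slice.len();
    if len < 2 {
        return None;
    }
    let base = slice.as_mut_ptr();
    // SAFETY: `len >= 2`, so index 0 and index `len - 1` are both in bounds
    // and refer to distinct elements; the two `&mut` references therefore
    // never alias, and both are tied to the borrow of `slice`.
    unsafe { Some((&mut *base, &mut *base.add(len - 1))) }
}
```

The borrow checker cannot prove on its own that the two references are
disjoint, which is the "I know more than you" part; everything outside
that one line remains checked as usual.
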
> **Note**: the low-level details of the Rust language are still in
> flux, and there is no guarantee of stability or backwards
> compatibility. In particular, there may be changes that do not cause
> compilation errors, but do cause semantic changes (such as invoking
> undefined behaviour). As such, extreme care is required.

One of Rust's biggest features is memory safety. This is achieved in
part via [the lifetime system](guide-lifetimes.html), which is how the
compiler can guarantee that every `&` reference is always valid, and,
for example, never points to freed memory.

These restrictions on `&` have huge advantages. However, they also
constrain how we can use them. For example, `&` doesn't behave
identically to C's pointers, and so cannot be used for pointers in
foreign function interfaces (FFI). Additionally, both immutable (`&`)
and mutable (`&mut`) references have some aliasing and freezing
guarantees, required for memory safety.

In particular, if you have an `&T` reference, then the `T` must not be
modified through that reference or any other reference. There are some
standard library types, e.g. `Cell` and `RefCell`, that provide inner
mutability by replacing compile time guarantees with dynamic checks at
runtime.

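
To make the "dynamic checks at runtime" concrete, here is a small sketch
assuming a current stable toolchain; `interior_mutability_demo` is a
hypothetical name used only for illustration. `Cell` allows a value to be
replaced through a shared reference, while `RefCell` tracks borrows at run
time and refuses a second mutable borrow while the first is still alive.

```rust
use std::cell::{Cell, RefCell};

/// Returns the final counter value, whether the overlapping mutable borrow
/// was refused, and the final length of the log.
pub fn interior_mutability_demo() -> (i32, bool, usize) {
    // `Cell`: mutation through a shared `&` reference, by replacing the value.
    let counter = Cell::new(1);
    let shared: &Cell<i32> = &counter;
    shared.set(shared.get() + 1);

    // `RefCell`: the aliasing rule is enforced by a run-time borrow flag.
    let log = RefCell::new(vec![1]);
    let mut first_borrow = log.borrow_mut();
    first_borrow.push(2);
    // A second mutable borrow while `first_borrow` is alive is rejected.
    let second_refused = log.try_borrow_mut().is_err();
    drop(first_borrow);

    let len = log.borrow().len();
    (counter.get(), second_refused, len)
}
```

Using `borrow_mut` instead of `try_borrow_mut` for the second borrow would
panic rather than return an error; either way the violation is caught at
run time rather than at compile time.
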
An `&mut` reference has a different constraint: when an object has an
`&mut T` pointing into it, then that `&mut` reference must be the only
such usable path to that object in the whole program. That is, an
`&mut` cannot alias with any other references.

Using `unsafe` code to incorrectly circumvent and violate these
restrictions is undefined behaviour. For example, the following
creates two aliasing `&mut` pointers, and is invalid.

```
use std::mem;
let mut x: u8 = 1;
let ref_1: &mut u8 = &mut x;
let ref_2: &mut u8 = unsafe { mem::transmute(&mut *ref_1) };
// oops, ref_1 and ref_2 point to the same piece of data (x) and are
// both usable
```

## Raw pointers

Rust offers two additional pointer types, called "raw pointers" and
written as `*const T` and `*mut T`. They're an approximation of C's
`const T*` and `T*` respectively; indeed, one of their most common
uses is for FFI, interfacing with external C libraries.

Raw pointers have far fewer guarantees than other pointer types
offered by the Rust language and libraries. For example, they

- are not guaranteed to point to valid memory and are not even
  guaranteed to be non-null (unlike both `Box` and `&`);
- do not have any automatic clean-up, unlike `Box`, and so require
  manual resource management;
- are plain-old-data, that is, they don't move ownership, again unlike
  `Box`, hence the Rust compiler cannot protect against bugs like
  use-after-free;
- are considered sendable (if their contents are considered sendable),
  so the compiler offers no assistance with ensuring their use is
  thread-safe; for example, one can concurrently access a `*mut int`
  from two threads without synchronization;
- lack any form of lifetimes, unlike `&`, and so the compiler cannot
  reason about dangling pointers; and
- have no guarantees about aliasing or mutability other than mutation
  not being allowed directly through a `*const T`.

Fortunately, they come with a redeeming feature: the weaker guarantees
mean weaker restrictions. The missing restrictions make raw pointers
appropriate as a building block for implementing things like smart
pointers and vectors inside libraries. For example, `*` pointers are
allowed to alias, allowing them to be used to write shared-ownership
types like reference counted and garbage collected pointers, and even
thread-safe shared memory types (`Rc` and the `Arc` types are both
implemented entirely in Rust).

There are two things that you are required to be careful about
(i.e. that require an `unsafe { ... }` block) with raw pointers:

- dereferencing: they can have any value, so possible results include
  a crash, a read of uninitialised memory, a use-after-free, or
  reading data as normal.
- pointer arithmetic via the `offset` [intrinsic](#intrinsics) (or
  `.offset` method): this intrinsic uses so-called "in-bounds"
  arithmetic, that is, it is only defined behaviour if the result is
  inside (or one-byte-past-the-end) of the object from which the
  original pointer came.

The latter assumption allows the compiler to optimize more
effectively. As can be seen, actually *creating* a raw pointer is not
unsafe, and neither is converting to an integer.

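
The split between what is safe and what is not can be sketched as
follows, assuming a current stable toolchain (`raw_pointer_demo` is a
hypothetical function written for this illustration). Creating the pointer
and converting it to an integer happen in safe code; only the in-bounds
arithmetic and the dereferences sit inside `unsafe`.

```rust
/// Returns whether the array's address is suitably aligned, plus the
/// second and last elements read through a raw pointer.
pub fn raw_pointer_demo() -> (bool, u32, u32) {
    let values = [10u32, 20, 30];

    // Safe: creating a raw pointer and converting it to an integer.
    let p: *const u32 = values.as_ptr();
    let address = p as usize;
    let aligned = address % std::mem::align_of::<u32>() == 0;

    // Unsafe: pointer arithmetic and dereferencing.
    let (second, last) = unsafe {
        let second = *p.offset(1);
        // One past the end may be computed, but must never be dereferenced.
        let one_past_end = p.offset(3);
        let last = *one_past_end.offset(-1);
        (second, last)
    };

    (aligned, second, last)
}
```

Computing `p.offset(4)` here would already be undefined behaviour, even
without a dereference, because the result would lie outside the "in-bounds"
range described above.
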
### References and raw pointers

At runtime, a raw pointer `*` and a reference pointing to the same
piece of data have an identical representation. In fact, an `&T`
reference will implicitly coerce to an `*const T` raw pointer in safe code
and similarly for the `mut` variants (both coercions can be performed
explicitly with, respectively, `value as *const T` and `value as *mut T`).

Going the opposite direction, from `*const` to a reference `&`, is not
safe. A `&T` is always valid, and so, at a minimum, the raw pointer
`*const T` has to point to a valid instance of type `T`. Furthermore,
the resulting pointer must satisfy the aliasing and mutability laws of
references. The compiler assumes these properties are true for any
references, no matter how they are created, and so any conversion from
raw pointers is asserting that they hold. The programmer *must*
guarantee this.

The recommended method for the conversion is:

```
let i: u32 = 1;
let p_imm: *const u32 = &i as *const u32; // explicit cast
let mut m: u32 = 2;
let p_mut: *mut u32 = &mut m; // implicit coercion
unsafe {
    let ref_imm: &u32 = &*p_imm;
    let ref_mut: &mut u32 = &mut *p_mut;
}
```

The `&*x` dereferencing style is preferred to using a `transmute`.
The latter is far more powerful than necessary, and the more
restricted operation is harder to use incorrectly; for example, it
requires that `x` is a pointer (unlike `transmute`).

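
One way to package the `&*x` conversion is to check for null first and
state the remaining obligations in a safety comment. The sketch below
assumes a current stable toolchain, and `checked_ref` is a hypothetical
helper rather than a library function. The null check only rules out one
kind of invalid pointer; validity, alignment and aliasing are still the
caller's responsibility, which is why the function itself is `unsafe`.

```rust
/// Converts a raw pointer into a shared reference, returning `None` for null.
///
/// # Safety
///
/// If `ptr` is non-null it must point to a valid, aligned, initialised `u32`
/// that is not mutated for as long as the returned reference is in use.
pub unsafe fn checked_ref<'a>(ptr: *const u32) -> Option<&'a u32> {
    if ptr.is_null() {
        None
    } else {
        // The `&*x` style: only compiles because `ptr` really is a pointer.
        Some(unsafe { &*ptr })
    }
}
```

A call site looks like `unsafe { checked_ref(&value) }`, relying on the
implicit `&T` to `*const T` coercion described above.
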
## Making the unsafe safe(r)

There are various ways to expose a safe interface around some unsafe
code:

- store pointers privately (i.e. not in public fields of public
  structs), so that you can see and control all reads and writes to
  the pointer in one place.
- use `assert!()` a lot: since you can't rely on the protection of the
  compiler and type system to ensure that your `unsafe` code is correct
  at compile-time, use `assert!()` to verify that it is doing the
  right thing at run-time.
- implement the `Drop` trait for resource clean-up via a destructor, and use
  RAII (Resource Acquisition Is Initialization). This reduces the need
  for any manual memory management by users, and automatically ensures
  that clean-up is always run, even when the task fails.
- ensure that any data stored behind a raw pointer is destroyed at the
  appropriate time.

As an example, we give a reimplementation of owned boxes by wrapping
`malloc` and `free`. Rust's move semantics and lifetimes mean this
reimplementation is as safe as the `Box` type.

```
#![feature(unsafe_destructor)]

extern crate libc;
use libc::{c_void, size_t, malloc, free};
use std::mem;
use std::ptr;

// Define a wrapper around the handle returned by the foreign code.
// Unique<T> has the same semantics as Box<T>
pub struct Unique<T> {
    // It contains a single raw, mutable pointer to the object in question.
    ptr: *mut T
}

// Implement methods for creating and using the values in the box.

// NB: For simplicity and correctness, we require that T has kind Send
// (owned boxes relax this restriction, and can contain managed (GC) boxes).
// This is because, as implemented, the garbage collector would not know
// about any shared boxes stored in the malloc'd region of memory.
impl<T: Send> Unique<T> {
    pub fn new(value: T) -> Unique<T> {
        unsafe {
            let ptr = malloc(mem::size_of::<T>() as size_t) as *mut T;
            // we *need* a valid pointer.
            assert!(!ptr.is_null());
            // `*ptr` is uninitialized, and `*ptr = value` would
            // attempt to destroy it; `ptr::write` moves a value into
            // this memory without attempting to drop the original
            // value.
            ptr::write(&mut *ptr, value);
            Unique { ptr: ptr }
        }
    }

    // the 'r lifetime results in the same semantics as `&*x` with
    // Box<T>
    pub fn borrow<'r>(&'r self) -> &'r T {
        // By construction, self.ptr is valid
        unsafe { &*self.ptr }
    }

    // the 'r lifetime results in the same semantics as `&mut *x` with
    // Box<T>
    pub fn borrow_mut<'r>(&'r mut self) -> &'r mut T {
        unsafe { &mut *self.ptr }
    }
}

// A key ingredient for safety, we associate a destructor with
// Unique<T>, making the struct manage the raw pointer: when the
// struct goes out of scope, it will automatically free the raw pointer.
//
// NB: This is an unsafe destructor, because rustc will not normally
// allow destructors to be associated with parameterized types, due to
// bad interaction with managed boxes. (With the Send restriction,
// we don't have this problem.) Note that the `#[unsafe_destructor]`
// feature gate is required to use unsafe destructors.
#[unsafe_destructor]
impl<T: Send> Drop for Unique<T> {
    fn drop(&mut self) {
        unsafe {
            // Copy the object out from the pointer onto the stack,
            // where it is covered by normal Rust destructor semantics
            // and cleans itself up, if necessary
            ptr::read(self.ptr as *const T);

            // clean-up our allocation
            free(self.ptr as *mut c_void)
        }
    }
}

// A comparison between the built-in `Box` and this reimplementation
fn main() {
    {
        let mut x = box 5i;
        *x = 10;
    } // `x` is freed here

    {
        let mut y = Unique::new(5i);
        *y.borrow_mut() = 10;
    } // `y` is freed here
}
```

Notably, the only way to construct a `Unique` is via the `new`
function, and this function ensures that the internal pointer is valid
and hidden in the private field. The two `borrow` methods are safe
because the compiler statically guarantees that objects are never used
before creation or after destruction (unless you use some `unsafe`
code).

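
For readers on a current stable toolchain, the same design can be
sketched without `libc` or feature gates. This is an adaptation under
stated assumptions, not the listing above: it swaps `malloc`/`free` for
`std::alloc::alloc`/`dealloc`, drops the `Send` bound and the
`#[unsafe_destructor]` attribute (neither is needed today), and rejects
zero-sized types for simplicity.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

pub struct Unique<T> {
    // Private field: every read and write of the pointer is in this module.
    ptr: *mut T,
}

impl<T> Unique<T> {
    pub fn new(value: T) -> Unique<T> {
        let layout = Layout::new::<T>();
        // Assumption of this sketch: `alloc` must not be asked for 0 bytes.
        assert!(layout.size() != 0, "zero-sized types are not supported");
        unsafe {
            let ptr = alloc(layout) as *mut T;
            if ptr.is_null() {
                handle_alloc_error(layout);
            }
            // Move `value` into the uninitialised memory without dropping
            // whatever bytes happened to be there.
            ptr::write(ptr, value);
            Unique { ptr }
        }
    }

    pub fn borrow(&self) -> &T {
        // By construction, `self.ptr` is valid for as long as `self` lives.
        unsafe { &*self.ptr }
    }

    pub fn borrow_mut(&mut self) -> &mut T {
        unsafe { &mut *self.ptr }
    }
}

impl<T> Drop for Unique<T> {
    fn drop(&mut self) {
        unsafe {
            // Run the destructor of the stored value, then free the memory.
            ptr::drop_in_place(self.ptr);
            dealloc(self.ptr as *mut u8, Layout::new::<T>());
        }
    }
}
```

Usage mirrors the comparison above: `let mut y = Unique::new(5); *y.borrow_mut() = 10;`,
with the allocation released when `y` goes out of scope.
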
# Inline assembly

For extremely low-level manipulations and performance reasons, one
might wish to control the CPU directly. Rust supports using inline
assembly to do this via the `asm!` macro. The syntax roughly matches
that of GCC & Clang:

```
asm!(assembly template
   : output operands : input operands : clobbers : options);
```

Any use of `asm` is feature gated (requires `#![feature(asm)]` on the
crate to allow) and of course requires an `unsafe` block.

> **Note**: the examples here are given in x86/x86-64 assembly, but
> all platforms are supported.

The `assembly template` is the only required parameter and must be a
literal string (i.e. `""`):

```
#![feature(asm)]
#[cfg(target_arch = "x86")]
#[cfg(target_arch = "x86_64")]
fn foo() {
    unsafe { asm!("NOP"); }
}
// other platforms
#[cfg(not(target_arch = "x86"),
      not(target_arch = "x86_64"))]
fn foo() { /* ... */ }
```

(The `feature(asm)` and `#[cfg]`s are omitted from now on.)

Output operands, input operands, clobbers and options are all optional
but you must add the right number of `:` if you skip them:

```
# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
# fn main() { unsafe {
asm!("xor %eax, %eax"
    :
    :
    : "eax");
# } }
```

Whitespace also doesn't matter:

```
# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
# fn main() { unsafe {
asm!("xor %eax, %eax" ::: "eax");
# } }
```

Input and output operands follow the same format: `:
"constraints1"(expr1), "constraints2"(expr2), ..."`. Output operand
expressions must be mutable lvalues:

```
# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
fn add(a: int, b: int) -> int {
    let mut c = 0;
    unsafe { asm!("add $2, $0" : "=r"(c) : "0"(a), "r"(b)); }
    c
}
# #[cfg(not(target_arch = "x86"), not(target_arch = "x86_64"))]
# fn add(a: int, b: int) -> int { a + b }
fn main() { assert_eq!(add(3, 14159), 14162) }
```

Some instructions modify registers which might otherwise have held
different values, so we use the clobbers list to indicate to the
compiler not to assume any values loaded into those registers will
stay valid.

```
# #[cfg(target_arch = "x86")] #[cfg(target_arch = "x86_64")]
# fn main() { unsafe {
// Put the value 0x200 in eax
asm!("mov $$0x200, %eax" : /* no outputs */ : /* no inputs */ : "eax");
# } }
```

Input and output registers need not be listed since that information
is already communicated by the given constraints. Otherwise, any other
registers used either implicitly or explicitly should be listed.

If the assembly changes the condition code register, `cc` should be
specified as one of the clobbers. Similarly, if the assembly modifies
memory, `memory` should also be specified.

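
On current stable toolchains the macro lives in `std::arch` and uses a
format-string operand syntax instead of the colon-separated sections
described in this guide, so the sketch below is a translation rather than
the form shown above. It assumes an x86-64 target for the assembly path
and falls back to plain Rust elsewhere. Operands are declared with
`in(reg)` / `inout(reg)`, Intel syntax is the default on x86, and the
condition flags are treated as clobbered unless `preserves_flags` is given.

```rust
/// Adds two integers with an `add` instruction on x86-64.
#[cfg(target_arch = "x86_64")]
pub fn add(a: u64, b: u64) -> u64 {
    use std::arch::asm;

    let mut sum = a;
    unsafe {
        asm!(
            // `sum` is both read and written; `b` is read only.
            "add {sum}, {b}",
            sum = inout(reg) sum,
            b = in(reg) b,
            // No memory access, no stack use, result depends only on inputs.
            options(pure, nomem, nostack)
        );
    }
    sum
}

/// Portable fallback for other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Because the register allocator picks the registers for `reg` operands,
there is no explicit clobber list here; a fixed register would be named
with an operand such as `out("rax") _`.
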
The last section, `options`, is specific to Rust. The format is comma
separated literal strings (i.e. `:"foo", "bar", "baz"`). It's used to
specify some extra info about the inline assembly.

Current valid options are:

1. **volatile** - specifying this is analogous to `__asm__ __volatile__ (...)` in gcc/clang.
2. **alignstack** - certain instructions expect the stack to be
   aligned a certain way (e.g. SSE) and specifying this indicates to
   the compiler to insert its usual stack alignment code.
3. **intel** - use Intel syntax instead of the default AT&T.

# Avoiding the standard library

By default, `std` is linked to every Rust crate. In some contexts,
this is undesirable, and can be avoided with the `#![no_std]`
attribute attached to the crate.

```
#![no_std]
# // fn main() {} tricked you, rustdoc!
```

Obviously there's more to life than just libraries: one can use
`#![no_std]` with an executable. Controlling the entry point is
possible in two ways: the `#[start]` attribute, or overriding the
default shim for the C `main` function with your own.

The function marked `#[start]` is passed the command line parameters
in the same format as C:

```
#![no_std]
#![feature(lang_items)]

// Pull in the system libc library for what crt0.o likely requires
extern crate libc;

// Entry point for this program
#[start]
fn start(_argc: int, _argv: *const *const u8) -> int {
    0
}

// These functions and traits are used by the compiler, but not
// for a bare-bones hello world. These are normally
// provided by libstd.
#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "sized"] trait Sized { }
# // fn main() {} tricked you, rustdoc!
```

To override the compiler-inserted `main` shim, one has to disable it
with `#![no_main]` and then create the appropriate symbol with the
correct ABI and the correct name, which requires overriding the
compiler's name mangling too:

```
#![no_std]
#![no_main]
#![feature(lang_items)]

extern crate libc;

#[no_mangle] // ensure that this symbol is called `main` in the output
pub extern fn main(argc: int, argv: *const *const u8) -> int {
    0
}

#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "sized"] trait Sized { }
# // fn main() {} tricked you, rustdoc!
```

The compiler currently makes a few assumptions about symbols which are available
in the executable to call. Normally these functions are provided by the standard
library, but without it you must define your own.

The first of these two functions, `stack_exhausted`, is invoked whenever stack
overflow is detected. This function has a number of restrictions about how it
can be called and what it must do, but if the stack limit register is not being
maintained then a task always has an "infinite stack" and this function
shouldn't get triggered.

The second of these two functions, `eh_personality`, is used by the failure
mechanisms of the compiler. This is often mapped to GCC's personality function
(see the [libstd implementation](std/rt/unwind/index.html) for more
information), but crates which do not trigger failure can be assured that this
function is never called.

The final item in the example is a trait called `Sized`. This is a trait
that represents data of a known static size: it is integral to the
Rust type system, and so the compiler expects the standard library to
provide it. Since you are not using the standard library, you have to
define it yourself.

> **Note**: the core library's structure is unstable, and it is recommended to
> use the standard library instead wherever possible.

With the above techniques, we've got a bare-metal executable running some Rust
code. There is a good deal of functionality provided by the standard library,
however, that is necessary to be productive in Rust. If the standard library is
not sufficient, then [libcore](core/index.html) is designed to be used
instead.

The core library has very few dependencies and is much more portable than the
standard library itself. Additionally, the core library has most of the
necessary functionality for writing idiomatic and effective Rust code.

As an example, here is a program that will calculate the dot product of two
vectors provided from C, using idiomatic Rust practices.

```
#![no_std]
#![feature(globs)]
#![feature(lang_items)]

# extern crate libc;
extern crate core;

use core::prelude::*;
use core::mem;

#[no_mangle]
pub extern fn dot_product(a: *const u32, a_len: u32,
                          b: *const u32, b_len: u32) -> u32 {
    use core::raw::Slice;

    // Convert the provided arrays into Rust slices.
    // The core::raw module guarantees that the Slice
    // structure has the same memory layout as a &[T]
    // slice.
    //
    // This is an unsafe operation because the compiler
    // cannot tell the pointers are valid.
    let (a_slice, b_slice): (&[u32], &[u32]) = unsafe {
        mem::transmute((
            Slice { data: a, len: a_len as uint },
            Slice { data: b, len: b_len as uint },
        ))
    };

    // Iterate over the slices, collecting the result
    let mut ret = 0;
    for (i, j) in a_slice.iter().zip(b_slice.iter()) {
        ret += (*i) * (*j);
    }
    return ret;
}

#[lang = "begin_unwind"]
extern fn begin_unwind(args: &core::fmt::Arguments,
                       file: &str,
                       line: uint) -> ! {
    loop {}
}

#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
# #[start] fn start(argc: int, argv: *const *const u8) -> int { 0 }
# fn main() {}
```

Note that there is one extra lang item here which differs from the examples
above, `begin_unwind`. This must be defined by consumers of libcore because the
core library declares failure, but it does not define it. The `begin_unwind`
lang item is this crate's definition of failure, and it must be guaranteed to
never return.

As can be seen in this example, the core library is intended to provide the
power of Rust in all circumstances, regardless of platform requirements. Further
libraries, such as liballoc, add functionality to libcore and make other
platform-specific assumptions, but continue to be more portable than the
standard library itself.

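
On a current stable toolchain the same dot product can be written
without transmuting a `core::raw::Slice`. The sketch below is an
adaptation, not the listing above: it uses `core::slice::from_raw_parts`,
takes lengths as `usize`, marks the function `unsafe` because it trusts
its pointer arguments, and uses wrapping arithmetic so that overflow does
not panic. It uses only `core` items, but the crate-level `#![no_std]`
setup (panic handler and so on) is left out.

```rust
use core::slice;

/// Computes the dot product of two C arrays.
///
/// # Safety
///
/// `a` must point to `a_len` readable, aligned `u32` values and `b` to
/// `b_len` such values; both pointers must be non-null even for length 0.
pub unsafe extern "C" fn dot_product(
    a: *const u32,
    a_len: usize,
    b: *const u32,
    b_len: usize,
) -> u32 {
    // Convert the provided arrays into Rust slices. The compiler cannot
    // check that the pointers and lengths are valid, hence `unsafe`.
    let (a_slice, b_slice) = unsafe {
        (slice::from_raw_parts(a, a_len), slice::from_raw_parts(b, b_len))
    };

    // Iterate over the slices pairwise; `zip` stops at the shorter one.
    a_slice
        .iter()
        .zip(b_slice.iter())
        .fold(0u32, |acc, (x, y)| acc.wrapping_add(x.wrapping_mul(*y)))
}
```

To export the symbol under its unmangled name for C callers, a
`no_mangle` attribute would be added; its exact spelling depends on the
edition in use.
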
# Interacting with the compiler internals

> **Note**: this section is specific to the `rustc` compiler; these
> parts of the language may never be fully specified and so details may
> differ wildly between implementations (and even versions of `rustc`
> itself).
>
> Furthermore, this is just an overview; the best form of
> documentation for specific instances of these features is their
> definitions and uses in `std`.

The Rust language currently has two orthogonal mechanisms for allowing
libraries to interact directly with the compiler and vice versa:

- intrinsics, functions built directly into the compiler providing
  very basic low-level functionality,
- lang-items, special functions, types and traits in libraries marked
  with specific `#[lang]` attributes.

## Intrinsics

> **Note**: intrinsics will forever have an unstable interface; it is
> recommended to use the stable interfaces of libcore rather than intrinsics
> directly.

These are imported as if they were FFI functions, with the special
`rust-intrinsic` ABI. For example, if one was in a freestanding
context, but wished to be able to `transmute` between types, and
perform efficient pointer arithmetic, one would import those functions
via a declaration like:

```
# #![feature(intrinsics)]
# fn main() {}

extern "rust-intrinsic" {
    fn transmute<T, U>(x: T) -> U;

    fn offset<T>(dst: *const T, offset: int) -> *const T;
}
```

As with any other FFI functions, these are always `unsafe` to call.

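
Following the recommendation to prefer stable interfaces, here is a
sketch of the same two operations through their library wrappers on a
current stable toolchain: `std::mem::transmute` and the pointer `add`
method. `float_bits` and `nth` are hypothetical helpers written for this
illustration.

```rust
use std::mem;

/// Reinterprets the bits of an `f32` as a `u32`.
pub fn float_bits(x: f32) -> u32 {
    // Sound because both types are 4 bytes and every bit pattern is a
    // valid `u32`; `transmute` refuses to compile if the sizes differ.
    unsafe { mem::transmute::<f32, u32>(x) }
}

/// Reads element `index` through pointer arithmetic, with a bounds check.
pub fn nth(values: &[u8], index: usize) -> Option<u8> {
    if index < values.len() {
        // SAFETY: `index` is in bounds, so the offset pointer stays inside
        // the slice and points to an initialised `u8`.
        Some(unsafe { *values.as_ptr().add(index) })
    } else {
        None
    }
}
```

For this particular conversion the safe method `f32::to_bits` gives the
same result and is the better choice in real code; `transmute` is shown
only because it is the operation named above.
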
## Lang items

> **Note**: lang items are often provided by crates in the Rust distribution,
> and lang items themselves have an unstable interface. It is recommended to use
> officially distributed crates instead of defining your own lang items.

The `rustc` compiler has certain pluggable operations, that is,
functionality that isn't hard-coded into the language, but is
implemented in libraries, with a special marker to tell the compiler
it exists. The marker is the attribute `#[lang="..."]` and there are
various different values of `...`, i.e. various different "lang
items".

For example, `Box` pointers require two lang items, one for allocation
and one for deallocation. Here is a freestanding program that uses the `Box`
sugar for dynamic allocations via `malloc` and `free`:

```
#![no_std]
#![feature(lang_items)]

extern crate libc;

extern { fn abort() -> !; }

#[lang="exchange_malloc"]
unsafe fn allocate(size: uint, _align: uint) -> *mut u8 {
    let p = libc::malloc(size as libc::size_t) as *mut u8;
    // malloc failed
    if p as uint == 0 { abort(); }
    p
}

#[lang="exchange_free"]
unsafe fn deallocate(ptr: *mut u8, _size: uint, _align: uint) {
    libc::free(ptr as *mut libc::c_void)
}

#[start]
fn main(argc: int, argv: *const *const u8) -> int {
    let x = box 1i;
    0
}

#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "sized"] trait Sized {}
```

Note the use of `abort`: the `exchange_malloc` lang item is assumed to
return a valid pointer, and so needs to do the check internally.

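
On current stable toolchains, plugging in an allocator for `Box` does
not require lang items: the `#[global_allocator]` attribute together with
the `GlobalAlloc` trait serves that purpose. The sketch below is a
different mechanism from the listing above, shown under that assumption;
`CountingAllocator`, `ALLOCATIONS` and `allocation_count` are hypothetical
names, and the actual work is delegated to the system allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation requests seen so far.
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

pub struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Unlike `exchange_malloc`, this hook may return null on failure;
        // callers such as `Box` handle that case themselves.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Route every heap allocation in the program through the wrapper.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Convenience accessor for the counter.
pub fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```

After this declaration, an ordinary `Box::new(1)` goes through
`CountingAllocator::alloc`, which can be observed by comparing
`allocation_count()` before and after.
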
Other features provided by lang items include:

- overloadable operators via traits: the traits corresponding to the
  `==`, `<`, dereferencing (`*`) and `+` (etc.) operators are all
  marked with lang items; those specific four are `eq`, `ord`,
  `deref`, and `add` respectively.
- stack unwinding and general failure; the `eh_personality`, `fail_`
  and `fail_bounds_checks` lang items.
- the traits in `std::kinds` used to indicate types that satisfy
  various kinds; lang items `send`, `sync` and `copy`.
- the marker types and variance indicators found in
  `std::kinds::markers`; lang items `covariant_type`,
  `contravariant_lifetime`, `no_sync_bound`, etc.

Lang items are loaded lazily by the compiler; e.g. if one never uses
`Box` then there is no need to define functions for `exchange_malloc`
and `exchange_free`. `rustc` will emit an error when an item is needed
but not found in the current crate or any that it depends on.

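
From the user's side, the operator lang items in the list above are
reached simply by implementing the corresponding library traits. A small
sketch for a current stable toolchain follows; `Metres` is a hypothetical
type, and the trait names (`PartialEq`, `PartialOrd`, `Deref`, `Add`) are
today's spellings rather than the lang item names quoted above.

```rust
use std::ops::{Add, Deref};

/// A length in metres; `==` and `<` come from the derived traits.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Metres(pub f64);

// `a + b` on two `Metres` values resolves to this implementation.
impl Add for Metres {
    type Output = Metres;

    fn add(self, other: Metres) -> Metres {
        Metres(self.0 + other.0)
    }
}

// `*m` on a `Metres` value yields the wrapped `f64`.
impl Deref for Metres {
    type Target = f64;

    fn deref(&self) -> &f64 {
        &self.0
    }
}
```

With these in place, `Metres(1.5) + Metres(2.0) == Metres(3.5)` and
`*Metres(4.0)` both compile, because the compiler maps each operator to
the trait marked with the matching lang item.
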