auto merge of #11866 : alexcrichton/rust/atomic-u64, r=brson
Let's try this again.
This is an implementation of mutexes which I believe is free from undefined behavior of OS mutexes (the pitfall of the previous implementation).
This implementation is not ideal. There's a yield-loop spot, and it's not particularly fair with respect to lockers who steal without going through the normal code paths. That being said, I believe that this is a correct implementation which is a stepping stone to move from.
I haven't done rigorous benchmarking of this mutex, but preliminary results show that it's about 25% slower in the uncontended case on linux (same runtime on OSX), and it's actually faster than a pthreads mutex on high contention (again, not rigorous benchmarking, I just saw these numbers come up).