This pull request converts the scheduler from a naive shared queue scheduler to a naive workstealing scheduler. The deque is still a queue inside a lock, but there is still a substantial performance gain. Fiddling with the messaging benchmark I got a ~10x speedup and observed massively reduced memory usage.
There are still *many* locations for optimization, but based on my experience so far it is a clear performance win as it is now.