Semantically there's actually no reason for us to spawn threads as part of the
call to `wait_with_output`, and that's generally an incredibly heavyweight
operation for just reading a few bytes (especially when stderr probably rarely
has bytes!). An equivalent operation in terms of what's implemented today would
be to just drain both pipes of all contents and then call `wait` on the child
process itself.
On Unix we can implement this through some convenient use of the `select`
function, whereas on Windows we can make use of overlapped I/O. Note that on
Windows this requires us to use named pipes instead of anonymous pipes, but
they're semantically the same under the hood.