When merging two sorted blocks `left` and `right`, if the last element
in `left` is <= the first element in `right`, the blocks are already in
order. Add a fast path for this case: copy the whole left block into the
output and advance the left pointer past it. The right block is then
copied by the existing logic in the merge loop, which already handles an
exhausted left run.
For data that is already perfectly sorted, this reduces the runtime of
.sort() to less than 50% of the previous runtime. Sorted data with a few
swapped elements is also sorted faster than before. The overhead of one
extra comparison per merge seems to be negligible.
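The idea can be sketched in safe Rust as follows. This is only an
illustration, not the code in the patch: `merge_runs` is a hypothetical
helper that works on slices and a `Vec` instead of raw pointers into a
temporary buffer, and it requires `T: Clone`, which the real
implementation does not.

```rust
use std::cmp::Ordering::{self, Greater};

/// Merge two sorted runs into `out`, preserving stability.
/// Hypothetical helper for illustration; the patch itself uses raw pointers.
fn merge_runs<T: Clone, F>(left: &[T], right: &[T], out: &mut Vec<T>, mut compare: F)
where
    F: FnMut(&T, &T) -> Ordering,
{
    let mut l = 0;
    let mut r = 0;

    // Fast path: if left[last] <= right[0], the runs are already in order.
    // Copy the whole left run; the loop below then copies the right run.
    // The check is skipped when either run is empty.
    if let (Some(last), Some(first)) = (left.last(), right.first()) {
        if compare(last, first) != Greater {
            out.extend_from_slice(left);
            l = left.len();
        }
    }

    while l < left.len() || r < right.len() {
        if l == left.len() {
            // Left run exhausted: copy the remainder of the right run.
            out.extend_from_slice(&right[r..]);
            break;
        } else if r == right.len() {
            // Right run exhausted: copy the remainder of the left run.
            out.extend_from_slice(&left[l..]);
            break;
        } else if compare(&left[l], &right[r]) != Greater {
            // Take from the left on ties to keep the merge stable.
            out.push(left[l].clone());
            l += 1;
        } else {
            out.push(right[r].clone());
            r += 1;
        }
    }
}
```

In this sketch, two runs that are already in order cost a single call to
`compare`; otherwise the fast path adds one comparison on top of the
regular merge.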
let mut out = buf_tmp.offset(start as isize);
let out_end = buf_tmp.offset(right_end_idx as isize);
+ // if left[last] <= right[0], they are already in order:
+ // fast-forward the left side (the right side is handled
+ // in the loop).
+ if compare(&*right.offset(-1), &*right) != Greater {
+     let elems = (right_start as usize - left as usize) / mem::size_of::<T>();
+     ptr::copy_nonoverlapping(&*left, out, elems);
+     out = out.offset(elems as isize);
+     left = right_start;
+ }
+
while out < out_end {
// Either the left or the right run are exhausted,
// so just copy the remainder from the other run