auto merge of #11498 : c-a/rust/optimize_vuint_at, r=alexcrichton
Use a lookup table, SHIFT_MASK_TABLE, that for every possible four
bit prefix holds the number of times the value should be right shifted and what
the right shifted value should be masked with. This way we can get rid of the
branches which in my testing gives approximately a 2x speedup.
Timings on Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz
-- Before --
running 5 tests
test ebml::tests::test_vuint_at ... ok
test ebml::bench::vuint_at_A_aligned ... bench: 494 ns/iter (+/- 3)
test ebml::bench::vuint_at_A_unaligned ... bench: 494 ns/iter (+/- 4)
test ebml::bench::vuint_at_D_aligned ... bench: 467 ns/iter (+/- 5)
test ebml::bench::vuint_at_D_unaligned ... bench: 467 ns/iter (+/- 5)
-- After --
running 5 tests
test ebml::tests::test_vuint_at ... ok
test ebml::bench::vuint_at_A_aligned ... bench: 181 ns/iter (+/- 2)
test ebml::bench::vuint_at_A_unaligned ... bench: 192 ns/iter (+/- 1)
test ebml::bench::vuint_at_D_aligned ... bench: 181 ns/iter (+/- 3)
test ebml::bench::vuint_at_D_unaligned ... bench: 197 ns/iter (+/- 6)