std: micro-optimize Vec constructors and add benchmarks
Inlining generally does not help much when constructing vectors,
except when the vector is zero-sized. This patch lets LLVM optimize
that case away in many situations, which shaves off 4-8ns. It's not
much, but it might help in some inner loop somewhere.
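A minimal sketch of the idea, written in present-day Rust rather than the code this patch touches. `with_capacity_sketch` is a hypothetical helper, not the std implementation. It assumes the gain comes from an inlinable constructor that checks for zero first, so a call with a constant zero can be reduced to the non-allocating path:

```rust
/// Hypothetical stand-in for a Vec constructor; not the std implementation.
/// `#[inline]` lets the caller's constant argument reach the branch below.
#[inline]
pub fn with_capacity_sketch<T>(capacity: usize) -> Vec<T> {
    if capacity == 0 {
        // Fast path: `Vec::new` does not allocate.
        Vec::new()
    } else {
        // General path: reserve room for at least `capacity` elements.
        Vec::with_capacity(capacity)
    }
}

fn main() {
    let empty: Vec<u32> = with_capacity_sketch(0);
    assert_eq!(empty.capacity(), 0);

    let five: Vec<u32> = with_capacity_sketch(5);
    assert!(five.capacity() >= 5);
}
```

Whether the branch is actually folded away depends on the optimization level and on the argument being visible as a constant at the call site.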
before:
running 12 tests
test bench_extend_0 ... bench: 123 ns/iter (+/- 6)
test bench_extend_5 ... bench: 323 ns/iter (+/- 11)
test bench_from_fn_0 ... bench: 7 ns/iter (+/- 0)
test bench_from_fn_5 ... bench: 49 ns/iter (+/- 6)
test bench_from_iter_0 ... bench: 11 ns/iter (+/- 0)
test bench_from_iter_5 ... bench: 176 ns/iter (+/- 11)
test bench_from_slice_0 ... bench: 8 ns/iter (+/- 1)
test bench_from_slice_5 ... bench: 73 ns/iter (+/- 5)
test bench_new ... bench: 0 ns/iter (+/- 0)
test bench_with_capacity_0 ... bench: 6 ns/iter (+/- 1)
test bench_with_capacity_100 ... bench: 41 ns/iter (+/- 3)
test bench_with_capacity_5 ... bench: 40 ns/iter (+/- 2)
after:
test bench_extend_0 ... bench: 123 ns/iter (+/- 7)
test bench_extend_5 ... bench: 339 ns/iter (+/- 27)
test bench_from_fn_0 ... bench: 7 ns/iter (+/- 0)
test bench_from_fn_5 ... bench: 54 ns/iter (+/- 4)
test bench_from_iter_0 ... bench: 11 ns/iter (+/- 1)
test bench_from_iter_5 ... bench: 182 ns/iter (+/- 16)
test bench_from_slice_0 ... bench: 4 ns/iter (+/- 0)
test bench_from_slice_5 ... bench: 62 ns/iter (+/- 3)
test bench_new ... bench: 0 ns/iter (+/- 0)
test bench_with_capacity_0 ... bench: 0 ns/iter (+/- 0)
test bench_with_capacity_100 ... bench: 41 ns/iter (+/- 1)
test bench_with_capacity_5 ... bench: 41 ns/iter (+/- 3)
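For readers who want to take comparable measurements without the test harness used above, here is a rough sketch using only the standard library. `ns_per_iter` is a hypothetical helper, and the timings it reports will not match the figures in this message. The capacity is deliberately left as a visible constant so the optimizer can still see it; only the result is passed through `black_box`.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: average wall-clock nanoseconds per call of `f`.
fn ns_per_iter<F: FnMut()>(iters: u32, mut f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

fn main() {
    let iters = 1_000_000;

    // Zero-capacity case: the constant argument stays visible to the optimizer.
    let zero = ns_per_iter(iters, || {
        let v: Vec<usize> = Vec::with_capacity(0);
        black_box(v); // keep the result alive so the call is not removed
    });

    // Small non-zero capacity for comparison; this path must allocate.
    let five = ns_per_iter(iters, || {
        let v: Vec<usize> = Vec::with_capacity(5);
        black_box(v);
    });

    println!("with_capacity_0: {zero:.1} ns/iter");
    println!("with_capacity_5: {five:.1} ns/iter");
}
```

Build with optimizations (for example `--release`) before comparing numbers; a debug build will not show the effect of inlining.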