auto merge of #13539 : Aatch/rust/vector-copy-faster, r=thestinger
LLVM wasn't recognising the vector copy loops as memcpy loops and was therefore failing to optimise them properly. While improving LLVM is the "proper" way to fix this, I think that these cases are important enough to warrant a little low-level optimisation.
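
To illustrate the kind of change described above, here is a minimal sketch in present-day Rust. It is not the code from this PR (the diff is not shown here). `copy_loop` and `copy_bulk` are hypothetical names, and the sketch assumes `T: Copy` so that a raw byte copy is valid.

```rust
use std::ptr;

// Element-by-element copy. Whether this is fast depends on the optimiser
// recognising the loop as a memcpy idiom.
fn copy_loop<T: Copy>(src: &[T]) -> Vec<T> {
    let mut dst = Vec::with_capacity(src.len());
    for &x in src {
        dst.push(x);
    }
    dst
}

// Explicit bulk copy. This asks for a memcpy-style copy directly instead of
// hoping the loop above gets recognised.
fn copy_bulk<T: Copy>(src: &[T]) -> Vec<T> {
    let mut dst: Vec<T> = Vec::with_capacity(src.len());
    unsafe {
        // `dst` has capacity for `src.len()` elements and is a fresh
        // allocation, so the two regions cannot overlap.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
        // Every element in 0..src.len() is now initialised.
        dst.set_len(src.len());
    }
    dst
}
```

Both functions return the same contents for the same input; only the way the copy is expressed differs.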
Fixes #13472
r? @thestinger
---
Benchmark Results:
```
--- Before ---
test clone_owned ... bench: 6126104 ns/iter (+/- 285962) = 170 MB/s
test clone_owned_to_owned ... bench: 6125054 ns/iter (+/- 271197) = 170 MB/s
test clone_str ... bench: 80586 ns/iter (+/- 11489) = 13011 MB/s
test clone_vec ... bench: 3903220 ns/iter (+/- 658556) = 268 MB/s
test test_memcpy ... bench: 69401 ns/iter (+/- 2168) = 15108 MB/s
--- After ---
test clone_owned ... bench: 70839 ns/iter (+/- 4931) = 14801 MB/s
test clone_owned_to_owned ... bench: 70286 ns/iter (+/- 4836) = 14918 MB/s
test clone_str ... bench: 78519 ns/iter (+/- 5511) = 13353 MB/s
test clone_vec ... bench: 71415 ns/iter (+/- 1999) = 14682 MB/s
test test_memcpy ... bench: 70980 ns/iter (+/- 2126) = 14772 MB/s
```