auto merge of #15471 : erickt/rust/push_all, r=acrichto
llvm is currently not able to conver `Vec::extend` into a memcpy for `Copy` types, which results in methods like `Vec::push_all` to run twice as slow as it should be running. This patch takes the unsafe `Vec::clone` optimization to speed up all the operations that are cloning a slice into a `Vec`.
auto merge of #15283 : kwantam/rust/master, r=alexcrichton
Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
- Unicode-aware functions now live in libunicode
- `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`,
`is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`,
`to_uppercase`, `to_lowercase`
- added `width` method in UnicodeChar trait
- determines printed width of character in columns, or None if it is a non-NULL control character
- takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
- functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
- `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
- also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
- thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`
The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.
auto merge of #15220 : vhbit/rust/treemap-str-equiv, r=alexcrichton
- it allows to lookup using any str-equiv object, making TreeMaps finally usable (for example, it is much easier to work with JSON with lookup values being static strs)
- actually provides pretty flexible solution which could be extended to other equivalent types (although it might be not that performant)
- unicode tests live in coretest crate
- libcollections str tests need UnicodeChar trait.
- libregex perlw tests were checking a char in the Alphabetic category,
\x2161. Confirmed perl 5.18 considers this a \w character. Changed to
\x2961, which is not \w as the test expects.
auto merge of #15540 : Gankro/rust/master, r=huonw
Removing recursion from TreeMap implementation, because we don't have TCO. No need to add ```O(logn)``` extra stack frames to search in a tree.
I find it curious that ```find_mut``` and ```find``` basically duplicated the same logic, but in different ways (iterative vs recursive), possibly to maneuvre around mutability rules, but that's a more fundamental issue to deal with elsewhere.
Thanks to acrichto for the magic trick to appease borrowck (another issue to deal with elsewhere).
auto merge of #15530 : adrientetar/rust/proper-fonts, r=alexcrichton
- Treat WOFF as binary files so that git does not perform newline normalization.
- Replace corrupt Heuristica files with Source Serif Pro — italics are [almost in production](https://github.com/adobe/source-serif-pro/issues/2) so I left Heuristica Italic which makes a good pair with SSP. Overall, Source Serif Pro is I think a better fit for rustdoc (cc @TheHydroImpulse). This ought to fix #15527.
- Store Source Code Pro locally in order to make offline docs freestanding. Fixes #14778.
lexer: lex WS/COMMENT/SHEBANG rather than skipping
Now, the lexer will categorize every byte in its input according to the
grammar. The parser skips over these while parsing, thus avoiding their
presence in the input to syntax extensions.
Corey Richardson [Wed, 18 Jun 2014 17:44:20 +0000 (10:44 -0700)]
syntax: don't parse numeric literals in the lexer
This removes a bunch of token types. Tokens now store the original, unaltered
numeric literal (that is still checked for correctness), which is parsed into
an actual number later, as needed, when creating the AST.
This can change how syntax extensions work, but otherwise poses no visible
changes.
syntax: don't process string/char/byte/binary lits
This shuffles things around a bit so that LIT_CHAR and co store an Ident
which is the original, unaltered literal in the source. When creating the AST,
unescape and postprocess them.
This changes how syntax extensions can work, slightly, but otherwise poses no
visible changes. To get a useful value out of one of these tokens, call
`parse::{char_lit, byte_lit, bin_lit, str_lit}`
Corey Richardson [Tue, 17 Jun 2014 06:00:49 +0000 (23:00 -0700)]
syntax: use a better Show impl for Ident
Rather than just dumping the id in the interner, which is useless, actually
print the interned string. Adjust the lexer logging to use Show instead of
Poly.
auto merge of #15537 : jbclements/rust/hygiene-for-methods, r=pcwalton
This patch adds hygiene for methods. This one was more difficult than the others, due principally to issues surrounding `self`. Specifically, there were a whole bunch of places in the code that assumed that a `self` identifier could be discarded and then made up again later, causing the discard of contexts and hygiene breakage.
auto merge of #15374 : steveklabnik/rust/comments, r=brson
I'm leaving off `rustdoc` usage because it won't work unless this is a `pub fn`, and I want to talk about public/private in the context of modules. I'm also not mentioning `//!` because it is exclusively used to provide the overview of a module.
John Clements [Sun, 6 Jul 2014 22:10:57 +0000 (15:10 -0700)]
carry self ident forward through re-parsing
formerly, the self identifier was being discarded during parsing, which
stymies hygiene. The best fix here seems to be to attach a self identifier
to ExplicitSelf_, a change that rippled through the rest of the compiler,
but without any obvious damage.
John Clements [Mon, 7 Jul 2014 16:54:08 +0000 (09:54 -0700)]
introducing let-syntax
The let-syntax expander is different in that it doesn't apply
a mark to its token trees before expansion. This is used
for macro_rules, and it's because macro_rules is essentially
MTWT's let-syntax. You don't want to mark before expand sees
let-syntax, because there's no "after" syntax to mark again.
In some sense, the cleaner approach might be to introduce a new
AST node that macro_rules expands into; this would make it clearer
that the expansion of a macro is distinct from the addition of a
new macro binding.
auto merge of #14832 : alexcrichton/rust/no-rpath, r=brson
This commit disables rustc's emission of rpath attributes into dynamic libraries
and executables by default. The functionality is still preserved, but it must
now be manually enabled via a `-C rpath` flag.
This involved a few changes to the local build system:
* --disable-rpath is now the default configure option
* Makefiles now prefer our own LD_LIBRARY_PATH over the user's LD_LIBRARY_PATH
in order to support building rust with rust already installed.
* The compiletest program was taught to correctly pass through the aux dir as a
component of LD_LIBRARY_PATH in more situations.
The major impact of this change is that neither rustdoc nor rustc will work
out-of-the-box in all situations because they are dynamically linked. It must be
arranged to ensure that the libraries of a rust installation are part of the
LD_LIBRARY_PATH. The default installation paths for all platforms ensure this,
but if an installation is in a nonstandard location, then configuration may be
necessary.
Additionally, for all developers of rustc, it will no longer be possible to run
$target/stageN/bin/rustc out-of-the-box. The old behavior can be regained
through the `--enable-rpath` option to the configure script.
This change brings linux/mac installations in line with windows installations
where rpath is not possible.
auto merge of #15493 : brson/rust/tostr, r=pcwalton
This updates https://github.com/rust-lang/rust/pull/15075.
Rename `ToStr::to_str` to `ToString::to_string`. The naive renaming ends up with two `to_string` functions defined on strings in the prelude (the other defined via `collections::str::StrAllocating`). To remedy this I removed `StrAllocating::to_string`, making all conversions from `&str` to `String` go through `Show`. This has a measurable impact on the speed of this conversion, but the sense I get from others is that it's best to go ahead and unify `to_string` and address performance for all `to_string` conversions in `core::fmt`. `String::from_str(...)` still works as a manual fast-path.
Note that the patch was done with a script, and ended up renaming a number of other `*_to_str` functions, particularly inside of rustc. All the ones I saw looked correct, and I didn't notice any additional API breakage.
kwantam [Mon, 30 Jun 2014 21:04:10 +0000 (17:04 -0400)]
Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
- Unicode-aware functions now live in libunicode
- is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
is_uppercase, is_whitespace, is_alphanumeric, is_control,
is_digit, to_uppercase, to_lowercase
- added width method in UnicodeChar trait
- determines printed width of character in columns, or None if it is
a non-NULL control character
- takes a boolean argument indicating whether the present context is
CJK or not (characters with 'A'mbiguous widths are double-wide in
CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
(libunicode)
- functionality formerly in StrSlice that relied upon Unicode
functionality from Char is now in UnicodeStrSlice
- words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
- also moved Words type alias into libunicode because words method is
in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
combining the libunicode functionality with the Char functionality
from libcore
- thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str
The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.
auto merge of #15411 : mitchmindtree/rust/master, r=alexcrichton
I ran `make check` and everything went smoothly. I also tested `#[deriving(Decodable, Encodable)]` on a struct containing both Cell<T> and RefCell<T> and everything now seems to work fine.
auto merge of #15464 : dotdash/rust/bool_stores, r=pcwalton
LLVM doesn't handle i1 value in allocas/memory very well and skips a number of optimizations if it hits it. So we have to do the same thing that Clang does, using i1 for SSA values, but storing i8 in memory.
Store booleans as i8 in memory to improve optimizations by LLVM
LLVM doesn't really like types with a bit-width that isn't a multiple of
8 and disable various optimizations if it encounters such types used
with loads/stores. OTOH, booleans must be represented as i1 when used as
SSA values. To get the best results, we must use i1 for SSA values, and
i8 when storing the value to memory.
By using range asserts on loads, LLVM can eliminate the required
zero-extend and truncate operations.
static TEXT: &'static str = "\
Unicode est un standard informatique qui permet des échanges \
de textes dans différentes langues, à un niveau mondial.";
#[bench]
fn old_push_byte(bencher: &mut Bencher) {
bencher.bytes = TEXT.len() as u64;
bencher.iter(|| {
let mut new = String::new();
for b in TEXT.bytes() {
unsafe { new.as_mut_vec().push_all([b]) }
}
})
}
#[bench]
fn new_push_byte(bencher: &mut Bencher) {
bencher.bytes = TEXT.len() as u64;
bencher.iter(|| {
let mut new = String::new();
for b in TEXT.bytes() {
unsafe { new.as_mut_vec().push(b) }
}
})
}
```
collections: Optimize Vec when cloning from a slice
llvm is currently not able to conver `Vec::extend` into a memcpy
for `Copy` types, which results in methods like `Vec::push_all`
to run twice as slow as it should be running. This patch takes
the unsafe `Vec::clone` optimization to speed up all the operations
that are cloning a slice into a `Vec`.