Mark Adler [Mon, 18 Feb 2019 04:45:53 +0000 (20:45 -0800)]
Use ARM crc32 instructions if the ARM architecture has them.
The ARM crc32 instructions will be used only if an architecture is
explicitly specified at compile time that has those instructions.
For example, -march=armv8.1-a or -march=armv8-a+crc, or if the
machine being compiled on has the instructions, -march=native.
Mark Adler [Mon, 18 Feb 2019 03:48:57 +0000 (19:48 -0800)]
Add use of the ARMv8 crc32 instructions when requested.
Define the macro Z_ARM_CRC32 at compile time to use the ARMv8
(aarch64) crc32x and crc32b instructions. This code does not check
for the presence of the crc32 instructions. Those instructions are
optional for ARMv8.0, though mandatory for ARMv8.1 and later. The
use of the crc32 instructions is about ten times as fast as the
software braided calculation of the CRC-32. This can noticeably
speed up the decompression of gzip streams.
Mark Adler [Thu, 3 Jan 2019 02:10:40 +0000 (18:10 -0800)]
Don't bother computing check value after successful inflateSync().
inflateSync() is used to skip invalid deflate data, which means
that the check value that was being computed is no longer useful.
This commit turns off the check value computation, and furthermore
allows a successful return if the compressed data terminated in a
graceful manner. This commit also fixes a bug in the case that
inflateSync() is used before a header is ever processed. In that
case, there is no knowledge of a trailer, so the remainder is
treated as raw.
Mark Adler [Tue, 11 Dec 2018 09:11:38 +0000 (01:11 -0800)]
Speed up software CRC-32 computation by a factor of 1.5 to 3.
Use the interleaved method of Kadatch and Jenkins in order to make
use of pipelined instructions through multiple ALUs in a single
core. This also speeds up and simplifies the combination of CRCs,
and updates the functions to pre-calculate and use an operator for
CRC combination.
Mark Adler [Sun, 4 Nov 2018 18:31:46 +0000 (10:31 -0800)]
Add crc32_combine_gen() and crc32_combine_op() for fast combines.
When the same len2 is used repeatedly, it is faster to use
crc32_combine_gen() to generate an operator, that is then used to
combine CRCs with crc32_combine_op().
Mark Adler [Sat, 4 Aug 2018 21:27:02 +0000 (14:27 -0700)]
Clarify that prefix codes are counted in enough.c.
There is no assurance that all prefix codes are reachable as
optimal Huffman codes for the numbers of symbols encountered in
a deflate block. This code considers all possible prefix codes,
which might be a larger set than all possible Huffman codes,
depending on the constraints.
Mark Adler [Wed, 18 Apr 2018 05:09:22 +0000 (22:09 -0700)]
Fix a bug that can crash deflate on some input when using Z_FIXED.
This bug was reported by Danilo Ramos of Eideticom, Inc. It has
lain in wait 13 years before being found! The bug was introduced
in zlib 1.2.2.2, with the addition of the Z_FIXED option. That
option forces the use of fixed Huffman codes. For rare inputs with
a large number of distant matches, the pending buffer into which
the compressed data is written can overwrite the distance symbol
table which it overlays. That results in corrupted output due to
invalid distances, and can result in out-of-bound accesses,
crashing the application.
The fix here combines the distance buffer and literal/length
buffers into a single symbol buffer. Now three bytes of pending
buffer space are opened up for each literal or length/distance
pair consumed, instead of the previous two bytes. This assures
that the pending buffer cannot overwrite the symbol table, since
the maximum fixed code compressed length/distance is 31 bits, and
since there are four bytes of pending space for every three bytes
of symbol space.
Mark Adler [Sat, 3 Jun 2017 16:49:39 +0000 (09:49 -0700)]
Avoid the use of ptrdiff_t.
This isn't the right type anyway to assure that it contains a
pointer. That type would be intptr_t or uintptr_t. However the C99
standard says that those types are optional, so their use would not
be portable. This commit simply uses size_t or whatever configure
decided to use for size_t. That would be the same length as
ptrdiff_t, and so will work just as well. The code checks to see if
the length of the type used is the same as the length of a void
pointer, so there is already protection against the use of the
wrong type. The use of size_t (or ptrdiff_t) will almost always
work, as all modern architectures have an array size that is the
same as the pointer size. Only old segmented architectures would
have to fall back to the slower CRC-32 calculation, where the
amount of memory that can be accessed is larger than the maximum
array size.
Mark Adler [Sun, 16 Apr 2017 15:35:33 +0000 (08:35 -0700)]
Handle case where inflateSync used when header never processed.
If zlib and/or gzip header processing was requested, but a header
was never provided and inflateSync was used successfully, then the
inflate state would be inconsistent, trying to compute a check
value but with no flags set. This commit sets the inflate mode to
raw in this case, since there is no other assumption that can be
made if a header was requested but never seen.
Mark Adler [Sun, 5 Feb 2017 07:58:37 +0000 (23:58 -0800)]
Avoid a conversion error in gzseek when off_t type too small.
This is a problem in the odd case that the second argument of
LSEEK is a larger type than off_t. Apparently MinGW defines off_t
to be 32 bits, but _lseeki64 has a 64-bit second argument.
Also undo a previous commit to permit MinGW to use _lseeki64.
Mark Adler [Sat, 21 Jan 2017 09:50:26 +0000 (01:50 -0800)]
Limit hash table inserts after switch from stored deflate.
This limits hash table inserts to the available data in the window
and to the sliding window size in deflate_stored(). The hash table
inserts are deferred until deflateParams() switches to a non-zero
compression level.
Mark Adler [Mon, 16 Jan 2017 17:49:35 +0000 (09:49 -0800)]
Permit a deflateParams() parameter change as soon as possible.
This commit allows a parameter change even if the input data has
not all been compressed and copied to the application output
buffer, so long as all of the input data has been compressed to
the internal pending output buffer. This also allows an immediate
deflateParams change so long as there have been no deflate calls
since initialization or reset.
Mark Adler [Sun, 15 Jan 2017 16:22:16 +0000 (08:22 -0800)]
Permit immediate deflateParams changes before any deflate input.
This permits deflateParams to change the strategy and level right
after deflateInit, without having to wait until a header has been
written. The parameters can be changed immediately up until the
first deflate call that consumes any input data.
Mark Adler [Sun, 15 Jan 2017 16:15:55 +0000 (08:15 -0800)]
Update high water mark in deflate_stored.
This avoids unnecessary filling of bytes in the sliding window
buffer when switching from level zero to a non-zero level. This
also provides a consistent indication of deflate having taken
input for a later commit ...
Mark Adler [Tue, 3 Jan 2017 01:25:27 +0000 (17:25 -0800)]
Add warnings when compiling with assembler code.
There have been many reports of bugs in the assembler codes
intended to speed up deflate and inflate. They are third-party
contributions in contrib, and so are not supported by the zlib
maintainers.
Mark Adler [Sat, 31 Dec 2016 18:03:09 +0000 (10:03 -0800)]
Avoid the need for ssize_t.
Limit read() and write() requests to sizes that fit in an int.
This allows storing the return value in an int, and avoiding the
need to use or construct an ssize_t type. This is required for
Microsoft C, whose _read and _write functions take an unsigned
request and return an int.
Mark Adler [Sat, 3 Dec 2016 18:27:14 +0000 (10:27 -0800)]
Create z_size_t and z_ssize_t types.
Normally these are set to size_t and ssize_t. But if they do not
exist, then they are set to the smallest integer type that can
contain a pointer. size_t is unsigned and ssize_t is signed.
Mark Adler [Sat, 3 Dec 2016 16:29:57 +0000 (08:29 -0800)]
Don't need to emit an empty fixed block when changing parameters.
gzsetparams() was using Z_PARTIAL_FLUSH when it could use Z_BLOCK
instead. This commit uses Z_BLOCK, which avoids emitting an
unnecessary ten bits into the stream.
Mark Adler [Sat, 3 Dec 2016 16:18:56 +0000 (08:18 -0800)]
Clean up gz* function return values.
In some cases the return values did not match the documentation,
or the documentation did not document all of the return values.
gzprintf() now consistently returns negative values on error,
which matches the behavior of the stdio fprintf() function.
Mark Adler [Sat, 5 Nov 2016 15:43:29 +0000 (08:43 -0700)]
Speed up deflation for level 0 (storing).
The previous code slid the window and the hash table and copied
every input byte three times in order to just write the data as
stored blocks with no compression. This commit minimizes sliding
and copying, especially for large input and output buffers.
Level 0 compression is now more than 20 times faster than before
the commit.
Most of the speedup is due to deferring hash table slides until
deflateParams() is called to change the compression level away
from 0. More speedup is due to copying directly from next_in to
next_out when the amounts of available input data and output space
permit it, avoiding the intermediate pending buffer. Additionally,
only the last 32K of the used input data is copied back to the
sliding window when large input buffers are provided.
Mark Adler [Wed, 23 Nov 2016 07:29:19 +0000 (23:29 -0800)]
Assure that deflateParams() will not switch functions mid-block.
This alters the specification in zlib.h, so that deflateParams()
will not change any parameters if there is not enough output space
in the event that a block is emitted in order to allow switching
the compression function.
Mark Adler [Sun, 20 Nov 2016 19:36:15 +0000 (11:36 -0800)]
Increase verbosity required to warn about bit length overflow.
When debugging the Huffman coding would warn about resulting codes
greater than 15 bits in length. This is handled properly, and is
not uncommon. This increases the verbosity of the warning by one,
so that it is not displayed by default.
Mark Adler [Sun, 30 Oct 2016 16:25:32 +0000 (09:25 -0700)]
Use memcpy for stored blocks.
This speeds up level 0 by about a factor of three, as compared to
the previous byte-at-a-time loop. We can do much better though. A
later commit avoids this copy for level 0 with large buffers,
instead copying directly from the input to the output. This commit
still speeds up storing incompressible data found when compressing
normally.
Mark Adler [Fri, 28 Oct 2016 05:50:43 +0000 (22:50 -0700)]
Fix bug when level 0 used with Z_HUFFMAN or Z_RLE.
Compression level 0 requests no compression, using only stored
blocks. When Z_HUFFMAN or Z_RLE was used with level 0 (granted,
an odd choice, but permitted), the resulting blocks were mostly
fixed or dynamic. The reason is that deflate_stored() was not
being called in that case. The compressed data was valid, but it
was not what the application requested. This commit assures that
only stored blocks are emitted for compression level 0, regardless
of the strategy selected.
Mark Adler [Wed, 26 Oct 2016 03:45:41 +0000 (20:45 -0700)]
Make a noble effort at setting OS_CODE correctly.
This updates the OS_CODE determination at compile time to match as
closely as possible the operating system mappings documented in
the PKWare APPNOTE.TXT version 6.3.4, section 4.4.2.2. That byte
in the gzip header is used by nobody for anything, as far as I can
tell. However we might as well try to set it appropriately.
Mark Adler [Tue, 25 Oct 2016 03:11:41 +0000 (20:11 -0700)]
Do a more thorough check of the state for every stream call.
This verifies that the state has been initialized, that it is the
expected type of state, deflate or inflate, and that at least the
first several bytes of the internal state have not been clobbered.
Mark Adler [Mon, 24 Oct 2016 22:52:19 +0000 (15:52 -0700)]
Reject a window size of 256 bytes if not using the zlib wrapper.
There is a bug in deflate for windowBits == 8 (256-byte window).
As a result, zlib silently changes a request for 8 to a request
for 9 (512-byte window), and sets the zlib header accordingly so
that the decompressor knows to use a 512-byte window. However if
deflateInit2() is used for raw deflate or gzip streams, then there
is no indication that the request was not honored, and the
application might assume that it can use a 256-byte window when
decompressing. This commit returns an error if the user requests
a 256-byte window when using raw deflate or gzip encoding.