Age | Commit message | Author |
|
Fix data loss warning.
Fixes:
```
itkzlib-ng/inflate.c(1209,24): warning C4267: '=': conversion from 'size_t' to 'unsigned long', possible loss of data
itkzlib-ng/inflate.c(1210,26): warning C4267: '=': conversion from 'size_t' to 'unsigned long', possible loss of data
```
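The commit message does not show the change itself. One common way to silence C4267 is an explicit narrowing cast at the assignment, sketched below. The helper name `to_ulong` is hypothetical and is not taken from inflate.c.

```c
#include <stddef.h>

/* Hypothetical helper, not part of zlib-ng: make the size_t -> unsigned long
   narrowing explicit. On 64-bit Windows size_t is 64 bits wide while
   unsigned long is 32 bits, which is why MSVC reports C4267 for the
   implicit conversion. The caller is assumed to know the value fits. */
static unsigned long to_ulong(size_t n) {
    return (unsigned long)n;
}
```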
|
|
DFLTCC does not provide the necessary information to implement
inflateSyncPoint() - Incomplete-Function Status and Incomplete-Function
Length are the fields that provide the relevant information, but
unfortunately it's not enough. If DFLTCC is in use, the current code
checks software decompression state and always returns 0. This
(rightfully) confuses rsync, so fix by returning Z_STREAM_ERROR
instead.
|
|
The C standard says that left shifts of negative integers are
undefined. This casts to unsigned values to ensure a known
result.
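The cast described above can be sketched as follows. The function name `pack_mark` and its parameters are hypothetical; the sketch only illustrates shifting in the unsigned domain, not the actual zlib-ng code.

```c
/* Hypothetical helper: place a possibly negative 'back' value in the upper
   bits of a long and add 'length' below it. Shifting happens on an unsigned
   value, so it is well defined even when 'back' is -1. Converting the result
   back to long is implementation-defined for out-of-range values; on common
   two's-complement compilers it wraps as expected. */
static long pack_mark(long back, unsigned long length) {
    return (long)(((unsigned long)back << 16) + length);
}
```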
|
|
performance.
|
|
Use uint8_t[8] struct on big-endian machines for speed.
|
|
ARM.
|
|
indent, initial function brace on the same line as definition, removed extraneous spaces and new lines.
|
|
inflateSync() is used to skip invalid deflate data, which means
that the check value that was being computed is no longer useful.
This commit turns off the check value computation, and furthermore
allows a successful return if the compressed data terminated in a
graceful manner. This commit also fixes a bug in the case that
inflateSync() is used before a header is ever processed. In that
case, there is no knowledge of a trailer, so the remainder is
treated as raw.
|
|
We don't want to compile arch-specific code when WITH_OPTIM is not set,
and the current checks don't take that into account.
|
|
* Merge aarch64 and arm cmake sections.
* Updated MSVC compiler support for ARM and ARM64.
* Moved detection for -mfpu=neon to where the flag is set to simplify add_intrinsics_option.
* Only add ${ACLEFLAG} on aarch64 if not WITH_NEON.
* Rename arch/x86/ctzl.h to fallback_builtins.h.
|
|
Remove BUILDFIXED support.
Split out MAKEFIXED into a separate 'makefixed' util that is easy
to use if we want to regenerate/verify inffixed.h.
|
|
and infback.c, making actual differences much easier to spot, easing maintenance.
|
|
* Remove old zlib readme.
* Remove old zlib change history from inflate.c.
* Remove old treebuild.xml and zlib pdf.
|
|
This also reduces the library size by 4120 bytes or ~2.9%.
|
|
(#363)
|
|
Future versions of IBM Z mainframes will provide the DFLTCC instruction,
which implements the deflate algorithm in hardware, with estimated
compression and decompression performance orders of magnitude faster
than the current zlib-ng and a ratio comparable with that of level 1.
This patch adds DFLTCC support to zlib-ng. In order to enable it, the
following build commands should be used:
$ ./configure --with-dfltcc-deflate --with-dfltcc-inflate
$ make
When built like this, zlib-ng would compress in hardware on level 1,
and in software on all other levels. Decompression will always happen
in hardware. In order to enable DFLTCC compression for levels 1-6 (i.e.
to make it used by default) one could add -DDFLTCC_LEVEL_MASK=0x7e to
CFLAGS when building zlib-ng.
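The value 0x7e has bits 1 through 6 set, which matches the "levels 1-6" wording above. A minimal sketch, assuming that bit n of the mask enables hardware compression for level n; `level_uses_hw` is a hypothetical helper, not a zlib-ng function.

```c
/* Assumed interpretation: bit n of the mask selects compression level n.
   Returns nonzero when the given level would be handled in hardware. */
static int level_uses_hw(unsigned mask, int level) {
    if (level < 0 || level > 9)
        return 0;                     /* outside the usual 0..9 level range */
    return ((mask >> level) & 1u) != 0;
}
```

Under this assumption, a mask of 0x2 would select level 1 only, which is consistent with the default behaviour described above.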
Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page. Therefore care should be taken when using hardware compression
when reproducible results are desired.
DFLTCC does not support every single zlib-ng feature, in particular:
* inflate(Z_BLOCK) and inflate(Z_TREES)
* inflateMark()
* inflatePrime()
* deflateParams() after the first deflate() call
When used, these functions will either switch to software, or, in case
this is not possible, gracefully fail.
This patch tries to add DFLTCC support in the least intrusive way.
All SystemZ-specific code was placed into a separate file, but
unfortunately there is still a noticeable amount of changes in the
main zlib-ng code. Below is the summary of those changes.
DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. Since DFLTCC requires the parameter block to be
doubleword-aligned, and it's reasonable to allocate it alongside
deflate and inflate states, ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE
macros were introduced in order to encapsulate the allocation details.
The same is true for window, for which ZALLOC_WINDOW and
TRY_FREE_WINDOW macros were introduced.
While for inflate software and hardware window formats match, this is
not the case for deflate. Therefore, deflateSetDictionary and
deflateGetDictionary need special handling, which is triggered using the
new DEFLATE_SET_DICTIONARY_HOOK and DEFLATE_GET_DICTIONARY_HOOK macros.
deflateResetKeep() and inflateResetKeep() now update the DFLTCC
parameter block, which is allocated alongside zlib-ng state, using
the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.
In order to make unsupported deflateParams(), inflatePrime() and
inflateMark() calls fail gracefully, the new DEFLATE_PARAMS_HOOK,
INFLATE_PRIME_HOOK and INFLATE_MARK_HOOK macros were introduced.
The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. In order for deflateBound() to
return the correct results for the hardware implementation, the new
DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros
were introduced.
Actual compression and decompression are handled by the new DEFLATE_HOOK
and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
window on its own, calling updatewindow() is suppressed using the new
INFLATE_NEED_UPDATEWINDOW() macro.
In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums, therefore, whenever it's used, software checksumming needs to
be suppressed using the new DEFLATE_NEED_CHECKSUM and
INFLATE_NEED_CHECKSUM macros.
DFLTCC will refuse to write an End-of-block Symbol if there is no input
data, thus in some cases it is necessary to do this manually. In order
to achieve this, bi_reverse and flush_pending were promoted from static
to ZLIB_INTERNAL and exposed via deflate.h.
Since the first call to dfltcc_inflate already needs the window, and it
might not be allocated yet, inflate_ensure_window was factored out of
updatewindow and made ZLIB_INTERNAL.
|
|
This makes the checks for arm cpu features as inexpensive as on the x86 side
by calling the runtime feature detection once in deflate/inflate init and then
storing the result in a global variable.
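The detect-once pattern described above might look like the sketch below. All names (`cpu_has_neon`, `detect_neon`, `check_cpu_features`) are hypothetical, and the detection routine is a stand-in that simply returns true; a real build would query the operating system or CPU instead.

```c
#include <stdbool.h>

static int  detect_calls = 0;      /* counts how often detection really runs */
static bool cpu_checked  = false;  /* set once detection has been done */
static bool cpu_has_neon = false;  /* hypothetical global holding the result */

/* Stand-in for an expensive runtime query. */
static bool detect_neon(void) {
    detect_calls++;
    return true;
}

/* Intended to be called from the deflate/inflate init paths; only the first
   call pays for detection, later calls return immediately. This simple
   version is not thread-safe. */
static void check_cpu_features(void) {
    if (cpu_checked)
        return;
    cpu_has_neon = detect_neon();
    cpu_checked = true;
}
```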
|
|
Based on a patch by Nigel Tao:
https://github.com/madler/zlib/pull/292/commits/e0ff1f330cc03ee04843f857869b4036593ab39d
This patch makes unzipping of files up to 1.2x faster on x86_64. The other part
(1.3x speedup) of the patch by Nigel Tao is unsafe as discussed in the review of
that pull request. zlib-ng already has a different way to optimize the memcpy
for that missing part.
The original patch was enabled only on little-endian machines. This patch adapts
the loading of 64 bits at a time to big endian machines.
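One way to load 64 bits at a time with the same result on both byte orders is sketched below. The function name `load_64_le` is hypothetical, and the byte-order macros and `__builtin_bswap64` are GCC/Clang features assumed here for illustration; the actual patch may adapt to big-endian machines differently.

```c
#include <stdint.h>
#include <string.h>

/* Read 8 input bytes so that in[0] ends up in the least significant byte.
   memcpy avoids unaligned-access problems and compiles to a single load on
   common targets. On big-endian hosts the value is byte-swapped afterwards
   (assumes GCC or Clang for the macros and the builtin). */
static uint64_t load_64_le(const unsigned char *in) {
    uint64_t v;
    memcpy(&v, in, sizeof(v));
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap64(v);
#endif
    return v;
}
```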
Benchmarking notes from Hans Kristian Rosbach:
https://github.com/zlib-ng/zlib-ng/pull/224#issuecomment-444837182
Benchmark runs: 7, tested levels: 0-7, testfile 100M
develop at 796ad10 with -O3:
Level Comp Comptime min/avg/max Decomptime min/avg/max
0 100.02% 0.01/0.01/0.02 0.08/0.09/0.11
1 47.08% 0.49/0.50/0.51 0.37/0.39/0.40
2 36.02% 1.10/1.12/1.13 0.39/0.39/0.40
3 34.77% 1.32/1.34/1.37 0.38/0.38/0.38
4 33.41% 1.50/1.53/1.56 0.37/0.37/0.38
5 33.07% 1.85/1.87/1.90 0.36/0.37/0.38
6 32.83% 2.54/2.57/2.61 0.36/0.37/0.38
avg 45.31% 1.28 0.34
tot 62.60 16.58
PR224 with -O3:
Level Comp Comptime min/avg/max Decomptime min/avg/max
0 100.02% 0.01/0.01/0.02 0.09/0.09/0.10
1 47.08% 0.49/0.50/0.51 0.37/0.37/0.38
2 36.02% 1.09/1.11/1.13 0.38/0.38/0.39
3 34.77% 1.32/1.34/1.38 0.35/0.36/0.38
4 33.41% 1.49/1.52/1.54 0.36/0.36/0.37
5 33.07% 1.85/1.88/1.93 0.35/0.36/0.37
6 32.83% 2.55/2.58/2.65 0.35/0.35/0.36
avg 45.31% 1.28 0.33
tot 62.48 16.02
So I see about a 5.4% speedup on my x86_64 machine, not quite the 1.2x speedup
but a nice speedup nevertheless. This benchmark measures the total execution
time of minigzip, so that might have caused some inefficiencies.
At -O2, I only see a 2.7% speedup.
|
|
If zlib and/or gzip header processing was requested, but a header
was never provided and inflateSync was used successfully, then the
inflate state would be inconsistent, trying to compute a check
value but with no flags set. This commit sets the inflate mode to
raw in this case, since there is no other assumption that can be
made if a header was requested but never seen.
|
|
This CL fixes a security bug in zlib. It was reported upstream long ago
and the testcase was shared upstream, but it is still unresolved. As a fix,
state->check is set to the same value as the adler32 of an empty string.
Upstream bug: madler/zlib#245
Bug: chromium:697481 https://crbug.com/697481
Reviewed-on: https://chromium-review.googlesource.com/601193
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Adam Langley <agl@chromium.org>
Commit-Queue: Nicolás Peña <npm@chromium.org>
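For reference, the Adler-32 of an empty string is 1, because the low half of the checksum starts at 1 and the high half at 0. The sketch below is a straightforward, unoptimized Adler-32 written for illustration; `adler32_simple` is a hypothetical name and not the zlib implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Unoptimized Adler-32: 'a' is 1 plus the sum of all bytes, 'b' is the sum
   of every intermediate 'a', both modulo 65521. */
static uint32_t adler32_simple(const unsigned char *buf, size_t len) {
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```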
|
|
Move decrement in loop to avoid the following errors:
adler32.c:91:19: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'size_t' (aka 'unsigned long')
adler32.c:136:19: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'size_t' (aka 'unsigned long')
inflate.c:972:32: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int'
Fix the following bugs as recommended by Mika Lindqvist:
arch/x86/deflate_quick.c:233:22: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int'
arch/x86/fill_window_sse.c:52:28: runtime error: unsigned integer overflow: 1 - 8192 cannot be represented in type 'unsigned int'
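Moving the decrement can be sketched as below. A loop written as `while (len--)` decrements once more after `len` has reached 0; the wraparound is well defined in C, but Clang's unsigned-integer-overflow sanitizer check reports it. The function `sum_bytes` is a hypothetical example, not code from adler32.c.

```c
#include <stddef.h>

/* Sum the bytes of a buffer. The counter is decremented only inside the
   loop body, after the condition has confirmed it is nonzero, so it never
   wraps from 0 to SIZE_MAX. */
static unsigned long sum_bytes(const unsigned char *buf, size_t len) {
    unsigned long sum = 0;
    while (len != 0) {
        len--;
        sum += *buf++;
    }
    return sum;
}
```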
|
|
* move definition of z_size_t to zbuild.h
|
|
to co-exist in an application that has been linked to something that
depends on stock zlib. Previously, that would cause random problems
since there is no way to guarantee what zlib version is being used
for each dynamically linked function.
Add the corresponding zlib-ng.h.
Tests, example and minigzip will not compile before they have been
adapted to use the correct functions as well.
Either duplicate them, so we have minigzip-ng.c for example, or add
compile-time detection in the source code.
|
|
- Split functableInit() function as separate functions for each functable member, so we don't need to initialize full functable in multiple places in the zlib-ng code, or to check for NULL on every invocation.
- Optimized function for each functable member is detected on first invocation and the functable item is updated for subsequent invocations.
- Remove NULL check in adler32() and adler32_z() as it is no longer needed.
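The self-replacing table entry described in the list above might be sketched like this. The table layout and the names `functable`, `checksum_stub` and `checksum_c` are simplified assumptions for illustration, and the stub always picks the plain C routine where a real build would choose based on CPU features.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*checksum_fn)(uint32_t, const unsigned char *, size_t);

static uint32_t checksum_stub(uint32_t, const unsigned char *, size_t);

/* Plain C fallback: a simple byte sum, standing in for a real checksum. */
static uint32_t checksum_c(uint32_t sum, const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Each member starts out pointing at its stub, so callers never see NULL. */
static struct { checksum_fn checksum; } functable = { checksum_stub };
static int stub_calls = 0;

/* First invocation: choose an implementation, store it in the table, then
   forward the call. Later invocations go straight to the chosen function. */
static uint32_t checksum_stub(uint32_t sum, const unsigned char *buf, size_t len) {
    stub_calls++;
    functable.checksum = checksum_c;
    return functable.checksum(sum, buf, len);
}
```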
|
|
- Add missing call to functableinit from inflateinit
- Fix external direct calls to adler32 functions without calling functableinit
|
|
When memory copy operations happen byte by byte, the processors are unable to
fuse the loads and stores together because of aliasing issues. This patch
clusters some of the memory copy operations in chunks of 16 and 8 bytes.
For byte memset, the compiler knows how to prepare the chunk to be stored.
When the memset pattern is larger than a byte, this patch builds the pattern for
chunk memset using the same technique as in Simon Hosie's patch
https://codereview.chromium.org/2722063002
This patch improves the performance of zlib decompression of a 50K PNG by 50% on
aarch64-linux and x86_64-linux when compiled with gcc-7 or llvm-5.
The number of executed instructions reported by valgrind --tool=cachegrind
on the decompression of a 50K PNG file on aarch64-linux:
- before the patch:
I refs: 3,783,757,451
D refs: 1,574,572,882 (869,116,630 rd + 705,456,252 wr)
- with the patch:
I refs: 2,391,899,214
D refs: 899,359,836 (516,666,051 rd + 382,693,785 wr)
A 260MB directory containing the code of llvm was compressed into a tar.gz
of 35MB. Decompressing that archive with minigzip -d takes:
on i7-4790K x86_64-linux, 0.533s before the patch and 0.493s with the patch,
on Juno-r0 aarch64-linux A57, 2.796s before the patch and 2.467s with the patch,
on Juno-r0 aarch64-linux A53, 4.055s before the patch and 3.604s with the patch.
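The chunked copy idea can be sketched as below. `chunkcopy8` is a hypothetical simplification: it handles only non-overlapping source and destination and assumes the caller guarantees at least 7 bytes of slack after both buffers, since whole 8-byte chunks are always read and written.

```c
#include <stddef.h>
#include <string.h>

/* Copy at least 'len' bytes from 'from' to 'out' in 8-byte chunks. A
   fixed-size memcpy lets the compiler emit one wide load and one wide store
   per chunk instead of eight byte moves. May touch up to 7 bytes beyond
   'len' in both buffers, so the caller must provide that slack. */
static void chunkcopy8(unsigned char *out, const unsigned char *from, size_t len) {
    while (len > 0) {
        memcpy(out, from, 8);
        out += 8;
        from += 8;
        len = (len > 8) ? len - 8 : 0;
    }
}
```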
|
|
* Inflate using wider loads and stores.
In inflate_fast() the output pointer always has plenty of room to write. This
means that so long as the target is capable, wide unaligned loads and stores
can be used to transfer several bytes at once.
When the reference distance is too short, simply unroll the data a little to
increase the distance.
Change-Id: I59854eb25d2b1e43561c8a2afaf9175bf10cf674
|
|
This verifies that the state has been initialized, that it is the
expected type of state, deflate or inflate, and that at least the
first several bytes of the internal state have not been clobbered.
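A check of this kind might look like the sketch below. The struct layouts, the mode range constants and the name `state_check` are hypothetical simplifications; the idea shown is a back-pointer from the state to its stream plus a range check on a field near the start of the state.

```c
#include <stddef.h>

enum { MODE_FIRST = 100, MODE_LAST = 131 };   /* hypothetical valid range */

struct stream;
struct state {
    struct stream *strm;   /* back-pointer to the owning stream */
    int mode;              /* must lie in MODE_FIRST..MODE_LAST */
};
struct stream {
    struct state *state;
};

/* Returns 0 when the state looks sane, 1 otherwise. A state that was never
   initialized, belongs to a different stream, or has had its first fields
   overwritten is unlikely to pass both tests. */
static int state_check(const struct stream *strm) {
    if (strm == NULL || strm->state == NULL)
        return 1;
    const struct state *s = strm->state;
    if (s->strm != strm || s->mode < MODE_FIRST || s->mode > MODE_LAST)
        return 1;
    return 0;
}
```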
|
|
Based on upstream commit 5370d99a2affe0b040550cffbc0ba8fa790594b3
|
|
Based on upstream 7096424f23df1b1813237fb5f8bc8f34cfcedd0c, but
modified heavily to match zlib-ng.
|