Age | Commit message | Author |
|
When not using a window, `window + wsize` applies a zero offset to a null pointer, which is undefined behavior.
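A minimal C sketch of the hazard, with a hypothetical helper name (window_limit is illustrative, not the actual zlib-ng code): in C, adding even a zero offset to a null pointer is undefined behavior, so the arithmetic must be gated on the pointer being non-null.
```
#include <stddef.h>

/* Hypothetical helper: NULL + 0 is undefined behavior in C, so guard
 * the pointer arithmetic for the case where no window is allocated. */
static const unsigned char *window_limit(const unsigned char *window,
                                         size_t wsize) {
    if (window == NULL)      /* no window allocated (wsize may be 0) */
        return NULL;         /* avoid NULL + wsize, UB even for 0 */
    return window + wsize;   /* well-defined for a real buffer */
}
```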
|
output and window. In this instance, if too many bytes are written, it will not correctly write matches with distances close to the window size.
|
avoid errors with optimizations enabled.
|
* Reintroduce support for ZLIB_CONST in compat mode.
|
When inflate_fast checks for extra length bits, it first checks if the
number of extra length bits (in op) is non-zero. However, if the number
of extra length bits is 0, the `bits < op` check will be false, BITS(op)
will be 0, and DROPBITS(op) will do nothing. So, drop the conditional,
for a speedup of about 0.8%.
This makes the handling of extra length bits match the handling of extra
dist bits, which already lacks the extra conditional.
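A simplified stand-in for the accumulator handling (not the verbatim inffast.c code) shows why the guard is redundant:
```
#include <stdint.h>

/* When op == 0: "*bits < op" is false (no refill), the mask
 * (1U << 0) - 1 is 0 (adds nothing to len), and shifting by 0 drops
 * nothing -- so no "if (op)" guard is needed around this block. */
static unsigned read_extra_bits(const unsigned char **in, uint64_t *hold,
                                unsigned *bits, unsigned op) {
    if (*bits < op) {                  /* refill only if needed */
        *hold += (uint64_t)(*(*in)++) << *bits;
        *bits += 8;
    }
    unsigned extra = (unsigned)*hold & ((1U << op) - 1);
    *hold >>= op;                      /* DROPBITS(op) equivalent */
    *bits -= op;
    return extra;                      /* 0 when op == 0 */
}
```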
|
performance.
|
Use uint8_t[8] struct on big-endian machines for speed.
|
indent, initial function brace on the same line as definition, removed extraneous spaces and new lines.
|
handles that condition properly.
|
(#363)
|
The OSS-Fuzz fuzzers started failing with the following assert:
```
ASSERT: 0 == memcmp(data + offset, buf, len)
```
after the following patch was pulled into the tree:
```
commit 20ca64fa5d2d8a7421ed86b68709ef971dcfbddf
Author: Sebastian Pop <s.pop@samsung.com>
Date: Wed Mar 6 14:16:20 2019 -0600
define and use chunkmemset instead of byte_memset for INFFAST_CHUNKSIZE
```
The function chunkcopysafe assumes that the input `len` is less than 16 bytes:
```
if ((safe - out) < (ptrdiff_t)INFFAST_CHUNKSIZE) {
```
but we were called with `len = 22` because `safe` was defined too small:
```
- safe = out + (strm->avail_out - INFFAST_CHUNKSIZE);
```
and the difference `safe - out` was 16 bytes smaller than the actual `len`.
The patch fixes the initialization of `safe` to:
```
+ safe = out + strm->avail_out;
```
|
Fixed arithmetic overflow warnings in MSVC.
Fixed a uint64_t to uint32_t casting warning.
Added an assert to check that bits is not greater than 32 before the cast.
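A sketch of the assert-before-narrowing pattern (hypothetical helper, not the actual change):
```
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: check the bit count before the narrowing cast,
 * so the uint64_t -> uint32_t conversion is provably lossless. */
static uint32_t narrow_bits(uint64_t value, unsigned bits) {
    assert(bits <= 32);                /* result must fit in 32 bits */
    return (uint32_t)(value & ((UINT64_C(1) << bits) - 1));
}
```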
|
Seeing a few percent speedup from using a pointer instead of an
assigned structure. This seems to help the compiler optimize
better.
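A toy illustration of the difference, using a simplified version of inflate's table-entry struct (the sketch is illustrative, not the actual change):
```
/* Simplified version of inflate's "code" table entry. */
typedef struct {
    unsigned char op;     /* operation */
    unsigned char bits;   /* bits consumed by this entry */
    unsigned short val;   /* literal value or base length/distance */
} code;

/* Before (sketch): each lookup copied the entry into a local:
 *     code here = lcode[hold & lmask];
 * After (sketch): the local is just a pointer into the table. */
static unsigned short lookup(const code *lcode, unsigned long hold,
                             unsigned long lmask) {
    const code *here = lcode + (hold & lmask);   /* no struct copy */
    return here->val;
}
```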
|
Based on a patch by Nigel Tao:
https://github.com/madler/zlib/pull/292/commits/e0ff1f330cc03ee04843f857869b4036593ab39d
This patch makes unzipping of files up to 1.2x faster on x86_64. The other part
(1.3x speedup) of the patch by Nigel Tao is unsafe as discussed in the review of
that pull request. zlib-ng already has a different way to optimize the memcpy
for that missing part.
The original patch was enabled only on little-endian machines. This patch adapts
the loading of 64 bits at a time to big endian machines.
Benchmarking notes from Hans Kristian Rosbach:
https://github.com/zlib-ng/zlib-ng/pull/224#issuecomment-444837182
Benchmark runs: 7, tested levels: 0-7, testfile 100M
develop at 796ad10 with -O3:
Level  Comp     Comptime min/avg/max  Decomptime min/avg/max
0      100.02%  0.01/0.01/0.02        0.08/0.09/0.11
1       47.08%  0.49/0.50/0.51        0.37/0.39/0.40
2       36.02%  1.10/1.12/1.13        0.39/0.39/0.40
3       34.77%  1.32/1.34/1.37        0.38/0.38/0.38
4       33.41%  1.50/1.53/1.56        0.37/0.37/0.38
5       33.07%  1.85/1.87/1.90        0.36/0.37/0.38
6       32.83%  2.54/2.57/2.61        0.36/0.37/0.38
avg     45.31%  1.28                  0.34
tot             62.60                 16.58
PR224 with -O3:
Level  Comp     Comptime min/avg/max  Decomptime min/avg/max
0      100.02%  0.01/0.01/0.02        0.09/0.09/0.10
1       47.08%  0.49/0.50/0.51        0.37/0.37/0.38
2       36.02%  1.09/1.11/1.13        0.38/0.38/0.39
3       34.77%  1.32/1.34/1.38        0.35/0.36/0.38
4       33.41%  1.49/1.52/1.54        0.36/0.36/0.37
5       33.07%  1.85/1.88/1.93        0.35/0.36/0.37
6       32.83%  2.55/2.58/2.65        0.35/0.35/0.36
avg     45.31%  1.28                  0.33
tot             62.48                 16.02
So I see about a 5.4% speedup on my x86_64 machine, not quite the 1.2x speedup
but a nice speedup nevertheless. This benchmark measures the total execution
time of minigzip, so that might have caused some inefficiencies.
At -O2, I only see a 2.7% speedup.
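A sketch of the core refill technique (hypothetical helper, not the patch itself): fetch eight input bytes with one wide load, and byte-swap on big-endian targets so the stream's little-endian byte order is preserved.
```
#include <stdint.h>
#include <string.h>

/* Hypothetical refill helper: one 8-byte load instead of eight
 * single-byte loads; memcpy sidesteps alignment restrictions. */
static uint64_t load_64_bits(const unsigned char *in) {
    uint64_t chunk;
    memcpy(&chunk, in, sizeof(chunk));
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    chunk = __builtin_bswap64(chunk);  /* normalize byte order */
#endif
    return chunk;
}
```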
|
* Move definition of z_size_t to zbuild.h
|
When memory copy operations happen byte by byte, the processors are unable to
fuse the loads and stores together because of aliasing issues. This patch
clusters some of the memory copy operations in chunks of 16 and 8 bytes.
For byte memset, the compiler knows how to prepare the chunk to be stored.
When the memset pattern is larger than a byte, this patch builds the pattern for
chunk memset using the same technique as in Simon Hosie's patch
https://codereview.chromium.org/2722063002
This patch improves the performance of zlib decompression of a 50K PNG by 50%
on aarch64-linux and x86_64-linux when compiled with gcc-7 or llvm-5.
The number of executed instructions reported by valgrind --tool=cachegrind
on the decompression of a 50K PNG file on aarch64-linux:
- before the patch:
I refs: 3,783,757,451
D refs: 1,574,572,882 (869,116,630 rd + 705,456,252 wr)
- with the patch:
I refs: 2,391,899,214
D refs: 899,359,836 (516,666,051 rd + 382,693,785 wr)
Compressing a 260MB directory containing the code of llvm into a 35MB tar.gz
and decompressing that with minigzip -d takes:
on i7-4790K x86_64-linux, 0.533s before the patch and 0.493s with the patch;
on Juno-r0 aarch64-linux A57, 2.796s before the patch and 2.467s with the patch;
on Juno-r0 aarch64-linux A53, 4.055s before the patch and 3.604s with the patch.
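A sketch of the pattern-building idea for a 2-byte period (hypothetical helper in the spirit of the patch, not the actual chunkmemset code):
```
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk memset for a 2-byte period: replicate the
 * pattern across an 8-byte chunk, then store whole chunks at once. */
static unsigned char *chunkmemset_2(unsigned char *out, size_t len) {
    uint16_t pat;
    uint64_t chunk;
    memcpy(&pat, out - 2, sizeof(pat));   /* pattern: last 2 bytes */
    chunk = pat;
    chunk |= chunk << 16;                 /* widen 2 -> 4 bytes */
    chunk |= chunk << 32;                 /* widen 4 -> 8 bytes */
    while (len >= 8) {
        memcpy(out, &chunk, 8);           /* one wide store per chunk */
        out += 8;
        len -= 8;
    }
    while (len--) {                       /* remaining tail bytes */
        *out = out[-2];
        out++;
    }
    return out;
}
```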
|
* Inflate using wider loads and stores.
In inflate_fast() the output pointer always has plenty of room to write. This
means that so long as the target is capable, wide unaligned loads and stores
can be used to transfer several bytes at once.
When the reference distance is too short, simply unroll the data a little to
increase the distance.
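A sketch of that unrolling step (illustrative, not the patch itself): each pass appends one whole period, doubling the distance so that subsequent 8-byte stores never read bytes written in the same iteration.
```
#include <stddef.h>
#include <string.h>

/* Illustrative helper: grow a short match distance by copying the
 * existing period forward; source and destination never overlap. */
static unsigned char *widen_distance(unsigned char *out, size_t *dist,
                                     size_t *len) {
    while (*dist < 8 && *len >= *dist) {
        memcpy(out, out - *dist, *dist);  /* append one period */
        out += *dist;
        *len -= *dist;
        *dist *= 2;                       /* pattern repeats wider now */
    }
    return out;
}
```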
Change-Id: I59854eb25d2b1e43561c8a2afaf9175bf10cf674
|
On a benchmark using zlib to decompress a PNG image, this change shows a 20%
speedup. It makes sense to special-case distance = 1 read-after-write
dependences because the loop kernel can be replaced with a memset, which is
usually implemented in assembly in the libc, and because of the frequency at
which distance = 1 appears during PNG decompression:
Distance  Frequency
1         1009001
6         64500
9         29000
3         25500
144       14500
12        10000
15        3500
7         2000
24        1000
21        1000
18        1000
87        500
22        500
192       500
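A sketch of the special case (illustrative helper, not the actual patch): distance 1 repeats the byte just written, which is exactly what memset does.
```
#include <stddef.h>
#include <string.h>

/* Illustrative match copy: route distance 1 to memset, which libc
 * typically implements with hand-tuned vector code. */
static unsigned char *copy_match(unsigned char *out, size_t dist,
                                 size_t len) {
    if (dist == 1) {
        memset(out, out[-1], len);        /* repeat previous byte */
        return out + len;
    }
    while (len--) {                       /* generic overlapping copy */
        *out = out[-(ptrdiff_t)dist];
        out++;
    }
    return out;
}
```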
|
An old inffast.c optimization turns out to not be optimal anymore
with modern compilers, and furthermore was not compliant with the
C standard, for which decrementing a pointer before its allocated
memory is undefined. Per the recommendation of a security audit of
the zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, this "optimization" was removed, in order to
avoid the possibility of undefined behavior.
|
This commit was cherry-picked and the cherry-pick was not done cleanly,
resulting in a few problems with gcc on 64-bit Windows.
This reverts commit edd7a72e056b994458ff040d4740b16b35336b60.
Conflicts:
arch/x86/crc_folding.c
arch/x86/fill_window_sse.c
deflate.c
deflate.h
match.c
trees.c
|
Signed-off-by: Daniel Axtens <dja@axtens.net>
Conflicts:
arch/x86/crc_folding.c
crc32.c
|
This makes it easier to implement support for ASM replacements using
configure parameters if needed later. Also since zlib-ng uses
compiler intrinsics, this needed a cleanup in any case.
|
Only internal functions, no exported functions in this commit.
|
Remove a few leftovers from the legacy OS support removal
|
This patch allows zlib to compile cleanly with the -Wcast-qual gcc
warning enabled, but only if ZLIB_CONST is defined, which adds
const to next_in and msg in z_stream and in the in_func prototype.
A --const option is added to ./configure which adds -DZLIB_CONST
to the compile flags, and adds -Wcast-qual to the compile flags
when ZLIBGCCWARN is set in the environment.
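The mechanism, abridged from zlib's zconf.h/zlib.h (with Bytef simplified to unsigned char here): z_const expands to const only when ZLIB_CONST is defined, so new code gets the const-qualified fields without breaking old callers.
```
#ifdef ZLIB_CONST
#  define z_const const
#else
#  define z_const
#endif

typedef struct z_stream_s {
    z_const unsigned char *next_in;   /* next input byte */
    /* ...remaining fields elided... */
} z_stream;
```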
|