path: root/deflate.h
Age         Commit message (Author)
2021-12-24  Don't define HASH_SIZE if it is already defined. (Nathan Moinvaziri)
2021-06-21  Added Z_UNUSED define to ignore unused variables. (Nathan Moinvaziri)
2021-01-27  Add extra space in deflate internal_state struct for future expansion. (Hans Kristian Rosbach)
    Also make internal_state struct have a static size regardless of which features
    have been activated. internal_state is now always 6040 bytes on Linux/x86-64,
    and 5952 bytes on Linux/x86-32.
2020-09-23  Remove NIL preprocessor macro which isn't consistently enforced. (Nathan Moinvaziri)
2020-08-31  Minor comments/whitespace cleanup (Hans Kristian Rosbach)
2020-08-31  Reorder s->block_open and s->reproducible. (Hans Kristian Rosbach)
2020-08-31  Remove s->method since it is always set to the same value and never read. (Hans Kristian Rosbach)
2020-08-31  Move and reduce size of s->pending_buf_size (Hans Kristian Rosbach)
2020-08-31  Rename ZLIB_INTERNAL to Z_INTERNAL for consistency. (Nathan Moinvaziri)
2020-08-27  Fix more conversion warnings related to s->bi_valid, stored_len and misc. (Hans Kristian Rosbach)
2020-08-27  Changes to deflate's internal_state struct members: (Hans Kristian Rosbach)
    - Change window_size from unsigned long to unsigned int
    - Change block_start from long to int
    - Change high_water from unsigned long to unsigned int
    - Reorder to promote cache locality in hot code and decrease holes.
    On x86_64 this means the struct goes from:
        /* size: 6008, cachelines: 94, members: 57 */
        /* sum members: 5984, holes: 6, sum holes: 24 */
        /* last cacheline: 56 bytes */
    To:
        /* size: 5984, cachelines: 94, members: 57 */
        /* sum members: 5972, holes: 3, sum holes: 8 */
        /* padding: 4 */
        /* last cacheline: 32 bytes */
2020-08-23  Increase hash table size from 15 to 16 bits. (Hans Kristian Rosbach)
    This gives a good performance increase, and usually also improves compression.
    Make separate define HASH_SLIDE for fallback version of UPDATE_HASH.
2020-08-23  Replace hash_bits, hash_size and hash_mask with defines. (Hans Kristian Rosbach)
2020-08-23  Use unaligned 32-bit and 64-bit compare based on best match length when searching for matches. (Nathan Moinvaziri)
    Move TRIGGER_LEVEL to match_tpl.h since it is only used in longest match.
    Use early return inside match loops instead of cont variable.
    Added back two-variable check for platforms that don't support unaligned access.
2020-08-20  Prevent unaligned double word access on ARMv7 in put_uint64 (NiLuJe)
    By implementing a (UNALIGNED_OK && !UNALIGNED64_OK) codepath.
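The idea behind a (UNALIGNED_OK && !UNALIGNED64_OK) codepath can be sketched as follows: on targets like ARMv7 where unaligned 32-bit accesses are fine but unaligned 64-bit (double word) accesses are not, emit the 64-bit value as two 32-bit stores. This is an illustrative sketch, not zlib-ng's actual put_uint64; the function name is ours.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: split one 64-bit store into two 32-bit stores
 * so no single access is wider than the target tolerates unaligned.
 * memcpy lets the compiler pick a safe store sequence per target. */
static void put_uint64_split(unsigned char *dst, uint64_t v) {
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);
    memcpy(dst,     &lo, sizeof(lo));  /* low half first...            */
    memcpy(dst + 4, &hi, sizeof(hi));  /* ...then high half (LE order) */
}
```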
2020-05-30  Remove IPos typedef which also helps to reduce casting warnings. (Nathan Moinvaziri)
2020-05-24  Simplify generic hash function using Knuth's multiplicative hash. (Nathan Moinvaziri)
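Knuth's multiplicative hash, as named in this commit, can be sketched as multiplying by a constant derived from the golden ratio and keeping the top bits. This is an illustrative sketch under that assumption, not zlib-ng's exact UPDATE_HASH macro.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_BITS 16  /* a 16-bit hash table, as in the commit above */

/* Sketch of Knuth's multiplicative hash: multiply by 2654435761,
 * which is floor(2^32 / golden ratio), then keep the top HASH_BITS
 * bits. Multiplication by an odd constant mixes low input bits into
 * the high bits, which the shift then extracts. */
static uint32_t knuth_hash(uint32_t val) {
    return (uint32_t)(val * 2654435761U) >> (32 - HASH_BITS);
}
```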
2020-05-24  Use 64-bit bit buffer when emitting codes. (Nathan Moinvaziri)
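A 64-bit bit buffer of this kind accumulates variable-length codes LSB-first and flushes whole bytes much less often than a 16-bit one. A minimal sketch of the accumulation step (illustrative names, not zlib-ng's actual send_bits; byte flushing to the pending buffer is omitted):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 64-bit LSB-first bit accumulator: each new code is
 * shifted in above the bits already stored, so the oldest bits sit
 * in the low end of bi_buf. */
typedef struct {
    uint64_t bi_buf;   /* bit accumulator, low bits are oldest */
    int      bi_valid; /* number of valid bits in bi_buf       */
} bit_writer;

static void send_bits_sketch(bit_writer *w, uint64_t value, int length) {
    w->bi_buf |= value << w->bi_valid;  /* place code above existing bits */
    w->bi_valid += length;
}
```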
2020-05-06  Remove several NOT_TWEAK_COMPILER checks and their legacy code. (Hans Kristian Rosbach)
2020-05-06  Split tree emitting code into its own source header to be included by both trees.c and deflate_quick.c so that their functions can be statically linked for performance reasons. (Nathan Moinvaziri)
2020-05-06  Unify emitting of literals and match dist/lengths. Removed deflate quick static tables, allowing for 32k window. (Nathan Moinvaziri)
2020-05-01  Standardize fill_window implementations and abstract out slide_hash_neon for ARM. (Nathan Moinvaziri)
2020-04-30  Standardize insert_string functionality across architectures. (Nathan Moinvaziri)
    Added unaligned conditionally compiled code for insert_string and quick_insert_string.
    Unify sse42 crc32 assembly between insert_string and quick_insert_string.
    Modified quick_insert_string to work across architectures.
2020-03-17  Remove cvs keywords (Pavel P)
2020-03-13  Clean up zng_tr_tally code. (Nathan Moinvaziri)
2020-03-13  Fixed possible unsigned integer overflow in send_bits when calculating the new number of valid bits in the bit buffer. (Nathan Moinvaziri)
2020-02-08  Fixed missing compressed_len count in deflate_quick when ZLIB_DEBUG defined. (Nathan Moinvaziri)
    Put most likely condition in send_bits first.
2020-02-08  Change deflate_state's bi_buf from 16-bit to 32-bit. (Nathan Moinvaziri)
2020-02-07  Added better aligned access support for put_short. (Nathan Moinvaziri)
    Renamed putShortMSB to put_short_msb and moved to deflate.h with the rest of the put functions.
    Added put_uint32 and put_uint32_msb and replaced instances of put_byte with put_short or put_uint32.
2020-02-07  Fixed formatting: 4 spaces for code indent, 2 spaces for preprocessor indent, initial function brace on the same line as definition; removed extraneous spaces and new lines. (Nathan Moinvaziri)
2019-10-24  Fixed signed warnings in zng_tr_tally_dist on Windows. (Nathan Moinvaziri)
2019-10-22  Use temp variables in send_all_trees too (Hans Kristian Rosbach)
    Re-introduce private temp variables for val and len in the send_bits macro.
2019-10-22  Reduce indirections used by send_bits and send_code. (Hans Kristian Rosbach)
    Also simplify the debug tracing into the define instead of using a separate
    static function. x86_64 shows a small performance improvement.
2019-09-20  Split maketrees out into a separate tool (Hans Kristian Rosbach)
2019-09-04  Add slide_hash to functable, and enable the sse2-optimized version. (Hans Kristian Rosbach)
    Add necessary code to cmake and configure. Fix slide_hash_sse2 to compile with zlib-ng.
2019-08-06  Rename gzendian to zendian since it is included in more than just the gzip library code. (Nathan Moinvaziri)
2019-07-18  Add zng_ prefix to internal functions to avoid linking conflicts with zlib. (#363) (Nathan Moinvaziri)
2019-07-18  Add "reproducible" deflate parameter (Ilya Leoshkevich)
    IBM Z DEFLATE CONVERSION CALL may produce different (but valid) compressed data
    for the same uncompressed data. This behavior might be unacceptable for certain
    use cases (e.g. reproducible builds). This patch introduces the
    Z_DEFLATE_REPRODUCIBLE parameter, which can be used to indicate that this is
    the case, and turn off IBM Z DEFLATE CONVERSION CALL.
2019-06-04  Fixed compiler warnings on Windows in release mode (#349) (Nathan Moinvaziri)
    This pull request attempts to fix some compiler warnings on Windows when
    compiled in Release mode.
    ```
    "zlib-ng\ALL_BUILD.vcxproj" (default target) (1) ->
    "zlib-ng\zlibstatic.vcxproj" (default target) (6) ->
      zlib-ng\deflate.c(1626): warning C4244: '=': conversion from 'uint16_t' to 'unsigned char', possible loss of data [zlib-ng\zlibstatic.vcxproj]
      zlib-ng\deflate_fast.c(61): warning C4244: '=': conversion from 'uint16_t' to 'unsigned char', possible loss of data [zlib-ng\zlibstatic.vcxproj]
      zlib-ng\deflate_slow.c(89): warning C4244: '=': conversion from 'uint16_t' to 'unsigned char', possible loss of data [zlib-ng\zlibstatic.vcxproj]
    ```
2019-05-23  Introduce inflate_ensure_window, make bi_reverse and flush_pending ZLIB_INTERNAL (Ilya Leoshkevich)
2019-03-08  Update x86 and x86_64 arch checks to use the recommended define names, resulting in improved compiler support. (Hans Kristian Rosbach)
    Based on the overviews from several sites, such as:
    http://nadeausoftware.com/articles/2012/02/c_c_tip_how_detect_processor_type_using_compiler_predefined_macros
2019-03-08  remove MEMCPY, replace with memcpy (Sebastian Pop)
2018-12-18  remove `unaligned store` UBsan warnings (Sebastian Pop)
    This patch addresses several warnings from `make test` when zlib-ng was
    configured -with-fuzzers -with-sanitizers:

        zlib-ng/trees.c:798:5: runtime error: store to misaligned address 0x63100125c801 for type 'uint16_t' (aka 'unsigned short'), which requires 2 byte alignment
        0x63100125c801: note: pointer points here
         00 80  76 01 8b 08 00 00 00 00  00 00 03 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^
        SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior zlib-ng/trees.c:798:5 in
        zlib-ng/trees.c:799:5: runtime error: store to misaligned address 0x63100125c803 for type 'uint16_t' (aka 'unsigned short'), which requires 2 byte alignment
        0x63100125c803: note: pointer points here
         76 01  f5 08 00 00 00 00 00 00  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^

    Instead of using `*(uint16_t*) foo = bar` to write a uint16_t, call
    __builtin_memcpy which will be safe in case of memory page boundaries.

    Without the patch:

        Performance counter stats for './minigzip -9 llvm.tar':
            13173.840115      task-clock (msec)       #    1.000 CPUs utilized
                      27      context-switches        #    0.002 K/sec
                       0      cpu-migrations          #    0.000 K/sec
                     129      page-faults             #    0.010 K/sec
          57,801,072,298      cycles                  #    4.388 GHz
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
          75,270,723,557      instructions            #    1.30  insns per cycle
          17,797,368,302      branches                # 1350.963 M/sec
             196,795,107      branch-misses           #    1.11% of all branches
            13.177897531 seconds time elapsed

        45408 -rw-rw-r-- 1 spop spop 46493896 Dec 11 14:45 llvm.tar.gz

    With remove-unaligned-stores patch:

            13184.736536      task-clock (msec)       #    1.000 CPUs utilized
                      44      context-switches        #    0.003 K/sec
                       1      cpu-migrations          #    0.000 K/sec
                     129      page-faults             #    0.010 K/sec
          57,882,724,316      cycles                  #    4.390 GHz
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
          75,235,920,853      instructions            #    1.30  insns per cycle
          17,826,873,999      branches                # 1352.084 M/sec
             196,050,096      branch-misses           #    1.10% of all branches
            13.185868238 seconds time elapsed

        45408 -rw-rw-r-- 1 spop spop 46493896 Dec 11 14:46 llvm.tar.gz
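The pattern this commit describes can be sketched directly: replace the cast-and-store, which is undefined behavior when the pointer is misaligned, with a small memcpy. Plain memcpy is used here rather than __builtin_memcpy for portability; modern compilers lower both the same way. The function name is ours, for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the fix: instead of `*(uint16_t *)p = v;`, which is UB
 * when p is not 2-byte aligned, copy the bytes with memcpy. The
 * compiler emits a single store on targets that allow unaligned
 * access, and safe byte stores elsewhere. */
static void store_u16(unsigned char *p, uint16_t v) {
    memcpy(p, &v, sizeof(v));
}
```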
2018-11-14  remove 16-byte alignment from deflate_state::crc0 (Mike Klein)
    We noticed recently on the Skia tree that if we build Chromium's zlib with GCC,
    -O3, -m32, and -msse2, deflateInit2_() crashes. Might also need -fPIC... not sure.

    I tracked this down to a `movaps` (16-byte aligned store) to an address that was
    only 8-byte aligned. This address was somewhere in the middle of the
    deflate_state struct that deflateInit2_()'s job is to initialize.

    That deflate_state struct `s` is allocated using ZALLOC, which calls any user
    supplied zalloc if set, or the default if not. Neither one of these has any
    special alignment contract, so generally they'll tend to be 2*sizeof(void*)
    aligned. On 32-bit builds, that's 8-byte aligned.

    But because we've annotated crc0 as zalign(16), the natural alignment of the
    whole struct is 16-byte, and a compiler like GCC can feel free to use 16-byte
    aligned stores to parts of the struct that are 16-byte aligned, like the
    beginning, crc0, or any other part before or after crc0 that happens to fall on
    a 16-byte boundary. With -O3 and -msse2, GCC does exactly that, writing a few
    of the fields with one 16-byte store.

    The fix is simply to remove zalign(16). All the code that manipulates this
    field was actually already using unaligned loads and stores. You can see it all
    right at the top of crc_folding.c, CRC_LOAD and CRC_SAVE.

    This bug comes from the Intel performance patches we landed a few years ago,
    and isn't present in upstream zlib, Android's zlib, or Google's internal zlib.
    It doesn't seem to be tickled by Clang, and won't happen on 64-bit GCC builds:
    zalloc is likely 16-byte aligned there. I _think_ it's possible for it to
    trigger on non-x86 32-bit builds with GCC, but haven't tested that. I also have
    not tested MSVC.

    Reviewed-on: https://chromium-review.googlesource.com/1236613
2018-11-14  Fix a bug that can crash deflate on some input when using Z_FIXED. (Mark Adler)
    This bug was reported by Danilo Ramos of Eideticom, Inc. It has lain in wait 13
    years before being found!

    The bug was introduced in zlib 1.2.2.2, with the addition of the Z_FIXED
    option. That option forces the use of fixed Huffman codes. For rare inputs with
    a large number of distant matches, the pending buffer into which the compressed
    data is written can overwrite the distance symbol table which it overlays. That
    results in corrupted output due to invalid distances, and can result in
    out-of-bound accesses, crashing the application.

    The fix here combines the distance buffer and literal/length buffers into a
    single symbol buffer. Now three bytes of pending buffer space are opened up for
    each literal or length/distance pair consumed, instead of the previous two
    bytes. This assures that the pending buffer cannot overwrite the symbol table,
    since the maximum fixed code compressed length/distance is 31 bits, and since
    there are four bytes of pending space for every three bytes of symbol space.
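The key arithmetic in the fix above can be checked directly: the longest fixed-Huffman output for one length/distance pair is 31 bits, which fits in the four bytes of pending space available for every three bytes of symbol space. A sketch of that accounting, with bit widths taken from the fixed Huffman tables (the constant names are ours):

```c
#include <assert.h>

/* Worst-case fixed-code output for one length/distance pair: the
 * longest fixed literal/length code is 8 bits, length extra bits are
 * at most 5, fixed distance codes are 5 bits, and distance extra
 * bits are at most 13. 8 + 5 + 5 + 13 = 31 bits, i.e. under 4 bytes,
 * while the layout provides four bytes of pending space for every
 * three bytes of symbol space, so the compressed output cursor can
 * never overtake the symbol entries. */
enum {
    FIXED_LENLIT_BITS = 8,   /* longest fixed literal/length code */
    LEN_EXTRA_MAX     = 5,   /* max extra bits for a length code  */
    FIXED_DIST_BITS   = 5,   /* all fixed distance codes          */
    DIST_EXTRA_MAX    = 13,  /* max extra bits for a distance     */
    MAX_PAIR_BITS = FIXED_LENLIT_BITS + LEN_EXTRA_MAX +
                    FIXED_DIST_BITS + DIST_EXTRA_MAX
};
```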
2018-01-31  Adapt code to support PREFIX macros and update build scripts (Mika Lindqvist)
2017-08-17  Make sure we don't export internal functions (Hans Kristian Rosbach)
2017-04-24  Add a struct func_table and function functableInit. (Hans Kristian Rosbach)
    The struct contains pointers to select functions to be used by the rest of
    zlib, and the init function selects what functions will be used depending on
    what optimizations have been compiled in and what instruction sets are
    available at runtime.

    Tests done on a Haswell CPU running minigzip -6 compression of a 40M file show
    a 2.5% decrease in branches, and a 25-30% reduction in iTLB-loads. The
    reduction in iTLB-loads is likely mostly due to the inability to inline
    functions. This also causes a slight performance regression of around 1%; this
    might still be worth it to make it much easier to implement new optimized
    functions for various architectures and instruction sets.

    The performance penalty will get smaller for functions that get more
    alternative implementations to choose from, since there is no need to add more
    branches to every call of the function. Today insert_string has 1 branch to
    choose insert_string_sse or insert_string_c, but if we also added for example
    insert_string_sse4, that would have needed another branch, and it would
    probably at some point hinder effective inlining too.
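The functable pattern described above can be sketched as a struct of function pointers filled in once at init time, so hot call sites pay one indirect call instead of a per-call feature-detection branch. This is a minimal sketch: the feature flag is stubbed out, the hash bodies are placeholders, and all names besides functableInit are ours.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*insert_string_fn)(uint32_t val);

/* Placeholder implementations standing in for the generic and the
 * SSE4.2-accelerated variants selected by the real functable. */
static uint32_t insert_string_c(uint32_t val)   { return (val * 2654435761U) >> 16; }
static uint32_t insert_string_sse(uint32_t val) { return insert_string_c(val); }

struct func_table { insert_string_fn insert_string; };
static struct func_table functable;

/* Run once at startup: pick implementations from compiled-in options
 * and runtime CPU features (detection stubbed out as a parameter). */
static void functableInit(int have_sse42) {
    functable.insert_string = have_sse42 ? insert_string_sse : insert_string_c;
}
```

Callers then invoke `functable.insert_string(...)` unconditionally; adding a third variant changes only functableInit, not the call sites.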
2017-02-23  Let all platforms defining UNALIGNED_OK use the optimized put_short implementation. (Hans Kristian Rosbach)
    Also change from pre-increment to post-increment to prevent a double-store on
    non-x86 platforms.
2017-02-18  Let all x86 and x86_64 archs use the new UPDATE_HASH implementation. (Hans Kristian Rosbach)
    This improves compression performance and can often provide slightly better
    compression.