Age | Commit message | Author |
|
longest_match related code.
|
quick_insert_string is being used instead.
|
quick_insert_string function used on Windows.
|
Use uint8_t[8] struct on big-endian machines for speed.
|
to compare258.
|
ARM.
|
unaligned conditionally compiled code for insert_string and quick_insert_string. Unify sse42 crc32 assembly between insert_string and quick_insert_string. Modified quick_insert_string to work across architectures.
|
Add necessary code to cmake and configure.
Fix slide_hash_sse2 to compile with zlib-ng.
|
* wrap crc32 in functable
* change internal crc32 api to use uint64_t rather than size_t for length
|
- Split the functableInit() function into separate functions, one per functable member, so we don't need to initialize the full functable in multiple places in the zlib-ng code, or to check for NULL on every invocation.
- The optimized function for each functable member is detected on first invocation, and the functable item is updated for subsequent invocations.
- Remove the NULL check in adler32() and adler32_z(), as it is no longer needed.
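
The detect-on-first-invocation scheme described above can be sketched roughly as follows. This is a minimal illustration, not zlib-ng's actual code: the names checksum_stub, checksum_c, checksum_unrolled, cpu_has_fast_checksum and stub_calls are hypothetical, and a plain byte sum stands in for a real checksum such as adler32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not zlib-ng's implementation. */
typedef uint32_t (*checksum_fn)(uint32_t sum, const uint8_t *buf, size_t len);

static uint32_t checksum_stub(uint32_t sum, const uint8_t *buf, size_t len);

/* Each member starts out pointing at its own stub, so it is never NULL
 * and callers need no NULL check and no explicit init call. */
static struct {
    checksum_fn checksum;
} functable = { checksum_stub };

/* Portable fallback: a byte sum standing in for a real checksum. */
static uint32_t checksum_c(uint32_t sum, const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Stand-in for an "optimized" variant: same result, four bytes per step. */
static uint32_t checksum_unrolled(uint32_t sum, const uint8_t *buf, size_t len) {
    size_t i = 0;
    for (; i + 4 <= len; i += 4)
        sum += (uint32_t)buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
    for (; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Placeholder for a runtime CPU feature probe; always reports "no" here. */
static int cpu_has_fast_checksum(void) { return 0; }

/* Counts stub entries so the one-time detection can be observed. */
static int stub_calls = 0;

/* Runs on the first call only: picks an implementation, overwrites the
 * table entry, then forwards the original call to the chosen function. */
static uint32_t checksum_stub(uint32_t sum, const uint8_t *buf, size_t len) {
    stub_calls++;
    functable.checksum = cpu_has_fast_checksum() ? checksum_unrolled
                                                 : checksum_c;
    assert(functable.checksum != checksum_stub);
    return functable.checksum(sum, buf, len);
}
```

Callers always go through functable.checksum(...). Only the first call pays for detection; later calls jump straight to the selected function.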
|
- Add missing call to functableinit from inflateinit
- Fix external direct calls to adler32 functions without calling functableinit
|
The struct contains pointers to select functions to be used by the
rest of zlib, and the init function selects which functions will be
used depending on which optimizations have been compiled in and which
instruction sets are available at runtime.
Tests done on a Haswell CPU running minigzip -6 compression of a
40M file show a 2.5% decrease in branches and a 25-30% reduction
in iTLB-loads. The reduction in iTLB-loads is likely mostly due to
the inability to inline functions. This also causes a slight
performance regression of around 1%, but that might still be worth it
to make it much easier to implement new optimized functions for
various architectures and instruction sets.
The performance penalty will get smaller for functions that get more
alternative implementations to choose from, since there is no need
to add more branches to every call of the function.
Today insert_string has one branch to choose insert_string_sse
or insert_string_c, but adding, for example, insert_string_sse4
would need another branch, and it would probably at some point
hinder effective inlining too.
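
As a rough sketch of the trade-off described above, the following contrasts a per-call branch with a function pointer chosen once at init time. The identifiers insert_string_c and insert_string_sse come from the message; the signature, the have_sse42 flag, struct functable_s, functable_init and insert_string_branchy are assumptions made for illustration, and the function bodies are toy hashes rather than zlib-ng code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy signature for illustration; NOT zlib-ng's real insert_string. */
typedef uint32_t (*insert_string_fn)(uint32_t str);

/* Both variants compute the same toy hash; in real code the SSE variant
 * would use different instructions to reach an equivalent result. */
static uint32_t insert_string_c(uint32_t str)   { return str * 2654435761u; }
static uint32_t insert_string_sse(uint32_t str) { return str * 2654435761u; }

/* Hypothetical flag that a runtime CPU probe would set. */
static int have_sse42 = 0;

/* Before: every call tests the feature flag. Each further variant
 * (e.g. insert_string_sse4) would add one more branch here. */
static uint32_t insert_string_branchy(uint32_t str) {
    if (have_sse42)
        return insert_string_sse(str);
    return insert_string_c(str);
}

/* After: the choice is made once and stored in a table of pointers. */
struct functable_s {
    insert_string_fn insert_string;
};
static struct functable_s functable;

static void functable_init(void) {
    functable.insert_string = have_sse42 ? insert_string_sse
                                         : insert_string_c;
    assert(functable.insert_string != NULL);
}
```

With the table in place, adding another variant only changes functable_init(); call sites keep making a single indirect call, at the cost of the compiler no longer being able to inline the callee.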
|