author | alk3pInjection <webmaster@raspii.tech> | 2021-08-30 16:43:38 +0800 |
---|---|---|
committer | alk3pInjection <webmaster@raspii.tech> | 2021-08-30 16:43:38 +0800 |
commit | cbe033a53bfe49d980774e59025e3b2af91778b7 (patch) | |
tree | 558535f91276162e0be70d07b34ed2e6577e38ad /doc/lz4_Frame_format.md | |
parent | fdd43c66dd9e77283aa8f7e52a881be44d622441 (diff) | |
parent | d44371841a2f1728a3f36839fd4b7e872d0927d3 (diff) |
Merge tag 'v1.9.3' into lineage-18.1
Change-Id: Iad56c1b17a32f9f356a4c1ff9557f0e79addf481
Diffstat (limited to 'doc/lz4_Frame_format.md')
-rw-r--r-- | doc/lz4_Frame_format.md | 56 |
1 files changed, 35 insertions, 21 deletions
diff --git a/doc/lz4_Frame_format.md b/doc/lz4_Frame_format.md
index 0c98df1..7e08841 100644
--- a/doc/lz4_Frame_format.md
+++ b/doc/lz4_Frame_format.md
@@ -16,7 +16,7 @@ Distribution of this document is unlimited.
 
 ### Version
 
-1.6.1 (30/01/2018)
+1.6.2 (12/08/2020)
 
 
 Introduction
@@ -75,7 +75,7 @@ __Frame Descriptor__
 3 to 15 Bytes, to be detailed in its own paragraph,
 as it is the most important part of the spec.
 
-The combined __Magic Number__ and __Frame Descriptor__ fields are sometimes
+The combined _Magic_Number_ and _Frame_Descriptor_ fields are sometimes
 called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes.
 
 __Data Blocks__
@@ -85,14 +85,13 @@ That’s where compressed data is stored.
 
 __EndMark__
 
-The flow of blocks ends when the last data block has a size of “0”.
-The size is expressed as a 32-bits value.
+The flow of blocks ends when the last data block is followed by
+the 32-bit value `0x00000000`.
 
 __Content Checksum__
 
-Content Checksum verify that the full content has been decoded correctly.
-The content checksum is the result
-of [xxh32() hash function](https://github.com/Cyan4973/xxHash)
+_Content_Checksum_ verify that the full content has been decoded correctly.
+The content checksum is the result of [xxHash-32 algorithm]
 digesting the original (decoded) data as input, and a seed of zero.
 Content checksum is only present when its associated flag
 is set in the frame descriptor.
@@ -101,7 +100,7 @@ that all blocks were fully transmitted in the correct order and without error,
 and also that the encoding/decoding process itself generated no distortion.
 Its usage is recommended.
 
-The combined __EndMark__ and __Content Checksum__ fields might sometimes be
+The combined _EndMark_ and _Content_Checksum_ fields might sometimes be
 referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes.
 
 __Frame Concatenation__
@@ -213,7 +212,7 @@ __Content Size__
 
 This is the original (uncompressed) size.
 This information is optional, and only present if the associated flag is set.
-Content size is provided using unsigned 8 Bytes, for a maximum of 16 HexaBytes.
+Content size is provided using unsigned 8 Bytes, for a maximum of 16 Exabytes.
 Format is Little endian.
 This value is informational, typically for display or memory allocation.
 It can be skipped by a decoder, or used to validate content correctness.
@@ -261,35 +260,48 @@ __Block Size__
 
 This field uses 4-bytes, format is little-endian.
 
-The highest bit is “1” if data in the block is uncompressed.
+If the highest bit is set (`1`), the block is uncompressed.
 
-The highest bit is “0” if data in the block is compressed by LZ4.
+If the highest bit is not set (`0`), the block is LZ4-compressed,
+using the [LZ4 block format specification](https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md).
 
-All other bits give the size, in bytes, of the following data block
-(the size does not include the block checksum if present).
+All other bits give the size, in bytes, of the data section.
+The size does not include the block checksum if present.
 
-Block Size shall never be larger than Block Maximum Size.
-Such a thing could happen for incompressible source data.
-In such case, such a data block shall be passed in uncompressed format.
+_Block_Size_ shall never be larger than _Block_Maximum_Size_.
+Such an outcome could potentially happen for non-compressible sources.
+In such a case, such data block must be passed using uncompressed format.
+
+A value of `0x00000000` is invalid, and signifies an _EndMark_ instead.
+Note that this is different from a value of `0x80000000` (highest bit set),
+which is an uncompressed block of size 0 (empty),
+which is valid, and therefore doesn't end a frame.
+Note that, if _Block_checksum_ is enabled,
+even an empty block must be followed by a 32-bit block checksum.
 
 __Data__
 
 Where the actual data to decode stands.
 It might be compressed or not, depending on previous field indications.
-Uncompressed size of Data can be any size, up to “block maximum size”.
-Note that data block is not necessarily full :
-an arbitrary “flush” may happen anytime. Any block can be “partially filled”.
+
+When compressed, the data must respect the [LZ4 block format specification](https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md).
+
+Note that a block is not necessarily full.
+Uncompressed size of data can be any size __up to__ _Block_Maximum_Size_,
+so it may contain less data than the maximum block size.
 
 __Block checksum__
 
 Only present if the associated flag is set.
 This is a 4-bytes checksum value, in little endian format,
-calculated by using the xxHash-32 algorithm on the raw (undecoded) data block,
+calculated by using the [xxHash-32 algorithm] on the __raw__ (undecoded) data block,
 and a seed of zero.
 The intention is to detect data corruption (storage or transmission errors)
 before decoding.
 
-Block checksum is cumulative with Content checksum.
+_Block_checksum_ can be cumulative with _Content_checksum_.
+
+[xxHash-32 algorithm]: https://github.com/Cyan4973/xxHash/blob/release/doc/xxhash_spec.md
 
 
 Skippable Frames
@@ -386,6 +398,8 @@ and trigger an error if it does not fit within acceptable range.
 Version changes
 ---------------
 
+1.6.2 : clarifies specification of _EndMark_
+
 1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer"
 
 1.6.0 : restored Dictionary ID field in Frame header
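
Reviewer note: the clarified _Block_Size_ rules in this patch (EndMark versus an empty uncompressed block) can be illustrated with a short sketch. This is not code from lz4; it is a minimal Python illustration, and the function name `parse_block_size` and its return tuples are hypothetical, chosen only for this example. It assumes the caller has already read exactly 4 bytes from the stream and does not check the result against _Block_Maximum_Size_.

```python
import struct


def parse_block_size(field: bytes):
    """Classify a 4-byte little-endian Block Size field (hypothetical helper).

    Returns one of:
      ("endmark", 0)            value 0x00000000, ends the flow of blocks
      ("uncompressed", size)    highest bit set, data stored as-is
      ("compressed", size)      highest bit clear, data is an LZ4 block
    """
    if len(field) != 4:
        raise ValueError("Block Size field must be exactly 4 bytes")

    # The field is a 32-bit little-endian unsigned integer.
    (value,) = struct.unpack("<I", field)

    # 0x00000000 is not a valid block size: it is the EndMark.
    if value == 0x00000000:
        return ("endmark", 0)

    # All bits except the highest give the size of the data section,
    # excluding the optional block checksum.
    size = value & 0x7FFFFFFF

    # Highest bit set means the block is uncompressed. Note that
    # 0x80000000 is therefore a valid, empty, uncompressed block.
    if value & 0x80000000:
        return ("uncompressed", size)
    return ("compressed", size)
```

With this sketch, the bytes `00 00 00 00` are reported as an EndMark, while `00 00 00 80` (the value `0x80000000`) is reported as an empty uncompressed block that does not end the frame, which is the distinction the 1.6.2 wording introduces.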