How To Improve An Algorithm

Algorithm improvement begins with good test procedures. My early test procedures relied on fairly small buffers of mostly common data. I used these to test best-case and edge-case conditions. However, the data was not real. Once I included about 580K of raw image data to compress, I was able to better test the throughput and integrity of the various compression algorithms.

This was the test the showed me that my implementation of the LZ77 algorithm was too inefficient to use. Once I discovered the weakness of the algorithm, I was able to build the IndexDictionary in order to improve the data processing times.

The following table illustrates the gain in throughput. My implementation of LZ77 uses two parameters to define the compression settings. The first is the number of bits in the token that defines the maximum offset, or the size of the sliding window, and the second is the number of bits in the token that defines the maximum common phrase size. The final token size is the sum of these two parameters.

Settings Original Implementation Index Dictionary Notes
6+2 6 sec. 1 sec. 64 byte window, 4 byte max. phrase size
9+3 42 sec. 5 sec. 512 byte window, 8 byte max. phrase size
12+4 241 sec. 28 sec. 4096 byte window, 16 byte max. phrase size
15+5 n/a 223 sec. 32,768 byte window, 32 byte max. phrase size

While the new implementation is certainly much faster, some of the edge-case tests are failing, validating their importance as well.

Update: the errors were caused by invalid code the BitPipeline Flush() method when flushing the final partial byte of a bit stream.

Comments are closed.