Repairing the Compression Library

Looking back at old code, particularly code that was written as part of a learning process, always inspires a sense of “What the hell was I thinking?”. I knew that my initial “design by convention” API had holes in the logic, but I didn’t know that the algorithms themselves had bugs in them. Library code gets tested in two ways. First, the base algorithms themselves get tested, usually, during the initial code development. The APIs themselves get tested by writing test code or application code that calls the functions stored in the library.

The test code for my compression library was not nearly as robust as it should have been. Although on one level I knew this, I figured that I’d fix it some time later. Aborted work on my GIF image file format plug-in brought that future to the present, so now I have to spend time repenting for past coding sins.

My compression library contains numerous algorithms, categorized as: { Lossless or Lossy } style, { Byte, Bit, N-Bit } data organization, optional 2-D image awareness and whether the compressor can pre-measure the size of the resulting compressed data stream. The algorithms that I’ve written have been written according to spec from various sources. Some are currently protected by patents. The algorithms are:

  • RLE (Run Length Encoding)
  • Bitwise RLE
  • PackBits
  • PackBits-16
  • Delta Row
  • Huffman
  • Adaptive Huffman
  • Arithmetic Encoding
  • Adaptive Arithmetic Encoding
  • Deflate
  • LZ77
  • LZW

Of these, arithmetic encoding and its adaptive cousin are currently protected by patents. PackBits and PackBits-16 is used by TIFF files and embedded PCL rasters. RLE is also used by various image file formats and embedded PCL and PCL/XL rasters. Delta Row is used by embedded PCL/XL rasters and may be protected by a patent. Huffman and Adaptive Huffman are very old algorithms that are still in use in a wide variety of applications, such as JPEG encoding. Deflate is used by various data compression tools, such as PKZIP. LZ77 is the initial dictionary-based algorithm. LZW is a fine-tuned version and is used in GIF file formats.

There are other data compression algorithms out there. All the algorithms that I’ve written so far have focused on lossless compression of static data. DCT and JPEG seek to perform lossy compression of static and image data. The MPEG algorithms seek to compress dynamic audio and video data.

So far, I’ve repaired the RLE and PackBits algorithms and am currently working on the Delta Row algorithm. The RLE algorithm had a pretty simple bug that prevented it from decompressing properly. That’s been fixed. The Delta Row algorithm design was horribly flawed. It is now properly compressing (I think) and doing decompression measurements. I’ll be working on fixing its decompression next.

Comments are closed.