Lossless Data Compression

Botan provides several lossless data compression algorithms, currently all through third-party libraries: zlib (covering the deflate and gzip formats), bzip2, and lzma. Support for each must be enabled at build time; you can check for it with the macros BOTAN_HAS_ZLIB, BOTAN_HAS_BZIP2, and BOTAN_HAS_LZMA.
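As a minimal sketch of the macro check described above, the hypothetical helper compiled_in_compressors below reports which backends were enabled when Botan was built. It assumes a C++17 compiler and that the feature macros come from botan/build.h; the __has_include guard is only there so the sketch still compiles on a machine without Botan, where the list is simply empty.

```cpp
#include <string>
#include <vector>

// Pull in Botan's build configuration only if it is installed.
#if __has_include(<botan/build.h>)
   #include <botan/build.h>
#endif

// Hypothetical helper (not part of Botan): names of the compression
// backends that were enabled when the library was built.
std::vector<std::string> compiled_in_compressors() {
   std::vector<std::string> names;
#if defined(BOTAN_HAS_ZLIB)
   names.push_back("zlib");
#endif
#if defined(BOTAN_HAS_BZIP2)
   names.push_back("bz2");
#endif
#if defined(BOTAN_HAS_LZMA)
   names.push_back("lzma");
#endif
   return names;
}
```

The strings returned are the names accepted by the create functions, so the result can be used to pick an algorithm at run time.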

Note

You should always compress before you encrypt: encryption hides the redundancy that compression needs to find and remove, so data that is already encrypted will not compress.

Compression is done through the Compression_Algorithm and Decompression_Algorithm classes, both defined in compression.h.

Compression and decompression both work in three stages: starting a message (start), continuing to process it (update), and then finally completing processing the stream (finish).

class Compression_Algorithm
void start(size_t level)

Initialize the compression engine. This must be done before calling update or finish. The meaning of the level parameter varies by algorithm, but it generally takes a value between 1 and 9; higher values typically give better compression at the cost of more memory and/or CPU time spent compressing. Whatever level is chosen, the decompressor can always handle the output.

void update(secure_vector<uint8_t> &buf, size_t offset = 0, bool flush = false)

Compress the material in the in/out parameter buf. The leading offset bytes of buf are ignored and remain untouched; this can be useful for ignoring packet headers. If flush is true, the compression state is flushed, allowing the decompressor to recover the entire message up to this point without having to see the rest of the compressed stream.
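The start, update, and finish stages and the offset parameter can be sketched as follows. The helper compress_after_header is hypothetical, and the sketch assumes C++17, a zlib-enabled build, and that finish takes an in/out buffer in the same way update does (its signature is not given in this section). The __has_include guards only keep the code compilable where Botan is absent, in which case the helper returns std::nullopt.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

#if __has_include(<botan/build.h>)
   #include <botan/build.h>
#endif
#if __has_include(<botan/compression.h>)
   #include <botan/compression.h>
#endif

// Hypothetical helper (not part of Botan): returns `header` unchanged,
// followed by the zlib-compressed `payload`, or std::nullopt if zlib
// support is not available.
std::optional<std::vector<uint8_t>> compress_after_header(
      const std::vector<uint8_t>& header,
      const std::vector<uint8_t>& payload,
      size_t level = 6) {
#if defined(BOTAN_HAS_ZLIB)
   auto comp = Botan::Compression_Algorithm::create("zlib");
   if(!comp) {
      return std::nullopt;
   }

   // buf is both input and output: header first, then the payload.
   Botan::secure_vector<uint8_t> buf(header.begin(), header.end());
   buf.insert(buf.end(), payload.begin(), payload.end());

   comp->start(level);
   // Skip the header bytes; only the payload is fed to the compressor.
   comp->update(buf, header.size());

   // Assumption: finish() takes an in/out buffer like update(). With no
   // further input it returns whatever output is still pending.
   Botan::secure_vector<uint8_t> tail;
   comp->finish(tail);
   buf.insert(buf.end(), tail.begin(), tail.end());

   return std::vector<uint8_t>(buf.begin(), buf.end());
#else
   (void)header;
   (void)payload;
   (void)level;
   return std::nullopt;
#endif
}
```

Because update was called without flush, the compressor may hold back some or all of its output until finish, which is why the tail is appended before returning.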

class Decompression_Algorithm
void start()

Initialize the decompression engine. This must be done before calling update or finish. No level is provided here; the decompressor can accept input generated by any compression parameters.

void update(secure_vector<uint8_t> &buf, size_t offset = 0)

Decompress the material in the in/out parameter buf. The leading offset bytes of buf are ignored and remain untouched; this can be useful for ignoring packet headers.

This function may throw if the data seems to be invalid.
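A round trip through both classes might look like the sketch below, which also shows one way to handle the exception mentioned above. The helper zlib_round_trip is hypothetical; the sketch assumes C++17, a zlib-enabled build, that Botan's exceptions derive from std::exception, and that finish takes an in/out buffer like update. Without Botan installed it compiles to a stub that returns std::nullopt.

```cpp
#include <cstdint>
#include <exception>
#include <optional>
#include <vector>

#if __has_include(<botan/build.h>)
   #include <botan/build.h>
#endif
#if __has_include(<botan/compression.h>)
   #include <botan/compression.h>
#endif

// Hypothetical helper (not part of Botan): compress `input`, decompress
// the result, and return the recovered bytes. Returns std::nullopt if
// zlib is unavailable or the decompressor rejects the data.
std::optional<std::vector<uint8_t>> zlib_round_trip(const std::vector<uint8_t>& input) {
#if defined(BOTAN_HAS_ZLIB)
   auto comp = Botan::Compression_Algorithm::create("zlib");
   auto decomp = Botan::Decompression_Algorithm::create("zlib");
   if(!comp || !decomp) {
      return std::nullopt;
   }

   Botan::secure_vector<uint8_t> buf(input.begin(), input.end());

   comp->start(9);
   comp->finish(buf);  // buf now holds the complete compressed stream

   try {
      decomp->start();     // no level: any compressor settings are accepted
      decomp->update(buf); // buf now holds decompressed output
      Botan::secure_vector<uint8_t> tail;
      decomp->finish(tail);
      buf.insert(buf.end(), tail.begin(), tail.end());
   } catch(const std::exception&) {
      // Invalid or truncated input is reported by exception.
      return std::nullopt;
   }

   return std::vector<uint8_t>(buf.begin(), buf.end());
#else
   (void)input;
   return std::nullopt;
#endif
}
```

In real code the compressed buffer would usually cross a file or network boundary between the two halves, so the try block is where corrupted input would surface.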

The easiest way to get a compressor is via the functions Compression_Algorithm::create and Decompression_Algorithm::create, which both accept a string argument naming the algorithm. Accepted values include zlib (raw zlib with no checksum), deflate (zlib’s deflate format), gzip, bz2, and lzma. A null pointer is returned if the algorithm is unavailable.

Two older functions for this are

Compression_Algorithm *make_compressor(std::string type)
Decompression_Algorithm *make_decompressor(std::string type)

which call the relevant create function and then release the returned unique_ptr. Avoid these in new code.

To use a compression algorithm in a Pipe, use the adapter types Compression_Filter and Decompression_Filter from comp_filter.h. The constructors of both filters take a std::string argument (passed to make_compressor or make_decompressor); the compression filter also takes a level parameter. Finally, both constructors have a parameter buf_sz, which specifies the size of the internal buffer that will be used; inputs are broken into blocks of this size. The default is 4096.
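The filter adapters can be sketched as follows, using one Pipe to compress and a second to decompress. Several things here are assumptions rather than statements from this section: that botan/filters.h declares the compression filters (older releases may need comp_filter.h instead), that Pipe takes ownership of filters passed as raw pointers, and that process_msg and read_all behave as in the general Pipe interface. The helper pipe_round_trip and the macro HAVE_BOTAN_PIPE are hypothetical names, and the guards let the sketch compile without Botan.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

#if __has_include(<botan/build.h>)
   #include <botan/build.h>
#endif
#if __has_include(<botan/filters.h>) && __has_include(<botan/pipe.h>)
   #include <botan/filters.h>
   #include <botan/pipe.h>
   #define HAVE_BOTAN_PIPE 1  // hypothetical marker used only in this sketch
#endif

// Hypothetical helper (not part of Botan): push `input` through a
// compressing Pipe and then a decompressing Pipe. Returns std::nullopt
// when the filters or zlib support are unavailable.
std::optional<std::vector<uint8_t>> pipe_round_trip(const std::vector<uint8_t>& input) {
#if defined(HAVE_BOTAN_PIPE) && defined(BOTAN_HAS_ZLIB)
   // Level 6, default internal buffer size (buf_sz = 4096).
   Botan::Pipe compress(new Botan::Compression_Filter("zlib", 6));
   compress.process_msg(input);
   const Botan::secure_vector<uint8_t> packed = compress.read_all();

   // The decompression filter takes no level parameter.
   Botan::Pipe decompress(new Botan::Decompression_Filter("zlib"));
   decompress.process_msg(packed);
   const Botan::secure_vector<uint8_t> unpacked = decompress.read_all();

   return std::vector<uint8_t>(unpacked.begin(), unpacked.end());
#else
   (void)input;
   return std::nullopt;
#endif
}
```

Unlike create, the filter constructors have no way to return a null pointer, so a caller should expect construction to fail by exception when the named algorithm is missing.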