Jump To Content

Username:
Password:

Register
-176d 0h 35m 42s left

Stupid compression tricks.

Posted by nuxi on 2008-Oct-07 at 21:55:06 in Computers (Login to reply)

A few years ago I wrote a program that generates a highly compressed PNG file. Something on the order of a ratio of 1,000 to 1. This was all fun and games, but the output files I have (I lost the source code) don't do much of anything to modern computers. 1 GB just isn't very impressive by today's standards. So I took it to the next level.

Using bzip2, I wrote an app that does a ratio of about 1,800,000 to 1. Strangely enough 0x00 is not the optimal byte to compress with bzip2, 0xFB is. This is due to the first pass of bzip2 being RLE. Any sequence over 4 bytes long is converted into 4 bytes followed by a length byte. The highest value that can appear as the length byte is 0xFB (251). Obviously this means any stream of 0x00 is converted to a sequence of 00 00 00 00 FB, which is less than ideal. Streams of 0xFB on the other hand are converted to FB FB FB FB FB.

Bzip2 also compresses in blocks with each block containing up to 45,899,235 bytes. A stream of 0xFB compresses down to blocks 26-bytes long. I can pre-compute the checksum of a single block and just dump it to a file repeatedly! Thus is born my latest toy program.

1 TB file

Source code

Some day I may rewrite my app that makes giant PNG files and actually put some thought into how the deflate compression algorithm works and what uncompressed PNG chunks look like. I may be able to improve on the 1,000 to 1 compression ratio I obtained with the original.