Date: 15/08/2006
www.2600uk.com
Introduction
Until recently, storage media were fairly restrictive in the amount of data they could hold. A 50-megabyte file that would now happily sit on a memory chip only millimetres across would once have taken at least 35 floppy discs. To get around this, computer technicians developed methods to reduce the amount of space that files take up on disc. By compressing the data, much more information can be stored without the need for multiple or expensive storage media.
Compression
Often known as “zipping” or “RARing”, file compression uses special programs to reduce the overall file size through a method known as “lossless compression”. By removing redundant information from a file, the total size is reduced, yet the original state of the file can be restored exactly via decompression. In comparison, many graphics formats, such as JPEG, use a method of compression known as “lossy”, where the information discarded can never be recovered.
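The lossless property can be checked directly. The following is a minimal sketch using Python's standard zlib module (a DEFLATE implementation, chosen here for convenience and not named in this article); the sample string and variable names are illustrative assumptions.

```python
import zlib

# Sample input with obvious repetition (an arbitrary illustrative string).
original = b"She sells seashells on the seashore. " * 50

# Compress, then decompress, using zlib's DEFLATE implementation.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("restored exactly:", restored == original)
```

If the comparison ever printed False, the compression would not be lossless.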
The LZ Scheme
Most file compression tools use a process known as the LZ Scheme, named after its creators, Lempel and Ziv. The LZ method uses what is known as an “adaptive dictionary-based” algorithm. As the compression tool reads the file to be reduced, it scans for recurring patterns in the data and writes them to a dictionary. In the simplified picture used in this article, the dictionary is stored with the compressed file so that the file may be fully decompressed to its original state; practical LZ variants typically rebuild the dictionary during decompression from the compressed data itself rather than storing it separately.
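To make the “adaptive dictionary” idea concrete, here is a rough sketch of LZ78, one member of the LZ family. It is an illustration under stated assumptions, not the exact method of any particular tool: the function names are hypothetical, the output is a plain list of (index, character) pairs rather than a packed byte stream, and in this variant the dictionary is regrown by the decompressor instead of being stored.

```python
def lz78_compress(text):
    """Return (dictionary_index, next_char) pairs; index 0 means 'no prefix'."""
    dictionary = {}          # phrase -> index, grown as the input is read
    tokens = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate            # keep extending a known pattern
        else:
            tokens.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:                            # input ended inside a known pattern
        tokens.append((dictionary[phrase], ""))
    return tokens


def lz78_decompress(tokens):
    """Rebuild the text, regrowing the same dictionary from the tokens alone."""
    entries = [""]           # index 0 is the empty prefix
    out = []
    for index, ch in tokens:
        phrase = entries[index] + ch
        entries.append(phrase)
        out.append(phrase)
    return "".join(out)


tokens = lz78_compress("abababab")
print(tokens)
print(lz78_decompress(tokens))
```

Each token points back at a pattern already seen and adds one new character, so the dictionary keeps adapting to whatever the input contains.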
The LZ method creates pointers in the file to the relevant dictionary entry. As the method is adaptive, it will constantly look for new and more efficient ways to build its dictionary based on the patterns it finds. Let's take the following sentence as an example:
She sells seashells on the seashore
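This phrase as it stands takes up 35 bytes. By applying the logic of LZ compression, we can reduce this to 28 bytes (15 for the encoded phrase plus 13 for the dictionary entries) by replacing recurring patterns with pointers. Note that entry 2 already ends in a space, so no separate space is needed before “on”:
Phrase
S1s232on t13ore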
Dictionary
1. he[space]
2. ells[space]
3. seash
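The substitution above can be reproduced with a short sketch. It derives the encoded phrase from the sentence and the three dictionary entries rather than copying it, then expands it again to confirm nothing is lost. The function names are hypothetical, the greedy longest-pattern-first matching is an assumption of this sketch, and it relies on the sentence containing no digit characters (the digits serve as pointers). The byte count ignores any overhead for delimiting the dictionary entries.

```python
# Dictionary from the worked example; the digits 1-3 act as pointers.
DICTIONARY = {"1": "he ", "2": "ells ", "3": "seash"}


def encode(sentence, dictionary):
    """Greedily replace dictionary patterns with their one-character pointer."""
    # Try longer patterns first so the biggest saving wins at each position.
    by_length = sorted(dictionary.items(), key=lambda kv: len(kv[1]), reverse=True)
    out = []
    i = 0
    while i < len(sentence):
        for pointer, pattern in by_length:
            if sentence.startswith(pattern, i):
                out.append(pointer)
                i += len(pattern)
                break
        else:
            out.append(sentence[i])   # no pattern starts here: copy the character
            i += 1
    return "".join(out)


def decode(encoded, dictionary):
    """Expand each pointer back into its pattern; other characters pass through."""
    return "".join(dictionary.get(ch, ch) for ch in encoded)


sentence = "She sells seashells on the seashore"
encoded = encode(sentence, DICTIONARY)
stored = len(encoded) + sum(len(p) for p in DICTIONARY.values())

print(encoded)
print(len(sentence), "bytes originally,", stored, "bytes as phrase plus dictionary text")
print("round trip ok:", decode(encoded, DICTIONARY) == sentence)
```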
Whether the target file is compressed by a lot or a little depends on how many repeated patterns it holds. Databases and word processing documents have a greater chance of a size reduction, as they are more prone to redundancy and patterning than, for example, an image. Likewise, the larger the original file, the greater the chance of redundancy and thus the higher the saving that will result.
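The effect of patterning and file size can be seen in a small experiment. This sketch again uses Python's zlib module as a convenient stand-in for a compression tool; the sample data, sizes, and the ratio helper are illustrative assumptions, and the exact figures printed will vary with the zlib version.

```python
import os
import zlib


def ratio(data):
    """Compressed size as a fraction of the original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


patterned_small = b"name,city,country\n" * 10
patterned_large = b"name,city,country\n" * 10_000
random_bytes = os.urandom(len(patterned_large))   # essentially no repeating patterns

print("patterned, small:", round(ratio(patterned_small), 3))
print("patterned, large:", round(ratio(patterned_large), 3))
print("random bytes:    ", round(ratio(random_bytes), 3))
```

The highly repetitive inputs should shrink to a small fraction of their size, with the larger one doing better, while the random bytes should barely shrink at all.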
As your application reads the compressed data from the drive, it decompresses the file by substituting the required pattern each time it encounters a pointer to the relevant dictionary entry.
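Decompressing while reading can be sketched with zlib's incremental interface. This is an illustration only: an in-memory buffer stands in for the file on the drive, and the 64-byte read size and variable names are arbitrary assumptions.

```python
import io
import zlib

original = b"She sells seashells on the seashore\n" * 1000
# Stand-in for a compressed file on disc (an in-memory buffer, for illustration).
compressed_file = io.BytesIO(zlib.compress(original))

decompressor = zlib.decompressobj()
restored = bytearray()
while True:
    chunk = compressed_file.read(64)      # read a small piece, as if from the drive
    if not chunk:
        break
    restored += decompressor.decompress(chunk)
restored += decompressor.flush()          # emit anything still buffered

print("restored exactly:", bytes(restored) == original)
```

The decompressor keeps its own state between chunks, so the application never needs the whole compressed file in memory at once.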
15/08/2006 Biomech www.2600uk.com
EOF
