Mail Archives: djgpp/1996/11/09/19:26:17

Message-ID: <32851B7C.66E2@ananke.amu.edu.pl>
Date: Sun, 10 Nov 1996 01:02:04 +0100
From: Mark Habersack <grendel AT ananke DOT amu DOT edu DOT pl>
Reply-To: grendel AT ananke DOT amu DOT edu DOT pl
Organization: Home, sweet home
MIME-Version: 1.0
To: George Foot <mert0407 AT sable DOT ox DOT ac DOT uk>
CC: djgpp AT delorie DOT com
Subject: Re: Why not to use 'tar' before packing DJGPP?
References: <32823D97 DOT 44DD AT sabat DOT tu DOT kielce DOT pl> <3282A82E DOT 7EE7 AT cs DOT com> <55vapk$s4l AT news DOT ox DOT ac DOT uk> <babcock DOT 847510845 AT cybercom DOT net> <561pv7$36c AT news DOT ox DOT ac DOT uk>

George Foot wrote:

> Sorry, I don't really understand tar (yes, I'm a Dos user...), but I
tar is not a compression utility. The name stands for Tape ARchiver; it was
primarily designed to hold the contents of tapes, which were used in the early
days of computing (and sometimes to this day) as a storage medium. As such, a
tar archive was meant to be an exact image of the data on the tape.
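
If you're curious, here's a quick Python sketch (file names and contents are
made up for illustration) showing that an uncompressed tar archive is nothing
more than the member files laid back to back, each preceded by a header block -
no compression happens at all:

import os
import tarfile

files = ["a.txt", "b.txt"]                    # made-up input files
for name in files:                            # create some dummy data
    with open(name, "w") as f:
        f.write("some repetitive text\n" * 100)

with tarfile.open("plain.tar", "w") as tar:   # mode "w" = no compression
    for name in files:
        tar.add(name)

members = sum(os.path.getsize(name) for name in files)
print("sum of member sizes:", members, "bytes")
print("size of plain.tar:  ", os.path.getsize("plain.tar"), "bytes")
# The archive comes out a bit *larger* than its members: 512-byte headers
# and padding are added, but nothing is done to shrink the data itself.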

> thought the point of the original article was that tar could achieve
> better compression ratios than zip? The quoted figures certainly looked
> impressive...
The reason for the better compression ratios of tar archives is very simple.
LZW compression (or rather a modification of it), which is used by PKZIP and
some other archivers, relies on something called a "sliding dictionary". That
is a structure holding pairs of data: a pattern and its numerical code. As the
compressor reads the input stream, it looks the just-read pattern up in the
dictionary to see whether it already occurred earlier in the data. If so, the
pattern is replaced with the corresponding code from the dictionary - and you
have just compressed the input data. This is a simplified description of the
LZW algorithm, but it's enough to understand what follows.
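
To make the idea concrete, here is a toy LZW compressor in Python (purely my
own sketch for illustration - it has nothing to do with PKZIP's actual code):
the dictionary starts out with every single byte and grows a new, longer
pattern each time a lookup misses.

def lzw_compress(data):
    # dictionary maps previously seen byte patterns to numeric codes
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    pattern = b""
    output = []
    for byte in data:
        candidate = pattern + bytes([byte])
        if candidate in dictionary:
            pattern = candidate                  # known pattern: keep extending it
        else:
            output.append(dictionary[pattern])   # emit the code for the known part
            dictionary[candidate] = next_code    # remember the new, longer pattern
            next_code += 1
            pattern = bytes([byte])
    if pattern:
        output.append(dictionary[pattern])
    return output

print(lzw_compress(b"abababababab"))
# -> [97, 98, 256, 258, 257, 260]: six codes for twelve bytes, because
#    repeated patterns are replaced by codes already in the dictionary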
If the archiver compresses several files, it resets its dictionary every time
a new file is opened for reading. This way all the accumulated patterns are
lost and the compressor has to build the dictionary from scratch - which of
course reduces the compression ratio, because many patterns have to be learned
anew even though they occurred, say, two files earlier. OTOH, when the archiver
compresses a single file, such as a tar archive, it doesn't reset the dictionary
nearly as often, and the same patterns are far less likely to be learned over
and over again. Thus the compression ratio can increase by about 30%.
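
You can see the effect with another quick Python sketch (the file contents
below are invented, and zlib's Deflate is not exactly PKZIP's algorithm, but it
is a sliding-dictionary compressor of the same general family): it compresses
ten similar files one by one, and then as a single concatenated stream, the way
they would sit inside a tar file.

import zlib

# ten made-up files that share most of their content
files = [(b'#include <stdio.h>\nint main(void) { puts("file %d"); }\n' % i) * 20
         for i in range(10)]

separately = sum(len(zlib.compress(f)) for f in files)  # fresh dictionary per file
together = len(zlib.compress(b"".join(files)))          # one stream, as in a tar
print("compressed one by one: ", separately, "bytes")
print("compressed as a stream:", together, "bytes")
# On data as redundant as this, the single stream comes out noticeably smaller,
# because patterns learned from one file keep paying off in the files after it.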
Hope this will clarify things a little

greetings, mark

-- 
**************************************************************************
You tell me I'm drunk then you sit back and smug a while convinced that
you're right, that you're still in command of your senses. I laugh at
your superior attitude, your insincere platitudes will make me throw up.
The sooner you realise I'm perfectly happy if I'm left to decide the
company I choose.
********************** http://ananke.amu.edu.pl/~grendel *****************

