Mail Archives: djgpp/1994/10/30/15:55:59

Date: Mon, 31 Oct 94 03:56:42 JST
From: Stephen Turnbull <turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp>
To: eliz AT is DOT elta DOT co DOT il
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: stat()/fstat() for DJGPP, v.02

While I agree with Eli's responses to Morten Welinder's comments about
stat()/fstat(), I'd like to make some comments based on my own
experience and system configuration.

   EZ> 5. Directory size is not reported zero by stat(); the number
   EZ>    of used directory entries (sans the ``.'' and ``..''
   EZ>    pseudo-entries) multiplied by entry size is returned
   MW> This could be expensive and is misleading.  If you create a
   MW> directory with 1000 files then delete them all, the size of
   MW> the directory should not change.

How can this be more misleading than DOS's normal approach of showing
directory sizes as 0?  If you add 1000 files to a directory, its size
*should* change.  This is necessarily going to happen more often than
the case Morten describes!  The only time I can see this as misleading
is in something like du, and a du using Eli's stat() is always going
to give a better approximation than one using DOS functions for files'
sizes.
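
For what it's worth, here's a sketch of the computation I understand
Eli to be doing, written against the Borland-style findfirst()/findnext()
in DJGPP's <dir.h> (the function name and buffer size below are mine,
not the actual library code): walk the used entries, skip "." and "..",
and multiply by the 32 bytes a FAT directory entry occupies.

#include <dir.h>
#include <stdio.h>
#include <string.h>

long dir_size_guess(const char *path)
{
    struct ffblk fb;
    char mask[260];
    long entries = 0;
    int done;

    sprintf(mask, "%s/*.*", path);
    done = findfirst(mask, &fb,
                     FA_RDONLY | FA_HIDDEN | FA_SYSTEM | FA_DIREC | FA_ARCH);
    while (!done) {
        if (strcmp(fb.ff_name, ".") != 0 && strcmp(fb.ff_name, "..") != 0)
            entries++;               /* one 32-byte slot per used entry */
        done = findnext(&fb);
    }
    return entries * 32L;            /* bytes per FAT directory entry */
}

Deleted entries (and the slack at the end of the directory's last
cluster) never show up in such a count, which is precisely the part
Morten objects to and Eli addresses below.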

   Not too expensive, as my experience shows (stat() is already
   an expensive function).  Misleading? not entirely.  For
   regular files, sizes are also reported as only the number of
   *used* bytes they hold; the last cluster (may be as large as
   16KB) is usually incomplete, but this doesn't bother us.  It
   is true that rewriting a file usually returns unused clusters
   to the system, while deleting files in a directory doesn't,
   but to report this slack part of the directory is *indeed*
   expensive (you must work on BIOS level and read the FAT for
   this), and is totally impossible on networked drives.  So,

Really?  I guess that the raw disk-reading functions are BIOS-level
and wouldn't be available for network drives, but it's not the FAT you
need to read, it's a regular file with the directory bit set (except
for the root directory, and doing stat()s on the root directory can't
be that common).  Right?  So couldn't you try some dodge like
resetting the directory attribute (I suppose this might also require
BIOS-level functions) and reading the raw directory data?  I have no
idea if something like that would work, but it's weird enough that it
might.  (It also looks very unsafe.  And expensive---two extra disk
operations to reset and set the directory attribute.)

   I chose a (hopefully useful) compromise.  After all,

Be that as it may, I'm not sure I agree with this compromise.  One can
imagine a program that compares directory statistics, and runs the
defragger based on (among other things) the number of directories that
are much bigger than their file count justifies.

   directories with a large number of unused entries are rare
   (unless you didn't run your favorite defragger for 5 years
   or so ;-)

Until I repartitioned my disk, I had 16KB clusters *and* always had at
least one directory with a couple dozen KB of unused entries: my
Ghostscript build directory after 'make clean'.  I would guess that
people who do a lot of beta testing of such large programs would have
several such directories.  So in terms of "will you find such a
problem on a given system," they're common.  Of course, at that time I
had about 2000 directories and at most 3 such beta test directories.
In terms of percentage of directories with such a problem, 0.15% is
going to be pretty high for most lusers.  :-)
    I didn't run a defragger for a long time because my favorite one
was Norton Speedisk, which chokes on big drives (this is an old
version, about Norton Utilities 5.0).

   EZ>   3. I don't know how to obtain time fields for root directories,
   MW> You could use the volume label as a better fall back.  Also, I

   A disk is not required to have a label; in fact, most floppies
   don't have one.  Even if a label is present, it can easily be
   changed, thus changing its time stamp.  In my view, this makes
   the label method unreliable.

A lot of DOSes (well, IBM's, anyway) automatically put serial numbers
on floppies.  I believe this is done using the volume label bit, but
I'm not sure.  As for lack of reliability....
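
(Just to make the label dodge concrete, it would amount to something
like the sketch below, again assuming the <dir.h> findfirst()
interface; the function name is mine, and the early return is exactly
Eli's objection that many disks carry no label at all.)

#include <dir.h>

/* Use the volume label's time stamp as the root directory's date.
   Returns 0 on success, -1 if the disk carries no label. */
int root_date_from_label(char drive, unsigned short *fdate,
                         unsigned short *ftime)
{
    char mask[] = "?:\\*.*";
    struct ffblk fb;

    mask[0] = drive;
    if (findfirst(mask, &fb, FA_LABEL) != 0)
        return -1;                   /* no label: fall back to something else */
    *fdate = fb.ff_fdate;            /* DOS-packed date of the label entry */
    *ftime = fb.ff_ftime;            /* DOS-packed time of the label entry */
    return 0;
}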

   MW>                                                          Also, I
   MW> think there are some time stamp in the boot record.  Semi-expensive

   AFAIK, there is no time stamp in the boot record, but if you know
   otherwise, please tell me where in the boot record it dwells.

In my small experience, boot records are more likely to change than
volume labels!  (For hard drives, anyway.)  Most of my colleagues
still have disks labelled "MSDOS_5" or the like.  Most of them have
some sort of multiboot utility installed after the initial
installation.  (I don't know how common this is outside of Oriental
countries; here average users need multiboot because Japanese DOS and
English DOS don't like each other's programs very much.)
    I assume that the rationale for setting the date of the root
directory to Anno Gatesii 0 is that the root directory is the earliest
object created in most file systems.  But this is not universal.  For
example, the MSDOS 5 file system is incompatible with that of MSDOS 4.
So to upgrade, one would probably back up one's system, then restore
after reformatting the hard drive and reinstalling DOS.  The root
directory then is younger than anything but new system files.
    On Unix, this can happen even without backup and restore.  I've
been doing a lot of fiddling with my Linux system.  What I have done
to minimize backup/restore cycles is to create 5 partitions: ROOT,
ROOT-TEST, USR, USR-TEST, and HOME.  ROOT contains my current working
system's minimal boot and system repair utilities, ROOT-TEST the
corresponding new installation.  Now, typically the new installation
doesn't include lots of the utilities I use, so I mount the USR-TEST
partition on /usr and the USR partition on /stable-usr, and often
*everything* in /stable-usr is older than /.  (HOME of course contains
the directories where I do my non-system work.)
    Simpler than that would be having the superuser 'touch /', which
ought to work.  (I haven't tried it, but why not?)  This could happen
as a typo....
    If you want to stick with the "oldest file" rationale, it might be
useful to consider setting the default date to Anno Unixii 0, since
Linuxers using the UMSDOS file system *YUCK* may be able to read their
Unix files from MSDOS (shudder, talk about security holes).  Which
leads to the rather bizarre concept that somebody could get hold of an
old PDP-11 tar, and restore a Unix file system from before DOS was
born to their MS-DOS disk.  I guess this is pretty silly, and pre-1980
files are going to be far rarer than directories with 500 deleted file
entries....  Although one can imagine someone touch'ing a file to such
a date (why, I don't know, and why they wouldn't want to touch to a
"before Unix" date I can't figure out either).
    Given all this, it's not clear to me what the meaning of
"reliability" of the root directory's date might be.  I don't find the
date of the volume label to be at all implausible, assuming it exists.
In some sense it's the last major change the user has made to the root
directory.  (Eg, when I reuse a floppy I typically 'del /fsxyz *.*' and
change the volume label---this is much less obstructive when running
DESQview/X than a reformat.)
    One way to deal with it might be to set up the sources so that
individuals could create their own preferred order of checking the
various alternatives easily.  I think that it's probably a good idea
to make the library f?stat as standard as possible, but if there's
some reason the date of / can matter, it ought to be possible to alter
the default behavior.
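
Concretely, I'm imagining a little table of probe functions that
stat() walks in order when it has to invent a date for the root
directory; reordering (or extending) the table would be the only
thing an individual needs to touch.  None of these names exist in the
real sources, and the label probe below is a stub; it's only the
shape I'm suggesting.

#include <time.h>

typedef int (*root_time_probe)(char drive, time_t *result);

/* Stub: would take the volume label's time stamp, as sketched above. */
static int probe_volume_label(char drive, time_t *result)
{
    (void)drive;
    (void)result;
    return -1;                       /* pretend no label was found */
}

/* Last resort: Anno Gatesii 0, i.e. the DOS epoch, 1 Jan 1980. */
static int probe_dos_epoch(char drive, time_t *result)
{
    struct tm t = {0};

    (void)drive;
    t.tm_year = 80;                  /* 1980 */
    t.tm_mday = 1;
    *result = mktime(&t);
    return 0;
}

/* Changing the default behavior means reordering this table. */
static root_time_probe probes[] = { probe_volume_label, probe_dos_epoch };

time_t root_dir_time(char drive)
{
    time_t result = 0;
    size_t i;

    for (i = 0; i < sizeof probes / sizeof probes[0]; i++)
        if (probes[i](drive, &result) == 0)
            break;
    return result;
}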
