Mail Archives: djgpp/2002/04/29/18:17:29

X-Authentication-Warning: delorie.com: mailnull set sender to djgpp-bounces using -f
Message-ID: <3CCD7E35.20A38E1E@acm.org>
From: Eric Sosman <esosman AT acm DOT org>
X-Mailer: Mozilla 4.72 [en] (Win95; U)
X-Accept-Language: en
MIME-Version: 1.0
Newsgroups: comp.os.msdos.djgpp
Subject: Re: how to determine if a file is text/binary
References: <c21a43ff DOT 0204291216 DOT 53eaf67c AT posting DOT google DOT com>
Lines: 28
Date: Mon, 29 Apr 2002 22:08:47 GMT
X-Complaints-To: abuse AT worldnet DOT att DOT net
X-Trace: bgtnsc04-news.ops.worldnet.att.net 1020118127 (Mon, 29 Apr 2002 22:08:47 GMT)
NNTP-Posting-Date: Mon, 29 Apr 2002 22:08:47 GMT
Organization: AT&T Worldnet
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

xeon wrote:
> Hi,
> I'm wondering how to determine if a file is a text file or a binary
> file, programmatically. I'm thinking about reading 4 bytes from the
> file and testing whether they're in the range of usual text ([a-z],
> [A-Z], etc.). The 4 bytes are read from the following locations: 1st
> byte, last byte, and 2 randomly selected offsets inside the file. Is
> this enough?

    Not really.  The fundamental problem is in formulating precise
definitions of "text file" and "binary file": try to do so and you'll
quickly discover the kinds of trouble you'll get into.

    For example, is a file containing "abc\n" a text file of one
three-letter newline-terminated line, or is it a binary file
storing the number 0x6162630a ==  1633837834?  Or 'tother way
round, if you find a byte with the high bit set are you looking
at a binary file or at a text file containing a non-ASCII character?
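
    To make the "abc\n" ambiguity concrete, here is a small sketch in C.
The helper name be32_from_bytes is hypothetical, and it assumes the
"binary" reading means a big-endian 32-bit unsigned integer, which is
the byte order that yields the value quoted above.  The same four
bytes serve equally well as a line of text or as a number:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interpret four bytes as a big-endian
 * unsigned 32-bit integer (an assumption about the byte order). */
uint32_t be32_from_bytes(const unsigned char *b)
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) |
           ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |
            (uint32_t)b[3];
}
```

    Passing the bytes of "abc\n" to this function gives 0x6162630a,
and nothing in the bytes themselves says which reading was intended.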

    That said, you can make a guess of sorts, although you'll never
be 100% accurate.  Take a look at the source of the "file" program
for some ideas.
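
    One possible shape for such a guess is sketched below in C.  This
is not the algorithm the "file" program uses; the function name
looks_like_text, the list of "harmless" control characters, and the
10% threshold are all assumptions chosen for illustration.  It treats
a NUL byte as decisive evidence of binary data, counts other unusual
control characters, and deliberately ignores bytes with the high bit
set, since those can appear in either kind of file:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: returns 1 if the buffer "looks like" text,
 * 0 otherwise.  This is a guess, never a proof. */
int looks_like_text(const unsigned char *buf, size_t len)
{
    size_t i;
    size_t suspicious = 0;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\0')
            return 0;   /* NUL bytes are rare in text files */

        /* Count control characters other than common whitespace.
         * Bytes >= 0x80 are not counted either way. */
        if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r' && c != '\f')
            || c == 0x7f)
            suspicious++;
    }

    /* Assumed threshold: tolerate up to 10% odd control characters.
     * An empty buffer is reported as text. */
    return suspicious * 10 <= len;
}
```

    A caller would typically open the file with fopen(path, "rb") so
that the C library does not translate line endings, fread the first
few hundred bytes into a buffer, and pass that buffer to the function.
Sampling a contiguous block this way gives far more evidence than four
isolated bytes, though the answer remains a guess.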

Eric Sosman
esosman AT acm DOT org
