4.0581 BinHex, DOS and Two Icon Programs (1/393)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Wed, 10 Oct 90 16:39:33 EDT

Humanist Discussion Group, Vol. 4, No. 0581. Wednesday, 10 Oct 1990.

Date: Wed, 10 Oct 90 02:32:15 CDT
From: Richard Goerwitz <goer@sophist.uchicago.edu>
Subject: Binhex, DOS

Let me try to sort out some of the confusion that's been surrounding
discussion of arc, binhex, stuff, SGML, etc. for those who are new
at these games.

While it is nearly always possible to compress information, computers
allow especially easy application of well-known compression algorithms
to machine readable data. On Macs something called stuff is the most
popular compression utility. The MS-DOS world uses arc, zoo, zip,
and several others. These latter are much more than mere compression
programs. They are archivers. I have them (all three) implemented
under Unix as well. In addition, Unix users use pack and compress.
Compress is available for DOS. On Unix-based networks, one normally
uses a combination of an archiver (e.g. cpio, tar) with a compressor
(e.g. pack or compress) to archive binaries and directory trees for
public distribution. Tar has been implemented under MS-DOS, so files
so archived can be accessed by IBM PCers as well.
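
The archiver-plus-compressor combination described above can be sketched in a few lines. This is only an illustration in Python, not one of the tools named in this note: it uses the standard tarfile module, and it substitutes gzip compression for pack or compress, which is an assumption made for convenience. The helper names make_tar_gz and read_tar_gz are hypothetical.

```python
import io
import tarfile


def make_tar_gz(files):
    """Archive a dict of {name: bytes} into one gzip-compressed tar stream."""
    buf = io.BytesIO()
    # "w:gz" archives with tar and compresses with gzip in one pass.
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar_gz(blob):
    """Unpack a gzip-compressed tar stream back into a dict of {name: bytes}."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}
```

The point of the two-step design is that the archiver preserves names and directory structure, while the compressor only shrinks the resulting byte stream; the output is still 8-bit binary and would need a further encoding step before it could travel over a 7-bit channel.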

These compression programs are different from programs like binhex.
Binhex, apparently, is the way Mac devotees put information into a
format that can be transferred using 7-bit ASCII codes. Beware that,
although the format is probably not an arcane one, it is not one that
everyone will be able to deal with. Probably the most widely used
bin -> ASCII encoding program is the Unix uuencode program, which
takes groups of three 8-bit bytes and turns each group into four
printable ASCII characters. Some sites cannot handle uuencoding, so
this is not
universal. There exist programs that do essentially the same thing
as uuencode, but yet avoid certain character codes that will tend
to get mangled when passing through EBCDIC sites. The most popular
of these is xxencode.
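
To make the 3-to-4 conversion concrete, here is a minimal sketch in Python (chosen only for illustration). It splits three bytes into four 6-bit values and adds 32 to each so that the result is a printable character. It follows the newer BSD convention of writing a zero value as ` rather than as a space. The function names enc, uu_chunk and uu_line are hypothetical.

```python
def enc(sextet):
    """Map one 6-bit value to a printable character (newer BSD style: 0 -> `)."""
    return "`" if sextet == 0 else chr(sextet + 32)


def uu_chunk(three_bytes):
    """Encode exactly three bytes as four printable characters."""
    b1, b2, b3 = three_bytes
    sextets = [
        b1 >> 2,                                   # top 6 bits of byte 1
        ((b1 << 4) & 0o60) | ((b2 >> 4) & 0o17),   # low 2 of byte 1, top 4 of byte 2
        ((b2 << 2) & 0o74) | ((b3 >> 6) & 0o03),   # low 4 of byte 2, top 2 of byte 3
        b3 & 0o77,                                 # low 6 bits of byte 3
    ]
    return "".join(enc(s) for s in sextets)


def uu_line(data):
    """Encode up to 45 bytes as one line: a length character, then 4-char groups."""
    if len(data) > 45:
        raise ValueError("a uuencoded line carries at most 45 bytes")
    padded = data + b"\0" * (-len(data) % 3)       # pad the last group with zeros
    body = "".join(uu_chunk(padded[i:i + 3]) for i in range(0, len(padded), 3))
    return enc(len(data)) + body
```

For example, uu_line(b"Cat") returns "#0V%T": the leading # says that three bytes follow, and 0V%T is the encoded group. A complete uuencoded file wraps such lines between a "begin" line and an "end" line.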

SGML has nothing to do with binhex, stuff, and the like. It is a
specification for a method of outfitting files containing textual
information with markers to indicate coding scheme, structure, major
divisions, etc.

If you plan on posting publicly accessible code or information, think
very carefully about who will be using it. Will it be restricted to
Mac users, or will it be used by a larger group? If the latter is
the case, it would be sensible to use an encoding method which is im-
plemented on as wide a range of systems as possible. Probably this
means using uuencode. I have no idea whether any of the archivers
that MS-DOS and Unix people all have access to (e.g. tar, arc, zoo,
zip) have been implemented for the Mac. Mac users tend to be off in
their own corner as far as standards go, so it is possible that they
do not. If someone would fill us in on the details, I'd be very
grateful.

Appended below are two Icon programs that implement uuen/decode. Note
that this is source code. (It is very unwise to accept code that has
been prepackaged and compiled; kinda like taking drugs with secret,
and possibly harmful, ingredients.) Icon will run on Macs, PCs, under
Unix, VMS, on Ataris, and many more machines. It is, moreover, free,
and can be obtained via icon-project@arizona.edu. Naturally, this
program is implemented in C as well, but I know that C compilers are
expensive in many instances (e.g. VMS), and (unless the code is
particularly portable) often take considerable massaging in order to
accept code written in another environment. If you don't have Icon
on your current machine, get it! The price is right, and it is very
well suited for Humanists.

############################################################################
#
# Name: iiencode.icn
#
# Title: iiencode (port of the Unix/C uuencode program to Icon)
#
# Author: Richard L. Goerwitz
#
# Version: 1.3
#
############################################################################
#
# This is an Icon port of the Unix/C uuencode utility. Since
# uuencode is publicly distributable BSD code, I simply grabbed a
# copy, and rewrote it in Icon. The only basic functional change I
# made to the program was to simplify the notion of file mode.
# Everything is encoded with 0644 permissions. Operating systems
# differ so widely in how they handle this sort of thing that I
# decided just not to worry about it.
#
# Usage is the same as the Unix uuencode command, i.e. a first
# (optional) argument gives the name of the file to be encoded. If this
# is omitted, iiencode just uses the standard input. The second and
# final argument gives the name the encoded file should be given when
# it is ultimately decoded:
#
# iiencode [infile] remotefilename
#
# BUGS: Slow. I decided to go for clarity and symmetry, rather than
# speed, and so opted to do things like use ishift(i,j) instead of
# straight multiplication (which under Icon v8 is much faster). Note
# that I followed the format of the newest BSD release, which refuses
# to output spaces. If you want to change things back around so that
# spaces are output, look for the string "BSD" in my comments, and
# then (un)comment the appropriate sections of code.
#
# NOTE ON MS-DOS: Operating systems for which newline translation is
# necessary will be able to decode text files only - that is, unless
# the code below is altered to open all files in "untranslate" mode.
# The trouble here is that if this change is made, text files will
# not come out looking right (they will remain in their native Unix
# format). I will leave it to the user to decide which alternative
# is the lesser of evils: 1) To be able to decode binary files, but
# have to manually convert text files to MS-DOS format, or 2) to have
# automatic conversion of text files, but not be able to read binary
# files (the way the code stands now).
#
############################################################################
#
# See also: iidecode.icn
#
############################################################################


procedure main(a)

local in, filename

# optional 1st argument
if *a = 2 then {
filename := pop(a)
if not (in := open(filename, "r")) then {
write(&errout,"Can't open ",filename,".")
exit(1)
}
}
else in := &input

if *a ~= 1 then {
write(&errout,"Usage: iiencode [infile] remotefile")
exit (2)
}

# This generic version of uuencode treats file modes in a primitive
# manner so as to be usable in a number of environments. Please
# don't get fancy and change this unless you plan on keeping your
# modified version on-site (or else modifying the code in such a
# way as to avoid dependence on a specific operating system).
writes("begin 644 ",a[1],"\n")

encode(in)

writes("end\n")
exit(0)

end



procedure encode(in)

# Copy from in to standard output, encoding as you go along.

local line

# 1 (up to) 45 character segment
while line := reads(in, 45) do {
writes(ENC(*line))
line ? {
while outdec(move(3))
pos(0) | outdec(left(tab(0), 3, " "))
}
writes("\n")
}
# Uuencode adds a space and newline here, which is decoded later
# as a zero-length line (signals the end of the decoded text).
# writes(" \n")
# The new BSD code (compatible with the old) avoids outputting
# spaces by writing a ` (see also how it handles ENC() below).
writes("`\n")

end



procedure outdec(s)

# Output one group of 3 bytes (s) to standard output. This is one
# case where C is actually more elegant than Icon. Note well!

local c1, c2, c3, c4

c1 := ishift(ord(s[1]),-2)
c2 := ior(iand(ishift(ord(s[1]),+4), 8r060),
iand(ishift(ord(s[2]),-4), 8r017))
c3 := ior(iand(ishift(ord(s[2]),+2), 8r074),
iand(ishift(ord(s[3]),-6), 8r003))
c4 := iand(ord(s[3]),8r077)
every writes(ENC(c1 | c2 | c3 | c4))

return

end



procedure ENC(c)

# ENC is the basic 1 character encoding procedure; it maps a 6-bit
# value to a printable character.

# New BSD code doesn't output spaces...
return " " ~== char(iand(c, 8r077) + 32) | "`"
# ...the way the old code does:
# return char(iand(c, 8r077) + 32)

end


############################################################################
#
# Name: iidecode.icn
#
# Title: iidecode (port of the Unix/C uudecode program to Icon)
#
# Author: Richard L. Goerwitz
#
# Version: 1.4
#
############################################################################
#
# This is an Icon port of the Unix/C uudecode utility. Since
# uudecode is publicly distributable BSD code, I simply grabbed a
# copy, and rewrote it in Icon. The only basic functional change I
# made to the program was to simplify the notion of file mode.
# Everything is encoded with 0644 permissions. Operating systems
# differ so widely in how they handle this sort of thing that I
# decided just not to worry about it.
#
# Usage is the same as the Unix uudecode command, i.e. a first
# (optional) argument gives the name of the file to be decoded. If this
# is omitted, iidecode just uses the standard input:
#
# iidecode [infile]
#
# Even people who do not customarily use Unix should be aware of
# the uuen/decode program and file format. It is widely used, and has
# been implemented on a wide variety of machines for sending 8-bit
# "binaries" through networks designed for ASCII transfers only.
#
# BUGS: Slow. I decided to go for clarity and symmetry, rather than
# speed, and so opted to do things like use ishift(i,j) instead of
# straight multiplication (which under Icon v8 is much faster).
#
# NOTE ON MS-DOS: Systems for which newline translation is necessary
# can encode files. The problem is that, since iiencode sends coded
# files to the standard output, it is impossible to avoid sending out
# OS-specific code at the end of each line. While most uudecode
# programs will be able to handle the resulting file, they will not
# always decode the filename properly. Binary files simply won't
# work, unless the program is modified to write to a file instead of
# the standard output. If you do this, make sure you open the file
# for writing in untranslated mode.
#
############################################################################
#
# See also: iiencode.icn
#
############################################################################

procedure main(a)

local in, filename, dest

# optional 1st (and only) argument
if *a = 1 then {
filename := pop(a)
if not (in := open(filename, "r")) then {
write(&errout,"Can't open ",filename,".")
exit(1)
}
}
else in := &input

if *a ~= 0 then {
write(&errout,"Usage: iidecode [infile]")
exit (2)
}

# Find the "begin" line, and determine the destination file name.
!in ? {
tab(match("begin ")) &
tab(many(&digits)) & # mode ignored
tab(many(' ')) &
dest := trim(tab(0),'\r') # concession to MS-DOS
}

# If dest is null, the begin line either isn't present, or is
# corrupt (which necessitates our aborting with an error msg.).
if /dest then {
write(&errout,"No begin line.")
exit(3)
}

# Tilde expansion is heavily Unix dependent, and we can't always
# safely write the file to the current directory. Our only choice
# is to abort.
if match("~",dest) then {
write(&errout,"Please remove ~ from input file begin line.")
exit(4)
}

out := open(dest, "w")
decode(in, out) # decode stops at the zero-length line
if not match("end", !in) then {
write(&errout,"No end line.")
exit(5)
}
exit(0)

end



procedure decode(in, out)

# Copy from in to out, decoding as you go along.

local line, chunk

while line := read(in) do {

if *line = 0 then {
write(&errout,"Short file.")
exit(10)
}

line ? {
n := DEC(ord(move(1)))

# Uuencode signals the end of the coded text by a space
# and a newline (i.e. a zero-length line, coded as a space).
if n <= 0 then break

while (n > 0) do {
chunk := move(4) | tab(0)
outdec(chunk, out, n)
n -:= 3
}
}
}

return

end



procedure outdec(s, f, n)

# Output a group of 3 bytes (4 input characters). N is used to
# tell us not to output all of the chars at the end of the file.

local c1, c2, c3

c1 := iand(
ior(
ishift(DEC(ord(s[1])),+2),
ishift(DEC(ord(s[2])),-4)
),
8r0377)
c2 := iand(
ior(
ishift(DEC(ord(s[2])),+4),
ishift(DEC(ord(s[3])),-2)
),
8r0377)
c3 := iand(
ior(
ishift(DEC(ord(s[3])),+6),
DEC(ord(s[4]))
),
8r0377)

if (n >= 1) then
writes(f,char(c1))
if (n >= 2) then
writes(f,char(c2))
if (n >= 3) then
writes(f,char(c3))

end



procedure DEC(c)

# single character decode
return iand(c - 32, 8r077)

end