4.0570 SGML, Markup, Empiricism (5/120)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Fri, 5 Oct 90 14:12:02 EDT

Humanist Discussion Group, Vol. 4, No. 0570. Friday, 5 Oct 1990.


(1) Date: Thu, 04 Oct 90 07:48:54 EDT (27 lines)
From: Willard McCarty <MCCARTY@vm.epas.utoronto.ca>
Subject: A beginner's guide to SGML?

(2) Date: 4 October 90, 00:05:57 CDT (27 lines)
From: Bill Ball <C476721@UMCVMB>
Subject: markup and word processing

(3) Date: Thu, 04 Oct 90 14:51:30 +0100 (31 lines)
From: Dominik Wujastyk <ucgadkw@ucl.ac.uk>
Subject: sed script

(4) Date: Thu, 4 Oct 90 01:53:32 EDT (26 lines)
From: Sheizaf Rafaeli <USERLLHB@UMICHUB.BITNET>
Subject: Markup vs. WYSIWYG

(5) Date: Fri, 5 Oct 90 08:22:58 EDT (9 lines)
From: kma@cosy.uoguelph.ca
Subject: Planning Campus

(1) --------------------------------------------------------------------
Date: Thu, 04 Oct 90 07:48:54 EDT
From: Willard McCarty <MCCARTY@vm.epas.utoronto.ca>
Subject: A beginner's guide to SGML?

As it happens, I am now faced with having to encode some texts in the
simplest possible way (so as to mark accented characters) and may soon
be faced with having to encode other texts for other basic features, or
at least to recommend how this should be done. My situation causes me to
wonder about two things.

(1) For a text intended only to be read by humans for the information it
contains, how should accented characters be encoded? In this case, it
seems to me that the most minimal markup possible (e.g., u" or "u for
u-umlaut) is what one wants, yes? For texts to be sent over networks to
various countries, must one look out for the possibility that an ASCII
character handy for indicating an accent (e.g., `, the leading single
quotation mark) will be coopted by software for another purpose? If so,
what should I do?

(2) Does there exist anywhere, preferably in electronic form, a
beginner's guide to the simplest TEI-conformant SGML? One that specifies --
in fewer than 5 pages or 10K bytes, and in the most straightforward
manner -- what we should do to achieve minimalist markup? May we hope
for such a thing soon? Free of charge, on Humanist?


Willard McCarty

(2) --------------------------------------------------------------31----
Date: 4 October 90, 00:05:57 CDT
From: Bill Ball <C476721@UMCVMB>
Subject: markup and word processing

My thoughts on standardized text formats run to more radical solutions
than have been discussed so far in the thread on TEI & SGML. I
generally see mainframes as communication and network server machines.
Many of us are using ever more powerful PCs to read our Bitnet mail
already. The day when PCs need be limited to ASCII is rapidly passing.
Why not move toward something like Display PostScript for a document
standard? Upgraded networks like NREN can handle the traffic. Current
high end PCs can handle the display (the NeXT already uses Display
PostScript). An advanced standard like this could display and print
documents, with color illustrations, in a form extremely close to the
original. Proper software can mark, index, analyze, and do whatever with
text regardless of the form it is in. This software is much more likely
to get written, and get distributed at reasonable cost, if it works on a
format of interest in general (i.e. business) computing. It seems to me
that in general we are letting the relatively low state of the art in
mainframe WP software set unreasonably severe limits on document
standards when much more is already within reach.

my $ 0.02 worth,

Bill Ball
c476721@UMCVMB

(3) --------------------------------------------------------------45----
Date: Thu, 04 Oct 90 14:51:30 +0100
From: Dominik Wujastyk <ucgadkw@ucl.ac.uk>
Subject: sed script

The sed script that was recently mentioned as a way of stripping
SGML strings from a file suffers from what I see as the main
problem with SED, AWK, PERL and even (gulp) ICON. They are
all profoundly line-oriented.

If you have a file like this:

This is a test <file just to see> if the SGML <strings
do indeed get stripped> or not.

And you run the cited SED commands on it (sed 's/<.*>//g' in >out)
you will get this:

This is a test if the SGML <strings
do indeed get stripped> or not.

This is no good, of course. I expect that there is a simple way around
it (any offers? I hardly know SED). But it would be refreshing to come
across one of these much-vaunted tools that didn't see a file as
primarily a set of lines. HUMANISTS hardly ever have texts with tags
that are confined to single lines. Try to think of one, in a normal
text: quotes? underlining? citations? sentences? words? None of these
can safely be assumed not to be broken across a line boundary.

Dominik
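
One possible way around it, offered as a minimal sketch rather than a
definitive fix: have sed join the whole file into its pattern space
before substituting, and match <[^>]*> instead of <.*> so that each
string is removed separately. The function name strip_tags is made up
for illustration. The sketch assumes the file is small enough to hold
in memory and that no literal < or > occurs outside an SGML string.

```shell
# strip_tags: hypothetical helper name, not part of the original script.
# Reads text on standard input and writes it with <...> strings removed,
# including strings that are broken across line boundaries.
strip_tags() {
    # :a, $!{ N; ba }   append every following line to the pattern space,
    #                   so the whole input is held as one string.
    # s/<[^>]*>//g      then delete each "<" up to the nearest ">";
    #                   [^>] also matches the embedded newlines.
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/<[^>]*>//g'
}

# The two-line test file from the message above:
printf '%s\n' \
    'This is a test <file just to see> if the SGML <strings' \
    'do indeed get stripped> or not.' | strip_tags
# prints: This is a test  if the SGML  or not.
```

On a real file this would be run as: strip_tags <in >out. Note that a
doubled space is left where each string stood, and that a line break
inside a removed string disappears along with it.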


(4) --------------------------------------------------------------32----
Date: Thu, 4 Oct 90 01:53:32 EDT
From: Sheizaf Rafaeli <USERLLHB@UMICHUB.BITNET>
Subject: Markup vs. WYSIWYG

Leslie Burkholder asks for studies of text processing approaches.
Two good places to start are

B. Shneiderman _Designing the User Interface_, Addison-Wesley, 1987
and Norman and Draper (eds) _User Centered System Design_, Lawrence
Erlbaum, 1986.

There have been later empirical studies, but these two are the
most comprehensive reviews I know.

Your insistence on empirical studies is understandable. How about
asking about the theory guiding hypothesis generation?

...And, is it too dangerous to mention Halio in this forum?


Sheizaf Rafaeli
School of Business Administration
University of Michigan
Sheizaf@UMICHUB or Sheizaf_Rafaeli@ub.cc.UMICH.edu
or 71271,763 on Compuserve or (313) 763 2373

(5) --------------------------------------------------------------18----
Date: Fri, 5 Oct 90 08:22:58 EDT
From: kma@cosy.uoguelph.ca
Subject: Planning Campus

I think Burkholder is just pointing to another major
computers-in-education project ... to 'even things out', as he says. I
will dig out the details for the EDUCOM review and enter them here ... or
perhaps I'll do some photocopying and send it to Charles Ess by standard
mail?

Ken