18.571 acronyms, text, markup

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Sun, 6 Feb 2005 10:10:12 +0000

               Humanist Discussion Group, Vol. 18, No. 571.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

   [1] From: Norman Hinton <hinton_at_springnet1.com> (12)
         Subject: Re: 18.567 Phonological Interpretation (Acronyms,
                 Markup)

   [2] From: lachance_at_origin.chass.utoronto.ca (Francois (118)
                 Lachance)
         Subject: Re: 18.560 acronyms, text, writing

--[1]------------------------------------------------------------------
         Date: Sun, 06 Feb 2005 10:06:28 +0000
         From: Norman Hinton <hinton_at_springnet1.com>
         Subject: Re: 18.567 Phonological Interpretation (Acronyms, Markup)

Well, I broke my rule of not reading notes in which each line ends with
"=20", but I must say I disagree with most of your suggestions.

Your characterization of linguists who proceed only from phonology to
morphology etc. may have been correct in the days of structural
linguistics, but those days are long gone -- some 40 years into the past.

Your distinctions between "Internet acronyms" and other acronyms don't make
a great deal of sense: if acronyms are not formed from the initial letters
of words, they are not acronyms. And it makes little difference how they
are pronounced.

BTW (as they say), I don't know what this means:

>Even "URL," which is semantically
>uncomplicated as opposed to the other acronyms, is officially spelled out
>but often pronounced as a word. Its phonological status isn't clear.

--[2]------------------------------------------------------------------
         Date: Sun, 06 Feb 2005 10:07:23 +0000
         From: lachance_at_origin.chass.utoronto.ca (Francois Lachance)
         Subject: Re: 18.560 acronyms, text, writing

Willard,

Where is the "natural" in language? What is ever "plain" about text?
Those intervening in the "plain text" thread might wish to examine Chapter 7
"Foreign Languages and Non-Roman Text" of Rusty Harold's _XML Bible_. The
chapter provides an excellent set of definitions and examples for the
terms: script, character set, fonts and glyph. I underline the title of
the chapter and also recall that ASCII is an acronym marking the
intersection of a technological development and a particular geo-political
formation. Some of the "plain text" thread has appeared to be putatively
about the superiority of one character set over another -- with all the
examples being in the same script, i.e. Roman.

It is a pity that Rusty's chapter was not entitled: "Languages Other Than
English and Non-Roman Scripts". However, Rusty's book does not reproduce
the commonly encountered English use of "ASCII" as a synonym for "text" or
the even more common synecdoche of using "text" to mean "ASCII".

<quote>
[...]As you learned in Chapter 6, an XML document is divided into text and
binary entities. Each text entity has an encoding. If the encoding is not
explicitly specified in the entity's definition, then the default is UTF-8
--- a compressed form of Unicode which leaves pure ASCII text unchanged.
Thus XML files that contain nothing but the common ASCII characters may be
edited with tools that are unaware of the complications of dealing with
multi-byte character sets like Unicode.
<cit>p. 169</cit>
</quote>
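The claim that UTF-8 "leaves pure ASCII text unchanged" can be checked directly. What follows is a minimal Python sketch, not anything from Rusty's book; the helper names and sample strings are illustrative assumptions. Text made only of ASCII characters yields identical bytes under both encodings, while a character outside ASCII has no ASCII encoding at all and takes two to four bytes in UTF-8.

```python
def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """Return True if `text` encodes to identical bytes in ASCII and UTF-8."""
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        # Characters outside the 7-bit ASCII repertoire have no ASCII encoding.
        return False
    return ascii_bytes == text.encode("utf-8")


def utf8_length(char: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(char.encode("utf-8"))


if __name__ == "__main__":
    print(same_bytes_in_ascii_and_utf8("<p>plain text</p>"))  # True
    print(same_bytes_in_ascii_and_utf8("François"))           # False
    # Prints one line each: A 1, ç 2, € 3, 𝄞 4
    for ch in ("A", "ç", "€", "𝄞"):
        print(ch, utf8_length(ch))
```

This is why an editor that knows only ASCII can safely open an all-ASCII XML file declared (or defaulting) to UTF-8: the bytes it sees are the same either way.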

In Chapter 6, Rusty's book distinguishes "text" and "binary" entities, not
on the basis of type of encoding (ASCII vs Unicode) but on the basis of
data type:

<quote>
XML documents are made up of storage units called <hi>entities</hi>.
Each entity contains either text or binary data, never both. Text data is
comprised of characters. Binary data is used for images and applets and
the like.
<cit>p. 134</cit>
</quote>

Bytes that represent characters; bytes that represent things other than
characters. Still that binary/text pair. Where did it come from?
Many subscribers to Humanist will recall that many ftp applications with
GUI interfaces offer radio buttons so that users choose between "binary"
and "text". With a command line interface, the user types "ascii" or
"binary"; in the FTP protocol itself these are the transfer types "A"
(ASCII) and "I" (image, i.e. binary). Many other subscribers to Humanist
will recall "BinHex".
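Why the choice between the two modes mattered can be shown with a deliberately simplified model, sketched here in Python. The assumption, labelled as such, is a transfer from a Unix host to a DOS/Windows host, where ASCII mode turns each LF line ending into CR LF; real clients and servers differ in detail. The function names are hypothetical. The sample "binary" data is the eight-byte signature that opens every PNG file.

```python
def ascii_transfer_unix_to_dos(data: bytes) -> bytes:
    """Simplified model (an assumption, not a full FTP implementation) of an
    ASCII-type transfer from Unix to DOS: every LF arrives as CR LF."""
    return data.replace(b"\n", b"\r\n")


def binary_transfer(data: bytes) -> bytes:
    """Image (binary) type: the bytes arrive unchanged."""
    return data


if __name__ == "__main__":
    text_file = b"line one\nline two\n"
    png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

    # For a text file the conversion is exactly what the DOS side expects.
    print(ascii_transfer_unix_to_dos(text_file))  # b'line one\r\nline two\r\n'

    # For binary data the same conversion silently alters the bytes.
    mangled = ascii_transfer_unix_to_dos(png_signature)
    print(len(png_signature), len(mangled))  # 8 10
    print(binary_transfer(png_signature) == png_signature)  # True
```

The same operation is a service for one kind of file and damage for the other, which is precisely why users were asked to name the kind of file they were sending.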

But how does it come to be that there is an appellation of binary and
non-binary files when both are composed of bits? Ah, the intersection of
naming and counting. <quote><hi>A binary file</hi> uses all eight of the
bits in it; in a non-binary file, the top bit is set to
zero.<cit>Maclopedia, "Encoding Files"</cit></quote> Use and behaviour
influence naming. Yes, there are non-binary files that did not start out as
ASCII: they were converted from 8-bit (binary) to 7-bit (non-binary) form
"to encode any type of file, including word processing, graphics,
spreadsheets, and software applications." Readers of manuals will find the
shipping of such converted non-binary files referenced as
"text-compatible transmission".
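The Maclopedia definition (a binary file uses all eight bits, a non-binary file keeps the top bit at zero) and the 8-bit to 7-bit conversion can both be sketched in a few lines of Python. Base64 stands in here as the conversion scheme; that is an illustrative choice, not necessarily the scheme (BinHex or another) that any particular Macintosh tool used. The helper names are hypothetical, and the sample bytes are the eight-byte PNG file signature.

```python
import base64


def is_seven_bit(data: bytes) -> bool:
    """True if every byte has its top (eighth) bit set to zero, i.e. value < 0x80."""
    return all(b < 0x80 for b in data)


def to_seven_bit(data: bytes) -> bytes:
    """Encode arbitrary 8-bit data as 7-bit-safe ASCII text using Base64."""
    return base64.b64encode(data)


if __name__ == "__main__":
    raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
    print(is_seven_bit(raw))       # False: 0x89 has the top bit set
    encoded = to_seven_bit(raw)
    print(encoded)                 # b'iVBORw0KGgo='
    print(is_seven_bit(encoded))   # True
    # The conversion is reversible, so nothing is lost in "text-compatible" form.
    print(base64.b64decode(encoded) == raw)  # True
```

The cost of the conversion is size: Base64 spends four 7-bit-safe characters for every three original bytes.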

And so I invite readers to consult the computing dictionary of their
choice and look up the following terms:

byte
word
text

My particular favourite is the entry for "text" at Free Online Dictionary
of Computing
http://wombat.doc.ic.ac.uk/foldoc/index.html

<quote>
     1. Executable code, especially a "pure code" portion shared between
     multiple instances of a program running in a multitasking operating
     system.

     Compare English.

     2. Textual material in the mainstream sense; data in ordinary ASCII or
     EBCDIC representation (see flat ASCII). "Those are text files; you can
     review them using the editor."

     These two contradictory senses confuse hackers too.
<cit>
http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=text&action=
Accessed: Sat, 05 Feb 2005 14:42:16 GMT </cit>
</quote>
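FOLDOC's second sense ("data in ordinary ASCII or EBCDIC representation") already concedes that "text" does not name one fixed bit pattern. A minimal Python sketch makes the point; it assumes Python 3.8 or later for `bytes.hex` with a separator, and uses the standard library's `cp037` codec as one representative EBCDIC code page (the choice of code page is an assumption). The same four characters become entirely different bytes.

```python
SAMPLE = "text"

# The same four characters in two different character encodings.
ascii_bytes = SAMPLE.encode("ascii")
ebcdic_bytes = SAMPLE.encode("cp037")  # cp037: an EBCDIC code page (US/Canada)

print(ascii_bytes.hex(" "))   # 74 65 78 74
print(ebcdic_bytes.hex(" "))  # a3 85 a7 a3

# Decoding with the matching code page gives the word back.
print(ebcdic_bytes.decode("cp037") == SAMPLE)  # True

# None of the EBCDIC bytes is a valid 7-bit ASCII value.
print(all(b > 0x7F for b in ebcdic_bytes))  # True
```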

The reference to "English":
<quote>
     1. (Obsolete) The source code for a program, which may be in any
     language, as opposed to the linkable or executable binary produced
     from it by a compiler. The idea behind the term is that to a real
     hacker, a program written in his favourite programming language is at
     least as readable as English. Usage: mostly by old-time hackers,
     though recognisable in context.
<cit>http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?English
Accessed: Sat, 05 Feb 2005 14:42:26 GMT</cit>
</quote>

The hint for the above was given by Wendell's implicit invitation to
examine how programmers speak/spoke about code:

> Study, for example, the manuscripts of a working playwright, and it becomes
> hard to say with any certainty what's text and what's markup. Revisions
> scrawled in the margins? Stage directions, blocking notes, lists of props?
> (Like property lists appearing in comments in a programmer's code.)

<clip/>

> Best regards,
> Wendell Piez

I don't know if Wendell knows, remembers, or cares for the lyrics of a
Marianne Faithfull ...

     What are you fighting for?
     It's not my security.
     It's just an old war,
     Not even a cold war,
     Don't say it in Russian,
     Don't say it in German.
     Say it in broken English,
     Say it in broken English.

My political point is that to build a world where access is improved is to
envisage a world where humans help each other and to cherish the eternal
need for translation, conversion, compression, retrieval and performance.
To recognize the essential fungibility of the digital (not just the
electronic digital) is to appreciate the role of the human. People are
part of the network too. They are integral agents for preparing and
directing output to devices for braille, print, voice synthesis or even
machine translation of natural languages like broken English and Spanglish
and pidgin ...

   --
Francois Lachance, Scholar-at-large
http://www.chass.utoronto.ca/~lachance/jardin

2005 Year of Comparative Connections. DIA: Comparative connections? LOGZ:
Connection, first. Comparison, next. DIA: Check. Comparable ways of
connecting. LOGZ: Selection outcomes, first. Comparative Connections,
next.
Received on Sun Feb 06 2005 - 05:15:25 EST
