18.506 computing and composition

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Fri, 21 Jan 2005 09:08:16 +0000

               Humanist Discussion Group, Vol. 18, No. 506.
       Centre for Computing in the Humanities, King's College London
                     Submit to: humanist_at_princeton.edu

         Date: Fri, 21 Jan 2005 09:02:18 +0000
         From: "Donald Weinshank" <weinshan_at_cse.msu.edu>
         Subject: RE: 18.499 computing and composition

David Reed wrote:
In a message dated 1/19/2005 3:56:55 PM Mountain Standard Time,
willard.mccarty_at_KCL.AC.UK writes:
>(3) does anybody know about a program that'll strip out the
>useless code from a M$Word-created HTML file? (as a plain ascii
>file the text in question is about 17K; in its full flower, as
>published to HTML by Word, it's 48K). (By the way, I've tried
>M$Word's "filtered" HTML and Dreamweaver's HTML cleanup.
>Neither touch the mess.)

I use a little program called web2text that handles just about
everything. In most cases you do have to do a little cleanup on the
quotes and dashes.

David Reed

One simple approach is to save Word files as RTF (Rich Text Format). This
strips out most of the junk. I then import the RTF file into FrontPage,
for example.
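
For anyone who would rather script the cleanup than round-trip through
RTF, here is a rough sketch in Python of the kind of stripping a cleanup
tool might do. It is not what web2text or FrontPage actually does; the
function name strip_word_html and the list of things it removes (comments,
<style> and <xml> blocks, namespaced tags such as <o:p>, and class, style
and lang attributes) are my own assumptions about what counts as "useless
code" in Word's output. Regular expressions are not a real HTML parser, so
treat the result as a first pass and check it by eye.

```python
import re

# Attributes assumed to be Word styling noise (an assumption, adjust to taste).
_ATTR = re.compile(
    r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)

def strip_word_html(html: str) -> str:
    """Remove common Word-specific markup. A rough sketch, not a full cleaner."""
    # Drop comments, including conditional ones like <!--[if gte mso 9]>...<![endif]-->
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Drop bare conditional markers like <![if !supportLists]> and <![endif]>,
    # keeping whatever text sits between them
    html = re.sub(r"<!\[[^\]]*\]>", "", html)
    # Drop embedded <style> blocks and <xml> islands together with their contents
    html = re.sub(
        r"<(style|xml)\b[^>]*>.*?</\1\s*>", "", html,
        flags=re.DOTALL | re.IGNORECASE,
    )
    # Drop namespaced tags such as <o:p> and </o:p>, keeping any text inside them
    html = re.sub(r"</?[a-zA-Z]+:[^>]*>", "", html)
    # Inside each remaining tag, drop the styling attributes; body text is untouched
    return re.sub(r"<[^>]+>", lambda m: _ATTR.sub("", m.group(0)), html)
```

Calling strip_word_html() on the text of a Word-exported file returns the
same markup with those pieces removed. The structural tags (<p>, <span>,
<table> and so on) are left in place, so empty <span> wrappers may still
need a second pass.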

Dr. Don Weinshank Professor Emeritus Comp. Sci. & Eng.
1520 Sherwood Ave., East Lansing MI 48823-1885
Ph. 517.337.1545 FAX 517.337.1665
Received on Fri Jan 21 2005 - 04:16:26 EST
