
Humanist Discussion Group


Humanist Archives: Nov. 3, 2019, 6 a.m. Humanist 33.368 - speech-to-text

                  Humanist Discussion Group, Vol. 33, No. 368.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                Submit to: humanist@dhhumanist.org

        Date: 2019-11-02 13:06:41+00:00
        From: Bill Benzon 
        Subject: Google speech-to-text on YouTube is impressive

Willard and fellow humanists:

Every once in a while I transcribe a bit of speech from a YouTube video. It is
tedious work. I can't type fast enough to transcribe in real time and even if
I could, the sound is not always clear. So I listen to a segment, stop the
video, and transcribe what I just heard. If I'm not sure, I've got to go
back a bit and listen again. But YouTube doesn't give you much control over
just how far back you go. More often than not I go back too far and waste time
listening to stuff I've already transcribed.

I was doing this yesterday when, in poking around, I accidentally hit the switch
that toggled closed captions. And all of a sudden I noticed two lines of
transcribed words crawling across the bottom of the video. And they were pretty
accurate, accurate enough to make transcription a bit easier. Just stop the
video and copy what you see on the screen. If it's not completely accurate,
what you see nonetheless is enough to cue you so you can type a more accurate
version.

I don't know just when this started happening. Google acquired YouTube over a
decade ago (in 2006) and it would have been since then. I'd guess it was in
the last five years.

Now to the point: To a first approximation this is the same kind of
machine-learning technology that powers Google Translate. Google Translate is certainly
not perfect and, for many purposes, it may not even be very good. But for casual
use it is useful and certainly better than nothing. I'd guess that most of us
-- when we're not caught up in being defensive about computing -- find
Google Translate pretty impressive. But speech-to-text? Not impressive, routine.

And yet the underlying technology is pretty much the same. In neither case does
the machine understand language. But we know that translation requires
understanding, so we reflexively and mistakenly project such understanding onto
the machine. Transcription seems easier: you're just writing down what you
hear, and that doesn't seem to require real understanding.

In these cases our untutored intuitions betray us. Reasonably accurate
speech-to-text is no more and no less impressive than crude translation. Though
to the extent that the ordinary sense of "translate" implies understanding, Google
Translate doesn't do translation. What would we call what it does? Transmute?
That seems more accurate: Google Transmute.

Finally, I note that when I'm having difficulty with transcription I call on
my understanding of what's being said. And I'm pretty sure I'd be useless
at trying to transcribe any language other than English; understanding aside, my
ear isn't tuned to any other language. In speech-to-text it's the need to
tune the machine to the language that requires machine learning of large piles
of data.

Bill Benzon



Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php

Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)

This site is maintained under a service level agreement by King's Digital Lab.