
Humanist Discussion Group


Humanist Archives: Aug. 9, 2020, 10:25 a.m. Humanist 34.221 - on GPT-3 and imitation

                  Humanist Discussion Group, Vol. 34, No. 221.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                Submit to: humanist@dhhumanist.org

    [1]    From: Bill Benzon 
           Subject: Re: [Humanist] 34.216: on GPT-3 [laundered version] (113)

    [2]    From: Willard McCarty 
           Subject: mimesis, imitation and mimicry (38)

        Date: 2020-08-08 08:16:37+00:00
        From: Bill Benzon 
        Subject: Re: [Humanist] 34.216: on GPT-3 [laundered version]

Brigitte Rath points out [Humanist 34.214]:

> Ferdinand de Saussure's famous structuralist model of language is also
> relational, but it is heterogeneous: all signifiers -- "sound images" for
> Saussure -- are of the same kind, and within the homogeneous set of signifiers
> -- all of them "sound images" --, each signifier is defined precisely by being
> different from all others. The same holds for all signifieds, concepts for
> Saussure. A sign is formed when a signifier is connected to a signified -- a
> sound image to a concept -- and thus when *categorically different* units, each
> defined differentially within its own homogeneous system, are brought together.
> Signification arises out of a *heterogeneous* system.

First, as far as I know, Saussure never studied semantics very much, nor did
other structural linguists. Lévi-Strauss's work on myth (the four volumes of
Mythologiques in particular) may be the most significant body of work we have on
relationships among signifieds. But while that work has been much cited, its
substance has been all but forgotten. For that matter, linguists paid very
little attention to semantics until well into the post-1973 era of the cognitive
sciences ('73 is when Longuet-Higgins coined the term 'cognitive science').

We need to think very carefully about what GPT-3 does because, while it is true
that it has no access to signifieds (and I really have to use this terminology,
otherwise all is lost), it has nonetheless somehow managed, in effect, to infer
a structure of what we might call virtual or ghost signifieds (my terms). What
makes its output so uncanny is that we can easily read it as though it were
created by a being in more or less full possession of the relational network
of signifieds. How did it do that? It did it by learning to predict what comes
next in text created by people, that is, by beings in full possession of the
relational network of signifieds. We need to keep that in mind, always.

How does GPT-3 work? Roughly [1]:

1. First the signifier tokens in the corpus are replaced by vectors of numbers
which position those signifiers in what we might call a high-dimensional space
of virtual semantics (actually, it is vector or distributional semantics). [2]
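
A minimal sketch of this first step, in numpy with a toy three-word vocabulary
(the vocabulary, the dimension, and the random values are illustrative
assumptions; GPT-3's actual embeddings have 12,288 dimensions and are learned
during training, not drawn at random):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = {"the": 0, "cat": 1, "sat": 2}  # toy vocabulary (assumption)
dim = 4                                 # toy dimension; GPT-3 uses 12,288

# One row of numbers per token: the "position" of that signifier
# in a high-dimensional space.
embedding = rng.normal(size=(len(vocab), dim))

tokens = ["the", "cat", "sat"]
vectors = embedding[[vocab[t] for t in tokens]]
print(vectors.shape)  # (3, 4): one vector per signifier token
```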

2. Then GPT-3 trains on the corpus (300 billion tokens) by moving through it
from the beginning to the end, attempting to predict the next word at each step
of the way. GPT-3 has 175B parameters distributed over the 96 layers of its
neural net. These parameters are initialized with random weights. Those values
are then modified during training. GPT-3 'reads' a word and guesses the next
one. Initially, the guess will be wrong, so the parameter weights are adjusted
accordingly (back-propagation). It guesses again; if it's wrong, the weights are
revised; if it is right, it goes on to guess the next word. And so on. When it
has gone through the entire corpus, the parameter weights reflect what it has
learned about the distribution of signifiers in a very large body of real text.
Remember, though, that the actual distribution was created by people who possess
the corresponding signifieds. So GPT-3's parameter weights now somehow reflect
or embody the relational structure of those signifieds.
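
In spirit (though certainly not in scale), that guess-check-adjust cycle can be
sketched as a toy bigram model in numpy; GPT-3 itself is a 96-layer transformer,
and this sketch illustrates only the training idea, not the architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

corpus = [0, 1, 2, 1, 0, 1, 2]          # toy token ids standing in for text
V = 3                                    # toy vocabulary size
W = rng.normal(scale=0.1, size=(V, V))   # parameters, randomly initialized

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.5
for epoch in range(200):                 # move through the corpus repeatedly
    for cur, nxt in zip(corpus, corpus[1:]):
        p = softmax(W[cur])              # guess: distribution over next token
        grad = p.copy()
        grad[nxt] -= 1.0                 # cross-entropy gradient
        W[cur] -= lr * grad              # adjust the weights (back-propagation)

# The trained weights now reflect the corpus's next-token distribution:
print(softmax(W[1]).round(2))
```

After enough passes the weights for each token approximate the corpus's actual
next-token frequencies, which is the sense in which the parameters come to
'reflect' the distribution of signifiers.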

3. A user then feeds GPT-3 a prompt and GPT-3 continues the language in the
prompt by using its stored weights. It is, in effect, guessing the next word,
and the next, and so on. The linear structure of signifiers that it creates as
output reflects the multidimensional pattern of relations among signifieds as
stored in its parameter weights.
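
This generation step can be sketched as a loop that repeatedly picks a likely
next token and feeds it back in (the tiny probability table below is a stand-in
assumption for GPT-3's learned weights, not its actual interface):

```python
# Toy next-token probabilities standing in for stored parameter weights.
next_probs = {
    "the": {"cat": 0.9, "the": 0.1},
    "cat": {"sat": 0.8, "the": 0.2},
    "sat": {"the": 1.0},
}

def continue_prompt(prompt, n_tokens):
    out = list(prompt)
    for _ in range(n_tokens):
        dist = next_probs[out[-1]]           # condition on the last token
        out.append(max(dist, key=dist.get))  # greedy: pick the likeliest
    return out

print(continue_prompt(["the"], 3))  # → ['the', 'cat', 'sat', 'the']
```

(GPT-3 actually samples from the distribution rather than always taking the top
choice, but the loop of feeding the output back in is the same.)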

While the relationship between a signifier and a signified is arbitrary, there
is nothing at all arbitrary about the relationships among a string of signifiers
in a text. Those relationships are ultimately governed by the system of
signifieds as expressed through the mechanisms of syntax and pragmatics.

I strongly suspect that the mere fact that GPT-3's methods work so well (and not
only GPT-3) gives us strong clues about the nature of the system of signifieds.
That gives this work some philosophical and psychological heft.

I'll also say that I do not at all believe that GPT-3 is only one, two, or
three steps away from becoming that magical mystical being, the AGI (artificial
general intelligence). The fact is, what it 'knows', it knows quite rigidly.
The ONLY thing it can do is to make predictions about what comes next. And I am
skeptical about the value of trying models with even more parameters. The models
may gain more power for a while, but they'll approach an asymptote; as they do,
however, they'll keep using electricity at a ferocious rate. It's
estimated that GPT-3's training run used $4.6 million (US dollars) worth of
electrical power.

As a final note, I'm pretty sure that many AI researchers were inspired by the
computer in Star Trek. I know that David Ferrucci, of Watson fame, was. Isn't
the Star Trek Computer (STC) a general artificial intelligence? I think so. Is
it some kind of super-intelligence, more intelligent than its human users? No,
it isn't. It has superior access to a wide range of information and is a whizz
at calculating, correlating, and so forth. But it doesn't conduct original
research. Is the STC malevolent? Not at all. It seems to me that the STC is just
what we want from an advanced artificial intelligence. Maybe we'll get it one
day.

[1] There is a blither of material on the web about how transformers - the
generic technology behind GPT-3 - work. I have found these posts by Jay Alammar
useful: The Illustrated GPT-2 (Visualizing Transformer Language Models),
https://jalammar.github.io/illustrated-gpt2/. How GPT3 Works - Visualizations
and Animations, https://jalammar.github.io/how-gpt3-works-visualizations-

[2] This is not the place to explain how this is done. The basic idea was
invented by Gerard Salton in the 1960s and '70s and used in document
retrieval. I've written a blog post on the subject: Notes toward a theory of
the corpus, Part 2: Mind [#DH], https://new-savanna.blogspot.com/2018/12/notes-
toward-theory-of-corpus-part-2.html. Michael Gavin has written a very useful and
more detailed article on the subject: Is there a text in my data? (Part 1): On
Counting Words, Cultural Analytics, January 25, 2020,

Bill Benzon



        Date: 2020-08-08 07:15:35+00:00
        From: Willard McCarty 
        Subject: mimesis, imitation and mimicry

Jim Rovira is quite right in Humanist 34.218 about my sloppy use of the
word 'mimesis' in 34.214 to question the aim of our computational
efforts. I've become a slight bit better informed since encountering
shortly thereafter some of the literature on imitation. (Thanks to Bo
Sørensen for suggesting more.) I really should have known better, what
with Eric Auerbach's Mimesis (1946) looming in the background. More
recently, Merlin Donald, for example, has delineated a spectrum of
senses from 'mimesis', 'imitation' and 'mimicry' in "Imitation and
mimesis", in Hurley and Chater's collection, Perspectives on Imitation.
From Neuroscience to Social Science (2005), vol. 2, and with many
others made clear how deeply human imitation is. He writes,

> All human beings represent reality through mimetic means, and
> language is scaffolded on mimesis in a child’s development (Nelson,
> 1996). We are mimetic creatures. We identify mimetically with our
> tribal group and have an irresistible tendency to conform to its
> norms. Conformity, on all levels of overt behavior, is one of our
> signature traits, conferred by a universal mimetic tendency. We
> conform not only to the immediate patterns of our social group but
> also to the internalized ideals and archetypes of that group. And
> those archetypes shape the roles we tend to play during life, as
> actors in our own dramatic productions. (pp. 299-300)

But writings on imitation as deeply human also raise the question of 
how machinic imitation differs. Jim R points to the gap we need to mind. 
Is that gap traversable, or is traversing it simply the wrong way to 
frame the problem? 


Willard McCarty (www.mccarty.org.uk/),
Professor emeritus, Department of Digital Humanities, King's College
London; Editor, Interdisciplinary Science Reviews
(www.tandfonline.com/loi/yisr20) and Humanist (www.dhhumanist.org)

Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php

Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)

This site is maintained under a service level agreement by King's Digital Lab.