
Humanist Discussion Group



Humanist Archives: April 9, 2020, 10 a.m. Humanist 33.743 - Big Data and Truth

                  Humanist Discussion Group, Vol. 33, No. 743.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: Henry Schaffer 
           Subject: Re: [Humanist] 33.742: Big Data and Truth (107)

    [2]    From: Samuel Huskey 
           Subject: Re: [Humanist] 33.742: Big Data and Truth (51)

    [3]    From: Jacqueline Wernimont 
           Subject: Re: [Humanist] 33.742: Big Data and Truth (20)


--[1]------------------------------------------------------------------------
        Date: 2020-04-09 01:53:22+00:00
        From: Henry Schaffer 
        Subject: Re: [Humanist] 33.742: Big Data and Truth

Big Data can, in many instances, be very helpful. It can also lead one
astray - and the error is easy to overlook precisely because the results
come from Machine Learning / Artificial Intelligence operating on Big Data.

I went to a terrific talk last year on an example of using CNNs
(Convolutional Neural Networks - an important part of Machine Learning -
ML) to classify photos. In the training phase the CNN was fed 10,000
labeled photos of wolves and 10,000 labeled photos of dogs, the idea being
that the trained CNN should then be able to decide quite accurately from a
photo whether the animal is a dog or a wolf - and they do look fairly
similar.

The computer, with the right software and with lots of GPUs built in (they
just greatly speed up the computation - but they do cost $$), chugged away
for, IIRC, days. It then showed it could classify the training photos
with 98% accuracy.

The researcher then fed the CNN an unlabeled picture of a wolf, and the CNN
announced, with 97% certainty, that it was a dog.

The point of the talk really was to explain what happened. The picture of
the wolf showed it in a meadow in summertime. The training data sets (Big
ones!) were reviewed - almost all of the wolf pictures showed them in snow,
while almost all the dog pictures showed them on grass or on wood or
carpeted floors.

A review of the algorithm showed that it basically ignored the animal - and
made the decision on the background. White/snow led to the decision that it
was a wolf. Green grass or earth tones indicated it was a dog.

The fault wasn't with the CNN or the software or the hardware - it was with
the Big Data used for training.
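
For anyone who wants to see the mechanism rather than take my word for it,
here is a minimal, purely synthetic sketch (in PyTorch; it is not the code
from the talk, whose details I don't have). The "wolves" get snow-white
backgrounds, the "dogs" get grass-green ones, and the "animal" is the same
uninformative grey patch in both classes - so the only thing a network can
learn here is the background, which is exactly the shortcut the real system
took with a far richer dataset.

# Illustrative sketch only: the "animal" is identical in both classes, so
# the background colour (white "snow" vs green "grass") is the only signal.
import torch
import torch.nn as nn

def make_image(snow_background: bool) -> torch.Tensor:
    """Return a 3x32x32 image: coloured background plus a grey 'animal' patch."""
    if snow_background:
        img = torch.tensor([0.90, 0.90, 0.95]).view(3, 1, 1).repeat(1, 32, 32)
    else:
        img = torch.tensor([0.20, 0.60, 0.20]).view(3, 1, 1).repeat(1, 32, 32)
    img = img + 0.05 * torch.randn(3, 32, 32)          # a little pixel noise
    r, c = torch.randint(4, 20, (2,)).tolist()
    img[:, r:r + 8, c:c + 8] = 0.5                     # the grey "animal"
    return img.clamp(0, 1)

def make_batch(n: int):
    """First half: 'wolves' on snow (label 0); second half: 'dogs' on grass (label 1)."""
    xs = torch.stack([make_image(i < n // 2) for i in range(n)])
    ys = torch.tensor([0] * (n // 2) + [1] * (n - n // 2))
    return xs, ys

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):                                # tiny training loop
    x, y = make_batch(64)
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# A "wolf" (the same grey patch) photographed in a summer meadow:
summer_wolf = make_image(snow_background=False).unsqueeze(0)
p_dog = torch.softmax(model(summer_wolf), dim=1)[0, 1].item()
print(f"P(dog) = {p_dog:.2f}")    # typically close to 1.0 - the background decides

Run as written it usually reports a probability of "dog" close to 1.0 for the
summer-meadow wolf: the same failure in miniature.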

So one is back to a problem that any statistician will bring up at the
beginning - how to get representative data - whether small, medium or Big.
The example I usually use to present BBD (Biased Big Data) is the very
large poll (with 2+ million responses) that predicted that Alf Landon would
win the 1936 US Presidential election.
https://en.wikipedia.org/wiki/The_Literary_Digest
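
A few lines of simulation (with invented, purely illustrative numbers - not
the Digest's actual figures) make the same point about representativeness:
an enormous sample drawn from the wrong frame is confidently wrong, while a
modest random sample is roughly right.

import random

random.seed(0)

# Hypothetical electorate of 1,000,000 voters. Overall, about 60% favour
# candidate A, but the subgroup the pollster can reach (think magazine
# subscribers and telephone owners in 1936) leans heavily towards candidate B.
electorate = []
for _ in range(1_000_000):
    reachable = random.random() < 0.25           # 25% are in the pollster's frame
    if reachable:
        prefers_a = random.random() < 0.40       # reachable voters: 40% for A
    else:
        prefers_a = random.random() < 0.67       # everyone else: 67% for A
    electorate.append((reachable, prefers_a))

def support_for_a(sample):
    return sum(prefers_a for _, prefers_a in sample) / len(sample)

# The "Big Data" poll: 200,000 responses, all from the reachable subgroup.
big_biased = random.sample([v for v in electorate if v[0]], 200_000)

# A small but representative poll: 1,000 voters drawn at random from everyone.
small_random = random.sample(electorate, 1_000)

print("True support for A:     %.1f%%" % (100 * support_for_a(electorate)))
print("Big biased poll says:   %.1f%%" % (100 * support_for_a(big_biased)))
print("Small random poll says: %.1f%%" % (100 * support_for_a(small_random)))

The biased poll, two hundred times larger, calls the election for the wrong
candidate; the small random one does not.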

--henry

P.S. re: the current discussions on citations - I give no citation for the
talk I mention. My notes are in my now-inaccessible office, so the details
above are from memory and might not strictly adhere to the facts. But I
promise that the gist of what I describe is reliable.

On Wed, Apr 8, 2020 at 9:38 AM Humanist  wrote:

>         Date: 2020-04-07 07:45:45+00:00
>         From: Willard McCarty 
>         Subject: Big Data unexamined
>
> Yesterday Humanist published an announcement of a conference on
> methodology specifically directed to examination of how social data is
> collected and handled (33.735). I was struck by the statement that,
>
> > While the body of literature using digital and social media data
> > is growing at a staggering rate, accompanying methodological contributions
> > about the process of conducting research with digital and social media data
> > remains thin.
>
> I expect that we're all familiar with the claim, expressed or implied,
> that with Big Data we can finally leave guesswork behind and see what is
> really going on. A variant of this is the claim that unadorned,
> undistorted truth arises from the mathematical (i.e. statistical)
> analysis of such data; e.g. that we can finally see how wrong we've been
> about some literary phenomenon because we've not until now had all of it
> at our command. Ok, I am exaggerating beyond, I suspect, what any
> careful scholar, data scientist or machine-learning expert would say, or
> say in public. Nevertheless, the notion of objectivity achieved through
> Big Data is about in the world and needs some light shone on it, yes?
>
> I am reminded of arguments openly made some decades ago that simply
> providing all existing versions of a text in digital form with tools to
> manipulate them would eliminate the need for textual editors. No one now
> would say any such thing, but we do get carried away -- and carried away
> from the really hard (and exciting) work. Yesterday, in the same batch
> of postings as the one about that conference, Ken Friedman quoted his
> doctoral professor Dorothy Harris:
>
> > "Be true to your sources and your sources will be true to you."
>
> 'To be true to' ... How much that demands of us! If anything, use of
> digital tools makes that an even greater challenge.
>
> Comments?
>
> Yours,
> WM
> --
> Willard McCarty (www.mccarty.org.uk/),
> Professor emeritus, Department of Digital Humanities, King's College
> London; Editor, Interdisciplinary Science Reviews
> (www.tandfonline.com/loi/yisr20) and Humanist (www.dhhumanist.org)


--[2]------------------------------------------------------------------------
        Date: 2020-04-08 15:24:03+00:00
        From: Samuel Huskey 
        Subject: Re: [Humanist] 33.742: Big Data and Truth

Dear Willard,

You wrote:
> I am reminded of arguments openly made some decades ago that simply providing
> all existing versions of a text in digital form with tools to manipulate them
> would eliminate the need for textual editors. No one now would say any such
> thing, but we do get carried away

This is still a view that people hold. See, for example, Peter Heslin, "The
Dream of a Universal Variorum: Digitizing the Commentary Tradition," in
Classical Commentaries: Explorations in a Scholarly Genre (Oxford, 2016),
494-511. Heslin writes (503), "The reader ought to be able to see instantly the
text as reported by any given witness or previous editor, not as a collection of
variants reported against the editor's text, but in its own right. The editor
could still give his or her preferred text, but as one option among many, which
the reader could change at will." See also P. Monella, Why Are There No
Comprehensively Digital Scholarly Editions of Classical Texts?
(libreriauniversitaria.it edizioni, 2018),
https://iris.unipa.it/handle/10447/294132.
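
To make the idea concrete, here is a toy sketch of the kind of data structure
such an edition might expose - invented example data and function names, not
Heslin's or Monella's actual design - in which the editor's text is only one
reading source among many and any witness can be rendered in its own right:

# Toy sketch: the edited text as a token list, plus an apparatus recording
# only the places where a witness diverges from it.
edited_text = ["arma", "virumque", "cano", "Troiae", "qui", "primus", "ab", "oris"]

# Apparatus: position -> {siglum: reading}. A witness without an entry at a
# position agrees with the edited text. (Readings here are invented.)
apparatus = {
    2: {"M": "canam"},        # witness M reads "canam" for "cano"
    5: {"P": "primis"},       # witness P reads "primis" for "primus"
}

def text_of(siglum: str) -> str:
    """Render the running text of a single witness (or 'ed.' for the edition)."""
    words = []
    for pos, word in enumerate(edited_text):
        variants = apparatus.get(pos, {})
        words.append(variants.get(siglum, word))
    return " ".join(words)

for source in ("ed.", "M", "P"):
    print(f"{source:>3}: {text_of(source)}")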

Advocates of comprehensive editions that include transcriptions of all available
witnesses and sources often accuse traditional textual critics of positivism in
aiming to recover what an author wrote. That is a misunderstanding of the point
of a critical edition, but it does speak to your request for comments on Big
Data and truth. I don't doubt that interesting results would come from the
application of Big Data techniques to transcriptions of all of the witnesses and
editions for a particular text or corpus, if those transcriptions should
ever become available (and that's a big "if"). I imagine that the number of
readers able to drink from such a fire hose is small, though the number of
people who think they want to is large. Even so, I would be interested in a
discussion of whether such scholarship would be any less positivist than
traditional textual criticism (if it really is positivist in the first place).

Sincerely,

Sam

Samuel J. Huskey
Associate Professor
Department of Classics and Letters
University of Oklahoma
650 Parrington Oval
Carnegie Building
Norman, OK  73019-4042

Pronouns: he, him, his






--[3]------------------------------------------------------------------------
        Date: 2020-04-08 14:29:03+00:00
        From: Jacqueline Wernimont 
        Subject: Re: [Humanist] 33.742: Big Data and Truth

Thanks for sharing this, Willard. I'm struck by how often we see the
straw-man argument that X is popular but Y hasn't been considered. There is so
much on the ethics of social media use within the community that presents
at AoIR, at HASTAC, and even at regional and international DH conferences.
While there's not an equal representation of these kinds of questions at
ACM meetings, they show up there too! Granted, such discussions end up on
the margins of mainline programming, but that doesn't mean they don't exist.

I get the urge to make a topic seem urgent, but I wish that authors would
consider the important work that is being erased when people claim that the
work is "thin" or "understudied" -- far too often it is work done by
scholars of color, by women and non-binary scholars, by people at
institutions that are not "flagship" or "R1" as we would say in the US. The
call could do the work of elevating what *has* been done while also noting
its structural marginality without performing erasure in the name of
innovation or supposedly novel scholarship...

Jacque




_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php


Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)

This site is maintained under a service level agreement by King's Digital Lab.