Humanist Discussion Group

Humanist Archives: April 13, 2020, 8:08 a.m. Humanist 33.755 - Big Data and Truth

                  Humanist Discussion Group, Vol. 33, No. 755.
            Department of Digital Humanities, King's College London
                   Hosted by King's Digital Lab
                       www.dhhumanist.org
                Submit to: humanist@dhhumanist.org


    [1]    From: Tim Smithers 
           Subject: Re: [Humanist] 33.743: Big Data and Truth (218)

    [2]    From: Francois Lachance 
           Subject: capta, inventa, acta < Re: [Humanist] 33.749: Big Data and Truth (41)


--[1]------------------------------------------------------------------------
        Date: 2020-04-10 14:51:44+00:00
        From: Tim Smithers 
        Subject: Re: [Humanist] 33.743: Big Data and Truth

Dear Marlene and Henry,

Marlene, yes, I think the Lisa Gitelman book [1] is an
unusually good collection of pieces about data and their
nature, because, I would say, each chapter is a story about a
kind of real data, and not some kind of Big Data bashing,
which is where we seem to have moved to now.  (Not that I'm
against this.)

I use several chapters from Gitelman's book in PhD teaching I
do.

For a course on measurement and measuring for researchers, I
use

   Chapter 4: Where Is That Moon, Anyway?  The Problem of
    Interpreting Historical Solar Eclipse Observations
   By Matthew Stanley

For one on models and modelling for researchers, I use

   Chapter 3: From Measuring Desire to Quantifying
    Expectations: A Late Nineteenth-Century Effort to Marry
    Economic Theory and Data
   By Kevin R Brine and Mary Poovey

See note [2] below for the brilliant introduction the opening
paragraph of this chapter presents.

For my course on designing and building a critical state of
the art for research, I use

   Chapter 2: Procrustean Marxism and Subjective Rigor: Early
    Modern Arithmetic and Its Readers
   By Travis D Williams

as an example of how to think about reading research
publications, the opening paragraph of which ends with:

    "...  I will argue that there is a reciprocal
     correspondence between "reading" and "rigor," so much so
     that to read mathematics appropriately, thoroughly, and
     respectfully, one must do the mathematics itself."

As Williams argues it is for mathematics, I think it is
essentially the same for other kinds of work: there is a
strong relationship, if not actual correspondence, between
reading and rigor.

And I also use

   Chapter 6: Paper as Passion: Niklas Luhmann and His Card
    Index
   By Markus Krajewski, translated by Charles Marcrum II

and

   Chapter 8: Data Bite Man: The Work of Sustaining a
    Long-Term Study
   By David Ribes and Steven J Jackson

The other three chapters are good reads too.


Henry, having explained the failure of the attempt to train a
CNN to distinguish dogs from wolves, you say:

   "The fault wasn't with the CNN or the software or the
    hardware - it was with the Big Data used to train."

This seems too generous to me, even perhaps mistaken.  I would
say it was the people who trained the CNN using this data set
in the way they did that were at fault.  I think it is always
the people who do this kind of work, and never the data, nor,
as you say, the software or hardware, that are to blame when
things go wrong.
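
As a toy illustration of the failure Henry describes, here is a
minimal sketch in Python.  The numbers are invented, not the data
from the talk, and a one-feature threshold rule stands in for the
CNN.  The names TRAIN, fit_stump and predict are hypothetical.  The
sketch shows how a learner that is only asked to fit the training
set will happily pick the background, and how one check on a
differently photographed wolf exposes this.

```python
# Toy illustration only: invented numbers, not the data from the talk.
# Each example is (animal_cue, background_brightness, label).
# animal_cue is a weak, noisy signal; background_brightness is a strong
# but spurious one (snow is bright, grass and floors are darker).
TRAIN = [
    (0.62, 0.95, "wolf"), (0.48, 0.90, "wolf"),
    (0.71, 0.88, "wolf"), (0.55, 0.93, "wolf"),
    (0.45, 0.30, "dog"), (0.58, 0.25, "dog"),
    (0.39, 0.35, "dog"), (0.52, 0.20, "dog"),
]


def stump_accuracy(examples, feature, threshold):
    """Accuracy of the rule: value above threshold means wolf, else dog."""
    correct = sum(
        ("wolf" if ex[feature] > threshold else "dog") == ex[2]
        for ex in examples
    )
    return correct / len(examples)


def fit_stump(examples):
    """Return (accuracy, feature, threshold) with the best training accuracy."""
    best = None
    for feature in (0, 1):
        values = sorted(ex[feature] for ex in examples)
        # Candidate thresholds are midpoints between neighbouring values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            accuracy = stump_accuracy(examples, feature, threshold)
            if best is None or accuracy > best[0]:
                best = (accuracy, feature, threshold)
    return best


def predict(stump, example):
    """Apply a fitted rule to one example."""
    _, feature, threshold = stump
    return "wolf" if example[feature] > threshold else "dog"


stump = fit_stump(TRAIN)
summer_wolf = (0.66, 0.28, "wolf")  # a wolf photographed on grass
print("feature chosen:", "background" if stump[1] == 1 else "animal")
print("training accuracy:", stump[0])
print("summer wolf classified as:", predict(stump, summer_wolf))
```

The rule fits the training photos well and still gets the summer
wolf wrong, which is why I would put the responsibility on how the
training and checking were done.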


Another early but still good book on the matter, which I
imagine many here know but which is still worth mentioning, I
think, is:

    Cathy O'Neil, 2016. Weapons of Math Destruction, Crown.

This is the first in a useful bibliography on published
critiques of Big Data that Ernest Davis started collecting in
2014, but stopped adding to in 2018, when the topic exploded.
His list is still here, and still quite useful, I think.

    Recent Critiques of Big Data: Small Bibliography
    (https://cs.nyu.edu/faculty/davise/papers/BigDataBib.html)

Ernest Davis is, by the way, co-author with Gary Marcus of a
recent book:

     Rebooting AI: Building Artificial Intelligence We Can
     Trust, Pantheon Books, 2019.

This too takes a sensible view, I think, of what we can
reasonably expect AI and, in particular, its more recent
machine learning flavours, such as CNNs, to do for us.

Best regards,

Tim


Notes

[1] Lisa Gitelman (Ed), 2013.  "Raw Data" Is an Oxymoron, MIT
    Press

[2] "In 1891, an ambitious young doctoral candidate in New
     Haven drew up a plan for an elaborate mechanism, whose
     operations were intended to help readers visualize how
     the economy worked.  Two years later, with financial help
     from a colleague, the same young man, now an assistant
     professor of mathematics at Yale, turned his plan into a
     three-dimensional machine, which demonstrated in real
     time and space the economic principles he had described
     in his dissertation.  Three years after that, in 1896,
     our mathematician-turned-economist undertook two
     additional projects: he tried to stabilize the definition
     of the term "capital"; and he attempted to put theory in
     a quantitative form and test his theoretical hypothesis
     against the available data.  In this series of moves -- from
     a drawing to a three-dimensional machine to a
     quantitative formulation that could use and be tested
     against empirical data -- Irving Fisher simultaneously
     helped liberate the academic discipline of economics from
     its nineteenth-century polemical phase and created the
     prototype for what is now the normative way that
     economists make truth claims about the economy.  In the
     process he exposed something peculiar about the nature of
     the data economic claims invoke: the numbers that seem
     simply to represent actual economic events are actually
     the products of a complex historical and practical
     process that has made them useful for the formulation in
     which they appear.  This process, moreover, embeds
     aspects of the assumptions most economists now take for
     granted into the data themselves.  Not only is economic
     data never raw, then, in the sense of being
     uninterpreted, but also the form that makes data suitable
     for economists' use carries with it assumptions about
     quantification and value that now go unnoticed and
     unremarked."


> On 09 Apr 2020, at 11:00, Humanist wrote:
>
>    [1]    From: Henry Schaffer 
>           Subject: Re: [Humanist] 33.742: Big Data and Truth (107)
>
> --[1]------------------------------------------------------------------------
>        Date: 2020-04-09 01:53:22+00:00
>        From: Henry Schaffer 
>        Subject: Re: [Humanist] 33.742: Big Data and Truth
>
> Big Data can, in many instances, be very helpful. It can also lead one
> astray - and one might overlook the error because one has the results of
> Machine Learning / Artificial Intelligence operating on Big Data.
>
> I went to a terrific talk last year on an example of using CNNs
> (Convolutional Neural Networks - an important portion of Machine Learning -
> ML) to classify photos. The training phase was to feed the CNN with 10,000
> photos of wolves and 10,000 photos of dogs, all labeled, with the idea that
> the resulting trained CNN should be able to quite accurately decide from a
> photo whether the animal is a dog or a wolf - and they do look fairly
> similar.
>
> The computer, with the right software and with lots of GPUs built in (they
> just greatly speed up the computation - but they do cost $$) chugged away
> for IIRC days. And then it showed it could classify the training photos
> with 98% accuracy.
>
> The researcher then fed the CNN an unlabeled picture of a wolf, and the CNN
> announced it was a dog - and that there was 97% certainty.
>
> The point of the talk really was to explain what happened. The picture of
> the wolf showed it in a meadow in summertime. The training data sets (Big
> ones!) were reviewed - almost all of the wolf pictures showed them in snow,
> while almost all the dog pictures showed them on grass or on wood or
> carpeted floors.
>
> A review of the algorithm showed that it basically ignored the animal - and
> made the decision on the background. White/snow led to the decision that it
> was a wolf. Green grass or earth tones indicated it was a dog.
>
> The fault wasn't with the CNN or the software or the hardware - it was with
> the Big Data used to train.
>
> So one is back to a problem that any statistician will bring up at the
> beginning - how to get representative data - whether small, medium or Big.
> The example I usually use to present BBD (Biased Big Data) is the very
> large poll (with 2+ million responses) that predicted that Alf Landon would
> win the 1936 US Presidential election.
> https://en.wikipedia.org/wiki/The_Literary_Digest
>
> --henry
>
> P.S. re: the current discussions on citations - I give no citation for the
> talk I mention - my notes are in my now-inaccessible office, so the
> details are also from memory - and so they might not strictly adhere to the
> facts. But I promise that the gist of what I describe is reliable.
>
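
A minimal sketch of the sampling point Henry makes above, using an
invented electorate and not the 1936 figures.  The "listed" group is
a hypothetical stand-in for whoever a pollster finds easy to reach.
The sketch compares a large sample drawn only from that group with a
small sample drawn from everyone.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical electorate: 1 = supports candidate A, 0 = supports B.
listed = [1] * 12_000 + [0] * 18_000    # 30,000 voters, 40% for A
unlisted = [1] * 48_000 + [0] * 22_000  # 70,000 voters, about 69% for A
population = listed + unlisted          # 100,000 voters, 60% for A


def share_for_a(sample):
    """Fraction of a sample that supports candidate A."""
    return sum(sample) / len(sample)


true_share = share_for_a(population)

# Large sample, but drawn only from the easy-to-reach group.
big_biased = random.sample(listed, 20_000)

# Small sample, drawn at random from the whole electorate.
small_random = random.sample(population, 500)

print("true share for A:      ", true_share)
print("big biased estimate:   ", share_for_a(big_biased))
print("small random estimate: ", share_for_a(small_random))
```

With these made-up numbers the large sample settles confidently on
the wrong answer, because more responses from the same skewed group
only sharpen the estimate of that group.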





--[2]------------------------------------------------------------------------
        Date: 2020-04-10 12:07:33+00:00
        From: Francois Lachance 
        Subject: capta, inventa, acta < Re: [Humanist] 33.749: Big Data and Truth

Hello Jeremy

I need a footnote for capta, inventa, acta. Is there a handy reference for
these data types? Thanks. Please.

I love the flourish of these Latinized terms and I want to learn more.

I wonder how each of the terms involves different ethical questions and actors.

Much obliged,

F

Hello Dr. Doran,

I wonder how the KPLEX project might leverage the typology teasingly supplied by
Jeremy Hunsinger.

I am taking advantage of the serendipitous juxtaposition of your two messages to
Humanist (such juxtapositions supplied by our ever faithful moderator, Willard).
Any thoughts to share on this, Willard?

F

For those who missed it, here is the URL to the Knowledge Complexity (KPLEX)
site

> https://kplex-project.eu/deliverables/

Cheers

F

Francois Lachance
Scholar-at-large
http://www.chass.utoronto.ca/~lachance
https://berneval.hcommons.org

to think is often to sort, to store and to shuffle: humble, embodied tasks




Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.