Humanist Discussion Group, Vol. 33, No. 755.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: email@example.com

From: Tim Smithers
Subject: Re: [Humanist] 33.743: Big Data and Truth (218)

From: Francois Lachance
Subject: capta, inventa, acta < Re: [Humanist] 33.749: Big Data and Truth (41)

--------------------------------------------------------------------------
Date: 2020-04-10 14:51:44+00:00
From: Tim Smithers
Subject: Re: [Humanist] 33.743: Big Data and Truth

Dear Marlene and Henry,

Marlene, yes, I think the Lisa Gitelman book is an unusually good collection
of pieces about data and their nature because, I would say, each chapter is a
story about a kind of real data, and not some kind of Big Data bashing, which
is where we seem to have moved to now. (Not that I'm against this.)

I use several chapters from Gitelman's book in PhD teaching I do.

For a course on measurement and measuring for researchers, I use

  Chapter 4: Where Is That Moon, Anyway? The Problem of Interpreting
  Historical Solar Eclipse Observations
  By Matthew Stanley

For one on models and modelling for researchers, I use

  Chapter 3: From Measuring Desire to Quantifying Expectations: A Late
  Nineteenth-Century Effort to Marry Economic Theory and Data
  By Kevin R Brine and Mary Poovey

See the note below for the brilliant introduction that the opening paragraph
of this chapter provides.

For my course on designing and building a critical state of the art for
research, I use

  Chapter 2: Procrustean Marxism and Subjective Rigor: Early Modern
  Arithmetic and Its Readers
  By Travis D Williams

as an example of how to think about reading research publications. The
opening paragraph of this chapter ends with:

  "... I will argue that there is a reciprocal correspondence between
  "reading" and "rigor," so much so that to read mathematics appropriately,
  thoroughly, and respectfully, one must do the mathematics itself."

As Williams argues it is for mathematics, I think it is essentially the same
for other kinds of work: there is a strong relationship, if not an actual
correspondence, between reading and rigor.
And I also use

  Chapter 6: Paper as Passion: Niklas Luhmann and His Card Index
  By Markus Krajewski, translated by Charles Marcrum II

and

  Chapter 8: Data Bite Man: The Work of Sustaining a Long-Term Study
  By David Ribes and Steven J Jackson

The other three chapters are good reads too.

Henry, you say, having explained the failure of the attempt to train a CNN
to distinguish dogs from wolves:

  "The fault wasn't with the CNN or the software or the hardware - it was
  with the Big Data used to train."

This seems too generous to me, even perhaps mistaken. I would say it was the
people who trained the CNN using this data set in the way they did who were
at fault. I think it is always the people who do this kind of work, and never
the data, nor, as you say, the software or hardware, that are to blame when
things go wrong.

Another early, but still good, book on the matter, which I imagine many here
know, but which is still worth mentioning, I think, is

  Cathy O'Neil, 2016. Weapons of Math Destruction. Crown.

This is the first entry in a useful bibliography of published critiques of
Big Data that Ernest Davis started collecting in 2014, but stopped adding to
in 2018, when the topic exploded. His list is still here, and still quite
useful, I think:

  Recent Critiques of Big Data: Small Bibliography
  https://cs.nyu.edu/faculty/davise/papers/BigDataBib.html

Ernest Davis is, by the way, co-author, with Gary Marcus, of a recent book

  Rebooting AI: Building Artificial Intelligence We Can Trust,
  Pantheon Books, 2019.

This too takes a sensible view, I think, of what we can reasonably expect AI,
and in particular the more recent machine learning flavours of this, such as
CNNs, to do for us.

Best regards,

Tim

Notes

Lisa Gitelman (Ed), 2013. "Raw Data" Is an Oxymoron. MIT Press.

"In 1891, an ambitious young doctoral candidate in New Haven drew up a plan
for an elaborate mechanism, whose operations were intended to help readers
visualize how the economy worked.
Two years later, with financial help from a colleague, the same young man,
now an assistant professor of mathematics at Yale, turned his plan into a
three-dimensional machine, which demonstrated in real time and space the
economic principles he had described in his dissertation. Three years after
that, in 1896, our mathematician-turned-economist undertook two additional
projects: he tried to stabilize the definition of the term "capital"; and he
attempted to put theory in a quantitative form and test his theoretical
hypothesis against the available data. In this series of moves—from a drawing
to a three-dimensional machine to a quantitative formulation that could use
and be tested against empirical data—Irving Fisher simultaneously helped
liberate the academic discipline of economics from its nineteenth-century
polemical phase and created the prototype for what is now the normative way
that economists make truth claims about the economy. In the process he
exposed something peculiar about the nature of the data economic claims
invoke: the numbers that seem simply to represent actual economic events are
actually the products of a complex historical and practical process that has
made them useful for the formulation in which they appear. This process,
moreover, embeds aspects of the assumptions most economists now take for
granted into the data themselves. Not only is economic data never raw, then,
in the sense of being uninterpreted, but also the form that makes data
suitable for economists' use carries with it assumptions about quantification
and value that now go unnoticed and unremarked."

> On 09 Apr 2020, at 11:00, Humanist wrote:
>
> Humanist Discussion Group, Vol. 33, No. 743.
> Department of Digital Humanities, King's College London
> Hosted by King's Digital Lab
> www.dhhumanist.org
> Submit to: firstname.lastname@example.org
>
>
> From: Henry Schaffer
> Subject: Re: [Humanist] 33.742: Big Data and Truth (107)
>
> --------------------------------------------------------------------------
> Date: 2020-04-09 01:53:22+00:00
> From: Henry Schaffer
> Subject: Re: [Humanist] 33.742: Big Data and Truth
>
> Big Data can, in many instances, be very helpful. It can also lead one
> astray - and one might overlook the error because one has the results of
> Machine Learning / Artificial Intelligence operating on Big Data.
>
> I went to a terrific talk last year on an example of using CNNs
> (Convolutional Neural Networks - an important portion of Machine
> Learning - ML) to classify photos. The training phase was to feed the CNN
> 10,000 photos of wolves and 10,000 photos of dogs, all labeled, with the
> idea that the resulting trained CNN should be able to quite accurately
> decide from a photo whether the animal is a dog or a wolf - and they do
> look fairly similar.
>
> The computer, with the right software and with lots of GPUs built in (they
> just greatly speed up the computation - but they do cost $$) chugged away
> for, IIRC, days. And then it showed it could classify the training photos
> with 98% accuracy.
>
> The researcher then fed the CNN an unlabeled picture of a wolf, and the
> CNN announced it was a dog - with 97% certainty.
>
> The point of the talk really was to explain what happened. The picture of
> the wolf showed it in a meadow in summertime. The training data sets (Big
> ones!) were reviewed - almost all of the wolf pictures showed them in
> snow, while almost all the dog pictures showed them on grass or on wood or
> carpeted floors.
>
> A review of the algorithm showed that it basically ignored the animal -
> and made the decision on the background. White/snow led to the decision
> that it was a wolf.
> Green grass or earth tones indicated it was a dog.
>
> The fault wasn't with the CNN or the software or the hardware - it was
> with the Big Data used to train.
>
> So one is back to a problem that any statistician will bring up at the
> beginning - how to get representative data - whether small, medium, or
> Big. The example I usually use to present BBD (Biased Big Data) is the
> very large poll (with 2+ million responses) that predicted that Alf
> Landon would win the 1936 US Presidential election.
> https://en.wikipedia.org/wiki/The_Literary_Digest
>
> --henry
>
> P.S. re: the current discussions on citations - I give no citation for
> the talk I mention - my notes are in my now-inaccessible office, so the
> details are also from memory - and so they might not strictly adhere to
> the facts. But I promise that the gist of what I describe is reliable.

--------------------------------------------------------------------------
Date: 2020-04-10 12:07:33+00:00
From: Francois Lachance
Subject: capta, inventa, acta < Re: [Humanist] 33.749: Big Data and Truth

Hello Jeremy,

I need a footnote for capta, inventa, acta. Is there a handy reference for
these data types? Thanks. Please. I love the flourish of these Latinized
terms and I want to learn more. I wonder how each of the terms involves
different ethical questions and actors.

Much obliged,
F

Hello Dr. Doran,

I wonder how the KPLEX project might leverage the typology teasingly
supplied by Jeremy Hunsinger. I am taking advantage of the serendipitous
juxtaposition of your two messages to Humanist (such juxtapositions supplied
by our ever faithful moderator, Willard). Any thoughts to share on this,
Willard?
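The background shortcut Henry describes above can be reproduced in miniature.
The sketch below is purely illustrative, not the talk's actual model: it uses
entirely made-up numbers, reduces each "photo" to a single background-brightness
feature, and stands in for the CNN with a trivial nearest-centroid classifier.
It needs nothing beyond the Python standard library.

```python
# Toy reproduction of the wolf/dog background shortcut.
# Each "photo" is reduced to one feature: mean background brightness
# (snow is bright, grass/wood/carpet is dark). All numbers are invented.
import random

random.seed(0)

# Biased training data, as in the talk: almost all wolves were
# photographed on snow, almost all dogs on grass or indoors.
wolves = [random.uniform(0.8, 1.0) for _ in range(1000)]  # snowy backgrounds
dogs = [random.uniform(0.1, 0.5) for _ in range(1000)]    # grassy/indoor backgrounds

# A nearest-centroid "classifier": it learns one number per class,
# and that number describes the background, not the animal.
wolf_centroid = sum(wolves) / len(wolves)
dog_centroid = sum(dogs) / len(dogs)

def classify(background_brightness):
    """Label a photo by whichever class centroid its background is closer to."""
    if abs(background_brightness - wolf_centroid) < abs(background_brightness - dog_centroid):
        return "wolf"
    return "dog"

# Training accuracy looks superb, like the 98% in the talk ...
train_correct = sum(classify(x) == "wolf" for x in wolves) + \
                sum(classify(x) == "dog" for x in dogs)
print(train_correct / 2000)

# ... but a wolf photographed in a summer meadow (a dark green
# background) is confidently called a dog: the model never looked
# at the animal at all.
print(classify(0.3))
```

The point of the toy is the same as Henry's: the classifier is working
exactly as built, and only the unrepresentative training data makes the
learned rule worthless outside it.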
F

For those who missed it, here is the URL to the Knowledge Complexity (KPLEX)
site:

  https://kplex-project.eu/deliverables/

Cheers
F

Francois Lachance
Scholar-at-large
http://www.chass.utoronto.ca/~lachance
https://berneval.hcommons.org
to think is often to sort, to store and to shuffle: humble, embodied tasks

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: email@example.com
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.