Humanist Discussion Group, Vol. 33, No. 743.
Department of Digital Humanities, King's College London
Hosted by King's Digital Lab
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

[1] From: Henry Schaffer
    Subject: Re: [Humanist] 33.742: Big Data and Truth (107)

[2] From: Samuel Huskey
    Subject: Re: [Humanist] 33.742: Big Data and Truth (51)

[3] From: Jacqueline Wernimont
    Subject: Re: [Humanist] 33.742: Big Data and Truth (20)

--[1]------------------------------------------------------------------------
Date: 2020-04-09 01:53:22+00:00
From: Henry Schaffer
Subject: Re: [Humanist] 33.742: Big Data and Truth

Big Data can, in many instances, be very helpful. It can also lead one
astray - and one might overlook the error because one has the results of
Machine Learning / Artificial Intelligence operating on Big Data.

I went to a terrific talk last year on an example of using CNNs
(Convolutional Neural Networks - an important part of Machine Learning -
ML) to classify photos. The training phase fed the CNN 10,000 photos of
wolves and 10,000 photos of dogs, all labeled, with the idea that the
resulting trained CNN should be able to decide quite accurately from a
photo whether the animal is a dog or a wolf - and they do look fairly
similar. The computer, with the right software and with lots of GPUs built
in (they just greatly speed up the computation - but they do cost $$),
chugged away for, IIRC, days. It then showed it could classify the
training photos with 98% accuracy.

The researcher then fed the CNN an unlabeled picture of a wolf, and the
CNN announced it was a dog - with 97% certainty. The point of the talk was
really to explain what happened. The picture showed the wolf in a meadow
in summertime. The training data sets (Big ones!) were reviewed - almost
all of the wolf pictures showed them in snow, while almost all the dog
pictures showed them on grass or on wood or carpeted floors.
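[Ed.: the failure mode Schaffer describes - a model keying on the background
rather than the animal - can be sketched with synthetic data. The toy
stand-in below is not the CNN from the talk: it is a plain logistic
regression on two made-up features, where "background brightness" is
spuriously correlated with the label in the training set while the "animal
shape" feature carries only a weak real signal.]

```python
import numpy as np

rng = np.random.default_rng(0)

def make_photos(n, wolf):
    # Two features per "photo". In the biased training set, wolves are
    # almost always photographed on snow (bright background) and dogs on
    # grass or floors (dark background); the animal feature is only
    # weakly informative by construction.
    background = rng.normal(0.9 if wolf else 0.2, 0.05, n)
    animal = rng.normal(0.6 if wolf else 0.5, 0.3, n)
    return np.column_stack([background, animal])

X = np.vstack([make_photos(1000, wolf=True), make_photos(1000, wolf=False)])
y = np.array([1] * 1000 + [0] * 1000)  # 1 = wolf, 0 = dog

# Minimal logistic regression trained by gradient descent (a stand-in
# for the CNN; the point is the data, not the model class).
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

train_acc = np.mean((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y)
print(f"training accuracy: {train_acc:.0%}")
print(f"learned weights (background, animal): {w.round(2)}")

# A wolf photographed in a summer meadow: dog-typical dark background,
# wolf-like animal feature. The model leans on the background and gets
# it wrong with high confidence.
summer_wolf = np.array([0.2, 0.7])
p_wolf = 1 / (1 + np.exp(-(summer_wolf @ w + b)))
print(f"P(wolf | summer photo of a wolf) = {p_wolf:.2f}")
```

[Running this, the background weight dwarfs the animal weight, training
accuracy is near-perfect (compare the 98% in the talk), and the summer
wolf is confidently classified as a dog - the biased data, not the
algorithm, is at fault.]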
A review of the algorithm showed that it basically ignored the animal and
made its decision on the background: white/snow led to the decision that
it was a wolf; green grass or earth tones indicated it was a dog. The
fault wasn't with the CNN or the software or the hardware - it was with
the Big Data used for training. So one is back to a problem that any
statistician will raise at the outset - how to get representative data,
whether small, medium or Big.

The example I usually use to present BBD (Biased Big Data) is the very
large poll (with 2+ million responses) that predicted that Alf Landon
would win the 1936 US Presidential election:
https://en.wikipedia.org/wiki/The_Literary_Digest

--henry

P.S. re: the current discussions on citations - I give no citation for
the talk I mention. My notes are in my now-inaccessible office, so the
details are from memory and might not strictly adhere to the facts. But I
promise that the gist of what I describe is reliable.

On Wed, Apr 8, 2020 at 9:38 AM Humanist wrote:

> Humanist Discussion Group, Vol. 33, No. 742.
> Department of Digital Humanities, King's College London
> Hosted by King's Digital Lab
> www.dhhumanist.org
> Submit to: humanist@dhhumanist.org
>
> Date: 2020-04-07 07:45:45+00:00
> From: Willard McCarty
> Subject: Big Data unexamined
>
> Yesterday Humanist published an announcement of a conference on
> methodology specifically directed to examination of how social data is
> collected and handled (33.735). I was struck by the statement that,
>
> > While the body of literature using digital and social media data
> > is growing at a staggering rate, accompanying methodological
> > contributions about the process of conducting research with digital
> > and social media data remains thin.
>
> I expect that we're all familiar with the claim, expressed or implied,
> that with Big Data we can finally leave guesswork behind and see what is
> really going on.
> A variant of this is the claim that unadorned,
> undistorted truth arises from the mathematical (i.e. statistical)
> analysis of such data; e.g. that we can finally see how wrong we've been
> about some literary phenomenon because we've not until now had all of it
> at our command. Ok, I am exaggerating beyond, I suspect, what any
> careful scholar, data scientist or machine-learning expert would say, or
> say in public. Nevertheless, the notion of objectivity achieved through
> Big Data is about in the world and needs some light shone on it, yes?
>
> I am reminded of arguments openly made some decades ago that simply
> providing all existing versions of a text in digital form with tools to
> manipulate them would eliminate the need for textual editors. No one now
> would say any such thing, but we do get carried away -- and carried away
> from the really hard (and exciting) work. Yesterday, in the same batch
> of postings as the one about that conference, Ken Friedman quoted his
> doctoral professor Dorothy Harris:
>
> > "Be true to your sources and your sources will be true to you."
>
> 'To be true to' ... How much that demands of us! If anything, use of
> digital tools makes that an even greater challenge.
>
> Comments?
>
> Yours,
> WM
> --
> Willard McCarty (www.mccarty.org.uk/),
> Professor emeritus, Department of Digital Humanities, King's College
> London; Editor, Interdisciplinary Science Reviews
> (www.tandfonline.com/loi/yisr20) and Humanist (www.dhhumanist.org)

--[2]------------------------------------------------------------------------
Date: 2020-04-08 15:24:03+00:00
From: Samuel Huskey
Subject: Re: [Humanist] 33.742: Big Data and Truth

Dear Willard,

You wrote:

> I am reminded of arguments openly made some decades ago that simply
> providing all existing versions of a text in digital form with tools to
> manipulate them would eliminate the need for textual editors.
> No one now would say any such thing, but we do get carried away

This is still a view that people hold. See, for example, Peter Heslin,
"The Dream of a Universal Variorum: Digitizing the Commentary Tradition,"
in Classical Commentaries: Explorations in a Scholarly Genre, 494-511
(Oxford, 2016). Heslin writes (503), "The reader ought to be able to see
instantly the text as reported by any given witness or previous editor,
not as a collection of variants reported against the editor's text, but
in its own right. The editor could still give his or her preferred text,
but as one option among many, which the reader could change at will."

See also P. Monella, Why Are There No Comprehensively Digital Scholarly
Editions of Classical Texts? (libreriauniversitaria.it edizioni, 2018),
https://iris.unipa.it/handle/10447/294132.

Advocates of comprehensive editions that include transcriptions of all
available witnesses and sources often accuse traditional textual critics
of positivism in aiming to recover what an author wrote. That is a
misunderstanding of the point of a critical edition, but it does speak to
your request for comments on Big Data and truth. I don't doubt that
interesting results would come from the application of Big Data
techniques to transcriptions of all of the witnesses and editions of a
particular text or corpus, if those transcriptions should ever become
available (and that's a big "if"). I imagine that the number of readers
able to drink from such a fire hose is small, though the number of people
who think they want to is large. Even so, I would be interested in a
discussion of whether such scholarship would be any less positivist than
traditional textual criticism (if it really is positivist in the first
place).

Sincerely,
Sam

Samuel J. Huskey
Associate Professor
Department of Classics and Letters
University of Oklahoma
650 Parrington Oval, Carnegie Building
Norman, OK 73019-4042
Pronouns: he, him, his

--[3]------------------------------------------------------------------------
Date: 2020-04-08 14:29:03+00:00
From: Jacqueline Wernimont
Subject: Re: [Humanist] 33.742: Big Data and Truth

Thanks for sharing this, Willard.

I'm struck by how often we see straw-man arguments that X is popular but
Y hasn't been considered. There is so much on the ethics of social media
use within the community that presents at AoIR, at HASTAC, and even at
regional and international DH conferences. While these kinds of questions
are not equally represented at ACM meetings, they show up there too!
Granted, such discussions end up on the margins of mainline programming,
but that doesn't mean they don't exist.

I get the urge to make a topic seem urgent, but I wish that authors would
consider the important work that is being erased when people claim that
the work is "thin" or "understudied" -- far too often it is work done by
scholars of color, by women and non-binary scholars, by people at
institutions that are not "flagship" or "R1," as we would say in the US.
The call could do the work of elevating what *has* been done, while also
noting its structural marginality, without performing erasure in the name
of innovation or supposedly novel scholarship...

Jacque

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php
Editor: Willard McCarty (King's College London, U.K.; Western Sydney University, Australia)
Software designer: Malgosia Askanas (Mind-Crafts)
This site is maintained under a service level agreement by King's Digital Lab.