19.149 Web page archives - interesting legal mess

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Sat, 16 Jul 2005 06:49:14 +0100

               Humanist Discussion Group, Vol. 19, No. 149.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

         Date: Sat, 16 Jul 2005 06:43:02 +0100
         From: Simon Tanner <simon.tanner_at_KCL.AC.UK>
         Subject: Web page archives - interesting legal mess

Are internet archives covered by copyright law just like everything
else? It seems we may find out the court's opinion in the US.

Best,
          Simon

>Date: Wed, 13 Jul 2005 22:19:28 -0700
>From: ryan griffis <grifray_at_YAHOO.COM>
>
>(link will probably break at the "?")
>http://www.nytimes.com/2005/07/13/technology/13suit.html?ei=5088&en=377b4b470d4593e0&ex=1278907200&adxnnl=6&partner=rssnyt&emc=rss&adxnnlx=1121290900-rouEQzAGo8GqbZlwwjPI6A
>
>Keeper of Expired Web Pages Is Sued Because Archive Was Used in Another
>Suit
>
>By TOM ZELLER Jr.
>Published: July 13, 2005
>
>The Internet Archive was created in 1996 as the institutional memory of
>the online world, storing snapshots of ever-changing Web sites and
>collecting other multimedia artifacts. Now the nonprofit archive is on
>the defensive in a legal case that represents a strange turn in the
>debate over copyrights in the digital age.
>
>Beyond its utility for Internet historians, the Web page database,
>searchable with a form called the Wayback Machine, is also routinely
>used by intellectual property lawyers to help learn, for example, when
>and how a trademark might have been historically used or violated.
>
>That is what brought the Philadelphia law firm of Harding Earley Follmer
>& Frailey to the Wayback Machine two years ago. The firm was defending
>Health Advocate, a company in suburban Philadelphia that helps patients
>resolve health care and insurance disputes, against a trademark action
>brought by a similarly named competitor.
>
>In preparing the case, representatives of Earley Follmer used the
>Wayback Machine to turn up old Web pages - some dating to 1999 -
>originally posted by the plaintiff, Healthcare Advocates of
>Philadelphia.
>
>Last week Healthcare Advocates sued both the Harding Earley firm and the
>Internet Archive, saying the access to its old Web pages, stored in the
>Internet Archive's database, was unauthorized and illegal.
>
>The lawsuit, filed in Federal District Court in Philadelphia, seeks
>unspecified damages for copyright infringement and violations of two
>federal laws: the Digital Millennium Copyright Act and the Computer
>Fraud and Abuse Act.
>
>"The firm at issue professes to be expert in Internet law and
>intellectual property law," said Scott S. Christie, a lawyer at the
>Newark firm of McCarter & English, which is representing Healthcare
>Advocates. "You would think, of anyone, they would know better."
>
>But John Earley, a member of the firm being sued, said he was not
>surprised by the action, because Healthcare Advocates had tried to amend
>similar charges to its original suit against Health Advocate, but the
>judge denied the motion. Mr. Earley called the action baseless,
>adding: "It's a rather strange one, too, because Wayback is used every
>day in trademark law. It's a common tool."
>
>The Internet Archive uses Web-crawling "bot" programs to make copies of
>publicly accessible sites on a periodic, automated basis. Those copies
>are then stored on the archive's servers for later recall using the
>Wayback Machine.
>
>The archive's repository now has approximately one petabyte - roughly
>one million gigabytes - worth of historical Web site content, much of
>which would have been lost as Web site owners deleted, changed and
>otherwise updated their sites.
>
>The suit contends, however, that representatives of Harding Earley
>should not have been able to view the old Healthcare Advocates Web pages
>- even though they now reside on the archive's servers - because the
>company, shortly after filing its suit against Health Advocate, had
>placed a text file on its own servers designed to tell the Wayback
>Machine to block public access to the historical versions of the site.
>
>Under popular Web convention, such a file - known as robots.txt -
>dictates what parts of a site can be examined for indexing in search
>engines or storage in archives.
>
>Most search engines program their Web crawlers to recognize a robots.txt
>file, and follow its commands. The Internet Archive goes a step further,
>allowing Web site administrators to use the robots.txt file to control
>the archiving of current content, as well as block access to any older
>versions already stored in the archive's database before a robots.txt
>file was put in place.
>
>But on at least two dates in July 2003, the suit states, Web logs at
>Healthcare Advocates indicated that someone at Harding Earley, using the
>Wayback Machine, made hundreds of rapid-fire requests for the old
>versions of the Web site. In most cases, the robots.txt blocked the
>request. But in 92 instances, the suit states, it appears to have
>failed, allowing access to the archived pages.
>
>In so doing, the suit claims, the law firm violated the Digital
>Millennium Copyright Act, which prohibits the circumventing of
>"technological measures" designed to protect copyrighted materials. The
>suit further contends that among other violations, the firm violated
>copyright by gathering, storing and transmitting the archived pages as
>part of the earlier trademark litigation.
>
>The Internet Archive, meanwhile, is accused of breach of contract and
>fiduciary duty, negligence and other charges for failing to honor the
>robots.txt file and allowing the archived pages to be viewed.
>
>Brewster Kahle, the director and a founder of the Internet Archive, was
>unavailable for comment, and no one at the archive was willing to talk
>about the case - although Beatrice Murch, Mr. Kahle's assistant and a
>development coordinator, said the organization had not yet been formally
>served with the suit.
>
>Mr. Earley, the lawyer whose firm is named along with the archive,
>however, said no breach was ever made. "We wouldn't know how to, in
>effect, bypass a block," he said.
>
>Even if they had, it is unclear that any laws would have been broken.
>
>"First of all, robots.txt is a voluntary mechanism," said Martijn
>Koster, a Dutch software engineer and the author of a comprehensive
>tutorial on the robots.txt convention (robotstxt.org). "It is designed
>to let Web site owners communicate their wishes to cooperating robots.
>Robots can ignore robots.txt."
>
>William F. Patry, an intellectual property lawyer with Thelen Reid &
>Priest in New York and a former Congressional copyright counsel, said
>that violations of the copyright act and other statutes would be
>extremely hard to prove in this case.
>
>He said that the robots.txt file is part of an entirely voluntary
>system, and that no real contract exists between the nonprofit Internet
>Archive and any of the historical Web sites it preserves.
>
>"The archive here, they were being the good guys," Mr. Patry said,
>referring to the archive's recognition of robots.txt commands. "They
>didn't have to do that."
>
>Mr. Patry also noted that despite Healthcare Advocates' desire to
>prevent people from seeing its old pages now, the archived pages were
>once posted openly by the company. He asserted that gathering them as
>part of fending off a lawsuit fell well within the bounds of fair use.
>
>Whatever the circumstances behind the access, Mr. Patry said, the sole
>result "is that information that they had formerly made publicly
>available didn't stay hidden."
>
>------------------------------
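
For readers unfamiliar with the robots.txt convention the article
describes, here is a minimal sketch of how a cooperating crawler might
consult such a file before fetching a page. It uses Python's standard
urllib.robotparser module. The file contents, the example.com URLs and
the "ExampleBot" name are hypothetical, and "ia_archiver" is assumed
here to be the user-agent name a site would use to address the
archive's crawler; none of this is taken from the lawsuit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts out one named crawler entirely and
# keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set, with no network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Ask whether the rules permit this user agent to fetch this URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "ia_archiver" is an assumed crawler name; "ExampleBot" is made up.
archive_allowed = may_fetch(parser, "ia_archiver", "http://example.com/index.html")
other_allowed = may_fetch(parser, "ExampleBot", "http://example.com/index.html")
other_private = may_fetch(parser, "ExampleBot", "http://example.com/private/notes.html")

print(archive_allowed, other_allowed, other_private)
```

The check is something the crawler chooses to run on its own side: a
program that never calls may_fetch is not stopped by the file, which is
the sense in which Mr. Koster calls the mechanism voluntary.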

++++++++++++++++++++++++++++++++++++++++
Simon Tanner
Director, King's Digital Consultancy Services
King's College London
Kay House, 7 Arundel Street, London WC2R 3DX
tel: +44 (0)7793 403542
email: simon.tanner_at_kcl.ac.uk
www.kcl.ac.uk/kdcs/
Received on Sat Jul 16 2005 - 01:53:57 EDT
