Screen scraping is a problem that has vexed website owners since the early days of e-commerce: how to make valuable content available to users and customers while preventing competitors from accessing that content for commercial purposes. Even with the advent of social media, mobile commerce, and advanced software, the issue remains relevant to today’s companies, as evidenced by craigslist’s victory this past week against an aggregator that had formerly scraped its user postings.
An ongoing dispute from this past winter that we have been watching has raised these long-standing issues anew.
Heritage Auctions, a major auction house that specializes in rare coins, entertainment memorabilia and natural history items, has brought a multi-count suit against Christie’s, alleging that its competitor scraped millions of proprietary and copyrighted photos and listings from Heritage’s website and reposted them on its own subscriber-only auction site, Collectrium. (Heritage Capital Corp. v. Christie’s, Inc., No. 16-03404 (N.D. Tex. filed Dec. 9, 2016)). Plaintiffs claim that Collectrium removed copyright notices from the original listings and photos and ported the data onto its own site. By doing so, plaintiffs allege, Collectrium avoided the significant cost of producing similar listings or paying licensing fees, while harming Heritage through additional IT-related costs and diverted or lost business.
Heritage’s website use agreement contained typical provisions restricting commercial use of the listings, unauthorized access, and any automated collection of listings through “database scraping” of its website. Heritage claims it suspended multiple fake accounts associated with defendants after discovering that those accounts had scraped millions of listings in 2016. The scraped data allegedly included both active and past listings, which may be valuable to users seeking to ascertain the market price of a particular collectible.
Plaintiffs brought a host of claims typically associated with scraping disputes, including copyright infringement, unauthorized access claims under the Computer Fraud and Abuse Act (CFAA), and DMCA anticircumvention claims, as well as related state law claims for breach of contract (i.e., the website use agreement) and trespass.
Collectrium contends that its software and searchable auction database, like well-known airline or hotel price websites and apps, merely aggregates publicly available data to create a comprehensive resource for collectors to “benchmark the value of their collections and assess markets in a single experience” (and that each listing provides a link back to the originating auction house). The defendants stress that the Collectrium database is “not an aggregation of listings from auction houses”; rather, Collectrium simply reviews publicly available data from auction houses “to extract historical facts surrounding prior auctions” and presents those facts in a searchable format (along with its own custom analytics) covering collectibles across the industry. The defendants have also contended that their incorporation of prior listings into a valuation research tool for collectors is transformative and therefore a fair use.
Defendants filed a motion to dismiss and compel arbitration based upon the terms of Heritage’s website use agreement. Defendants argue that, because the complaint alleges that defendants registered accounts on Heritage’s website, defendants thereby agreed to be bound by the site’s terms, which contain a purportedly broad arbitration clause.
Subsequently, the plaintiffs filed a motion for a preliminary injunction that would bar defendants and their agents from accessing and scraping data from the Heritage website and from publishing any data obtained from the site, and that would require them to destroy all copies of plaintiffs’ data taken from the site. In short, Heritage claims it will likely succeed on its copyright claims, based upon side-by-side comparisons with listings on Collectrium, and on its CFAA claim, because defendants allegedly knew their scraping activity was unauthorized. As evidence of that knowledge, Heritage points to the site terms, defendants’ IP randomization efforts to cover their tracks, and defendants’ opening of new accounts under false names after the original ones were suspended.
In their opposition, defendants claim that Christie’s and the plaintiffs are not direct competitors because they operate in different categories, and that, contrary to plaintiffs’ claims, Collectrium drives traffic to plaintiffs’ website, since users researching a Heritage item are directed back to Heritage’s site. Moreover, defendants claim that a preliminary injunction is unnecessary because they had already removed plaintiffs’ listings from the Collectrium database when the suit was commenced, and later removed additional items that were “mistakenly missed” on the first pass. Interestingly, to rebut the merits of plaintiffs’ copyright claims, defendants argue that plaintiffs impliedly licensed their website content, because none of the allegedly infringing content came from portions of the website excluded by a robots.txt file directing crawlers not to scrape those pages.
Despite the relative maturity of internet commerce, the law on scraping remains undeveloped and often has not provided clear remedies for certain spidering activities. Still, Heritage’s copyright infringement claims, based upon defendants’ alleged copying of listings previously registered with the Copyright Office, would likely stand on clearer footing than the other asserted causes of action (though defendants have asserted fair use and have attempted to rebut the validity of those copyright registrations). It remains to be seen whether the parties will reach a resolution, and whether the matter will proceed to arbitration or remain in federal court.
UPDATE: The court subsequently granted defendants’ motion to compel arbitration based upon Heritage’s website use agreement.