Operators of public-facing websites are typically concerned about the unauthorized, technology-based extraction of large volumes of information from their sites, often by competitors or others in related businesses.  The practice, usually referred to as screen scraping, web harvesting, crawling or spidering, has been the subject of many questions and a fair amount of litigation over the last decade.

However, despite the litigation in this area, the state of the law on this issue remains somewhat unsettled: neither scrapers looking to access data on public-facing websites nor website operators seeking remedies against scrapers that violate their posted terms of use have very concrete answers as to what is permissible and what is not.

In the latest scraping dispute, the e-commerce site QVC objected to the Pinterest-like shopping aggregator Resultly’s scraping of QVC’s site for real-time pricing data.  In its complaint, QVC claimed that Resultly “excessively crawled” QVC’s retail site (purportedly sending search requests to QVC’s website at rates ranging from 200-300 requests per minute up to 36,000 requests per minute), causing a crash that was not resolved for two days and resulting in lost sales.  (See QVC Inc. v. Resultly LLC, No. 14-06714 (E.D. Pa. filed Nov. 24, 2014)). The complaint alleges that the defendant disguised its web crawler to mask its source IP address, thus preventing QVC technicians from identifying the source of the requests and quickly repairing the problem.  QVC brought several of the causes of action often alleged in this type of case, including violations of the Computer Fraud and Abuse Act (CFAA), breach of contract (based on QVC’s website terms of use), unjust enrichment, tortious interference with prospective economic advantage, conversion and negligence.  Of these and other causes of action typically alleged in these situations, the breach of contract claim is often the clearest source of a remedy.

It’s a problem that has vexed website owners since the days of the dot-com boom: how to make certain user-generated content available to users or subscribers while preventing competitors and other unauthorized parties from scraping, linking to or otherwise accessing that content for their own commercial purposes.


How can a website operator lose the broad immunity for liability associated with user-generated content conferred by Section 230 of the Communications Decency Act (CDA)?

Section 230 has been consistently interpreted by most courts to protect website operators against claims arising out of third-party content, despite some less than honorable conduct on the part of those operators.

A simple copyright notice (e.g., “© [Year of First Publication] [Owner]”) on a website can imply an assertion of ownership in individual elements of the website and constitute “copyright management information” under the Digital Millennium Copyright Act (DMCA), a Texas district court held.  A Texas investment company learned this lesson the hard way.

The U.S. Securities and Exchange Commission gave disclosures made through social media platforms such as Facebook and Twitter a conditional “thumbs up” in a Report of Investigation it released on April 2, 2013.  Issuers of securities, the SEC stated, can use social media to disseminate material, nonpublic information without having to fear enforcement under Regulation FD, so long as investors have been alerted in advance about which social media outlets the issuer will use.

Pinterest is the hot hot hot social media site that lets users create online “pinboards” of interesting or inspiring images. Although users may upload their own images to their pinboards, Pinterest emphasizes the pinning of images from third-party Web sites through the use of inline links.

This of course generates copyright questions regarding the pinning and display of images owned by third parties.

"Internet exceptionalism" is the notion that the Internet is a special and unique communications medium to which special rules should apply. In the legal field, that notion is manifested in legal rules that have been crafted by judges, legislatures and regulators for application in situations involving Internet communications.


The Internet Corporation for Assigned Names and Numbers (ICANN), the organization that is responsible for the allocation of Internet domain names and IP addresses, is about to launch a new program that will permit organizations to create and operate generic top-level domains (“gTLDs”), such as .com and .net.  Last week, ICANN released a draft version of the “Draft Applicant Guidebook for new Generic Top-level Domains” (the “Guidebook”), which sets out proposed policies and processes for the new gTLD program.

There are currently 21 generic top-level domains, including the familiar .com, .net and .org domains, and the less well-populated .info and .biz domains. Under the new gTLD program, applicants can design and “self-select” a new domain that they feel is appropriate for their customers or for their target market.  By way of example, the “XYZ” company might choose to apply for and operate the “.xyz” domain on behalf of itself and its related corporate entities, or a trade organization might choose to apply for and operate a domain reflecting the nature of its membership.

You should be aware of the new gTLD program and the draft Guidebook.

The practice of search engine crawling and caching of Web site content has infrequently been litigated. (The Perfect 10 case is a significant exception.) This may be because most Web site operators want their content to be indexed and available on search engines. Those Web site operators that do not want their content copied and indexed can stop crawling and caching by deploying a robots.txt file. By generally accepted convention, most search engine crawlers consult a site’s robots.txt file for instructions from the Web site operator as to whether, and to what extent, the site’s content may be crawled and indexed. A “Disallow” directive in a robots.txt file instructs a compliant crawler not to access the specified content, and a “noindex” or “noarchive” metatag placed in a page’s HTML instructs search engines not to index or cache that page.
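By way of illustration (a hypothetical example, not drawn from the case record), a robots.txt file that asks all compliant crawlers to stay out of a site entirely might read:

    User-agent: *
    Disallow: /

and a page-level opt-out from indexing and caching might be expressed with a metatag in the page’s HTML:

    <meta name="robots" content="noindex, noarchive">

Compliance with these instructions is a matter of convention; they do not technically prevent access by a crawler that chooses to ignore them.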

In Parker v. Yahoo!, Inc., 2008 U.S. Dist. LEXIS 74512 (E.D. Pa. Sep. 26, 2008), the district court held that a Web site operator’s failure to deploy a robots.txt file containing instructions not to copy and cache Web site content gave rise to an implied license to index that site. The court did say, however, that the implied license in the case may have, at some point, been terminated by the operator.