In the first half of 2023, a deluge of new generative artificial intelligence (“GAI”) tools hit the market, with companies ranging from startups to tech giants rolling out new products. In the large language model space alone, we have seen OpenAI’s GPT-4, Meta’s LLaMA, Anthropic’s Claude 2, Microsoft’s Bing AI, and others.
A proliferation of tools has meant a proliferation of terms and conditions. Many popular tools have both a free version and a paid version, each subject to different terms, and several providers also offer enterprise-grade tools to their largest customers. For businesses looking to trial GAI, the number of options can be daunting.
This article sets out three key items to check when evaluating a GAI tool’s terms and conditions. Determining which tool is right for a particular business is a complex question. It requires an analysis of the terms and conditions in their entirety, not to mention nonlegal considerations like pricing and technical capabilities. Even so, the items below can give prospective customers a starting place, as well as a bellwether to help spot terms and conditions that are more or less aggressive than the market standard.
1. Training Rights
Along with new GAI tools, 2023 has been a year of new GAI lawsuits. OpenAI and other GAI providers currently face claims alleging unauthorized and improper use of plaintiffs’ proprietary data as model training material. These claims are variously based on copyright, contract, and privacy law. Lawsuits are not the only source of increased scrutiny over how and where GAI providers obtain the training data used to develop their products. For example, in April, the popular social media platform Reddit announced a plan to begin charging for access to its API. The API is generally how GAI providers obtain Reddit’s data for use in their models. In effect, Reddit has decided that user posts should not be given away for free to GAI providers whose products might undermine the popularity of its platform. Several prominent news publishers and media companies have also taken defensive technical measures (e.g., amending their robots.txt files to “disallow” certain bots) to block OpenAI’s GPTBot web crawler from ingesting news stories for future training purposes. On top of these new hurdles, the FTC is reportedly looking into OpenAI’s collection of user data. Its inquiry reportedly also covers other issues, such as the publication of false information and potentially anti-competitive practices surrounding GAI.
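For readers unfamiliar with the robots.txt measure mentioned above, a site publishes per-crawler rules in a plain-text file at its root. The sketch below is illustrative only. The file contents and the news URL are hypothetical examples. `GPTBot` is the user-agent token OpenAI has published for its crawler. The check uses Python’s standard-library `urllib.robotparser`. Note that robots.txt is a voluntary convention: it signals a site’s preferences to compliant crawlers but does not technically prevent access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher: it blocks OpenAI's GPTBot
# crawler from the entire site while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://news.example.com/2023/08/some-story"  # hypothetical article URL

# Ask whether each crawler is permitted to fetch the article under these rules.
for agent in ("GPTBot", "Googlebot"):
    print(agent, parser.can_fetch(agent, url))
```

Under these example rules, a compliant GPTBot would be refused (`False`) while other crawlers, such as a search engine’s, would still be allowed (`True`). This per-crawler granularity lets publishers opt out of AI training crawls without disappearing from search results.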
In light of these challenges, many of which pertain to the training of GAI models, it is perhaps not surprising that some GAI providers have revised their tools’ terms. The revisions aim to reassure users about how user data may (or, more precisely, may not) be used. For example, Microsoft has updated its default commercial terms for its Azure OpenAI service (which provides licensed access to OpenAI’s GPT models) to state explicitly that user inputs are not used for training. GitHub has done the same for its GAI coding tool, Copilot, and OpenAI has made a similar update to its template Enterprise Agreement. Even Anthropic (provider of ChatGPT competitor Claude), one of the newest players on the scene, explicitly excludes model training from its terms, even though those terms otherwise assert a broad right to use user data to develop new products and services. On the other side of the coin, this summer Zoom faced backlash over terms asserting a broad right to turn user data into training material, a position it eventually walked back.
The upshot is that it is now off-market for a GAI provider to assert a right to use customer data for training purposes without at least providing an opt-out mechanism. The biggest providers have abandoned this position, but as GAI companies proliferate, customers should stay alert to terms that still assert such a right.
2. Use Restrictions
For many businesses, one of the most attractive aspects of GAI in general, and large language models in particular, is use case flexibility. Most machine learning technology has traditionally been designed and deployed for a particular job or small set of jobs (e.g., speech-to-text, facial recognition). By contrast, many of the newest large language models are capable of a surprising range of tasks, from responding to customer queries in chat interfaces to fixing bugs in software code. That flexibility, however, is only as broad as a provider’s terms and conditions allow.
One area where restrictions are common, and potentially problematic, is the development of new products and services. Many providers do not want their tools to be used to build competing technology, so their terms and conditions often restrict use of their services accordingly. The ubiquity of these restrictions suggests that the market has coalesced to some degree on this issue. As a result, most businesses likely will not get far in a negotiation with a GAI provider if they reject the concept of a competitive use restriction outright. However, the specific language of these restrictions is critical and can vary significantly. At one end are narrow prohibitions on training large language models that directly compete with the provider’s model. At the other are general prohibitions on developing or improving similar products or services of any kind. Restrictions of the former type generally would not be a problem for most end users, unless, of course, they intend to build their own GAI models. The latter, however, can trip up businesses precisely because GAI tools have so many possible functions. Many of an end user’s own products and services into which the provider’s tool might be integrated could be considered “similar.”

In sum, all use restrictions, and especially competitive use restrictions, should be reviewed carefully before a business begins using a particular GAI tool. This review ensures that the business and the provider are on the same page about exactly how the tool may and may not be used. Where the competitive use restrictions for a given tool are broad and vague and the provider is inflexible, some businesses may consider looking at other options.
3. Responsibility for Outputs
At this point, it is well established that even the most advanced GAI tools can produce outputs with a litany of flaws, from made-up facts (known as ‘hallucination’) to elements copied verbatim from training data (known as ‘memorization’) – and that these flaws can cause harm and sometimes lead to lawsuits. Sometimes, these flaws are (or at least should be) easily spotted through basic diligence and removed before they do any harm, but this will not always be practicable. For example, it may not be feasible for an end user to detect that a GAI tool has generated output that infringes a third party’s intellectual property – until the owner of that IP comes knocking. At the same time, most GAI providers are not in a position to police every individual output themselves, and so far, no one has developed a foolproof filter or other technical solution (at least not publicly). The upshot is that responsibility for outputs is often one of the most hotly negotiated subjects in GAI contracts, since the risks can be difficult to mitigate.
Unsurprisingly, most GAI providers disclaim all responsibility for their tools’ outputs, such that customers use the outputs at their own risk. However, Microsoft recently broke partially from this trend with its Copilot Copyright Commitment, assuring paid customers that they can “use Microsoft’s Copilot services and the output they generate without worrying about copyright claims.” Copilot is not the only tool to offer this sort of assurance. Its commitment may nevertheless presage a market shift in response to customer concerns, similar to the shift on model training rights discussed above. In any case, businesses exploring GAI tools should consider output responsibility generally, and the potential for harm to third parties specifically. Where GAI providers do not offer output assurances, businesses should ensure that their internal guardrails are appropriately robust.
* * *
Determining whether a given GAI tool is right for a particular business requires an analysis of the tool’s terms and conditions in their entirety, as well as nonlegal considerations like pricing and technical capabilities. That said, training rights, use restrictions and output responsibility will virtually always be relevant considerations, and where a GAI provider stands on these issues may help prospective customers evaluate that provider’s terms in the context of the broader GAI market.
Microsoft Copilot’s commitment actually extends to other types of IP as well, but it does not cover non-IP claims and includes certain conditions, such as the user having sufficient rights to their inputs.