Does HIPAA itself prohibit AI training on PHI?

Not by name. HIPAA's Privacy Rule limits a business associate's use of PHI to what its BAA permits or the law requires, so whether AI model training is a permitted purpose depends on the BAA terms. If the BAA doesn't specifically prohibit training, the vendor may interpret training as a permitted purpose under 'improving the Service.' Specific contract language is required to close the gap.

How do I add the no-training clause to a vendor's standard BAA?

Most vendors will accept a redline adding no-training language. The negotiation surfaces whether the vendor's architecture actually supports the clause. Vendors with model improvement loops integrated into product architecture may push back; vendors with standard no-training infrastructure will accept readily.

3 BAA Clauses That Block AI Training on PHI

The 3 clauses that close the AI-training gap (and the redline diagnostic)

Most pharma compliance teams have refined BAA templates that handle PHI permitted use, safeguards, breach notification, and subcontractor flow-down. The templates are mature, but they lack the 3 specific clauses that close the AI-training gap. Those missing clauses are the difference between your interview transcripts staying private and flowing into vendor model training pipelines by default.

The omission isn’t a drafting failure. It’s a vintage problem. BAA templates were written before LLMs were embedded in research vendor pipelines. In 2018, “the vendor uses customer data to improve the Service” was a basic SaaS product improvement question, addressed implicitly by the broader “permitted use” language. In 2026, the same phrase can mean training proprietary AI models on customer PHI, fine-tuning sub-processor models, or feeding customer interview content into model improvement pipelines via agentic workflows. None of these is explicitly addressed by standard BAA templates, and most generic AI research tools have model improvement loops built into their core architecture.

The fix is 3 specific clauses, drafted below. The redline negotiation is also the diagnostic: a vendor that accepts the 3 clauses within hours has standing no-training infrastructure. A vendor that pushes back, asks to redraft, or escalates to engineering review has model improvement loops integrated into product architecture and will struggle to comply even after signing.

3 BAA Clauses to Block AI Training on PHI

Add to your standard BAA template; track vendor response time as a compliance signal

Prohibition on Business Associate model training. Explicit no-training clause covering vendor's proprietary models, foundation models, and any model informing product capabilities — extending to PHI and de-identified data derived from PHI.
Sub-processor flow-down. Required no-training contractual terms with all sub-processors handling Customer Data, including AI model providers (Anthropic, OpenAI, Google).
Audit rights for AI training compliance. Customer's right to audit vendor's no-training compliance via redacted sub-processor terms, model training logs, and architectural documentation.

Vendors with standing no-training infrastructure accept the 3-clause redline within hours. Vendors with model improvement loops in product architecture push back, escalate to engineering, or hedge — that response is the diagnostic. Full clause language in the sections below.

Why do generic AI research tools default to training on customer data?

Most generic AI research tools — built consumer-first — have model improvement loops as core product architecture. Customer interaction data flows into training pipelines as a default. The architectural decisions that make consumer-brand AI products good (model fine-tuning on customer interactions, sub-processor model improvements, analytics platforms accessing content for product debugging) are the architectural decisions that create the AI training gap in healthcare research.

Without explicit no-training BAA language, the gap is structural:

The vendor’s product team views customer data as the path to product improvement
Sub-processor agreements (with AI model providers like Anthropic, OpenAI, Google) may not include explicit no-training terms unless the vendor has negotiated or verified them
Analytics platforms (Mixpanel, Amplitude, similar) routinely log content for debugging
Internal model fine-tuning pipelines may incorporate customer interactions as training data

Each of these is reasonable consumer-brand SaaS practice. None is compatible with healthcare research, where customer interview data may include PHI.

The contract language that fixes the gap

Three specific clauses to add to your BAA template:

1. Prohibition on Business Associate model training

“Business Associate shall not use Customer Data, including PHI and de-identified data derived from PHI, to train any artificial intelligence or machine learning model, including but not limited to: (a) Business Associate’s proprietary models; (b) general-purpose foundation models; (c) any model that informs the Business Associate’s product capabilities. Customer Data shall be firewalled from any model training pipelines, including those operated by Business Associate’s sub-processors.”

This clause prohibits the vendor from using customer interview data to train any AI model — proprietary or otherwise.

2. Sub-processor no-training flow-down

“Business Associate shall ensure that all sub-processors that may handle Customer Data, including AI model providers, operate under contractual terms prohibiting the use of Customer Data for AI model training. Business Associate shall make available to Customer, upon request, redacted excerpts demonstrating the no-training contractual terms with each AI model sub-processor.”

This clause requires the vendor to extend no-training prohibitions to sub-processors and provide redacted contract evidence on request.

3. Audit rights for AI training compliance

“Customer’s audit rights under this Agreement extend to verification of Business Associate’s compliance with the AI model training prohibitions specified herein. Business Associate shall maintain records sufficient to demonstrate that Customer Data has not been used for AI model training and shall provide such records on reasonable notice.”

This clause makes AI training compliance specifically auditable, separate from the general audit rights covering other BAA terms.
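
On the customer side, the evidence promised by clauses 2 and 3 can be tracked in a simple inventory. The following is a minimal sketch, assuming a hypothetical `SubProcessor` record that your team fills in by hand from the vendor's sub-processor list and the redacted excerpts it supplies. The field names are illustrative and do not come from any standard or vendor format.

```python
from dataclasses import dataclass


@dataclass
class SubProcessor:
    # All fields are hypothetical; populate them from the vendor's disclosures.
    name: str
    handles_customer_data: bool
    is_ai_model_provider: bool
    no_training_excerpt_received: bool  # redacted contract excerpt on file


def audit_gaps(sub_processors: list[SubProcessor]) -> list[str]:
    """Return AI model sub-processors that handle Customer Data
    but have no no-training excerpt on file."""
    return [
        sp.name
        for sp in sub_processors
        if sp.handles_customer_data
        and sp.is_ai_model_provider
        and not sp.no_training_excerpt_received
    ]


inventory = [
    SubProcessor("model-provider-a", True, True, True),
    SubProcessor("model-provider-b", True, True, False),
    SubProcessor("billing-tool", False, False, False),
]
gaps = audit_gaps(inventory)
```

Any name returned by `audit_gaps` is a sub-processor to raise in the next audit request. The check only records whether evidence was received; it does not assess whether the excerpt's wording is adequate.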

How do vendors respond when you propose no-AI-training BAA language?

The proposal surfaces structural information about the vendor’s architecture:

Vendors with healthcare-purpose-built no-training infrastructure accept the language readily. Their architecture supports the clause; the no-training contractual terms with sub-processors already exist; the audit-trail documentation is already maintained. Standard infrastructure.
Vendors with consumer-first architecture push back. Common patterns: requesting “reasonable improvements” carve-outs, proposing an aggregated/anonymized data exception, or deflecting with “we’ll need to renegotiate with sub-processors.” Each of these is a structural signal that the vendor’s architecture doesn’t currently support no-training compliance.
Vendors that flatly decline are signaling either that their architecture cannot support the clause or that customer data flows into training are core to their commercial model. Either way, they’re not appropriate for healthcare research engagements requiring this protection.

The negotiation pattern is the diagnostic. Vendors that accept the language quickly have made the architectural investment. Vendors that resist or deflect have not.
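
The three response patterns above can be logged consistently across vendors. This is a minimal sketch, assuming a hypothetical `VendorResponse` record and a hand-picked list of push-back phrases taken from the patterns described in this section. It is a tracking aid, not a substitute for reading the vendor's actual redline.

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    ACCEPTS = "accepts"
    PUSHES_BACK = "pushes back"
    DECLINES = "declines"


# Illustrative phrases only; extend with wording you see in real redlines.
PUSHBACK_MARKERS = (
    "reasonable improvements",
    "aggregated",
    "anonymized",
    "renegotiate with sub-processors",
)


@dataclass
class VendorResponse:
    vendor: str
    accepted_all_clauses: bool
    declined: bool
    notes: str = ""


def classify(response: VendorResponse) -> Signal:
    """Map a redline response onto the three patterns described above."""
    if response.declined:
        return Signal.DECLINES
    notes = response.notes.lower()
    hedged = any(marker in notes for marker in PUSHBACK_MARKERS)
    if not response.accepted_all_clauses or hedged:
        return Signal.PUSHES_BACK
    return Signal.ACCEPTS
```

A vendor that nominally accepts all three clauses but asks for an aggregated-data exception in its cover note is still classified as pushing back, which matches the diagnostic described above.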

Why is the AI-training gap invisible in most BAA reviews?

Pharma compliance teams reviewing BAAs typically focus on the established clauses: BAA execution, breach notification, subcontractor flow-down, US data residency, audit rights. AI training language is a 2025-2026 addition that hasn’t propagated through standard procurement checklists yet.

The gap surfaces when:

A pharma sponsor’s regulatory team discovers in audit that customer research data flowed into a vendor’s model training pipeline, and the integrity of the submission is questioned.
A competitor petition challenges the validity of qualitative research informing FDA submissions, citing the AI training gap as evidence of contaminated input.
An academic research institution asks the same question during IRB review and finds the vendor cannot answer.

By the time the gap surfaces, the engagement is complete. The remediation is expensive: re-process or replace affected research, document the gap-and-fix, potentially withdraw and resubmit affected regulatory submissions.

Front-loading the AI training language in BAA negotiation costs some procurement velocity (roughly 5-10 days of additional negotiation per engagement), a cost that pays back many-fold by avoiding late discovery of the gap.

What to do this week

If your team has active research engagements with AI-native research vendors:

Audit current BAAs for explicit no-training language. Most won’t have it.
Send a proposed BAA amendment to active vendors with the three clauses above. Track who accepts, who pushes back, who deflects.
Add no-training language to your standard BAA template for future vendor engagements.
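
The first step, auditing existing BAAs, can start with a rough keyword triage before counsel reads each agreement. Below is a minimal sketch, assuming each BAA is already available as plain text. The regular expressions are assumptions modeled on the clause wording in this article and will miss differently worded prohibitions, so treat a "needs amendment" result as a prompt for human review rather than a legal conclusion.

```python
import re

# Assumed patterns; tune them to your own template's wording.
NO_TRAINING_PATTERNS = [
    re.compile(r"shall not use\b.{0,200}?\btrain", re.IGNORECASE | re.DOTALL),
    re.compile(r"\bprohibit\w*\b.{0,200}?\btraining\b", re.IGNORECASE | re.DOTALL),
]
# Broad "improve the Service" wording that a vendor might read as permitting training.
IMPROVEMENT_PATTERN = re.compile(r"improv\w* the service", re.IGNORECASE)


def triage_baa(text: str) -> dict:
    """Flag whether a BAA appears to contain explicit no-training language."""
    has_no_training = any(p.search(text) for p in NO_TRAINING_PATTERNS)
    has_improvement = bool(IMPROVEMENT_PATTERN.search(text))
    return {
        "has_no_training_language": has_no_training,
        "has_service_improvement_language": has_improvement,
        "needs_amendment": not has_no_training,
    }


legacy = triage_baa("Business Associate may use Customer Data to improve the Service.")
amended = triage_baa(
    "Business Associate shall not use Customer Data, including PHI, "
    "to train any artificial intelligence or machine learning model."
)
```

Agreements that show service-improvement language and no prohibition are the ones to prioritize for the amendment in the second step.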

The gap exists in most active healthcare research engagements as of mid-2026. The fix is straightforward contract language. The diagnostic value of vendor responses to the proposal is the real procurement asset.

See the BAA Checklist for Research Vendors Reference Guide for Carevoices’ BAA approach.