The EU AI Act's first substantive enforcement phase arrived in February 2026, bringing with it new obligations for providers of general-purpose AI models. The rules aren't targeted at marketers, but they create conditions that marketing and content teams should understand.
The short version: AI providers operating in the EU market must now publish summaries of what data was used to train their models, and they must comply with EU copyright law in their training data practices. For brands, the implications run in two directions: what these rules reveal about how AI models were trained, and what they might eventually tell brands about how their content is being used.
What the rules actually require
The EU AI Act distinguishes between AI systems, which it regulates by risk tier, and general-purpose AI (GPAI) models, which can be adapted across many tasks and carry their own set of obligations. The large language models that power ChatGPT, Gemini, Perplexity, and Claude all fall into the GPAI category.
For GPAI providers, the February 2026 enforcement phase requires:
Training data summaries that describe, at a category level, where training data came from. This doesn't mean a line-by-line list of every URL scraped, but it does require disclosure of the broad sources and data types used, including whether web crawl data, licensed content, or synthetic data was included.
Copyright compliance documentation showing that training data was acquired in a way consistent with EU copyright law, including the text and data mining exception, which allows training on publicly available content unless rightsholders have explicitly opted out.
Watermarking and content provenance for AI-generated outputs. This applies primarily to AI-generated images and video, but the underlying provenance infrastructure is being built out for text as well.
What this does not require is granular disclosure of specific websites or publications used in training. The summaries are categorical, not itemized.
What this means for brands whose content trains AI
If your brand has published substantial public content, there's a reasonable chance some of it has been ingested into one or more LLM training datasets. This has been true for years. What changes now is that the mechanisms around this are becoming more formalized.
The opt-out framework matters here. EU copyright law allows rightsholders to reserve their content from text and data mining by specifying this on their website. Major AI providers are required to honor these reservations. Brands that want to opt out of having their public content used for future training have a clearer path to doing so than before.
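In practice, the most common machine-readable reservation signal is a robots.txt rule targeting AI training crawlers. A minimal sketch (the user-agent tokens shown are real at the time of writing, but providers add and rename crawlers, so check each provider's documentation for current names):

```text
# robots.txt — reserve content from AI training crawls
# while leaving ordinary search crawlers unaffected.
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Robots.txt is one of several possible reservation signals; some providers also recognize page-level meta tags or HTTP headers, and exactly which signals satisfy the TDM exception's opt-out requirement is still being settled.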
Most marketing teams will not choose to opt out. Content that trains AI models is content that influences how those models understand and describe your brand. Opting out might reduce the risk of your content being used in ways you don't control, but it also removes a channel through which AI models learn who you are.
The more useful question is whether your content accurately represents your brand's positioning, capabilities, and target use cases. If the content on your website or in third-party publications describes your product in outdated terms, or emphasizes features that have since been replaced, that's what AI models are learning from. The EU transparency rules don't fix this, but they reinforce why it's worth auditing.
The training data summary requirement creates an indirect benefit for brands: it becomes clearer which categories of content each AI provider uses. If a provider's summary indicates heavy reliance on licensed publication data from specific sectors, brands with strong editorial presence in those sectors have a clearer signal about where to invest their content efforts.
Content provenance and the future of attribution
The content provenance requirements in the EU AI Act are currently focused on AI-generated images and synthetic media, where the risk of deception is more immediate. But the infrastructure being built for provenance, including the C2PA (Coalition for Content Provenance and Authenticity) standard, is designed to scale to text.
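To make "provenance infrastructure" concrete: a C2PA manifest is a signed bundle of assertions attached to an asset, recording who produced it and how. A simplified, illustrative sketch in JSON (field names loosely follow the C2PA manifest model; the real format is a signed binary structure embedded in or linked from the asset, not plain JSON):

```json
{
  "claim_generator": "ExampleTool/1.0",
  "title": "product-hero.jpg",
  "assertions": [
    {
      "label": "c2pa.actions",
      "data": {
        "actions": [
          { "action": "c2pa.created", "digitalSourceType": "trainedAlgorithmicMedia" }
        ]
      }
    },
    {
      "label": "stds.schema-org.CreativeWork",
      "data": { "author": [{ "name": "Example Brand" }] }
    }
  ],
  "signature_info": { "issuer": "Example Certificate Authority" }
}
```

The `c2pa.created` action with a `trainedAlgorithmicMedia` source type is how a manifest declares that an asset was AI-generated, which is the disclosure the EU AI Act's provenance rules are aimed at.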
If text provenance becomes more widely adopted, it creates the possibility of tracking when and where a piece of content was cited or used in an AI-generated output. This would be a significant shift. Right now, there's no reliable way to know whether a specific blog post or product page contributed to a particular AI recommendation. Provenance infrastructure could eventually make that visible.
This is not imminent for text, and it will require adoption across the entire publishing and AI ecosystem to be useful. But the EU AI Act is pushing the infrastructure in that direction, which matters for brands thinking about long-term content strategy.
Practical steps for content teams
The EU AI Act doesn't require brands to take any immediate action. It places obligations on AI providers, not the companies that use AI tools or publish content. That said, there are a few things worth doing in response to this enforcement milestone.
Review your robots.txt and any opt-out signals. If you want your content in AI training datasets (the default for publicly accessible content), make sure your opt-out signals aren't accidentally blocking AI crawlers. Some technical SEO configurations set up for other purposes inadvertently exclude AI training crawlers.
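That review can be partly automated. A minimal sketch using Python's standard-library robots.txt parser, checking which AI crawlers a given robots.txt body permits (the crawler names are a partial, illustrative list; verify current tokens against each provider's documentation):

```python
from urllib import robotparser

# Partial, illustrative list of AI crawler user-agent tokens.
# Providers add and rename crawlers, so check their docs for current names.
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "ClaudeBot", "PerplexityBot"]

def ai_crawler_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Map each AI crawler name to whether this robots.txt body permits it."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}

# Example: a robots.txt that blocks one AI crawler but allows everything else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(ai_crawler_access(sample))
# GPTBot is blocked; the others fall through to the wildcard rule.
```

Running this against your live robots.txt (fetched from `https://yourdomain.com/robots.txt`) is a quick way to confirm that an opt-out aimed at one crawler isn't silently excluding others.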
Audit third-party content about your brand for accuracy. AI models trained on inaccurate third-party content will describe your brand inaccurately. The EU rules don't address this, but the increased focus on training data provenance is a good prompt to check what's out there.
Don't expect transparency summaries to tell you exactly how AI models perceive your brand. The summaries will confirm which categories of data were used in training, not whether your specific content was included or what it contributed. The only way to know how AI models actually describe your brand is to ask them.
See where your brand stands in AI search
Track how ChatGPT, Gemini, Perplexity, and Claude recommend your brand vs competitors.
Start tracking free

The most useful framing for marketing teams is this: the EU AI Act is a regulatory forcing function that pushes AI companies to be more explicit about their relationship with the content they train on. It doesn't give brands direct control over how they're represented in AI outputs. Measurement and active content strategy remain the primary tools.
How transparency could eventually help visibility tracking
The long arc of the EU AI Act's transparency requirements points toward a world where the connection between content and AI output is more legible. Training data summaries are a first step. Content provenance tracking is a longer horizon. Rights management frameworks for AI-generated content are further still.
For brand visibility in AI search, the relevant milestone is whether and when AI providers are required to disclose, at the query level, which sources informed a given response. Some providers, like Perplexity, already do this voluntarily for retrieval-augmented results. Regulation may eventually push closed-model providers toward more disclosure.
Until that happens, the opacity of AI recommendation systems means brands need to measure their own visibility from the outside. Whaily does this by running structured queries across models and tracking what each model says about your brand over time.
FAQ
Does the EU AI Act apply to companies outside the EU? Yes. Like GDPR, the AI Act applies to providers whose AI systems are used in the EU, regardless of where the provider is headquartered. OpenAI, Anthropic, Google, and other major providers are all subject to GPAI obligations if they operate in the EU market.
What happens if AI providers don't comply with the training data summary requirement? Non-compliance can trigger fines from EU member state authorities, similar to the GDPR enforcement model. Fines for GPAI providers that fail to meet transparency obligations can reach 3% of annual global turnover.
Should we add AI training opt-out signals to our website? This is a business decision. Opting out means your public content won't be used in future training cycles by providers that honor opt-out signals. The tradeoff is that content helps AI models understand your brand. Most brands are better served by ensuring their public content is accurate and comprehensive rather than opting out.
Will the EU AI Act create a training data registry that brands can query? No. The current requirements are for summaries at a categorical level, not a searchable index of training sources. More granular disclosure is not on the enforcement roadmap for the current phase.