The proposed framework treats AI systems trained on open-source code as derivative works and would require disclosure of their architecture and datasets, aiming to strengthen contributor influence amid legal uncertainty.
Researchers from Yale University have put forward a new licensing concept that would compel AI developers who train models on open-source code to disclose key elements of their systems, including model design and training data.
The proposal adapts the long-standing “copyleft” principle from the free software movement to address challenges posed by generative AI.
The framework, known as the Contextual Copyleft AI License (CCAI), was detailed in the International Journal of Law and Information Technology. It treats AI models trained on open-source software as derivative works, a classification that could impose reciprocal obligations on developers using such material.
According to Grant Shanklin, a researcher at Yale’s Digital Ethics Center, the approach is intended to give open-source contributors greater influence over how their code is used in AI development. He noted that extending copyleft to AI could also encourage a broader ecosystem of tools aligned with open-source principles. Shanklin co-authored the paper with Emmie Hine, Claudio Novelli, Tyler Schroder, and Luciano Floridi, who leads Yale’s Digital Ethics Center.
Copyleft licenses like the GNU General Public License require that any derivative software remain open under similar terms. The proposed CCAI applies this logic to generative AI by treating elements such as model architecture and training datasets as outputs derived from the original codebase, thereby attaching transparency requirements.
The licensing concept comes at a time of increasing friction between commercial AI developers and the open-source community, as firms have trained models on publicly available code without offering equivalent transparency or sharing benefits with contributors. The Yale paper examines both the legal viability and policy rationale of extending copyleft into the AI domain. The authors argue that such a framework could reinforce core open-source values if integrated with broader AI governance efforts.
However, the concept remains theoretical. No finalized license text has been issued, and courts have yet to address whether AI models can be legally treated as derivative works of their training data. Whether major open-source projects or platforms would adopt the license, and whether it would survive legal scrutiny, remains uncertain.