I. Introduction: The Data–AI Collision
The rapid expansion of artificial intelligence systems has
fundamentally altered how data is collected, processed, and repurposed. At the
center of this transformation lies a legal question that India has only begun
to confront: how should personal data used in AI training be regulated?
India’s Digital Personal Data Protection Act, 2023 (“DPDP
Act”) establishes a foundational framework for personal data governance.
However, it was not drafted with modern machine learning pipelines in mind.
This creates a structural tension: a law designed for transactional data
processing is now being applied to probabilistic, large-scale, and often opaque
AI systems.
This essay argues that while the DPDP Act clearly extends to
aspects of AI training, its application is neither straightforward nor
absolute. Instead, it exposes a set of unresolved legal, technical, and
policy fault lines that will define India’s AI regulatory trajectory.
II. AI Training as “Processing”: A Doctrinal Starting Point
At a formal level, AI training appears to fall squarely
within the Act’s definition of “processing,” which includes collection,
storage, use, and adaptation of personal data. Training datasets, especially those scraped from the internet, often contain identifiable or inferable personal information.
Where an entity determines the purpose and means of such
processing, it qualifies as a data fiduciary, triggering obligations of:
- purpose limitation
- data minimization (see the sketch below)
- accuracy
- security safeguards
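What minimization might mean for a scraped corpus can be made concrete. Below is a minimal sketch, assuming a regex-based filter with purely illustrative patterns (the names `PII_PATTERNS` and `minimize` are invented for this example; production pipelines typically rely on trained PII detectors rather than regexes):

```python
import re

# Illustrative patterns only: real pipelines use trained PII detectors
# and cover far more identifier types than these three.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
    "id_number": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),  # 12-digit ID format
}

def minimize(record: str) -> str:
    """Redact obvious direct identifiers before text enters a training corpus."""
    for label, pattern in PII_PATTERNS.items():
        record = pattern.sub(f"[{label.upper()} REDACTED]", record)
    return record

sample = "Contact Asha at asha.k@example.com or +91 98765 43210."
print(minimize(sample))
# Contact Asha at [EMAIL REDACTED] or [PHONE REDACTED].
```

Notably, the name survives the filter: direct identifiers are easy to pattern-match, but identifiability is not, which is one reason minimization is far harder for AI corpora than for transactional databases.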
This classification is doctrinally sound. However, it raises a deeper question: what exactly is being regulated? The dataset, the model, or the outputs?
The DPDP Act is largely silent on whether:
- trained model weights derived from personal data remain “personal data” (see the toy sketch below), or
- downstream inferences constitute fresh processing events
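Why the weights question matters can be shown with a deliberately contrived sketch: a character-level n-gram “model” whose parameters are literally fragments of its training text and which regurgitates a personal record verbatim (the record is invented; real neural models memorize statistically rather than by lookup, but the leakage concern is analogous):

```python
def fit(text: str, n: int = 8) -> dict:
    """The 'weights' are a table mapping each n-character context to the next character."""
    params = {}
    for i in range(len(text) - n):
        params[text[i:i + n]] = text[i + n]
    return params

def generate(params: dict, seed: str, length: int, n: int = 8) -> str:
    out = seed
    for _ in range(length):
        nxt = params.get(out[-n:])
        if nxt is None:
            break
        out += nxt
    return out

# An entirely invented record, used only for illustration.
training_text = "Asha Rao, DOB 04-05-1991, lives at 12 Lake Road, Pune."
model = fit(training_text)
print(generate(model, "Asha Rao", length=60))
# -> Asha Rao, DOB 04-05-1991, lives at 12 Lake Road, Pune.
```

If the trained parameters can reproduce the record from a short prompt, the position that a model contains no personal data becomes difficult to sustain.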
This statutory ambiguity is not incidental. It reflects a broader mismatch between legal categories and technical architectures.
III. The Myth of “Public Data” in AI Training
A persistent assumption in AI development is that publicly
available data is freely usable. The DPDP framework complicates this view.
The mere accessibility of data does not strip it of its
character as personal data. If information relates to an identifiable
individual, its reuse—particularly at scale—can still fall within regulatory
scope. This position aligns with global privacy norms, including those under
the General Data Protection Regulation.
However, a categorical rejection of public data reuse would
be equally flawed.
The DPDP Act leaves room—albeit ambiguously—for:
- reasonable uses consistent with context
- potential exemptions for research or statistical purposes
- processing of anonymised data (see the sketch below)
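The Act does not define what effective anonymisation requires. One hedged illustration of the kind of test a fiduciary might apply is a k-anonymity check over quasi-identifiers (a minimal sketch; `is_k_anonymous` is an illustrative helper, the DPDP Act prescribes no such test, and k-anonymity alone is a weak guarantee against re-identification):

```python
from collections import Counter

def is_k_anonymous(rows: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True iff every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Generalised values (age bands, truncated PIN codes) are what make grouping possible.
records = [
    {"age_band": "30-39", "pin_prefix": "1100", "diagnosis": "A"},
    {"age_band": "30-39", "pin_prefix": "1100", "diagnosis": "B"},
    {"age_band": "40-49", "pin_prefix": "5600", "diagnosis": "C"},
]
print(is_k_anonymous(records, ["age_band", "pin_prefix"], k=2))  # False: third row is unique
```

Even this simple test makes the point that “anonymised” is a conclusion to be demonstrated, not a label to be attached: the third record above is unique on its quasi-identifiers and fails the check.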
The real issue, therefore, is not whether public data can be used, but under what conditions such use remains lawful. The most important corrective here is to dismantle the “free data” myth; a complete analysis, however, must also acknowledge the spectrum of permissible uses.
IV. Consent, Scale, and the Limits of Traditional Compliance
A strict reading of the DPDP Act suggests that personal data
processing generally requires consent. Applied literally, this would render
most large-scale AI training exercises legally untenable.
But this interpretation quickly encounters practical limits:
- Training datasets may contain billions of data points from diffuse sources
- Data subjects are often unidentifiable or uncontactable
- Models cannot easily “unlearn” specific data once trained
This creates a structural incompatibility between individual-centric
consent frameworks and aggregate, statistical learning systems.
If enforced rigidly, consent requirements could:
- significantly constrain domestic AI development
- incentivize regulatory arbitrage
- push innovation into less accountable jurisdictions
Conversely, a diluted interpretation risks undermining the
very privacy protections the Act seeks to guarantee.
The law, as it stands, offers no clear resolution—only a policy
dilemma.
V. The Problem of Data Subject Rights in Machine Learning Systems
The DPDP Act grants individuals rights such as:
- access to their data
- correction and erasure
- grievance redressal
In conventional systems, these rights are administratively
manageable. In AI systems, they are technically fraught.
For instance:
- Erasure: Removing an individual’s data from a trained model may require retraining or complex machine unlearning techniques, which are still experimental (a sharded-training sketch follows this list).
- Access: It is unclear how a model can meaningfully disclose whether and how a specific individual’s data influenced its outputs.
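One mitigation being explored is to design training for erasure from the outset. The sketch below follows the sharded, SISA-style approach described by Bourtoule et al. (2021), in which an erasure request triggers retraining of only the shard that contained the individual’s data (`train_model` is a stand-in for a real training routine; this illustrates the idea, not any mandated compliance method):

```python
def train_model(shard: list[dict]) -> dict:
    # Stand-in for a real training routine; a real system fits a sub-model here.
    return {"trained_on": len(shard)}

def assign_shard(subject_id: str, n_shards: int) -> int:
    return hash(subject_id) % n_shards

def build(dataset: list[dict], n_shards: int):
    """Partition data by subject and train one sub-model per shard."""
    shards: list[list[dict]] = [[] for _ in range(n_shards)]
    for record in dataset:
        shards[assign_shard(record["subject_id"], n_shards)].append(record)
    return shards, [train_model(s) for s in shards]

def erase(subject_id: str, shards: list[list[dict]], models: list[dict], n_shards: int) -> None:
    """Honor an erasure request by retraining only the shard that held the data."""
    idx = assign_shard(subject_id, n_shards)
    shards[idx] = [r for r in shards[idx] if r["subject_id"] != subject_id]
    models[idx] = train_model(shards[idx])  # one shard retrained, not the whole ensemble

data = [{"subject_id": f"user{i}", "text": "..."} for i in range(100)]
shards, models = build(data, n_shards=4)
erase("user7", shards, models, n_shards=4)
```

The design choice is a trade-off: sharding caps the cost of honoring erasure, but it can reduce model quality relative to training on the pooled data.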
These challenges are not merely operational—they call into
question whether existing rights frameworks are conceptually compatible with
machine learning systems.
Without interpretive guidance, compliance risks becoming
either:
- superficial (formal but ineffective), or
- prohibitively burdensome
VI. Regulatory Ambiguity and the Risk of Overcorrection
A defining feature of the current landscape is uncertainty.
Key aspects remain unsettled:
- the scope of “legitimate uses”
- the treatment of inferred or derived data
- enforcement priorities and thresholds
In such an environment, two risks emerge:
- Overcompliance: Firms adopt excessively restrictive practices, stifling innovation unnecessarily
- Undercompliance: Firms exploit ambiguity, leading to privacy harms and eventual regulatory backlash
The absence of AI-specific provisions in the DPDP Act
suggests that much will depend on:
- subordinate legislation
- regulatory guidance
- judicial interpretation
Until then, the law operates less as a rulebook and more as
a framework for contestation.
VII. India in Comparative Perspective
Unlike jurisdictions that are developing AI-specific
regulatory regimes, India currently relies on a horizontal data protection
framework.
This approach has advantages:
- flexibility
- technology neutrality
- reduced regulatory fragmentation
But it also has limitations:
- lack of clarity on automated decision-making
- no explicit provisions on algorithmic accountability or bias
- limited guidance for high-risk AI systems
As global standards evolve, India will need to decide
whether to:
- adapt the DPDP framework incrementally, or
- introduce dedicated AI legislation
The current silence is unlikely to remain sustainable.
VIII. Conclusion: Toward a Coherent AI–Data Governance Framework
The application of the DPDP Act to AI training reveals a
deeper truth: data protection law, in its current form, is necessary but
insufficient for governing artificial intelligence.
The Act succeeds in establishing foundational principles of
accountability and user rights. However, its interaction with AI systems
exposes:
- conceptual gaps
- technical incompatibilities
- policy trade-offs
These should be understood not as failures but as signals of transition.
India now faces a critical choice:
- interpret existing law in ways that balance innovation and protection, or
- develop a more tailored regulatory architecture for AI
Either path will require moving beyond binary positions—such
as “all data use requires consent” or “public data is free”—toward a more context-sensitive,
risk-based framework.
The future of AI governance in India will not be determined
by statutory text alone, but by how these unresolved questions are negotiated
in practice.
“The future of AI won’t be
decided by algorithms—it will be decided by ethics.”
Footnotes
[1] Digital Personal Data Protection Act, 2023, § 2(i).
[2] See, e.g., European Data Protection Board, Guidelines on AI and Data Processing (2024).
[3] General Data Protection Regulation, arts. 4, 6.
[4] DPDP Act, §§ 7, 17.
[5] Id., § 6.
[6] Sandra Wachter et al., “Why a Right to Explanation of Automated Decision-Making Does Not Exist in the GDPR” (2017).
[7] DPDP Act, §§ 11–13.
[8] Michael Veale & Frederik Zuiderveen Borgesius, “Demystifying the Right to Erasure in Machine Learning” (2021).
