
Friday, April 17, 2026

Training Artificial Intelligence Under India’s Data Protection Regime: Navigating the DPDP Act’s Silent Fault Lines

I. Introduction: The Data–AI Collision

The rapid expansion of artificial intelligence systems has fundamentally altered how data is collected, processed, and repurposed. At the center of this transformation lies a legal question that India has only begun to confront: how should personal data used in AI training be regulated?

India’s Digital Personal Data Protection Act, 2023 (“DPDP Act”) establishes a foundational framework for personal data governance. However, it was not drafted with modern machine learning pipelines in mind. This creates a structural tension: a law designed for transactional data processing is now being applied to probabilistic, large-scale, and often opaque AI systems.

This essay argues that while the DPDP Act clearly extends to aspects of AI training, its application is neither straightforward nor absolute. Instead, it exposes a set of unresolved legal, technical, and policy fault lines that will define India’s AI regulatory trajectory.

II. AI Training as “Processing”: A Doctrinal Starting Point

At a formal level, AI training appears to fall squarely within the Act’s definition of “processing,” which includes collection, storage, use, and adaptation of personal data. Training datasets, especially those scraped from the internet, often contain identifiable or inferable personal information.

Where an entity determines the purpose and means of such processing, it qualifies as a data fiduciary, triggering obligations of:

  • purpose limitation
  • data minimization
  • accuracy
  • security safeguards

This classification is doctrinally sound. However, it raises a deeper question: what exactly is being regulated: the dataset, the model, or the outputs?

The DPDP Act is largely silent on whether:

  • trained model weights derived from personal data remain “personal data,” or
  • downstream inferences constitute fresh processing events

This ambiguity is not incidental; it reflects a broader mismatch between legal categories and technical architectures.

III. The Myth of “Public Data” in AI Training

A persistent assumption in AI development is that publicly available data is freely usable. The DPDP framework complicates this view.

The mere accessibility of data does not strip it of its character as personal data. If information relates to an identifiable individual, its reuse—particularly at scale—can still fall within regulatory scope. This position aligns with global privacy norms, including those under the General Data Protection Regulation.

However, a categorical rejection of public data reuse would be equally flawed.

The DPDP Act leaves room—albeit ambiguously—for:

  • reasonable uses consistent with context
  • potential exemptions for research or statistical purposes
  • processing of anonymised data

The real issue, therefore, is not whether public data can be used, but under what conditions such use remains lawful. Dismantling the “free data” myth is a necessary corrective, but a complete analysis must also acknowledge the spectrum of permissible uses.
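One way to see why “anonymised” is a higher bar than it first appears: naive pseudonymisation, such as hashing an identifier, does not make data anonymous, because the resulting token can be reversed by enumerating plausible inputs. The short Python sketch below is purely illustrative (the addresses and function names are invented for this example) and is not a statement of what the Act requires.

  import hashlib

  def pseudonymise(email: str) -> str:
      # Unsalted hash: a common but inadequate "de-identification" step
      return hashlib.sha256(email.encode()).hexdigest()

  # A token stored in a supposedly de-identified public dataset
  token = pseudonymise("asha@example.com")

  # Anyone holding a list of plausible addresses can re-identify the token
  candidates = ["ravi@example.com", "asha@example.com", "mira@example.com"]
  print([e for e in candidates if pseudonymise(e) == token])
  # -> ['asha@example.com']  (the data was never truly anonymous)

Because re-identification of this kind is trivial at scale, whether a dataset counts as “anonymised” under the Act is a technical question as much as a legal one.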

IV. Consent, Scale, and the Limits of Traditional Compliance

A strict reading of the DPDP Act suggests that personal data processing generally requires consent. Applied literally, this would render most large-scale AI training exercises legally untenable.

But this interpretation quickly encounters practical limits:

  • Training datasets may contain billions of data points from diffuse sources
  • Data subjects are often unidentifiable or uncontactable
  • Models cannot easily “unlearn” specific data once trained

This creates a structural incompatibility between individual-centric consent frameworks and aggregate, statistical learning systems.

If enforced rigidly, consent requirements could:

  • significantly constrain domestic AI development
  • incentivize regulatory arbitrage
  • push innovation into less accountable jurisdictions

Conversely, a diluted interpretation risks undermining the very privacy protections the Act seeks to guarantee.

The law, as it stands, offers no clear resolution—only a policy dilemma.

V. The Problem of Data Subject Rights in Machine Learning Systems

The DPDP Act grants individuals rights such as:

  • access to their data
  • correction and erasure
  • grievance redressal

In conventional systems, these rights are administratively manageable. In AI systems, they are technically fraught.

For instance:

  • Erasure: Removing an individual’s data from a trained model may require retraining or complex machine unlearning techniques, which are still experimental (a simplified sketch follows this list).
  • Access: It is unclear how a model can meaningfully disclose whether and how a specific individual’s data influenced its outputs.
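To make the erasure problem concrete, the sketch below illustrates one research direction sometimes called “exact” unlearning: training an ensemble of models on disjoint shards of the data, so that honouring an erasure request requires retraining only the shard that contained the record (in the spirit of SISA training, Bourtoule et al., 2021). Everything here, from the toy model to the function names, is an illustrative assumption rather than a compliance recipe.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  X = rng.normal(size=(1000, 5))            # toy feature matrix
  y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy labels

  # Partition the training data into disjoint shards
  shards = list(np.array_split(np.arange(len(X)), 4))

  def train_shard(idx):
      # Each shard gets its own independently trained model
      return LogisticRegression().fit(X[idx], y[idx])

  models = [train_shard(idx) for idx in shards]

  def erase(record_id):
      # Honour an erasure request by retraining only the affected shard,
      # rather than the whole ensemble
      for s, idx in enumerate(shards):
          if record_id in idx:
              shards[s] = idx[idx != record_id]
              models[s] = train_shard(shards[s])
              return

  erase(42)  # e.g., data principal 42 invokes the erasure right

Even this simplified scheme exposes the trade-off: erasure becomes tractable only if the system was architected for it in advance, which is precisely what most deployed models were not.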

These challenges are not merely operational—they call into question whether existing rights frameworks are conceptually compatible with machine learning systems.

Without interpretive guidance, compliance risks becoming either:

  • superficial (formal but ineffective), or
  • prohibitively burdensome

VI. Regulatory Ambiguity and the Risk of Overcorrection

A defining feature of the current landscape is uncertainty.

Key aspects remain unsettled:

  • the scope of “legitimate uses”
  • the treatment of inferred or derived data
  • enforcement priorities and thresholds

In such an environment, two risks emerge:

  1. Overcompliance: Firms adopt excessively restrictive practices, stifling innovation unnecessarily
  2. Undercompliance: Firms exploit ambiguity, leading to privacy harms and eventual regulatory backlash

The absence of AI-specific provisions in the DPDP Act suggests that much will depend on:

  • subordinate legislation
  • regulatory guidance
  • judicial interpretation

Until then, the law operates less as a rulebook and more as a framework for contestation.

VII. India in Comparative Perspective

Unlike jurisdictions that are developing AI-specific regulatory regimes, India currently relies on a horizontal data protection framework.

This approach has advantages:

  • flexibility
  • technology neutrality
  • reduced regulatory fragmentation

But it also has limitations:

  • lack of clarity on automated decision-making
  • no explicit provisions on algorithmic accountability or bias
  • limited guidance for high-risk AI systems

As global standards evolve, India will need to decide whether to:

  • adapt the DPDP framework incrementally, or
  • introduce dedicated AI legislation

The current silence is unlikely to prove sustainable.

VIII. Conclusion: Toward a Coherent AI–Data Governance Framework

The application of the DPDP Act to AI training reveals a deeper truth: data protection law, in its current form, is necessary but insufficient for governing artificial intelligence.

The Act succeeds in establishing foundational principles of accountability and user rights. However, its interaction with AI systems exposes:

  • conceptual gaps
  • technical incompatibilities
  • policy trade-offs

These tensions should be understood not as failures, but as signals of transition.

India now faces a critical choice:

  • interpret existing law in ways that balance innovation and protection, or
  • develop a more tailored regulatory architecture for AI

Either path will require moving beyond binary positions—such as “all data use requires consent” or “public data is free”—toward a more context-sensitive, risk-based framework.

The future of AI governance in India will not be determined by statutory text alone, but by how these unresolved questions are negotiated in practice.

“The future of AI won’t be decided by algorithms—it will be decided by ethics.”

Footnotes

[1] Digital Personal Data Protection Act, 2023, § 2(i).
[2] See, e.g., European Data Protection Board, Guidelines on AI and Data Processing (2024).
[3] General Data Protection Regulation, arts. 4, 6.
[4] DPDP Act, §§ 7, 17.
[5] Id. § 6.
[6] Sandra Wachter et al., “Why a Right to Explanation of Automated Decision-Making Does Not Exist in the GDPR” (2017).
[7] DPDP Act, §§ 11–13.
[8] Michael Veale & Frederik Zuiderveen Borgesius, “Demystifying the Right to Erasure in Machine Learning” (2021).
