
PDPC update: Advisory guidelines on use of personal data in generative AI now open for public consultation

05 Jun 2026

1. On 2 June 2026, the Personal Data Protection Commission ("PDPC") issued its proposed Advisory Guidelines on use of Personal Data in Generative AI (the "Guidelines") for public consultation.

2. In this update, we digest the proposed Guidelines and reframe them in an FAQ format, so that you can easily see what they mean for organisations that develop or deploy Generative AI involving the use of personal data.

3. If you would like to discuss any aspects of the proposed Guidelines, or require assistance in framing your response to the public consultation, please get in touch with us.

Q1: What do the Guidelines cover?

The Guidelines clarify how the Personal Data Protection Act 2012 ("PDPA") applies where the development and deployment of Generative AI involve the use of personal data.

The Guidelines build on and should be read in conjunction with the PDPC’s Advisory Guidelines on use of Personal Data in AI Recommendation and Decision Systems (which we covered in a previous update), Advisory Guidelines on Key Concepts in the PDPA, the Privacy Enhancing Technologies Adoption Guide and the Guide to Basic Anonymisation.

To ensure a common understanding, the Guidelines define “Generative AI Models” and “Generative AI Systems” as follows:

| Term | Definition |
| --- | --- |
| "Generative AI Models" | Models, including those trained on a large amount of data using self-supervision at scale, that display significant generality and can competently perform a wide range of distinct tasks regardless of how they are placed on the market, and that can be integrated into a variety of systems or applications. |
| "Generative AI Systems" | AI systems or applications based on Generative AI Models, both for direct use and for integration in other AI systems. |
 

The Guidelines are intended to address some of the key data protection issues relating to Generative AI today, namely:

  1. the collection and use of personal data to develop Generative AI Models;
  2. the allocation of data protection responsibilities across the Generative AI lifecycle; and
  3. the handling of individuals' requests concerning the processing of their personal data for Generative AI.

The Guidelines are organised according to the typical stages of the Generative AI lifecycle as follows:

| Stage of the Generative AI lifecycle | Topics covered | Reference in Guidelines |
| --- | --- | --- |
| Development | Collecting and using personal data to develop Generative AI Models: Publicly Available Exception; Notification Obligation; Consent Obligation; Protection Obligation | Part II |
| Deployment | Processing personal data in deployed Generative AI Models and/or Systems: Retention Limitation Obligation; Protection Obligation; Purpose Limitation Obligation; Accountability Obligation | Part III |
| Post-Deployment | Addressing individuals’ requests about personal data: Access and Correction Obligations | Part IV |


Q2: I want to use web-scraped data to develop a Generative AI model. Can I rely on the Publicly Available Exception instead of seeking consent from individuals?

Developing a Generative AI Model requires a large amount of data, and web-scraping is a common way of compiling datasets at the pre-training and fine-tuning stages. Where web-scraped datasets include personal data, the Guidelines state that you may consider relying on the Publicly Available Exception in lieu of consent.

| | Publicly Available Exception |
| --- | --- |
| What it is | Found in Part 2 of the First Schedule to the PDPA. It enables organisations to collect, use or disclose, without consent, personal data that is publicly available, meaning personal data generally available to the public and obtainable or accessible with few or no restrictions. |
| When should I consider it | Where the personal data you are using to develop a Generative AI model is online data that is publicly accessible without any restrictions. This remains subject to a reasonable person considering that the use of the personal data in developing the Generative AI model is appropriate in the circumstances. |
 


Q3: Some of the data I want to use sits behind a paywall or requires authentication mechanisms to access it. Is it still considered "publicly available"?

The PDPC addresses "digital barriers", defined as any technical and/or financial measure that meaningfully restricts data access in whole or in part. Examples of digital barriers include:

| Categories | Examples |
| --- | --- |
| Paywalls or subscriptions | Hard, metered or dynamic paywalls |
| Registration requirements | Sign-up processes requiring information like the user’s name, email address and contact details |
| Authentication mechanisms | Passwords, Application Programming Interface (API) keys, one-time codes |
| Tools, systems or configurations that detect and prevent automated programs from accessing website data | AI bot blockers, Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) |

The existence of a digital barrier does not, by itself, prevent personal data from being publicly available. Whether it is depends on the facts of each case, and the following factors should be considered:

  1. The purpose of the digital barrier (e.g. to enable data monetisation);
  2. The effect of the barrier (e.g. whether the online data remains accessible to the public at large or only to a specific group of persons);
  3. The steps needed to access the personal data (e.g. their number and complexity); and
  4. Whether the personal data can be accessed without any restrictions from other online sources.

The PDPC provides examples of personal data it considers publicly available even though it sits behind a digital barrier:

| Examples | Reasoning |
| --- | --- |
| Registers administered by public agencies that are intended to be generally available, including registers where the data is provided only after the payment of a fee. | Such registers are still obtainable by any member of the public. |
| News or media websites that allow access to a limited number of articles before requiring payment or a subscription. | These paywalls function as monetisation mechanisms rather than substantive barriers to access. |
| Large online forums that anyone may join but that require registration details from users as a condition of access and participation. | Such requirements are not so complex or burdensome as to suggest the data is not generally available. |


Q4: If I can rely on the Publicly Available Exception under the PDPA to scrape personal data from a website, does that mean I am fully compliant with the website’s terms of use and other legal obligations?

No. The PDPC states that the applicability of the Publicly Available Exception, and compliance with the PDPA, are distinct from your contractual obligations to respect online terms of use or service and licensing agreements, and from requirements under general law, including criminal law. These should be assessed separately.

Q5: I want to use my own users' personal data to develop or improve a Generative AI Model. Do I need consent from my users?

Personal data that an individual provides to your organisation, or that is created during or as a result of the individual's use of your products or services (“User Data”), is a common source of training data used to develop Generative AI Models. For example, an organisation may decide to use Generative AI to improve its service offerings or answer common user queries.

Two obligations may apply to you:

| Obligation | What it requires |
| --- | --- |
| Consent Obligation (section 13 of the PDPA) | Unless deemed consent or relevant exceptions to consent apply (e.g. the Business Improvement Exception or the Research Exception), you are required to obtain consent for the use of User Data to develop a Generative AI Model. |
| Notification Obligation (section 20(1) of the PDPA) | You are required to inform the individual of: (a) the purposes for the collection, use and disclosure of their personal data, on or before collecting the personal data; and (b) any other purpose of the use or disclosure of the personal data of which the individual has not been informed under paragraph (a), before the use or disclosure of the personal data for that purpose. |
 

The PDPC highlights two approaches (General Notifications and AI-Specific Notifications) to notifying and obtaining consent for the use of User Data for the development of Generative AI Models, and considers that only AI-Specific Notifications are adequate:

| Approach | What it involves | Is it adequate to obtain consent for the use of User Data for developing Generative AI Models? |
| --- | --- | --- |
| General Notifications | A general statement of the purpose of processing, citing the use of personal data for "new product development" without specifying AI or Generative AI Model development. | No. The PDPC considers General Notifications an insufficient means of obtaining consent to use User Data for the development of Generative AI Models, specifically for large-scale AI model training and/or fine-tuning. |
| AI-Specific Notifications | An explicit statement that the purpose of use includes AI and/or Generative AI Model development, which may also be accompanied by an explicit mechanism allowing individuals to decline or withdraw their consent. | Yes. You must provide AI-Specific Notifications when obtaining users’ consent to use User Data for the development of Generative AI Models. |

The objective of the Consent and Notification Obligations is to enable individuals to provide meaningful consent. While notifications need not be overly technical or detailed, they must at a minimum include:

  • The function(s) of the Generative AI Model that require the use of personal data;
  • A clear description of the type(s) of personal data used to develop the model;
  • How the personal data will be used to train and/or fine-tune the model;
  • How individuals can decline or withdraw consent to the use of their personal data for AI training.

This also helps individuals understand the potential risks involved, for example that data of a sensitive nature may be disclosed to third parties, or used for purposes different from the original purpose for which they shared their personal data.


Q6: Anything else I should be doing when developing a Generative AI model with personal data?

The PDPC encourages organisations to practise data minimisation when developing Generative AI Models, to reduce unnecessary risks. Where personal data is necessary for development, you are reminded to implement appropriate technical, process and legal controls to protect it, and may refer to the PDPC’s Advisory Guidelines on use of Personal Data in AI Recommendation and Decision Systems for further guidance.

Q7: I develop or make available Generative AI Models for distribution and use. What are my responsibilities relating to personal data?

If so, you are a Model Provider. A Model Provider may act as an organisation, as a data intermediary, or as both.

| Capacity | When it applies | Key obligations and what to do |
| --- | --- | --- |
| Organisation | Where you process personal data to develop and deploy Generative AI Models, including personal data collected from downstream systems (such as end-user prompts and inputs) for model development. | You need to comply with all PDPA obligations. In particular, under the Retention Limitation Obligation (section 25 of the PDPA), you should cease retaining documents containing personal data, or remove the means of associating it with individuals, once the collection purpose is no longer served and retention is no longer necessary for legal or business purposes. If you preserve training data to develop or enhance future models, maintain a data retention policy setting out the rationale, and review the data held regularly to confirm it is still needed. |
| Data intermediary | Where you process personal data on behalf of downstream users as part of service(s) provided to users by System Deployers, e.g. by running inference to deliver real-time outputs or by hosting data on your infrastructure. | Comply with the applicable PDPA provisions for data intermediaries, namely the Retention Limitation Obligation and the Protection Obligation (section 24 of the PDPA), which requires you to make reasonable security arrangements to protect personal data in your possession or control. As good practice, document and make available the measures taken to safeguard data from downstream sources, e.g. your data access controls and data residency and retention policies. |


Q8: I develop or supply Generative AI Systems for others. What are my responsibilities?

You are a System Provider if you are engaged to develop bespoke, customisable Generative AI Systems, or if you develop retail commercial off-the-shelf systems. This section does not apply to entities that develop and deploy Generative AI Systems in-house.

| Capacity | When it applies | Key obligations and what to do |
| --- | --- | --- |
| Organisation | Where you process personal data as part of your own datasets to develop systems. | You need to comply with all PDPA obligations. Under the Retention Limitation Obligation (section 25 of the PDPA), you should cease retaining documents containing personal data, or remove the means of associating it with individuals, once the collection purpose is no longer served and retention is no longer necessary for legal or business purposes. If you preserve training data to develop or enhance future models, maintain a data retention policy setting out the rationale, and review the data held regularly to confirm it is still needed. |
| Data intermediary | Where you process data on behalf of downstream deployers, e.g. to customise systems for specific use cases or in delivering Software as a Service (“SaaS”). | System Providers, being the ones to provide the system that interfaces with users, often face greater risks such as prompt injection attacks. To comply with the Protection Obligation, you are expected to review periodically whether additional security arrangements are needed as you develop and make available new types of systems. As good practice, share information on the system-level safeguards you have implemented with downstream deployers, to facilitate the protection of personal data processed by your systems. Examples include: (a) data security and protection measures around the development environment (e.g. access controls, input or output filters, privacy enhancing technologies); and (b) testing and performance metrics (e.g. likelihood of data leakage). |
 


Q9: I use Generative AI Systems in my organisation (whether built in-house or developed by third parties). What are my responsibilities?

As a System Deployer, you bear primary responsibility for ensuring that the Generative AI Systems you have chosen to use can meet your obligations under the PDPA.

The PDPC highlights three obligations in particular:

| Obligation | What it requires |
| --- | --- |
| Purpose Limitation Obligation (section 18 of the PDPA) | Collection, use or disclosure of personal data of an individual is limited to purposes, and to an extent, that a reasonable person would consider appropriate in the circumstances. Although Generative AI Systems can perform many tasks, be disciplined about specifying the intended purpose of processing and the amount of personal data required for it. |
| Protection Obligation (section 24 of the PDPA) | Safeguard personal data in your possession or under your control, including new categories of data sources collected through your systems, such as end-user prompts, inputs and generated outputs, agent or tool activity data, and internal enterprise data. Additionally, track and designate responsibilities over these new data sources and implement corresponding safeguards. This includes educating your end users, whether internal or external, on: (a) the specific types of data that should be input into the system (for example, only pre-defined categories of personal data); and (b) how data will be processed once collected. |
| Accountability Obligation (sections 11 and 12 of the PDPA) | You are encouraged to develop clear written policies and to document your processes in relation to the safeguards undertaken. Pre-emptively making such policies available, for example on your website, will demonstrate accountability in compliance with the PDPA. |
 


Q10: We are deploying Generative AI with "agentic" functionality. Is there anything further to consider?

Yes. As AI risks continue to evolve, you should regularly review whether your safeguards remain sufficient. This is especially relevant where your systems have agentic functionalities, which include independent planning and action-taking across multiple steps to achieve user-defined objectives, with minimal human intervention unless otherwise set by your organisation. Because these capabilities can exacerbate data protection risks, you should carefully consider the privacy-utility trade-offs when scoping your agentic use cases. Please refer to IMDA's Model AI Governance Framework for Agentic AI for further guidance on managing agentic risks.

Q11: An individual asks to access or correct their personal data used in our model or system. Do we have to comply, given how difficult this is in practice?

In general, yes, where reasonable. Sections 21, 22 and 22A of the PDPA give individuals the right to request access to, and correction of, their personal data in an organisation's possession or control, with corresponding obligations on the organisation to respond (the Access and Correction Obligations). In the Generative AI context, these obligations apply to personal data in the form(s) in which it has been collected, used and disclosed for model and system development, and you must accede to a request unless a PDPA exception applies. Examples of those exceptions are:

| Request | You may decline where |
| --- | --- |
| Access | The burden or expense of providing access would be unreasonable to the organisation or disproportionate to the individual's interests. |
| Correction | You are satisfied on reasonable grounds that the correction should not be made. |
 

The PDPC recognises present-day challenges in meeting these requests for Generative AI, namely:

  1. the sheer volume of training data (making it hard to identify, verify and correct a specific individual's data);
  2. the nature of the technology (training data held as embeddings rather than in a traditional repository, and User Data held only temporarily in context windows); and
  3. other technical limits, such as the difficulty of removing specific information from models.

Notwithstanding these challenges, the PDPC expects organisations to adopt the following best practices, where reasonable:

| Best practice | What it involves |
| --- | --- |
| Upstream data handling measures | Verifying data accuracy at the point of collection; using data cleaning techniques such as de-duplication and outlier detection; and maintaining data provenance records to document the lineage of training data. |
| Case-by-case review | Reviewing requests individually and acceding where reasonable (for example, where the personal data is stored in a Retrieval-Augmented Generation database); and removing personal data, including inaccurate data, from training datasets before future AI training runs. |
| Appropriate technical measures | Tracking the maturity of, and adopting, suitable techniques such as machine unlearning to remove inaccurate personal data from models and systems. |
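
For technical teams, the upstream data handling measures above (de-duplication and data provenance records) might look something like the sketch below. It is purely illustrative and is not drawn from the Guidelines: the `ProvenanceRecord` fields, the `fingerprint` and `deduplicate` helper names, and the choice of exact-match hashing on normalised text are all our own assumptions, and a real pipeline would likely need near-duplicate detection and richer lineage metadata.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical lineage entry for one training record (fields are assumptions)."""
    record_id: str       # short fingerprint of the record's text
    source: str          # where the record came from, e.g. a URL or internal system
    collected_on: date   # when it was collected
    legal_basis: str     # e.g. "consent" or "publicly available exception"


def fingerprint(text: str) -> str:
    """Hash of case- and whitespace-normalised text, used to spot exact duplicates."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first occurrence of each distinct text and log its provenance.

    `records` is an iterable of (text, source, collected_on, legal_basis) tuples.
    Returns (kept_texts, provenance_records) in matching order.
    """
    seen = set()
    kept, provenance = [], []
    for text, source, collected_on, legal_basis in records:
        digest = fingerprint(text)
        if digest in seen:
            continue  # duplicate of a record already kept
        seen.add(digest)
        kept.append(text)
        provenance.append(
            ProvenanceRecord(digest[:12], source, collected_on, legal_basis)
        )
    return kept, provenance
```

Keeping a provenance entry alongside each retained record is one way to make it easier to locate, and later remove or correct, a specific individual's data before a future training run.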
 

4. Please do not hesitate to contact any member of our Artificial Intelligence & Digital Trust Practice if you would like to discuss how the proposed Guidelines may apply to your business operations.
 

This newsflash is intended to provide general information and may not be reproduced or transmitted in any form or by any means, in whole or in part, without prior written approval. It is not intended to be a comprehensive study of the subjects covered, nor is it intended to provide any legal advice. It should not be treated as a substitute for specific advice on specific situations.
