Britain’s data watchdog has issued a warning to tech firms about the use of people’s personal information to develop chatbots after concerns that the underlying technology is trained on large quantities of unfiltered material scraped from the web.

The intervention from the Information Commissioner’s Office came after its Italian counterpart temporarily banned ChatGPT over data privacy concerns.

The ICO said firms developing and using chatbots must respect people’s privacy when building generative artificial intelligence systems. ChatGPT, the best-known example of generative AI, is based on a system called a large language model (LLM) that is “trained” by being fed a vast trove of data culled from the internet.

“There really can be no excuse for getting the privacy implications of generative AI wrong. We’ll be working hard to make sure that organisations get it right,” said Stephen Almond, the ICO’s director of technology and innovation.

In a blogpost, Almond pointed to the Italian decision and to a letter signed last week by academics and industry figures, including Elon Musk and the Apple co-founder Steve Wozniak, that called for an immediate pause of at least six months on “giant AI experiments”. The letter said there were concerns that tech firms were creating “ever more powerful digital minds” that no one could “understand, predict, or reliably control”.

Almond said his own conversation with ChatGPT had led to the chatbot telling him generative AI had “the potential to pose risks to data privacy if not used responsibly”. He added: “It doesn’t take too much imagination to see the potential for a company to quickly damage a hard-earned relationship with customers through poor use of generative AI.”

Referring to the LLM training process, Almond said data protection law still applied when the personal information being processed came from publicly accessible sources.

A checklist published by the ICO on Monday stated that under the UK General Data Protection Regulation (GDPR) there must be a lawful basis for processing personal data, such as an individual giving “clear consent” for their data to be used. There were also alternatives that did not require consent, such as having a “legitimate interest”, the checklist said.

It added that companies had to carry out a data protection impact assessment and mitigate security risks such as personal data leaks and so-called membership inference attacks, whereby rogue actors try to identify whether a certain individual was used in the training data for an LLM.
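The membership inference attack the ICO warns about works on a simple observation: models typically fit their training examples more closely than unseen data, so an attacker who can measure a model's loss (or confidence) on a given record can guess whether that record was in the training set. The sketch below illustrates the idea with toy numbers standing in for a model's losses; it is an assumption-laden illustration of the threshold technique, not an attack on any real LLM.

```python
# Minimal sketch of a loss-threshold membership inference attack.
# The "losses" are simulated stand-ins: a trained model tends to assign
# lower loss to examples it saw during training ("members") than to
# unseen examples ("non-members").

import random

random.seed(0)

# Toy losses: members cluster around a low loss, non-members around a higher one.
members = [random.gauss(0.5, 0.2) for _ in range(1000)]      # losses on training data
non_members = [random.gauss(1.5, 0.4) for _ in range(1000)]  # losses on unseen data

THRESHOLD = 1.0  # attacker's cutoff: guess "member" when loss falls below this


def is_member(loss: float) -> bool:
    """Attacker's rule: unusually low loss suggests the record was in training data."""
    return loss < THRESHOLD


# How often the threshold rule classifies the toy records correctly.
correct = sum(is_member(l) for l in members) + sum(not is_member(l) for l in non_members)
accuracy = correct / (len(members) + len(non_members))
print(f"attack accuracy on toy data: {accuracy:.2f}")
```

With the two loss distributions well separated, the attacker identifies membership far more often than chance, which is exactly the privacy leak the ICO's checklist asks developers to assess and mitigate.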

The Italian data protection watchdog announced a temporary ban on ChatGPT on Friday, citing a data leak last month and concerns about the use of personal data in the system underpinning the chatbot. The watchdog said there appeared to be “no legal basis underpinning the massive collection and processing of personal data in order to ‘train’ the algorithms on which the platform” relied.

In response to the Italian ban, Sam Altman, the chief executive of OpenAI, the developer of ChatGPT, said: “We think we are following all privacy laws.” But the company has refused to share any information about what data was used to train GPT-4, the latest version of the underlying technology that powers ChatGPT.

The previous version, GPT-3, was trained on 300bn words scraped from the public internet, as well as the contents of millions of ebooks and the whole of English-language Wikipedia.
