Tech & Startup

Researchers develop method to erase sensitive data from AI models

AI data. Image: Collected

A group of researchers at the University of California, Riverside, has developed a new way to make artificial intelligence models "forget" sensitive or copyrighted information - even when the original training data is no longer available.

The technique, described in a paper titled 'A Certified Unlearning Approach without Access to Source Data' presented at the International Conference on Machine Learning in Vancouver in July, tackles one of the most pressing issues in AI: once personal or copyrighted data is embedded in a model, it is extremely difficult to remove. This has raised concerns as privacy laws, including the EU's GDPR and California's CCPA, increasingly demand stronger safeguards around personal information.

Traditionally, developers would need access to the full dataset to retrain a model without the unwanted material, a process that is costly and energy-intensive. The UC Riverside team's approach avoids this problem by using a "surrogate" dataset - data that mimics the statistical properties of the original. By carefully adjusting model parameters and introducing calibrated random noise, their system ensures the targeted information is erased without undermining the model's overall usefulness.
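To make the idea concrete, here is a minimal, purely illustrative sketch of how surrogate-based unlearning of this general kind can work on a simple ridge-regression model. It is not the UC Riverside algorithm: the model, the surrogate statistics, the noise scale, and the function names are all assumptions chosen for the example. The sketch approximately undoes the forget set's contribution using curvature estimated from surrogate data, then adds calibrated Gaussian noise to the parameters.

```python
# Illustrative sketch only -- not the paper's method. It shows the general
# shape of "certified unlearning" for ridge regression: approximate the
# effect of removing the forget set using statistics from a surrogate
# dataset (the original training data is assumed unavailable), then add
# calibrated Gaussian noise so residual traces of the erased data are masked.
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    """Train a ridge-regression model (stand-in for the original model)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def unlearn_with_surrogate(w, X_forget, y_forget, X_surrogate, lam=1e-2,
                           noise_scale=0.01, rng=None):
    """Approximately remove the forget set's influence from parameters w."""
    rng = np.random.default_rng(rng)
    d = w.shape[0]

    # Gradient contribution of the records the model should forget.
    grad_forget = X_forget.T @ (X_forget @ w - y_forget)

    # Curvature (Hessian) estimated from surrogate data that mimics the
    # statistics of the unavailable source data.
    H_surrogate = X_surrogate.T @ X_surrogate + lam * np.eye(d)

    # Newton-style correction that approximately undoes the forget set.
    w_unlearned = w + np.linalg.solve(H_surrogate, grad_forget)

    # Calibrated random noise (scale is an illustrative placeholder).
    return w_unlearned + rng.normal(0.0, noise_scale, size=d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

    w_full = fit_ridge(X, y)                 # model trained on everything
    X_forget, y_forget = X[:20], y[:20]      # records to be erased
    X_surrogate = rng.normal(size=(200, 5))  # stand-in for lost source data

    w_clean = unlearn_with_surrogate(w_full, X_forget, y_forget, X_surrogate)
    w_retrained = fit_ridge(X[20:], y[20:])  # the expensive baseline
    print("distance to full retraining:", np.linalg.norm(w_clean - w_retrained))
```

In this toy setup, the corrected parameters land close to what full retraining would produce, which is the point of comparison the researchers report, but at a fraction of the computational cost; the noise term is what would underpin a formal privacy-style guarantee in a certified scheme.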

"In real-world situations, you can't always go back and get the original data," said lead author Ümit Yiğit Başaran, a doctoral student in electrical and computer engineering, according to an article by UC Riverside on the study. "We've created a certified framework that works even when that data is no longer available."

The researchers demonstrated their method on both synthetic and real-world datasets, finding that it offered privacy protection comparable to full retraining but with far lower computational demands. Early results suggest the framework could eventually extend to large-scale systems like ChatGPT, though the current work applies mainly to simpler models.

Beyond regulatory compliance, the researchers believe the technology could help media outlets, healthcare institutions, and other organisations that handle sensitive data. It may also give individuals a way to demand the removal of personal or copyrighted information from AI systems.

"People deserve to know their data can be erased from machine learning models—not just in theory, but in provable, practical ways," said assistant professor Başak Güler, a co-author of the study, as per the UC Riverside article.
