LSE Proposes PETLP Framework for Ethical AI Research on Social Media
TL;DR:
- Researchers at LSE have developed the PETLP framework to address the “triple bind” of GDPR compliance, copyright law, and platform terms of service when using social media data
- Recent high-profile research retractions highlight the urgent need for systematic approaches to data ethics beyond legal compliance
- The framework extends the traditional Extract, Transform, Load, Present (ETLP) data pipeline with privacy-by-design principles and living Data Protection Impact Assessments
Researchers at the London School of Economics have introduced a practical framework addressing the complex legal and ethical challenges of using social media data in AI research. The PETLP (Privacy-by-design Extract, Transform, Load, Present) framework responds to increasing scrutiny following recent retractions of research that used data from Reddit and other platforms.
Context and Background
Recent retractions of studies using social media data, including a Reddit r/schizophrenia analysis and the University of Zurich’s ChangeMyView project, have exposed significant gaps in how researchers navigate what the LSE team calls the “triple bind”. GDPR treats social media posts as personal data requiring protection, copyright law protects user-generated content, and platform terms of service impose additional restrictions. These three regimes operate independently, yet researchers must satisfy all of them simultaneously.
The PETLP framework extends traditional Extract, Transform, Load, Present (ETLP) data processing by incorporating privacy-by-design principles throughout the research lifecycle. Key innovations include living Data Protection Impact Assessments that evolve with the research, upfront mapping of data extraction methods, and community engagement before data collection begins. The framework emphasises that community trust matters beyond strict legal compliance.
Key Insight: The framework recognises that legal compliance alone is insufficient: maintaining community trust requires transparency and engagement throughout the research process, not just at the publication stage.
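To make the idea of a living Data Protection Impact Assessment more concrete, the following is a minimal Python sketch of a record that is updated at each of the four ETLP stages rather than written once at the outset. It is an illustration only: the names `LivingDpia`, `DpiaEntry`, `record`, and `missing_stages` are hypothetical, the example decisions are invented placeholders, and nothing here reflects the LSE team's own tooling or the RedditHarbor API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# The four pipeline stages named in the article (ETLP).
STAGES = ("extract", "transform", "load", "present")


@dataclass
class DpiaEntry:
    """One dated decision recorded against a pipeline stage (hypothetical)."""
    stage: str
    decision: str
    recorded_on: date


@dataclass
class LivingDpia:
    """An append-only assessment log that grows as the research evolves."""
    project: str
    entries: list[DpiaEntry] = field(default_factory=list)

    def record(self, stage: str, decision: str,
               recorded_on: Optional[date] = None) -> DpiaEntry:
        # Reject stages outside the ETLP pipeline so typos are caught early.
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        entry = DpiaEntry(stage, decision, recorded_on or date.today())
        self.entries.append(entry)
        return entry

    def missing_stages(self) -> list[str]:
        # Stages with no recorded decision yet, in pipeline order.
        covered = {entry.stage for entry in self.entries}
        return [stage for stage in STAGES if stage not in covered]


# Usage with placeholder decisions for an imagined study.
dpia = LivingDpia("hypothetical-reddit-study")
dpia.record("extract", "Collection method and legal basis documented")
dpia.record("present", "Dissemination plan agreed before collection")
print(dpia.missing_stages())  # ['transform', 'load']
```

The append-only log is a design choice made for this sketch: keeping earlier entries, instead of overwriting them, preserves a dated trail of how the assessment changed over the life of the project.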
Looking Forward
The LSE researchers are developing RedditHarbor, a practical tool demonstrating PETLP principles for Reddit-based research. The framework provides concrete implementation steps including establishing controller relationships under GDPR, selecting appropriate legal bases, understanding text-and-data-mining rights, and designing dissemination strategies before data collection.
The approach represents a shift from treating ethics and legal compliance as separate checkboxes towards integrated privacy-by-design thinking. For AI researchers working with social media data, the framework offers a systematic method for balancing innovation with responsible data stewardship whilst navigating an increasingly complex regulatory landscape.
Source Attribution:
- Source: LSE Impact Blog
- Original: https://blogs.lse.ac.uk/impactofsocialsciences/2025/10/21/a-practical-blueprint-for-legal-and-ethical-ai-research/
- Published: 21 October 2025