To address the privacy risks associated with biometric data, we propose a method for pseudonymizing social media profile pictures. Existing pseudonymization approaches focus on text and structured data, yet profile images can identify users directly. Our pipeline uses FaceNet and DeepFace to extract facial attributes such as age, gender, and expression, formats them as JSON, and converts them into descriptive text using Mistral (7B). A multimodal model, Janus, then uses this text to generate synthetic profile images that retain the described facial attributes without reproducing the original identity. All processing is done locally to comply with the GDPR and avoid data exposure. The pseudonymized images can support research and machine learning tasks while protecting privacy. To evaluate image quality and privacy, we cluster the original and pseudonymized images and compare the resulting clusterings using the Adjusted Rand Index and Normalized Mutual Information, which assess semantic consistency and identity separation. By combining facial de-identification, large language models, and generative image synthesis in a unified workflow, our method enables secure and ethical analysis of social media data.
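The clustering comparison described above can be illustrated with a minimal, dependency-free sketch. It assumes that each image has already been assigned a cluster label, once for the originals and once for the pseudonymized versions; the clustering step itself, the label values, and the function names below are hypothetical and not taken from the pipeline. The normalization of the mutual information by the arithmetic mean of the two entropies is also an assumption. In practice, scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` compute the same quantities.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the clusterings trivially agree
    # Pairs of items that share a cluster in both, in A only, and in B only.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = in_a * in_b / total_pairs
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0
    return (both - expected) / (maximum - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a label distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by the mean of the two label entropies."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    mean_entropy = (entropy(labels_a) + entropy(labels_b)) / 2
    return 1.0 if mean_entropy == 0 else mi / mean_entropy


# Hypothetical cluster labels for six images (illustrative values only).
original_clusters = [0, 0, 1, 1, 2, 2]
pseudonymized_clusters = [1, 1, 0, 0, 2, 2]  # same grouping, renamed clusters

ari = adjusted_rand_index(original_clusters, pseudonymized_clusters)
nmi = normalized_mutual_information(original_clusters, pseudonymized_clusters)
print(f"ARI = {ari:.3f}, NMI = {nmi:.3f}")
```

Both scores ignore how clusters are named and depend only on which images are grouped together, so they can compare clusterings computed independently on the original and the pseudonymized images.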