Race and Gender in LLM-Generated Personas: A Large-Scale Audit of 41 Occupations
Ilona van der Linden, Sahana Kumar, Arnav Dixit, Aadi Sudan, Smruthi Danda, David C. Anastasiu, Kai Lukoff
Stop using LLM-generated personas without demographic auditing. If you're using ChatGPT or Claude to create user archetypes, cross-check them against real labor statistics for your domain. Best suited to teams that lack a recruitment budget but face high stakes for representation accuracy.
LLMs generate occupational personas for design research, but almost no one checks whether those personas systematically skew demographics. Over 1.5 million generated personas later, the bias patterns are clear and consistent.
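The generation side of such an audit is easy to reproduce in spirit. Below is a minimal sketch, not the authors' pipeline: the prompt wording, the model name, and the JSON parsing are all assumptions, shown only to illustrate sampling many personas per occupation and recording the stated race and gender.

```python
import json
from openai import OpenAI  # any chat-completion client works; OpenAI shown as one example

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Small illustrative subset; the paper audits 41 occupations
OCCUPATIONS = ["software developer", "registered nurse", "construction laborer"]

PROMPT = (
    "Create a short user persona for a {occupation}. "
    "Return JSON with keys: name, age, gender, race, background."
)

def generate_personas(occupation: str, n: int = 100) -> list[dict]:
    """Sample n personas for one occupation and parse the demographic fields."""
    personas = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; not necessarily one of the four audited models
            messages=[{"role": "user", "content": PROMPT.format(occupation=occupation)}],
            response_format={"type": "json_object"},
            temperature=1.0,  # sample the distribution rather than one most-likely persona
        )
        personas.append(json.loads(resp.choices[0].message.content))
    return personas
```

Sampling at nonzero temperature matters: the question is not what a single persona looks like, but what distribution of demographics the model produces across many draws.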
Method: Audited four LLMs (developed in the U.S., China, and France) against Bureau of Labor Statistics data across 41 occupations. Found two patterns: systematic shifts, where some groups are consistently over- or under-represented (e.g., women overrepresented in male-dominated fields), and systematic erasure, where certain race-gender intersections vanish entirely. The bias persists across models with different safety commitments, suggesting it is baked into the training data, not just the guardrails.
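The comparison itself reduces to a few lines: tally race-gender intersections in the generated personas, divide each share by the corresponding real-workforce share, and flag intersections the model never produces. A minimal sketch with invented numbers (the actual audit uses published BLS tables; the category labels and shares below are placeholders, not real statistics):

```python
from collections import Counter

def audit_occupation(personas: list[dict],
                     bls_shares: dict[tuple[str, str], float]) -> dict:
    """Compare generated race-gender shares against ground truth for one occupation.

    bls_shares maps (race, gender) -> share of the real workforce (sums to 1.0).
    Returns a representation ratio per intersection: >1 over-represented,
    <1 under-represented, 0.0 means the intersection was never generated (erasure).
    """
    counts = Counter((p["race"].lower(), p["gender"].lower()) for p in personas)
    total = sum(counts.values())
    report = {}
    for group, bls_share in bls_shares.items():
        generated_share = counts.get(group, 0) / total if total else 0.0
        report[group] = generated_share / bls_share if bls_share > 0 else float("nan")
    return report

# Toy ground-truth shares, for illustration only (they sum to 1.0)
example_bls = {
    ("white", "male"): 0.45, ("white", "female"): 0.25,
    ("black", "male"): 0.05, ("black", "female"): 0.05,
    ("asian", "male"): 0.10, ("asian", "female"): 0.05,
    ("hispanic", "male"): 0.03, ("hispanic", "female"): 0.02,
}
```

A ratio stuck at 0.0 across many samples is the erasure signal; ratios consistently above or below 1 across occupations are the systematic shifts.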
Caveats: Covers only U.S. occupations. Doesn't test non-occupational personas (hobbyists, patients, students), for which no comparable ground-truth demographic data exist.
Reflections: Can fine-tuning on BLS data correct these biases without introducing new ones? · Do these biases compound when personas are used in downstream design decisions? · How do international LLMs handle occupational demographics outside their home countries?