The AI Memory Gap: Users Misremember What They Created With AI or Without
Tim Zindulka, Sven Goller, Daniela Fernandes, Robin Welsch, Daniel Buschek
Stop assuming users will self-report AI use accurately. Build provenance tracking into the editor itself—version history that tags LLM contributions at the sentence level. Treat attribution as a system responsibility, not a memory test.
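Concretely, sentence-level provenance can be a small data model inside the editor's version history. Below is a minimal sketch in Python; all names (`Source`, `Sentence`, `Document`, `disclosure_report`, the model id "gpt-x") are hypothetical illustrations, not an existing API, and a real editor would attach this to its undo stack and diff machinery rather than a flat sentence list.

```python
# Hypothetical sketch: sentence-level provenance tagging for an editor.
# Attribution is recorded at insertion time, so disclosure never relies
# on the user's week-later memory.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    HUMAN = "human"
    AI = "ai"
    MIXED = "mixed"  # AI text subsequently edited by the human (or vice versa)


@dataclass
class Sentence:
    text: str
    source: Source
    model: str | None = None  # which LLM produced it, if any
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Document:
    sentences: list[Sentence] = field(default_factory=list)

    def insert_human(self, text: str) -> None:
        self.sentences.append(Sentence(text, Source.HUMAN))

    def insert_ai(self, text: str, model: str) -> None:
        self.sentences.append(Sentence(text, Source.AI, model))

    def edit(self, index: int, new_text: str) -> None:
        # A human edit to AI text demotes it to MIXED rather than
        # erasing the AI origin; provenance survives revision.
        old = self.sentences[index]
        source = Source.MIXED if old.source is Source.AI else old.source
        self.sentences[index] = Sentence(new_text, source, old.model)

    def disclosure_report(self) -> dict[str, int]:
        # Per-source sentence counts: the system answers the
        # attribution question so memory never has to.
        report: dict[str, int] = {}
        for s in self.sentences:
            report[s.source.value] = report.get(s.source.value, 0) + 1
        return report


doc = Document()
doc.insert_human("We frame the problem as source monitoring.")
doc.insert_ai("Prior work links generation to stronger memory traces.", model="gpt-x")
doc.edit(1, "Prior work links self-generation to stronger memory traces.")
print(doc.disclosure_report())  # {'human': 1, 'mixed': 1}
```

The MIXED state is the point of the sketch: per the finding below, elaborating on AI text did not restore participants' memory of its origin, so that distinction has to live in the record, not in the writer's head.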
Disclosure of AI-generated content depends on people remembering what they wrote versus what the chatbot wrote. Memory fails at this task.
Method: 184 participants generated ideas with and without an LLM chatbot, then returned one week later to judge each idea's origin. They misattributed 27% of their own ideas to AI and 29% of AI-generated ideas to themselves: error rates far too high for reliable self-disclosure. The confusion was symmetric, and people couldn't distinguish their voice from the machine's even when they'd personally elaborated on the content.
Caveats: Tested only text generation with a single chatbot interface. Visual or code generation may show different memory patterns.
Reflections: Does inline attribution (e.g., highlighting AI sentences in real time) improve week-later recall? · Do professional writers show better source memory than students? · Can users distinguish AI contributions when the task is creative versus analytical?