BATON: A Multimodal Benchmark for Bidirectional Automation Transition Observation in Naturalistic Driving
Yuhang Wang, Yiyao Xu, Chaoyun Yang, Lingyao Li, Jingran Sun, Hao Zhou
Don't rely on camera-only systems to predict driver transitions. Integrate vehicle telemetry and route context: they capture complementary signals that vision alone misses. Design takeover warnings with a longer lead time than handover prompts.
Drivers must decide when to engage driving automation and when to take back control—a judgment that imposes steep cognitive load and creates safety risks from both over-reliance and delayed intervention.
Method: Synchronized 136.6 hours of naturalistic driving across 127 drivers, capturing front-view video, in-cabin video, CAN bus signals, radar, and GPS context around each control transition. Visual input alone fell short: front-view video missed driver state, in-cabin video missed road context. Adding CAN and route signals substantially improved prediction over video-only baselines. Takeover events developed gradually and benefited from longer observation horizons; handover events depended on immediate cues, an asymmetry with direct HMI implications.
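The method above can be sketched in code. This is a minimal illustration, not the BATON implementation: the modality names, feature shapes, and horizon values are hypothetical, chosen only to show late fusion by concatenation and an asymmetric observation window (longer for gradually developing takeovers, shorter for cue-driven handovers).

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Hypothetical per-modality feature vectors for one timestep."""
    front_video: list   # road-scene features
    cabin_video: list   # driver-state features
    can_bus: list       # speed, steering, pedal signals
    route: list         # GPS / map context

def fuse(frame: Frame) -> list:
    """Late fusion by concatenation: each modality contributes
    complementary signals the others miss."""
    return frame.front_video + frame.cabin_video + frame.can_bus + frame.route

# Illustrative asymmetric horizons in seconds (values are assumptions,
# not from the paper): takeover intent builds gradually, so it gets a
# longer window; handover depends on immediate cues.
HORIZON_S = {"takeover": 8.0, "handover": 2.0}

def observation_window(frames: list, hz: float, event: str) -> list:
    """Return fused features for the trailing window sized by event type."""
    n = int(HORIZON_S[event] * hz)
    return [fuse(f) for f in frames[-n:]]
```

A model trained on these windows would see 4x more temporal context before a takeover than before a handover, mirroring the asymmetry the benchmark reports.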
Caveats: Benchmark tasks are defined, but baseline performance leaves substantial room for improvement. Real-time deployment latency was not tested.
Reflections: What prediction horizon minimizes false alarms while maintaining safety margins for takeover events? · Do transition patterns generalize across vehicle types and automation levels (L2 vs L3)? · Can driver-specific models trained on individual transition history outperform population-level baselines?