Fourteen papers by CSE researchers at CHI 2025

CSE authors are presenting new research in human-computer interaction, on topics including mixed-reality medical training, student use of LLMs, and surface-based object tracking.

Fourteen papers authored by researchers affiliated with CSE are being presented at the 2025 Conference on Human Factors in Computing Systems (CHI), widely considered the top conference in the field of human-computer interaction (HCI). Held annually since 1982, CHI attracts thousands of the world’s brightest researchers to share the latest innovations in HCI and related areas. CHI 2025 is taking place April 26-May 1 in Yokohama, Japan.

CSE researchers are presenting papers on a range of topics at the conference, from sound awareness systems for deaf and hard-of-hearing people to AI-driven chatbots facilitating pediatric communication. The papers being presented are as follows, with the names of authors affiliated with CSE in bold:

Main Track:

Development of the Critical Reflection and Agency in Computing Index (Best Paper Honorable Mention)
Aadarsh Padiyath, Mark Guzdial, Barbara Ericson

Abstract: As computing’s societal impact grows, so does the need for computing students to recognize and address the ethical and sociotechnical implications of their work. While there are efforts to integrate ethics into computing curricula, we lack a standardized tool to measure those efforts, specifically, students’ attitudes towards ethical reflection and their ability to effect change. This paper introduces the novel framework of Critically Conscious Computing and reports on the development and content validation of the Critical Reflection and Agency in Computing Index, a novel instrument designed to assess undergraduate computing students’ attitudes towards practicing critically conscious computing. The resulting index is a theoretically grounded, expert-reviewed tool to support research and practice in computing ethics education. This enables researchers and educators to gain insights into students’ perspectives, inform the design of targeted ethics interventions, and measure the effectiveness of computing ethics education initiatives.

eXplainMR: Generating Real-time Textual and Visual eXplanations to Facilitate UltraSonography Learning in MR
Jingying Wang, Jingjing Zhang, Juana Nicoll Capizzano, Matthew Sigakis, Xu Wang, Vitaliy Popov

Abstract: Mixed-Reality physical task guidance systems have the benefit of providing virtual instructions while enabling learners to interact with the tangible world. However, they are mostly built around single-path tasks and often employ visual cues for motion guidance without explanations on why an action was recommended. In this paper, we introduce eXplainMR, a mixed-reality tutoring system that teaches medical trainees to perform cardiac ultrasound. eXplainMR automatically generates subgoals for obtaining an ultrasound image that contains clinically relevant information, and textual and visual explanations for each recommended move based on the visual difference between the two consecutive subgoals. We performed a between-subject experiment (N=16) in one US teaching hospital comparing eXplainMR with a baseline MR system that offers commonly used arrow and shadow guidance. We found that after using eXplainMR, medical trainees demonstrated a better understanding of anatomy and showed more systematic reasoning when deciding on the next moves, which was facilitated by the real-time explanations provided in eXplainMR.

A person uses a virtual reality headset and controller to simulate ultrasound scanning on a mannequin, with a large screen displaying cardiac ultrasound guidance. A labeled diagram shows four key feedback components: subgoals for task steps, textual anatomical tips, real-time annotated ultrasound images, and 3D visual cues for anatomical orientation.
eXplainMR is a Mixed Reality tutoring system designed for basic cardiac surface ultrasound training. Trainees wear a head-mounted display (HMD) and hold a controller, mimicking a real ultrasound probe, while treating a desk surface as the patient’s body for low-cost, anywhere training. eXplainMR engages trainees with troubleshooting questions and provides automated feedback through four key mechanisms: 1) subgoals that break down tasks into single-movement steps, 2) textual explanations comparing the current incorrect view with the target view, 3) real-time segmentation and annotation of ultrasound images for direct visualization, and 4) 3D visual cues that further explain the intersection between the slicing plane and the anatomy.

Weaving Sound Information to Support Real-Time Sensemaking of Auditory Environments: Co-Designing with a DHH User
Jeremy Zhengqi Huang, Jaylin Herskovitz, Liang-Yuan Wu, Cecily Morrison, Dhruv Jain

Abstract: Current AI sound awareness systems can provide deaf and hard of hearing (DHH) people with information about sounds, including discrete sound sources and transcriptions. However, synthesizing AI outputs based on DHH people’s ever-changing intents in complex auditory environments remains a challenge. In this paper, we describe the co-design process of SoundWeaver, a sound awareness system prototype that dynamically weaves AI outputs from different AI models based on users’ intents and presents synthesized information through a heads-up display. Adopting a Research through Design perspective, we created SoundWeaver with one DHH co-designer, adapting it to his personal contexts and goals (e.g., cooking at home and chatting in a game store). Through this process, we present design implications for the future of “intent-driven” AI systems for sound accessibility.

“Here the GPT made a choice, and every choice can be biased”: How Students Critically Engage with LLMs through End-User Auditing Activity
Snehal Prabhudesai, Ananya Prashant Kasi, Anmol Mansingh, Anindya Das Antar, Hua Shen, Nikola Banovic

Abstract: Despite recognizing that Large Language Models (LLMs) can generate inaccurate or unacceptable responses, universities are increasingly making such models available to their students. Existing university policies defer the responsibility of checking for correctness and appropriateness of LLM responses to students and assume that they will have the required knowledge and skills to do so on their own. In this work, we conducted a series of user studies with students (N=47) from a large North American public research university to understand if and how they critically engage with LLMs. Our participants evaluated an LLM provided by the university in a quasi-experimental setup; first by themselves, and then with a scaffolded design probe that guided them through an end-user auditing exercise. Qualitative analysis of participant think-aloud and LLM interaction data showed that students without basic AI literacy skills struggle to conceptualize and evaluate LLM biases on their own. However, they transition to focused thinking and purposeful interactions when provided with structured guidance. We highlight areas where current university policies may fall short and offer policy and design recommendations to better support students.

Two side-by-side radar charts compare AI literacy scores across 17 competencies for classroom participants (left) and workshop participants (right). The classroom group shows consistently high scores across most areas, while the workshop group displays lower, more varied scores with notable weaknesses in areas like sensors and action/reaction.
Radar charts for (a) classroom participants and (b) workshop participants, showing the distribution of scores across 17 AI-related competencies. Classroom participants (a) generally scored higher than workshop participants across all competencies, except for “Interdisciplinarity.” Workshop participants (b) had low overall AI literacy, with a mean score of 66.86 (D grade) and high variability, highlighting the need for targeted educational interventions. In contrast, classroom participants (a) showed reasonable AI literacy, with a mean score of 84.78 (B grade) and more consistent performance, indicating general proficiency. This underscores the importance of enhancing AI literacy to ensure effective engagement with AI-related topics.

Evaluating Non-AI Experts’ Interaction with AI: A Case Study In Library Context
Qingxiao Zheng, Minrui Chen, Hyanghee Park, Zhongwei Xu, Yun Huang

Abstract: Public libraries in the U.S. are increasingly facing labor shortages, tight budgets, and overworked staff, creating a pressing need for conversational agents to assist patrons. The democratization of generative AI has empowered public service professionals to develop AI agents by leveraging large language models. To understand the needs of non-AI library professionals in creating their own conversational agents, we conducted semi-structured interviews with library professionals (n=11) across the U.S. Insights from these interviews informed the design of EvalignUX, a prototype tool that enables non-AI experts to create conversational agents without coding skills. We then conducted think-aloud sessions and follow-up interviews to evaluate the prototype experience and identify the key evaluation criteria emphasized by library professionals (n=12) when developing conversational agents. Our findings highlight how these professionals perceive the prototype experience and reveal five essential evaluation criteria: interpreting user intent, faithful paraphrasing, proper alignment with authoritative sources, tailoring the tone of voice, and handling unknown answers effectively. These insights provide valuable guidance for designing AI-supported “end-user AI creation tools” in public service domains beyond libraries.

Are We On Track? AI-Assisted Active and Passive Goal Reflection During Meetings
Xinyue Chen, Lev Tankelevitch, Rishi Vanukuru, Ava Elizabeth Scott, Payod Panda, Sean Rintel

Abstract: Meetings often suffer from a lack of intentionality, such as unclear goals and straying off-topic. Identifying goals and maintaining their clarity throughout a meeting is challenging, as discussions and uncertainties evolve. Yet meeting technologies predominantly fail to support meeting intentionality. AI-assisted reflection is a promising approach. To explore this, we conducted a technology probe study with 15 knowledge workers, integrating their real meeting data into two AI-assisted reflection probes: one with a passive design and one with an active design. Participants identified goal clarification as a foundational aspect of reflection. Goal clarity enabled people to assess when their meetings were off-track and reprioritize accordingly. Passive AI intervention helped participants maintain focus through non-intrusive feedback, while active AI intervention, though effective at triggering immediate reflection and action, risked disrupting the conversation flow. We identify three key design dimensions for AI-assisted reflection systems, and provide insights into design trade-offs, emphasizing the need to adapt intervention intensity and timing, balance democratic input with efficiency, and offer user control to foster intentional, goal-oriented behavior during meetings and beyond.

A three-panel diagram shows the progression of a dynamic meeting visualization system, with panel A showing a basic topic and goal layout, panel B adding expandable topic details, and panel C illustrating a full topic-to-goal flow. The timeline spans from the start to the end of a meeting, emphasizing how the visualization becomes more detailed over time.
Ambient Visualization: The visualization evolves over time as users observe it at different moments during the meeting (e.g., panels A-C), with the content becoming richer as the discussion progresses.

A Law of One’s Own: The Inefficacy of the DMCA for Non-Consensual Intimate Media
Li Qiwei, Shihui Zhang, Samantha Paige Pratt, Andrew Timothy Kasper, Eric Gilbert, Sarita Schoenebeck

Abstract: Non-consensual intimate media (NCIM) presents internet-scale harm to individuals who are depicted. One of the most powerful tools for requesting its removal is the Digital Millennium Copyright Act (DMCA). However, the DMCA was designed to protect copyright holders rather than to address the problem of NCIM. Using a dataset of more than 54,000 DMCA reports and over 85 million infringing URLs spanning over a decade, this paper evaluates the efficacy of the DMCA for NCIM takedown. Results show that for non-commercial requests, while more than half of URLs are deindexed from Google Search within 48 hours, the actual removal of content from website hosts is much slower. The median infringing URL takes more than 45 days to be removed from website hosts, and only 5.39% of URLs are removed within the first 48 hours. Additionally, the most frequently reported domains for non-commercial NCIM are smaller websites, not large platforms. We stress the need for new laws that ensure a shorter time to takedown and that are enforceable across big and small platforms alike.

Enhancing Pediatric Communication: The Role of an AI-Driven Chatbot in Facilitating Child-Parent-Provider Interaction
Woosuk Seo, Young-Ho Kim, Ji Eun Kim, Megan Tao Fan, Mark S. Ackerman, Sung Won Choi, Sun Young Park

Abstract: Communication with child patients is challenging due to their developing ability to express emotions and symptoms. Additionally, healthcare providers often have limited time to offer resources to parents. By leveraging AI to facilitate free-form conversations, our study aims to design an AI-driven chatbot to bridge these gaps in child-parent-provider communication. We conducted two studies: 1) design sessions with 12 children with cancer and their parents, which informed the development of our chatbot, ARCH, and 2) an interview study with 15 pediatric care experts to identify potential challenges and refine ARCH’s role in pediatric communication. Our findings highlight three key roles for ARCH: providing an expressive outlet for children, offering reassurance to parents, and serving as an assessment tool for providers. We conclude by discussing design considerations for AI-driven chatbots in pediatric communication, such as creating communication spaces, balancing the expectations of children and parents, and addressing potential cultural differences.

Who Reaps All the Superchats? A Large-Scale Analysis of Income Inequality in Virtual YouTuber Livestreaming
Ruijing Zhao, Brian Diep, Jiaxin Pei, Dongwook Yoon, David Jurgens, Jian Zhu

Abstract: The explosive growth of Virtual YouTubers (VTubers)—streamers who perform behind virtual anime avatars—has created a unique digital economy with profound implications for content creators, platforms, and viewers. Understanding the economic landscape of VTubers is crucial for designing equitable platforms, supporting content creator livelihoods, and fostering sustainable digital communities. To this end, we conducted a large-scale study of over 1 million hours of publicly available streaming records from 1,923 VTubers on YouTube, covering tens of millions of dollars in actual profits. Our analysis reveals stark inequality within the VTuber community and characterizes the sources of income for VTubers from multiple perspectives. Furthermore, we also found that the VTuber community is increasingly monopolized by two agencies, driving the financial disparity. This research illuminates the financial dynamics of VTuber communities, informing the design of equitable platforms and sustainable support systems for digital content creators.

Screenshot of a VTuber playing Luigi’s Mansion, with a side column showing the corresponding chat and different membership levels highlighted.
Screenshot of video game streaming by the Hololive VTuber Gawr Gura, the most subscribed VTuber on YouTube.

LADICA: A Large Shared Display Interface for Generative AI Cognitive Assistance in Co-located Team Collaboration
Zheng Zhang, Weirui Peng, Xinyue Chen, Luke Cao, Toby Jia-Jun Li

Abstract: Large shared displays, such as digital whiteboards, are useful for supporting co-located team collaborations by helping members perform cognitive tasks such as brainstorming, organizing ideas, and making comparisons. While recent advancements in Large Language Models (LLMs) have catalyzed AI support for these displays, most existing systems either offer only limited capabilities or diminish human control, neglecting the potential benefits of natural group dynamics. Our formative study identified cognitive challenges teams encounter, such as diverse ideation, knowledge sharing, mutual awareness, idea organization, and synchronization of live discussions with the external workspace. In response, we introduce LADICA, a large shared display interface that helps collaborative teams brainstorm, organize, and analyze ideas through multiple analytical lenses, while fostering mutual awareness of ideas and concepts. Furthermore, LADICA facilitates the real-time extraction of key information from verbal discussions and identifies relevant entities. A lab study confirmed LADICA’s usability and usefulness.

TeachTune: Reviewing Pedagogical Agents Against Diverse Student Profiles with Simulated Students
Hyoungwook Jin, Minju Yoo, Jeongeon Park, Yokyung Lee, Xu Wang, Juho Kim

Abstract: Large language models (LLMs) can empower teachers to build pedagogical conversational agents (PCAs) customized for their students. As students have different prior knowledge and motivation levels, teachers must review the adaptivity of their PCAs to diverse students. Existing chatbot reviewing methods (e.g., direct chat and benchmarks) are either manually intensive for multiple iterations or limited to testing only single-turn interactions. We present TeachTune, where teachers can create simulated students and review PCAs by observing automated chats between PCAs and simulated students. Our technical pipeline instructs an LLM-based student to simulate prescribed knowledge levels and traits, helping teachers explore diverse conversation patterns. Our pipeline could produce simulated students whose behaviors correlate highly with their input knowledge and motivation levels within 5% and 10% accuracy gaps. Thirty science teachers designed PCAs in a between-subjects study, and using TeachTune resulted in a lower task load and higher student profile coverage compared to a baseline.

Late-Breaking Work:

MagDeck: Surface-Based Object Recognition and Tracking via Passive Magnetic Sensing
Kunpeng Huang, Yasha Iravantchi, Alanson Sample

Abstract: Enabling arbitrary surfaces to understand the properties and locations of objects in their surrounding environments can facilitate a wide range of applications, including collaborative tangible interactions and mixed-reality 3D interfaces. However, limitations of current tracking mechanisms, e.g., line-of-sight occlusion for vision-based approaches and the need to instrument objects with tags for RF-based tracking, ultimately restrict their adoption. This paper introduces MagDeck, an object recognition and tracking approach based on passive magnetic sensing. Utilizing an array of 112 low-cost magnetometers positioned underneath a table, our custom-designed signal processing and machine-learning pipeline enables accurate and robust identification and localization of unmodified daily objects. MagDeck is able to classify 10 household and workplace objects in real time with 99.0% accuracy while tracking them with a 3.27 cm average error. By instrumenting a regular table surface, MagDeck presents a low-cost and accurate approach to detecting passive objects and a step toward context-aware computing in real-world environments.

A schematic of MagDeck. The first image shows a tabletop with four rows of regularly spaced holes, on which a laptop sits. The next image shows the underside of the table, with sensors fixed to correspond to these holes. Image C shows a close-up view of the hardware used.
(a) Hardware schematic of MagDeck, showing the magnetometer array and data flow. (b) Sensor PCBs installed underneath the tabletop using 3D-printed fixtures. (c) Chainable PCB with a zoomed-in view of the MLX90393 magnetometer. (d) 3D-printed PCB holder positioning the magnetometers 3 cm beneath the table surface.

SceneGenA11y: How can Runtime Generative tools improve the Accessibility of a Virtual 3D Scene?
Xinyun Cao, Kexin Phyllis Ju, Chenglin Li, Dhruv Jain

Abstract: With the popularity of virtual 3D applications, from video games to educational content and virtual reality scenarios, the accessibility of 3D scene information is vital to ensure inclusive and equitable experiences for all. Previous work includes information substitutions like audio descriptions and captions, as well as personalized modifications, but these can only provide predefined accommodations. In this work, we propose SceneGenA11y, a system that responds to the user’s natural language prompts to improve the accessibility of a 3D virtual scene at runtime. The system primes LLM agents with accessibility-related knowledge, allowing users to explore the scene and perform verifiable modifications to improve accessibility. We conducted a preliminary evaluation of our system with three blind and low-vision people and three deaf and hard-of-hearing people. The results show that our system is intuitive to use and can successfully improve accessibility. We discuss usage patterns of the system, potential improvements, and integration into apps, and conclude by highlighting plans for future work.

VisQuestions: Constructing Evaluations for Communicative Visualizations
Ruijia Guan, Elsie Lee-Robbins, Xu Wang, Eytan Adar

Abstract: Communicative visualization designers devote few resources to evaluating their designs. Although general guidelines can offer some direction, the uniqueness of each communicative visualization makes evaluation challenging. Recent research has suggested that modeling designer intents as learning objectives can guide the construction of evaluative assessments. In this paper, we present VisQuestions, a system that streamlines the creation of multiple-choice questions for visualization evaluation. Using source material (e.g., data, notes, prototypes) and contextual data (e.g., captions, annotations, and other text), VisQuestions guides the creation of questions. The system’s mixed-initiative features can help create or improve learning objective definitions, questions, and distractors. We report on the performance of the questions by deploying different visualization designs and quizzes through an online experiment. Our findings support the utility of the tool for crafting assessments that can identify how much visualizations support designer intents.