Selected Publications
My Google Scholar page contains a full list of publications.
2025
- Understanding Industry Practitioners’ Experiences in Generative AI Governance. Hyo Jin Do, Swati Babbar, Wenjing Li, Laura Walks, and Shayenna Misko. CHI Late-Breaking Work 2025, 2025
AI governance has become critical, especially as generative AI technology introduces new complexities and uncertainties that require robust risk management. While the need for frameworks and solutions to support AI governance is widely recognized, understanding and addressing the real-world needs of AI practitioners in operationalizing governance remains underexplored. To bridge this gap, we conducted semi-structured interviews using a design probe with AI governance practitioners across various industry sectors. Our findings provide insights into the experiences and pain points of industry practitioners in AI governance, highlighting key challenges in achieving performance goals, assessing societal impact, securing user data, and navigating technical difficulties. We also identified their technical and explainability needs, including practical guidance on addressing violations, as well as more detailed explanations of AI models, data, and evaluation. We discuss design guidelines for AI governance tools that effectively support practitioners’ needs.
- Exploring Industry Practices and Perspectives on AI Attribution in Co-Creative Use Cases. Jessica He and Hyo Jin Do. Joint Proceedings of the ACM IUI Workshops 2025, 2025
The increasing adoption of generative AI in human-AI co-creative workflows has led to the development of new policies and design guidelines for disclosing the usage of AI, promoting transparency and accountability in the collaborative process. However, it remains unclear how these policies are being translated into practice in product development. Through semi-structured interviews with 12 industry practitioners, we investigated current approaches and challenges in implementing AI attribution in business products. Our results reveal high variability in AI attribution approaches across products, as they consider factors such as the type of content produced by AI, the presence of human reviewers, stakeholder needs, and regulatory requirements. We also identified technical, user, and product-level challenges of implementing AI attribution in products, including difficulty tracing and discerning the significance of AI contributions, negative impacts on user experience and sense of ownership, and a lack of precedent in product-specific contexts. Our findings offer practical design implications for effective AI attribution strategies in co-creative business use cases.
2024
- Grounding with Structure: Exploring Design Variations of Grounded Human-AI Collaboration in a Natural Language Interface. Hyo Jin Do, Michelle Brachman, Casey Dugan, James M. Johnson, Julia Lauer, and 2 more authors. Proc. ACM Hum.-Comput. Interact., Nov 2024
Selecting an effective utterance among countless possibilities that match a user’s intention poses a challenge when using natural language interfaces. To address the challenge, we leveraged the principle of least collaborative effort in communication grounding theory and designed three grounded conversational interactions: 1) a grounding interface allows users to start with a provisional input and then invite a conversational agent to complete their input, 2) a multiple grounding interface presents multiple inputs for the user to select from, and 3) a structured grounding interface guides users to write inputs in a structure best understood by the system. We compared our three grounding interfaces to an ungrounded control interface in a crowdsourced study (N=80) using a natural language system that generates small programs. We found that the grounding interfaces reduced cognitive load and improved task performance. The structured grounding interface further reduced speaker change costs and improved technology acceptance, without sacrificing the perception of control. We discuss the implications of designing grounded conversational interactions in natural language systems.
- Evaluating What Others Say: The Effect of Accuracy Assessment in Shaping Mental Models of AI Systems. Hyo Jin Do, Michelle Brachman, Casey Dugan, Qian Pan, Priyanshu Rai, and 2 more authors. Proc. ACM Hum.-Comput. Interact., Nov 2024
Forming accurate mental models that align with the actual behavior of an AI system is critical for successful user experience and interactions. One way to develop mental models is through information shared by other users. However, this social information can be inaccurate and there is a lack of research examining whether inaccurate social information influences the development of accurate mental models. To address this gap, our study investigates the impact of social information accuracy on mental models, as well as whether prompting users to validate the social information can mitigate the impact. We conducted a between-subject experiment with 39 crowdworkers where each participant interacted with our AI system that automates a workflow given a natural language sentence. We compared participants’ mental models between those exposed to social information of how the AI system worked, both correct and incorrect, versus those who formed mental models through their own usage of the system. Specifically, we designed three experimental conditions: 1) validation condition that presented the social information followed by an opportunity to validate its accuracy through testing example utterances, 2) social information condition that presented the social information only, without the validation opportunity, and 3) control condition that allowed users to interact with the system without any social information. Our results revealed that the inclusion of the validation process had a positive impact on the development of accurate mental models, especially around the knowledge distribution aspect of mental models. Furthermore, participants were more willing to share comments with others when they had the chance to validate the social information. The impact of inaccurate social information on altering user mental models was found to be non-significant, while 69.23% of participants incorrectly judged the social information accuracy at least once. 
We discuss the implications of these findings for designing tools that support the validation of social information and thereby improve human-AI interactions.
- Facilitating Human-LLM Collaboration through Factuality Scores and Source Attributions. Hyo Jin Do, Rachel Ostrand, Justin D. Weisz, Casey Dugan, Prasanna Sattigeri, and 3 more authors. Nov 2024
While humans increasingly rely on large language models (LLMs), they are susceptible to generating inaccurate or false information, also known as "hallucinations". Technical advancements have been made in algorithms that detect hallucinated content by assessing the factuality of the model’s responses and attributing sections of those responses to specific source documents. However, there is limited research on how to effectively communicate this information to users in ways that will help them appropriately calibrate their trust toward LLMs. To address this issue, we conducted a scenario-based study (N=104) to systematically compare the impact of various design strategies for communicating factuality and source attribution on participants’ ratings of trust, preferences, and ease in validating response accuracy. Our findings reveal that participants preferred a design in which phrases within a response were color-coded based on the computed factuality scores. Additionally, participants increased their trust ratings when relevant sections of the source material were highlighted or responses were annotated with reference numbers corresponding to those sources, compared to when they received no annotation in the source material. Our study offers practical design guidelines to facilitate human-LLM collaboration and it promotes a new human role to carefully evaluate and take responsibility for their use of LLM outputs.
- Multi-Level Explanations for Generative Language Models. Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, and 6 more authors. Nov 2024
Perturbation-based explanation methods such as LIME and SHAP are commonly applied to text classification. This work focuses on their extension to generative language models. To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms. To handle text output, we introduce the notion of scalarizers for mapping text to real numbers and investigate multiple possibilities. To handle long inputs, we take a multi-level approach, proceeding from coarser levels of granularity to finer ones, and focus on algorithms with linear scaling in model queries. We conduct a systematic evaluation, both automated and human, of perturbation-based attribution methods for summarization and context-grounded question answering. The results show that our framework can provide more locally faithful explanations of generated outputs.
2023
- Inform, Explain, or Control: Techniques to Adjust End-User Performance Expectations for a Conversational Agent Facilitating Group Chat Discussions. Hyo Jin Do, Ha-Kyung Kong, Pooja Tetali, Karrie Karahalios, and Brian P. Bailey. Proc. ACM Hum.-Comput. Interact., Oct 2023
A conversational agent (CA) effectively facilitates online group discussions at scale. However, users may have expectations about how well the CA would perform that do not match with the actual performance, compromising technology acceptance. We built a facilitator CA that detects a member who has low contribution during a synchronous group chat discussion and asks the person to participate more. We designed three techniques to set end-user expectations about how accurately the CA identifies an under-contributing member: 1) information: explicitly communicating the accuracy of the detection algorithm, 2) explanation: providing an overview of the algorithm and the data used for the detection, and 3) adjustment: enabling users to gain a feeling of control over the algorithm. We conducted an online experiment with 163 crowdworkers in which each group completed a collaborative decision-making task and experienced one of the techniques. Through surveys and interviews, we found that the explanation technique was the most effective strategy overall as it reduced user embarrassment, increased the perceived intelligence of the CA, and helped users better understand the detection algorithm. In contrast, the information technique reduced members’ contributions and the adjustment technique led to a more negative perceived discussion experience. We also discovered that the interactions with other team members diluted the effects of the techniques on users’ performance expectations and acceptance of the CA. We discuss implications for better designing expectation-setting techniques for AI-team collaboration such as ways to improve collaborative decision outcomes and quality of contributions.
- To Err is AI: Imperfect Interventions and Repair in a Conversational Agent Facilitating Group Chat Discussions. Hyo Jin Do, Ha-Kyung Kong, Pooja Tetali, Jaewook Lee, and Brian P. Bailey. Proc. ACM Hum.-Comput. Interact., Apr 2023
Conversational agents (CAs) can analyze online conversations using natural language techniques and effectively facilitate group discussions by sending supervisory messages. However, if a CA makes imperfect interventions, users may stop trusting the CA and discontinue using it. In this study, we demonstrate how inaccurate interventions of a CA and a conversational repair strategy can influence user acceptance of the CA, members’ participation in the discussion, perceived discussion experience between the members, and group performance. We built a CA that encourages the participation of members with low contributions in an online chat discussion in which a small group (3-6 members) performs a decision-making task. Two types of errors can occur when detecting under-contributing members: 1) false-positive (FP) errors happen when the CA falsely identifies a member as under-contributing and 2) false-negative (FN) errors occur when the CA misses detecting an under-contributing member. We designed a conversational repair strategy that gives users a chance to contest the detection results and the agent sends a correctional message if an error is detected. Through an online study with 175 participants, we found that participants who received FN error messages reported higher acceptance of the CA and better discussion experience, but participated less compared to those who received FP error messages. The conversational repair strategy moderated the effect of errors such as improving the perceived discussion experience of participants who received FP error messages. Based on our findings, we offer design implications for which model should be selected by practitioners between high precision (i.e., fewer FP errors) and high recall (i.e., fewer FN errors) models depending on the desired effects. When frequent FP errors are expected, we suggest using the conversational repair strategy to improve the perceived discussion experience.
- Follow the Successful Herd: Towards Explanations for Improved Use and Mental Models of Natural Language Systems. Michelle Brachman, Qian Pan, Hyo Jin Do, Casey Dugan, Arunima Chaudhary, and 9 more authors. In Proceedings of the 28th International Conference on Intelligent User Interfaces, Sydney, NSW, Australia, Apr 2023
While natural language systems continue improving, they are still imperfect. If a user has a better understanding of how a system works, they may be able to better accomplish their goals even in imperfect systems. We explored whether explanations can support effective authoring of natural language utterances and how those explanations impact users’ mental models in the context of a natural language system that generates small programs. Through an online study (n=252), we compared two main types of explanations: 1) system-focused, which provide information about how the system processes utterances and matches terms to a knowledge base, and 2) social, which provide information about how other users have successfully interacted with the system. Our results indicate that providing social suggestions of terms to add to an utterance helped users to repair and generate correct flows more than system-focused explanations or social recommendations of words to modify. We also found that participants commonly understood some mechanisms of the natural language system, such as the matching of terms to a knowledge base, but they often lacked other critical knowledge, such as how the system handled structuring and ordering. Based on these findings, we make design recommendations for supporting interactions with and understanding of natural language systems.
2022
- How Should the Agent Communicate to the Group? Communication Strategies of a Conversational Agent in Group Chat Discussions. Hyo Jin Do, Ha-Kyung Kong, Jaewook Lee, and Brian P. Bailey. Proc. ACM Hum.-Comput. Interact., Nov 2022
In online group discussions, balanced participation can improve the quality of discussion, members’ satisfaction, and positive group dynamics. One approach to achieve balanced participation is to deploy a conversational agent (CA) that encourages participation of under-contributing members, and it is important to design communication strategies of the CA in a way that is supportive to the group. We implemented five communication strategies that a CA can use during a decision-making task in a small group synchronous chat discussion. The five strategies include messages sent to two types of recipients (@username vs. @everyone) crossed by two separate channels (public vs. private), and a peer-mediated strategy where the CA asks a peer to address the under-contributing member. Through an online study with 42 groups, we measured the balance of participation and perceptions about the CA by analyzing chat logs and survey responses. We found that the CA sending messages specifying an individual through a private channel is the most effective and preferred way to increase participation of under-contributing members. Participants also expressed that the peer-mediated strategy is a less intrusive and less embarrassing way of receiving the CA’s messages compared to the conventional approach where the CA directly sends a message to the under-contributing member. Based on our findings, we discuss trade-offs of various communication strategies and explain design considerations for building an effective CA that adapts to different group dynamics and situations.
2021
- Do You Have Time for a Quick Chat? Designing a Conversational Interface for Sexual Harassment Prevention Training. Hyo Jin Do, Seon Hye Yang, Boo-Gyoung Choi, Wayne T. Fu, and Brian P. Bailey. In Proceedings of the 26th International Conference on Intelligent User Interfaces, College Station, TX, USA, Nov 2021
Sexual harassment (SH) incidents are increasing and call into question the effectiveness of traditional SH prevention training. In this paper, we introduce a proof-of-concept design of a conversational interface (CI) for understanding SH cases. Key features of the interface include that it engages the learner in a dyadic conversation, prompts the learner for guidance, and tells a story of SH from a first-person perspective. From a mixed-methods study (N=32), learners experiencing a SH vignette using the conversational interface reported feeling less overwhelmed with the content, more engaged with the situation, and more comfortable discussing the topic compared to reading the same vignette online. Participants also reported that using a first-person narrative made the vignette feel realistic and relatable. However, there was no difference in empathy between the conditions. We discuss these results and implications for designing effective SH prevention training.