LLM Explainability

Exploring methods for explaining factuality and attributions of LLM outputs

LLMs can generate factually incorrect information that appears plausible, a phenomenon known as “hallucination”. Hallucinations pose risks not only for companies that deploy LLMs, but also for end users. For example, Google’s market value dropped after its AI-powered product produced a factual error during a public demonstration; Air Canada was sued over false information provided by its AI-powered chatbot; and a lawyer was reprimanded in court for citing hallucinated case law.

There have been technical advances in calculating factuality estimates, i.e., assessments of how factual an AI-generated response is. Presenting an AI-generated response together with its factuality estimates allows users to recognize potentially incorrect information, prompting them to cross-check it against verified sources and make better-informed decisions. We conducted a series of survey-based experiments to identify the most effective strategy for communicating the factuality estimates of an LLM’s response: one that helps users comprehend the accuracy of the response and calibrate their trust, while aligning with their preferences.
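
Our studies assume phrase-level factuality estimates are available as input. As a rough illustration of where such scores might come from, the sketch below uses one simple proxy from the literature, mean token probability per phrase; this is not the estimator used in our papers, and all phrases and log-probabilities are made up.

```python
# One simple factuality proxy from the literature: mean token probability
# per phrase. Real systems use stronger estimators (e.g., self-consistency
# or retrieval-based fact checking); this sketch only illustrates the kind
# of per-phrase score the UI designs below consume. All numbers are made up.
import math

def phrase_factuality(token_logprobs: list[float]) -> float:
    """Mean token probability of a phrase, as a crude factuality estimate."""
    probs = [math.exp(lp) for lp in token_logprobs]
    return sum(probs) / len(probs)

# Hypothetical per-token log-probabilities for two phrases of a response.
phrases = {
    "The Eiffel Tower is in Paris": [-0.02, -0.05, -0.01, -0.03, -0.04],
    "as a gift from the United States.": [-1.9, -2.3, -1.4, -2.0, -1.7],
}
for text, logprobs in phrases.items():
    print(f"{phrase_factuality(logprobs):.2f}  {text}")
```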

In (Do et al., 2025), we found that highlighting every phrase in a response with a color scale keyed to its factuality estimate was the most preferred and trusted design.
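
As a rough sketch of this highlight-all design, the code below maps hypothetical per-phrase factuality scores onto a red-to-green color scale and renders the response as HTML. The phrases, scores, and color mapping are illustrative assumptions, not the exact stimuli or rendering used in the study.

```python
# Minimal sketch of the "highlight all phrases" design: each phrase in a
# response is wrapped in a colored span whose hue reflects its factuality
# estimate (red = likely incorrect, green = likely correct).
# Phrases and scores are illustrative, not data from the study.

def score_to_color(score: float) -> str:
    """Map a factuality score in [0, 1] to a red-to-green RGB color."""
    red = int(255 * (1.0 - score))
    green = int(255 * score)
    return f"rgb({red},{green},0)"

def render_highlighted(phrases: list[tuple[str, float]]) -> str:
    """Render (phrase, score) pairs as HTML with per-phrase highlighting."""
    spans = [
        f'<span style="background-color:{score_to_color(s)}" '
        f'title="factuality: {s:.2f}">{p}</span>'
        for p, s in phrases
    ]
    return " ".join(spans)

if __name__ == "__main__":
    response = [
        ("The Eiffel Tower is in Paris", 0.97),
        ("and was completed in 1889", 0.90),
        ("as a gift from the United States.", 0.12),  # low-factuality phrase
    ]
    print(render_highlighted(response))
```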

In a follow-up study (Do & Geyer, 2025), we found a promising alternative: hiding content estimated to be less factual, either by removing it or replacing it with ambiguous statements, can enhance user trust while maintaining perceived quality and transparency.
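
A corresponding sketch of the hide designs, under the same assumptions: phrases whose factuality estimate falls below a threshold are either dropped or replaced with a deliberately ambiguous statement. The threshold and hedge wording here are hypothetical, not the study’s materials.

```python
# Minimal sketch of the "hide" designs: low-factuality phrases are either
# removed outright or replaced with a deliberately ambiguous statement.
# The 0.5 threshold and the hedge wording are illustrative assumptions.

HEDGE = "some details here could not be verified"

def hide_low_factuality(
    phrases: list[tuple[str, float]],
    threshold: float = 0.5,
    replace: bool = False,
) -> str:
    """Drop (or hedge) phrases whose factuality score falls below the threshold."""
    kept = []
    for phrase, score in phrases:
        if score >= threshold:
            kept.append(phrase)
        elif replace:
            kept.append(HEDGE)
    return " ".join(kept)

if __name__ == "__main__":
    response = [
        ("The Eiffel Tower is in Paris", 0.97),
        ("and was completed in 1889", 0.90),
        ("as a gift from the United States.", 0.12),
    ]
    print(hide_low_factuality(response))                # removal variant
    print(hide_low_factuality(response, replace=True))  # ambiguous-replacement variant
```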

References

2025

  1. Highlight All the Phrases: Enhancing LLM Transparency through Visual Factuality Indicators
    Hyo Jin Do, Rachel Ostrand, Werner Geyer, Keerthiram Murugesan, Dennis Wei, et al.
    In Proceedings of the Eighth AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES 2025), 2025
  2. Hide or Highlight: Understanding the Impact of Factuality Expression on User Trust
    Hyo Jin Do and Werner Geyer
    In Proceedings of the Eighth AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES 2025), 2025