When we consider explanations of an AI model's behavior, such as explanations of the form "what did the model consider important when producing this output?", confirmation bias can lead us to judge the model trustworthy merely because a few explanations align with our prior beliefs.