3 February 2025
The first conversation between a human and a machine occurred in 1966 with the development of ELIZA, the first chatbot using natural language processing (NLP). ELIZA could identify keywords from the user input and match them to a pre-programmed answer. However, it didn't really understand the conversation, often generating strange answers. Nowadays, it's normal to talk to a chatbot and have a functional conversation. How did these tools improve and how can we make them even better?
This is what Raquel Fernández, Professor at our UvA Institute for Logic, Language & Computation (ILLC), is working on. Her research group mainly focuses on “language in context”. Fernández explains: ‘We look at how language is used together with other types of information, for example visual information.’ Her team investigates how to computationally model these interactions.
An application of this is creating a model that can automatically describe images. Fernández: ‘For users that aren't able to see things around them because of a visual impairment for example, we need a system that automatically tells them in natural language what's there. We are developing models to enable this.’
While advanced systems can already generate image descriptions, doing this well remains challenging. ‘When you see an image and you’re asked what's in it, you are not going to describe everything you see. So, selecting what is worth saying is already a challenge,’ says Fernández. Additionally, the style in which you describe something varies and depends on the context.
So how do you tackle these challenges? A machine learning model can learn to describe images when it is trained on different types of information. The most common type is descriptions written by people. Fernández: ‘We designed a system that also learns from eye-tracking data, so information about where people are looking when they describe the image. This reveals what they find important.’
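To give a flavour of how gaze information could be combined with a captioning model, here is a minimal sketch in Python (using PyTorch). It is not the group's actual system: the architecture, the dimensions, and the way fixation data is turned into region weights are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class GazeWeightedCaptioner(nn.Module):
    """Illustrative sketch: weight image region features by gaze saliency
    before decoding a caption. Hypothetical architecture, not the ILLC system."""

    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.GRU(hidden_dim + feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, region_feats, gaze_weights, captions):
        # region_feats: (batch, num_regions, feat_dim) visual features per image region
        # gaze_weights: (batch, num_regions) fixation-based importance, summing to 1
        # captions:     (batch, seq_len) token ids of the human description
        context = (gaze_weights.unsqueeze(-1) * region_feats).sum(dim=1)  # gaze-weighted image summary
        tokens = self.embed(captions)
        context = context.unsqueeze(1).expand(-1, tokens.size(1), -1)
        hidden, _ = self.decoder(torch.cat([tokens, context], dim=-1))
        return self.out(hidden)  # next-token logits

# Toy usage with random data
model = GazeWeightedCaptioner()
feats = torch.randn(2, 36, 512)                  # e.g. 36 detected regions per image
gaze = torch.softmax(torch.randn(2, 36), dim=1)  # normalised fixation durations
caps = torch.randint(0, 10000, (2, 12))
print(model(feats, gaze, caps).shape)            # torch.Size([2, 12, 10000])
```

The point of the sketch is simply that regions people look at longer contribute more to the representation the caption is generated from, which is one way eye-tracking data can tell a model what is worth saying.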
Gesturing is another type of visual information, and a very important one in communication. Fernández: ‘When we’re talking face-to-face, we use a lot of cues that go beyond what we're saying, such as moving our hands or nodding. Gestures come very naturally to us and are part of our communication. A virtual avatar that doesn't gesture, for example, would be very unnatural.’
Her research group has therefore developed a system that automatically detects gestures in videos of conversations. The researchers created this system together with cognitive scientists from the Max Planck Institute for Psycholinguistics in Nijmegen. The technology is a very useful tool for the scientists in Nijmegen, because they can now study gestures without having to annotate them by hand.
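As a rough illustration of the idea behind automatic gesture detection – and not the system built with the Max Planck Institute – the sketch below flags video segments in which a speaker's wrist, tracked by an off-the-shelf pose estimator, moves faster than a threshold. The coordinates, threshold and frame rate are hypothetical.

```python
import numpy as np

def detect_gesture_segments(wrist_xy, fps=25, speed_thresh=0.15, min_frames=5):
    """Flag frames where a speaker's wrist moves fast enough to suggest a gesture.

    wrist_xy: (num_frames, 2) array of normalised wrist coordinates per video frame,
              e.g. taken from an off-the-shelf pose estimator.
    Returns a list of (start_frame, end_frame) segments.
    """
    speed = np.linalg.norm(np.diff(wrist_xy, axis=0), axis=1) * fps  # movement per second
    active = speed > speed_thresh
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    if start is not None and len(active) - start >= min_frames:
        segments.append((start, len(active)))
    return segments

# Toy example: a still hand, then a sweep between frames 40 and 60
frames = np.zeros((100, 2))
frames[40:60, 0] = np.linspace(0.0, 0.5, 20)
print(detect_gesture_segments(frames))  # e.g. [(40, 60)]
```

A real detector would of course use learned models rather than a fixed speed threshold, but the sketch shows why automation helps: once gestures are located by software, researchers no longer have to scrub through footage frame by frame.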
Fernández: ‘I think this was a really nice collaboration, where we were creating something for fundamental research, but it was transferred to another field, cognitive sciences, where it has more practical use in their studies.’
To create these computational models, Fernández's research group uses machine learning as a core tool. Fernández explains: ‘We rely on data, for example human descriptions of images, and machine learning systems learn from this data. Before machine learning was so widely used, the approaches were more manual, so analyses were limited to smaller data sets.’
In recent years, there has been a huge improvement in computer-generated text. Fernández notes: ‘In natural language processing, we have seen an amazing improvement. Previously, creating a system that generates language in any natural way was very difficult. Now we have all these systems that generate very fluent language.’
Although systems like ChatGPT generate fluent text, the output isn't always correct or appropriate. The systems can also behave differently across languages. They are typically more proficient in English, which can create disadvantages for speakers of other languages.
Despite this, many people trust these technologies, with some even using ChatGPT as a search engine. However, it remains unclear how confident the model is in its answers or whether they are accurate. Fernández: ‘My research group is working on capturing the level of uncertainty of a model and figuring out the best way for the model to express that. It is very important to give that information to the user, so that the technology can be trusted.’
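One simple way to get at a model's confidence – shown here purely as an illustration, not as the group's method – is to look at the probability a language model assigns to the tokens of its own answer: consistently low probabilities are a signal that the answer should be hedged. The sketch below uses the public GPT-2 model from the Hugging Face transformers library.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def answer_confidence(text):
    """Rough confidence proxy: the average probability the model assigns
    to each token of the given answer. Low values suggest hedging is needed."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Probability assigned to each actual next token in the text
    probs = torch.softmax(logits[0, :-1], dim=-1)
    token_probs = probs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    return token_probs.mean().item()

# Illustrative comparison: a correct statement should tend to score higher
print(answer_confidence("The capital of France is Paris."))
print(answer_confidence("The capital of France is Madrid."))
```

In practice, token probabilities are only a crude proxy: a model can be fluent and confident while being wrong, which is exactly why communicating uncertainty to users is an open research problem.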
These language technologies are widely used, making it crucial to ensure they are trustworthy and robust – a goal Fernández's research group is committed to advancing. ‘I hope that our researchers can have impact by making these technologies more fair, trustworthy, and overall better.’