4 October 2023
‘Video-AI, also referred to as computer vision, is a field of scientific inquiry that aims to develop techniques for computers to automate tasks that the human visual system can also do, and maybe even tasks that we are incapable of doing. These tasks include processing, analysing, and understanding sequences of digital images (videos)’, explains Marie Rosenkrantz Lindegaard.
Tobias Blanke adds: ‘Video-AI is a fast-growing research field that showcases how much AI has advanced. When I was a student of computer science 20 years ago, we mainly worked with text data. Letting computers "see" was definitely quite a distant dream. Now AI is getting better and better at the automated analysis of video data, which is key to, for example, self-driving cars or creating whole new films and animations. For cultural studies, video-AI is at the heart of work on the relation between human creativity and new types of computational creativity. How computers see and create helps us understand a lot about how we see and create ourselves.’
Tobias shares an example that illustrates the importance of ethics in video-AI: ‘Automated facial recognition from video surveillance, for instance, stands perhaps like no other technology for the ethical and political challenges of AI. Its history offers us many fascinating examples of the limits of techno-solutionism and of how we need new ways of co-creating and evaluating AI work, as we plan to research in HAVA-Lab. The 2017 Champions League final in Cardiff became infamous in ethical AI circles because the automated facial recognition deployed there produced over 90% false positives. While automated facial recognition has advanced since then, the problem of how we evaluate such performance has not changed. The police at the time were happy, as the technology led to hundreds of arrests, and they reassured the public that their officers were overseeing the AI’s outputs. But can we really be happy with thousands of wrong identifications as potential criminals in such cases, or do we need new ways of looking at this? These are the things we will research.’
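The Cardiff figure points at a general statistical effect: when the people a system searches for are rare in a large crowd, even a fairly accurate matcher will produce mostly false alerts. The back-of-the-envelope sketch below, written in Python with entirely invented numbers (they are not the Cardiff figures), shows how a system that wrongly flags only 1% of bystanders can still be wrong for the large majority of its alerts.

```python
# Base-rate illustration for watchlist-style face recognition.
# All numbers are hypothetical and invented for this example.

crowd_size = 170_000          # people scanned at the event
persons_of_interest = 500     # watchlisted people actually present
true_positive_rate = 0.90     # share of genuine matches the system flags
false_positive_rate = 0.01    # share of everyone else it wrongly flags

true_alerts = persons_of_interest * true_positive_rate
false_alerts = (crowd_size - persons_of_interest) * false_positive_rate
precision = true_alerts / (true_alerts + false_alerts)

print(f"true alerts:  {true_alerts:.0f}")    # 450
print(f"false alerts: {false_alerts:.0f}")   # 1695
print(f"share of alerts pointing at the wrong person: {1 - precision:.0%}")  # 79%
```

With these assumed numbers, roughly four out of five alerts point at the wrong person, which is why headline accuracy alone says little about whether such a deployment is acceptable.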
‘Another set of famous controversies around video-AI is related to drone technologies in military applications. Project Maven is a Pentagon AI project to, among other things, create computer-vision algorithms to help military analysts. But the project is less famous for that than for a letter that over 3,000 Google employees sent to their senior management, warning of the dangers of automated warfare and demanding that Google stay out of the business of war and step back from Project Maven. Google did indeed follow its employees and, at least for a while, took a step back.’
Marie: ‘The way I see it, computers can take over some tasks from humans, but humans are key for understanding what is going on. The problem is that most knowledge in the social sciences relies on what people tell us, and we know that this is terribly biased. For example, if you look at crime statistics, we know that they in fact tell us little about crime and the people committing crimes. They tell us what gets reported and who becomes a suspect and is prosecuted. I therefore started using video observations of real-life behaviour to find out how crime really happens. A pioneering study from the US using this methodology showed that, contrary to official police statistics suggesting that young men of colour are the most common shoplifters, it was actually white middle-aged women who stole the most. This was a small study. My hope is that if social scientists, who know about real-life behaviour, work together with computer scientists, we might be able to identify aspects of crime, such as stealing, that a computer can recognize, and then construct larger, representative samples of shoplifting for our analyses.’
Having all seven faculties involved makes this project truly unique. ‘Video-AI is a hard enough technological challenge in itself, but trying to address societal and application questions at the same time is simply extraordinary’, says Tobias.
Marie adds: ‘Cees Snoek [Principal Investigator of the HAVA-Lab] and I started working on questions of behavioural detection 5-10 years ago, because we are interested in the same thing: understanding human behaviour in videos. But we approached it entirely differently. The main challenge is language: across disciplines we use different languages, and even the same word can have a different meaning. Bridging these differences can be challenging. Our ambitions in figuring out what is going on in videos are different, but our goals, such as detecting subtle movements, are ultimately the same. We become enthusiastic about the same things, and that makes it fun. Maybe that is actually the most important thing when you work in complex teams.’
Tobias: ‘Controversies demonstrate that video-AI cannot be done without an interdisciplinary commitment. It starts with societal, ethical and cognitive perspectives on video-AI. How does it relate to fundamental values, and how does it incorporate privacy? But also: where are the human decisions in the production processes that make video-AI? The human-machine relationship is often difficult to disentangle in AI production. This is where traditional humanities and social science expertise comes in, and where some of the most exciting research in Humane AI is currently happening. How are training datasets constructed, and what might be missing from them? What decisions are based on which algorithms, and what are their limitations when brought into the real world? HAVA addresses a range of direct application domains for video-AI, from diagnostics in medical training to responsible marketing. These are again very much driven by the diverse needs of various disciplines.’
Marie gives an example: ‘Let’s say that computer vision scientists would like to develop an algorithm that can detect robberies. They need to know what a robbery looks like before they can develop the algorithm. Robberies do not happen as they are portrayed in movies; they are more complex than that. Selecting unbiased training material is key when you develop such tools, because once the tools are applied, they can carry societal biases that are immense and problematic. If we start detecting violence in public with AI but only pay attention to youth violence, we might forget that there are other types too, and that youth might actually not be the main problem for public safety. What kind of aggression gets detected, and what does not? Computer vision scientists need people from other faculties to figure out what they should be detecting, and to study the implications of what gets detected and what does not. This can only be done if we work together and keep discussing what kinds of tools we want to invest in and why.’
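As a deliberately simplified sketch of that point, the short Python example below tallies the label distribution of an invented training set for a violence detector. The categories and counts are made up purely for illustration, but they show how a dataset built mainly around youth violence leaves a model with almost nothing to learn about other forms of aggression.

```python
from collections import Counter

# Invented annotation labels for a hypothetical violence-detection
# training set; the categories and counts are illustrative only.
clip_labels = (
    ["youth street fight"] * 800
    + ["nightlife altercation"] * 150
    + ["domestic aggression"] * 40
    + ["workplace aggression"] * 10
)

counts = Counter(clip_labels)
total = len(clip_labels)
for label, n in counts.most_common():
    print(f"{label:22s} {n:4d} clips ({n / total:.0%} of training data)")

# A detector trained on this material mostly "sees" youth street fights;
# the kinds of aggression it was barely shown are the kinds it will miss.
```

Which categories end up in such a table is exactly the kind of decision that, as Marie argues, should not be left to computer vision scientists alone.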
The discussion of responsible digital transformations and the actual development of video-AI algorithms are still very much separate spheres. When asked what the HAVA-Lab will accomplish, Tobias answers: ‘As far as I can tell, there is very little research yet into the production processes of video-AI from the perspective of ethical orientations. The HAVA-Lab will hopefully contribute to changing the production processes of video-AI and make them more ethically aligned.’
Marie: ‘I hope we will figure out how humans and computers can work together better, to detect behaviour that we as a society find important and relevant to be concerned about. We should not allow the technological industry to determine this; we should claim a seat at the table instead of being scared and crying in the corner. Public surveillance and AI will not go away, and they should not, because we can use them to make societies better for everyone. We just need to figure out how. The HAVA-Lab is an attempt to figure that out together.’