The Spotlight introduces a different Data Science Centre Affiliate Member every month. This month: Melvin Wevers, Assistant Professor in Digital History at the Faculty of Humanities.

Can you tell us more about your role and how you apply data science to your projects?

I work a lot with historical data sets. This brings to the surface all kinds of complications that you often don't have when you work with contemporary data or very “clean” data sets that are used to train machine learning algorithms.

For me, the application of data science comes from two directions. One, I use it as a historian to ask questions like: what is in the data? What is not in the data? Can we explain why something might not be in the historical data? It's basically this notion of bias, but using bias as a window into something that happened in the past. Two, I try to look at how historians in the past have done their work and think about whether I could verify or quantify their approach through computational methods.

Is there a project from this past year that you are most proud of?

One of my projects involves constructing a dataset based on 20th-century housing registry cards from the city of Amsterdam. These handwritten or typed cards exist for every single street and address in the city, and capture the people who lived there, how old they were when they moved in, when they moved out, and where they moved to. This allows us to trace how people moved through the city, so migration is one of the things you can map, and to bring clarity to unfolding debates on whether the Amsterdam city government pushed certain people into certain neighborhoods. It's a very promising approach, and something I'm quite proud of right now.
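(To make the structure concrete: below is a minimal, purely hypothetical sketch of how one card record might be modelled in Python and chained into a migration trajectory. The field names and sample data are illustrative assumptions, not the project's actual schema.)

```python
# Hypothetical model of a single housing registry card entry; the field
# names and sample data are illustrative, not the project's real schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RegistryEntry:
    person: str
    address: str              # address the card belongs to
    age_at_move_in: int
    moved_in: int             # year of arrival
    moved_out: Optional[int]  # year of departure, None if unknown
    moved_to: Optional[str]   # next address, None if unknown

entries = [
    RegistryEntry("J. de Vries", "Herengracht 10", 24, 1923, 1931, "Lindengracht 5"),
    RegistryEntry("J. de Vries", "Lindengracht 5", 32, 1931, 1940, None),
]

# Chain one person's entries, ordered by move-in year, into a trajectory
# that traces their moves through the city.
trajectory = [(e.moved_in, e.address) for e in sorted(entries, key=lambda e: e.moved_in)]
print(trajectory)  # [(1923, 'Herengracht 10'), (1931, 'Lindengracht 5')]
```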

What do you like most about being a DSC member?

I occasionally collaborate with people in the Science faculties, so the DSC was a great way to get to know people. We often forget that people in different disciplines work on similar problems, and I appreciate the work of the DSC in building bridges between disciplines.

What is your favourite data science method?

I like to use metrics that were developed in other disciplines like physics or medicine, or even something like time series analysis as used in geography, which one can then transpose to the historical domain.

So, on the one hand, you have these big, complicated neural networks that we can run on archives to do classification tasks, but on the other, these simple metrics have a lot of intricate science behind them. In time series analysis, there’s something called barycenter averaging that can cluster time series data effectively. There's a whole toolbox of these kinds of specific approaches to specific tasks that I quite like.
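(For the curious: the barycenter averaging mentioned here is presumably DTW Barycenter Averaging (DBA), which the Python library tslearn implements. A minimal sketch follows; the random-walk data and parameters are illustrative assumptions, not taken from Wevers's work.)

```python
# Minimal sketch: clustering time series with DTW k-means, whose cluster
# centres are DBA barycenters, using tslearn. Synthetic random walks stand
# in for historical series; all parameters here are illustrative.
from tslearn.generators import random_walks
from tslearn.clustering import TimeSeriesKMeans
from tslearn.barycenters import dtw_barycenter_averaging

X = random_walks(n_ts=50, sz=32, d=1, random_state=0)  # 50 series of length 32

# k-means under the DTW metric: each cluster centre is a DBA barycenter.
km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0)
labels = km.fit_predict(X)

# DBA can also be applied directly to summarise any group of series
# with a single "average" trajectory, e.g. one cluster:
barycenter = dtw_barycenter_averaging(X[labels == 0])
print(barycenter.shape)  # (32, 1): one averaged series of the same length
```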

You recently gave a talk called “Rethinking Digital Infrastructures: What a Computational Historian Needs”, where you identified some emerging issues and bottlenecks in the Digital Humanities. Could you tell us more about that?

There is remarkable work being done to create technical infrastructures that allow researchers to explore data sets with simple search queries. But once people want to do something more analytical with the data than, say, reading a historical source, these infrastructures often do not cater to them, or they have become so bloated that they prevent people from pursuing that approach.

In the Digital Humanities, there is this binary between “distant reading” and “close reading”: distant reading gives a broad overview that generates all kinds of descriptive statistics about the data, while close reading means actually going to the sources and reading them. But there is no connecting layer, and I think these research infrastructures can sometimes be a bottleneck, preventing people from finding mechanisms or applying methods that connect what we see at the distant reading level to the micro level, or vice versa.

Are you camp Python/R/or something else?

Team Python, but if I do statistics, then I turn to R. That's not because I like R (I hate it, and it's a horrible language), but there are just more resources for statistics.