12 June 2025
My research focuses on how knowledge can be meaningfully represented, integrated, and accessed across different modalities, languages, and structures. As more knowledge is encoded in unstructured and opaque formats, ranging from natural language text to LLMs, I study how we can restore semantic clarity, ensure consistency, and bridge representational gaps. LLMs play a central role in my work, both as a method and as a data source: I use them to extract structure from text, to interface with complex data such as knowledge graphs and statistical tables, and to support tasks such as classification, reasoning, and data alignment. This allows me to build systems that do not just retrieve information but integrate and make sense of it in a more transparent and robust way.
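To make the "extract structure from text" part concrete, here is a minimal sketch of the kind of extraction step I mean. It is an illustration rather than code from my projects: the model name, the prompt, and the triple schema are all placeholder assumptions.

```python
# Minimal sketch: turning free text into (subject, relation, object) triples
# with an LLM. Model name, prompt, and output schema are illustrative only.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_triples(text: str) -> list[dict]:
    """Ask the model to express the factual content of `text` as triples."""
    prompt = (
        "Extract factual (subject, relation, object) triples from the text "
        "below. Respond with a JSON object of the form "
        '{"triples": [{"subject": ..., "relation": ..., "object": ...}]}.\n\n'
        + text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    return json.loads(response.choices[0].message.content)["triples"]
```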
Since last year, I have been supervising a PhD student in collaboration with CBS (Statistics Netherlands), the Dutch national statistics office. CBS provides thousands of open datasets on topics such as population, economy, education, and healthcare. These datasets are valuable but hard to use: they are structured as large, complex tables that often require expert knowledge to interpret.
Our goal is to make this information more accessible through natural language interfaces. We are developing methods that allow people to ask questions in plain language and receive meaningful, data-driven answers. Answering such questions is typically the work of trained statisticians, so automating it involves both technical and linguistic challenges. The data is rich but highly specific, and understanding the meaning of the columns often requires external background knowledge. For me, this project is exciting because it combines technically challenging problems in semantic parsing and reasoning with a clear societal goal: improving public access to government data.
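As a rough illustration of what such an interface involves, the sketch below translates a question into SQL over a toy table. Everything here is a stand-in: the table, its invented numbers, and the model name are assumptions, and a real system needs far more care with schema semantics and query validation.

```python
# Minimal sketch of a question-to-query pipeline over a CBS-style table.
# The table, its numbers, and the model name are made-up placeholders.
import duckdb
import pandas as pd
from openai import OpenAI

client = OpenAI()

# Hypothetical stand-in for a statistical table (values are invented).
population = pd.DataFrame({
    "municipality": ["Amsterdam", "Utrecht", "Eindhoven"],
    "year": [2023, 2023, 2023],
    "inhabitants": [900_000, 360_000, 240_000],
})

def answer(question: str) -> pd.DataFrame:
    """Translate a plain-language question into SQL and run it on the table."""
    schema = "population(municipality TEXT, year INT, inhabitants INT)"
    prompt = (
        f"Table schema: {schema}\n"
        f"Write one DuckDB SQL query that answers: {question}\n"
        "Return only the SQL, with no explanation or code fences."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    sql = response.choices[0].message.content.strip()
    return duckdb.sql(sql).df()  # DuckDB resolves `population` by name

# answer("Which municipality had the most inhabitants in 2023?")
```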
I like seeing how others use data science methods to solve real problems in different fields. It helps me step outside the technical details of my own work and think more broadly. Earlier this year, I gave a workshop on extracting structured data from text, and I really appreciated the practical questions people brought in. These kinds of interdisciplinary exchanges keep me grounded and often lead to new directions in my own research.
One method I find especially powerful is using language models for classification. Many tasks in data integration, such as detecting errors or aligning schema elements, can be reframed as simple classification problems. This allows us to leverage the background knowledge embedded in these models in surprisingly effective ways. It has shifted how I think about knowledge representation and opened up new approaches to building systems that are both flexible and robust.
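As a concrete example of that reframing, here is a minimal sketch of schema alignment posed as a yes/no classification problem; the prompt wording and model name are assumptions, not a fixed recipe.

```python
# Minimal sketch: schema matching reframed as binary classification.
from openai import OpenAI

client = OpenAI()

def same_attribute(column_a: str, column_b: str) -> bool:
    """Classify whether two column descriptions denote the same attribute."""
    prompt = (
        f"Column A: {column_a}\n"
        f"Column B: {column_b}\n"
        "Do these columns describe the same real-world attribute? "
        "Answer with exactly 'yes' or 'no'."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

# Ideally: same_attribute("GeboorteJaar (birth year)", "year_of_birth") -> True
```

The appeal is that the model's background knowledge does the heavy lifting: a first baseline needs no training data and no hand-built mapping rules.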
Mostly Python. In applied data science and NLP, Python has become the standard, with a broad ecosystem of tools and libraries that make experimentation fast and reproducible. But I am not tied to any particular language. I often use whatever tool fits the task best, whether that is SPARQL for structured queries, SQL for data preparation, or a simple shell script for automation.