My Research Statement

As data volumes grow, the boundary between information and noise becomes harder to draw; my work addresses this by developing languages, systems, and formal foundations for making sense of unbounded information with correctness guarantees. Specifically, I work on continuous and declarative processing of rapidly evolving, theoretically unbounded data streams, combining system building with rigorous theoretical work.

I lead an independent research program supported by competitive funding. I am the principal investigator of the French national ANR JCJC project POLYFLOW, which develops a unifying framework for continuous information integration across heterogeneous streaming data models.

I also lead a smaller funded project on knowledge graph creation and usage, which studies knowledge evolution in volatile, multi-modal media and healthcare. In parallel, I collaborate with industry partners such as EsperTech, Neo4j, Confluent, and Bloomberg, both through applied research projects and by helping them clarify the formal underpinnings of their systems. These collaborations inform my research questions and ground my abstractions in real-world constraints.

Research Approach

My work sits between systems and formal foundations. My engineering background pushes me to prototype ideas quickly and use running systems as tools for thinking. A typical project proceeds in three stages. First, we build an artifact, such as a prototype engine, a library, a dataset, or a benchmark, to test and refine intuition. Second, we observe its behavior and use these observations to sharpen the problem formulation. Third, we capture the refined problem in a precise mathematical model and establish formal properties such as soundness, completeness, or decidability. The artifacts produced along the way (software, datasets, and knowledge graphs) are research contributions in their own right and provide a concrete entry point for students and collaborators.

I maintain a consistent commitment to open artifacts. Beyond software libraries, I share datasets, ontologies, and knowledge graphs, in line with the original vision of the semantic web. These artifacts make my work reproducible, reusable, and extensible by others, and they serve as a bridge between theoretical results and applied systems.

My research reflects this philosophy: I have developed systems (Tommasini et al., 2017; Tommasini et al., 2021; Tommasini et al., 2016) as well as data catalogs and knowledge graphs (Tommasini et al., 2023; Tommasini et al., 2018; Tommasini et al., 2020). I participate in the development of demonstrations and tutorials, and I encourage my students to do the same, since both are effective ways to consolidate work and disseminate results.

Stream Reasoning and Semantic Web

During my PhD, I joined the stream reasoning community in the Semantic Web context and worked on methods and tools for Streaming Linked Data. This area tackles the problem of publishing, discovering, and processing streaming data on the web using semantic technologies. Three milestones in this line are:

A research line on big data benchmarking, developed further with my first PhD student, Mohamed Ragab, at the University of Tartu. This work provides systematic ways to assess and compare streaming semantic web systems under realistic workloads.

A Springer monograph on stream reasoning, co-authored with my PhD supervisor Emanuele Della Valle and long-term collaborator Pieter Bonte, which consolidates the state of the art and articulates open problems.

RSP4J, a Java library for rapid prototyping of Streaming Linked Data systems that aggregates ideas and abstractions from stream reasoning research. It received the Best Resource Paper award at ESWC 2021 and evolved in the PolyFlow project as a tool for multi-modal stream processing.

Streaming Property Graphs and Knowledge Evolution

After completing my PhD, my focus shifted towards the core of data management. A pivotal moment was my participation in the 2019 Dagstuhl seminar on Big Graph Systems, which triggered a transition away from RDF-centric streaming to property graphs.

In collaboration with Neo4j and Bloomberg, I contributed to designing an extension of Cypher for continuous querying over property graphs, bridging the gap between graph query languages and stream processing. This work addresses the expression and evaluation of continuous queries over evolving graph structures, with applications in monitoring, fraud detection, and real-time analytics.

I have continued to contribute to the semantic web community by constructing and maintaining knowledge graphs for complex, volatile information. The Internet Meme Knowledge Graph (IMKG) (Tommasini et al., 2023) models internet memes as evolving entities with rich semantics. This work is part of a broader inquiry into knowledge evolution (Polleres et al., 2023): how meaning changes over time and how to represent such changes for querying.

Event Processing, Streaming Systems, and Language Semantics

My interest in systems has led me to study the internals of stream processing engines from a distributed systems perspective and to investigate the interaction between system guarantees and the semantics of continuous query languages. This has two main strands.

First, I analyse how assumptions about time, ordering, and delivery in distributed systems shape the behaviour of event processing engines. Second, I work on formalising the semantics of continuous query languages, drawing on tools from logic and programming languages. Recent work within the PolyFlow project reifies high-level language abstractions in the RSP4J library (Tommasini et al., 2021), enabling us to experiment with alternative semantics and evaluate their impact on system design and performance. These efforts are complemented by tutorials and demonstrations, where I present principled views on stream processing to both academic and practitioner audiences.

Current and Future Research Programme

In the short term, my focus is the POLYFLOW project (terminating in 2027), which studies continuous information integration for complex data models such as graphs, nested objects, and unstructured data, under streaming conditions. The goal is to identify the fundamental building blocks of continuous query languages that operate over heterogeneous data, and to develop a theory that supports their evaluation under continuous semantics. Although theoretical in nature, this work is tightly linked to system design; the aim is not only to understand what is possible, but also to provide guidelines that implementers can apply when building next-generation stream processors.

Looking further ahead, I plan to deepen my work on knowledge evolution, taking complex media such as internet memes as a challenging form of knowledge. This line is inherently interdisciplinary, touching linguistics, psychology, and creativity studies. I lead a small funded project that aims to maintain the IMKG and extend it to capture meme semantics using LLMs and semantic frames. The “fruit fly” nature of memes, i.e., their rapid mutation and short lifespan, makes them an ideal probe for modeling the evolution of meaning under social and temporal pressure. The broader objective is to derive general principles for representing and querying evolving data in volatile domains.

A third thread concerns correctness guarantees for continuous computations, with a particular focus on stream segmentation. Streaming systems rely on window operators to cut infinite streams into finite segments. In practice, users must choose window parameters via trial and error, with a limited understanding of the correctness implications.
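To make the role of window parameters concrete, here is a minimal Python sketch of time-based segmentation. It is an illustration only and is not taken from any of the systems mentioned in this statement: the event shape, the function names `window_starts` and `segment`, and the restriction to windows starting at non-negative multiples of the slide are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp, payload).
Event = Tuple[int, str]


def window_starts(timestamp: int, width: int, slide: int) -> List[int]:
    """Start times of every window [start, start + width) that contains timestamp.

    Assumption: windows start at non-negative multiples of `slide`.
    """
    starts = []
    # Latest slide-aligned start that is not after the timestamp.
    start = (timestamp // slide) * slide
    # A window contains the timestamp iff start > timestamp - width.
    while start > timestamp - width and start >= 0:
        starts.append(start)
        start -= slide
    return sorted(starts)


def segment(events: Iterable[Event], width: int, slide: int) -> Dict[int, List[str]]:
    """Assign each event to all windows that contain it, keyed by window start."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for timestamp, payload in events:
        for start in window_starts(timestamp, width, slide):
            windows[start].append(payload)
    return dict(windows)


events = [(1, "a"), (7, "b"), (12, "c")]
sliding = segment(events, width=10, slide=5)   # overlapping windows
tumbling = segment(events, width=5, slide=5)   # disjoint windows
```

With the same three events, the sliding configuration places "a" and "b" in a common window, while the tumbling configuration never groups any two events together. Any query that depends on co-occurrence within a window therefore gives different answers depending solely on the chosen parameters, which is the kind of correctness implication discussed above.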

I plan to investigate a theory of segmentation that reduces the burden on users by deriving or adapting windowing strategies from high-level requirements and data properties. This idea builds on recent theoretical work showing that the “window validity” problem for forward-looking Datalog becomes decidable under mild assumptions on input data. My goal is to design an adaptive streaming system whose segmentation behavior is not hard-coded by the user but adjusted automatically, while still providing correctness guarantees.

References
