My Research Statement

As data volumes grow, the boundary between information and noise becomes harder to draw; my work addresses this by developing languages, systems, and formal foundations for making sense of unbounded information with correctness guarantees. Specifically, I work on continuous and declarative processing of rapidly evolving, theoretically unbounded data streams, combining system building with rigorous theoretical work.

I lead an independent research program supported by competitive funding. I am the principal investigator of the French national ANR JCJC project POLYFLOW, which develops a unifying framework for continuous information integration across heterogeneous streaming data models.

I also lead a smaller funded project on knowledge graph creation and usage, which studies knowledge evolution in volatile, multi-modal media and healthcare. In parallel, I collaborate with industry partners such as EsperTech, Neo4j, Confluent, and Bloomberg, both through applied research projects and by helping them clarify the formal underpinnings of their systems. These collaborations inform my research questions and ground my abstractions in real-world constraints.

Research Approach

My work sits between systems and formal foundations. My engineering background pushes me to prototype ideas quickly and use running systems as tools for thinking. A typical project proceeds in three stages. First, we build an artifact (a prototype engine or library, but also a dataset or a benchmark) to test and refine intuition. Second, we observe its behavior and use these observations to sharpen the problem formulation. Third, we capture the refined problem in a precise mathematical model and develop formal properties such as soundness, completeness, or decidability. The artifacts produced along the way (software, datasets, and knowledge graphs) are research contributions in their own right and provide a concrete entry point for students and collaborators.

I maintain a consistent commitment to open artifacts. Beyond software libraries, I share datasets, ontologies, and knowledge graphs, in line with the original vision of the semantic web. These artifacts make my work reproducible, reusable, and extensible by others, and they serve as a bridge between theoretical results and applied systems.

My research reflects this philosophy: I have developed systems (Tommasini et al., 2017; Tommasini et al., 2021; Tommasini et al., 2016) as well as data catalogs and knowledge graphs (Tommasini et al., 2023; Tommasini et al., 2018; Tommasini et al., 2020). I participate in the development of demonstrations and tutorials, and I encourage my students to do the same, since both are effective ways to consolidate work and disseminate results.

Stream Reasoning and Semantic Web

During my PhD, I joined the stream reasoning community in the Semantic Web context and worked on methods and tools for Streaming Linked Data. This area tackles the problem of publishing, discovering, and processing streaming data on the web using semantic technologies. Three milestones in this line are:

A research line on big data benchmarking, developed further with my first PhD student, Mohamed Ragab, at the University of Tartu. This work provides systematic ways to assess and compare streaming semantic web systems under realistic workloads.

A Springer monograph on stream reasoning, co-authored with my PhD supervisor Emanuele Della Valle and long-term collaborator Pieter Bonte, which consolidates the state of the art and articulates open problems.

RSP4J, a Java library for rapid prototyping of Streaming Linked Data systems that aggregates ideas and abstractions from stream reasoning research. It received the Best Resource Paper award at ESWC 2021 and evolved in the POLYFLOW project as a tool for multi-modal stream processing.

Streaming Property Graphs and Knowledge Evolution

After completing my PhD, my focus shifted towards the core of data management. A pivotal moment was my participation in the 2019 Dagstuhl seminar on Big Graph Systems, which triggered a transition away from RDF-centric streaming to property graphs.

In collaboration with Neo4j and Bloomberg, I contributed to designing an extension of Cypher for continuous querying over property graphs, bridging the gap between graph query languages and stream processing. This work addresses the expression and evaluation of continuous queries over evolving graph structures, with applications in monitoring, fraud detection, and real-time analytics.

I have continued to contribute to the semantic web community by constructing and maintaining knowledge graphs for complex, volatile information. The Internet Meme Knowledge Graph (IMKG) (Tommasini et al., 2023) models internet memes as evolving entities with rich semantics. This work is part of a broader inquiry into knowledge evolution (Polleres et al., 2023): how meaning changes over time and how to represent such changes for querying.

Event Processing, Streaming Systems, and Language Semantics

My interest in systems has led me to study the internals of stream processing engines from a distributed systems perspective and to investigate the interaction between system guarantees and the semantics of continuous query languages. This has two main strands.

First, I analyse how assumptions about time, ordering, and delivery in distributed systems shape the behaviour of event processing engines. Second, I work on formalising the semantics of continuous query languages, drawing on tools from logic and programming languages. Recent work within the POLYFLOW project reifies high-level language abstractions in the RSP4J library (Tommasini et al., 2021), enabling us to experiment with alternative semantics and evaluate their impact on system design and performance. These efforts are complemented by tutorials and demonstrations, where I present principled views on stream processing to both academic and practitioner audiences.

Current and Future Research Programme

In the short term, my focus is the POLYFLOW project (terminating in 2027), which studies continuous information integration for complex data models such as graphs, nested objects, and unstructured data, under streaming conditions. The goal is to identify the fundamental building blocks of continuous query languages that operate over heterogeneous data, and to develop a theory that supports their evaluation under continuous semantics. Although theoretical in nature, this work is tightly linked to system design; the aim is not only to understand what is possible, but also to provide guidelines that implementers can apply when building next-generation stream processors.

Looking further ahead, I plan to deepen my work on knowledge evolution, taking complex media such as internet memes as a challenging form of knowledge. This line is inherently interdisciplinary, touching linguistics, psychology, and creativity studies. I lead a small, funded project that aims to maintain the IMKG and extend it to capture meme semantics using LLMs and semantic frames. The “fruit fly” nature of memes, i.e., their rapid mutation and short lifespan, makes them an ideal probe for modeling the evolution of meaning under social and temporal pressure. The broader objective is to derive general principles for representing and querying evolving data in volatile domains.

A third thread concerns correctness guarantees for continuous computations, with a particular focus on stream segmentation. Streaming systems rely on window operators to cut infinite streams into finite segments. In practice, users must choose window parameters via trial and error, with a limited understanding of the correctness implications.
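To make the sensitivity to window parameters concrete, here is a minimal sketch of a tumbling (fixed-width, non-overlapping) window in Python. The event shape, the function names `tumbling_windows` and `detects_pair`, and the sample stream are all illustrative assumptions, not part of any system mentioned in this statement.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp, payload).
Event = Tuple[int, str]


def tumbling_windows(events: Iterable[Event], width: int) -> Dict[int, List[str]]:
    """Group events into consecutive, non-overlapping windows of `width` time units.

    Each window is identified by its start time.
    """
    windows: Dict[int, List[str]] = defaultdict(list)
    for timestamp, payload in events:
        window_start = (timestamp // width) * width
        windows[window_start].append(payload)
    return dict(windows)


def detects_pair(windows: Dict[int, List[str]], first: str, second: str) -> bool:
    """Return True if some single window contains both payloads."""
    return any(first in w and second in w for w in windows.values())


# Illustrative stream; timestamps are arbitrary time units.
stream = [(1, "login"), (4, "purchase"), (6, "refund"), (11, "logout")]

narrow = tumbling_windows(stream, width=5)
wide = tumbling_windows(stream, width=10)

# The same query gives different answers depending on the window width.
print(detects_pair(narrow, "purchase", "refund"))
print(detects_pair(wide, "purchase", "refund"))
```

With a width of 5, the purchase and the refund fall into different segments and the pair goes undetected; with a width of 10, they share a segment. The query itself is unchanged, so the answer depends entirely on a parameter the user picked by hand.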

I plan to investigate a theory of segmentation that reduces the burden on users by deriving or adapting windowing strategies from high-level requirements and data properties. This idea builds on recent theoretical work showing that the “window validity” problem for forward-looking Datalog becomes decidable under mild assumptions on input data. My goal is to design an adaptive streaming system whose segmentation behavior is not hard-coded by the user but adjusted automatically, while providing correctness guarantees.

References

Conference Articles

  1. RSPLab: RDF Stream Processing Benchmarking Made Easy
    Riccardo Tommasini, Emanuele Della Valle, Andrea Mauri, and 1 more author
    In The Semantic Web - ISWC 2017 - 16th International Semantic Web Conference, Vienna, Austria, October 21-25, 2017, Proceedings, Part II, 2017
  2. RSP4J: An API for RDF Stream Processing
    Riccardo Tommasini, Pieter Bonte, Femke Ongenae, and 1 more author
    In The Semantic Web - 18th International Conference, ESWC 2021, Virtual Event, June 6-10, 2021, Proceedings, 2021
  3. Heaven: A Framework for Systematic Comparative Research Approach for RSP Engines
    Riccardo Tommasini, Emanuele Della Valle, Marco Balduini, and 1 more author
    In The Semantic Web. Latest Advances and New Domains - 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Proceedings, 2016
  4. IMKG: The Internet Meme Knowledge Graph
    Riccardo Tommasini, Filip Ilievski, and Thilini Wijesiriwardene
    In The Semantic Web - 20th International Conference, ESWC 2023, Hersonissos, Crete, Greece, May 28 - June 1, 2023, Proceedings, 2023
  5. VoCaLS: Vocabulary and Catalog of Linked Streams
    Riccardo Tommasini, Yehia Abo Sedira, Daniele Dell’Aglio, and 5 more authors
    In The Semantic Web - ISWC 2018 - 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part II, 2018
  6. A First Step Towards a Streaming Linked Data Life-Cycle
    Riccardo Tommasini, Mohamed Ragab, Alessandro Falcetta, and 2 more authors
    In The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference, Athens, Greece, November 2-6, 2020, Proceedings, Part II, 2020

Journal Articles

  1. How Does Knowledge Evolve in Open Knowledge Graphs?
    Axel Polleres, Romana Pernisch, Angela Bonifati, and 11 more authors
    TGDK, 2023


