<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://riccardotommasini.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://riccardotommasini.com/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-01-13T08:26:54+00:00</updated><id>https://riccardotommasini.com/feed.xml</id><title type="html">blank</title><subtitle>I am an Associate Professor (Maître de conférences) at INSA Lyon, France, and a member of the BD Team, LIRIS Lab. I have a PhD in Computer Science from Politecnico di Milano, Italy. My research interests include Stream Processing, Semantic Web, Stream Reasoning, Linked Data, and Graph Database Systems
</subtitle><entry><title type="html">Docker 101</title><link href="https://riccardotommasini.com/blog/2026/docker-101/" rel="alternate" type="text/html" title="Docker 101" /><published>2026-01-02T00:00:00+00:00</published><updated>2026-01-02T00:00:00+00:00</updated><id>https://riccardotommasini.com/blog/2026/docker-101</id><content type="html" xml:base="https://riccardotommasini.com/blog/2026/docker-101/"><![CDATA[<p>!! This post is a work in progress !!</p>

<p>In my <a href="/courses/">courses</a>, I use <a href="https://www.docker.com">Docker</a> a lot. Usually, I rely on the extremely well-done material by Jerome Petazzoni at <a href="https://container.training">container.training</a>. However, I decided that it might be useful to outline here the class I usually deliver to my students, so they can follow it in a self-paced manner.</p>

<h2 id="what-is-docker">What Is Docker</h2>

<p>Docker is a command-line interface plus a set of background processes that abstract and simplify the usage of Linux containers. Its goal is to automate and standardize software delivery and deployment. Initially, Docker leveraged LXC containers, but later on the runtime was replaced by <em>containerd</em>.</p>

<h3 id="why-is-docker-not-a-lightweight-virtual-machine">Why is Docker NOT a “lightweight virtual machine”?</h3>

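<p>Generally speaking, containers are a virtualisation technique. However, a container is far from being a virtual machine. The missing piece is… the hypervisor: containers share the kernel of the host instead of each running a guest operating system on virtualised hardware.</p>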

<p><img src="https://raw.githubusercontent.com/collabnix/dockerlabs/master/beginners/images/difference-vm-containers.png" alt="Image description" width="600px" /></p>

<h3 id="why-do-we-care">Why do we care?</h3>

<p>This means that Docker is architecture-specific. Running dockerized software on an Intel (x86-64) machine is different from running it on an ARM one: an image built for one architecture does not run natively on the other.</p>

<h3 id="the-next-steps">The next Steps</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Launch our first container</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Creating our first image</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Dockerfiles</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Docker Networking (Very Important for Data Engineering Classes)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Volumes</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Docker Compose</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Practice Exercise</li>
</ul>

<!-- ## Launch our first container

## Creating our first image

## Dockerfiles

## Docker Networking (Very Important for Data Engineering Classes)

## Volumes

## Docker Compose

Docker Compose is an automation tool. It can, under some circumstances, act as a lightweight orchestrator. However, AFAIK, it lacks the advanced scheduling that an orchestrator such as Kubernetes provides.

## Practice Exercise -->]]></content><author><name></name></author><category term="lecture" /><category term="teaching" /><category term="courses" /><category term="master classes" /><category term="docker" /><category term="lectures" /><summary type="html"><![CDATA[Everything You Need to Know About Docker for My Courses]]></summary></entry><entry><title type="html">Public Speaking</title><link href="https://riccardotommasini.com/blog/2026/public-speaking/" rel="alternate" type="text/html" title="Public Speaking" /><published>2026-01-02T00:00:00+00:00</published><updated>2026-01-02T00:00:00+00:00</updated><id>https://riccardotommasini.com/blog/2026/public-speaking</id><content type="html" xml:base="https://riccardotommasini.com/blog/2026/public-speaking/"><![CDATA[<ul>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#why-me">Why Me?</a></li>
  <li><a href="#my-five-rules-for-public-speaking">My Five Rules for Public Speaking</a>
    <ul>
      <li><a href="#the-medium-is-the-message">The Medium is the Message</a>
        <ul>
          <li><a href="#what-is-the-difference">What is the difference?</a></li>
          <li><a href="#how-can-we-use-this-information">How can we use this information?</a></li>
        </ul>
      </li>
      <li><a href="#the-role-of-narrative">The role of Narrative</a></li>
      <li><a href="#the-need-for-a-method">The need for a Method</a></li>
      <li><a href="#you-need-a-jargon">You need a Jargon</a>
        <ul>
          <li><a href="#terminology">Terminology</a></li>
          <li><a href="#on-synonyms-truth-is-that-they-do-not-exist">On Synonyms: truth is that they do not exist</a></li>
          <li><a href="#syntax--semantics-reads-syntax-implies-semantics">Syntax =&gt; Semantics (reads, syntax implies semantics)</a></li>
          <li><a href="#formatting-important-vs-non-important">Formatting: Important vs Non-Important</a>
            <ul>
              <li><a href="#on-highlighting">On Highlighting</a>
                <ul>
                  <li><a href="#bad">Bad</a></li>
                  <li><a href="#good">Good</a></li>
                </ul>
              </li>
              <li><a href="#fonts-why-they-matter">Fonts: why they matter</a></li>
            </ul>
          </li>
        </ul>
      </li>
      <li><a href="#have-fun">Have Fun!</a></li>
    </ul>
  </li>
</ul>

<p>Since 2021, I have been delivering a yearly class at Lyon1 University on public speaking. The summary of the class is available below.</p>

<p>In this document, I will try to condense the essence of the lecture, which, in turn, is based on my experience as a speaker. To give it structure, I will present <em>my five rules for public speaking</em>; but do not worry, you will soon see that I like to break my own rules.</p>

<p>Note: the class includes an exercise based on <a href="https://sefirot.it/fr/cicero-deck">Cicero</a>, a card deck that helps organize your thoughts for presenting. I am neither affiliated with nor paid by Sefirot, so this is a genuine recommendation. Given that the tool is copyright protected, I will not include here the material we use in class, only the abstract concepts.</p>

<h2 id="introduction">Introduction</h2>

<p>Why is public speaking important in academia? The answer lies in the <em>modern</em> research cycle. For simplicity, I distinguish three phases: Vision, Execution, and Dissemination.</p>

<p><img src="/assets/img/cycle.png" alt="cycle" class="align-center" width="400px" /></p>

<ul>
  <li>
    <p><strong>Vision</strong> is about projecting your research ideas into the future. Try to guess how the scientific world will unfold and what ideas will succeed. Developing a vision is a maturity step in a researcher’s career. Many people have spoken about it, so I will just refer you to their words. I believe we should try to have a vision as soon as we feel confident about it. However, this is a step that typically occurs later in a researcher’s career path.</p>
  </li>
  <li>
    <p><strong>Execution</strong> is the technical part of research. For computer scientists, this includes coding, experiment design, statistical analysis, but also technical writing. Execution is, in my experience, where most of us start. Indeed, we borrow our initial ideas from our supervisors, who guide us in transforming them into an action plan. Students with a good technical background excel at execution, given their deep understanding of technologies and the principles behind them.</p>
  </li>
  <li>
    <p><strong>Dissemination</strong> is about telling others what you did. In my experience, this is the first obstacle in the work of a student with a strong technical background, mostly because they face two challenges: (1) they are not used to speaking about something to someone who has no idea what they are talking about, and (2) they are not used to leaving out unnecessary details and distilling the essence of their work. In my career, I encountered several types of dissemination activities, but they can all be clustered into three main groups:</p>

    <ul>
      <li>Papers: long, curated, technical essays about the work, that aim at easing scientific validation</li>
      <li>Presentations: medium-long lectures (30 min or more) about a topic, that aim at one or more pedagogical objectives</li>
      <li>Pitches: short encounters with or without visual support that aim at being persuasive.</li>
    </ul>
  </li>
</ul>

<p>This lecture is about Presentations, but some of its lessons carry over to the other two.</p>

<h2 id="why-me">Why Me?</h2>

<p>I love speaking in public! I have sometimes defined the stage as “my safe place”. I consider myself a decent speaker, at least based on colleagues’ feedback.</p>

<p>You may ask, how did you become a decent speaker?</p>

<p>The truth is, I gave my first presentation at ESWC 2016, and… it was terrible. But practice makes perfect. Since then, thanks to my job, I have given a lot of talks in many places. Besides, debating and discussing are part of a researcher’s work.</p>

<h2 id="my-five-rules-for-public-speaking">My Five Rules for Public Speaking</h2>

<h3 id="the-medium-is-the-message">The Medium is the Message</h3>

<p>Clearly, this is not my invention. This punchline comes from Marshall McLuhan’s seminal book “Understanding Media”. The book focuses on media, not the content that they carry, as the subject of study. McLuhan suggests that each medium affects the audience as much as the content. But what is McLuhan really talking about? Let’s proceed scientifically, and jot down some definitions.</p>

<blockquote>
  <p>A <strong>message</strong> is a discrete unit of communication intended by the source for consumption by some recipient or group of recipients.</p>
</blockquote>

<p>and</p>

<blockquote>
  <p>A <strong>Medium</strong> is an outlet that a sender uses to express meaning to their audience, and it can include written, verbal, or nonverbal elements.</p>
</blockquote>

<p>McLuhan highlights that a given unit of communication is biased by the outlet we use to share it. Let’s dig into it with an example.</p>

<p>Let’s say you want to break up with your partner. You choose these exact words, “I need to move on because I do not love you anymore”. You can share these words using any of the following mediums:</p>

<ul>
  <li>You write a letter, and you slip it under their door.</li>
  <li>You meet them, and you talk to them.</li>
  <li>You call them, and you talk to them.</li>
  <li>You send them a voice message.</li>
  <li>You send them a written message.</li>
</ul>

<h4 id="what-is-the-difference">What is the difference?</h4>

<p>It is important not to fall into communication nihilism and think “if communication is always <em>biased</em>, why do we even bother?” Indeed, your audience needs you. Your job is not to aim for unbiased communication but to share your take on the topic while conscientiously avoiding logical fallacies.</p>

<p>McLuhan, again, distinguishes between <strong>hot media</strong>, such as print, photographs, radio, and movies, and <strong>cool media</strong>, such as speech, cartoons, the telephone, and television.</p>

<blockquote>
  <p>Hot media are ‘<strong>high definition</strong>’ because they are rich in sensory data. Cool media are ‘<strong>low definition</strong>’ because they provide less sensory data and consequently demand more participation or ‘completion’ by the audience (a useful mnemonic is to imagine that hot media are too hot to touch). <a href="https://www.oxfordreference.com/display/10.1093/oi/authority.20110810105107935">Reference</a>.</p>
</blockquote>

<p><a href="https://en.wikipedia.org/wiki/Understanding_Media">A clarification from Wikipedia</a>:
Film, for example, is defined as a hot medium, since in the context of a dark movie theater, the viewer is completely captivated, and one primary sense—visual—is filled in high definition. In contrast, television is a cool medium, since many other things may be going on and the viewer has to integrate all of the sounds and sights in the context.</p>

<p>Critics of McLuhan’s idea contend that the level of audience involvement is not primarily determined by the medium alone, although its capabilities may have some influence. Instead, they argue that audience engagement depends more on the content being presented and how the medium is utilized in particular situations and contexts.</p>

<h4 id="how-can-we-use-this-information">How can we use this information?</h4>

<p>Speech is considered (by McLuhan) a cool medium because it involves multiple senses in low definition and, thus, demands audience participation.</p>

<ul>
  <li>The role of the presenter is “Warming it Up”;</li>
  <li>Predicting the questions;</li>
  <li>Echoing each main point multiple times.</li>
</ul>

<h3 id="the-role-of-narrative">The role of Narrative</h3>

<h3 id="the-need-for-a-method">The need for a Method</h3>

<h3 id="you-need-a-jargon">You need a Jargon</h3>

<p>As paradoxical as it sounds, it is up to you to define the language of your talk. This is not a completely free choice; it depends heavily on <em>who</em> your audience is.
Nevertheless, there are some fundamental rules that we can recall in order to set up our language.</p>

<h4 id="terminology">Terminology</h4>

<h4 id="on-synonyms-truth-is-that-they-do-not-exist">On Synonyms: truth is that they do not exist</h4>

<blockquote>
  <p>“The dictionary is based on the hypothesis — obviously an unproven one — that languages are made up of equivalent synonyms.” 
– Jorge Luis Borges</p>
</blockquote>

<h4 id="syntax--semantics-reads-syntax-implies-semantics">Syntax =&gt; Semantics (reads, syntax implies semantics)</h4>

<h4 id="formatting-important-vs-non-important">Formatting: Important vs Non-Important</h4>

<blockquote>
  <h5 id="titles">Titles</h5>
</blockquote>

<blockquote>
  <p class="block-tip"><img src="/assets/img/titles.png" alt="titles" /></p>
</blockquote>

<h5 id="on-highlighting">On Highlighting</h5>

<h6 id="bad">Bad</h6>

<p>The structure of <u><em>scientific</em></u> inquiry often relies on a system of principles and rules known as a <strong>scaffold</strong>. This <strong>framework</strong> serves as a <em>blueprint</em>, guiding researchers towards the acquisition of <strong>knowledge</strong> in a <u>systematic</u> and <u>organized</u> manner.</p>

<h6 id="good">Good</h6>

<p>The process of <strong>scientific inquiry</strong> is often based on a set of principles and guidelines known as a framework. This framework provides a plan for researchers to gain <strong>knowledge</strong> in a <strong>systematic</strong> and <strong>organized</strong> way.</p>

<h5 id="fonts-why-they-matter">Fonts: why they matter</h5>

<p><img src="/assets/img/fonts.png" alt="fonts" /></p>

<h3 id="have-fun">Have Fun!</h3>

<p>Nobody wants to listen to someone who does not want to talk in the first place.</p>]]></content><author><name></name></author><category term="lecture" /><category term="master classes" /><category term="talks" /><category term="lectures" /><category term="notes" /><summary type="html"><![CDATA[An Introduction to Public Speaking]]></summary></entry><entry><title type="html">My Research Statement</title><link href="https://riccardotommasini.com/blog/2025/research-statement/" rel="alternate" type="text/html" title="My Research Statement" /><published>2025-12-17T17:39:00+00:00</published><updated>2025-12-17T17:39:00+00:00</updated><id>https://riccardotommasini.com/blog/2025/research-statement</id><content type="html" xml:base="https://riccardotommasini.com/blog/2025/research-statement/"><![CDATA[<p>As data volumes grow, the boundary between information and noise becomes harder to draw; my work addresses this by developing languages, systems, and formal foundations for making sense of unbounded information with correctness guarantees. Specifically, I work on continuous and declarative processing of rapidly evolving, theoretically unbounded data streams, combining system building with rigorous theoretical work.</p>

<p>I lead an independent research program supported by competitive funding. I am the principal investigator of the French national <a href="https://anr.fr/Project-ANR-22-CE23-0001">ANR JCJC project POLYFLOW</a>, which develops a unifying framework for continuous information integration across heterogeneous streaming data models.</p>

<p>I also lead a smaller funded project on knowledge graph creation and usage, which studies knowledge evolution in volatile, multi-modal media and healthcare. In parallel, I collaborate with industry partners such as EsperTech, Neo4j, Confluent, and Bloomberg, both through applied research projects and by helping them clarify the formal underpinnings of their systems. These collaborations inform my research questions and ground my abstractions in real-world constraints.</p>

<h2 id="research-approach">Research Approach</h2>

<p>My work sits between systems and formal foundations. My engineering background pushes me to prototype ideas quickly and use running systems as tools for thinking. A typical project proceeds in three stages. First, we build an artifact, e.g., a prototype engine or library, but also a dataset or a benchmark, to test and refine intuition. Second, we observe its behavior and use these observations to sharpen the problem formulation. Third, we capture the refined problem in a precise mathematical model and develop formal properties such as soundness, completeness, or decidability. The artifacts produced along the way (software, datasets, and knowledge graphs) are research contributions in their own right and provide a concrete entry point for students and collaborators.</p>

<p>I maintain a consistent commitment to open artifacts. Beyond software libraries, I share datasets, ontologies, and knowledge graphs, in line with the original vision of the semantic web. These artifacts make my work reproducible, reusable, and extensible by others, and they serve as a bridge between theoretical results and applied systems.</p>

<p>My research reflects this philosophy: I developed systems <a class="citation" href="#DBLP:conf/semweb/TommasiniVMB17">(Tommasini et al., 2017; Tommasini et al., 2021; Tommasini et al., 2016)</a>, and data catalogs and knowledge graphs <a class="citation" href="#DBLP:conf/esws/TommasiniIW23">(Tommasini et al., 2023; Tommasini et al., 2018; Tommasini et al., 2020)</a>. I participate in the development of demonstrations and tutorials, and I encourage my students to do the same. Indeed, they are great tools to consolidate work and disseminate results.</p>

<h2 id="stream-reasoning-and-semantic-web">Stream Reasoning and Semantic Web</h2>

<p>During my PhD, I joined the stream reasoning community in the Semantic Web context and worked on methods and tools for Streaming Linked Data. This area tackles the problem of publishing, discovering, and processing streaming data on the web using semantic technologies. Three milestones in this line are:</p>

<ul>
  <li>
    <p>A research line on big data benchmarking, developed further with my first PhD student, Mohamed Ragab, at the University of Tartu. This work provides systematic ways to assess and compare streaming semantic web systems under realistic workloads.</p>
  </li>
  <li>
    <p>A Springer monograph on stream reasoning, co-authored with my PhD supervisor Emanuele Della Valle and long-term collaborator Pieter Bonte, which consolidates the state of the art and articulates open problems.</p>
  </li>
  <li>
    <p>RSP4J, a Java library for rapid prototyping of Streaming Linked Data systems that aggregates ideas and abstractions from stream reasoning research. It received the Best Resource Paper award at ESWC 2021 and evolved, within the PolyFlow project, into a tool for multi-modal stream processing.</p>
  </li>
</ul>

<h2 id="streaming-property-graphs-and-knowledge-evolution">Streaming Property Graphs and Knowledge Evolution</h2>

<p>After completing my PhD, my focus shifted towards the core of data management. A pivotal moment was my participation in the 2019 Dagstuhl seminar on Big Graph Systems, which triggered a transition away from RDF-centric streaming to property graphs.</p>

<p>In collaboration with Neo4j and Bloomberg, I contributed to designing an extension of Cypher for continuous querying over property graphs, bridging the gap between graph query languages and stream processing. This work addresses the expression and evaluation of continuous queries over evolving graph structures, with applications in monitoring, fraud detection, and real-time analytics.</p>

<p>I have continued to contribute to the semantic web community by constructing and maintaining knowledge graphs for complex, volatile information. The Internet Meme Knowledge Graph (IMKG) <a class="citation" href="#DBLP:conf/esws/TommasiniIW23">(Tommasini et al., 2023)</a> models internet memes as evolving entities with rich semantics. This work is part of a broader inquiry into knowledge evolution <a class="citation" href="#DBLP:journals/tgdk/PolleresPBDDDEF23">(Polleres et al., 2023)</a>: how meaning changes over time and how to represent such changes for querying.</p>

<h2 id="event-processing-streaming-systems-and-language-semantics">Event Processing, Streaming Systems, and Language Semantics</h2>

<p>My interest in systems has led me to study the internals of stream processing engines from a distributed systems perspective and to investigate the interaction between system guarantees and the semantics of continuous query languages. This has two main strands.</p>

<p>First, I analyse how assumptions about time, ordering, and delivery in distributed systems shape the behaviour of event processing engines. Second, I work on formalising the semantics of continuous query languages, drawing on tools from logic and programming languages. Recent work within the PolyFlow project reifies high-level language abstractions in the RSP4J library <a class="citation" href="#DBLP:conf/esws/0001BOV21">(Tommasini et al., 2021)</a>, enabling us to experiment with alternative semantics and evaluate their impact on system design and performance. These efforts are complemented by tutorials and demonstrations, where I present principled views on stream processing to both academic and practitioner audiences.</p>

<h2 id="current-and-future-research-programme">Current and Future Research Programme</h2>

<p>In the short term, my focus is the POLYFLOW project (terminating in 2027), which studies continuous information integration for complex data models such as graphs, nested objects, and unstructured data, under streaming conditions. The goal is to identify the fundamental building blocks of continuous query languages that operate over heterogeneous data, and to develop a theory that supports their evaluation under continuous semantics. Although theoretical in nature, this work is tightly linked to system design; the aim is not only to understand what is possible, but also to provide guidelines that implementers can apply when building next-generation stream processors.</p>

<p>Looking further ahead, I plan to deepen my work on knowledge evolution, taking complex media such as internet memes as a challenging form of knowledge. This line is inherently interdisciplinary, touching linguistics, psychology, and creativity studies. I lead a small, funded project that aims to maintain the IMKG and extend it to capture meme semantics using LLMs and semantic frames. The “fruit fly” nature of memes, i.e., their rapid mutation and short lifespan, makes them an ideal probe for modeling the evolution of meaning under social and temporal pressure. The broader objective is to derive general principles for representing and querying evolving data in volatile domains.</p>

<p>A third thread concerns correctness guarantees for continuous computations, with a particular focus on stream segmentation. Streaming systems rely on window operators to cut infinite streams into finite segments. In practice, users must choose window parameters via trial and error, with a limited understanding of the correctness implications.</p>

<p>I plan to investigate a theory of segmentation that reduces the burden on users by deriving or adapting windowing strategies from high-level requirements and data properties. This idea builds on recent theoretical work that has shown that the “window validity” problem for forward-looking Datalog becomes decidable under mild assumptions on input data. My goal is to design an adaptive streaming system whose segmentation behavior is not hard-coded by the user but adjusted automatically, while providing correctness guarantees.</p>]]></content><author><name></name></author><category term="essays" /><category term="continuous processing" /><category term="formal foundations" /><category term="information integration" /><category term="knowledge graphs" /><category term="stream reasoning" /><category term="semantic web" /><category term="property graphs" /><category term="event processing" /><category term="language semantics" /><category term="research approach" /><category term="open artifacts" /><category term="stream processing" /><category term="knowledge evolution" /><category term="data management" /><category term="distributed systems" /><summary type="html"><![CDATA[A short essay about my research philosophy]]></summary></entry><entry><title type="html">My Teaching Statement</title><link href="https://riccardotommasini.com/blog/2025/teaching-statement/" rel="alternate" type="text/html" title="My Teaching Statement" /><published>2025-12-17T17:39:00+00:00</published><updated>2025-12-17T17:39:00+00:00</updated><id>https://riccardotommasini.com/blog/2025/teaching-statement</id><content type="html" xml:base="https://riccardotommasini.com/blog/2025/teaching-statement/"><![CDATA[<p>Since I completed my PhD, I have developed a teaching and supervision profile centred on data management, with a strong emphasis on fundamentals, deliberate practice, and structured student autonomy.
My goal as an educator is to train students to be accurate judges of their own abilities and effective, independent problem solvers who can easily adapt to future technologies.</p>

<p>My teaching is guided by three principles adapted from John Wooden’s “Pyramid of Success”: industriousness, enthusiasm, and skill. Industriousness means that sustained work on fundamentals matters more than single high-stakes exams. I design courses as progressively demanding paths, with regular checkpoints and feedback, so that students build competence through deliberate practice rather than through last-minute effort. Enthusiasm is my responsibility as an instructor; I structure challenging but meaningful tasks and model the energy needed to navigate inevitable frustration, especially when students work outside their comfort zone. Skill, i.e., the ability to act quickly and correctly, drives my emphasis on mastering technical content (data models, query languages, algorithms, systems) and “soft” skills (writing, presenting).</p>

<p><img src="https://www.practicalkarate.com/wp-content/uploads/2025/09/John-Woodens-Pyramid-of-Success.jpg" alt="pyramid" /></p>

<p>I treat courses as a learning path rather than a sequence of evaluation points. In the early stages of a course, I emphasize core concepts and structured exercises; in later stages, I give students more autonomy to make choices, justify them, and, whenever possible, explore the subject outside the course boundaries. Assessment is designed to reward both mastery of fundamentals and the ability to generalize these fundamentals to new technologies and contexts. The aim is that, years after the course, students recall the principles behind the technologies they use, can articulate design decisions, and can learn new tools independently.</p>

<p>I have systematically drawn on teaching books, online resources, and formal training opportunities. I completed a leadership course at the University of Tartu, a Massive Open Online Course (MOOC) on university teaching during my first year at INSA Lyon, and a three-day seminar associated with my associate professor qualification. These experiences have informed how I think about leadership in the classroom, course design, and supervision. I have also learned by co-teaching with experienced colleagues such as Emanuele Della Valle, Matteo Pradella, Marco Colombetti, and Sherif Sakr, from whom I have gained practical strategies for organizing a class, structuring assignments, and maintaining a positive, respectful environment.</p>

<h2 id="teaching-experience-and-breadth">Teaching experience and breadth</h2>

<p>I began teaching in 2012 as a software engineering lab tutor at Politecnico di Milano. During my PhD (2015–2019) I was a teaching assistant and occasional lecturer on courses in Big Data, Data Management, Programming Languages, and Knowledge Engineering, typically with cohorts of 60–150 students. As an Assistant Professor at the University of Tartu (2019–2021 full time; 2022–present part-time) I designed a new course, “Foundations of Data Engineering”, which has run there every year since 2019. Since 2021 I have also taught this course at INSA Lyon as part of the Computer Science curriculum and the international MINDS master programme.</p>

<p>Since 2021, I have been teaching courses in data management, including relational, graph and NoSQL databases, data warehousing, data engineering, and streaming data engineering. At Politecnico di Milano, I have also taught modules on Semantic Web technologies (RDF, OWL, SHACL) and knowledge representation. In addition, I am comfortable teaching core computer science courses, e.g., undergraduate courses in databases, algorithms and data structures, programming, and systems, and graduate-level courses in data engineering, data warehousing, stream processing, and semantic web technologies.</p>

<p>Incorporating “soft skills” into technical courses is a key aspect of my teaching approach. Since 2022, I have been teaching a public speaking module at Claude Bernard Lyon 1 University. This year, I introduced a research methods course on data system research at INSA Lyon. Additionally, I frequently integrate modules on technical writing into project-based courses, requiring students to write concise technical reports.</p>

<p>Over the years, I have obtained consistently positive student evaluations and detailed qualitative feedback. The course “Foundations of Data Engineering” has seen marked improvements in student evaluations over the years. Recent comments highlight the course’s impact on students’ motivation and career plans; for example, one student noted that it “gave me motivation back for studies in general” and “will enhance my professional career after university”.</p>

<p>On my website, I maintain an <a href="https://docs.google.com/spreadsheets/d/1Q8i6A7cmtTni04U774iV0jCavc1kaMbo_N45bT0RBts/edit?resourcekey=&amp;gid=1818451064#gid=1818451064">anonymous feedback form</a>. This continuous feedback has been valuable in refining aspects such as my pacing, clarity of explanations, and the organization of practical sessions. For example, based on student suggestions, I introduced a dedicated session on Docker and tooling.</p>

<h2 id="mentoring-and-supervision">Mentoring and supervision</h2>

<p>Mentoring is one of the most rewarding aspects of academic life. I see it as a privilege to influence students’ careers and intellectual development, and I approach it with the same values that shape my teaching, with the addition of Wooden’s “competitive greatness”: calibrating difficulty so that students are most engaged when their best effort is required.</p>

<p>Since 2020, I have supervised three PhD students who have successfully defended their theses. Mohamed Ragab (2021, I was the main supervisor, co-supervised with Ahmed Awad) is now a lecturer at the University of Birmingham; Kristo Raun (2022, co-supervised with Ahmed Awad) is a lecturer at the University of Tartu; and Samuele Langhi (2024, I was the main supervisor, co-supervised with Prof. Angela Bonifati) is a software engineer at Ververica, a leading European company in stream processing. I currently supervise two PhD students, Mauro Fama (I am the main supervisor, co-supervised with Prof. Angela Bonifati) and Gianluca Rossi (co-supervised with Prof. Angela Bonifati), and act as co-supervisor for additional students at other universities. Moreover, I maintain a mentoring relationship with some students outside France: Alessandro Ferri (TU Darmstadt, supervised by Prof. Carsten Binnig) and Mouna Ammar (Leipzig University, supervised by Prof. Erhard Rahm). For both, I am involved in their PhD programme as a collaborator.</p>

<p>When I supervise a master’s or PhD project, I invest substantial time in problem formulation. I use a <a href="https://en.wikipedia.org/wiki/Level_of_analysis">“Macro–Meso–Micro” breakdown</a>, where the student and I jointly articulate the high-level motivation and research question (Macro), the main design choices and trade-offs (Meso), and the concrete technical questions, experiments, and evaluation criteria (Micro). This structure helps students move between abstraction levels, avoid both vagueness and premature technical detail, and converge on research plans that are both ambitious and feasible.
Since my PhD, I have supervised 18 master’s and bachelor’s students. The work of several of them resulted in publications at international venues such as ESWC, ISWC, and VLDB; three received additional academic recognition for research results achieved during their master’s thesis.</p>

<h3 id="communication-collegiality-and-inclusive-practice">Communication, collegiality, and inclusive practice</h3>

<p>I place particular emphasis on clear, respectful communication and on contributing to a supportive and inclusive working environment. In my current role as Associate Professor, I supervise a diverse group of MSc and PhD students through structured one-to-one meetings and written follow-up notes that clarify expectations, decisions, and next steps. In the classroom, I combine enthusiasm with transparency about workload and assessment, and I aim to build an empathetic relationship with students while maintaining clear boundaries and control of the group.</p>

<p>I am attentive to interpersonal dynamics in group work and have, on several occasions, acted as a mediator when conflicts emerged. In one case, a group of students in Estonia attempted to exclude a colleague from their project. Together with my colleague Kristo Raun, I facilitated a resolution by clarifying expectations, revisiting assessment criteria, and ensuring that all students could contribute meaningfully. The group ultimately completed the course successfully. I view such interventions as integral to maintaining a fair, respectful environment rather than as ancillary to teaching.</p>

<h2 id="future-teaching">Future teaching</h2>

<p>My teaching and research fall within the Data Management and Data Systems areas, with strong connections to AI and knowledge representation through stream and graph processing, logic programming, and semantic web technologies. At undergraduate level I can contribute immediately to basic computer science courses, e.g. databases, data structures and algorithms, programming, and software engineering, as well as more specialised courses in data warehousing and data engineering. At graduate level I can offer advanced modules on data systems, streaming and graph data processing, and semantic web technologies, building a coherent path from foundational data modelling to modern large-scale data systems.</p>]]></content><author><name></name></author><category term="essays" /><category term="data management" /><category term="databases" /><category term="graphs" /><category term="algorithms" /><category term="soft skills" /><category term="essays" /><summary type="html"><![CDATA[A short essay about my teaching philosophy]]></summary></entry><entry><title type="html">Overview of Snowflake Dedicated Services</title><link href="https://riccardotommasini.com/blog/2025/maxime-marlin-snowflake/" rel="alternate" type="text/html" title="Overview of Snowflake Dedicated Services" /><published>2025-10-20T15:12:00+00:00</published><updated>2025-10-20T15:12:00+00:00</updated><id>https://riccardotommasini.com/blog/2025/maxime-marlin-snowflake</id><content type="html" xml:base="https://riccardotommasini.com/blog/2025/maxime-marlin-snowflake/"><![CDATA[<h1 id="overview-of-snowflake-dedicated-services">Overview of Snowflake Dedicated Services</h1>

<p>On Monday 2025/10/20 at 10am, there will be a talk by Maxime Merlin (Snowflake, Berlin) as part of the <a href="/courses/dataeng-insa-ot/">Foundations of Data Engineering</a> course at INSA.</p>

<h2 id="abstract">Abstract</h2>

<p>Maxime will present Snowflake, the Snowflake Berlin office, and the work done there, before diving deeper into one of the frameworks owned by the Berlin team, which is used by every single query run at Snowflake.</p>

<h2 id="bio">Bio</h2>

<p>I studied at ESILV in Paris and finished my studies in 2024. I’ve been with Snowflake ever since, first as an intern, then as a full time employee since early 2025.  I’m working on the Dedicated Services/Background Services team, which owns foundational frameworks which power all of Snowflake.</p>]]></content><author><name></name></author><category term="data" /><category term="engineering," /><category term="ot7" /><summary type="html"><![CDATA[External Talk]]></summary></entry><entry><title type="html">Data engineering beyond just managing data</title><link href="https://riccardotommasini.com/blog/2024/andrea-gioia/" rel="alternate" type="text/html" title="Data engineering beyond just managing data" /><published>2024-11-28T15:12:00+00:00</published><updated>2024-11-28T15:12:00+00:00</updated><id>https://riccardotommasini.com/blog/2024/andrea-gioia</id><content type="html" xml:base="https://riccardotommasini.com/blog/2024/andrea-gioia/"><![CDATA[<h1 id="data-engineering-beyond-just-managing-data">Data engineering beyond just managing data</h1>

<p>On Monday 02/12 at 10am, there will be a talk by Andrea Gioia (CTO at Quantyca, Milan) as part of the <a href="/courses/dataeng-insa-ot/">Foundations of Data Engineering</a> course at INSA.</p>

<h2 id="abstract">Abstract</h2>

<p>Data, on its own, is a liability. It only becomes a valuable asset when it drives visible outcomes that align with business strategy. To achieve this, it must be enriched with the right contextual information to make it actionable. This talk explores building an information architecture that enables data product interoperability, not just syntactically, but semantically. We’ll delve into transforming data into contextualized information using metadata, and then into knowledge by connecting it to an ontology, ultimately creating a knowledge graph. Finally, we’ll discuss the benefits of this data-centric architecture for diverse AI and analysis use cases.</p>

<h2 id="bio">Bio</h2>

<p>Andrea Gioia is a Partner and CTO at Quantyca, a consulting company specializing in data management. He is also a co-founder of blindata.io, a SaaS platform focused on data governance and compliance. With over two decades of experience in the field, Andrea has led cross-functional teams in the successful execution of complex data projects across diverse market sectors, ranging from banking and utilities to retail and industry. In his current role as CTO at Quantyca, Andrea primarily focuses on advisory, helping clients define and execute their data strategy with a strong emphasis on organizational and change management issues. Actively involved in the data community, Andrea is a regular speaker and writer, and the author of the book ‘Managing Data as a Product’. Currently, he is the main organizer of the Data Engineering Italian Meetup and leads the Open Data Mesh Initiative. Within this initiative, Andrea has published the data product descriptor open specification and is guiding the development of the open-source ODM Platform to support the automation of the data product lifecycle. Andrea is an active member of DAMA and, since 2023, has been part of the scientific committee of the DAMA Italian Chapter.</p>

<h2 id="info">Info</h2>

<p>The talk will be on Zoom:</p>

<p><strong>Zoom Link</strong>: https://insa-lyon-fr.zoom.us/j/94849567368?pwd=mwwLQSty0IUIfDzIxYLsjsd5cra6se.1</p>]]></content><author><name></name></author><category term="data" /><category term="engineering," /><category term="ot7" /><summary type="html"><![CDATA[External Talk]]></summary></entry><entry><title type="html">Wide AI A (humble) Data Management Perspective</title><link href="https://riccardotommasini.com/blog/2022/aiisc/" rel="alternate" type="text/html" title="Wide AI A (humble) Data Management Perspective" /><published>2022-06-06T15:12:00+00:00</published><updated>2022-06-06T15:12:00+00:00</updated><id>https://riccardotommasini.com/blog/2022/aiisc</id><content type="html" xml:base="https://riccardotommasini.com/blog/2022/aiisc/"><![CDATA[<h2 id="abstract">Abstract</h2>

<p>The discussion around General Artificial Intelligence is now mainstream. The recent achievements of inductive reasoning research, e.g., GPT-3 and DALL-E, have raised several questions in the academic community that span from ethics to sustainability, passing through the remaining problem of <em>interpretability.</em> Arguably, the issue lies in the fragmentation of different areas of AI, which trends like Neuro-Symbolic Reasoning and Knowledge-Infused Learning are trying to fix.</p>

<p>By stressing the role of context, the research initiatives above are rediscovering the value of interconnected data. In this regard, the data management community is taking part in the debate, supporting the development of data systems and technologies like knowledge representation, automated reasoning, and (recently) knowledge graphs.
In this talk, I offer a humble data management perspective, which builds on the three pillars of data management: data (intuitively) to collect and model, questions (aka queries) to express and answer, and systems that store the former and answer the latter. I will illustrate my analysis through the Meme Analytics Project, an ongoing initiative that embodies well the difficulty of human-level intelligence.</p>

<h2 id="video">Video</h2>

<iframe width="560" height="315" src="https://www.youtube.com/embed/sGvNvI7aqvc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>]]></content><author><name></name></author><category term="videos" /><category term="links" /><category term="talks" /><summary type="html"><![CDATA[my talk at the AI Institute of University of South Carolina]]></summary></entry><entry><title type="html">Recommended Resources for PhD Students</title><link href="https://riccardotommasini.com/blog/2022/phdreadings/" rel="alternate" type="text/html" title="Recommended Resources for PhD Students" /><published>2022-06-06T15:12:00+00:00</published><updated>2022-06-06T15:12:00+00:00</updated><id>https://riccardotommasini.com/blog/2022/phdreadings</id><content type="html" xml:base="https://riccardotommasini.com/blog/2022/phdreadings/"><![CDATA[<p>This page includes some resources that I found during my PhD and right after. These resources helped me find my way through various obstacles that the PhD, as a journey, presents.</p>

<h2 id="books">Books</h2>

<ul>
  <li><a href="https://www.goodreads.com/book/show/13525945-so-good-they-can-t-ignore-you">So Good they Can’t Ignore You</a> by <a href="https://www.calnewport.com/">Cal Newport</a></li>
  <li><a href="https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions">The Structure of Scientific Revolutions </a> by <a href="https://en.wikipedia.org/wiki/Thomas_Kuhn">Thomas S. Kuhn</a></li>
  <li><a href="https://en.wikipedia.org/wiki/The_Elements_of_Style">The Elements of Style</a> by <a href="https://en.wikipedia.org/wiki/William_Strunk_Jr">William Strunk Jr</a>.</li>
  <li><a href="https://davidepstein.com/the-range/">Range</a> by David Epstein</li>
  <li><a href="https://en.wikipedia.org/wiki/Cal_Newport">Deep Work</a> by <a href="https://www.calnewport.com/">Cal Newport</a></li>
  <li><a href="https://en.wikipedia.org/wiki/The_Biggest_Bluff">The Biggest Bluff</a> by <a href="https://en.wikipedia.org/wiki/Maria_Konnikova">Maria Konnikova</a></li>
</ul>

<h2 id="articles">Articles</h2>

<ul>
  <li><a href="https://www.cs.virginia.edu/~robins/YouAndYourResearch.html">Richard Hamming’s You and Your Research</a></li>
  <li><a href="https://www.christolute.com/blog/2020/5/19/you-and-your-research?rq=research">Christo Lute’s You and Your Research</a></li>
  <li><a href="http://karpathy.github.io/2016/09/07/phd/">A survival guide to PhD by Andrej Karpathy</a></li>
  <li><a href="https://michaelnielsen.org/blog/principles-of-effective-research/">Principles of Effective Research By Michael A. Nielsen</a></li>
  <li><a href="https://www.youtube.com/watch?v=Rn1w4MRHIhc">How to have a bad career in academia</a></li>
</ul>

<h2 id="movies">Movies</h2>

<ul>
  <li><a href="https://en.wikipedia.org/wiki/Finding_Forrester">Finding Forrester</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Molly%27s_Game">Molly’s Game</a></li>
  <li><a href="https://en.wikipedia.org/wiki/The_Matrix">The Matrix</a></li>
</ul>

<h2 id="music">Music</h2>

<h3 id="for-thinking">For Thinking</h3>

<iframe style="border-radius:12px" src="https://open.spotify.com/embed/playlist/5nQ3jjc2eAOISVqc4u9sH2?utm_source=generator" width="100%" height="380" frameborder="0" allowfullscreen="" allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture"></iframe>

<h3 id="for-coding">For Coding</h3>

<iframe style="border-radius:12px" src="https://open.spotify.com/embed/playlist/7sDEwuBR6yfPGJ60doSrTx?utm_source=generator" width="100%" height="380" frameborder="0" allowfullscreen="" allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture"></iframe>

<h3 id="for-reading">For Reading</h3>

<iframe style="border-radius:12px" src="https://open.spotify.com/embed/playlist/37i9dQZF1DWY3X53lmPYk9?utm_source=generator" width="100%" height="380" frameborder="0" allowfullscreen="" allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture"></iframe>]]></content><author><name></name></author><category term="books" /><category term="movies" /><category term="music" /><category term="lists" /><category term="phd" /><summary type="html"><![CDATA[a comprehensive list of books I enjoy]]></summary></entry><entry><title type="html">Streaming All the Things</title><link href="https://riccardotommasini.com/blog/2021/playtech/" rel="alternate" type="text/html" title="Streaming All the Things" /><published>2021-02-17T17:39:00+00:00</published><updated>2021-02-17T17:39:00+00:00</updated><id>https://riccardotommasini.com/blog/2021/playtech</id><content type="html" xml:base="https://riccardotommasini.com/blog/2021/playtech/"><![CDATA[<h2 id="abstract">Abstract</h2>

<p>We have been organising PlayTech Talks, a series of knowledge-sharing talks on interesting topics, technological or otherwise, at Playtech for some time. We have now decided to broadcast the fourth PlayTech Talk live so that everyone can join in!</p>

<p>Playtech Talks: Streaming All the Things will be held by Riccardo Tommasini (PhD), Assistant Professor of Data Management at the University of Tartu.</p>

<p>In recent years, the data landscape has changed. Big data are no longer a vision, and data systems evolve to support a new generation of data-intensive applications. Stream processing is playing a central role in this game, where real-time decision making is a must. In this talk, Riccardo will walk you through 10 years of industrial and academic research in the area. Moreover, he will focus on:</p>

<ul>
  <li>state-of-the-art data streaming platforms, i.e., Apache Kafka, Flink, and Spark;</li>
  <li>Stream Reasoning, i.e., when Stream Processing meets deductive and inductive Artificial Intelligence.</li>
</ul>]]></content><author><name></name></author><category term="videos" /><category term="links" /><category term="talks" /><summary type="html"><![CDATA[my talk at Playtech about stream processing.]]></summary></entry><entry><title type="html">PhD, A guide for Enthusiasts</title><link href="https://riccardotommasini.com/blog/2020/phdguide/" rel="alternate" type="text/html" title="PhD, A guide for Enthusiasts" /><published>2020-12-06T15:12:00+00:00</published><updated>2020-12-06T15:12:00+00:00</updated><id>https://riccardotommasini.com/blog/2020/phdguide</id><content type="html" xml:base="https://riccardotommasini.com/blog/2020/phdguide/"><![CDATA[<p>This is the recording of a talk I gave in December 2020, during the PhD Introduction Evening at the University of Tartu.
The slides of the presentation are also available <a href="/assets/files/slides/whyphd.pdf">here</a>. Below you can find the list of books, references, and resources mentioned in the talk, some of which are also linked in <a href="/blog/2022/phdreadings/">this blog post</a>.</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/WJsOSQ4rE4A?start=2926" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<p><br /></p>

<h3 id="references">References</h3>

<ul>
  <li><a href="https://en.wikipedia.org/wiki/Sherry_Turkle#Alone_Together">Alone Together</a></li>
  <li><a href="https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow">Thinking Fast and Slow</a></li>
  <li><a href="https://en.wikipedia.org/wiki/The_Biggest_Bluff">The Biggest Bluff</a> by <a href="https://en.wikipedia.org/wiki/Maria_Konnikova">Maria Konnikova</a></li>
  <li><a href="https://davidepstein.com/the-range/">Range</a> by David Epstein</li>
  <li><a href="https://en.wikipedia.org/wiki/Cal_Newport">Deep Work</a> by <a href="https://www.calnewport.com/">Cal Newport</a></li>
  <li><a href="https://www.principles.com/">Principles Life and Work</a></li>
  <li><a href="https://nyupress.org/9781479861392/the-public-professor/">The Public Professor</a></li>
  <li><a href="https://en.wikipedia.org/wiki/The_Elements_of_Style">The Elements of Style</a> by <a href="https://en.wikipedia.org/wiki/William_Strunk_Jr">William Strunk Jr</a>.</li>
  <li><a href="https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions">The Structure of Scientific Revolutions </a> by <a href="https://en.wikipedia.org/wiki/Thomas_Kuhn">Thomas S. Kuhn</a></li>
  <li><a href="https://www.goodreads.com/book/show/13525945-so-good-they-can-t-ignore-you">So Good they Can’t Ignore You</a> by <a href="https://www.calnewport.com/">Cal Newport</a></li>
  <li><a href="https://www.youtube.com/watch?v=Rn1w4MRHIhc">How to have a bad career in academia</a></li>
  <li><a href="https://en.wikipedia.org/wiki/The_Matrix">The Matrix</a></li>
  <li><a href="https://www.poetryfoundation.org/poems/44272/the-road-not-taken">The Road not Taken, Robert Frost</a></li>
  <li><a href="https://www.nature.com/articles/d41586-019-03459-7">PhDs: the tortuous truth</a></li>
</ul>]]></content><author><name></name></author><category term="videos" /><category term="talks" /><summary type="html"><![CDATA[my talk at the PhD Introduction Evening at the University of Tartu, 2020]]></summary></entry></feed>