Database and Data Mining

IST-4-DBM1 - ECTS 3.0

Abstract Part 1: Databases

This course introduces the most important concepts concerning the design and the use of databases. It covers the main issues in data modeling when considering the relational data model (structure, languages like relational algebra and relational calculus) and the key features of attribute-oriented models with respect to constructor-oriented models (like, for instance, Extended Entity-Relationship models). The fundamental concepts like the constraints and the normal forms will be explained and their impact on the quality of designed databases will be discussed. Finally, practical aspects of relational database design and use will be discussed. It includes a presentation of the SQL query language but also a short introduction to data warehousing and On Line Analytical Processing queries (OLAP). To be concrete, the students will have to understand the design of a real relational database and to practice database querying with a professional DataBase Management System (DBMS).

Note: Slides are taken and partially modified from https://www.db-book.com/

Schedule

Date Day Type Content Material Comment
22/09 Thursday CM Introduction to DB Slides  
29/09 Thursday CM Relational Languages + ER Slides Slides  
30/09 Friday TP Entity-Relationship Model (TP) Slides  
06/10 Thursday TP SQL + Practice Slides  
07/10 Friday TP Docker + Project Setup Slides  
14/10 Friday TP Relational Database Design (normal forms) Slides  
20/10 Thursday CM Transactions Slides  
21/10 Friday CM Test Simulation Exercises  
27/10 Thursday EX Test Results  
14/11 Monday EX Project Presentations (con’d)    

The course exam consists into

  • 1h test including 10 questions and 3 exercises (ER Modelling, Relational Algebra, SQL)
    • [[../../DBDM/Exam Group A]]
    • [[../../DBDM/Exam Group B 2022]]
  • a small project to perform in pairs . Each project should have a presentation and 2-5 pages project report (mandatory) and a presentation.

Final grade is the weighted average between exam (0.4) and project (0.6)

Project Submission and Setup

Examples

Project Checklist

  • Identify the domain of choice, e.g., internet memes
  • formulate some questions in natural language that you would like to answer
    • submit the questions for approval
  • design the database and perform data entry in Postgres
  • ER Diagram and comments about the design decision (using appropriate terminology)
  • Implement the questions in SQL, analyse the query plan in relational algebra, experiment alternative implemetations
  • Basic queries should cover standard RA operators, Joins, and aggregation functions
  • Advanced features nice to show example of
    • Indexes
    • Transactions

NB: Full-stack application is not required, but encouraged.

Project Grading

  • 1-15
    • Natural Language Queries
    • ER Diagram Design
    • SQL Implementation
      • DDL + Data Entry
      • Basic SQL Usage (no joins) + Relational Algebra (should also go in the presentation)
    • Intermediate SQL usages
      • Joins and Aggregations
      • Integrity Constratints (check or queries) + Ad-hoc data to test
      • Indexes and Transaction
  • 15-20
    • 10 minutes Presentation ( you both have to speak) + 5 minutes Questions (Individual)
      • Use of appropriate terminology (also when speaking)
      • Clarity of the explanation
      • Appropriateness of the answer
    • What to put in the report
      • a general presentation
      • design decisions
      • ER + Implementation
      • problems that you solved
    • what to put in the presentation (4-8 slides):
      • a general presentation
      • ER + Comments of entity
      • Example of SQL + RA
      • Demo is welcomed
  • yes, you can decide not to present (max grade is 15)
  • up to 3 extra points can be assigned to reward intercation, support to other projects, initiative in project execution ()

Proposed Dataset

Suggested Material

Pre Lab Installations (Courtesy of Mohamed Ragab)

Lab Repository

  1. Install PostgreSQL
    1. On Docker ``
    2. on Windows
    3. on linux
  2. Install the psycopg2 Python Package to connecto to PostgreSQL
    • use th command
      1
      
      pip install psycopg2
      
      in your command line.
  3. Accessing PostgreSQL from Command Line:
    • Add the PostgreSQL installation \/home\ and \/bin\ directories to the enviroment variables.
    • use the command
      1
      
      psql -U postgres postgres
      
      to connect to the the by-default created database \postgres\ with the user \postgres.
    • Enter your set \postgres\ password (i.e password of the default password that you have been asked at the time of installtion).
  4. Connect To a PostgreSQL Database Server
    • show you how to connect to the PostgreSQL using psql tool and pgAdmin 4 GUI tool.

Part 2: Data Mining

Abstract

This course introduces the basics notions of datamining. It surveys techniques for data preprocessing, i.e., cleansing, integration, data reduction. Then, it jumps into foundational aspects of data mining like: Frequent Patterns mining, classification, clustering, outliers detection. The course examination consists of both a project and an exam with open and multiple choice questions and exercises.

Schedule

Date Lecture Chapters Slides
15/11/2022 Introduction to Data and Data Mining Chp 1, 2 Slides 1 Slides 2 skip part on data viz
17/11/2022 Data Pre Processing Chp 3 Slides
23/11/2022 Frequent Patterns Chp 6 Slides
24/11/2022 Classification Chp 8 Slides
28/11/2022 Clustering Chp 10 Slides
01/12/2022 Outliers Chp 12 Slides
13/12/2022 Exam   Example
15/12/2022 Project Presentation + Greetings    
       

Project Checklist

##

-REGISTER YOUR GROUP

  • Identify the domain of choice, e.g., dataset presented above
    • YOU DO NOT HAVE TO PICK ONLY ONE DATASET
    • BUT BE CAREFUL BECAUSE YOU MIGHT HAVE TO CLEAN MORE THAN ONE
    • YOU CAN SUBMIT THE TAKS FOR APPROVAL
  • you can pick out of problems (preprocessing + 2)
    • cleansing/preprocessing (mandatory)
    • frequent pattern mining
    • classification
    • clustering
  • Explain why why you’re happy with the results.
    • evaluate the associations rules
    • evaluate the classification
    • evaluate the clustering proceedure
  • present a notebook with comments and discussion of your result (github)

Suggested Material