San Jose State University : Site Name

Navigation

Main Content

Cats like to be read to.

Ronald Mak

Department of Computer Science
Department of Applied Data Science
Fall Semester 2023

Office hours: W 6:00 – 7:00 PM Clark Hall CL 325
Office location: Clark Hall CL 325 (mostly working from home)
E-mail: ron.mak@sjsu.edu

DATA 225 Database Systems for Analytics (section 21)


M 6:00 - 8:45 PM room BBC 102

Introduction to SQL Queries (website)

Assignments

# Assigned Due Assignment
1 Aug 21 Aug 28 Software installations

Database: school.sql
Python code: Assignment1.py
Jupyter notebook: Assignment1.ipynb
2 Aug 28 Sep 11 Create and load database tables with CSV data

How to access the school server with VPN
3 Sep 11 Sep 18 a. Data modeling (team)

b. SQL practice #1 (individual)
Database import file: ZAGIsales.sql
Solutions: Asssignment3b-solutions.ipynb
4 Sep 18 Sep 25 a. Constraints and normalization (team)

b. Normalization practice (individual)
Solutions: Asssignment4b-solutions.pdf
5 Sep 25 Oct 2 SQL practice #2 (individual)
Solutions: Assignment5-solutions.ipynb
6 Oct 2 Oct 9 SQL practice #3 (team)
7a Oct 9 Oct 23 a. GUI programming (team)

b. Stored procedures (individual)
Database import file: CarDealership.sql
Solutions: CarDealershipProcsInstall.ipynb    CarDealershipProcsCall.ipynb
8 Oct 23 Oct 30 Missing values (team)
9 Nov 6 Nov 13 Dimensional modeling and star schema
10 Nov 13 Nov 20 OLAP and Rollup operations
11 Nov 20 Nov 27 MongoDB

Lectures

Week Date Content
1 Jan 30 Slides: Course objectives; project teams; team semester project; postmortem report; grades; what is data?; structured vs. unstructured data; what is information?; what is a database?; types of database management systems; operational database; analytical database; database application architecture; table structure; primary key; foreign key; introduction to SQL; SQL syntax; table joins; how queries work
2 Aug 28 Slides: Connect to a database; fetch records; query results in a dataframe; dynamically generated queries; create table; insert records; ETL of CSV data; clear a table; drop databases and tables; export/import a database; create a user; mysql command; database requirements; conceptual, logical, and physical data models; entity-relationship (ER) diagrams; one-to-one, one-to-many, and many-to-many relationships

How to access the school server with VPN

Jupyter notebooks: 01-School.ipynb    02-FetchAll.ipynb    03-FetchOne.ipynb    04-FetchMany.ipynb    05-MabelFlynnStudents-df.ipynb    06-StudentsOf.ipynb    07-CreateTable.ipynb    08-Insert.ipynb    09-Insert-tuple.ipynb    10-Insert-dictionary.ipynb    11-CSV-read.ipynb    12-CSV-ETL.ipynb
Utility files: DATA225utils.py    employee.py
Configuration files: school.ini    school-remote.ini    personnel.ini    titanic.ini
CSV file: TitanicSurvival.csv
3 Sep 11 Slides: ER diagram (conceptual model); entities, attributes, and relationships; ERDPlus; mapping an ER diagram to a relational schema (logical model); entity integrity constraint; referential integrity constraint; mapping a relational schema to a database (physical model); EER diagram
4 Sep 18 Slides: Table creation with constraints; named constraints; table loading with constraints; ALTER TABLE command; DROP TABLE with constraints; functional dependencies; update anomalies; normalization; normal forms 1NF, 2NF, and 3NF
5 Sep 25 Checkpoint Exam #1
Solutions: Checkpoint-1-solutions.pptx

Slides: SELECT; aggregate functions; GROUP BY; HAVING; nested queries; IS NULL; EXISTS; NOT; INSERT INTO SELECT

Jupyter notebooks: Nested-SELECT-Sales.ipynb    INSERT-INTO-SELECT-Titanic.ipynb    SELECT-IN-School.ipynb    SELECT-EXISTS-Buiilding.ipynb
Databases: ZAGIsales.sql   building.sql
6 Oct 2 Slides: ALTER TABLE; cascading deletes and updates; windowing functions; MYSQL text, numeric, date and time functions; relational algebra; joins; join aliases; self, inner, left outer, right outer, and full outer joins; views; indexing

Jupyter notebooks: Windowing.ipynb    StringConcat.ipynb    Joins.ipynb    Views.ipynb    Indexes.ipynb
Example semester project: Overivew.pptx    TheDeciders_Report_DATA225.pdf    Demo.mov
7 Oct 9 Slides: Nested SELECTS and views; database design example; stored procedures; Python GUI-based database application

Notebooks: NestedSelect-View.ipynb    CarDealership.ipynb    StudentProcs.sql    StudentProcsInstall.ipynb    StudentProcsCall.ipynb    TitanicProcs.sql    TitanicProcsInstall.ipynb    TitanicProcsCall.ipynb    MonthProcs.sql    MonthProcsInstall.ipynb    MonthProcsCall.ipynb
GUI-based database application: SchoolApp.py    SchoolApp.ipynb    MainSchoolWindow.py    StudentClassesDialog.py    TeacherStudentsDialog.py    TeacherProcsInstall.ipynb    main_school_window.ui    student_classes_dialog.ui    teacher_students_dialog.ui    DemoGUI.zip

Remote database server: SchoolServer.pptx
8 Oct 16 Slides: Triggers and events; object-relational mapping (ORM); SQLAlchemy; transaction management; ACID properties; concurrency problems; transaction conflicts; locking; deadlocks; two-phase locking; recovery; checkpoints; data loss and recovery; reliability and RAID

Jupyter notebooks: Triggers.ipynb    Events.ipynb    ORM-school.ipynb    ORM-customer.ipynb    Transactions-commands.ipynb    Transactions-methods.ipynb
9 Oct 23 Slides: Generating bar charts; data wrangling; replace missing values with averages and by regression analysis; outliers; team projects

Jupyter notebooks: TitanicBarchart-Python.ipynb    TitanicBarChart-SQL.ipynb    TitanicReplaceAges-Python.ipynb    TitanicReplaceAges-SQL.ipynb    StudentBooks-Python.ipynb    StudentBooks-SQL.ipynb    HomePrices-SQL.ipynb    ThreeIndependents-SQL.ipynb
10 Oct 30 Checkpoint Exam #2
Solution: Checkpoint2-solutions.ipynb    TitanicWindow.py    titanic_window.ui

Slides: Operational vs. analytical databases; application-oriented vs. subject-oriented; data warehouses; data warehouse components; business intelligence (BI); dimensional modeling; dimension tables; fact tables; star schema
11 Nov 6 Slides: Dimensional modeling and star schemas; dimension and fact tables; ETL to create a dimensional model; queries against a star schema, transaction ID and times; multiple fact tables; detailed vs. aggregated fact tables; surrogate keys and slowly changing dimensions

Jupyter notebooks: zagi_warehouse.ipynb    zagi_display_tables.ipynb    zagi_queries.ipynb    zagi_aggregated.ipynb
Data warehouse: zagi_warehouse.sql
Data warehouse sources: zagi_sales.sql    zagi_facilities.sql    zagi_customers.sql
12 Nov 13 Slides: ETL diagrams; factless fact table; GROUP BY ... WITH ROLLUP; museum field trip; time-series analysis; SQL support for moving averages; SQL support for exponential smoothing; executive dashboard

Jupyter notebooks: Students-star.ipynb    Students-rollup.ipynb    MovingAverage.ipynb    ExponentialSmooting.ipynb
Databases: students.sql    stock_prices.sql
13 Nov 20 Slides: Project survey questions; NoSQL databases; structured vs. semi-structured data; document model; CAP theorem; MongoDB database; pymongo; collections and documents; queries with joins and matches

Jupyter notebooks (must be executed in this order): 13-1-SchoolDatabase.ipynb    13-2-ContactCollection.ipynb    13-3-StudentCollection.ipynb    13-4-TeacherCollection.ipynb    13-5-ClassCollection.ipynb    13-6-TakesCollection.ipynb    13-7-SchoolQueries.ipynb
14 Nov 27 Checkpoint Exam #3
Solutions: Checkpoint3-solutions.ipynb

Slides: Query optimization; public key cryptography; data virtualization; distributed databases; cloud computing

Bonus notebooks: Stress.ipynb    ChatGPT.ipynb

Goals

This class emphasizes developing operational and analytical data management applications in Python that use back-end databases and data warehouses. Project teams will develop interactive applications where users enter data on the command line and GUI-based (graphical user interface) applications where users enter data via dialogs and forms.

Course Learning Outcomes (CLO)

Prerequisites

Classified standing or instructor consent.

Recommended Books

Database Systems:
Introduction to Databases and Data Warehouses, Edition 2.0

Nenad Jukic, Susan Vrbsky, Svetlozar Nestorov, Abhishek Sharma
Prospect Press, 2021

eTextbook: 978-1-943153-67-1
available from Redshelf.com and VitalSource.com

Paperback: 978-1-943153-68-8
available from Redshelf.com

Many examples in class will come from this excellent book.
We will also use the book’s database modeling tools at https://erdplus.com
Principles of Database Management:
The Practical Guide to Storing, Managing and Analyzing Big and Small Data

Wilfried Lemahieu, Seppe vanden Broucke, Bart Baesens
Cambridge University Press, 20185
978-1107186125

A more advanced text.
Well-written with many examples and colorful diagrams.
The Data Warehouse Toolkit:
The Definitive Guide to Dimensional Modeling, 3rd Edition

Ralph Kimball and Margy
Wiley, 2013
978-1118530801

Ralph Kimball is a pioneer of data warehousing and dimensional data modeling.
I have worked closely with Dr. Kimball in the past.

Some Useful Websites