San Jose State University : Site Name

Navigation

Main Content

Cats like to be read to.

Ronald Mak

Department of Computer Science
Department of Applied Data Science
Engineering Extended Studies
Spring Semester 2025

Office hours: Tu Th 4:30 - 5:30 PM
Office location: Clark Hall CL 325
E-mail: ron.mak@sjsu.edu

DATA 201 Database Technologies for Data Analytics


Th 6:00 - 8:45 PM in room ISB 782

Introduction to SQL Queries (website)

Assignments

# Assigned Due Assignment
1 Jan 23 Jan 2309 Software installations

Database: school.sql
Python code: Assignment1.py
Jupyter notebook: Assignment1.ipynb
2 Jan 30 Feb 6 Create and load database tables with CSV data
3 Feb 6 Feb 13 a. Data modeling (team)
b. SQL practice #1 (individual)

Database import file: ZAGIsales.sql
Solutions: Asssignment3b-solutions.ipynb
4 Feb 13 Feb 20 a. Constraints and normalization (team)
b. Normalization practice (individual)

Solutions: Assignment4b-solutions.pdf
5 Feb 20 Feb 27 SQL practice #2 (individual)

Database: ZAGIsales.sql>
Solutions: Assignment5-solutions.ipynb
6 Feb 27 Mar 6 SQL practice #3 (team)
7 Mar 6 Mar 13 a. GUI programming (team)
b. Stored procedures (individual)

Database import file: CarDealership.sql
Solutions: CarDealershipProcsInstall.ipynb    CarDealershipProcsCall.ipynb
8 Mar 22 Mar 29 Team project proposal
9 Mar 27 Apr 10 Linear regression calculations on the database server

Solutions: LinearRegression-solutions.ipynb
10 Apr 10 Apr 17 Stored procedures for multiple regression analyses

Data :
ThreeIndependents.csv

Lectures

Week Date Content
1 Jan 23 Slides: Course objectives; project teams; team semester project; postmortem report; grades; what is data?; structured vs. unstructured data; what is information?; what is a database?; types of database management systems; operational database; analytical database; database application architecture; table structure; primary key; foreign key; introduction to SQL; SQL syntax; table joins; how queries work
2 Jan 30 Slides: Traditional vs. SQL-92 syntax; connect to a database; fetch records; query results in a dataframe; dynamically generated queries; create table; insert records; ETL of CSV data; clear a table; drop databases and tables; export/import a database; create a user; mysql command; database requirements; conceptual, logical, and physical data models; entity-relationship (ER) diagrams; one-to-one, one-to-many, and many-to-many relationships

Jupyter notebooks: 01-School.ipynb    02-FetchAll.ipynb    03-FetchOne.ipynb    04-FetchMany.ipynb    05-MabelFlynnStudents-df.ipynb    06-StudentsOf.ipynb    07-CreateTable.ipynb    08-Insert.ipynb    09-Insert-tuple.ipynb    10-Insert-dictionary.ipynb    11-CSV-read.ipynb    12-CSV-ETL.ipynb
Utility files: data201.py    employee.py
Configuration files: school.ini    sakila-remote.ini    personnel.ini    titanic.ini
CSV file: TitanicSurvival.csv
3 Feb 6 Slides: ER diagram (conceptual model); entities, attributes, and relationships; ERDPlus; mapping an ER diagram to a relational schema (logical model); entity integrity constraint; referential integrity constraint; mapping a relational schema to a database (physical model); remote database server; example project; EER diagram

Example project: Terra Bikes
4 Feb 13 Slides: Table creation with constraints; named constraints; table loading with constraints; ALTER TABLE command; DROP TABLE with constraints; functional dependencies; update anomalies; normalization; normal forms 1NF, 2NF, and 3NF
5 Feb 20 Checkpoint Exam #1
Solutions: solutions

Slides: SELECT; aggregate functions; GROUP BY; HAVING; nested queries; IS NULL; EXISTS; NOT; INSERT INTO SELECT

Jupyter notebooks: Nested-SELECT-Sales.ipynb    INSERT-INTO-SELECT-Titanic.ipynb    SELECT-IN-School.ipynb    SELECT-EXISTS-Building.ipynb
Databases: ZAGIsales.sql   building.sql
6 Feb 27 Tutorial: Slides: More ALTER TABLE; cascading deletes and updates; windowing functions; MYSQL text, numeric, date and time functions; relational algebra; join aliases; self, inner, left outer, right outer, and full outer joins; views; common table expressions (CTE); indexing

Jupyter notebooks: Windowing.ipynb    StringConcat.ipynb    Joins.ipynb    Views.ipynb    CTEs.ipynb    Indexes.ipynb
7 Mar 6 Slides: Database design example; stored procedures; triggers; events; Python GUI-based database applications

Notebooks: CarDealership.ipynb    StudentProcsInstall.sql    StudentProcsInstall.ipynb    StudentProcsCall.ipynb    TeacherProcsInstall.ipynb    TeacherProcsCall.ipynb    MonthProcs.sql    MonthProcsInstall.ipynb    MonthProcsCall.ipynb    Triggers.ipynb    Events.ipynb
GUI-based database application: DemoGUIApp.zip
8 Mar 13 Slides: SQL lab

Northwind database: northwind.sql
Query solutions: northwind-single.ipynb    northwind-multiple.ipynb
9 Mar 20 Checkpoint Exam #2
Solutions: solutions

Slides: Object-relational mapping (ORM); SQLAlchemy;

Jupyter notebook: ORM-school.ipynb
10 Mar 27 Slides: Checkpoint 2 solutions; transaction management; ACID properties; concurrency problems; transaction conflicts; locking; deadlocks; two-phase locking; recovery; checkpoints; data loss and recovery; reliability and RAID; linear regression calculations in SQL

Jupyter notebooks: CommitTest.ipynb    Transactions-commands.ipynb    Transactions-methods.ipynb    StudentBookWeights.ipynb
11 Apr 10 Slides: Assignment #9 solutions; data wrangling; missing, corrupted, irrelevant data; outliers; replace missing values with averages and by regression analysis; regression analysis in SQL; team projects

Jupyter notebooks: MissingWeightsPython.ipynb    MissingWeightsLoad.ipynb    MissingWeightsProcs.ipynb    HomePricesSklearn.ipynb    HomePricesMatrix.ipynb    HomePricesSQL.ipynb    ThreeIndependentsSklearn.ipynb    ThreeIndependentsSQL.ipynb    ThreeIndependents.csv
12 Apr 17 Slides: Calculating medians in SQL; time-series analysis: moving averages and exponential smoothing; Sakila example database; operational vs. analytical databases; application-oriented vs. subject-oriented; data warehouses; data warehouse components; business intelligence (BI); dimensional modeling; dimension tables; fact tables; star schema

Jupyter notebooks: MedianPython.ipynb    MedianSQL.ipynb    MovingAveragesPython.ipynb    MovingAveragesSQL.ipynb    ExponentialSmoothingsPython.ipynb    ExponentialSmoothingsSQL.ipynb
Databases: stock_prices.sql    sakila.sql

Goals

This class emphasizes developing operational and analytical data management applications in Python that use back-end databases and data warehouses. Project teams will develop interactive applications where users enter data on the command line and GUI-based (graphical user interface) applications where users enter data via dialogs and forms.

Course Learning Outcomes (CLO)

Prerequisites

Classified standing or instructor consent.

Recommended Books

Database Systems:
Introduction to Databases and Data Warehouses, Edition 2.0

Nenad Jukic, Susan Vrbsky, Svetlozar Nestorov, Abhishek Sharma
Prospect Press, 2021

eTextbook: 978-1-943153-67-1
available from Redshelf.com and VitalSource.com

Paperback: 978-1-943153-68-8
available from Redshelf.com

Many examples in class will come from this excellent book.
We will also use the book’s database modeling tools at https://erdplus.com
Principles of Database Management:
The Practical Guide to Storing, Managing and Analyzing Big and Small Data

Wilfried Lemahieu, Seppe vanden Broucke, Bart Baesens
Cambridge University Press, 20185
978-1107186125

A more advanced text.
Well-written with many examples and colorful diagrams.
Murach's MySQL, 4th edition
Joel Murach
Mike Murach & Associates, Inc.
978-1-943873-10-4

An excellent tutorial on the MySQL dialect of SQL and on database administration
using MySQL Workbench, plus a chapter on hosting a database on AWS.
The Data Warehouse Toolkit:
The Definitive Guide to Dimensional Modeling, 3rd Edition

Ralph Kimball and Margy
Wiley, 2013
978-1118530801

Ralph Kimball is a pioneer of data warehousing and dimensional data modeling.
I have worked closely with Dr. Kimball in the past.

Some Useful Websites