San Jose State University : Site Name

Navigation

Main Content

Cats like to be read to.

Ronald Mak

Department of Computer Science
Department of Applied Data Science
Spring Semester 2023

Office hours: MW 4:30 – 5:30 PM Clark Hall CL 325
Office location: Clark Hall CL 325 (mostly working from home)
E-mail: ron.mak@sjsu.edu

DATA 225 Database Systems for Analytics (section 22)


M 6:00 - 8:45 PM room BBC 224

Intro to SQL

Website hosted on an internal campus server.
You must be on campus wifi or VPN.

Assignments

# Assigned Due Assignment
1 Jan 30 Feb 6 Software installations

Database: school.sql
Python code: Assignment1.py
Jupyter notebook: Assignment1.ipynb
2 Feb 6 Feb 13 Create and load database tables with CSV data
3 Feb 13 Feb 20 a. Data modeling (team)

b. SQL practice #1 (individual)
Database import file: ZAGIsales.sql
Solutions: Asssignment3b-solutions.ipynb
4 Feb 20 Mar 1 a. Constraints and normalization

b. Normalization practice (individual)
Solutions: Asssignment4b-solutions.ipynb
5 Mar 2 Mar 6 SQL practice #2 (individual)
Solutions: Assignment5-solutions.ipynb
6 Mar 6 Mar 13 SQL practice #3 (team)
7a Mar 13 Mar 27 GUI programming (team)
7b Mar 13 Mar 20 Stored procedures (individual)
Database import file: CarDealership.sql
Solutions: CarDealershipProcsInstall.ipynb    CarDealershipProcsCall.ipynb

Lectures

Week Date Content
1 Jan 30 Slides: Course objectives; project teams; team semester project; postmortem report; grades; what is data?; structured vs. unstructured data; what is information?; what is a database?; types of database management systems; operational database; analytical database; database application architecture; table structure; primary key; foreign key; introduction to SQL; SQL syntax; table joins; how queries work
2 Feb 6 Slides: Connect to a database; fetch records; query results in a dataframe; dynamically generated queries; create table; insert records; ETL of CSV data; clear a table; drop databases and tables; export/import a database; create a user; mysql command; database requirements; conceptual, logical, and physical data models; entity-relationship (ER) diagrams; one-to-one, one-to-many, and many-to-many relationships

Jupyter notebooks: 01-School.ipynb    02-FetchAll.ipynb    03-FetchOne.ipynb    04-FetchMany.ipynb    05-MabelFlynnStudents-df.ipynb    06-StudentsOf.ipynb    07-CreateTable.ipynb    08-Insert.ipynb    09-Insert-tuple.ipynb    10-Insert-dictionary.ipynb    11-CSV-read.ipynb    12-CSV-ETL.ipynb
Utility files: DATA225utils.py    employee.py
Configuration files: school.ini    school-remote.ini    personnel.ini    titanic.ini
CSV file: TitanicSurvival.csv
3 Feb 13 Slides: ER diagram (conceptual model); entities, attributes, and relationships; ERDPlus; mapping an ER diagram to a relational schema (logical model); entity integrity constraint; referential integrity constraint; mapping a relational schema to a database (physical model); EER diagram
4 Feb 20 Slides: Table creation with constraints; named constraints; table loading with constraints; ALTER TABLE command; DROP TABLE with constraints; functional dependencies; update anomalies; normalization; normal forms 1NF, 2NF, and 3NF
5 Feb 27 Checkpoint Exam #1
Solutions: Checkpoint-1-solutions.pptx

Slides: SELECT; aggregate functions; GROUP BY; HAVING; nested queries; IS NULL; EXISTS; NOT; INSERT INTO SELECT

Jupyter notebooks: Nested-SELECT.ipynb    INSERT-INTO-SELECT.ipynb
Data: ZAGIsales.sql   building.sql   TitanicSurvival.csv
6 Mar 6 Tutorial: Access the Remote School Database Server

Slides: ALTER TABLE; cascading deletes and updates; windowing functions; MYSQL text, numeric, date and time functions; relational algebra; joins; join aliases; self, inner, left outer, right outer, and full outer joins; views; indexing

Jupyter notebooks: Windowing.ipynb    StringConcat.ipynb    Joins.ipynb    Views.ipynb    Indexes.ipynb
7 Mar 13 Slides: Nested SELECTS and views; database design example; stored procedures; Python GUI-based database application

Notebooks: NestedSelect-View.ipynb    CarDealership.ipynb    StudentProcs.ipynb    StudentProcsInstall.ipynb    StudentProcsCall.ipynb    TitanicProcs.ipynb    TitanicProcsInstall.ipynb    TitanicProcsCall.ipynb
GUI-based database application: GUIAppMain.py    GUIApp.ipynb    AppWindow.py    StudentDialog.py    TeacherDialog.py    TeacherProcsInstall.ipynb    gui_app.ui    student_dialog.ui    teacher_dialog.ui
8 Mar 20 Slides: Object-relational mapping (ORM); SQLAlchemy; transaction management; ACID properties; concurrency problems; transaction conflicts; locking; deadlocks; two-phase locking; recovery; checkpoints; data loss and recovery; reliability and RAID

Jupyter notebooks: ORM-school.ipynb    ORM-customer.ipynb    Transactions-commands.ipynb    Transactions-methods.ipynb

Goals

This class emphasizes developing operational and analytical data management applications in Python that use back-end databases and data warehouses. Project teams will develop interactive applications where users enter data on the command line and GUI-based (graphical user interface) applications where users enter data via dialogs and forms.

Course Learning Outcomes (CLO)

Prerequisites

Classified standing or instructor consent.

Recommended Books

Database Systems:
Introduction to Databases and Data Warehouses, Edition 2.0

Nenad Jukic, Susan Vrbsky, Svetlozar Nestorov, Abhishek Sharma
Prospect Press, 2021

eTextbook: 978-1-943153-67-1
available from Redshelf.com and VitalSource.com

Paperback: 978-1-943153-68-8
available from Redshelf.com

Many examples in class will come from this excellent book.
We will also use the book’s database modeling tools at https://erdplus.com
Principles of Database Management:
The Practical Guide to Storing, Managing and Analyzing Big and Small Data

Wilfried Lemahieu, Seppe vanden Broucke, Bart Baesens
Cambridge University Press, 20185
978-1107186125

A more advanced text.
Well-written with many examples and colorful diagrams.
The Data Warehouse Toolkit:
The Definitive Guide to Dimensional Modeling, 3rd Edition

Ralph Kimball and Margy
Wiley, 2013
978-1118530801

Ralph Kimball is a pioneer of data warehousing and dimensional data modeling.
I have worked closely with Dr. Kimball in the past.

Some Useful Websites