{ "cells": [ { "cell_type": "markdown", "id": "aacf6221-15ab-4ca1-b905-d0940640c955", "metadata": {}, "source": [ "# Rollup support" ] }, { "cell_type": "code", "execution_count": null, "id": "828baba8-de63-4b3b-bb7b-38af896541ff", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from pandas import DataFrame\n", "from DATA225utils import make_connection, dataframe_query\n", "\n", "pd.set_option('display.max_rows', None)" ] }, { "cell_type": "code", "execution_count": null, "id": "301bb68a-fc48-4225-864a-202bb4b0f4e3", "metadata": {}, "outputs": [], "source": [ "conn = make_connection(config_file='students.ini')\n", "cursor = conn.cursor()" ] }, { "cell_type": "markdown", "id": "cca89b38-212b-495b-97bb-b91aaafc9f42", "metadata": { "tags": [] }, "source": [ "## Window functions\n", "#### We use window functions to display all the subtotals and the grand total." ] }, { "cell_type": "code", "execution_count": null, "id": "6ff7e815-c2c5-4ce4-b1d8-2683932ca98c", "metadata": { "tags": [] }, "outputs": [], "source": [ "_, df = dataframe_query(conn, \n", " \"\"\"\n", " SELECT DISTINCT \n", " school_year, semester, school, department, level,\n", " SUM(count) OVER() \n", " AS grand_total,\n", " SUM(count) OVER(PARTITION BY school_year) \n", " AS yearly_total,\n", " SUM(count) OVER(PARTITION BY school_year, semester) \n", " AS semester_total,\n", " SUM(count) OVER(PARTITION BY school_year, semester, school) \n", " AS school_total,\n", " SUM(count) OVER(PARTITION BY school_year, semester, \n", " school, department) \n", " AS dept_total,\n", " SUM(count) OVER(PARTITION BY school_year, semester, \n", " school, department, level) \n", " AS level_total\n", " FROM calendar, classes, counts\n", " WHERE counts.when = calendar.key\n", " AND counts.which = classes.key\n", " ORDER BY school_year, semester, school, department, level\n", " \"\"\"\n", ")\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "f205cf8c-ae58-49d8-ad48-67dd2280100d", "metadata": {}, "source": [ "## Problems with using the window functions\n", "- #### Extra columns\n", "- #### Repeated subtotals values\n", "- #### Hard to read" ] }, { "cell_type": "markdown", "id": "dc940543-d7cb-46d5-ad1c-9bdb48a47c5e", "metadata": {}, "source": [ "## `GROUP BY ... WITH ROLLUP`\n", "#### The `GROUP BY` clause creates the drill-down hierarchy `school_year`, `semester`, `school`, `department`, `level`, and `class`. Adding `WITH ROLLUP` to the clause tells `SUM(count)` to calculate subtotals up the hierarchy: level subtotals, department subtotals, school subtotals, etc. In a row that contains a subtotal at a particular hierarchy level, the fields whose values were added are `NULL` (displayed as `None` in the dataframe).\n", "#### For example, in line 2 below, 41 is the subtotal of the Electrical Engineering graduate classes, and so the `class` field is `None`. In line 6, 88 is the subtotal of all the Electrical Engineering classes, and so both the `level` and `class` fields are `None`." ] }, { "cell_type": "code", "execution_count": null, "id": "b7f949c4-be3a-41e8-a421-917a0d515210", "metadata": {}, "outputs": [], "source": [ "_, df = dataframe_query(conn, \n", " \"\"\"\n", " SELECT school_year, semester, school, department, level, class, \n", " SUM(count) AS TOTAL\n", " FROM calendar, classes, counts\n", " WHERE counts.when = calendar.key\n", " AND counts.which = classes.key\n", " GROUP BY school_year, semester, school, department, level, class\n", " WITH ROLLUP\n", " \"\"\"\n", ")\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "37c15b50-86de-44f2-a4e4-1b29e20c9d8d", "metadata": {}, "source": [ "## `GROUPING()` function\n", "#### The `GROUPING()` function takes a field as its argument, and it returns 1 if the field is `NULL` due to `ROLLUP` subtotaling. Otherwise, it return 0 for any other value, including `NULL` ***not*** generated by subtotaling. We can use these 0 and 1 values in `IF` tests to enhance the dataframe display.\n", "#### The table below shows the values of the `GROUPING()` function. Each 1 corresponds to a `None` generated by subtotaling." ] }, { "cell_type": "code", "execution_count": null, "id": "5034bc2b-0bc4-4a73-82c0-9ebda549c162", "metadata": {}, "outputs": [], "source": [ "_, df = dataframe_query(conn, \n", " \"\"\"\n", " SELECT school_year, semester,\n", " school, department, level,\n", " GROUPING(school_year) AS GRAND,\n", " GROUPING(semester) AS YEAR,\n", " GROUPING(school) AS SEMESTER,\n", " GROUPING(department) AS SCHOOL,\n", " GROUPING(level) AS DEPT\n", " FROM calendar, classes, counts\n", " WHERE counts.when = calendar.key\n", " AND counts.which = classes.key\n", " GROUP BY school_year, semester, school, department, level \n", " WITH ROLLUP\n", " \"\"\"\n", ")\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "333c96c6-5a6d-4d03-a740-8147be0dbd64", "metadata": {}, "source": [ "## Total student counts with embedded labels\n", "#### We use the results of calling the `GROUPING()` function to display a subtotal label. In a row with more than one label, the leftmost label says at which hierarchy level the total was calculated.\n", "#### For example, at line 6 in the table below, 103 is the subtotal of the Information school, and in line 13, 352 is the subtotal of the fall semester." ] }, { "cell_type": "code", "execution_count": null, "id": "b5b2cc08-a35b-421e-8d73-750be7e84058", "metadata": {}, "outputs": [], "source": [ "_, df = dataframe_query(conn, \n", " \"\"\"\n", " SELECT \n", " -- --------- ---- -----\n", " -- CONDITION TRUE FALSE\n", " -- --------- ---- -----\n", " IF (GROUPING(school_year), '***** GRAND TOTAL', school_year) \n", " AS school_year,\n", " IF (GROUPING(semester), '**** YEARLY TOTAL', semester) \n", " AS semester,\n", " IF (GROUPING(school), '*** SEMESTER TOTAL', school) \n", " AS school,\n", " IF (GROUPING(department), '** SCHOOL TOTAL', department) \n", " AS department,\n", " IF (GROUPING(level), '* DEPARTMENT TOTAL', level) \n", " AS level,\n", " SUM(count) AS total\n", " FROM calendar, classes, counts\n", " WHERE counts.when = calendar.key\n", " AND counts.which = classes.key\n", " GROUP BY school_year, semester, school, department, level \n", " WITH ROLLUP\n", " \"\"\"\n", ")\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "952c3cac-9226-4b8e-be7e-ec251b12af89", "metadata": {}, "source": [ "## Total student counts with side labels\n", "#### The table below contains the same information as the table above, except that it puts the subtotal labels in a column at the right. All the fields of a subtotal row are blank." ] }, { "cell_type": "code", "execution_count": null, "id": "3462d47b-c435-49d0-aad4-a5ee5a1c9405", "metadata": {}, "outputs": [], "source": [ "_, df = dataframe_query(conn, \n", " \"\"\"\n", " SELECT \n", " -- --------- ---- -----\n", " -- CONDITION TRUE FALSE\n", " -- --------- ---- -----\n", " IF ( GROUPING(school_year)\n", " OR GROUPING(semester) \n", " OR GROUPING(school) \n", " OR GROUPING(department) \n", " OR GROUPING(level), '', school_year) AS school_year,\n", " \n", " IF ( GROUPING(school_year)\n", " OR GROUPING(semester) \n", " OR GROUPING(school) \n", " OR GROUPING(department) \n", " OR GROUPING(level), '', semester) AS semester,\n", " \n", " IF ( GROUPING(school_year)\n", " OR GROUPING(semester) \n", " OR GROUPING(school) \n", " OR GROUPING(department) \n", " OR GROUPING(level), '', school) AS school,\n", "\n", " IF ( GROUPING(school_year)\n", " OR GROUPING(semester) \n", " OR GROUPING(school) \n", " OR GROUPING(department) \n", " OR GROUPING(level), '', department) AS department,\n", " \n", " IF ( GROUPING(school_year)\n", " OR GROUPING(semester) \n", " OR GROUPING(school) \n", " OR GROUPING(department) \n", " OR GROUPING(level), '', level) AS level,\n", " \n", " SUM(count) AS TOTAL,\n", " \n", " IF (GROUPING(school_year), '***** GRAND',\n", " IF (GROUPING(semester), '**** YEARLY',\n", " IF (GROUPING(school), '*** SEMESTER',\n", " IF (GROUPING(department), '** SCHOOL',\n", " IF (GROUPING(level), '* DEPARTMENT',\n", " ''))))) AS LABEL\n", " \n", " FROM calendar, classes, counts\n", " WHERE counts.when = calendar.key\n", " AND counts.which = classes.key\n", " GROUP BY school_year, semester, school, department, level \n", " WITH ROLLUP\n", " \"\"\"\n", ")\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": null, "id": "3c1f3d0b-4929-4695-a728-493f0d9998b0", "metadata": {}, "outputs": [], "source": [ "cursor.close()\n", "conn.close()" ] }, { "cell_type": "markdown", "id": "fab647f6-2f57-4770-a8e5-4662e0773c11", "metadata": {}, "source": [ "#### (c) 2023 by Ronald Mak" ] }, { "cell_type": "code", "execution_count": null, "id": "1aad8eb9-c1c8-43ee-9095-12ceeefdad76", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }