{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "4f97970b-6879-484a-a6b4-6f20d448c884", "metadata": {}, "outputs": [], "source": [ "from DATA225utils import make_connection, dataframe_query" ] }, { "cell_type": "code", "execution_count": 2, "id": "6e08ff77-36d4-46dd-a5e9-6e0e69edc45c", "metadata": {}, "outputs": [], "source": [ "conn = make_connection(config_file = 'ZAGI-sales.ini')\n", "cursor = conn.cursor()" ] }, { "cell_type": "markdown", "id": "616a9710-00e5-4bb5-95e5-fb56f78320c7", "metadata": {}, "source": [ "### **1.** Display the region ID, region name, and number of stores in the region for all regions." ] }, { "cell_type": "code", "execution_count": 3, "id": "9b65916f-9c8b-42fe-a55d-6fac471d1681", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
region_idregion_namecount(*)
0CChicagoland2
1TTristate1
\n", "
" ], "text/plain": [ " region_id region_name count(*)\n", "0 C Chicagoland 2\n", "1 T Tristate 1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT r.region_id, r.region_name, count(*)\n", " FROM region r, store s\n", " WHERE r.region_id = s.region_id\n", " GROUP BY r.region_id, r.region_name\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "10c8af26-9fb2-42d2-b783-5be273d4aabb", "metadata": {}, "source": [ "### **2.** For each product category, display the category ID, category name, and average price of a product in the category." ] }, { "cell_type": "code", "execution_count": 4, "id": "7230bc4a-a2e3-4f3e-a21e-39b72030b7ed", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
category_idcategory_nameavg(p.product_price)
0CPCamping166.666667
1FWFootwear58.333333
\n", "
" ], "text/plain": [ " category_id category_name avg(p.product_price)\n", "0 CP Camping 166.666667\n", "1 FW Footwear 58.333333" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT c.category_id, c.category_name, avg(p.product_price)\n", " FROM product p, category c\n", " WHERE p.category_id = c.category_id\n", " GROUP BY c.category_id, c.category_name\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "722d1353-48c0-4c4b-920e-1b03548c2570", "metadata": {}, "source": [ "### **3.** For each product category, display the category ID and the total number of items purchased in the category." ] }, { "cell_type": "code", "execution_count": 5, "id": "c3cff1cf-9cd4-45c5-9a3c-4e9a8b18116e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
category_idsum(s.no_of_items)
0CP5
1FW13
\n", "
" ], "text/plain": [ " category_id sum(s.no_of_items)\n", "0 CP 5\n", "1 FW 13" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT p.category_id, sum(s.no_of_items)\n", " FROM sold_via s, product p\n", " WHERE s.product_id = p.product_id\n", " GROUP BY p.category_id\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "dcb26e78-a6b6-479e-b505-252b2b0079df", "metadata": {}, "source": [ "### **4.** Display the TID and the total number of items (of all products) sold within the transaction for all sales transactions whose total number of items (of all products) sold within the transaction is greater than five." ] }, { "cell_type": "code", "execution_count": 6, "id": "b7ccadbb-e686-4cdd-bddf-4cbb57e7eaf5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tidSUM(no_of_items)
0T3336
1T5557
\n", "
" ], "text/plain": [ " tid SUM(no_of_items)\n", "0 T333 6\n", "1 T555 7" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT tid, SUM(no_of_items)\n", " FROM sold_via\n", " GROUP BY tid\n", " HAVING SUM(no_of_items) > 5\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "f83ae713-0b17-4da3-9d01-678cde05f82b", "metadata": {}, "source": [ "### **5.** Display the product ID and ProductName of the cheapest product." ] }, { "cell_type": "code", "execution_count": 7, "id": "59df5153-187f-4960-8112-393b6c1ceff8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_idproduct_name
03X3Cosy Sock
\n", "
" ], "text/plain": [ " product_id product_name\n", "0 3X3 Cosy Sock" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT product_id, product_name\n", " FROM product\n", " WHERE product_price = (SELECT MIN(product_price)\n", " FROM product)\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "453dd348-760a-4737-889f-f648b121dddd", "metadata": {}, "source": [ "### **6.** Display the product ID for the product that has been sold the most (i.e., that has been sold in the highest quantity)." ] }, { "cell_type": "markdown", "id": "2b04fc41-ea8a-4cd7-b337-4942b531bf6c", "metadata": {}, "source": [ "#### The following query works with most versions of SQL. The nested SELECT in the HAVING clause selects the highest number of product sales." ] }, { "cell_type": "code", "execution_count": 8, "id": "6077a6fc-3447-4da5-8ca5-c148e9e8b827", "metadata": {}, "outputs": [ { "ename": "Exception", "evalue": "Query failed: 1111 (HY000): Invalid use of group function", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mDatabaseError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m~/~mak/DATA225/assignments/5/solutions/DATA225utils.py\u001b[0m in \u001b[0;36mdataframe_query\u001b[0;34m(conn, sql)\u001b[0m\n\u001b[1;32m 67\u001b[0m \u001b[0mcursor\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcursor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 68\u001b[0;31m \u001b[0mcursor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msql\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 69\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/mysql/connector/cursor.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, operation, params, multi)\u001b[0m\n\u001b[1;32m 560\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 561\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_handle_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_connection\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcmd_query\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstmt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 562\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mInterfaceError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/mysql/connector/connection.py\u001b[0m in \u001b[0;36mcmd_query\u001b[0;34m(self, query, raw, buffered, raw_as_string)\u001b[0m\n\u001b[1;32m 989\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 990\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_handle_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_send_cmd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mServerCmd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mQUERY\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mquery\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 991\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mProgrammingError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/mysql/connector/connection.py\u001b[0m in \u001b[0;36m_handle_result\u001b[0;34m(self, packet)\u001b[0m\n\u001b[1;32m 783\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mpacket\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m255\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 784\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mget_exception\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpacket\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 785\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mDatabaseError\u001b[0m: 1111 (HY000): Invalid use of group function", "\nDuring handling of the above exception, another exception occurred:\n", "\u001b[0;31mException\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/var/folders/s1/ksqw692x33n0d4mn6kwcpdl00000gr/T/ipykernel_80467/3729357571.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m _, df = dataframe_query(conn,\n\u001b[0m\u001b[1;32m 2\u001b[0m \"\"\"\n\u001b[1;32m 3\u001b[0m \u001b[0mSELECT\u001b[0m \u001b[0mproduct_id\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mFROM\u001b[0m \u001b[0msold_via\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mGROUP\u001b[0m \u001b[0mBY\u001b[0m \u001b[0mproduct_id\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/~mak/DATA225/assignments/5/solutions/DATA225utils.py\u001b[0m in \u001b[0;36mdataframe_query\u001b[0;34m(conn, sql)\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 90\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 91\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mException\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf'Query failed: {e}'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 92\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0;31m# Copyright (c) 2023 by Ronald Mak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mException\u001b[0m: Query failed: 1111 (HY000): Invalid use of group function" ] } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT product_id\n", " FROM sold_via\n", " GROUP BY product_id\n", " HAVING SUM(no_of_items) = (SELECT MAX(SUM(no_of_items))\n", " FROM sold_via\n", " GROUP BY product_id)\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "745fa170-aae5-4c7c-8e9e-0098f26964f6", "metadata": {}, "source": [ "#### Could the problem here be with the nested SELECT?" ] }, { "cell_type": "code", "execution_count": 9, "id": "632bdaff-a115-4e16-8725-231d511bbacc", "metadata": {}, "outputs": [ { "ename": "Exception", "evalue": "Query failed: 1111 (HY000): Invalid use of group function", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mDatabaseError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m~/~mak/DATA225/assignments/5/solutions/DATA225utils.py\u001b[0m in \u001b[0;36mdataframe_query\u001b[0;34m(conn, sql)\u001b[0m\n\u001b[1;32m 67\u001b[0m \u001b[0mcursor\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcursor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 68\u001b[0;31m \u001b[0mcursor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msql\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 69\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/mysql/connector/cursor.py\u001b[0m in \u001b[0;36mexecute\u001b[0;34m(self, operation, params, multi)\u001b[0m\n\u001b[1;32m 560\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 561\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_handle_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_connection\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcmd_query\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstmt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 562\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mInterfaceError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/mysql/connector/connection.py\u001b[0m in \u001b[0;36mcmd_query\u001b[0;34m(self, query, raw, buffered, raw_as_string)\u001b[0m\n\u001b[1;32m 989\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 990\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_handle_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_send_cmd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mServerCmd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mQUERY\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mquery\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 991\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mProgrammingError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/opt/anaconda3/lib/python3.9/site-packages/mysql/connector/connection.py\u001b[0m in \u001b[0;36m_handle_result\u001b[0;34m(self, packet)\u001b[0m\n\u001b[1;32m 783\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mpacket\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m255\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 784\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mget_exception\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpacket\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 785\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mDatabaseError\u001b[0m: 1111 (HY000): Invalid use of group function", "\nDuring handling of the above exception, another exception occurred:\n", "\u001b[0;31mException\u001b[0m Traceback (most recent call last)", "\u001b[0;32m/var/folders/s1/ksqw692x33n0d4mn6kwcpdl00000gr/T/ipykernel_80467/3430846782.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m _, df = dataframe_query(conn,\n\u001b[0m\u001b[1;32m 2\u001b[0m \"\"\"\n\u001b[1;32m 3\u001b[0m \u001b[0mSELECT\u001b[0m \u001b[0mMAX\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mSUM\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mno_of_items\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mFROM\u001b[0m \u001b[0msold_via\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mGROUP\u001b[0m \u001b[0mBY\u001b[0m \u001b[0mproduct_id\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/~mak/DATA225/assignments/5/solutions/DATA225utils.py\u001b[0m in \u001b[0;36mdataframe_query\u001b[0;34m(conn, sql)\u001b[0m\n\u001b[1;32m 89\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 90\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 91\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mException\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf'Query failed: {e}'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 92\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0;31m# Copyright (c) 2023 by Ronald Mak\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mException\u001b[0m: Query failed: 1111 (HY000): Invalid use of group function" ] } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT MAX(SUM(no_of_items))\n", " FROM sold_via\n", " GROUP BY product_id\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "382563b3-2413-4540-95dd-a9821f3a62f0", "metadata": {}, "source": [ "#### Yes, that's where the problem is. MySQL apparently doesn't like the combination MAX(SUM(no_of_items)) in the SELECT. No, I don't know why MySQL is missing certain features. \n", "\n", "#### So let's find the highest product sales in another way. First, what are the number of sales per product?" ] }, { "cell_type": "code", "execution_count": 10, "id": "b6094b67-a127-4f82-bbcd-8aac1aefcf48", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_idtotal_sales
01X12
12X23
23X35
34X45
45X52
56X61
\n", "
" ], "text/plain": [ " product_id total_sales\n", "0 1X1 2\n", "1 2X2 3\n", "2 3X3 5\n", "3 4X4 5\n", "4 5X5 2\n", "5 6X6 1" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Query A\n", "\n", "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT product_id, sum(no_of_items) AS total_sales\n", " FROM sold_via\n", " GROUP BY product_id\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "b82f9f2e-c939-4acd-954c-e5081197269a", "metadata": {}, "source": [ "#### We need to treat the above results as a **virtual table**. MySQL insists that we give it a name, so we can call it `total_sales_table`. Then we can select the maximum from the `total_sales` column." ] }, { "cell_type": "code", "execution_count": 11, "id": "63120a52-476f-42f1-9d3f-b377db6907a1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
highest_sales
05
\n", "
" ], "text/plain": [ " highest_sales\n", "0 5" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Query B\n", "\n", "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT max(total_sales) AS highest_sales\n", " FROM (SELECT sum(no_of_items) AS total_sales\n", " FROM sold_via\n", " GROUP BY product_id) AS total_sales_table\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "ea34c9e5-5c95-4289-bcf0-011c86436926", "metadata": {}, "source": [ "#### Another way to obtain the highest product sales is to sort the result of Query A in descending order and then take the top row." ] }, { "cell_type": "code", "execution_count": 12, "id": "4add9b82-a515-4284-8c3d-d46bf948d6b6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
highest_sales
05
\n", "
" ], "text/plain": [ " highest_sales\n", "0 5" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Query B'\n", "\n", "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT sum(no_of_items) AS highest_sales\n", " FROM sold_via\n", " GROUP BY product_id\n", " ORDER BY highest_sales DESC\n", " LIMIT 1\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "57ed3b32-a831-4e67-afeb-d2f98d61003e", "metadata": {}, "source": [ "#### We cannot use this trick (sort in descending order and then simply take the top row) to retrieve all the products that have the highest sales, because there can be more than one product." ] }, { "cell_type": "markdown", "id": "d6321260-366a-464a-873f-8f0c76245581", "metadata": {}, "source": [ "#### In either case, we have a SELECT command that returns the highest sales, we can use it to select the products whose total sales matches the highest sales. We use the SELECT command in the HAVING clause.\n", "\n", "#### Query C below combines Query A and Query B." ] }, { "cell_type": "code", "execution_count": 13, "id": "5e9d4dc1-8b65-46e8-9a25-f4dabde83ef9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_id
03X3
14X4
\n", "
" ], "text/plain": [ " product_id\n", "0 3X3\n", "1 4X4" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Query C\n", "\n", "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT product_id\n", " FROM sold_via\n", " GROUP BY product_id\n", " HAVING SUM(no_of_items) = \n", " (SELECT max(total_sales) AS highest_sales\n", " FROM (SELECT sum(no_of_items) AS total_sales\n", " FROM sold_via\n", " GROUP BY product_id) AS total_sales_table)\n", " ORDER BY product_id\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "d97ecb21-230b-4c67-aebd-ed0722712034", "metadata": {}, "source": [ "#### Now that we know about views, see how views can simplify the queries." ] }, { "cell_type": "code", "execution_count": 14, "id": "d8b950df-cc54-4509-9ffa-26b0a557d121", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_idtotal_sales
01X12
12X23
23X35
34X45
45X52
56X61
\n", "
" ], "text/plain": [ " product_id total_sales\n", "0 1X1 2\n", "1 2X2 3\n", "2 3X3 5\n", "3 4X4 5\n", "4 5X5 2\n", "5 6X6 1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute('DROP VIEW IF EXISTS sales_per_product')\n", "\n", "sql = ( \"\"\"\n", " CREATE VIEW sales_per_product AS\n", " SELECT product_id, sum(no_of_items) AS total_sales\n", " FROM sold_via\n", " GROUP BY product_id\n", " \"\"\"\n", " )\n", "\n", "cursor.execute(sql)\n", "\n", "_, df = dataframe_query(conn, 'SELECT * FROM sales_per_product')\n", "df" ] }, { "cell_type": "code", "execution_count": 15, "id": "ac868a07-4026-41ba-827d-7e79402335f1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
highest_sales
05
\n", "
" ], "text/plain": [ " highest_sales\n", "0 5" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute('DROP VIEW IF EXISTS highest_sales')\n", "\n", "sql = ( \"\"\"\n", " CREATE VIEW highest_sales AS\n", " SELECT max(total_sales) AS highest_sales\n", " FROM sales_per_product\n", " \"\"\"\n", " )\n", "\n", "cursor.execute(sql)\n", "\n", "_, df = dataframe_query(conn, 'SELECT * FROM highest_sales')\n", "df" ] }, { "cell_type": "code", "execution_count": 16, "id": "29ca5a9c-9838-45b0-9341-8af82ea556ad", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_id
03X3
14X4
\n", "
" ], "text/plain": [ " product_id\n", "0 3X3\n", "1 4X4" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT product_id\n", " FROM sold_via\n", " GROUP BY product_id\n", " HAVING SUM(no_of_items) = \n", " (SELECT highest_sales FROM highest_sales)\n", " ORDER BY product_id\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "3a9ac60d-8c3c-4ee8-b6fe-9c21a67f7041", "metadata": {}, "source": [ "#### **7.** Rewrite the following query using a join instead of the nested query. Your rewritten query should produce the same results.\n", "\n", "#### _For each product that has more than three items sold within all sales transactions, retrieve the product ID, product name, and product price._\n", "\n", "``` sql\n", "SELECT product_id, product_name, product_price\n", "FROM product\n", "WHERE product_id IN (SELECT product_id\n", " FROM sold_via\n", " GROUP BY product_id\n", " HAVING SUM(no_of_items) > 3)\n", "```" ] }, { "cell_type": "code", "execution_count": 17, "id": "80656041-aa15-4f4e-9e69-3df4366fa942", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_idproduct_nameproduct_price
03X3Cosy Sock15.00
14X4Dura Boot90.00
\n", "
" ], "text/plain": [ " product_id product_name product_price\n", "0 3X3 Cosy Sock 15.00\n", "1 4X4 Dura Boot 90.00" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT p.product_id, product_name, product_price\n", " FROM product p, sold_via s\n", " WHERE p.product_id = s.product_id\n", " GROUP BY p.product_id, p.product_name, p.product_price\n", " HAVING SUM(s.no_of_items) > 3\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "markdown", "id": "b54633cd-cd35-4c2d-a955-ae04761d79b5", "metadata": {}, "source": [ "#### **8.** Rewrite the following query using a join instead of the nested query. Your rewritten query should produce the same results.\n", "\n", "#### _For each product whose items were sold in more than one sales transaction, retrieve the product id, product name and product price._\n", "``` sql\n", "SELECT product_id, product_name, product_price\n", "FROM product\n", "WHERE product_id IN (SELECT product_id\n", " FROM sold_via\n", " GROUP BY product_id\n", " HAVING COUNT(*) > 1)\n", "```" ] }, { "cell_type": "code", "execution_count": 18, "id": "bce51b9a-9e7e-407d-a879-cb0a3c3b788b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_idproduct_nameproduct_price
01X1Zzz Bag100.00
12X2Easy Boot70.00
24X4Dura Boot90.00
\n", "
" ], "text/plain": [ " product_id product_name product_price\n", "0 1X1 Zzz Bag 100.00\n", "1 2X2 Easy Boot 70.00\n", "2 4X4 Dura Boot 90.00" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, df = dataframe_query(conn,\n", " \"\"\"\n", " SELECT p.product_id, product_name, product_price\n", " FROM product p, sold_via s\n", " WHERE p.product_id = s.product_id\n", " GROUP BY p.product_id, p.product_name, p.product_price\n", " HAVING COUNT(s.tid) > 1\n", " \"\"\"\n", " )\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": 19, "id": "81ee622b-ad25-44c0-a335-76bae156c6ae", "metadata": {}, "outputs": [], "source": [ "cursor.close()\n", "conn.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }