{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "2b6f1e1d-142e-49a4-a8d6-01f05b59dbc8", "metadata": {}, "outputs": [], "source": [ "from data201 import db_connection, df_query" ] }, { "cell_type": "code", "execution_count": null, "id": "e1955763-1657-4bc2-8e4f-d7a94d29efa7", "metadata": {}, "outputs": [], "source": [ "conn = db_connection(config_file = 'northwind.ini')" ] }, { "cell_type": "markdown", "id": "7aa1ec04-10d7-4590-bd5c-2a18a12e5cb0", "metadata": {}, "source": [ "# Multi-Table Queries" ] }, { "cell_type": "markdown", "id": "e24d34cf-c6d6-469c-ae8f-548697fdf84e", "metadata": {}, "source": [ "### **QUERY 11.** Who were the high-value customers in 2016 who made at least one order with a total amount (quantity times unit price) of \\\\$10,000 or more? Show customer ID, company name, order ID, and total order amount. Sort in descending order of amount." ] }, { "cell_type": "code", "execution_count": null, "id": "0bdf068e-f101-4aef-bafd-308b9c572f71", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT c.customer_id AS \"Customer ID\", \n", " c.company_name AS \"Company name\",\n", " o.order_id AS \"Order ID\",\n", " CONCAT('$', FORMAT(SUM(d.quantity*d.unit_price), 2)) \n", " AS \"Total amount\"\n", " FROM customers c\n", " JOIN orders o USING (customer_id)\n", " JOIN order_details d USING (order_id)\n", " WHERE YEAR(o.order_date) = 2016\n", " GROUP BY c.customer_id,\n", " c.company_name, \n", " o.order_id\n", " HAVING SUM(d.quantity*d.unit_price) >= 10000\n", " ORDER BY SUM(d.quantity*d.unit_price) DESC\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "843840d1-3050-42d2-94b6-52e5da6c1a6b", "metadata": {}, "source": [ "### **QUERY 12.** Display the customer ID, company name, ship country, and the order date and time of any rush orders within 15 hours before Christmas 2015. Format the date and time.\n", "#### **TIP:** `DATE_ADD('20151225', INTERVAL -15 HOUR)` is the last 15 hours before Chrismas." ] }, { "cell_type": "code", "execution_count": null, "id": "8801b3a8-54a3-482f-a9c0-46cf28ee8f52", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT o.customer_id AS \"Ccustomer ID\", \n", " company_name AS \"Company name\", \n", " ship_country AS \"Ship country\", \n", " DATE_FORMAT(order_date, '%b %d, %Y @ %h:%i %p') \n", " AS \"Rush order time\"\n", " FROM orders o\n", " JOIN customers c\n", " ON c.customer_id = o.customer_id\n", " WHERE order_date >= DATE_ADD('20151225', INTERVAL -15 HOUR)\n", " AND order_date < '20151225'\n", " ORDER BY order_date\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "f0183774-0c51-4d84-897f-0bd1ea8b6feb", "metadata": {}, "source": [ "### **QUERY 13.** Which customers have never placed an order?" ] }, { "cell_type": "code", "execution_count": null, "id": "7d3098b7-044e-42fb-bc00-60b85565e2bc", "metadata": {}, "outputs": [], "source": [ "# Do a LEFT OUTER JOIN of customer.customer_id and order.customer_id.\n", "# Customers with no order will have NULL (displayed as None) in\n", "# the second column.\n", "\n", "import pandas as pd\n", "pd.options.display.max_rows = None\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT c.customer_id AS \"Customer ID\", \n", " o.customer_id AS \"Customer ID of order\"\n", " FROM customers c\n", " LEFT OUTER JOIN orders o USING (customer_id)\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "2c5d10d3-1abd-4a49-b233-f78b6ec72699", "metadata": {}, "outputs": [], "source": [ "# Only display rows with NULL second column.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT c.customer_id AS \"Customer ID\", \n", " o.customer_id AS \"Customer ID of order\"\n", " FROM customers c\n", " LEFT OUTER JOIN orders o USING (customer_id)\n", " WHERE o.customer_id IS NULL\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "22e5e926-4e72-4317-9b6e-84afc7343a00", "metadata": {}, "outputs": [], "source": [ "# Don't show the second column.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT c.customer_id AS \"Customer ID\"\n", " FROM customers c\n", " LEFT OUTER JOIN orders o USING (customer_id)\n", " WHERE o.customer_id IS NULL\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "d1f9e050-baca-4458-badb-e38da4dd52d0", "metadata": {}, "outputs": [], "source": [ "# If you don't like doing outer joins ...\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT customer_id AS \"Customer ID\"\n", " FROM customers\n", " WHERE customer_id NOT IN (\n", " SELECT customer_id \n", " FROM orders\n", " )\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "253dab43-29f2-4a40-b166-53968a5fc76b", "metadata": {}, "outputs": [], "source": [ "# For crazy SQL programmers only!\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT customer_id AS \"Customer ID\"\n", " FROM customers c\n", " WHERE NOT EXISTS (\n", " SELECT customer_id\n", " FROM orders o\n", " WHERE o.customer_id = c.customer_id\n", " )\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "78fae5ba-9d3a-4aaf-8329-bfc8a27589f6", "metadata": {}, "source": [ "### **QUERY 14.** What is the total number of orders handled by each employee?" ] }, { "cell_type": "code", "execution_count": null, "id": "f41942c6-90de-4f1a-8d09-961e21052c9e", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT e.employee_id AS \"Employee ID\", \n", " e.last_name AS \"Last name\", \n", " e.first_name AS \"First name\", \n", " COUNT(*) AS \"Total orders\"\n", " FROM employees e\n", " JOIN orders o\n", " ON o.employee_id = e.employee_id\n", " GROUP BY e.employee_id\n", " ORDER BY e.employee_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "09d5eb89-968a-4472-820e-60a0aef79a4e", "metadata": {}, "source": [ "### **QUERY 15.** What is the total number of late orders handled by each employee? An order is late if its shipped date is not before its required date." ] }, { "cell_type": "code", "execution_count": null, "id": "7e0eed7c-6890-42af-b3bd-3a20817679eb", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT e.employee_id AS \"Employee ID\", \n", " e.last_name AS \"Last name\", \n", " e.first_name AS \"First name\", \n", " COUNT(*) AS \"Late orders\"\n", " FROM employees e\n", " JOIN orders o\n", " ON o.employee_id = e.employee_id\n", " WHERE o.shipped_date >= o.required_date\n", " GROUP BY e.employee_id\n", " ORDER BY e.employee_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "b19f1aeb-7047-431f-ae75-69201a1d35bb", "metadata": {}, "source": [ "#### Note that employee 5 had no late orders." ] }, { "cell_type": "markdown", "id": "b61f0862-6218-4f6f-94c4-87b25639cb97", "metadata": {}, "source": [ "### **QUERY 16.** For each employee, display the **total number of orders** and the **number of late orders** handled by that employee." ] }, { "cell_type": "code", "execution_count": null, "id": "b90a0af2-5984-4b77-834b-40fccb50be7d", "metadata": {}, "outputs": [], "source": [ "# Use two CTEs, one for total orders and one for late orders.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " WITH\n", " total_orders AS (\n", " \t\tSELECT e.employee_id,\n", " \t\t\t COUNT(*) AS total_orders\n", " \t\tFROM employees e\n", " \t\tJOIN orders o USING (employee_id)\n", " \t\tGROUP BY e.employee_id\n", " ),\n", " late_orders AS (\n", " \t\tSELECT e.employee_id, \n", " \t\t\t COUNT(*) AS late_orders\n", " \t\tFROM employees e\n", " \t\tJOIN orders o USING (employee_id)\n", " \t\tWHERE o.shipped_date >= o.required_date\n", " \t\tGROUP BY e.employee_id\n", " )\n", " SELECT e.employee_id AS \"Employee ID\", \n", " e.last_name AS \"Last name\", \n", " e.first_name AS \"First name\", \n", " total_orders AS \"Total orders\", \n", " late_orders AS \"Late orders\"\n", " FROM employees e\n", " JOIN total_orders total USING (employee_id)\n", " JOIN late_orders late USING (employee_id)\n", " ORDER BY e.employee_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "c9ae0387-b745-44fa-90f0-6f52cb9df257", "metadata": {}, "source": [ "#### What is wrong with this result?" ] }, { "cell_type": "code", "execution_count": null, "id": "031d0d37-8bfa-4b2b-b8a0-61f61d5e1764", "metadata": {}, "outputs": [], "source": [ "# Do a LEFT OUTER JOIN on late_orders.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " WITH\n", " total_orders AS (\n", " \t\tSELECT e.employee_id,\n", " \t\t\t COUNT(*) AS total_orders\n", " \t\tFROM employees e\n", " \t\tJOIN orders o USING (employee_id)\n", " \t\tGROUP BY e.employee_id\n", " ),\n", " late_orders AS (\n", " \t\tSELECT e.employee_id, \n", " \t\t\t COUNT(*) AS late_orders\n", " \t\tFROM employees e\n", " \t\tJOIN orders o USING (employee_id)\n", " \t\tWHERE o.shipped_date >= o.required_date\n", " \t\tGROUP BY e.employee_id\n", " )\n", " SELECT e.employee_id AS \"Employee ID\", \n", " e.last_name AS \"Last name\", \n", " e.first_name AS \"First name\", \n", " total_orders AS \"Total orders\", \n", " late_orders AS \"Late orders\"\n", " FROM employees e\n", " JOIN total_orders total USING (employee_id)\n", " LEFT JOIN late_orders late USING (employee_id)\n", " ORDER BY e.employee_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "d0faddef-cee7-4b29-9301-dadd74be5cce", "metadata": {}, "source": [ "#### Now employee 5 is included. Can we fix the NaN (not a number) in the dataframe?" ] }, { "cell_type": "code", "execution_count": null, "id": "d71d7cc9-de5d-409e-8b74-ae834af1c8c8", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " WITH\n", " total_orders AS (\n", " \t\tSELECT e.employee_id,\n", " \t\t\t COUNT(*) AS total_orders\n", " \t\tFROM employees e\n", " \t\tJOIN orders o USING (employee_id)\n", " \t\tGROUP BY e.employee_id\n", " ),\n", " late_orders AS (\n", " \t\tSELECT e.employee_id, \n", " \t\t\t COUNT(*) AS late_orders\n", " \t\tFROM employees e\n", " \t\tJOIN orders o USING (employee_id)\n", " \t\tWHERE o.shipped_date >= o.required_date\n", " \t\tGROUP BY e.employee_id\n", " )\n", " SELECT e.employee_id AS \"Employee ID\", \n", " e.last_name AS \"Last name\", \n", " e.first_name AS \"First name\", \n", " total_orders AS \"Total orders\", \n", "\t\t IF(ISNULL(late_orders), 0, late_orders) \n", "\t\t AS \"Late Orders\"\n", " FROM employees e\n", " JOIN total_orders total USING (employee_id)\n", " LEFT JOIN late_orders late USING (employee_id)\n", " ORDER BY e.employee_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "a53add4a-f84f-43cc-8eae-21654e84d293", "metadata": {}, "source": [ "### **QUERY 17.** Rate each customer by the total value of the customer’s orders (the sum of quantity*price) during 2016. A customer is\n", "- ### \"Very high\" if the total value exceeds \\\\$10,000\n", "- ### \"High\" if the total value is \\\\$5,000 to \\\\$10,000\n", "- ### \"Medium\" if the total value is \\\\$1,000 to \\\\$5,000\n", "- ### otherwise \"Low\"." ] }, { "cell_type": "code", "execution_count": null, "id": "0bcd67a0-9b90-424d-8e6b-2b2e18db7d6a", "metadata": {}, "outputs": [], "source": [ "# What is each customer's total order?\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT c.customer_id, c.company_name,\n", " \t SUM(d.quantity*d.unit_price) AS total\n", " FROM customers c\n", " JOIN orders o\n", " ON o.customer_id = c.customer_id\n", " JOIN order_details d\n", " ON d.order_id = o.order_id\n", " WHERE YEAR(o.order_date) = 2016\n", " GROUP BY c.customer_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "2eef6a22-9d81-475f-a1f0-cf78a3314465", "metadata": {}, "outputs": [], "source": [ "# Make the above query a CTE.\n", "# Use CASE to set the customer ratings.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " WITH\n", " orders_2016 AS (\n", " \t\tSELECT c.customer_id, c.company_name,\n", " \t\t\t SUM(d.quantity*d.unit_price) AS total\n", " \t\tFROM customers c\n", " \t\tJOIN orders o\n", " \t\t ON o.customer_id = c.customer_id\n", " \t\tJOIN order_details d\n", " \t\t ON d.order_id = o.order_id\n", " \t\tWHERE YEAR(o.order_date) = 2016\n", " \t\tGROUP BY c.customer_id\n", " \t)\n", " SELECT o.customer_id AS \"Customer ID\", \n", " o.company_name AS \"Company name\",\n", " CONCAT('$', FORMAT(o.total, 2)) AS \"Total order\",\n", " CASE \n", " WHEN o.total > 10000 THEN 'Very high'\n", " WHEN o.total > 5000 THEN 'High'\n", " WHEN o.total > 1000 THEN 'Medium'\n", " ELSE 'Low'\n", " END AS \"Customer rating\" \n", " FROM orders_2016 o\n", " ORDER BY o.customer_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "3ffbad13-1c2c-408f-8e72-922ab20ff1f0", "metadata": {}, "source": [ "### **QUERY 18.** Due to an accounting error, some orders of quantities 60 or more were entered more than once with **different product IDs** but with the **same quantity**. Display the order ID, product ID, and quantity of the erroneous entries." ] }, { "cell_type": "code", "execution_count": null, "id": "b9cfd356-4c15-4b19-86a5-451c6a028459", "metadata": {}, "outputs": [], "source": [ "# Which orders have quantity > 60?\n", "\n", "import pandas as pd\n", "pd.options.display.max_rows = None\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT order_id AS \"Order ID\", \n", " product_id AS \"Product ID\", \n", " quantity AS \"Ordered quantity\"\n", " FROM order_details\n", " WHERE quantity >= 60\n", " ORDER BY order_id, quantity\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "cc2aa65a-37e1-4fca-92a3-a4e95e43b719", "metadata": {}, "outputs": [], "source": [ "# Only consider groups with the same order ID and quantity,\n", "# where the group size is greater than 1.\n", "# These are the erroneous order entries.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT order_id AS \"Order ID\", \n", " quantity AS \"Ordered quantity\",\n", " COUNT(*) AS Count\n", " FROM order_details\n", " WHERE quantity >= 60\n", " GROUP BY order_id, quantity\n", " HAVING Count > 1\n", " ORDER BY order_id, Count\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "89c30710-5afb-43d5-80b1-c0457a455860", "metadata": {}, "outputs": [], "source": [ "# Let's make the above query into a CTE that returns \n", "# the order ID and quantity of the erroneous orders. \n", "# Use it as a filter to display the different \n", "# product IDs of the errorneous orders.\n", "\n", "import pandas as pd\n", "pd.options.display.max_rows = None\n", "\n", "df_query(conn,\n", " \"\"\"\n", " WITH\n", " erroneous_orders AS (\n", " \t\tSELECT order_id, quantity\n", " \t\tFROM order_details\n", " \t\tWHERE quantity >= 60\n", " \t\tGROUP BY order_id, quantity\n", " \t\tHAVING COUNT(*) > 1\n", " \t)\n", " SELECT od.order_id AS \"Erroneous order\",\n", " od.product_id AS \"Product ID\", \n", " od.quantity AS \"Ordered quantity\"\n", " FROM order_details od\n", " JOIN erroneous_orders eo\n", " ON eo.order_id = od.order_id\n", " AND eo.quantity = od.quantity\n", " ORDER BY od.order_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "c3d7b7c4-80fe-4a21-be02-464f0a2de2bd", "metadata": {}, "outputs": [], "source": [ "conn.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "62a6fa6c-94af-4b90-9945-ea7f3ee56b42", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }