{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "e914e4b2-5382-4ddd-b1cf-6ad94a8c1834", "metadata": {}, "outputs": [], "source": [ "from data201 import db_connection, df_query" ] }, { "cell_type": "code", "execution_count": null, "id": "b32609e1-9a35-400f-8a30-b398cfaccc0a", "metadata": {}, "outputs": [], "source": [ "conn = db_connection(config_file = 'northwind.ini')" ] }, { "cell_type": "markdown", "id": "50d89327-e93e-4375-a365-03a306c20fac", "metadata": {}, "source": [ "# Single-Table Queries" ] }, { "cell_type": "markdown", "id": "5ed61c96-12a4-4387-8c2a-5ffaa758f8e0", "metadata": {}, "source": [ "### **QUERY 1.** Display the product ID and name for those products whose name includes “queso”." ] }, { "cell_type": "code", "execution_count": null, "id": "af35bedd-c024-41b7-ab1d-76b4732d596e", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT product_id, product_name\n", " FROM products\n", " WHERE product_name LIKE '%queso%'\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "5de5f135-9818-4389-b4c0-084a386c438e", "metadata": {}, "source": [ "### **QUERY 2.** Display all orders from any Latin American country”." ] }, { "cell_type": "code", "execution_count": null, "id": "dff232dd-7da0-40ba-93b5-9114225e14e3", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT order_id, customer_id, ship_country\n", " FROM orders\n", " WHERE ship_country IN \n", " ('Brazil', 'Mexico', 'Argentina', 'Venezuela')\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "f1d53d81-5a8a-464f-955f-88831738773d", "metadata": {}, "source": [ "### **QUERY 3.** For each line item in order details, display the order ID, product ID, unit price, quantity, and total price (= unit price * quantity). Format the monetary amount." ] }, { "cell_type": "code", "execution_count": null, "id": "d556501f-c196-4f70-bb8f-beb2802bb5e7", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT order_id AS \"Order ID\", \n", " product_id AS \"Product ID\", \n", " CONCAT('$', FORMAT(unit_price,2)) AS \"Unit price\", \n", " quantity AS \"Quantity\", \n", " CONCAT('$', FORMAT(unit_price*quantity, 2)) AS \"Total price\"\n", " FROM order_details\n", " ORDER BY order_id, product_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "bffcada4-82f7-471b-8480-7ff354aef585", "metadata": {}, "source": [ "### **QUERY 4.** When was the first order?" ] }, { "cell_type": "code", "execution_count": null, "id": "0c118a97-e3a6-41ad-9c60-8afda4dcd52b", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT MIN(order_date) AS \"First order\"\n", " FROM orders\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "a204f1aa-8593-4808-831c-23e148659272", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT order_date AS \"First order\"\n", " FROM orders\n", " ORDER BY order_date \n", " LIMIT 1\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "d0f36d33-d2ad-4d8e-8d23-9230df701084", "metadata": {}, "source": [ "### **QUERY 5.** Northwind has customers in which countries and how many in each country? Sort the results alphabetically." ] }, { "cell_type": "code", "execution_count": null, "id": "5039df75-dfe0-4aec-afa3-a7b08b349468", "metadata": {}, "outputs": [], "source": [ "# Which countries?\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT country AS \"Country\",\n", " company_name AS \"Customer name\"\n", " FROM customers\n", " ORDER BY country ASC\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "b328e80a-f7e6-4560-80c1-7480bc184cb2", "metadata": {}, "outputs": [], "source": [ "# Count the customers in each country.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT country AS \"Country\",\n", " COUNT(*) AS \"Customer count\"\n", " FROM customers\n", " GROUP BY country\n", " ORDER BY country ASC\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "a1b672e8-aa9a-4217-af9d-522ec6ac4b0e", "metadata": {}, "source": [ "### **QUERY 6.** What is the count of customers in each city? Show the country, city, and count. Sort by count in descending order and then alphabetically by country and city in ascending order." ] }, { "cell_type": "code", "execution_count": null, "id": "95091a3a-aa52-4a60-bece-50d7f7745c8f", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT country AS \"Country\", \n", " city AS \"City\", \n", " COUNT(*) AS \"Total customer count\"\n", " FROM customers\n", " GROUP BY country, city\n", " ORDER BY COUNT(*) DESC, \n", " country ASC, \n", " city ASC\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "872c55dd-539c-4b26-8b87-e1be0a572380", "metadata": {}, "source": [ "### **QUERY 7.** Display each customer’s ID, company name, and region. Sort alphabetically by region. Customers who have no recorded region should be sorted at the bottom by customer ID." ] }, { "cell_type": "code", "execution_count": null, "id": "f577dc13-2e4a-4d6a-ac37-f2a97447d418", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT customer_id AS \"Customer ID\", \n", " company_name AS \"Company name\", \n", " region AS \"Region\"\n", " FROM customers\n", " ORDER BY region, customer_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "839de9c3-7572-4c6d-a6d7-5c215f7290ee", "metadata": {}, "source": [ "#### Oops! Customers with no regions (None) are at the top. Let's use a trick to move them to the bottom." ] }, { "cell_type": "code", "execution_count": null, "id": "b1a9f060-27fa-4b1b-9525-572478242ec7", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT customer_id AS \"Customer ID\", \n", " company_name AS \"Company name\", \n", " region AS \"Region\",\n", " IF(region IS NULL, 1, 0) AS region_index\n", " FROM customers\n", " ORDER BY region_index, region, customer_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "2529a32e-6a0f-4e25-847f-e708f9765f9f", "metadata": {}, "source": [ "#### We don't want to see the region index. We can move it to the ORDER BY clause. Also, let's replace the Nones by blanks." ] }, { "cell_type": "code", "execution_count": null, "id": "35b6226b-42bb-4274-97b4-bcaca5085cee", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "pd.options.display.max_rows = None\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT customer_id AS \"Customer ID\", \n", " company_name AS \"Company name\", \n", " IF(region IS NULL, \"\", region) AS \"Region\"\n", " FROM customers\n", " ORDER BY IF(region IS NULL, 1, 0), \n", " region, customer_id\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "ac3ad827-edc8-4b24-ab9d-3d50eccfffca", "metadata": {}, "source": [ "### **QUERY 8.** In the year 2015, which three countries had the highest average freight charges? Sort the three in descending order." ] }, { "cell_type": "code", "execution_count": null, "id": "db3f05a1-1ff6-40bc-9215-1a66250950a9", "metadata": {}, "outputs": [], "source": [ "# First, let's see what all the averages are.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT ship_country AS \"Ship country\",\n", " AVG(freight) AS \"Average freight\"\n", " FROM orders\n", " WHERE YEAR(order_date) = 2015\n", " GROUP BY ship_country\n", " ORDER BY AVG(freight) DESC\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "bec86812-4bda-43ea-aa1e-a3a3b038e9b8", "metadata": {}, "outputs": [], "source": [ "# Then we'll take the top 3.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT ship_country AS \"Ship country\",\n", " AVG(freight) AS \"Average freight\"\n", " FROM orders\n", " WHERE YEAR(order_date) = 2015\n", " GROUP BY ship_country\n", " ORDER BY AVG(freight) DESC\n", " LIMIT 3\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "5fb6ad27-2822-4598-aacf-e8537f819ab9", "metadata": {}, "source": [ "#### Why is this query wrong?" ] }, { "cell_type": "code", "execution_count": null, "id": "9ea67b5d-43c4-439a-b6ef-7ae716158fc7", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT ship_country AS \"Ship country\",\n", " AVG(freight) AS \"Average freight\"\n", " FROM orders\n", " WHERE order_date BETWEEN '20150101' AND '20151231'\n", " GROUP BY ship_country\n", " ORDER BY AVG(freight) DESC\n", " LIMIT 3\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "22311966-0112-4a39-a310-344271f56508", "metadata": {}, "source": [ "#### Why did we lose France?" ] }, { "cell_type": "code", "execution_count": null, "id": "c91097e2-d314-43d4-a1ff-5eee302eac38", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT order_date AS \"Order date\", \n", " freight AS \"Freight\", \n", " ship_country AS \"Ship country\"\n", " FROM orders\n", " WHERE ship_country = 'France'\n", " AND YEAR(order_date) = 2015\n", " ORDER BY order_date\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "e6ab88eb-2707-4949-a9e6-ad200908cc5f", "metadata": {}, "source": [ "#### Look closely at the last row above!" ] }, { "cell_type": "markdown", "id": "cd9dfc6b-1b25-41bd-89e2-6e3ba300dfcf", "metadata": {}, "source": [ "### **QUERY 9.** Randomly pick 5 customers for a survey. Show the customer ID and company name. There should be a different set of customers picked each time." ] }, { "cell_type": "code", "execution_count": null, "id": "3e6ab322-16c7-4f1f-a91a-d6491173a4ec", "metadata": {}, "outputs": [], "source": [ "# Sort randomly, then pick the top 5.\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT customer_id AS \"Customer ID\", \n", " company_name AS \"Company name\"\n", " FROM customers\n", " ORDER BY RAND()\n", " LIMIT 5\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "6cb8a50c-2d7b-4384-9713-43d431d520af", "metadata": {}, "source": [ "### **QUERY 10.** Randomly pick around 10% of the customers for a survey. Show the customer ID and company name. There should be a different set of customers picked each time.\n", "#### **TIP:** Function RAND generates *uniformly distributed* random values." ] }, { "cell_type": "code", "execution_count": null, "id": "840f517a-30bd-40fe-97ae-0607f385aee5", "metadata": {}, "outputs": [], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT customer_id, company_name\n", " FROM customers\n", " WHERE RAND() < 0.1\n", " ORDER BY RAND()\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "c3d7b7c4-80fe-4a21-be02-464f0a2de2bd", "metadata": {}, "outputs": [], "source": [ "conn.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "963d05bd-5cec-4130-ac52-268821d00951", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }