{
"cells": [
{
"cell_type": "markdown",
"id": "19163f05-d595-4caa-9406-a406be482113",
"metadata": {},
"source": [
"# Assignment #5 Solutions\n",
"#### A good practice in applications is to use column aliases to give result set columns descriptive names."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4f97970b-6879-484a-a6b4-6f20d448c884",
"metadata": {},
"outputs": [],
"source": [
"from data201 import db_connection, df_query"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6e08ff77-36d4-46dd-a5e9-6e0e69edc45c",
"metadata": {},
"outputs": [],
"source": [
"conn = db_connection(config_file = 'ZAGI-sales.ini')\n",
"cursor = conn.cursor()"
]
},
{
"cell_type": "markdown",
"id": "616a9710-00e5-4bb5-95e5-fb56f78320c7",
"metadata": {},
"source": [
"### **1.** Display the region ID, region name, and number of stores in the region for all regions."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f56b0947-f3ed-4ebc-85f8-77c0f3a52763",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Region ID | \n",
" Region name | \n",
" Region store count | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" C | \n",
" Chicagoland | \n",
" 2 | \n",
"
\n",
" \n",
" 1 | \n",
" T | \n",
" Tristate | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Region ID Region name Region store count\n",
"0 C Chicagoland 2\n",
"1 T Tristate 1"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_query(conn,\n",
" \"\"\"\n",
" SELECT r.region_id AS 'Region ID',\n",
" r.region_name AS 'Region name', \n",
" COUNT(*) AS 'Region store count'\n",
" FROM region r\n",
" JOIN store s\n",
" ON s.region_id = r.region_id\n",
" GROUP BY r.region_id, r.region_name\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "10c8af26-9fb2-42d2-b783-5be273d4aabb",
"metadata": {},
"source": [
"### **2.** For each product category, display the category ID, category name, and average price of a product in the category."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7230bc4a-a2e3-4f3e-a21e-39b72030b7ed",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Category ID | \n",
" Category name | \n",
" Average product price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" CP | \n",
" Camping | \n",
" 166.666667 | \n",
"
\n",
" \n",
" 1 | \n",
" FW | \n",
" Footwear | \n",
" 58.333333 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Category ID Category name Average product price\n",
"0 CP Camping 166.666667\n",
"1 FW Footwear 58.333333"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_query(conn,\n",
" \"\"\"\n",
" SELECT c.category_id AS 'Category ID', \n",
" c.category_name AS 'Category name', \n",
" AVG(p.product_price) AS 'Average product price'\n",
" FROM product p, category c\n",
" WHERE p.category_id = c.category_id\n",
" GROUP BY c.category_id, c.category_name\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "722d1353-48c0-4c4b-920e-1b03548c2570",
"metadata": {},
"source": [
"### **3.** For each product category, display the category ID and the total number of items purchased in the category."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "cad63eb1-5f33-4b84-8896-a75c2e870503",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Category ID | \n",
" Total products purchased | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" CP | \n",
" 5.0 | \n",
"
\n",
" \n",
" 1 | \n",
" FW | \n",
" 13.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Category ID Total products purchased\n",
"0 CP 5.0\n",
"1 FW 13.0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_query(conn,\n",
" \"\"\"\n",
" SELECT p.category_id AS 'Category ID', \n",
" SUM(s.no_of_items) AS 'Total products purchased'\n",
" FROM sold_via s\n",
" JOIN product p USING (product_id)\n",
" GROUP BY p.category_id\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "dcb26e78-a6b6-479e-b505-252b2b0079df",
"metadata": {},
"source": [
"### **4.** Display the TID and the total number of items (of all products) sold within the transaction for all sales transactions whose total number of items (of all products) sold within the transaction is greater than five."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b7ccadbb-e686-4cdd-bddf-4cbb57e7eaf5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Tranaction ID | \n",
" Total products sold > 5 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" T333 | \n",
" 6.0 | \n",
"
\n",
" \n",
" 1 | \n",
" T555 | \n",
" 7.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Tranaction ID Total products sold > 5\n",
"0 T333 6.0\n",
"1 T555 7.0"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_query(conn,\n",
" \"\"\"\n",
" SELECT tid AS 'Tranaction ID', \n",
" SUM(no_of_items) AS 'Total products sold > 5'\n",
" FROM sold_via\n",
" GROUP BY tid\n",
" HAVING SUM(no_of_items) > 5\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f83ae713-0b17-4da3-9d01-678cde05f82b",
"metadata": {},
"source": [
"### **5.** Display the product IDs and the product names of the cheapest products.\n",
"#### We can't simply sort the procts by price in ascending order and then use `LIMIT 1` to pick the product at the top of the order. There can be more than one product with the lowest price. Therefore, we need the nested `SELECT` to first determine the lowest price, and the outer `SELECT` matches products to that price."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "59df5153-187f-4960-8112-393b6c1ceff8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product ID | \n",
" Cheapest products | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3X3 | \n",
" Cosy Sock | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product ID Cheapest products\n",
"0 3X3 Cosy Sock"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Product ID', \n",
" product_name AS 'Cheapest products'\n",
" FROM product\n",
" WHERE product_price = (SELECT MIN(product_price)\n",
" FROM product)\n",
" \"\"\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d191eb2d-9000-4682-aeef-2273f4d0258d",
"metadata": {},
"source": [
"### **6.** Display the product ID for the product that has been sold the most (i.e., that has been sold in the highest quantity).\n",
"\n",
"#### The following query works with most versions of SQL. The nested SELECT in the HAVING clause selects the highest number of product sales."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6077a6fc-3447-4da5-8ca5-c148e9e8b827",
"metadata": {},
"outputs": [
{
"ename": "DatabaseError",
"evalue": "Execution failed on sql '\n SELECT product_id AS 'Highest selling products'\n FROM sold_via\n GROUP BY product_id\n HAVING SUM(no_of_items) = (SELECT MAX(SUM(no_of_items))\n FROM sold_via\n GROUP BY product_id)\n ': 1111 (HY000): Invalid use of group function",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mDatabaseError\u001b[0m Traceback (most recent call last)",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/pandas/io/sql.py:2674\u001b[0m, in \u001b[0;36mSQLiteDatabase.execute\u001b[0;34m(self, sql, params)\u001b[0m\n\u001b[1;32m 2673\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m-> 2674\u001b[0m cur\u001b[38;5;241m.\u001b[39mexecute(sql, \u001b[38;5;241m*\u001b[39margs)\n\u001b[1;32m 2675\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m cur\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/mysql/connector/cursor.py:537\u001b[0m, in \u001b[0;36mMySQLCursor.execute\u001b[0;34m(self, operation, params, multi)\u001b[0m\n\u001b[1;32m 536\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 537\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_handle_result(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_connection\u001b[38;5;241m.\u001b[39mcmd_query(stmt))\n\u001b[1;32m 538\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m InterfaceError \u001b[38;5;28;01mas\u001b[39;00m err:\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/mysql/connector/opentelemetry/context_propagation.py:97\u001b[0m, in \u001b[0;36mwith_context_propagation..wrapper\u001b[0;34m(cnx, *args, **kwargs)\u001b[0m\n\u001b[1;32m 96\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m OTEL_ENABLED \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m cnx\u001b[38;5;241m.\u001b[39motel_context_propagation:\n\u001b[0;32m---> 97\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m method(cnx, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n\u001b[1;32m 99\u001b[0m current_span \u001b[38;5;241m=\u001b[39m trace\u001b[38;5;241m.\u001b[39mget_current_span()\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/mysql/connector/connection.py:859\u001b[0m, in \u001b[0;36mMySQLConnection.cmd_query\u001b[0;34m(self, query, raw, buffered, raw_as_string)\u001b[0m\n\u001b[1;32m 858\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 859\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_handle_result(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_send_cmd(ServerCmd\u001b[38;5;241m.\u001b[39mQUERY, query))\n\u001b[1;32m 860\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m ProgrammingError \u001b[38;5;28;01mas\u001b[39;00m err:\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/mysql/connector/connection.py:635\u001b[0m, in \u001b[0;36mMySQLConnection._handle_result\u001b[0;34m(self, packet)\u001b[0m\n\u001b[1;32m 634\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m packet[\u001b[38;5;241m4\u001b[39m] \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m255\u001b[39m:\n\u001b[0;32m--> 635\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m get_exception(packet)\n\u001b[1;32m 637\u001b[0m \u001b[38;5;66;03m# We have a text result set\u001b[39;00m\n",
"\u001b[0;31mDatabaseError\u001b[0m: 1111 (HY000): Invalid use of group function",
"\nThe above exception was the direct cause of the following exception:\n",
"\u001b[0;31mDatabaseError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[8], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m df_query(conn,\n\u001b[1;32m 2\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;124;03m SELECT product_id AS 'Highest selling products'\u001b[39;00m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;124;03m FROM sold_via\u001b[39;00m\n\u001b[1;32m 5\u001b[0m \u001b[38;5;124;03m GROUP BY product_id\u001b[39;00m\n\u001b[1;32m 6\u001b[0m \u001b[38;5;124;03m HAVING SUM(no_of_items) = (SELECT MAX(SUM(no_of_items))\u001b[39;00m\n\u001b[1;32m 7\u001b[0m \u001b[38;5;124;03m FROM sold_via\u001b[39;00m\n\u001b[1;32m 8\u001b[0m \u001b[38;5;124;03m GROUP BY product_id)\u001b[39;00m\n\u001b[1;32m 9\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[1;32m 10\u001b[0m )\n",
"File \u001b[0;32m~/~mak/DATA201/ASSIGNMENTS/5/solutions/data201.py:68\u001b[0m, in \u001b[0;36mdf_query\u001b[0;34m(conn, sql)\u001b[0m\n\u001b[1;32m 65\u001b[0m warnings\u001b[38;5;241m.\u001b[39msimplefilter(action\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mignore\u001b[39m\u001b[38;5;124m'\u001b[39m, category\u001b[38;5;241m=\u001b[39m\u001b[38;5;167;01mUserWarning\u001b[39;00m)\n\u001b[1;32m 67\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m---> 68\u001b[0m df \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mread_sql_query(sql, conn)\n\u001b[1;32m 69\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m df \n\u001b[1;32m 70\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m Error \u001b[38;5;28;01mas\u001b[39;00m e:\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/pandas/io/sql.py:526\u001b[0m, in \u001b[0;36mread_sql_query\u001b[0;34m(sql, con, index_col, coerce_float, params, parse_dates, chunksize, dtype, dtype_backend)\u001b[0m\n\u001b[1;32m 523\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m dtype_backend \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m lib\u001b[38;5;241m.\u001b[39mno_default\n\u001b[1;32m 525\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m pandasSQL_builder(con) \u001b[38;5;28;01mas\u001b[39;00m pandas_sql:\n\u001b[0;32m--> 526\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m pandas_sql\u001b[38;5;241m.\u001b[39mread_query(\n\u001b[1;32m 527\u001b[0m sql,\n\u001b[1;32m 528\u001b[0m index_col\u001b[38;5;241m=\u001b[39mindex_col,\n\u001b[1;32m 529\u001b[0m params\u001b[38;5;241m=\u001b[39mparams,\n\u001b[1;32m 530\u001b[0m coerce_float\u001b[38;5;241m=\u001b[39mcoerce_float,\n\u001b[1;32m 531\u001b[0m parse_dates\u001b[38;5;241m=\u001b[39mparse_dates,\n\u001b[1;32m 532\u001b[0m chunksize\u001b[38;5;241m=\u001b[39mchunksize,\n\u001b[1;32m 533\u001b[0m dtype\u001b[38;5;241m=\u001b[39mdtype,\n\u001b[1;32m 534\u001b[0m dtype_backend\u001b[38;5;241m=\u001b[39mdtype_backend,\n\u001b[1;32m 535\u001b[0m )\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/pandas/io/sql.py:2738\u001b[0m, in \u001b[0;36mSQLiteDatabase.read_query\u001b[0;34m(self, sql, index_col, coerce_float, parse_dates, params, chunksize, dtype, dtype_backend)\u001b[0m\n\u001b[1;32m 2727\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mread_query\u001b[39m(\n\u001b[1;32m 2728\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 2729\u001b[0m sql,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 2736\u001b[0m dtype_backend: DtypeBackend \u001b[38;5;241m|\u001b[39m Literal[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnumpy\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnumpy\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 2737\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DataFrame \u001b[38;5;241m|\u001b[39m Iterator[DataFrame]:\n\u001b[0;32m-> 2738\u001b[0m cursor \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mexecute(sql, params)\n\u001b[1;32m 2739\u001b[0m columns \u001b[38;5;241m=\u001b[39m [col_desc[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01mfor\u001b[39;00m col_desc \u001b[38;5;129;01min\u001b[39;00m cursor\u001b[38;5;241m.\u001b[39mdescription]\n\u001b[1;32m 2741\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m chunksize \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/pandas/io/sql.py:2686\u001b[0m, in \u001b[0;36mSQLiteDatabase.execute\u001b[0;34m(self, sql, params)\u001b[0m\n\u001b[1;32m 2683\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m ex \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01minner_exc\u001b[39;00m\n\u001b[1;32m 2685\u001b[0m ex \u001b[38;5;241m=\u001b[39m DatabaseError(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mExecution failed on sql \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00msql\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mexc\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m-> 2686\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m ex \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mexc\u001b[39;00m\n",
"\u001b[0;31mDatabaseError\u001b[0m: Execution failed on sql '\n SELECT product_id AS 'Highest selling products'\n FROM sold_via\n GROUP BY product_id\n HAVING SUM(no_of_items) = (SELECT MAX(SUM(no_of_items))\n FROM sold_via\n GROUP BY product_id)\n ': 1111 (HY000): Invalid use of group function"
]
}
],
"source": [
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Highest selling products'\n",
" FROM sold_via\n",
" GROUP BY product_id\n",
" HAVING SUM(no_of_items) = (SELECT MAX(SUM(no_of_items))\n",
" FROM sold_via\n",
" GROUP BY product_id)\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "745fa170-aae5-4c7c-8e9e-0098f26964f6",
"metadata": {},
"source": [
"#### Could the problem here be with the nested `SELECT`?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "632bdaff-a115-4e16-8725-231d511bbacc",
"metadata": {},
"outputs": [
{
"ename": "DatabaseError",
"evalue": "Execution failed on sql '\n SELECT MAX(SUM(no_of_items))\n FROM sold_via\n GROUP BY product_id|\n ': 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 3",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mProgrammingError\u001b[0m Traceback (most recent call last)",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/pandas/io/sql.py:2674\u001b[0m, in \u001b[0;36mSQLiteDatabase.execute\u001b[0;34m(self, sql, params)\u001b[0m\n\u001b[1;32m 2673\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m-> 2674\u001b[0m cur\u001b[38;5;241m.\u001b[39mexecute(sql, \u001b[38;5;241m*\u001b[39margs)\n\u001b[1;32m 2675\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m cur\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/mysql/connector/cursor.py:537\u001b[0m, in \u001b[0;36mMySQLCursor.execute\u001b[0;34m(self, operation, params, multi)\u001b[0m\n\u001b[1;32m 536\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 537\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_handle_result(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_connection\u001b[38;5;241m.\u001b[39mcmd_query(stmt))\n\u001b[1;32m 538\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m InterfaceError \u001b[38;5;28;01mas\u001b[39;00m err:\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/mysql/connector/opentelemetry/context_propagation.py:97\u001b[0m, in \u001b[0;36mwith_context_propagation..wrapper\u001b[0;34m(cnx, *args, **kwargs)\u001b[0m\n\u001b[1;32m 96\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m OTEL_ENABLED \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m cnx\u001b[38;5;241m.\u001b[39motel_context_propagation:\n\u001b[0;32m---> 97\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m method(cnx, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n\u001b[1;32m 99\u001b[0m current_span \u001b[38;5;241m=\u001b[39m trace\u001b[38;5;241m.\u001b[39mget_current_span()\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/mysql/connector/connection.py:859\u001b[0m, in \u001b[0;36mMySQLConnection.cmd_query\u001b[0;34m(self, query, raw, buffered, raw_as_string)\u001b[0m\n\u001b[1;32m 858\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 859\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_handle_result(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_send_cmd(ServerCmd\u001b[38;5;241m.\u001b[39mQUERY, query))\n\u001b[1;32m 860\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m ProgrammingError \u001b[38;5;28;01mas\u001b[39;00m err:\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/mysql/connector/connection.py:635\u001b[0m, in \u001b[0;36mMySQLConnection._handle_result\u001b[0;34m(self, packet)\u001b[0m\n\u001b[1;32m 634\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m packet[\u001b[38;5;241m4\u001b[39m] \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m255\u001b[39m:\n\u001b[0;32m--> 635\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m get_exception(packet)\n\u001b[1;32m 637\u001b[0m \u001b[38;5;66;03m# We have a text result set\u001b[39;00m\n",
"\u001b[0;31mProgrammingError\u001b[0m: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 3",
"\nThe above exception was the direct cause of the following exception:\n",
"\u001b[0;31mDatabaseError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[9], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m df_query(conn,\n\u001b[1;32m 2\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;124;03m SELECT MAX(SUM(no_of_items))\u001b[39;00m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;124;03m FROM sold_via\u001b[39;00m\n\u001b[1;32m 5\u001b[0m \u001b[38;5;124;03m GROUP BY product_id|\u001b[39;00m\n\u001b[1;32m 6\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[1;32m 7\u001b[0m )\n",
"File \u001b[0;32m~/~mak/DATA201/ASSIGNMENTS/5/solutions/data201.py:68\u001b[0m, in \u001b[0;36mdf_query\u001b[0;34m(conn, sql)\u001b[0m\n\u001b[1;32m 65\u001b[0m warnings\u001b[38;5;241m.\u001b[39msimplefilter(action\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mignore\u001b[39m\u001b[38;5;124m'\u001b[39m, category\u001b[38;5;241m=\u001b[39m\u001b[38;5;167;01mUserWarning\u001b[39;00m)\n\u001b[1;32m 67\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m---> 68\u001b[0m df \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mread_sql_query(sql, conn)\n\u001b[1;32m 69\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m df \n\u001b[1;32m 70\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m Error \u001b[38;5;28;01mas\u001b[39;00m e:\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/pandas/io/sql.py:526\u001b[0m, in \u001b[0;36mread_sql_query\u001b[0;34m(sql, con, index_col, coerce_float, params, parse_dates, chunksize, dtype, dtype_backend)\u001b[0m\n\u001b[1;32m 523\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m dtype_backend \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m lib\u001b[38;5;241m.\u001b[39mno_default\n\u001b[1;32m 525\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m pandasSQL_builder(con) \u001b[38;5;28;01mas\u001b[39;00m pandas_sql:\n\u001b[0;32m--> 526\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m pandas_sql\u001b[38;5;241m.\u001b[39mread_query(\n\u001b[1;32m 527\u001b[0m sql,\n\u001b[1;32m 528\u001b[0m index_col\u001b[38;5;241m=\u001b[39mindex_col,\n\u001b[1;32m 529\u001b[0m params\u001b[38;5;241m=\u001b[39mparams,\n\u001b[1;32m 530\u001b[0m coerce_float\u001b[38;5;241m=\u001b[39mcoerce_float,\n\u001b[1;32m 531\u001b[0m parse_dates\u001b[38;5;241m=\u001b[39mparse_dates,\n\u001b[1;32m 532\u001b[0m chunksize\u001b[38;5;241m=\u001b[39mchunksize,\n\u001b[1;32m 533\u001b[0m dtype\u001b[38;5;241m=\u001b[39mdtype,\n\u001b[1;32m 534\u001b[0m dtype_backend\u001b[38;5;241m=\u001b[39mdtype_backend,\n\u001b[1;32m 535\u001b[0m )\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/pandas/io/sql.py:2738\u001b[0m, in \u001b[0;36mSQLiteDatabase.read_query\u001b[0;34m(self, sql, index_col, coerce_float, parse_dates, params, chunksize, dtype, dtype_backend)\u001b[0m\n\u001b[1;32m 2727\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mread_query\u001b[39m(\n\u001b[1;32m 2728\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 2729\u001b[0m sql,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 2736\u001b[0m dtype_backend: DtypeBackend \u001b[38;5;241m|\u001b[39m Literal[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnumpy\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnumpy\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 2737\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DataFrame \u001b[38;5;241m|\u001b[39m Iterator[DataFrame]:\n\u001b[0;32m-> 2738\u001b[0m cursor \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mexecute(sql, params)\n\u001b[1;32m 2739\u001b[0m columns \u001b[38;5;241m=\u001b[39m [col_desc[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;28;01mfor\u001b[39;00m col_desc \u001b[38;5;129;01min\u001b[39;00m cursor\u001b[38;5;241m.\u001b[39mdescription]\n\u001b[1;32m 2741\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m chunksize \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n",
"File \u001b[0;32m/Applications/anaconda3/lib/python3.12/site-packages/pandas/io/sql.py:2686\u001b[0m, in \u001b[0;36mSQLiteDatabase.execute\u001b[0;34m(self, sql, params)\u001b[0m\n\u001b[1;32m 2683\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m ex \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01minner_exc\u001b[39;00m\n\u001b[1;32m 2685\u001b[0m ex \u001b[38;5;241m=\u001b[39m DatabaseError(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mExecution failed on sql \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00msql\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mexc\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m-> 2686\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m ex \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mexc\u001b[39;00m\n",
"\u001b[0;31mDatabaseError\u001b[0m: Execution failed on sql '\n SELECT MAX(SUM(no_of_items))\n FROM sold_via\n GROUP BY product_id|\n ': 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 3"
]
}
],
"source": [
"df_query(conn,\n",
" \"\"\"\n",
" SELECT MAX(SUM(no_of_items))\n",
" FROM sold_via\n",
" GROUP BY product_id|\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "382563b3-2413-4540-95dd-a9821f3a62f0",
"metadata": {},
"source": [
"#### Yes, that's where the problem is. MySQL apparently doesn't like the combination MAX(SUM(no_of_items)) in the SELECT. No, I don't know why MySQL is missing certain language features. \n",
"\n",
"#### So let's find the highest number of product sales in another way. First, what are the number of sales per product?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b6094b67-a127-4f82-bbcd-8aac1aefcf48",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product ID | \n",
" Total number of sales | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1X1 | \n",
" 2.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2X2 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 3X3 | \n",
" 5.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 4X4 | \n",
" 5.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 5X5 | \n",
" 2.0 | \n",
"
\n",
" \n",
" 5 | \n",
" 6X6 | \n",
" 1.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product ID Total number of sales\n",
"0 1X1 2.0\n",
"1 2X2 3.0\n",
"2 3X3 5.0\n",
"3 4X4 5.0\n",
"4 5X5 2.0\n",
"5 6X6 1.0"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Query A\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Product ID', \n",
" SUM(no_of_items) AS 'Total number of sales'\n",
" FROM sold_via\n",
" GROUP BY product_id\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b82f9f2e-c939-4acd-954c-e5081197269a",
"metadata": {},
"source": [
"#### We're going to nest the above `SELECT` in an outer `SELECT`'s `FROM` clause. A `FROM` cause must contain table names. The nested `SELECT` will have the role of a **virtual table**, and so we'll give it the name `total_sales_table`. Also, we'll give its total number of sales column the valid column name `total_sales`. Then the outer `SELECT` in Query C below will be able to select the `MAX` from the `total_sales` column of the virtual table."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "63120a52-476f-42f1-9d3f-b377db6907a1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Highest total sales | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 5.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Highest total sales\n",
"0 5.0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Query B\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT MAX(total_sales) AS 'Highest total sales'\n",
" FROM (SELECT SUM(no_of_items) AS total_sales\n",
" FROM sold_via\n",
" GROUP BY product_id) \n",
" AS total_sales_table\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ea34c9e5-5c95-4289-bcf0-011c86436926",
"metadata": {},
"source": [
"#### Another way to obtain the highest product sales *number* is to sort the result of Query A in descending order and then take the top row.\n",
"\n",
"#### Note that if you rename a column with an alias, the other parts of the query must use that alias, as in the `ORDER BY`.\n",
"\n",
"#### **WARNING:** `ORDER BY` fails to sort if the alias is a string."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4add9b82-a515-4284-8c3d-d46bf948d6b6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" highest_total_sales | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 5.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" highest_total_sales\n",
"0 5.0"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Query B'\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT SUM(no_of_items) AS highest_total_sales\n",
" FROM sold_via\n",
" GROUP BY product_id\n",
" ORDER BY highest_total_sales DESC\n",
" LIMIT 1\n",
" \"\"\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "978f1966-1213-4084-976c-45642979d0c8",
"metadata": {},
"source": [
"#### In either case, since we have a `SELECT` command that returns the highest sales, we can use it to select the products whose total sales matches the highest sales. We nest the `SELECT` command in the `HAVING` clause of the outer `SELECT`.\n",
"\n",
"#### Query C below combines Query A and Query B."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "5e9d4dc1-8b65-46e8-9a25-f4dabde83ef9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Highest selling products | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3X3 | \n",
"
\n",
" \n",
" 1 | \n",
" 4X4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Highest selling products\n",
"0 3X3\n",
"1 4X4"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Query C = combined Query A and Query B\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Highest selling products'\n",
" FROM sold_via\n",
" GROUP BY product_id\n",
" HAVING SUM(no_of_items) = \n",
" (SELECT MAX(total_sales) AS highest_sales\n",
" FROM (SELECT sum(no_of_items) AS total_sales\n",
" FROM sold_via\n",
" GROUP BY product_id) \n",
" AS total_sales_table)\n",
" ORDER BY product_id\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "38a9f83a-a061-496f-bd3f-7a64a28d61da",
"metadata": {},
"source": [
"#### Query C' below combines Query A and Query B'."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "02d76ad1-e726-43d1-90c2-9d4162a4a93b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Highest selling products | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3X3 | \n",
"
\n",
" \n",
" 1 | \n",
" 4X4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Highest selling products\n",
"0 3X3\n",
"1 4X4"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Query C' = combined Query A and Query B'\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Highest selling products'\n",
" FROM sold_via\n",
" GROUP BY product_id\n",
" HAVING SUM(no_of_items) = \n",
" (SELECT SUM(no_of_items) AS highest_sales\n",
" FROM sold_via\n",
" GROUP BY product_id\n",
" ORDER BY highest_sales DESC\n",
" LIMIT 1)\n",
" ORDER BY product_id\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "3a9ac60d-8c3c-4ee8-b6fe-9c21a67f7041",
"metadata": {},
"source": [
"#### **7.** Rewrite the following query using a join instead of the nested query. Your rewritten query should produce the same results.\n",
"\n",
"#### _For each product that has more than three items sold within all sales transactions, retrieve the product ID, product name, and product price._"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "7977927b-a6e8-4b84-90be-1dadfd72d772",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product ID | \n",
" Product name | \n",
" Price for more than three sold | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3X3 | \n",
" Cosy Sock | \n",
" 15.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 4X4 | \n",
" Dura Boot | \n",
" 90.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product ID Product name Price for more than three sold\n",
"0 3X3 Cosy Sock 15.0\n",
"1 4X4 Dura Boot 90.0"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Nested query\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Product ID',\n",
" product_name AS 'Product name', \n",
" product_price AS 'Price for more than three sold'\n",
" FROM product\n",
" WHERE product_id IN (SELECT product_id\n",
" FROM sold_via\n",
" GROUP BY product_id\n",
" HAVING SUM(no_of_items) > 3)\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "d952aa59-e775-4f46-9986-334d410c23bd",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product ID | \n",
" Product name | \n",
" Price for more than three sold | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3X3 | \n",
" Cosy Sock | \n",
" 15.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 4X4 | \n",
" Dura Boot | \n",
" 90.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product ID Product name Price for more than three sold\n",
"0 3X3 Cosy Sock 15.0\n",
"1 4X4 Dura Boot 90.0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rewritten using JOIN\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Product ID',\n",
" product_name AS 'Product name', \n",
" product_price AS 'Price for more than three sold'\n",
" FROM product\n",
" JOIN sold_via USING (product_id)\n",
" GROUP BY product_id, product_name, product_price\n",
" HAVING SUM(no_of_items) > 3\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b54633cd-cd35-4c2d-a955-ae04761d79b5",
"metadata": {},
"source": [
"#### **8.** Rewrite the following query using a join instead of the nested query. Your rewritten query should produce the same results.\n",
"\n",
"#### _For each product whose items were sold in more than one sales transaction, retrieve the product id, product name and product price._"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "816be4ae-bc77-4845-80f9-43ce4422c12c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product ID | \n",
" Product name | \n",
" Prices of products sold in more than one transaction | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1X1 | \n",
" Zzz Bag | \n",
" 100.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2X2 | \n",
" Easy Boot | \n",
" 70.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 4X4 | \n",
" Dura Boot | \n",
" 90.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product ID Product name \\\n",
"0 1X1 Zzz Bag \n",
"1 2X2 Easy Boot \n",
"2 4X4 Dura Boot \n",
"\n",
" Prices of products sold in more than one transaction \n",
"0 100.0 \n",
"1 70.0 \n",
"2 90.0 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Nested query\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Product ID',\n",
" product_name AS 'Product name', \n",
" product_price AS \n",
" 'Prices of products sold in more than one transaction'\n",
" FROM product\n",
" WHERE product_id IN (SELECT product_id\n",
" FROM sold_via\n",
" GROUP BY product_id\n",
" HAVING COUNT(product_id) > 1)\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "9f0edc5e-94a3-4f58-aac1-c16513ca81dd",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Product ID | \n",
" Product name | \n",
" Prices of products sold in more than one transaction | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1X1 | \n",
" Zzz Bag | \n",
" 100.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2X2 | \n",
" Easy Boot | \n",
" 70.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 4X4 | \n",
" Dura Boot | \n",
" 90.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Product ID Product name \\\n",
"0 1X1 Zzz Bag \n",
"1 2X2 Easy Boot \n",
"2 4X4 Dura Boot \n",
"\n",
" Prices of products sold in more than one transaction \n",
"0 100.0 \n",
"1 70.0 \n",
"2 90.0 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rewritten using JOIN\n",
"\n",
"df_query(conn,\n",
" \"\"\"\n",
" SELECT product_id AS 'Product ID',\n",
" product_name AS 'Product name', \n",
" product_price AS \n",
" 'Prices of products sold in more than one transaction'\n",
" FROM product\n",
" JOIN sold_via USING (product_id)\n",
" GROUP BY product_id, product_name, product_price\n",
" HAVING COUNT(tid) > 1\n",
" \"\"\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "81ee622b-ad25-44c0-a335-76bae156c6ae",
"metadata": {},
"outputs": [],
"source": [
"cursor.close()\n",
"conn.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ede0e1bf-ac03-44b5-9a61-94f2ec458bef",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}