{ "cells": [ { "cell_type": "markdown", "id": "7df55037-7513-460d-80b1-fd897c12eb99", "metadata": {}, "source": [ "# Stored procedure `pivot()` explanation\n", "#### The how and why of creating a new simple pivoted table from a given base table." ] }, { "cell_type": "code", "execution_count": 1, "id": "9394a166-f603-46f4-87f1-95fefe13aca9", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from data201 import db_connection, df_query" ] }, { "cell_type": "code", "execution_count": 2, "id": "d506717d-48c5-4afc-8af7-37eb18574cec", "metadata": {}, "outputs": [], "source": [ "conn = db_connection(config_file = 'pivot_test.ini')\n", "cursor = conn.cursor()" ] }, { "cell_type": "code", "execution_count": 3, "id": "8ccbdeeb-2dae-4b9e-b049-86a70aa02360", "metadata": {}, "outputs": [], "source": [ "pd.set_option('display.max_colwidth', 1000)" ] }, { "cell_type": "markdown", "id": "444500ed-f1e5-4b6d-a8de-0148841e00db", "metadata": {}, "source": [ "### Stored procedure parameters:\n", "- #### `database_name`: the name of the database containing the table to pivot\n", "- #### `base_table_name`: the name of the table to pivot\n", "- #### `base_primary_key_name`: the name of the primary key column of the base table\n", "- #### `pivot_primary_key_name`: the desired name of the primary key column of the pivoted table\n", "\n", "#### The name of the created pivot table will be the base table name with `_pivoted` appended. \n", "```\n", "CREATE PROCEDURE pivot(\n", "\tIN database_name VARCHAR(32),\n", " IN base_table_name VARCHAR(32),\n", " IN base_primary_key_name VARCHAR(32),\n", " IN pivoted_primary_key_name VARCHAR(32)\n", ")\n", "```\n", "#### In the following excerpts, user-defined variables `@database_name`, `@base_table_name`, `@base_primary_key_name`, and `pivoted_primary_key_name` take the place of the stored procedure's parameters. We'll test with database `pivot_test` and its table `fruits` with primary key `store_id`." ] }, { "cell_type": "code", "execution_count": 4, "id": "8468906c-e3a9-4e73-97e3-da53f8869d7b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@database_name@base_table_name@base_primary_key_name@pivoted_primary_key_name
0pivot_testfruitsstore_idfruit_id
\n", "
" ], "text/plain": [ " @database_name @base_table_name @base_primary_key_name \\\n", "0 pivot_test fruits store_id \n", "\n", " @pivoted_primary_key_name \n", "0 fruit_id " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\"SET @database_name = 'pivot_test'\")\n", "cursor.execute(\"SET @base_table_name = 'fruits'\")\n", "cursor.execute(\"SET @base_primary_key_name = 'store_id'\")\n", "cursor.execute(\"SET @pivoted_primary_key_name = 'fruit_id'\")\n", "\n", "df_query(conn,\n", " \"\"\"\n", " SELECT @database_name, @base_table_name, \n", " @base_primary_key_name, @pivoted_primary_key_name\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "efbfdca8-a628-4d77-92fd-3eb26ea79f51", "metadata": {}, "source": [ "### Table `fruits` to pivot." ] }, { "cell_type": "code", "execution_count": 5, "id": "1369bcd1-5e88-44f7-8790-7b12f124f5f9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_idapplesorangesbananaspeachespears
0store_1101102103104105
1store_2201202203204205
2store_3301302303304305
\n", "
" ], "text/plain": [ " store_id apples oranges bananas peaches pears\n", "0 store_1 101 102 103 104 105\n", "1 store_2 201 202 203 204 205\n", "2 store_3 301 302 303 304 305" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_query(conn, 'SELECT * FROM fruits')" ] }, { "cell_type": "markdown", "id": "0f8d1ed9-1879-49ae-bdf9-f08c265ab20e", "metadata": {}, "source": [ "### Call stored procedure `pivot()` to pivot the table." ] }, { "cell_type": "code", "execution_count": 6, "id": "32d2311e-e8b7-430b-ae3e-6c444a10d568", "metadata": {}, "outputs": [], "source": [ "# database_name: pivot_test\n", "# base table_name: fruits\n", "# base_primary_key_name: store_id\n", "# pivoted_primary_key_name: fruit_id\n", "\n", "cursor.execute(\"CALL pivot('pivot_test', 'fruits', 'store_id', 'fruit_id')\")" ] }, { "cell_type": "markdown", "id": "01e2bcd4-b6f0-4c7f-8ab6-fef1c8b841a3", "metadata": {}, "source": [ "### The resulting pivoted table `fruits_pivoted`." ] }, { "cell_type": "code", "execution_count": 7, "id": "23942612-b72c-4bb4-a40f-56be6a2be265", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fruit_idstore_1store_2store_3
0apples101201301
1bananas103203303
2oranges102202302
3peaches104204304
4pears105205305
\n", "
" ], "text/plain": [ " fruit_id store_1 store_2 store_3\n", "0 apples 101 201 301\n", "1 bananas 103 203 303\n", "2 oranges 102 202 302\n", "3 peaches 104 204 304\n", "4 pears 105 205 305" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_query(conn, 'SELECT * FROM fruits_pivoted')" ] }, { "cell_type": "markdown", "id": "66369a27-efbc-43a1-b84b-876b54bf4d28", "metadata": {}, "source": [ "# How to pivot an database table.\n", "### Restrictions:\n", "- #### The base table's primary key must be its first column, and the key's values must be varchar (string).\n", "- #### Since rows become columns and columns become rows after pivoting, all the data in the base table other than the primary key's values must be from the same domain (i.e., they all must have the same datatype).\n", "- #### Since the base table's primary key values will become the names of the pivoted table's columns, each value must be a legitimate column name." ] }, { "cell_type": "markdown", "id": "097105c6-b14a-484e-b3e8-4d832e727f46", "metadata": {}, "source": [ "### Base table's primary key values\n", "#### The built-in `GROUP_CONCAT` function constructs a comma-separated list of the base table's primary key values. This will be useful later for inserting values into the pivoted table." ] }, { "cell_type": "code", "execution_count": 8, "id": "1f94fc19-9047-4e09-9bf9-cbb89937000b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@pivoted_col_names
0store_1,store_2,store_3
\n", "
" ], "text/plain": [ " @pivoted_col_names\n", "0 store_1,store_2,store_3" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\n", " \"\"\"\n", " SELECT GROUP_CONCAT(store_id)\n", " INTO @pivoted_col_names\n", " FROM fruits\n", " \"\"\"\n", ")\n", "\n", "df_query(conn, 'SELECT @pivoted_col_names')" ] }, { "cell_type": "markdown", "id": "12ce2390-d276-407b-b25b-a293f8fe76b7", "metadata": {}, "source": [ "### Prepared statements\n", "#### We want the stored procedure to work with any base table. Therefore, we need to replace `store_id` with the value of parameter `base_primary_key_name` and `fruits` with the value of parameter `base_table_name`. We can create the SQL command as a string using the built-in `CONCAT()` function. (In this excerpt, the user-defined variables `@base_primary_key_name` and `@base_table_name` take the place of the stored procedure's parameters.)" ] }, { "cell_type": "code", "execution_count": 9, "id": "8f216cd3-e2af-4c81-b11d-a433fc1a7724", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@sql_string
0SELECT GROUP_CONCAT(store_id) INTO @pivoted_col_names FROM fruits
\n", "
" ], "text/plain": [ " @sql_string\n", "0 SELECT GROUP_CONCAT(store_id) INTO @pivoted_col_names FROM fruits" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\n", " \"\"\"\n", " SET @sql_string = CONCAT(\n", " 'SELECT GROUP_CONCAT(', @base_primary_key_name, ') ',\n", " 'INTO @pivoted_col_names FROM ', @base_table_name\n", " );\n", " \"\"\"\n", ")\n", "\n", "df_query(conn, 'SELECT @sql_string')" ] }, { "cell_type": "markdown", "id": "6418fd43-fda5-4fb8-a627-673f35b5109e", "metadata": {}, "source": [ "#### Once we have a SQL string, we can create a prepared statement from it, and then execute the statement." ] }, { "cell_type": "code", "execution_count": 10, "id": "2e8545b2-e2dd-4c40-a97b-c6b69266d701", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@pivoted_col_names
0store_1,store_2,store_3
\n", "
" ], "text/plain": [ " @pivoted_col_names\n", "0 store_1,store_2,store_3" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute('PREPARE sql_stmt FROM @sql_string')\n", "\n", "df_query(conn, 'SELECT @pivoted_col_names')" ] }, { "cell_type": "markdown", "id": "ee529051-72b7-47f9-9dc6-328b565eb2c7", "metadata": {}, "source": [ "#### Afterward, we should deallocate the prepared statement." ] }, { "cell_type": "code", "execution_count": 11, "id": "7a466449-c267-454e-b603-722ef546eaa7", "metadata": {}, "outputs": [], "source": [ "cursor.execute('DEALLOCATE PREPARE sql_stmt')" ] }, { "cell_type": "markdown", "id": "e60b29f7-fb27-4821-98b8-e18df1153c92", "metadata": {}, "source": [ "#### Helper stored procedure `exec_stmt()` does all this. We simply pass it the SQL string. (In this excerpt, the user-defined variables `@base_primary_key_name` and `@base_table_name` take the place of the stored procedure's parameters.)" ] }, { "cell_type": "code", "execution_count": 12, "id": "7cb6d270-815b-4950-84d5-7a12b6a089cc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@pivoted_col_names
0store_1,store_2,store_3
\n", "
" ], "text/plain": [ " @pivoted_col_names\n", "0 store_1,store_2,store_3" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\"SET @pivoted_col_names = ''\")\n", "\n", "cursor.execute(\n", " \"\"\"\n", " CALL exec_stmt(CONCAT(\n", " 'SELECT GROUP_CONCAT(', @base_primary_key_name, ') ',\n", " ' INTO @pivoted_col_names FROM ', @base_table_name\n", " ));\n", " \"\"\"\n", ")\n", "\n", "df_query(conn, 'SELECT @pivoted_col_names')" ] }, { "cell_type": "markdown", "id": "d41a486b-be43-455b-a70b-1338cb9bebb2", "metadata": {}, "source": [ "#### Prepend the pivoted column names with the name of the pivoted primary key name. (In this excerpt, the user-defined variables @pivoted_primary_key_name takes the place of the stored procedure's parameters.)" ] }, { "cell_type": "code", "execution_count": 13, "id": "812257b5-0e3d-4e61-bb09-39620d10f6db", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@pivoted_col_names
0fruit_id, store_1,store_2,store_3
\n", "
" ], "text/plain": [ " @pivoted_col_names\n", "0 fruit_id, store_1,store_2,store_3" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\n", " \"\"\"\n", " SET @pivoted_col_names = CONCAT(@pivoted_primary_key_name, \n", " ', ', @pivoted_col_names);\n", " \"\"\"\n", ")\n", "\n", "df_query(conn, 'SELECT @pivoted_col_names')" ] }, { "cell_type": "markdown", "id": "b3d9cfc7-cfd2-478b-967b-1d576bdc38fa", "metadata": {}, "source": [ "### Create the pivoted table\n", "#### To create the table `fruits_pivoted`:\n", "```\n", "CREATE TABLE fruits_pivoted\n", "(\n", " fruit_id VARCHAR(32) NOT NULL,\n", " store_1 INT,\n", " store_2 INT,\n", " store_3 INT,\n", " PRIMARY KEY (fruit_id)\n", ")\n", "```\n", "#### For the `fruits_pivoted` example:" ] }, { "cell_type": "code", "execution_count": 14, "id": "9bf014ad-c846-4ccc-8aed-24b7233118c6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@field_list
0store_1 INT,store_2 INT,store_3 INT
\n", "
" ], "text/plain": [ " @field_list\n", "0 store_1 INT,store_2 INT,store_3 INT" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute('SET @key = @base_primary_key_name')\n", "\n", "cursor.execute(\n", " \"\"\"\n", " CALL exec_stmt(CONCAT(\n", " \"SELECT GROUP_CONCAT(\", @key, \", ' INT') \",\n", " \"INTO @field_list \",\n", " \"FROM \", @base_table_name\n", " ))\n", " \"\"\"\n", ")\n", "\n", "df_query(conn, 'SELECT @field_list')" ] }, { "cell_type": "markdown", "id": "4f317eeb-355a-400e-8a4a-adfef93bc10a", "metadata": {}, "source": [ "#### The SQL string to create the pivoted table. The name of the pivoted table's primary key column is determined by the caller of the stored procedure as the value of the parameter `pivoted_primary_key_name`. (In this excerpt, the user-defined variables `@base_table_name`, `@pivoted_table_name`, and `@pivoted_primary_key_name` take the place of the stored procedure's parameters.)" ] }, { "cell_type": "code", "execution_count": 15, "id": "c9afdedd-4d79-4219-a26e-05dbbf21ec2d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@create_string
0CREATE TABLE fruits_pivoted(fruit_id VARCHAR(32) NOT NULL, store_1 INT,store_2 INT,store_3 INT, PRIMARY KEY (fruit_id) )
\n", "
" ], "text/plain": [ " @create_string\n", "0 CREATE TABLE fruits_pivoted(fruit_id VARCHAR(32) NOT NULL, store_1 INT,store_2 INT,store_3 INT, PRIMARY KEY (fruit_id) )" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\n", " \"\"\"\n", " SET @pivoted_table_name = CONCAT(@base_table_name, '_pivoted')\n", " \"\"\"\n", ")\n", "\n", "cursor.execute(\n", " \"\"\"\n", " SET @create_string = CONCAT(\n", " 'CREATE TABLE ', @pivoted_table_name,\n", " '(', @pivoted_primary_key_name, ' VARCHAR(32) NOT NULL, ', \n", " @field_list,\n", " ', PRIMARY KEY (', @pivoted_primary_key_name, ') ',\n", " ')'\n", " )\n", " \"\"\"\n", ")\n", "\n", "df_query(conn, 'SELECT @create_string')" ] }, { "cell_type": "markdown", "id": "e0314aef-ab73-418b-ae04-42be1b09b65d", "metadata": {}, "source": [ "#### Therefore, to create any pivoted table. (In this excerpt, the user-defined variables `@base_table_name`, `@pivoted_table_name`, and `@pivoted_primary_key_name` take the place of the stored procedure's parameters.)" ] }, { "cell_type": "code", "execution_count": 16, "id": "d2b0fcfe-82b1-4cc0-9821-e322feee5dea", "metadata": {}, "outputs": [], "source": [ "cursor.execute(\n", " \"\"\"\n", " CALL exec_stmt(CONCAT(\n", " 'DROP TABLE IF EXISTS ', @pivoted_table_name\n", " ));\n", " \"\"\"\n", ")\n", "\n", "cursor.execute(\n", " \"\"\"\n", " CALL exec_stmt(CONCAT(\n", " 'CREATE TABLE ', @pivoted_table_name,\n", " '(', @pivoted_primary_key_name, ' VARCHAR(32) NOT NULL, ', \n", " @field_list,\n", " ', PRIMARY KEY (', @pivoted_primary_key_name, ') ',\n", " ')'\n", " ))\n", " \"\"\"\n", ")\n" ] }, { "cell_type": "markdown", "id": "8f7eb43a-efb2-4009-9a5e-2d89f241b3f2", "metadata": {}, "source": [ "### Columns of the base table\n", "#### To create table `fruits_pivoted`, we need to go column by column, left to right, from the base table `fruits` and insert the base column values row by row into the pivoted table `fruits_pivoted`. The column names of the base table become the primary key values of the pivoted table, and the primary key values of the base table become the column names of the pivoted table. \n", "#### We want the stored procedure to work with any base table. Therefore, we must query the built-in table `information_schema.columns`. (In this excerpt, the user-defined variables `@database_name` and `@base_table_name` take the place of the stored procedure's parameters.)" ] }, { "cell_type": "code", "execution_count": 17, "id": "aeb81938-4fdc-4ad1-917f-a7d758378d17", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TABLE_CATALOGTABLE_SCHEMATABLE_NAMECOLUMN_NAMEORDINAL_POSITIONCOLUMN_DEFAULTIS_NULLABLEDATA_TYPECHARACTER_MAXIMUM_LENGTHCHARACTER_OCTET_LENGTH...DATETIME_PRECISIONCHARACTER_SET_NAMECOLLATION_NAMECOLUMN_TYPECOLUMN_KEYEXTRAPRIVILEGESCOLUMN_COMMENTGENERATION_EXPRESSIONSRS_ID
0defpivot_testfruitsstore_id1NoneNOvarchar8.032.0...Noneutf8mb4utf8mb4_0900_ai_civarchar(8)PRIselect,insert,update,referencesNone
1defpivot_testfruitsapples2NoneYESintNaNNaN...NoneNoneNoneintselect,insert,update,referencesNone
2defpivot_testfruitsoranges3NoneYESintNaNNaN...NoneNoneNoneintselect,insert,update,referencesNone
3defpivot_testfruitsbananas4NoneYESintNaNNaN...NoneNoneNoneintselect,insert,update,referencesNone
4defpivot_testfruitspeaches5NoneYESintNaNNaN...NoneNoneNoneintselect,insert,update,referencesNone
5defpivot_testfruitspears6NoneYESintNaNNaN...NoneNoneNoneintselect,insert,update,referencesNone
\n", "

6 rows × 22 columns

\n", "
" ], "text/plain": [ " TABLE_CATALOG TABLE_SCHEMA TABLE_NAME COLUMN_NAME ORDINAL_POSITION \\\n", "0 def pivot_test fruits store_id 1 \n", "1 def pivot_test fruits apples 2 \n", "2 def pivot_test fruits oranges 3 \n", "3 def pivot_test fruits bananas 4 \n", "4 def pivot_test fruits peaches 5 \n", "5 def pivot_test fruits pears 6 \n", "\n", " COLUMN_DEFAULT IS_NULLABLE DATA_TYPE CHARACTER_MAXIMUM_LENGTH \\\n", "0 None NO varchar 8.0 \n", "1 None YES int NaN \n", "2 None YES int NaN \n", "3 None YES int NaN \n", "4 None YES int NaN \n", "5 None YES int NaN \n", "\n", " CHARACTER_OCTET_LENGTH ... DATETIME_PRECISION CHARACTER_SET_NAME \\\n", "0 32.0 ... None utf8mb4 \n", "1 NaN ... None None \n", "2 NaN ... None None \n", "3 NaN ... None None \n", "4 NaN ... None None \n", "5 NaN ... None None \n", "\n", " COLLATION_NAME COLUMN_TYPE COLUMN_KEY EXTRA \\\n", "0 utf8mb4_0900_ai_ci varchar(8) PRI \n", "1 None int \n", "2 None int \n", "3 None int \n", "4 None int \n", "5 None int \n", "\n", " PRIVILEGES COLUMN_COMMENT GENERATION_EXPRESSION SRS_ID \n", "0 select,insert,update,references None \n", "1 select,insert,update,references None \n", "2 select,insert,update,references None \n", "3 select,insert,update,references None \n", "4 select,insert,update,references None \n", "5 select,insert,update,references None \n", "\n", "[6 rows x 22 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT *\n", " FROM information_schema.columns\n", " WHERE table_schema = @database_name\n", " AND table_name = @base_table_name\n", " ORDER BY ordinal_position;\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "9f4a428c-82e0-4979-be81-c010a7baac53", "metadata": {}, "source": [ "#### We only want the column names of the base table, not including the primary key column." ] }, { "cell_type": "code", "execution_count": 18, "id": "fb387548-db0f-4d69-b90d-c273c4b58b88", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
COLUMN_NAME
0apples
1oranges
2bananas
3peaches
4pears
\n", "
" ], "text/plain": [ " COLUMN_NAME\n", "0 apples\n", "1 oranges\n", "2 bananas\n", "3 peaches\n", "4 pears" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_query(conn,\n", " \"\"\"\n", " SELECT column_name\n", " FROM information_schema.columns\n", " WHERE table_schema = @database_name\n", " AND table_name = @base_table_name\n", " AND ordinal_position > 1\n", " ORDER BY ordinal_position;\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "id": "5da96ac5-cbf7-4df3-86fb-40e0ad460b80", "metadata": {}, "source": [ "### `CURSOR` and `CONTINUE HANDLER` for the base table columns\n", "#### In order to process the base table column by column, we don't want the column names all at once. Instead, we want the names one at a time. To do this, we defined a cursor:\n", "```\n", "DECLARE base_col_cursor CURSOR FOR\n", " SELECT column_name\n", " FROM information_schema.columns\n", " WHERE table_schema = database_name\n", " AND table_name = base_table_name\n", " AND ordinal_position > 1\n", " ORDER BY ordinal_position;\n", "```\n", "#### To open the cursor:\n", "```\n", "OPEN base_col_cursor;\n", "```\n", "#### Then, to get the next column name each time into the stored procedure's local variable `base_col_name`:\n", "```\n", "FETCH base_col_cursor INTO base_col_name;\n", "```\n", "#### We also define a continue handler that's automatically called when the cursor has no more values:\n", "```\n", "DECLARE CONTINUE HANDLER FOR NOT FOUND\n", " SET no_more_columns = TRUE;\n", "```\n", "Local variable `no_more_columns` is initialized to `FALSE`:\n", "```\n", " DECLARE no_more_columns BOOL DEFAULT FALSE;\n", "```\n", "#### A `WHILE` loop iterates over the column names, one at a time:\n", "```\n", "FETCH base_col_cursor INTO base_col_name;\n", "\n", "WHILE NOT no_more_columns DO\n", " ...\n", "\n", " FETCH base_col_cursor INTO base_col_name;\n", "END WHILE; \n", "```" ] }, { "cell_type": "markdown", "id": "718755c5-7a6b-4f1e-84c9-16c729e2823a", "metadata": {}, "source": [ "### Values from one base table column\n", "#### `GROUP_CONCAT()` constructs a comma-separated list of values from one base table column. For example, column `apples` from table `fruits` produces values for the first row of table `fruits_pivoted`." ] }, { "cell_type": "code", "execution_count": 19, "id": "e6eb39c9-2bac-41fc-9c0e-0605108a272a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@pivoted_values
0101,201,301
\n", "
" ], "text/plain": [ " @pivoted_values\n", "0 101,201,301" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\n", " \"\"\"\n", " SELECT GROUP_CONCAT(apples)\n", " INTO @pivoted_values\n", " FROM fruits\n", " \"\"\"\n", ")\n", "\n", "df_query(conn, 'SELECT @pivoted_values')" ] }, { "cell_type": "markdown", "id": "78eaffb7-d6d9-4154-b3b1-d27a18f9b77a", "metadata": {}, "source": [ "#### Prepend the base column column name, and we have all the values for the pivoted table's first row." ] }, { "cell_type": "code", "execution_count": 20, "id": "b1111bdc-16aa-4218-b43e-9a6e930535b5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
@pivoted_values
0apple, 101,201,301
\n", "
" ], "text/plain": [ " @pivoted_values\n", "0 apple, 101,201,301" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\n", " \"\"\"\n", " SET @pivoted_values = CONCAT('apple', ', ', @pivoted_values)\n", " \"\"\"\n", ")\n", "\n", "df_query(conn, 'SELECT @pivoted_values')" ] }, { "cell_type": "markdown", "id": "e2ea3fbe-88e8-4d31-8ebc-31146dbda676", "metadata": {}, "source": [ "### Inside the `WHILE` loop\n", "#### Thne `WHILE` loop iterates over all of any base table's columns. We can get the comma-separated values from each base column, one column at a time, since the `base_col_cursor` supplies the base column names.\n", "```\n", "CALL exec_stmt(CONCAT(\n", " 'SELECT GROUP_CONCAT(', base_col_name, ') ',\n", " 'INTO @pivoted_values ',\n", " 'FROM ', base_table_name\n", "))\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cf9685d8-97be-430b-9888-2c3c7fc27dc3", "metadata": {}, "source": [ "#### Prepend the base column name as the pivoted row's primary key value:\n", "```\n", "SET @pivoted_values = CONCAT(\"'\", base_col_name, \n", " \"', \", @pivoted_values);\n", "```\n", "#### And insert the row into the pivoted table:\n", "```\n", "CALL exec_stmt(CONCAT(\n", " 'INSERT INTO ', pivoted_table_name,\n", " '(', @pivoted_col_names, ') ',\n", " 'VALUES (', @pivoted_values, ')'\n", "));\n", "```" ] }, { "cell_type": "code", "execution_count": 21, "id": "691c6b05-0e6b-46a7-8ff7-fc4606f00bed", "metadata": {}, "outputs": [], "source": [ "cursor.close()\n", "conn.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "84bedcd2-14ec-4f06-a602-05d931879d8b", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }