San Jose State University Department of Applied Data Science
**DATA 200 Computational Programming for Data Analytics**
Spring 2024 Instructor: Ron Mak
"
]
},
{
"cell_type": "markdown",
"id": "cafef7ba-5958-4981-bc1b-0e1aaa5e04db",
"metadata": {},
"source": [
"# More `matplotlib`"
]
},
{
"cell_type": "markdown",
"id": "50739fd6-8e52-453b-811e-a58dbd28d8ca",
"metadata": {},
"source": [
"## `%matplotlib inline` \"magic\""
]
},
{
"cell_type": "markdown",
"id": "c87d7d46-9e79-456e-83b9-032ea2c6a716",
"metadata": {},
"source": [
"#### When using `matplotlib` in a Jupyter notebook, the \"magic command\"\n",
"``` Python\n",
"%matplotlib inline\n",
"```\n",
"#### enables the graphs to be drawn inside the notebook.\n",
"#### (In recent versions of Jupyter, the inline backend is typically enabled by default, so this command is usually unnecessary.)"
]
},
{
"cell_type": "markdown",
"id": "5af68027-b69d-4ecc-ae85-78b5994fe57b",
"metadata": {},
"source": [
"## The `Figure` container object\n",
"#### Whenever we create a graph, the top-level container for all the objects that make up the graph is a `Figure` object. If we simply call `plt.plot()`, `matplotlib` implicitly creates the `Figure` object for us.\n",
"#### For example, we can plot Facebook stock prices."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a6a6aed-868f-4574-9085-f0543a6fc8df",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49ca3bb8-d9fd-451d-977e-bbc26d2ab203",
"metadata": {},
"outputs": [],
"source": [
"fb = pd.read_csv('fb_stock_prices_2018.csv', \n",
" index_col='date', parse_dates=True)\n",
"fb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "472ca40c-f59c-4fe0-bec1-d865017fb3a5",
"metadata": {},
"outputs": [],
"source": [
"# Parameters: x values, y values\n",
"plt.plot(fb.index, fb.open)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "c82a0421-f3b5-4cdc-8fef-9d2b94431493",
"metadata": {},
"source": [
"#### Instead of relying on the implicitly created `Figure` object, we can create it explicitly by calling `plt.figure()` before plotting if, for example, we want to set the figure size or its resolution."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "464b0ff7-dc2a-46e6-b121-773501706f1e",
"metadata": {},
"outputs": [],
"source": [
"# figsize is (width, height) in inches; dpi is dots per inch.\n",
"plt.figure(figsize=(7, 3), dpi=300)\n",
"\n",
"plt.plot(fb.index, fb.open)\n",
"plt.show()"
]
},
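{
"cell_type": "markdown",
"id": "b7c1e2a4-3d5f-4a68-9c0b-1e2f3a4b5c6d",
"metadata": {},
"source": [
"#### As a minimal sketch (not part of the original lesson), `plt.gcf()` returns the current `Figure`, which lets us confirm that `plt.plot()` created one implicitly and then resize it. The x and y values are made up for illustration.\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.close('all')                  # start from a clean state\n",
"plt.plot([1, 2, 3], [2, 4, 8])    # made-up values; a Figure is created implicitly\n",
"\n",
"fig = plt.gcf()                   # get the current Figure\n",
"fig.set_size_inches(7, 3)         # change its size after the fact\n",
"width, height = fig.get_size_inches()\n",
"print(f'{width:.0f} x {height:.0f} inches at {fig.dpi:.0f} dpi')\n",
"```\n",
"#### Passing `figsize` and `dpi` to `plt.figure()` up front is usually simpler; `plt.gcf()` is handy when the `Figure` already exists."
]
},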
{
"cell_type": "markdown",
"id": "8b25d9ef-d00b-4725-bee0-e3249d5482e2",
"metadata": {},
"source": [
"## Displaying a graph"
]
},
{
"cell_type": "markdown",
"id": "a8bb273b-86f8-4215-94ce-5f568ba420c2",
"metadata": {},
"source": [
"#### Python won't display a graph until we tell it to. That allows us to add features or make changes to the graph by configuring its objects while they are held in memory. We call `plt.show()` to finally display the graph.\n",
"#### We must call `plt.show()` in a standalone Python program. However, in a Jupyter notebook, executing the cell containing the graph creation code will automatically display it (and remove it from memory). Therefore, in a notebook, it's not necessary to make the call."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "79bad914-857c-4bdc-afab-d8a47e11bf6a",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(fb.index, fb.open)"
]
},
{
"cell_type": "markdown",
"id": "292c50b1-d950-43f9-a450-45f2a75b262e",
"metadata": {},
"source": [
"#### Including a call to `plt.show()` in a notebook helps if we later decide to convert the notebook to a standalone Python program. Also, since `plt.show()` has no return value, it cuts out extraneous output in a notebook cell.\n",
"#### After it is displayed, the graph's objects are removed from memory. We must recreate the graph to display it again."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e75546d-fa6a-4409-86bf-cb2e8a5e2c5a",
"metadata": {},
"outputs": [],
"source": [
"plt.show() # nothing will be displayed the second time"
]
},
{
"cell_type": "markdown",
"id": "7687b653-b543-47db-bcc1-d8900f82d2bb",
"metadata": {},
"source": [
"## Histograms"
]
},
{
"cell_type": "markdown",
"id": "cc8593da-340f-4518-9f20-407ffadd2d7e",
"metadata": {},
"source": [
"#### When creating and displaying histograms, do not ignore bin size. Bin size is the width of each subrange of x values for which there is a bar. The smaller the bin size, the more bars."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef646e1d-97ca-4d22-9d4e-1489bb0ad900",
"metadata": {},
"outputs": [],
"source": [
"quakes = pd.read_csv('earthquakes.csv')\n",
"quakes"
]
},
{
"cell_type": "markdown",
"id": "4e6af131-3128-4699-a917-40b7d3a65984",
"metadata": {},
"source": [
"#### Note the call to method `query()` below on the dataframe object. We only want to plot the magnitudes with type \"ml\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c9993ac-9313-49ef-b75b-895f0e062f88",
"metadata": {},
"outputs": [],
"source": [
"plt.hist(quakes.query('magType == \"ml\"').mag)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9ec258c1-e128-4dae-9841-907d8abafae1",
"metadata": {},
"source": [
"#### With the default of 10 bins, the data appears to be roughly normally distributed.\n",
"#### But appearances can be deceiving. Note how the shape of the distribution changes with different bin sizes, especially how the distribution appears to change from unimodal to bimodal."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc84af46-7e45-4564-b94a-696d90b303dd",
"metadata": {},
"outputs": [],
"source": [
"x = quakes.query('magType == \"ml\"').mag\n",
"\n",
"# Parameters: number of rows, number of columns\n",
"fig, axes = plt.subplots(1, 2, figsize=(10, 3))\n",
"\n",
"for ax, bins in zip(axes, [7, 35]):\n",
" ax.hist(x, bins=bins)\n",
" ax.set_title(f'{bins} bins')\n",
" \n",
"plt.show()"
]
},
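{
"cell_type": "markdown",
"id": "c8d2f3b5-4e6a-4b79-8d1c-2f3a4b5c6d7e",
"metadata": {},
"source": [
"#### The `bins` parameter can also take a sequence of bin edges, which fixes the bin width directly instead of the bin count. The sketch below assumes synthetic, normally distributed data rather than the earthquake file; the names `values`, `bin_width`, and `bin_edges` are illustrative.\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"values = rng.normal(loc=2.5, scale=0.7, size=500)   # synthetic data\n",
"\n",
"bin_width = 0.5\n",
"bin_edges = np.arange(0.0, 5.0 + bin_width, bin_width)   # 0.0, 0.5, ..., 5.0\n",
"\n",
"fig, ax = plt.subplots(figsize=(5, 3))\n",
"# hist() returns the bar heights, the bin edges it used, and the bar objects.\n",
"counts, edges, _ = ax.hist(values, bins=bin_edges)\n",
"ax.set_title(f'bin width = {bin_width}')\n",
"```\n",
"#### Values that fall outside the range of the edges are not counted, so choose edges that cover the data."
]
},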
{
"cell_type": "markdown",
"id": "d44d4e08-31dc-4ea0-b6fe-6b6b1b6c29de",
"metadata": {},
"source": [
"## Subplots"
]
},
{
"cell_type": "markdown",
"id": "a91f10e1-2dd2-4b99-b48f-cdf25f0a4a68",
"metadata": {},
"source": [
"#### `plt.subplots()` returns the `Figure` object and a NumPy array of the `Axes` objects that it contains (or a single `Axes` object when there is only one subplot). Each `Axes` object is a separate plot within the `Figure` container."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b593bcbc-985a-4f5d-8eb0-814f4fda2465",
"metadata": {},
"outputs": [],
"source": [
"fig, axes = plt.subplots(1, 2)\n",
"axes"
]
},
{
"cell_type": "markdown",
"id": "246509fb-bccc-4523-bbaa-1d7e9dc2e774",
"metadata": {},
"source": [
"#### `Figure` and `Axes` objects have methods with similar or identical names to their `pyplot` function counterparts. For example,\n",
"``` Python\n",
"plt.hist()\n",
"ax.hist()\n",
"```"
]
},
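{
"cell_type": "markdown",
"id": "d9e3a4c6-5f7b-4c8a-9e2d-3a4b5c6d7e8f",
"metadata": {},
"source": [
"#### Some counterparts differ slightly in name: `plt.title()` corresponds to `ax.set_title()`, and `plt.xlabel()` to `ax.set_xlabel()`. A small sketch with made-up data, configuring each `Axes` object separately:\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(8, 3))\n",
"left, right = axes                      # unpack the array of Axes\n",
"\n",
"left.plot([1, 2, 3], [1, 4, 9])         # pyplot counterpart: plt.plot()\n",
"left.set_title('Squares')               # pyplot counterpart: plt.title()\n",
"left.set_xlabel('x')                    # pyplot counterpart: plt.xlabel()\n",
"\n",
"right.hist([1, 2, 2, 3, 3, 3])          # pyplot counterpart: plt.hist()\n",
"right.set_title('Counts')\n",
"\n",
"fig.suptitle('Two Axes in one Figure')  # title for the whole Figure\n",
"fig.tight_layout()\n",
"```\n",
"#### Calling methods on a specific `Axes` object makes it explicit which subplot is being changed, whereas the `pyplot` functions act on whichever subplot is current."
]
},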
{
"cell_type": "markdown",
"id": "4e74ba26-c087-4f2d-95ea-02cd3561a527",
"metadata": {},
"source": [
"#### Instead of calling `plt.subplots()`, we can call the `Figure` method `add_axes()`. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3504aa1d-a30a-4932-829f-74ced9bda0b9",
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(3, 3))\n",
"\n",
"# Parameters: [left, bottom, width, height], as fractions of the figure size\n",
"# left is the distance of the left axis from the left border\n",
"# bottom is the distance of the bottom axis from the bottom border\n",
"outside = fig.add_axes([0.1, 0.1, 0.9, 0.9])\n",
"inside = fig.add_axes([0.7, 0.7, 0.25, 0.25])"
]
},
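{
"cell_type": "markdown",
"id": "e0f4b5d7-6a8c-4d9b-8f3e-4b5c6d7e8f90",
"metadata": {},
"source": [
"#### A common use of `add_axes()` is an inset plot. The following sketch, with made-up data and illustrative names `main` and `inset`, draws a curve in the main axes and a zoomed view in a smaller axes placed on top of it.\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"\n",
"fig = plt.figure(figsize=(4, 4))\n",
"\n",
"# Each list is [left, bottom, width, height] as fractions of the figure size.\n",
"main = fig.add_axes([0.15, 0.15, 0.8, 0.8])\n",
"inset = fig.add_axes([0.25, 0.6, 0.3, 0.3])\n",
"\n",
"xs = list(range(10))\n",
"ys = [x ** 2 for x in xs]               # made-up values\n",
"main.plot(xs, ys)\n",
"inset.plot(xs[:4], ys[:4])              # zoom in on the first four points\n",
"inset.set_title('zoom')\n",
"\n",
"bounds = inset.get_position().bounds    # (left, bottom, width, height)\n",
"```\n",
"#### Because the positions are fractions of the figure, the layout keeps its proportions if `figsize` changes."
]
},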
{
"cell_type": "markdown",
"id": "acc121ca-aedb-4188-bb26-02e20fda638c",
"metadata": {},
"source": [
"#### A third option is `GridSpec`, which lets a subplot span multiple rows or columns of a grid:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19d45567-d051-4a53-98d1-13c2bb89fd5b",
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(8, 8))\n",
"gs = fig.add_gridspec(3, 3)\n",
"\n",
"# Parameter: gs[which rows, which columns]\n",
"# can use range notation\n",
"top_left = fig.add_subplot(gs[0, 0])\n",
"mid_left = fig.add_subplot(gs[1, 0])\n",
"top_right = fig.add_subplot(gs[:2, 1:]) # rows 0 and 1, cols 1 and 2\n",
"bottom = fig.add_subplot(gs[2,:]) # row 2, all columns\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "404feb1d-a52e-4a57-94cd-4cb3b07dc294",
"metadata": {},
"source": [
"## Saving graphs"
]
},
{
"cell_type": "markdown",
"id": "5635eb24-79ae-437a-8387-5f1236f0c3af",
"metadata": {},
"source": [
"#### Call `plt.savefig()` to save a graph to an image file. But be sure to call it before displaying the graph, because displaying the graph removes it from memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47937b00-2166-4117-af16-7834851d9e96",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(fb.index, fb.open)\n",
"plt.savefig('FacebookStock.png')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2044ab71-65e3-4730-b067-f17a7a64809d",
"metadata": {},
"outputs": [],
"source": [
"plt.close() # frees the figure's memory; mainly needed in standalone Python programs"
]
},
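{
"cell_type": "markdown",
"id": "f1a5c6e8-7b9d-4eac-9a4f-5c6d7e8f9012",
"metadata": {},
"source": [
"#### The `Figure` object has its own `savefig()` method, which accepts options such as `dpi` and `bbox_inches`. A minimal sketch with made-up data; the file name `example_plot.png` and the use of the system temporary directory are assumptions for illustration.\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"fig, ax = plt.subplots(figsize=(6, 3))\n",
"ax.plot([1, 2, 3], [2, 4, 8])           # made-up values\n",
"ax.set_title('Example')\n",
"\n",
"# Hypothetical output location, chosen to avoid cluttering the working directory.\n",
"out_path = os.path.join(tempfile.gettempdir(), 'example_plot.png')\n",
"\n",
"# dpi sets the resolution; bbox_inches='tight' trims surrounding whitespace.\n",
"fig.savefig(out_path, dpi=150, bbox_inches='tight')\n",
"plt.close(fig)                          # release the figure once it is saved\n",
"\n",
"print(os.path.getsize(out_path) > 0)\n",
"```\n",
"#### The image format is inferred from the file extension, so changing `.png` to `.pdf` or `.svg` produces a vector file instead."
]
},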
{
"cell_type": "code",
"execution_count": null,
"id": "4cfb0376-544c-4cc7-a016-5a7945d7a8ea",
"metadata": {},
"outputs": [],
"source": [
"# Additional material (c) 2024 by Ronald Mak"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}