{ "cells": [ { "cell_type": "markdown", "id": "9e334ab8-eb54-45bb-ba39-704a46402baa", "metadata": {}, "source": [ "###
San Jose State University
Department of Applied Data Science

**DATA 200
Computational Programming for Data Analytics**

Spring 2024
Instructor: Ron Mak
" ] }, { "cell_type": "markdown", "id": "cafef7ba-5958-4981-bc1b-0e1aaa5e04db", "metadata": {}, "source": [ "# More `matplotlib`" ] }, { "cell_type": "markdown", "id": "50739fd6-8e52-453b-811e-a58dbd28d8ca", "metadata": {}, "source": [ "## `%matplotlib inline` \"magic\"" ] }, { "cell_type": "markdown", "id": "c87d7d46-9e79-456e-83b9-032ea2c6a716", "metadata": {}, "source": [ "#### When using `matplotlib` in a Jupyter notebook, the \"magic command\"\n", "``` Python\n", "%matplotlib inline\n", "```\n", "#### enables graphs to be drawn inside the notebook.\n", "#### (Recent versions of Jupyter notebook draw graphs inline by default, so the command is usually no longer necessary.)" ] }, { "cell_type": "markdown", "id": "5af68027-b69d-4ecc-ae85-78b5994fe57b", "metadata": {}, "source": [ "## The `Figure` container object\n", "#### Whenever we create a graph, the top-level container for all the objects that make up the graph is a `Figure` object. If we simply call `plt.plot()`, `matplotlib` implicitly creates the `Figure` object for us.\n", "#### For example, we can plot Facebook stock prices." 
] }, { "cell_type": "code", "execution_count": null, "id": "3a6a6aed-868f-4574-9085-f0543a6fc8df", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "id": "49ca3bb8-d9fd-451d-977e-bbc26d2ab203", "metadata": {}, "outputs": [], "source": [ "fb = pd.read_csv('fb_stock_prices_2018.csv',\n", "                 index_col='date', parse_dates=True)\n", "fb" ] }, { "cell_type": "code", "execution_count": null, "id": "472ca40c-f59c-4fe0-bec1-d865017fb3a5", "metadata": {}, "outputs": [], "source": [ "# Parameters: x values, y values\n", "plt.plot(fb.index, fb.open)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "c82a0421-f3b5-4cdc-8fef-9d2b94431493", "metadata": {}, "source": [ "#### We can create the `Figure` object explicitly by calling `plt.figure()` if, for example, we want to set the figure size or its resolution." ] }, { "cell_type": "code", "execution_count": null, "id": "464b0ff7-dc2a-46e6-b121-773501706f1e", "metadata": {}, "outputs": [], "source": [ "# figsize is in inches; dpi is the resolution in dots per inch.\n", "plt.figure(figsize=(7, 3), dpi=300)\n", "\n", "plt.plot(fb.index, fb.open)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "8b25d9ef-d00b-4725-bee0-e3249d5482e2", "metadata": {}, "source": [ "## Displaying a graph" ] }, { "cell_type": "markdown", "id": "a8bb273b-86f8-4215-94ce-5f568ba420c2", "metadata": {}, "source": [ "#### `matplotlib` won't display a graph until we tell it to. That allows us to add features or make changes to the graph by configuring its objects while they are held in memory. We call `plt.show()` to finally display the graph.\n", "#### We must call `plt.show()` in a standalone Python program. However, in a Jupyter notebook, executing the cell containing the graph creation code will automatically display the graph (and remove it from memory). Therefore, in a notebook, it's not necessary to make the call." 
] }, { "cell_type": "code", "execution_count": null, "id": "79bad914-857c-4bdc-afab-d8a47e11bf6a", "metadata": {}, "outputs": [], "source": [ "plt.plot(fb.index, fb.open)" ] }, { "cell_type": "markdown", "id": "292c50b1-d950-43f9-a450-45f2a75b262e", "metadata": {}, "source": [ "#### Including a call to `plt.show()` in a notebook helps if we later decide to convert the notebook to a standalone Python program. Also, since `plt.show()` has no return value, calling it last keeps the cell from printing extraneous output, such as the text representation of the list that `plt.plot()` returns.\n", "#### After it is displayed, the graph's objects are removed from memory. We must recreate the graph to display it again." ] }, { "cell_type": "code", "execution_count": null, "id": "7e75546d-fa6a-4409-86bf-cb2e8a5e2c5a", "metadata": {}, "outputs": [], "source": [ "plt.show()  # nothing will be displayed the second time" ] }, { "cell_type": "markdown", "id": "7687b653-b543-47db-bcc1-d8900f82d2bb", "metadata": {}, "source": [ "## Histograms" ] }, { "cell_type": "markdown", "id": "cc8593da-340f-4518-9f20-407ffadd2d7e", "metadata": {}, "source": [ "#### When creating and displaying histograms, pay attention to bin size. Bin size is the width of each subrange of x values for which there is a bar. The smaller the bin size, the more bars." ] }, { "cell_type": "code", "execution_count": null, "id": "ef646e1d-97ca-4d22-9d4e-1489bb0ad900", "metadata": {}, "outputs": [], "source": [ "quakes = pd.read_csv('earthquakes.csv')\n", "quakes" ] }, { "cell_type": "markdown", "id": "4e6af131-3128-4699-a917-40b7d3a65984", "metadata": {}, "source": [ "#### Note the call to the `query()` method below on the dataframe object. We only want to plot the magnitudes whose `magType` is \"ml\"." 
] }, { "cell_type": "code", "execution_count": null, "id": "6c9993ac-9313-49ef-b75b-895f0e062f88", "metadata": {}, "outputs": [], "source": [ "plt.hist(quakes.query('magType == \"ml\"').mag)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "9ec258c1-e128-4dae-9841-907d8abafae1", "metadata": {}, "source": [ "#### With the default of 10 bins, the data appears to be roughly normally distributed.\n", "#### But appearances can be deceiving. Note how the shape of the distribution changes with different bin sizes, especially how the distribution appears to change from unimodal to bimodal." ] }, { "cell_type": "code", "execution_count": null, "id": "cc84af46-7e45-4564-b94a-696d90b303dd", "metadata": {}, "outputs": [], "source": [ "x = quakes.query('magType == \"ml\"').mag\n", "\n", "# Parameters: number of rows, number of columns\n", "fig, axes = plt.subplots(1, 2, figsize=(10, 3))\n", "\n", "for ax, bins in zip(axes, [7, 35]):\n", "    ax.hist(x, bins=bins)\n", "    ax.set_title(f'{bins} bins')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "d44d4e08-31dc-4ea0-b6fe-6b6b1b6c29de", "metadata": {}, "source": [ "## Subplots" ] }, { "cell_type": "markdown", "id": "a91f10e1-2dd2-4b99-b48f-cdf25f0a4a68", "metadata": {}, "source": [ "#### `plt.subplots()` returns the `Figure` object and a NumPy array of the `Axes` objects that it contains. Each `Axes` object is a separate plot within the `Figure` container." ] }, { "cell_type": "code", "execution_count": null, "id": "b593bcbc-985a-4f5d-8eb0-814f4fda2465", "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(1, 2)\n", "axes" ] }, { "cell_type": "markdown", "id": "246509fb-bccc-4523-bbaa-1d7e9dc2e774", "metadata": {}, "source": [ "#### `Figure` and `Axes` objects have methods with similar or identical names to their `pyplot` function counterparts. 
For example,\n", "``` Python\n", "plt.hist()\n", "ax.hist()\n", "```" ] }, { "cell_type": "markdown", "id": "4e74ba26-c087-4f2d-95ea-02cd3561a527", "metadata": {}, "source": [ "#### Instead of calling `plt.subplots()`, we can call the `Figure` method `add_axes()`. For example:" ] }, { "cell_type": "code", "execution_count": null, "id": "3504aa1d-a30a-4932-829f-74ced9bda0b9", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(3, 3))\n", "\n", "# Parameters: [left, bottom, width, height], as fractions of the figure size\n", "# left is the distance of the left axis from the left border\n", "# bottom is the distance of the bottom axis from the bottom border\n", "outside = fig.add_axes([0.1, 0.1, 0.9, 0.9])\n", "inside = fig.add_axes([0.7, 0.7, 0.25, 0.25])" ] }, { "cell_type": "markdown", "id": "acc121ca-aedb-4188-bb26-02e20fda638c", "metadata": {}, "source": [ "#### We can also use a `GridSpec` to lay out plots that span multiple rows or columns:" ] }, { "cell_type": "code", "execution_count": null, "id": "19d45567-d051-4a53-98d1-13c2bb89fd5b", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(8, 8))\n", "gs = fig.add_gridspec(3, 3)\n", "\n", "# Parameter: gs[which rows, which columns]\n", "# Rows and columns can use slice notation.\n", "top_left = fig.add_subplot(gs[0, 0])\n", "mid_left = fig.add_subplot(gs[1, 0])\n", "top_right = fig.add_subplot(gs[:2, 1:])  # rows 0 and 1, cols 1 and 2\n", "bottom = fig.add_subplot(gs[2, :])  # row 2, all columns\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "404feb1d-a52e-4a57-94cd-4cb3b07dc294", "metadata": {}, "source": [ "## Saving graphs" ] }, { "cell_type": "markdown", "id": "5635eb24-79ae-437a-8387-5f1236f0c3af", "metadata": {}, "source": [ "#### Call `plt.savefig()` to save a graph to an image file. Be sure to call it before displaying the graph, because displaying the graph removes its objects from memory." 
] }, { "cell_type": "code", "execution_count": null, "id": "47937b00-2166-4117-af16-7834851d9e96", "metadata": {}, "outputs": [], "source": [ "plt.plot(fb.index, fb.open)\n", "plt.savefig('FacebookStock.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "2044ab71-65e3-4730-b067-f17a7a64809d", "metadata": {}, "outputs": [], "source": [ "plt.close()  # releases the figure's memory; good practice in standalone Python programs" ] }, { "cell_type": "code", "execution_count": null, "id": "4cfb0376-544c-4cc7-a016-5a7945d7a8ea", "metadata": {}, "outputs": [], "source": [ "# Additional material (c) 2024 by Ronald Mak" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }