{ "cells": [ { "cell_type": "markdown", "id": "fcab9c82-d384-406f-91ba-13f5cb45a1c2", "metadata": {}, "source": [ "###
San Jose State University
Department of Applied Data Science

**DATA 200
Computational Programming for Data Analytics**

Spring 2024
Instructor: Ron Mak
" ] }, { "cell_type": "markdown", "id": "4e21c8f2-403f-4263-8102-26378e5e2844", "metadata": {}, "source": [ "# Plot Facebook stock prices from the `pandas` module" ] }, { "cell_type": "markdown", "id": "60e5abc8-5c98-4589-9046-f64b5a61386a", "metadata": {}, "source": [ "#### `pandas` is the primary module for doing data analytics, and graphs are important for data visualizations. Therefore, both `Series` and `DataFrame` objects have a `plot()` method that use `matplotlib` to draw graphs. It is a convenient and somewhat simplified way to draw graphs directly from `Series` and `DataFrame` objects." ] }, { "cell_type": "markdown", "id": "ce064784-41bd-4cc6-98c1-6219e39d087f", "metadata": {}, "source": [ "## `plot()` parameters" ] }, { "cell_type": "markdown", "id": "48796b2f-48cb-493d-b9ed-d54c20383d6c", "metadata": {}, "source": [ "#### A call to `plot()` implicitly makes calls to `matplotlib`. The `kind` argument specifies the type of graph. The graph type determines what other arguments are necessary. 
Some commonly used parameters:\n", "| Parameter | Purpose | Data Type |\n", "| --- | --- | --- |\n", "| `kind` | Determines the plot type | String |\n", "| `x`/`y` | Column(s) to plot on the *x*-axis/*y*-axis | String or list |\n", "| `ax` | Draws the plot on the `Axes` object provided | `Axes` |\n", "| `subplots` | Determines whether to make subplots | Boolean |\n", "| `layout` | Specifies how to arrange the subplots | Tuple of `(rows, columns)` |\n", "| `figsize` | Size of the `Figure` object, in inches | Tuple of `(width, height)` |\n", "| `title` | The title of the plot or subplots | String for the plot title or a list of strings for subplot titles |\n", "| `legend` | Determines whether to show the legend | Boolean |\n", "| `label` | Legend label for the plotted item(s) | String if a single column is being plotted; otherwise, a list of strings |\n", "| `style` | `matplotlib` style strings for each item being plotted | String if a single column is being plotted; otherwise, a list of strings |\n", "| `color` | The color to plot the item in | String or (red, green, blue) tuple if a single column is being plotted; otherwise, a list |\n", "| `colormap` | The colormap to use | String or `matplotlib` colormap object |\n", "| `logx`/`logy`/`loglog` | Determines whether to use a logarithmic scale for the *x*-axis, *y*-axis, or both | Boolean |\n", "| `xticks`/`yticks` | Positions of the ticks on the *x*-axis/*y*-axis | List of values |\n", "| `xlim`/`ylim` | The axis limits for the *x*-axis/*y*-axis | Tuple of the form `(min, max)` |\n", "| `rot` | Rotation angle of the tick labels, in degrees | Integer |\n", "| `sharex`/`sharey` | Determines whether subplots share the *x*-axis/*y*-axis | Boolean |\n", "| `fontsize` | Font size of the tick labels | Integer |\n", "| `grid` | Turns the grid lines on or off | Boolean |\n" ] }, { "cell_type": "code", "execution_count": 7, "id": "8a5728ca-df58-4ae1-bab0-ada54f400ebc", "metadata": {}, "outputs": [], "source": [ 
"import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "444edfd1-13b0-4359-bbac-37c2c39aaffc", "metadata": {}, "source": [ "## Line graphs" ] }, { "cell_type": "code", "execution_count": null, "id": "6769f2d8-9dde-45db-b1a0-ce2939601452", "metadata": {}, "outputs": [], "source": [ "fb = pd.read_csv('fb_stock_prices_2018.csv', \n", " index_col='date', parse_dates=True)\n", "fb" ] }, { "cell_type": "code", "execution_count": null, "id": "938415a7-b6c6-4cf4-953d-70246331f081", "metadata": {}, "outputs": [], "source": [ "fb.plot(\n", " kind='line',\n", " y='open',\n", " figsize=(10, 5),\n", " style='-b',\n", " legend=False,\n", " title='Evolution of Facebook Open Price'\n", ")" ] }, { "cell_type": "markdown", "id": "20d611f4-85ef-4cad-840c-7ed591a712f9", "metadata": {}, "source": [ "#### Instead of using the `style='-b'` keyword argument, we can use the `color` and `linestype` keyword arguments." ] }, { "cell_type": "code", "execution_count": null, "id": "cf24002f-fc87-43d5-8fc4-2909e68a69dc", "metadata": {}, "outputs": [], "source": [ "fb.plot(\n", " kind='line',\n", " y='open',\n", " figsize=(10, 5),\n", " color='blue',\n", " linestyle='solid',\n", " legend=False,\n", " title='Evolution of Facebook Open Price'\n", ")" ] }, { "cell_type": "markdown", "id": "284be798-60f1-4e0a-aa6f-458298aab270", "metadata": {}, "source": [ "#### Plot many lines at once by passing a list of the columns to plot. For example, plot the open, high, low, and close (OHLC) prices of the first week (`1W`)." ] }, { "cell_type": "code", "execution_count": null, "id": "ca0c4e67-0a4f-4b2e-bf96-54d72f209e7f", "metadata": {}, "outputs": [], "source": [ "fb.first('1W')" ] }, { "cell_type": "markdown", "id": "e4abd63a-0714-4b60-866d-4d3f37522a7f", "metadata": {}, "source": [ "#### A line plot is the default. Calling `autoscale()` at the end adds space between the line plots and the x- and y-axes." 
] }, { "cell_type": "code", "execution_count": null, "id": "913d84a3-3c56-403c-878d-cf5795d2e44c", "metadata": {}, "outputs": [], "source": [ "fb.first('1W').plot(\n", " y=['open', 'high', 'low', 'close'],\n", " style=['o-b', '--r', ':k', '.-g'],\n", " title='Facebook OHLC Prices during 1st Week of Trading 2018'\n", ").autoscale()" ] }, { "cell_type": "markdown", "id": "ede96456-6a6c-4184-a160-1e1bd32eb7d1", "metadata": {}, "source": [ "## Scatter plots" ] }, { "cell_type": "markdown", "id": "7617d9ad-62bb-4323-9188-de57c59b8eb0", "metadata": {}, "source": [ "#### The `DataFrame` method `assign()` returns a new dataframe with the new column added. The original dataframe is unchanged." ] }, { "cell_type": "code", "execution_count": null, "id": "99fcbfbd-508f-4fcf-989d-851f88a2c993", "metadata": {}, "outputs": [], "source": [ "fb.assign(max_abs_change=fb.high - fb.low).head()" ] }, { "cell_type": "code", "execution_count": null, "id": "1f60341e-f5c0-4b9d-b4bb-3cf573aa559d", "metadata": {}, "outputs": [], "source": [ "fb.assign(\n", " max_abs_change=fb.high - fb.low\n", ").plot(\n", " kind='scatter', x='volume', y='max_abs_change',\n", " title='Facebook Daily High - Low vs. log(Volume Traded)',\n", " logx=True, alpha=0.25\n", ")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "a5d77d12-7c14-4f2f-acf5-1adf2bef0485", "metadata": {}, "source": [ "## Hexbins\n", "#### Hexbins divide the plot into hexagons, each shaded according to how many points fall inside it." ] }, { "cell_type": "code", "execution_count": null, "id": "5cc73c14-287b-482d-aaba-aadbc0e06f5a", "metadata": {}, "outputs": [], "source": [ "fb.assign(\n", " log_volume=np.log(fb.volume),\n", " max_abs_change=fb.high - fb.low\n", ").plot(\n", " kind='hexbin',\n", " x='log_volume',\n", " y='max_abs_change',\n", " title='Facebook Daily High - Low vs. log(Volume Traded)',\n", " colormap='gray_r',\n", " gridsize=20,\n", " sharex=False # we have to pass this to see the x-axis\n", ")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "9f54d8d2-2a0a-4ee8-a416-126d9967517e", "metadata": {}, "source": [ "## Histograms" ] }, { "cell_type": "code", "execution_count": null, "id": "ca9e1b45-3407-4360-8c0d-179939adb8d3", "metadata": { "tags": [] }, "outputs": [], "source": [ "fb.volume.plot(\n", " kind='hist',\n", " title='Histogram of Daily Volume Traded in Facebook Stock'\n", ")\n", "plt.xlabel('Volume traded') # label the x-axis" ] }, { "cell_type": "markdown", "id": "e1a8f07b-8da1-442d-a375-1bf89d89f093", "metadata": {}, "source": [ "#### To compare distributions, overlap their histograms and make them semi-transparent with the `alpha` parameter. For example, compare the opening and closing prices:" ] }, { "cell_type": "code", "execution_count": null, "id": "6d163f92-dda1-4f51-87ac-a024a6b39842", "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(figsize=(8, 5))\n", "\n", "fb[['open', 'close']].plot(\n", " kind='hist', ax=axes, alpha=0.5,\n", " label=['open', 'close'], legend=True,\n", " title='Comparison of opening and closing prices'\n", ")\n", "\n", "plt.xlabel('Prices')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "2e2948b1-184c-4a52-b91d-ce9cdf53d37f", "metadata": {}, "source": [ "### Kernel Density Estimation (KDE)\n", "#### A kernel density estimate is a smoothed estimate of the probability density function (PDF) of the data. Pass `kind='kde'` to plot one (`pandas` uses `scipy` for this, so `scipy` must be installed). For example, estimate the distribution of the daily high price:" ] }, { "cell_type": "code", "execution_count": null, "id": "b71342ea-f2cb-4aa8-9455-abfcdff6f7a9", "metadata": {}, "outputs": [], "source": [ "fb.high.plot(\n", " kind='kde',\n", " title='KDE of Daily High Price for Facebook Stock'\n", ")\n", "\n", "plt.xlabel('Price ($)')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "4a62307f-8921-40ba-b36a-44e3d06d6a38", "metadata": {}, "source": [ "#### The `plot()` method returns an `Axes` object. Store it for additional customization of the plot, or pass it to another call to `plot()` as the `ax` argument to draw on the same plot.\n", "\n", "#### It is often helpful to view the KDE superimposed on the histogram. Passing `density=True` to the histogram scales it so that its total area is 1, which puts it on the same scale as the KDE. For example:" ] }, { "cell_type": "code", "execution_count": null, "id": "19e1ef64-449a-41b4-a99c-3f48125f078b", "metadata": {}, "outputs": [], "source": [ "# First plot: the histogram\n", "ax_hist = fb.high.plot(kind='hist', density=True, alpha=0.5)\n", "\n", "# Second plot: the KDE\n", "fb.high.plot(\n", " ax=ax_hist, kind='kde', color='blue',\n", " title=\"Distribution of Facebook Stock's Daily High Price in 2018\"\n", ")\n", "\n", "plt.xlabel('Price ($)')" ] }, { "cell_type": "markdown", "id": "d6aea152-3f03-4f91-9045-d8e80400e29e", "metadata": {}, "source": [ "## Box plots\n", "#### Pass `kind='box'` to create box plots. For example, plot the first four columns (the OHLC prices):" ] }, { "cell_type": "code", "execution_count": null, "id": "e6309ee8-d395-40c2-ac6d-d77cb62f3ee7", "metadata": {}, "outputs": [], "source": [ "fb.iloc[:,:4].plot(\n", " kind='box', title='Facebook OHLC Prices Box Plot'\n", ")\n", "\n", "plt.ylabel('price ($)')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "739da044-0aa4-4f5d-be22-ac8448d95caa", "metadata": {}, "source": [ "#### In a notched box plot, the notches represent a 95% confidence interval around the median. Pass the keyword argument `notch=True`. 
For example:" ] }, { "cell_type": "code", "execution_count": null, "id": "b3bd511d-d458-4e63-bb6b-fc5b24c6fedd", "metadata": {}, "outputs": [], "source": [ "fb.iloc[:,:4].plot(\n", " kind='box',\n", " title='Facebook OHLC Prices Box Plot',\n", " notch=True\n", ")\n", "\n", "plt.ylabel('price ($)')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "4ff78f6e-ce3e-43cc-b182-136a0ec87d20", "metadata": {}, "source": [ "## Subplots" ] }, { "cell_type": "markdown", "id": "62a1e5ac-dc27-49f4-8f1b-a3af12b1de9a", "metadata": {}, "source": [ "#### Create subplots by passing `subplots=True` and, optionally, a `layout` tuple of `(rows, columns)`. For example:" ] }, { "cell_type": "code", "execution_count": null, "id": "fd16c0ba-2c03-4cbf-b861-61d2f2ad9ffe", "metadata": {}, "outputs": [], "source": [ "fb.plot(\n", " kind='line',\n", " subplots=True,\n", " layout=(3, 2),\n", " figsize=(15, 10),\n", " title='Facebook Stock 2018'\n", ")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "192280bb-4227-4e4f-a59b-f902deca4cf2", "metadata": {}, "source": [ "#### Because we didn't specify which columns to graph, `pandas` graphed all five of them. By default, the subplots share the x-axis (the dates), but each subplot has its own y-axis scale."
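 ] }, { "cell_type": "markdown", "id": "e8d2b6a4-7f1c-4a3e-8b5d-0c9f2e4a6b17", "metadata": {}, "source": [ "#### A minimal sketch to check this sharing behavior, using a hypothetical five-column dataframe (`demo`, with made-up values on ten business days) in place of the Facebook data. When `DataFrame.plot()` creates the subplots itself, it defaults to `sharex=True` and `sharey=False`; passing `sharey=True` gives related columns, such as two price columns, a common y-axis scale:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Facebook data: made-up values on ten business days
dates = pd.date_range('2018-01-02', periods=10, freq='B')
demo = pd.DataFrame({
    'open': np.linspace(100, 109, 10),
    'high': np.linspace(101, 110, 10),
    'low': np.linspace(99, 108, 10),
    'close': np.linspace(100, 110, 10),
    'volume': np.linspace(1e6, 2e6, 10),
}, index=dates)

# Five columns in a 3 x 2 layout; pandas returns the Axes in a (3, 2) array
axes = demo.plot(kind='line', subplots=True, layout=(3, 2), figsize=(15, 10))
print(axes.shape)                # (3, 2)
print(axes[2, 1].get_visible())  # False: the unused sixth slot is hidden

# Shared x-axis: the open and volume panels show the same date range
print(axes[0, 0].get_xlim() == axes[2, 0].get_xlim())  # True
# Separate y-axes: prices near 100 vs. volumes between one and two million
print(axes[0, 0].get_ylim(), axes[2, 0].get_ylim())

# Opt in to a shared y-axis for two price columns
shared = demo[['open', 'close']].plot(subplots=True, sharey=True)
print(shared[0].get_ylim() == shared[1].get_ylim())    # True
```
"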
] }, { "cell_type": "code", "execution_count": null, "id": "3e1c4cf3-c9af-44e3-a4c4-79e68ca86825", "metadata": {}, "outputs": [], "source": [ "plt.close()" ] }, { "cell_type": "markdown", "id": "778c402f-df39-4abf-b3e2-69eef28ed965", "metadata": {}, "source": [ "#### Adapted from ***Hands-On Data Analysis with Pandas, second edition***, by Stephanie Molin, Packt 2021, ISBN 978-1-80056-345-2" ] }, { "cell_type": "code", "execution_count": null, "id": "521523df-fdd2-47b2-a114-fc674c35db2e", "metadata": {}, "outputs": [], "source": [ "# Additional material (c) 2024 by Ronald Mak" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }