San Jose State University Department of Applied Data Science
**DATA 200 Computational Programming for Data Analytics**
Spring 2024 Instructor: Ron Mak
"
]
},
{
"cell_type": "markdown",
"id": "4e21c8f2-403f-4263-8102-26378e5e2844",
"metadata": {},
"source": [
"# Plot Facebook stock prices with the `pandas` module"
]
},
{
"cell_type": "markdown",
"id": "60e5abc8-5c98-4589-9046-f64b5a61386a",
"metadata": {},
"source": [
"#### `pandas` is the primary module for data analytics, and graphs are important for data visualization. Therefore, both `Series` and `DataFrame` objects have a `plot()` method that uses `matplotlib` to draw graphs. It is a convenient and somewhat simplified way to draw graphs directly from `Series` and `DataFrame` objects."
]
},
{
"cell_type": "markdown",
"id": "ce064784-41bd-4cc6-98c1-6219e39d087f",
"metadata": {},
"source": [
"## `plot()` parameters"
]
},
{
"cell_type": "markdown",
"id": "48796b2f-48cb-493d-b9ed-d54c20383d6c",
"metadata": {},
"source": [
"#### A call to `plot()` implicitly makes calls to `matplotlib`. The `kind` argument specifies the type of graph. The graph type determines what other arguments are necessary. Some commonly used parameters:\n",
"| Parameter | Purpose | Data Type |\n",
"| --- | --- | --- |\n",
"| `kind` | Determines the plot type | String |\n",
"| `x`/`y` | Column(s) to plot on the *x*-axis/*y*-axis | String or list |\n",
"| `ax` | Draws the plot on the `Axes` object provided | `Axes` |\n",
"| `subplots` | Determines whether to make subplots | Boolean |\n",
"| `layout` | Specifies how to arrange the subplots | Tuple of `(rows, columns)` |\n",
"| `figsize` | Size to make the `Figure` object | Tuple of `(width, height)` | \n",
"| `title` | The title of the plot or subplots | String for the plot title or a list of strings for subplot titles |\n",
"| `legend` | Determines whether to show the legend | Boolean |\n",
"| `label` | What to call an item in the legend | String if a single column is being plotted; otherwise, a list of strings |\n",
"| `style` | `matplotlib` style strings for each item being plotted | String if a single column is being plotted; otherwise, a list of strings |\n",
"| `color` | The color to plot the item in | String or red, green, blue tuple if a single column is being plotted; otherwise, a list |\n",
"| `colormap` | The colormap to use | String or `matplotlib` colormap object |\n",
"| `logx`/`logy`/`loglog` | Determines whether to use a logarithmic scale for the *x*-axis, *y*-axis, or both | Boolean |\n",
"| `xticks`/`yticks` | Determines where to draw the ticks on the *x*-axis/*y*-axis | List of values |\n",
"| `xlim`/`ylim` | The axis limits for the *x*-axis/*y*-axis | Tuple of the form `(min, max)` |\n",
"| `rot` | The angle to write the tick labels at | Integer |\n",
"| `sharex`/`sharey` | Determines whether to have subplots share the *x*-axis/*y*-axis | Boolean |\n",
"| `fontsize` | Controls the size of the tick labels | Integer |\n",
"| `grid` | Turns on/off the grid lines | Boolean |\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "8a5728ca-df58-4ae1-bab0-ada54f400ebc",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"%matplotlib inline"
]
},
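{
"cell_type": "markdown",
"id": "added-sketch-plot-parameters-md",
"metadata": {},
"source": [
"#### As a rough illustration of how several of the parameters in the table combine, the following sketch plots a small made-up `DataFrame`. The name `demo`, the columns `a` and `b`, and their values are assumptions for demonstration only; they are not part of the Facebook dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-sketch-plot-parameters-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical demo data: two columns over ten business days\n",
"demo = pd.DataFrame(\n",
"    {'a': np.arange(10), 'b': np.arange(10) ** 2},\n",
"    index=pd.date_range('2018-01-01', periods=10, freq='B')\n",
")\n",
"\n",
"ax = demo.plot(\n",
"    kind='line',\n",
"    y=['a', 'b'],\n",
"    style=['-b', '--r'],  # one style string per column\n",
"    figsize=(8, 4),\n",
"    title='Demo of common plot() parameters',\n",
"    grid=True,\n",
"    rot=45,               # rotate the tick labels\n",
"    ylim=(0, 100),\n",
"    legend=True\n",
")"
]
},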
{
"cell_type": "markdown",
"id": "444edfd1-13b0-4359-bbac-37c2c39aaffc",
"metadata": {},
"source": [
"## Line graphs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6769f2d8-9dde-45db-b1a0-ce2939601452",
"metadata": {},
"outputs": [],
"source": [
"fb = pd.read_csv('fb_stock_prices_2018.csv', \n",
" index_col='date', parse_dates=True)\n",
"fb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "938415a7-b6c6-4cf4-953d-70246331f081",
"metadata": {},
"outputs": [],
"source": [
"fb.plot(\n",
" kind='line',\n",
" y='open',\n",
" figsize=(10, 5),\n",
" style='-b',\n",
" legend=False,\n",
" title='Evolution of Facebook Open Price'\n",
")"
]
},
{
"cell_type": "markdown",
"id": "20d611f4-85ef-4cad-840c-7ed591a712f9",
"metadata": {},
"source": [
"#### Instead of using the `style='-b'` keyword argument, we can use the `color` and `linestyle` keyword arguments."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf24002f-fc87-43d5-8fc4-2909e68a69dc",
"metadata": {},
"outputs": [],
"source": [
"fb.plot(\n",
" kind='line',\n",
" y='open',\n",
" figsize=(10, 5),\n",
" color='blue',\n",
" linestyle='solid',\n",
" legend=False,\n",
" title='Evolution of Facebook Open Price'\n",
")"
]
},
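{
"cell_type": "markdown",
"id": "added-sketch-style-vs-color-md",
"metadata": {},
"source": [
"#### One way to check that the two forms describe the same line is to inspect the `matplotlib` line object that each plot creates. This sketch uses a small made-up `Series` (an assumption, not the Facebook data) and compares the resolved color and line style of the two lines."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-sketch-style-vs-color-code",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"from matplotlib.colors import to_rgba\n",
"\n",
"# Hypothetical demo data\n",
"demo = pd.Series([1, 3, 2, 4], name='demo')\n",
"\n",
"# Draw the same data twice, once per way of specifying the style\n",
"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))\n",
"demo.plot(ax=ax1, style='-b')\n",
"demo.plot(ax=ax2, color='blue', linestyle='solid')\n",
"\n",
"line1 = ax1.get_lines()[0]\n",
"line2 = ax2.get_lines()[0]\n",
"\n",
"# Compare the colors as RGBA tuples and the line styles as stored by matplotlib\n",
"print(to_rgba(line1.get_color()) == to_rgba(line2.get_color()))\n",
"print(line1.get_linestyle() == line2.get_linestyle())"
]
},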
{
"cell_type": "markdown",
"id": "284be798-60f1-4e0a-aa6f-458298aab270",
"metadata": {},
"source": [
"#### Plot many lines at once by passing a list of the columns to plot. For example, plot the open, high, low, and close (OHLC) prices of the first week (`1W`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca0c4e67-0a4f-4b2e-bf96-54d72f209e7f",
"metadata": {},
"outputs": [],
"source": [
"# DataFrame.first() is deprecated as of pandas 2.1 and emits a FutureWarning there.\n",
"fb.first('1W')"
]
},
{
"cell_type": "markdown",
"id": "e4abd63a-0714-4b60-866d-4d3f37522a7f",
"metadata": {},
"source": [
"#### A line plot is the default. Calling `autoscale()` at the end adds space between the line plots and the x- and y-axes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "913d84a3-3c56-403c-878d-cf5795d2e44c",
"metadata": {},
"outputs": [],
"source": [
"fb.first('1W').plot(\n",
" y=['open', 'high', 'low', 'close'],\n",
" style=['o-b', '--r', ':k', '.-g'],\n",
" title='Facebook OHLC Prices during 1st Week of Trading 2018'\n",
").autoscale()"
]
},
{
"cell_type": "markdown",
"id": "ede96456-6a6c-4184-a160-1e1bd32eb7d1",
"metadata": {},
"source": [
"## Scatter plots"
]
},
{
"cell_type": "markdown",
"id": "7617d9ad-62bb-4323-9188-de57c59b8eb0",
"metadata": {},
"source": [
"#### The `DataFrame` method `assign()` returns a new `DataFrame` with the added column and leaves the original unchanged. Here it adds the daily high-low range so that it can be plotted against the volume traded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99fcbfbd-508f-4fcf-989d-851f88a2c993",
"metadata": {},
"outputs": [],
"source": [
"fb.assign(max_abs_change=fb.high - fb.low).head()"
]
},
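{
"cell_type": "markdown",
"id": "added-sketch-assign-md",
"metadata": {},
"source": [
"#### A minimal sketch of how `assign()` behaves, using a made-up two-row `DataFrame`. The name `demo` and its values are assumptions for illustration; the point is only that the new column appears in the returned object and not in the original."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-sketch-assign-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical demo data\n",
"demo = pd.DataFrame({'high': [10.0, 12.0], 'low': [9.0, 10.5]})\n",
"with_change = demo.assign(max_abs_change=demo.high - demo.low)\n",
"\n",
"print(list(demo.columns))         # columns of the original\n",
"print(list(with_change.columns))  # columns of the returned DataFrame"
]
},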
{
"cell_type": "code",
"execution_count": null,
"id": "1f60341e-f5c0-4b9d-b4bb-3cf573aa559d",
"metadata": {},
"outputs": [],
"source": [
"fb.assign(\n",
"    max_abs_change=fb.high - fb.low\n",
").plot(\n",
"    kind='scatter', x='volume', y='max_abs_change',\n",
"    title='Facebook Daily High - Low vs. log(Volume Traded)',\n",
"    logx=True, alpha=0.25\n",
")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "a5d77d12-7c14-4f2f-acf5-1adf2bef0485",
"metadata": {},
"source": [
"## Hexbins\n",
"#### Hexbins divide up the plot into hexagons, which are shaded according to the density of points there."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5cc73c14-287b-482d-aaba-aadbc0e06f5a",
"metadata": {},
"outputs": [],
"source": [
"fb.assign(\n",
" log_volume=np.log(fb.volume),\n",
" max_abs_change=fb.high - fb.low\n",
").plot(\n",
" kind='hexbin',\n",
" x='log_volume',\n",
" y='max_abs_change',\n",
" title='Facebook Daily High - Low vs. log(Volume Traded)',\n",
" colormap='gray_r',\n",
" gridsize=20, \n",
" sharex=False # we have to pass this to see the x-axis\n",
")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9f54d8d2-2a0a-4ee8-a416-126d9967517e",
"metadata": {},
"source": [
"## Histograms"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca9e1b45-3407-4360-8c0d-179939adb8d3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"fb.volume.plot(\n",
" kind='hist', \n",
" title='Histogram of Daily Volume Traded in Facebook Stock'\n",
")\n",
"plt.xlabel('Volume traded') # label the x-axis"
]
},
{
"cell_type": "markdown",
"id": "e1a8f07b-8da1-442d-a375-1bf89d89f093",
"metadata": {},
"source": [
"#### Use the `alpha` parameter to compare distributions by overlapping histograms. For example, compare the opening and closing prices:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d163f92-dda1-4f51-87ac-a024a6b39842",
"metadata": {},
"outputs": [],
"source": [
"fig, axes = plt.subplots(figsize=(8, 5))\n",
"\n",
"fb[['open', 'close']].plot(\n",
" kind='hist', ax=axes, alpha=0.5, \n",
" label=['open', 'close'], legend=True,\n",
" title='Comparison of opening and closing prices'\n",
")\n",
"\n",
"plt.xlabel('Prices')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "2e2948b1-184c-4a52-b91d-ce9cdf53d37f",
"metadata": {},
"source": [
"### Kernel Density Estimation (KDE)\n",
"#### Pass `kind='kde'` to plot a kernel density estimate, which is a smoothed estimate of the probability density function (PDF). This plot type requires the `scipy` package. For example, estimate the density of the daily high price:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b71342ea-f2cb-4aa8-9455-abfcdff6f7a9",
"metadata": {},
"outputs": [],
"source": [
"fb.high.plot(\n",
" kind='kde', \n",
" title='KDE of Daily High Price for Facebook Stock'\n",
")\n",
"\n",
"plt.xlabel('Price ($)')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "4a62307f-8921-40ba-b36a-44e3d06d6a38",
"metadata": {},
"source": [
"#### The `plot()` method returns an `Axes` object. Store this for additional customization of the plot, or pass it into another call to `plot()` as the `ax` argument to add to the original plot. \n",
"\n",
"#### It can often be helpful to view the KDE superimposed on top of the histogram. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19e1ef64-449a-41b4-a99c-3f48125f078b",
"metadata": {},
"outputs": [],
"source": [
"# First plot: the histogram\n",
"ax_hist = fb.high.plot(kind='hist', density=True, alpha=0.5)\n",
"\n",
"# Second plot: the KDE\n",
"fb.high.plot(\n",
" ax=ax_hist, kind='kde', color='blue', \n",
" title=\"Distribution of Facebook Stock's Daily High Price in 2018\"\n",
")\n",
"\n",
"plt.xlabel('Price ($)')"
]
},
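{
"cell_type": "markdown",
"id": "added-sketch-reuse-axes-md",
"metadata": {},
"source": [
"#### As a sketch of the other use of the returned `Axes` object, it can be customized with ordinary `matplotlib` calls. The data here is randomly generated (an assumption, not the Facebook prices), and the reference line at the mean is just one possible customization."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-sketch-reuse-axes-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical demo data: 250 normally distributed values\n",
"rng = np.random.default_rng(0)\n",
"demo = pd.Series(rng.normal(loc=170, scale=15, size=250), name='price')\n",
"\n",
"# plot() returns the Axes, which we keep for further changes\n",
"ax = demo.plot(kind='hist', alpha=0.5, title='Demo histogram')\n",
"\n",
"# Add a vertical reference line at the mean, then label and annotate\n",
"ax.axvline(demo.mean(), color='red', linestyle='--', label='mean')\n",
"ax.set_xlabel('Price ($)')\n",
"ax.legend()"
]
},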
{
"cell_type": "markdown",
"id": "d6aea152-3f03-4f91-9045-d8e80400e29e",
"metadata": {},
"source": [
"## Box plots\n",
"#### Pass `kind='box'` to create box plots. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6309ee8-d395-40c2-ac6d-d77cb62f3ee7",
"metadata": {},
"outputs": [],
"source": [
"fb.iloc[:,:4].plot(\n",
" kind='box', title='Facebook OHLC Prices Box Plot'\n",
")\n",
"\n",
"plt.ylabel('price ($)')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "739da044-0aa4-4f5d-be22-ac8448d95caa",
"metadata": {},
"source": [
"#### The notches in a notched box plot mark an approximate 95% confidence interval around the median. Pass the keyword argument `notch=True`. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3bd511d-d458-4e63-bb6b-fc5b24c6fedd",
"metadata": {},
"outputs": [],
"source": [
"fb.iloc[:, :4].plot(\n",
"    kind='box',\n",
"    title='Facebook OHLC Prices Box Plot',\n",
"    notch=True\n",
")\n",
"\n",
"plt.ylabel('price ($)')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "4ff78f6e-ce3e-43cc-b182-136a0ec87d20",
"metadata": {},
"source": [
"## Subplots"
]
},
{
"cell_type": "markdown",
"id": "62a1e5ac-dc27-49f4-8f1b-a3af12b1de9a",
"metadata": {},
"source": [
"#### Create subplots by passing `subplots=True` and (optionally) specifying the `layout` in a tuple of `(rows, columns)`. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd16c0ba-2c03-4cbf-b861-61d2f2ad9ffe",
"metadata": {},
"outputs": [],
"source": [
"fb.plot(\n",
" kind='line',\n",
" subplots=True,\n",
" layout=(3, 2),\n",
" figsize=(15, 10),\n",
" title='Facebook Stock 2018'\n",
")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "192280bb-4227-4e4f-a59b-f902deca4cf2",
"metadata": {},
"source": [
"#### Since we didn't specify which columns to graph, `pandas` graphed all five of them. They automatically shared the x-axis scale (the dates), but there are different y-axis scales."
]
},
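{
"cell_type": "markdown",
"id": "added-sketch-subplots-axes-md",
"metadata": {},
"source": [
"#### With `subplots=True`, `plot()` returns an array of `Axes` objects rather than a single one, so each subplot can be adjusted on its own. A minimal sketch with made-up data (the name `demo`, the column names, and the axis labels are assumptions):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-sketch-subplots-axes-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical demo data with two columns on very different scales\n",
"demo = pd.DataFrame({\n",
"    'price': np.linspace(100, 120, 20),\n",
"    'volume': np.linspace(1e6, 5e6, 20)\n",
"})\n",
"\n",
"axes = demo.plot(subplots=True, layout=(1, 2), figsize=(10, 3))\n",
"\n",
"# Flatten the array so the loop works for any layout\n",
"for ax, unit in zip(np.ravel(axes), ['$', 'shares']):\n",
"    ax.set_ylabel(unit)"
]
},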
{
"cell_type": "code",
"execution_count": null,
"id": "3e1c4cf3-c9af-44e3-a4c4-79e68ca86825",
"metadata": {},
"outputs": [],
"source": [
"plt.close()"
]
},
{
"cell_type": "markdown",
"id": "778c402f-df39-4abf-b3e2-69eef28ed965",
"metadata": {},
"source": [
"#### Adapted from ***Hands-On Data Analysis with Pandas, second edition***, by Stefanie Molin, Packt 2021, ISBN 978-1-80056-345-2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "521523df-fdd2-47b2-a114-fc674c35db2e",
"metadata": {},
"outputs": [],
"source": [
"# Additional material (c) 2024 by Ronald Mak"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}