### <center>San Jose State University<br>Department of Applied Data Science<br><br>**DATA 200<br>Computational Programming for Data Analytics**<br><br>Spring 2024<br>Instructor: Ron Mak</center>

# Plot Facebook stock prices with the `pandas` module

#### `pandas` is the primary module for data analytics, and graphs are essential for data visualization. Therefore, both `Series` and `DataFrame` objects have a `plot()` method that uses `matplotlib` to draw graphs. It is a convenient and somewhat simplified way to draw graphs directly from `Series` and `DataFrame` objects.

## `plot()` parameters

#### A call to `plot()` implicitly makes calls to `matplotlib`. The `kind` argument specifies the type of graph. The graph type determines what other arguments are necessary. Some commonly used parameters:
| Parameter | Purpose | Data Type |
| --- | --- | --- |
| `kind` | Determines the plot type | String |
| `x`/`y` | Column(s) to plot on the *x*-axis/*y*-axis | String or list |
| `ax` | Draws the plot on the `Axes` object provided | `Axes` |
| `subplots` | Determines whether to make subplots | Boolean |
| `layout` | Specifies how to arrange the subplots | Tuple of `(rows, columns)` |
| `figsize` | Size to make the `Figure` object | Tuple of `(width, height)` | 
| `title` | The title of the plot or subplots | String for the plot title or a list of strings for subplot titles |
| `legend` | Determines whether to show the legend | Boolean |
| `label` | What to call an item in the legend | String if a single column is being plotted; otherwise, a list of strings |
| `style` | `matplotlib` style strings for each item being plotted | String if a single column is being plotted; otherwise, a list of strings |
| `color` | The color to plot the item in | String or red, green, blue tuple if a single column is being plotted; otherwise, a list |
| `colormap` | The colormap to use | String or `matplotlib` colormap object |
| `logx`/`logy`/`loglog` | Determines whether to use a logarithmic scale for the *x*-axis, *y*-axis, or both | Boolean |
| `xticks`/`yticks` | Determines where to draw the ticks on the *x*-axis/*y*-axis | List of values |
| `xlim`/`ylim` | The axis limits for the *x*-axis/*y*-axis | Tuple of the form `(min, max)` |
| `rot` | The angle to write the tick labels at | Integer |
| `sharex`/`sharey` | Determines whether to have subplots share the *x*-axis/*y*-axis | Boolean |
| `fontsize` | Controls the size of the tick labels | Integer |
| `grid` | Turns on/off the grid lines | Boolean |
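
#### As a minimal sketch of how several of these parameters combine in a single call, the following plots a small synthetic price series. The `demo` dataframe and its values are assumptions for illustration only, not the course dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a price series (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
dates = pd.bdate_range('2018-01-02', periods=60)
demo = pd.DataFrame(
    {'open': 180 + rng.normal(0, 1, len(dates)).cumsum()},
    index=dates
)

ax = demo.plot(
    kind='line',        # plot type
    y='open',           # column for the y-axis
    figsize=(10, 5),    # (width, height) in inches
    title='Synthetic Open Price',
    legend=False,       # hide the legend
    grid=True,          # draw grid lines
    rot=45,             # rotate the tick labels by 45 degrees
    fontsize=10,        # tick label size
    ylim=(150, 210)     # (min, max) for the y-axis
)
```

#### The call returns the `Axes` object, so settings such as the title and axis limits can be inspected afterwards with `ax.get_title()` and `ax.get_ylim()`.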


In [7]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline

## Line graphs

In [None]:
fb = pd.read_csv('fb_stock_prices_2018.csv', 
                 index_col='date', parse_dates=True)
fb

In [None]:
fb.plot(
    kind='line',
    y='open',
    figsize=(10, 5),
    style='-b',
    legend=False,
    title='Evolution of Facebook Open Price'
)

#### Instead of using the `style='-b'` keyword argument, we can use the `color` and `linestyle` keyword arguments.

In [None]:
fb.plot(
    kind='line',
    y='open',
    figsize=(10, 5),
    color='blue',
    linestyle='solid',
    legend=False,
    title='Evolution of Facebook Open Price'
)
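
#### To check that the two spellings are equivalent, a short sketch can draw the same column both ways and compare the resulting line properties. The synthetic `demo` dataframe is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba

# Synthetic stand-in for a price series (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
dates = pd.bdate_range('2018-01-02', periods=60)
demo = pd.DataFrame(
    {'open': 180 + rng.normal(0, 1, len(dates)).cumsum()},
    index=dates
)

# Same line, specified with a style string and with explicit keywords.
ax_style = demo.plot(y='open', style='-b', legend=False)
ax_kwargs = demo.plot(y='open', color='blue', linestyle='solid', legend=False)

line_style = ax_style.get_lines()[0]
line_kwargs = ax_kwargs.get_lines()[0]

# Compare colors as RGBA tuples, since 'b' and 'blue' are different spellings.
same_color = to_rgba(line_style.get_color()) == to_rgba(line_kwargs.get_color())
same_linestyle = line_style.get_linestyle() == line_kwargs.get_linestyle()
print(same_color, same_linestyle)
```

#### The style string is compact, while the separate keywords are easier to read and allow any named color.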

#### Plot many lines at once by passing a list of the columns to plot. For example, plot the open, high, low, and close (OHLC) prices of the first week (`1W`).

In [None]:
fb.first('1W')

#### A line plot is the default. Calling `autoscale()` at the end adds space between the line plots and the x- and y-axes.

In [None]:
fb.first('1W').plot(
    y=['open', 'high', 'low', 'close'],
    style=['o-b', '--r', ':k', '.-g'],
    title='Facebook OHLC Prices during 1st Week of Trading 2018'
).autoscale()
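
#### Note that recent `pandas` releases (2.1 and later, to the best of our knowledge) deprecate `first()` and recommend filtering with `.loc` instead. The sketch below shows one possible alternative using a date slice. The synthetic `demo` dataframe and the explicit end date `'2018-01-07'` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic OHLC data on business days (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
dates = pd.bdate_range('2018-01-02', periods=10)
close = 180 + rng.normal(0, 1, len(dates)).cumsum()
demo = pd.DataFrame(
    {'open': close - 0.5, 'high': close + 1.0,
     'low': close - 1.5, 'close': close},
    index=dates
)

# Keep the rows up to and including Sunday, January 7, 2018.
first_week = demo.loc[:'2018-01-07']

ax = first_week.plot(
    y=['open', 'high', 'low', 'close'],
    style=['o-b', '--r', ':k', '.-g'],
    title='Synthetic OHLC Prices during 1st Week of Trading 2018'
)
ax.autoscale()  # add space between the lines and the axes
```

#### Slicing by label works here because the index is a sorted `DatetimeIndex`; the end date itself does not need to be present in the data.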

## Scatter plots

#### The dataframe method `assign()` creates a new dataframe with a new column.

In [None]:
fb.assign(max_abs_change = fb.high - fb.low).head()
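
#### Because `assign()` returns a new dataframe, the original is left unchanged. A minimal sketch with a tiny made-up dataframe (the values are assumptions for illustration):

```python
import pandas as pd

# Tiny made-up dataframe (an assumption; not the real Facebook data).
demo = pd.DataFrame({
    'high': [182.0, 185.5, 186.2],
    'low': [177.5, 181.3, 184.1]
})

# assign() builds and returns a copy with the extra column.
with_change = demo.assign(max_abs_change=demo.high - demo.low)

print(list(demo.columns))         # original columns only
print(list(with_change.columns))  # original columns plus the new one
```

#### This is why the result of `assign()` can be chained directly into `plot()` without altering `fb`.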

In [None]:
fb.assign(
    max_abs_change=fb.high - fb.low
).plot(
    kind='scatter', x='volume', y='max_abs_change',
    title='Facebook Daily High - Low vs. log(Volume Traded)',
    logx=True, alpha=0.25
)

plt.show()
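
#### Passing `logx=True` changes only the scale of the *x*-axis; the data values themselves are not transformed. The following sketch checks this on synthetic data (the `demo` dataframe and its distributions are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Synthetic volume and price-range data (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'volume': rng.lognormal(mean=17, sigma=0.4, size=250),
    'max_abs_change': rng.uniform(1, 10, size=250)
})

ax = demo.plot(
    kind='scatter', x='volume', y='max_abs_change',
    logx=True, alpha=0.25,
    title='Synthetic High - Low vs. log(Volume Traded)'
)

# The x-axis is logarithmic; the y-axis keeps its linear scale.
print(ax.get_xscale(), ax.get_yscale())
```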

## Hexbins
#### Hexbins divide up the plot into hexagons, which are shaded according to the density of points there.

In [None]:
fb.assign(
    log_volume=np.log(fb.volume),
    max_abs_change=fb.high - fb.low
).plot(
    kind='hexbin',
    x='log_volume',
    y='max_abs_change',
    title='Facebook Daily High - Low vs. log(Volume Traded)',
    colormap='gray_r',
    gridsize=20, 
    sharex=False # we have to pass this to see the x-axis
)

plt.show()
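
#### To see what the shading encodes, the sketch below draws a hexbin plot on synthetic data and reads back the per-hexagon counts from the returned `Axes`. The `demo` dataframe is an assumption for illustration, and reading the counts from `ax.collections[0]` relies on `matplotlib` storing the hexagons as the first collection on the axes.

```python
import numpy as np
import pandas as pd

# Synthetic volume and price-range data (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
volume = rng.lognormal(mean=17, sigma=0.4, size=250)
demo = pd.DataFrame({
    'log_volume': np.log(volume),
    'max_abs_change': rng.uniform(1, 10, size=250)
})

ax = demo.plot(
    kind='hexbin',
    x='log_volume',
    y='max_abs_change',
    colormap='gray_r',
    gridsize=20,    # number of hexagons in the x-direction
    sharex=False
)

# Each hexagon carries the number of points that fell inside it.
counts = ax.collections[0].get_array()
print('Largest count in one hexagon:', int(np.nanmax(counts)))
```

#### A smaller `gridsize` gives fewer, larger hexagons and therefore higher counts per hexagon.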

## Histograms

In [None]:
fb.volume.plot(
    kind='hist', 
    title='Histogram of Daily Volume Traded in Facebook Stock'
)
plt.xlabel('Volume traded') # label the x-axis

#### Use the `alpha` parameter to compare distributions by overlapping histograms. For example, compare the opening and closing prices:

In [None]:
fig, axes = plt.subplots(figsize=(8, 5))

fb[['open', 'close']].plot(
    kind='hist', ax=axes, alpha=0.5, 
    label=['open', 'close'], legend=True,
    title='Comparison of opening and closing prices'
)

plt.xlabel('Prices')
plt.show()

### Kernel Density Estimation (KDE)
#### Pass `kind='kde'` to plot an estimate of the probability density function (PDF). The height of the curve is a density, not a probability; the area under the curve over a price range estimates the probability that the price falls in that range. For example, estimate the density of the daily high price:

In [None]:
fb.high.plot(
    kind='kde', 
    title='KDE of Daily High Price for Facebook Stock'
)

plt.xlabel('Price ($)')
plt.show()
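
#### `pandas` relies on SciPy for KDE plots, so the same kind of estimate can be built directly with `scipy.stats.gaussian_kde` to turn the density into probabilities. This is a sketch on synthetic prices (the `high` series and the price range 170 to 180 are assumptions for illustration).

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Synthetic daily high prices (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
high = pd.Series(rng.normal(175, 15, size=250), name='high')

kde = gaussian_kde(high.to_numpy())

# The total area under the estimated density is 1.
total_area = kde.integrate_box_1d(-np.inf, np.inf)

# The area over a range estimates the probability of landing in that range.
p_170_to_180 = kde.integrate_box_1d(170, 180)

print(f'Total area: {total_area:.3f}')
print(f'Estimated P(170 <= high <= 180): {p_170_to_180:.3f}')
```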

#### The `plot()` method returns an `Axes` object. Store this for additional customization of the plot, or pass it into another call to `plot()` as the `ax` argument to add to the original plot. 
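
#### A minimal sketch of both uses of the returned `Axes` object, on synthetic data (the `demo` dataframe is an assumption for illustration):

```python
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Synthetic high and low prices (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
dates = pd.bdate_range('2018-01-02', periods=60)
close = 180 + rng.normal(0, 1, len(dates)).cumsum()
demo = pd.DataFrame({'high': close + 1.0, 'low': close - 1.5}, index=dates)

# Store the Axes returned by the first plot and customize it.
ax = demo.plot(y='high', title='Synthetic High and Low Prices')
ax.set_ylabel('Price ($)')

# Pass the same Axes to a second call to add another line to the plot.
same_ax = demo.plot(y='low', ax=ax)

print(isinstance(ax, Axes), same_ax is ax)
```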

#### It can often be helpful to view the KDE superimposed on top of the histogram. For example:

In [None]:
# First plot: the histogram
ax_hist = fb.high.plot(kind='hist', density=True, alpha=0.5)

# Second plot: the KDE
fb.high.plot(
    ax=ax_hist, kind='kde', color='blue', 
    title="Distribution of Facebook Stock's Daily High Price in 2018"
)

plt.xlabel('Price ($)')

## Box plots
#### Pass `kind='box'` to create box plots. For example:

In [None]:
fb.iloc[:,:4].plot(
    kind='box', title='Facebook OHLC Prices Box Plot'
)

plt.ylabel('price ($)')
plt.show()

#### The notches in a notched box plot represent a 95% confidence interval around the median. To draw them, pass the keyword argument `notch=True`. For example:

In [None]:
fb.iloc[:,:4].plot(
    kind='box', 
    title='Facebook OHLC Prices Box Plot', 
    notch=True
)

plt.ylabel('price ($)')
plt.show()
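
#### As far as we know, `matplotlib` computes the default notch limits as the median plus or minus 1.57 times the interquartile range divided by the square root of the sample size. The sketch below compares that formula with the values reported by `matplotlib.cbook.boxplot_stats()` on synthetic prices (the `prices` array is an assumption for illustration).

```python
import numpy as np
from matplotlib import cbook

# Synthetic prices (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
prices = rng.normal(175, 15, size=250)

# Statistics that matplotlib uses to draw a box plot.
stats = cbook.boxplot_stats(prices)[0]

# Manual computation of the notch limits.
q1, median, q3 = np.percentile(prices, [25, 50, 75])
half_width = 1.57 * (q3 - q1) / np.sqrt(len(prices))

print('matplotlib:', stats['cilo'], stats['cihi'])
print('manual:    ', median - half_width, median + half_width)
```

#### If the notches of two boxes do not overlap, that is commonly read as evidence that their medians differ.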

## Subplots

#### Create subplots by passing `subplots=True` and (optionally) specifying the `layout` in a tuple of `(rows, columns)`. For example:

In [None]:
fb.plot(
    kind='line',
    subplots=True,
    layout=(3, 2),
    figsize=(15, 10),
    title='Facebook Stock 2018'
)

plt.show()

#### Since we didn't specify which columns to graph, `pandas` graphed all five of them. They automatically shared the x-axis scale (the dates), but there are different y-axis scales.
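
#### With `subplots=True` and a `layout`, `plot()` returns an array of `Axes` objects arranged in that layout. The following sketch checks the shape and the axis sharing on synthetic data (the five-column `demo` dataframe is an assumption for illustration).

```python
import numpy as np
import pandas as pd

# Synthetic five-column stock data (an assumption; not the real Facebook data).
rng = np.random.default_rng(0)
dates = pd.bdate_range('2018-01-02', periods=60)
close = 180 + rng.normal(0, 1, len(dates)).cumsum()
demo = pd.DataFrame(
    {'open': close - 0.5, 'high': close + 1.0, 'low': close - 1.5,
     'close': close,
     'volume': rng.lognormal(mean=17, sigma=0.4, size=len(dates))},
    index=dates
)

axes = demo.plot(
    kind='line',
    subplots=True,
    layout=(3, 2),
    figsize=(15, 10),
    title='Synthetic Stock Data'
)

print(axes.shape)               # one Axes per layout cell
print(axes[0, 0].get_ylim())    # y-axis limits of the 'open' subplot
print(axes[2, 0].get_ylim())    # y-axis limits of the 'volume' subplot
```

#### Five columns fill five of the six cells; the individual subplots can then be customized by indexing into `axes`.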

In [None]:
plt.close()

#### Adapted from ***Hands-On Data Analysis with Pandas, second edition***, by Stefanie Molin, Packt 2021, ISBN 978-1-80056-345-2

In [None]:
# Additional material (c) 2024 by Ronald Mak