{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <center>San Jose State University<br>Department of Applied Data Science<br><br>**DATA 200<br>Computational Programming for Data Analytics**<br><br>Spring 2024<br>Instructor: Ron Mak</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scatter Plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use a scatter plot to show correlation within a dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example: Animal Statistics\n",
    "#### You are given a dataset containing information about various animals. Visualize the correlation between animal attributes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv('anage_data.csv')\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The dataset is incomplete. Keep only the samples that have both a body mass and a maximum longevity, then split the data by animal class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "longevity = 'Maximum longevity (yrs)'\n",
    "mass      = 'Body mass (g)'\n",
    "\n",
    "# Keep only records that have both a longevity and a body mass value.\n",
    "data = data[np.isfinite(data[longevity]) & np.isfinite(data[mass])]\n",
    "\n",
    "# Split the data by class.\n",
    "amphibia = data[data['Class'] == 'Amphibia']\n",
    "aves     = data[data['Class'] == 'Aves']\n",
    "mammalia = data[data['Class'] == 'Mammalia']\n",
    "reptilia = data[data['Class'] == 'Reptilia']"
   ]
  },
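  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A possible alternative, shown only as a sketch: pandas can do the same filtering with `dropna()` and the same split with `groupby()`. The names `complete` and `by_class` are introduced here for illustration and are not part of the original exercise. The sketch assumes missing values are stored as NaN; unlike `np.isfinite()`, `dropna()` does not remove infinite values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: drop rows that lack a longevity or a body mass value.\n",
    "# Assumes missing values are NaN (infinities would be kept).\n",
    "complete = data.dropna(subset=[longevity, mass])\n",
    "\n",
    "# Hypothetical helper dict: one DataFrame per animal class.\n",
    "by_class = {name: group for name, group in complete.groupby('Class')}\n",
    "\n",
    "# List the class names that were found.\n",
    "sorted(by_class)"
   ]
  },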
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create a scatter plot visualizing the correlation between body mass and maximum longevity. Use a different color for each animal class. Add a legend, axis labels, and a title. Use a log scale for both the x-axis and the y-axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the figure.\n",
    "plt.figure(figsize=(10, 6), dpi=300)\n",
    "\n",
    "# Create one scatter plot per class; each gets the next color in the cycle.\n",
    "plt.scatter(amphibia[mass], amphibia[longevity], label='Amphibia')\n",
    "plt.scatter(aves[mass],     aves[longevity],     label='Aves')\n",
    "plt.scatter(mammalia[mass], mammalia[longevity], label='Mammalia')\n",
    "plt.scatter(reptilia[mass], reptilia[longevity], label='Reptilia')\n",
    "\n",
    "ax = plt.gca()\n",
    "\n",
    "# Set log scales.\n",
    "ax.set_xscale('log')\n",
    "ax.set_yscale('log')\n",
    "\n",
    "# Add labels and a title.\n",
    "plt.xlabel('Body mass in grams')\n",
    "plt.ylabel('Maximum longevity in years')\n",
    "plt.title('Animal body mass vs. maximum longevity')\n",
    "\n",
    "# Add a legend. The classes are colored by category, so no color bar is needed.\n",
    "plt.legend()\n",
    "\n",
    "plt.show()"
   ]
  },
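  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### To put a number on what the plot shows, one option is to compute the Pearson correlation coefficient of the log-transformed values. This is a sketch, not part of the original exercise; the names `log_mass`, `log_longevity`, and `r` are introduced here for illustration. It assumes all values are strictly positive, which the log scales above also require."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: Pearson correlation on log10-transformed values.\n",
    "# Assumes body mass and longevity are strictly positive.\n",
    "log_mass      = np.log10(data[mass])\n",
    "log_longevity = np.log10(data[longevity])\n",
    "\n",
    "# corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.\n",
    "r = np.corrcoef(log_mass, log_longevity)[0, 1]\n",
    "print(f'Correlation of log(mass) and log(longevity): {r:.2f}')"
   ]
  },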
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Adapted from ***Data Visualization with Python***, by Mario Döbler and Tim Großmann, Packt 2019, ISBN 978-1-78995-646-7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Additional material (c) 2024 by Ronald Mak"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
