{ "cells": [ { "cell_type": "markdown", "id": "a1b4eb26-3087-4ba4-9f27-76cdebd283e2", "metadata": {}, "source": [ "###
San Jose State University
Department of Applied Data Science

**DATA 200
Computational Programming for Data Analytics**

Spring 2024
Instructor: Ron Mak
" ] }, { "cell_type": "markdown", "id": "fba8bba9-0d0c-4c5e-9b4c-a275802a8608", "metadata": {}, "source": [ "# Plot COVID-19 data with `pandas`" ] }, { "cell_type": "code", "execution_count": null, "id": "e45f91ac-695a-4e29-b3dd-08f0d9841b43", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "id": "74f77977-1c7f-4f09-a232-070e43c85d12", "metadata": {}, "outputs": [], "source": [ "covid = pd.read_csv('covid19_cases.csv',\n", " index_col='dateRep')\n", "covid.head()" ] }, { "cell_type": "markdown", "id": "6d4d9259-925c-47d4-a332-adbeb8405ab3", "metadata": {}, "source": [ "## Line graphs\n", "#### Use the `ax` parameter to draw subplots that each compare a few variables. For example, look at daily new COVID-19 cases in China, Spain, Italy, the USA, Brazil, and India. Because these values fluctuate a lot from day to day, plot the 7-day moving average of new cases, computed with the `rolling()` method. Note that `rolling(7)` averages each value with the 6 rows before it, so it assumes one row per day in chronological order." 
] }, { "cell_type": "code", "execution_count": null, "id": "463fcb86-71cd-48a4-8de5-efc81e7f5b86", "metadata": {}, "outputs": [], "source": [ "new_cases_rolling_average = covid.pivot_table(\n", " index=covid.index, \n", " columns='countriesAndTerritories', \n", " values='cases'\n", ").rolling(7).mean()\n", "\n", "new_cases_rolling_average" ] }, { "cell_type": "markdown", "id": "a7a814d5-16c3-4a9f-bab0-30f41ce9b568", "metadata": {}, "source": [ "#### Creating a separate plot for each country makes comparison harder, and plotting all the countries together makes the smaller values hard to see. Instead, plot countries that have had a similar number of cases in the same subplot:" ] }, { "cell_type": "code", "execution_count": null, "id": "97e0b098-a35c-451b-8f57-b11c4eb61194", "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n", "\n", "# China only\n", "new_cases_rolling_average[['China']].plot(ax=axes[0], style='-.c')\n", "\n", "# Italy and Spain\n", "new_cases_rolling_average[['Italy', 'Spain']].plot(\n", " ax=axes[1], style=['-', '--'], \n", " title='7-day rolling average of new COVID-19 cases\\n(source: ECDC)'\n", ")\n", "\n", "# Brazil, India, and the USA\n", "new_cases_rolling_average[['Brazil', 'India', \n", " 'United_States_of_America']].plot(\n", " ax=axes[2], style=['--', ':', '-']\n", ")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "a05154c2-f8f5-4bbc-8ada-b12e8b2550fe", "metadata": {}, "source": [ "## Area plot" ] }, { "cell_type": "code", "execution_count": null, "id": "d5bf111e-2e0a-4034-aa93-a2ce73d3cd1f", "metadata": {}, "outputs": [], "source": [ "# Columns for all countries other than Brazil, India, and the USA.\n", "dropped_cols = [\n", " col for col in new_cases_rolling_average.columns \n", " if col not in ['Brazil', 'India', \n", " 'United_States_of_America']\n", "]\n", "\n", "# Drop those columns so that only Brazil, India, and the USA remain.\n", 
"new_cases_rolling_average.drop(columns=dropped_cols).plot(\n", " kind='area', figsize=(15, 5), \n", " title='7-day rolling average of new COVID-19 cases'\n", ")\n", "\n", "plt.xlabel('Date')\n", "plt.ylabel('Cases')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "8a37b21e-3144-4d22-9700-2d08005d48d1", "metadata": {}, "outputs": [], "source": [ "plt.close()" ] }, { "cell_type": "markdown", "id": "19313371-84b4-4b6c-93c3-b44dbf85a183", "metadata": {}, "source": [ "#### Adapted from ***Hands-On Data Analysis with Pandas, second edition***, by Stefanie Molin, Packt 2021, ISBN 978-1-80056-345-2" ] }, { "cell_type": "code", "execution_count": null, "id": "a227531e-d678-48c0-b981-e58e0f535d07", "metadata": {}, "outputs": [], "source": [ "# Additional material (c) 2024 by Ronald Mak" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }