{ "cells": [ { "cell_type": "markdown", "id": "802e2a95-0f59-4f64-bc2d-d5736e852fad", "metadata": {}, "source": [ "###
San Jose State University
Department of Applied Data Science

**DATA 200
Computational Programming for Data Analytics**

Spring 2024
Instructor: Ron Mak

**Assignment #10
Dataframes and Simple Bar Charts**

Assigned: April 11, 2024
Due: April 18 at 5:30 PM

Each problem is worth 10 points
160 points maximum
Individual work only!
" ] }, { "cell_type": "markdown", "id": "02ad2c53-1228-40b2-b2c1-74d86ab8c1e3", "metadata": {}, "source": [ "# Dataframes and bar charts" ] }, { "cell_type": "markdown", "id": "17d912f7-8858-479f-81fb-bcc35dd498a5", "metadata": {}, "source": [ "#### This assignment will give you practice working with `pandas` dataframes and making simple bar charts of the data. The input data file `covid_data.csv` contains Covid data for each day of August 2020 for China, Germany, India, Russia, the United Kingdom, and the United States. Each problem is worth 10 points." ] }, { "attachments": {}, "cell_type": "markdown", "id": "6ed86ede-4330-41e7-bb34-e0b67dcf9ef8", "metadata": {}, "source": [ "#### **PROBLEM 1.** Read the CSV file `covid_data.csv` into a dataframe and display the dataframe. Include the keyword argument `index_col=0` in the call to `read_csv()` to make the `date` column the index." ] }, { "cell_type": "markdown", "id": "9ed8f4fc-25ce-4d2f-b89b-70ed0b1c3245", "metadata": {}, "source": [ "#### **PROBLEM 2.** Display all the column headers of your dataframe." ] }, { "cell_type": "markdown", "id": "942a9b74-23c1-4e3f-bb13-d55b672569e1", "metadata": {}, "source": [ "#### **PROBLEM 3.** Display the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles of each numerical column." ] }, { "cell_type": "markdown", "id": "167b5c0e-7c44-4205-a313-a264033030fb", "metadata": {}, "source": [ "#### **PROBLEM 4.** For each country, display in a dataframe the number of Covid cases for each day from the 15th through the 20th of the month, inclusive. Use the `loc` attribute of `DataFrame`." ] }, { "cell_type": "markdown", "id": "c693b527-2519-4a88-aee2-f8e823200510", "metadata": {}, "source": [ "#### **PROBLEM 5.** Display the same data as in the previous problem, but this time use the `iloc` attribute." 
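, "
A minimal sketch on made-up data shows how the tools named in Problems 1 through 5 fit together: `read_csv()` with `index_col=0`, the `columns` attribute, `describe()`, `loc`, and `iloc`. The column names (such as `'China cases'`) and the ISO-style date strings are assumptions for illustration and may not match what `covid_data.csv` actually contains.

```python
import io

import pandas as pd

# Made-up stand-in for covid_data.csv (column names and date format are
# assumptions): a 'date' column followed by one cases column per country.
dates = [f'2020-08-{day:02d}' for day in range(1, 32)]
toy = pd.DataFrame(
    {'date': dates, 'China cases': range(31), 'India cases': range(100, 131)}
)
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# index_col=0 turns the first CSV column ('date') into the row index.
df = pd.read_csv(buffer, index_col=0)
print(list(df.columns))  # ['China cases', 'India cases']
print(df.describe())     # count, mean, std, min, 25%, 50%, 75%, max

# loc slices by label, and both endpoint labels are included.
by_label = df.loc['2020-08-15':'2020-08-20']

# iloc slices by position, and the stop position is excluded,
# so positions 14 through 19 are the 15th through 20th days.
by_position = df.iloc[14:20]

print(by_label.equals(by_position))  # True
```

Both slices select the same six rows: a `loc` label slice includes its end label, whereas an `iloc` position slice stops just before its end position, so `iloc[14:20]` covers the 15th through the 20th days.
"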
] }, { "cell_type": "markdown", "id": "a83988a9-80a9-40d4-a19e-94fb3837bb07", "metadata": {}, "source": [ "#### **PROBLEM 6.** Use the `Series` method `sum()` to calculate and print the total number of deaths for each country during the month.\n", "##### Some countries were more open and accurate in their reporting than others." ] }, { "cell_type": "markdown", "id": "5ee0e2ae-70be-44c6-90f9-ed9d86a05cfe", "metadata": {}, "source": [ "#### **PROBLEM 7.** Create a simple bar chart of the deaths in China over each day of the month. " ] }, { "cell_type": "markdown", "id": "2237d0ff-079d-4b51-847f-cba3fd948ca7", "metadata": {}, "source": [ "#### See [Matplotlib Bar Chart](https://pythonbasics.org/matplotlib-bar-chart/) for a tutorial on making simple bar charts. After executing the following cell, you should be able to create each bar chart for this and subsequent problems with a call to `plt.bar()`." ] }, { "cell_type": "code", "execution_count": 1, "id": "39e56074-36bf-4642-aa66-24c0a6f13188", "metadata": { "tags": [] }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "d518a9fb-38c6-48ed-9fe1-7b32434f0d77", "metadata": {}, "source": [ "#### **PROBLEM 8.** Create a simple bar chart of the deaths in India over each day of the month. " ] }, { "cell_type": "markdown", "id": "2546800f-a710-4b2f-b132-5025320d3d50", "metadata": {}, "source": [ "#### **PROBLEM 9.** Create a simple bar chart of the deaths in the US over each day of the month. (Can you explain the multiple peaks in the chart?) " ] }, { "cell_type": "markdown", "id": "c4b584bb-8f76-4a2e-a695-3ec5102be587", "metadata": {}, "source": [ "#### **PROBLEM 10.** Create a simple bar chart of the running total of deaths in the US day by day during the month (i.e., each bar should represent the sum of deaths of that day and the previous days). 
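A minimal sketch on made-up numbers: `Series.sum()` reduces a column to a single total, and `Series.cumsum()` produces the running total in which each value is that day's deaths plus the deaths on all earlier days. The five-day series and its date labels are invented for illustration; in the assignment the values would come from a deaths column of the dataframe (its exact name is an assumption here).

```python
import pandas as pd

# Made-up daily death counts for one country over five days.
daily = pd.Series(
    [3, 5, 2, 4, 6],
    index=['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04', '2020-08-05'],
)

# Total for the whole period, as in Problem 6.
total = daily.sum()
print(total)  # 20

# Running total: each value is that day's count plus all earlier days.
running = daily.cumsum()
print(running.tolist())  # [3, 8, 10, 14, 20]
```

Either `Series` can then be charted with a single call such as `plt.bar(running.index, running.values)`, following the `plt.bar()` approach described above.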
" ] }, { "cell_type": "markdown", "id": "e5eeb9fc-5493-46ea-b785-564592496661", "metadata": {}, "source": [ "#### **PROBLEM 11.** Create and print a `Series` that contains, for each country, the cumulative values on the 31st day of the month. Do not use any `for` statements. List comprehensions are OK." ] }, { "cell_type": "markdown", "id": "074995ba-88c2-4f05-b76a-32b204660000", "metadata": {}, "source": [ "#### **PROBLEM 12.** Create a simple bar chart of the values for each country from the previous problem. " ] }, { "cell_type": "markdown", "id": "8ab86daf-68ac-459a-9827-3b6f02f81825", "metadata": {}, "source": [ "#### **PROBLEM 13.** Print the average number of deaths during the month for each country. Do not use `for` statements. List comprehensions are OK." ] }, { "cell_type": "markdown", "id": "763ae037-7359-4568-a8fd-8721bf77c34a", "metadata": {}, "source": [ "#### **PROBLEM 14.** Create a simple bar chart of the average values for each country from the previous problem. " ] }, { "cell_type": "markdown", "id": "0955cefe-c243-47c8-846d-c7609bb35b21", "metadata": {}, "source": [ "#### **PROBLEM 15.** Define a function `continent_totals()` that takes a `DataFrame` parameter and returns a dictionary whose keys are the names of the continents in the dataframe, such as `'Asia'`, `'Europe'`, `'Africa'`, `'America'`, etc., and whose corresponding values are the total number of deaths during the month in that continent. For example, one element might be `'Asia': 99999`. Assume that the dataframe has columns named *country* + `' continent'` and *country* + `' deaths'`, such as `'Belgium continent'` and `'France deaths'`. The function should **not** assume any prior knowledge of which continents and countries are in the dataframe. You can use `for` statements and any other Python statements. Call the function and print the returned dictionary."
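, "
A minimal sketch on an invented dataframe that follows the column-naming pattern described in Problem 15 (the countries, continents, and counts are made up): the country names are recovered from the column headers rather than hard-coded, a list comprehension builds a `Series` of per-country averages without a `for` statement, and a dictionary accumulates per-continent totals. It assumes that each country's continent column holds the same value in every row.

```python
import pandas as pd

# Invented stand-in for the Covid dataframe, using the
# '<country> deaths' / '<country> continent' naming pattern.
df = pd.DataFrame(
    {
        'France deaths': [1, 2, 3],
        'France continent': ['Europe'] * 3,
        'Belgium deaths': [4, 5, 6],
        'Belgium continent': ['Europe'] * 3,
        'Japan deaths': [7, 8, 9],
        'Japan continent': ['Asia'] * 3,
    }
)

# Recover the country names from the headers instead of hard-coding them.
suffix = ' deaths'
countries = [col[: -len(suffix)] for col in df.columns if col.endswith(suffix)]

# One average per country, built with a list comprehension (no for statement).
averages = pd.Series([df[c + suffix].mean() for c in countries], index=countries)
print(averages)

# Accumulate totals per continent in a dictionary (a for statement is allowed here).
totals = {}
for c in countries:
    continent = df[c + ' continent'].iloc[0]  # assumes one continent per country
    totals[continent] = totals.get(continent, 0) + int(df[c + suffix].sum())
print(totals)  # {'Europe': 21, 'Asia': 24}
```

Because `dict.get(key, 0)` supplies a starting value for a continent seen for the first time, the loop never needs to know in advance which continents appear.
"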
] }, { "cell_type": "markdown", "id": "376ea27d-c228-4c7c-b5bf-902f6b1353cd", "metadata": {}, "source": [ "#### **PROBLEM 16.** Create a simple bar chart of the total number of deaths in each of the continents contained in the dataframe." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }