San Jose State University Department of Applied Data Science
**DATA 200 Computational Programming for Data Analytics**
Spring 2024 Instructor: Ron Mak
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bar Charts: Movie Comparison"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bar chart parameters\n",
"#### To create a bar chart, call `plt.bar(x, height, width)`, where:\n",
"- #### *x* is the sequence of x coordinates of the bars\n",
"- #### *height* is the sequence of the heights of the bars\n",
"- #### *width* is the width of all the bars (optional, default is 0.8)\n",
"#### Example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.bar(['Adam', 'Betty', 'Chuck', 'Didi'],\n",
" [75, 97, 85, 92])\n",
"plt.title('Test scores')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bar chart with subcategories\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Recall that the top visualization container is a `Figure` object. It can contain multiple `Axes` objects. An `Axes` object is an actual plot or subplot, depending on whether we draw a single plot or multiple plots. An `Axes` object itself contains multiple subobjects, including ones that control axes, tick marks, legends, title, textboxes, grid, and other objects.\n",
"#### **NOTE:** Do not confuse an `Axes` object (where the plot lives) with the x ***axis*** and the y ***axis***, or the x and y ***axes***, which are parts of the plot.\n",
"#### All the objects are customizable. In the example below, we explicitly get the current `Axes` object with a call to the function `plt.gca()` in order to set some of its attributes.\n",
"```\n",
"ax = plt.gca()\n",
"ax.set_xticklabels(labels)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"labels = ['Adam', 'Betty', 'Chuck', 'Didi']\n",
"x = np.arange(len(labels))\n",
"\n",
"bar_width = 0.4\n",
"\n",
"# Display the bars side-by-side.\n",
"plt.bar(x - bar_width/2, [75, 97, 85, 92],\n",
" width=bar_width, label='Midterm')\n",
"plt.bar(x + bar_width/2, [80, 97, 88, 99], \n",
" width=bar_width, label='Final')\n",
"\n",
"# Get the current Axes object\n",
"ax = plt.gca()\n",
"\n",
"# Must set ticks and labels manually.\n",
"plt.xticks(x)\n",
"ax.set_xticklabels(labels)\n",
"\n",
"# Graph title and legend.\n",
"plt.title('Midterm and Final Test scores')\n",
"plt.legend()\n",
"\n",
"plt.show()"
]
},
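{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The grouped chart above obtains its `Axes` object through `plt.gca()`. As a minimal sketch of an alternative, assuming the same test-score data, `plt.subplots()` creates the `Figure` and a single `Axes` explicitly, so the bars, ticks, and title can be set on `ax` directly. This is one possible style, not the only way to do it.\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"labels = ['Adam', 'Betty', 'Chuck', 'Didi']\n",
"x = np.arange(len(labels))\n",
"\n",
"# Create the Figure and one Axes explicitly.\n",
"fig, ax = plt.subplots()\n",
"\n",
"# Draw and customize through the Axes object.\n",
"ax.bar(x, [75, 97, 85, 92])\n",
"ax.set_xticks(x)\n",
"ax.set_xticklabels(labels)\n",
"ax.set_title('Test scores')\n",
"\n",
"# The Axes just created is also the current Axes.\n",
"print(plt.gca() is ax)\n",
"```"
]
},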
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### We will use a bar plot to compare movie scores. You are given five movies with scores from Rotten Tomatoes. The Tomatometer is the percentage of approved Tomatometer critics who have given the movie a positive review. The Audience Score is the percentage of users who have rated the movie 3.5 or higher out of 5. Compare these two scores across the five movies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"movie_scores = pd.read_csv('movie_scores.csv')\n",
"movie_scores"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use `matplotlib` to create a visually appealing bar plot comparing the two scores for all five movies.\n",
"#### Use the movie titles as labels for the x-axis. Label the y-axis in percentages, with major ticks at intervals of 20 and minor ticks at intervals of 5. Add a legend and a suitable title to the plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create the figure.\n",
"plt.figure(figsize=(10, 5), dpi=300)\n",
"\n",
"# Create the bar plot.\n",
"x = np.arange(len(movie_scores['MovieTitle']))\n",
"width = 0.3\n",
"plt.bar(x - width/2, movie_scores['Tomatometer'], \n",
" width, label='Tomatometer')\n",
"plt.bar(x + width/2, movie_scores['AudienceScore'], \n",
" width, label='Audience Score')\n",
"\n",
"# Specify ticks.\n",
"plt.xticks(x, rotation=10)\n",
"plt.yticks(np.arange(0, 101, 20))\n",
"\n",
"# Get the current Axes object for setting tick labels \n",
"# and the horizontal grid\n",
"ax = plt.gca()\n",
"\n",
"# Set axis tick labels.\n",
"ax.set_xticklabels(movie_scores['MovieTitle'])\n",
"ax.set_yticklabels(['0%', '20%', '40%', '60%', '80%', '100%'])\n",
"\n",
"# Add minor ticks for y-axis in the interval of 5.\n",
"ax.set_yticks(np.arange(0, 100, 5), minor=True)\n",
"\n",
"# Add major horizontal grid with solid lines.\n",
"ax.yaxis.grid(which='major')\n",
"\n",
"# Add minor horizontal grid with dashed lines.\n",
"ax.yaxis.grid(which='minor', linestyle='--')\n",
"\n",
"# Add title.\n",
"plt.title('Movie comparison')\n",
"\n",
"# Add legend.\n",
"plt.legend()\n",
"\n",
"# Show plot.\n",
"plt.show()"
]
},
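{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The solution above writes the percentage tick labels by hand. As a hedged alternative sketch, the locators and formatter in `matplotlib.ticker` can place and label the ticks instead. The data here is the earlier test-score example, used only as a stand-in, not the movie data.\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.ticker import MultipleLocator, PercentFormatter\n",
"\n",
"fig, ax = plt.subplots()\n",
"ax.bar(['Adam', 'Betty', 'Chuck', 'Didi'], [75, 97, 85, 92])\n",
"ax.set_ylim(0, 100)\n",
"\n",
"# Major ticks every 20 and minor ticks every 5.\n",
"ax.yaxis.set_major_locator(MultipleLocator(20))\n",
"ax.yaxis.set_minor_locator(MultipleLocator(5))\n",
"\n",
"# Show major tick values as whole-number percentages.\n",
"ax.yaxis.set_major_formatter(PercentFormatter(xmax=100, decimals=0))\n",
"\n",
"# Solid major grid lines and dashed minor grid lines.\n",
"ax.yaxis.grid(which='major')\n",
"ax.yaxis.grid(which='minor', linestyle='--')\n",
"```\n",
"#### With this approach the labels follow the tick positions automatically if the y-axis limits change."
]
},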
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Adapted from ***Data Visualization with Python***, by Mario Döbler and Tim Großmann, Packt 2019, ISBN 978-1-78995-646-7"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Additional material (c) 2024 by Ronald Mak"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}