{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "###
San Jose State University
Department of Applied Data Science

**DATA 200
Computational Programming for Data Analytics**

Spring 2024
Instructor: Ron Mak
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Bar Charts: Movie Comparison" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bar chart parameters\n", "#### To create a bar chart, call `plt.bar(x, height, width)`, where:\n", "- #### *x* is the sequence of x coordinates of the bars\n", "- #### *y* is the sequence of the heights of the bars\n", "- #### *width* is the width of all the bars (optional, default is 0.8)\n", "#### Example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.bar(['Adam', 'Betty', 'Chuck', 'Didi'],\n", " [75, 97, 85, 92])\n", "plt.title('Test scores')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bar chart with subcategories\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Recall that the top visualization container is a `Figure` object. It can contain multiple `Axes` objects. An `Axes` object is an actual plot or subplot, depending on whether we draw a single plot or multiple plots. An `Axes` object itself contains multiple subobjects, including ones that control axes, tick marks, legends, title, textboxes, grid, and other objects.\n", "#### **NOTE:** Do not confuse `Axes` object (where the plot lives) with the x ***axis*** and the y ***axis***, or the x and y ***axes*** which are parts of the plot.\n", "#### All the objects are customizable. 
In the example below, we explicitly get the current `Axes` object by calling `plt.gca()` in order to set some of its attributes.\n", "```\n", "ax = plt.gca()\n", "ax.set_xticklabels(labels)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "labels = ['Adam', 'Betty', 'Chuck', 'Didi']\n", "x = np.arange(len(labels))\n", "\n", "bar_width = 0.4\n", "\n", "# Display the bars side-by-side.\n", "plt.bar(x - bar_width/2, [75, 97, 85, 92],\n", " width=bar_width, label='Midterm')\n", "plt.bar(x + bar_width/2, [80, 97, 88, 99],\n", " width=bar_width, label='Final')\n", "\n", "# Get the current Axes object.\n", "ax = plt.gca()\n", "\n", "# Must set ticks and labels manually.\n", "plt.xticks(x)\n", "ax.set_xticklabels(labels)\n", "\n", "# Graph title and legend.\n", "plt.title('Midterm and Final Test scores')\n", "plt.legend()\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We will use a bar plot to compare movie scores. You are given five movies with scores from Rotten Tomatoes. The Tomatometer is the percentage of approved Tomatometer critics who have given a positive review for the movie. The Audience Score is the percentage of users who have given a score of 3.5 or higher out of 5. Compare these two scores among the five movies." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "movie_scores = pd.read_csv('movie_scores.csv')\n", "movie_scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Use `matplotlib` to create a visually appealing bar plot comparing the two scores for all five movies.\n", "#### Use the movie titles as labels for the x-axis. For the y-axis, use percentages with major ticks at intervals of 20 and minor ticks at intervals of 5. Add a legend and a suitable title to the plot." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the figure.\n", "plt.figure(figsize=(10, 5), dpi=300)\n", "\n", "# Create the bar plot.\n", "x = np.arange(len(movie_scores['MovieTitle']))\n", "width = 0.3\n", "plt.bar(x - width/2, movie_scores['Tomatometer'], \n", " width, label='Tomatometer')\n", "plt.bar(x + width/2, movie_scores['AudienceScore'], \n", " width, label='Audience Score')\n", "\n", "# Specify ticks.\n", "plt.xticks(x, rotation=10)\n", "plt.yticks(np.arange(0, 101, 20))\n", "\n", "# Get the current Axes object for setting tick labels \n", "# and the horizontal grid\n", "ax = plt.gca()\n", "\n", "# Set axis tick labels.\n", "ax.set_xticklabels(movie_scores['MovieTitle'])\n", "ax.set_yticklabels(['0%', '20%', '40%', '60%', '80%', '100%'])\n", "\n", "# Add minor ticks for y-axis in the interval of 5.\n", "ax.set_yticks(np.arange(0, 100, 5), minor=True)\n", "\n", "# Add major horizontal grid with solid lines.\n", "ax.yaxis.grid(which='major')\n", "\n", "# Add minor horizontal grid with dashed lines.\n", "ax.yaxis.grid(which='minor', linestyle='--')\n", "\n", "# Add title.\n", "plt.title('Movie comparison')\n", "\n", "# Add legend.\n", "plt.legend()\n", "\n", "# Show plot.\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Adapted from ***Data Visualization with Python***, by Mario Döbler and Tim Großmann, Packt 2019, ISBN 978-1-78995-646-7" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Additional material (c) 2024 by Ronald Mak" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", 
"nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }