{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7e128b81-9203-4167-b91c-f28672709c74",
   "metadata": {},
   "source": [
    "### <center>San Jose State University<br>Department of Applied Data Science<br><br>**DATA 200<br>Computational Programming for Data Analytics**<br><br>Spring 2023<br>Instructor: Ron Mak</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "059b0eed-44e3-4869-9cf1-b242ab9f6800",
   "metadata": {},
   "source": [
    "# Titanic Survival CSV Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b0e2fda-c789-4634-a3ad-049e1bc262a8",
   "metadata": {},
   "source": [
    "#### We will analyze actual passenger survival data from the sinking of the Titanic. The first few lines of `TitanicSurvival.csv`:\n",
    "```\n",
    "\"\",\"survived\",\"sex\",\"age\",\"passengerClass\"\n",
    "\"Allen, Miss. Elisabeth Walton\",\"yes\",\"female\",29,\"1st\"\n",
    "\"Allison, Master. Hudson Trevor\",\"yes\",\"male\",0.916700006,\"1st\"\n",
    "\"Allison, Miss. Helen Loraine\",\"no\",\"female\",2,\"1st\"\n",
    "\"Allison, Mr. Hudson Joshua Crei\",\"no\",\"male\",30,\"1st\"\n",
    "\"Allison, Mrs. Hudson J C (Bessi\",\"no\",\"female\",25,\"1st\"\n",
    "\"Anderson, Mr. Harry\",\"yes\",\"male\",48,\"1st\"\n",
    "\"Andrews, Miss. Kornelia Theodos\",\"yes\",\"female\",63,\"1st\"\n",
    "\"Andrews, Mr. Thomas Jr\",\"no\",\"male\",39,\"1st\"\n",
    "\"Appleton, Mrs. Edward Dale (Cha\",\"yes\",\"female\",53,\"1st\"\n",
    "\"Artagaveytia, Mr. Ramon\",\"no\",\"male\",71,\"1st\"\n",
    "\"Astor, Col. John Jacob\",\"no\",\"male\",47,\"1st\"\n",
    "\"Astor, Mrs. John Jacob (Madelei\",\"yes\",\"female\",18,\"1st\"\n",
    "\"Aubart, Mme. Leontine Pauline\",\"yes\",\"female\",24,\"1st\"\n",
    "\"Barber, Miss. Ellen Nellie\",\"yes\",\"female\",26,\"1st\"\n",
    "\"Barkworth, Mr. Algernon Henry W\",\"yes\",\"male\",80,\"1st\"\n",
    "\"Baumann, Mr. John D\",\"no\",\"male\",NA,\"1st\"\n",
    "\"Baxter, Mr. Quigg Edmond\",\"no\",\"male\",24,\"1st\"\n",
    "\"Baxter, Mrs. James (Helene DeLa\",\"yes\",\"female\",50,\"1st\"\n",
    "```\n",
    "#### Note that the name column does not have a header. Babies have fractional ages (for example, the first Allison). Not all ages were recorded, and missing ages were entered as `NA` (for example, Baumann).\n",
    "#### After you've successfully identified and accessed the data you want to analyze, a major challenge is how to clean, format, and store the data in **data structures** that are appropriate for the types of analysis you want to do.\n",
    "#### We want to analyze the Titanic survival data along four dimensions: survived (yes or no), sex (male or female), age (eight age ranges plus a group for unknown ages), and passenger class (1st, 2nd, or 3rd)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f77ddc0f-b2b1-488b-894c-6a7dddde8c3e",
   "metadata": {},
   "source": [
    "## Global constants for the four dimensions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "016de89d-773b-44d4-a164-99249cf0daa6",
   "metadata": {},
   "outputs": [],
   "source": [
    "SURVIVAL_GROUPS = 2\n",
    "SURVIVED_NO = 0\n",
    "SURVIVED_YES = 1\n",
    "\n",
    "SEX_GROUPS = 2\n",
    "SEX_MALE = 0\n",
    "SEX_FEMALE = 1\n",
    "\n",
    "AGE_GROUPS = 9\n",
    "AGE_UNKNOWN = 0\n",
    "AGE_BABY = 1      # age < 1\n",
    "AGE_TODDLER = 2   #  1 <= age < 2\n",
    "AGE_CHILD = 3     #  2 <= age < 13\n",
    "AGE_TEENAGER = 4  # 13 <= age < 20\n",
    "AGE_YOUNG = 5     # 20 <= age < 30\n",
    "AGE_MIDDLE = 6    # 30 <= age < 65\n",
    "AGE_SENIOR = 7    # 65 <= age < 75\n",
    "AGE_ELDERLY = 8   # age >= 75\n",
    "\n",
    "CLASS_GROUPS = 3\n",
    "CLASS_1 = 0\n",
    "CLASS_2 = 1\n",
    "CLASS_3 = 2"
   ]
  },
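  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000001",
   "metadata": {},
   "source": [
    "#### These integer constants serve as indexes into each dimension of the `counts` array created below. As a design alternative (not what this notebook does), related constants could be grouped with the standard library's `enum.IntEnum`. The class names `Survival`, `Sex`, and `PClass` below are hypothetical, and the age dimension is omitted for brevity. Because `IntEnum` members are integers, they can index a `numpy` array directly:\n",
    "```python\n",
    "from enum import IntEnum\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical enum versions of the SURVIVED_*, SEX_*, and CLASS_* constants.\n",
    "class Survival(IntEnum):\n",
    "    NO = 0\n",
    "    YES = 1\n",
    "\n",
    "class Sex(IntEnum):\n",
    "    MALE = 0\n",
    "    FEMALE = 1\n",
    "\n",
    "class PClass(IntEnum):\n",
    "    FIRST = 0\n",
    "    SECOND = 1\n",
    "    THIRD = 2\n",
    "\n",
    "# len() of an enum is its number of members, i.e., the group count.\n",
    "table = np.zeros((len(Survival), len(Sex), len(PClass)), dtype=int)\n",
    "\n",
    "# IntEnum members are ints, so they work as numpy indexes.\n",
    "table[Survival.YES, Sex.FEMALE, PClass.FIRST] += 1\n",
    "print(table.shape)                                     # (2, 2, 3)\n",
    "print(table[Survival.YES, Sex.FEMALE, PClass.FIRST])  # 1\n",
    "```"
   ]
  },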
  {
   "cell_type": "markdown",
   "id": "e3257c4c-401b-4755-9b5b-81baf54eef70",
   "metadata": {},
   "source": [
    "## Create the four-dimensional `numpy` array"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26533398-149f-4b31-af46-64b91892c72d",
   "metadata": {},
   "source": [
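    "#### We need to count how many passengers fall into each combination of the four dimensions (for example, 1st-class female teenagers who survived). Therefore, store the counts in a four-dimensional `numpy` array so that we can use `numpy`'s indexing, slicing, and summing operations to total them along any dimension."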
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ee436c1-5693-4c71-bde4-bdbe1961bd24",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a3f6f45-962d-4e17-9bef-4de3c034976d",
   "metadata": {},
   "source": [
    "#### First, create a list of zeros with one element for each combination of the four dimensions (2 × 2 × 9 × 3 = 108). Then convert the list into the `numpy` array `counts` and reshape it into the four dimensions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98a14b0b-afa2-424d-a3e4-ea41b27da557",
   "metadata": {},
   "outputs": [],
   "source": [
    "multidimensional_list = \\\n",
    "    [0]*SURVIVAL_GROUPS*SEX_GROUPS*AGE_GROUPS*CLASS_GROUPS\n",
    "\n",
    "counts = np.array(multidimensional_list)\\\n",
    "    .reshape(SURVIVAL_GROUPS, SEX_GROUPS, AGE_GROUPS, CLASS_GROUPS)\n",
    "counts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ae20bd4-db26-4c96-b744-5e71672472c2",
   "metadata": {},
   "source": [
    "#### Can you identify the four dimensions in the above **hypercube**?"
   ]
  },
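  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000002",
   "metadata": {},
   "source": [
    "#### One way to check the dimensions is to ask the array itself: `ndim` is the number of dimensions and `shape` is the size of each one, in the order survival, sex, age group, passenger class. As an alternative to reshaping a Python list, `np.zeros` accepts the shape as a tuple directly. A minimal sketch, repeating the group sizes from the constants cell so that it runs on its own (the name `hypercube` is illustrative):\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Group sizes copied from the constants cell.\n",
    "SURVIVAL_GROUPS, SEX_GROUPS, AGE_GROUPS, CLASS_GROUPS = 2, 2, 9, 3\n",
    "\n",
    "# dtype=int gives integer counters; the default dtype would be float64.\n",
    "hypercube = np.zeros((SURVIVAL_GROUPS, SEX_GROUPS, AGE_GROUPS, CLASS_GROUPS),\n",
    "                     dtype=int)\n",
    "\n",
    "print(hypercube.ndim)   # 4\n",
    "print(hypercube.shape)  # (2, 2, 9, 3)\n",
    "print(hypercube.size)   # 108, which is 2 * 2 * 9 * 3\n",
    "```"
   ]
  },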
  {
   "cell_type": "markdown",
   "id": "f19b83d3-77a0-4d2f-906d-a9825396b347",
   "metadata": {},
   "source": [
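    "## Accumulators for graphing later\n",
    "#### Counts and sums of the known ages in each passenger class (for average ages) and a list of all known ages (for a histogram)."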
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97101a11-fd44-4220-8124-89aeba579d5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "age_known_count_1 = 0\n",
    "age_known_count_2 = 0\n",
    "age_known_count_3 = 0\n",
    "\n",
    "age_sum_1 = 0\n",
    "age_sum_2 = 0\n",
    "age_sum_3 = 0\n",
    "\n",
    "ages = []"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61e7a912-3071-4bc1-968e-8ac8625fbaa6",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4b07cab-58fa-47f7-83b6-2dc63edcd6e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import re\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c3ab26f-2965-4ed3-b7a5-a591f4fc358d",
   "metadata": {},
   "source": [
    "## Read and process the rows of the CSV file\n",
    "#### `csv.reader` returns each row of the CSV file as a **list of string values**."
   ]
  },
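  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000003",
   "metadata": {},
   "source": [
    "#### To see exactly what `csv.reader` produces, the sketch below parses two data lines copied from the sample at the top of this notebook, read from an in-memory `io.StringIO` buffer instead of the file. Every value arrives as a string, including the age, and a missing age is the string `'NA'`. The `next(reader)` call that skips the header row is an alternative to the `first` flag used in the next cell:\n",
    "```python\n",
    "import csv\n",
    "import io\n",
    "\n",
    "# The header and two data lines copied from the sample at the top of this notebook.\n",
    "sample = '''\"\",\"survived\",\"sex\",\"age\",\"passengerClass\"\n",
    "\"Allison, Master. Hudson Trevor\",\"yes\",\"male\",0.916700006,\"1st\"\n",
    "\"Baumann, Mr. John D\",\"no\",\"male\",NA,\"1st\"\n",
    "'''\n",
    "\n",
    "reader = csv.reader(io.StringIO(sample), delimiter=',', quotechar='\"')\n",
    "header = next(reader)  # skip the column headers\n",
    "rows = list(reader)\n",
    "\n",
    "for row in rows:\n",
    "    print(row)\n",
    "# ['Allison, Master. Hudson Trevor', 'yes', 'male', '0.916700006', '1st']\n",
    "# ['Baumann, Mr. John D', 'no', 'male', 'NA', '1st']\n",
    "```"
   ]
  },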
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e96f23c-4d31-4f69-b574-62431bbd9269",
   "metadata": {},
   "outputs": [],
   "source": [
    "first = True\n",
    "\n",
    "with open('TitanicSurvival.csv', newline='') as titanic_csv_file:\n",
    "    titanic_data = csv.reader(titanic_csv_file, delimiter=',', quotechar='\"')\n",
    "    \n",
    "    # Loop for each row.\n",
    "    for row in titanic_data:\n",
    "        # Ignore the column headers.\n",
    "        if first:\n",
    "            first = False\n",
    "            continue\n",
    "        \n",
    "        # Unpack the row of values.\n",
    "        name, survived, sex, age, pclass = row\n",
    "        \n",
    "        # Convert the passenger's survival status and sex.\n",
    "        survived = SURVIVED_NO if survived == 'no' else SURVIVED_YES\n",
    "        sex      = SEX_FEMALE  if sex == 'female'  else SEX_MALE\n",
    "        \n",
    "        # Convert the passenger class.\n",
    "        if pclass == '1st':\n",
    "            pclass = CLASS_1\n",
    "        elif pclass == '2nd':\n",
    "            pclass = CLASS_2\n",
    "        else:\n",
    "            pclass = CLASS_3\n",
    "        \n",
    "        # Convert the age. Use a regular expression to\n",
    "        # check that it is numeric (and therefore not 'NA').\n",
    "        if re.fullmatch(r'\\d+(\\.\\d*)?', age):\n",
    "            age = float(age)\n",
    "            ages.append(round(age))\n",
    "            \n",
    "            # Count and sum of known ages in each class.\n",
    "            if pclass == CLASS_1:\n",
    "                age_known_count_1 += 1\n",
    "                age_sum_1 += age\n",
    "            elif pclass == CLASS_2:\n",
    "                age_known_count_2 += 1\n",
    "                age_sum_2 += age\n",
    "            else:\n",
    "                age_known_count_3 += 1\n",
    "                age_sum_3 += age\n",
    "        \n",
    "            # Tally each age group.\n",
    "            if age < 1:\n",
    "                age_group = AGE_BABY\n",
    "            elif age < 2:\n",
    "                age_group = AGE_TODDLER\n",
    "            elif age < 13:\n",
    "                age_group = AGE_CHILD\n",
    "            elif age < 20:\n",
    "                age_group = AGE_TEENAGER\n",
    "            elif age < 30:\n",
    "                age_group = AGE_YOUNG\n",
    "            elif age < 65:\n",
    "                age_group = AGE_MIDDLE\n",
    "            elif age < 75:\n",
    "                age_group = AGE_SENIOR\n",
    "            else:\n",
    "                age_group = AGE_ELDERLY\n",
    "        \n",
    "        # The age was 'NA'.\n",
    "        else:\n",
    "            age = 0\n",
    "            age_group = AGE_UNKNOWN\n",
    "            \n",
    "        # Update the counts.\n",
    "        counts[survived, sex, age_group, pclass] += 1"
   ]
  },
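  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000004",
   "metadata": {},
   "source": [
    "#### The `if`/`elif` chain in the loop assigns an age to a group by comparing it against the lower bounds 1, 2, 13, 20, 30, 65, and 75. A more compact alternative, sketched below, looks the age up in a sorted list of those bounds with the standard library's `bisect.bisect_right`, which returns how many bounds are less than or equal to the age. The helper name `age_group_of` is hypothetical; it returns the same numbers as the `AGE_*` constants and uses a raw-string version of the loop's regular expression:\n",
    "```python\n",
    "import bisect\n",
    "import re\n",
    "\n",
    "# Lower bounds of AGE_TODDLER through AGE_ELDERLY, from the constants cell.\n",
    "AGE_BOUNDS = [1, 2, 13, 20, 30, 65, 75]\n",
    "AGE_UNKNOWN = 0\n",
    "\n",
    "# Hypothetical helper: map an age string from the CSV file to an age group.\n",
    "def age_group_of(age_text):\n",
    "    if not re.fullmatch(r'\\d+(\\.\\d*)?', age_text):\n",
    "        return AGE_UNKNOWN  # for example, 'NA'\n",
    "    age = float(age_text)\n",
    "    # Count the bounds <= age, then add 1 to step past AGE_UNKNOWN.\n",
    "    return bisect.bisect_right(AGE_BOUNDS, age) + 1\n",
    "\n",
    "print([age_group_of(a) for a in ['0.916700006', '1', '12.5', '29', '30', '75', 'NA']])\n",
    "# [1, 2, 3, 5, 6, 8, 0]\n",
    "```"
   ]
  },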
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99a0de7a-2b66-438b-9b53-6c7cf63b8b9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "counts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f4e984c-ca8f-4026-ae31-3cb4dbd5a4d7",
   "metadata": {},
   "source": [
    "## Total count of passengers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5dffd39-0858-444e-ae86-7ad93ff4508d",
   "metadata": {},
   "outputs": [],
   "source": [
    "total_count = np.sum(counts)\n",
    "total_count"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73be810c-c9bb-417e-9838-749339d1e39d",
   "metadata": {},
   "source": [
    "## Count of males and females"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55d688c6-aebd-4164-9202-78262269cd3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "female_count = np.sum(counts[:, SEX_FEMALE, :, :])\n",
    "male_count   = np.sum(counts[:, SEX_MALE,   :, :])\n",
    "\n",
    "print(f'{female_count = }')\n",
    "print(f'{male_count = }')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e83fe80a-3b6a-4c2d-9300-e1e6f11495fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "male_count + female_count"
   ]
  },
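  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000005",
   "metadata": {},
   "source": [
    "#### Each `np.sum` call above adds up every element of a slice to produce a single number. Passing a tuple of axes to the array's `sum` method instead collapses only those axes, so one call yields every total for the remaining dimension, in the order of its constants. A sketch on a small hypercube whose counts (`demo`) are made up for illustration:\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "SURVIVED_NO, SURVIVED_YES = 0, 1\n",
    "SEX_MALE, SEX_FEMALE = 0, 1\n",
    "CLASS_1, CLASS_2, CLASS_3 = 0, 1, 2\n",
    "\n",
    "# Made-up counts; axes are (survival, sex, age group, class) = (2, 2, 9, 3).\n",
    "demo = np.zeros((2, 2, 9, 3), dtype=int)\n",
    "demo[SURVIVED_YES, SEX_FEMALE, 5, CLASS_1] = 4  # young 1st-class females who survived\n",
    "demo[SURVIVED_NO,  SEX_MALE,   6, CLASS_3] = 7  # middle-aged 3rd-class males who perished\n",
    "demo[SURVIVED_NO,  SEX_MALE,   0, CLASS_2] = 2  # 2nd-class males of unknown age who perished\n",
    "\n",
    "by_sex   = demo.sum(axis=(0, 2, 3))  # keep axis 1 (sex)\n",
    "by_age   = demo.sum(axis=(0, 1, 3))  # keep axis 2 (age group)\n",
    "by_class = demo.sum(axis=(0, 1, 2))  # keep axis 3 (passenger class)\n",
    "\n",
    "print(by_sex)    # [9 4]\n",
    "print(by_age)    # [2 0 0 0 0 4 7 0 0]\n",
    "print(by_class)  # [4 2 7]\n",
    "```"
   ]
  },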
  {
   "cell_type": "markdown",
   "id": "723fec86-81e8-4237-b534-f9b14fe2e027",
   "metadata": {},
   "source": [
    "### Quick tutorial on creating simple Python graphs: [Matplotlib Tutorial](https://www.w3schools.com/python/matplotlib_intro.asp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50436a4f-9f4b-4036-a415-8e9a3b5a4cef",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ['males', 'females']\n",
    "y = [male_count, female_count]\n",
    "\n",
    "plt.bar(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2aecf1a4-ac03-4aed-b7d1-638fd54f9104",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.pie(y, labels=['males', 'females'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b33512b6-c976-4942-9999-e964e752eb09",
   "metadata": {},
   "source": [
    "## Count of passengers in each class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5c77079-8c57-4870-96ba-227a95a9de2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "class_1_count = np.sum(counts[:, :, :, CLASS_1])\n",
    "class_2_count = np.sum(counts[:, :, :, CLASS_2])\n",
    "class_3_count = np.sum(counts[:, :, :, CLASS_3])\n",
    "\n",
    "print(f'{class_1_count = }')\n",
    "print(f'{class_2_count = }')\n",
    "print(f'{class_3_count = }')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d715718-a0e8-4e0c-a81a-04e59542089c",
   "metadata": {},
   "outputs": [],
   "source": [
    "class_1_count + class_2_count + class_3_count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "920b30d6-1e42-428f-8973-85b748ca969b",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ['1st', '2nd', '3rd']\n",
    "y = [class_1_count, class_2_count, class_3_count]\n",
    "\n",
    "plt.bar(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20f3a450-f4fd-479f-8bc6-b82ebabceac4",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.pie(y, labels=['1st', '2nd', '3rd'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "600e68a5-cdcc-4bf4-bded-fb3b652b3bde",
   "metadata": {},
   "source": [
    "## Histogram of overall ages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "575097db-477d-48d9-9231-447d023190cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.hist(ages)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5bd08f0-6cef-4839-8fe6-1ca59d6efacd",
   "metadata": {},
   "source": [
    "### What's the difference between a histogram and a bar chart? See [Difference Between Histogram and Bar Graph](https://keydifferences.com/difference-between-histogram-and-bar-graph.html)"
   ]
  },
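  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000006",
   "metadata": {},
   "source": [
    "#### By default, `plt.hist` divides the range of the data into 10 equal-width bins, which do not line up with the age groups. Passing the group boundaries as explicit bin edges makes each bar count one age group. The sketch below uses `np.histogram`, which returns the counts that `plt.hist` would draw for the same bins, on a short list of made-up ages; the upper edge 100 is an assumed limit above the oldest age:\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Bin edges matching the AGE_* groups; 100 is an assumed upper limit.\n",
    "AGE_EDGES = [0, 1, 2, 13, 20, 30, 65, 75, 100]\n",
    "\n",
    "# Made-up ages for illustration only.\n",
    "sample_ages = [0.9167, 1, 5, 18, 24, 29, 47, 71, 80]\n",
    "\n",
    "# Each bin includes its left edge and excludes its right edge, except the last.\n",
    "hist_counts, edges = np.histogram(sample_ages, bins=AGE_EDGES)\n",
    "print(hist_counts)  # [1 1 1 1 2 1 1 1]\n",
    "\n",
    "# The plotted equivalent in this notebook would be plt.hist(ages, bins=AGE_EDGES).\n",
    "```\n",
    "#### Note that the notebook's `ages` list stores `round(age)`, so a baby aged 0.9167 would be counted in the 1-to-2 bin rather than the under-1 bin."
   ]
  },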
  {
   "cell_type": "markdown",
   "id": "ddeb9b15-e87f-4b65-9592-df21b68b163d",
   "metadata": {},
   "source": [
    "## Average age in each class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23386f93-6f3f-4078-9c71-e7f9f8a7bf44",
   "metadata": {},
   "outputs": [],
   "source": [
    "avg_1st = age_sum_1/age_known_count_1\n",
    "avg_2nd = age_sum_2/age_known_count_2\n",
    "avg_3rd = age_sum_3/age_known_count_3\n",
    "\n",
    "print(f'{avg_1st = :.1f}')\n",
    "print(f'{avg_2nd = :.1f}')\n",
    "print(f'{avg_3rd = :.1f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1af9472a-77f9-4e69-a3cf-d340900941d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ['1st', '2nd', '3rd']\n",
    "y = [avg_1st, avg_2nd, avg_3rd]\n",
    "\n",
    "plt.bar(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b836bbd-39b1-4d17-bec9-c5bebb4798bf",
   "metadata": {},
   "source": [
    "## Count of passengers in each age group"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb603824-ed64-4762-a7ff-a8c83845b41c",
   "metadata": {},
   "outputs": [],
   "source": [
    "age_baby_count     = np.sum(counts[:, :, AGE_BABY,     :])\n",
    "age_toddler_count  = np.sum(counts[:, :, AGE_TODDLER,  :])\n",
    "age_child_count    = np.sum(counts[:, :, AGE_CHILD,    :])\n",
    "age_teenager_count = np.sum(counts[:, :, AGE_TEENAGER, :])\n",
    "age_young_count    = np.sum(counts[:, :, AGE_YOUNG,    :])\n",
    "age_middle_count   = np.sum(counts[:, :, AGE_MIDDLE,   :])\n",
    "age_senior_count   = np.sum(counts[:, :, AGE_SENIOR,   :])\n",
    "age_elderly_count  = np.sum(counts[:, :, AGE_ELDERLY,  :])\n",
    "age_unknown_count  = np.sum(counts[:, :, AGE_UNKNOWN,  :])\n",
    "\n",
    "print(f'{age_baby_count     = :3d}')\n",
    "print(f'{age_toddler_count  = :3d}')\n",
    "print(f'{age_child_count    = :3d}')\n",
    "print(f'{age_teenager_count = :3d}')\n",
    "print(f'{age_young_count    = :3d}')\n",
    "print(f'{age_middle_count   = :3d}')\n",
    "print(f'{age_senior_count   = :3d}')\n",
    "print(f'{age_elderly_count  = :3d}')\n",
    "print(f'{age_unknown_count  = :3d}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ded667a-d40c-4e3c-94ef-f3146a15d049",
   "metadata": {},
   "outputs": [],
   "source": [
    "age_baby_count + age_toddler_count + age_child_count + \\\n",
    "age_teenager_count + age_young_count + age_middle_count + \\\n",
    "age_senior_count + age_elderly_count + age_unknown_count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f99d5ed-3bc2-4f71-8c28-76fd9c90d96d",
   "metadata": {},
   "outputs": [],
   "source": [
    "np.sum(counts[:, :, 0:AGE_GROUPS, :])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "571c18c8-b632-4d46-834a-09b46e1ff67c",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ['baby', 'toddler', 'child', 'teenager', 'young',\n",
    "     'middle', 'senior', 'elderly', 'unknown']\n",
    "y = [age_baby_count, age_toddler_count, age_child_count,\n",
    "     age_teenager_count, age_young_count, age_middle_count,\n",
    "     age_senior_count, age_elderly_count, age_unknown_count]\n",
    "\n",
    "plt.bar(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a5f2797-249e-43ec-b24d-958ea91fecd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [age_baby_count + age_toddler_count, age_child_count,\n",
    "     age_teenager_count, age_young_count, age_middle_count,\n",
    "     age_senior_count + age_elderly_count]\n",
    "\n",
    "plt.pie(y, labels=['baby and toddler', 'child',\n",
    "                   'teenager', 'young', 'middle',\n",
    "                   'senior and elderly'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6ecba8c-f06f-43a4-ba5e-9727bc95c0f1",
   "metadata": {},
   "source": [
    "## Count of survivors and nonsurvivors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6be0794-76b0-40a9-a941-27e64b7b7ff7",
   "metadata": {},
   "outputs": [],
   "source": [
    "perished_count = np.sum(counts[SURVIVED_NO,  :, :, :])\n",
    "survived_count = np.sum(counts[SURVIVED_YES, :, :, :])\n",
    "\n",
    "print(f'{perished_count = }')\n",
    "print(f'{survived_count = }')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67a5bbec-d9d5-4d45-856d-effa05385134",
   "metadata": {},
   "outputs": [],
   "source": [
    "perished_count + survived_count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94c1d4ad-356b-4cdf-8c8a-4b79181de946",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ['survived', 'perished']\n",
    "y = [survived_count, perished_count]\n",
    "\n",
    "plt.bar(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9912918e-18d8-4d20-be05-d6968f5196bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.pie(y, labels=['survived', 'perished'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ad351e3-8cf4-48cb-9651-7041995da9c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "class_1_survivors = np.sum(counts[SURVIVED_YES, :, :, CLASS_1])\n",
    "class_2_survivors = np.sum(counts[SURVIVED_YES, :, :, CLASS_2])\n",
    "class_3_survivors = np.sum(counts[SURVIVED_YES, :, :, CLASS_3])\n",
    "\n",
    "print(f'{class_1_survivors = }')\n",
    "print(f'{class_2_survivors = }')\n",
    "print(f'{class_3_survivors = }')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9f5d69f-848a-4a45-ad6e-4a6fb2ed2be5",
   "metadata": {},
   "source": [
    "## Survivor percentages by class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c1c2ec-a7cb-44b5-9b03-af253a0a1b98",
   "metadata": {},
   "outputs": [],
   "source": [
    "class_1_survivor_pct = class_1_survivors/class_1_count\n",
    "class_2_survivor_pct = class_2_survivors/class_2_count\n",
    "class_3_survivor_pct = class_3_survivors/class_3_count\n",
    "\n",
    "print(f'{class_1_survivor_pct = :.1%}')\n",
    "print(f'{class_2_survivor_pct = :.1%}')\n",
    "print(f'{class_3_survivor_pct = :.1%}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab1ec88a-22f1-4763-941d-170f5e53850c",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ['1st', '2nd', '3rd']\n",
    "y = [class_1_survivor_pct, class_2_survivor_pct, class_3_survivor_pct]\n",
    "\n",
    "plt.bar(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29dd31c6-8143-40e6-809d-539648227d49",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [class_1_survivors, class_1_count - class_1_survivors]\n",
    "\n",
    "plt.pie(y, labels=['survived in 1st class', 'perished in 1st class'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0ad1700-eb6d-46f0-810b-b579a0ce0322",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [class_2_survivors, class_2_count - class_2_survivors]\n",
    "\n",
    "plt.pie(y, labels=['survived in 2nd class', 'perished in 2nd class'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ee18a8e-6850-4788-9133-f086d0f4029b",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [class_3_survivors, class_3_count - class_3_survivors]\n",
    "\n",
    "plt.pie(y, labels=['survived in 3rd class', 'perished in 3rd class'])\n",
    "plt.show()"
   ]
  },
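  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000008",
   "metadata": {},
   "source": [
    "#### The three survival rates by class can also be computed in one step: sum the survivors over sex and age, sum all passengers over survival, sex, and age, and divide the two length-3 arrays element by element. A sketch on a hypercube with made-up counts (`demo`):\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "SURVIVED_NO, SURVIVED_YES = 0, 1\n",
    "\n",
    "# Made-up counts; axes are (survival, sex, age group, class) = (2, 2, 9, 3).\n",
    "# For brevity, everyone is placed in age group 5.\n",
    "demo = np.zeros((2, 2, 9, 3), dtype=int)\n",
    "demo[SURVIVED_YES, :, 5, :] = [[3, 1, 1], [5, 3, 1]]  # rows: male, female\n",
    "demo[SURVIVED_NO,  :, 5, :] = [[2, 4, 6], [0, 2, 2]]\n",
    "\n",
    "survivors_by_class = demo[SURVIVED_YES].sum(axis=(0, 1))  # over sex and age\n",
    "totals_by_class    = demo.sum(axis=(0, 1, 2))             # over survival, sex, age\n",
    "rates = survivors_by_class / totals_by_class\n",
    "\n",
    "for label, rate in zip(['1st', '2nd', '3rd'], rates):\n",
    "    print(f'{label}: {rate:.1%}')\n",
    "# 1st: 80.0%\n",
    "# 2nd: 40.0%\n",
    "# 3rd: 20.0%\n",
    "```"
   ]
  },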
  {
   "cell_type": "markdown",
   "id": "6ebe81f5-931b-4da4-875a-920378fccc10",
   "metadata": {},
   "source": [
    "## Survivor percentages by age group"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "feaa4cd9-0c8d-4d29-be8b-78d080362ddd",
   "metadata": {},
   "outputs": [],
   "source": [
    "baby_survivor_pct     = np.sum(counts[SURVIVED_YES, :, AGE_BABY,     :])/age_baby_count\n",
    "toddler_survivor_pct  = np.sum(counts[SURVIVED_YES, :, AGE_TODDLER,  :])/age_toddler_count\n",
    "child_survivor_pct    = np.sum(counts[SURVIVED_YES, :, AGE_CHILD,    :])/age_child_count\n",
    "teenager_survivor_pct = np.sum(counts[SURVIVED_YES, :, AGE_TEENAGER, :])/age_teenager_count\n",
    "young_survivor_pct    = np.sum(counts[SURVIVED_YES, :, AGE_YOUNG,    :])/age_young_count\n",
    "middle_survivor_pct   = np.sum(counts[SURVIVED_YES, :, AGE_MIDDLE,   :])/age_middle_count\n",
    "senior_survivor_pct   = np.sum(counts[SURVIVED_YES, :, AGE_SENIOR,   :])/age_senior_count\n",
    "elderly_survivor_pct  = np.sum(counts[SURVIVED_YES, :, AGE_ELDERLY,  :])/age_elderly_count\n",
    "\n",
    "print(f'{baby_survivor_pct     = :5.1%}')\n",
    "print(f'{toddler_survivor_pct  = :5.1%}')\n",
    "print(f'{child_survivor_pct    = :5.1%}')\n",
    "print(f'{teenager_survivor_pct = :5.1%}')\n",
    "print(f'{young_survivor_pct    = :5.1%}')\n",
    "print(f'{middle_survivor_pct   = :5.1%}')\n",
    "print(f'{senior_survivor_pct   = :5.1%}')\n",
    "print(f'{elderly_survivor_pct  = :5.1%}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0018b101-c123-48cb-aafe-1617c21ec51d",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ['baby', 'toddler', 'child', 'teenager', \n",
    "     'young', 'middle', 'senior', 'elderly']\n",
    "y = [baby_survivor_pct, toddler_survivor_pct, child_survivor_pct, teenager_survivor_pct,\n",
    "     young_survivor_pct, middle_survivor_pct, senior_survivor_pct, elderly_survivor_pct]\n",
    "\n",
    "plt.bar(x, y)\n",
    "plt.show()"
   ]
  },
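  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000009",
   "metadata": {},
   "source": [
    "#### Each age-group percentage divides by that group's passenger count. If a group were empty, dividing by zero would produce `nan` and a `RuntimeWarning`. A sketch of a guarded, vectorized division using `np.divide` with its `out` and `where` parameters, on made-up counts for four groups (the third is deliberately empty):\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Made-up survivor counts and group sizes for four age groups.\n",
    "survivors = np.array([5, 3, 0, 8])\n",
    "totals    = np.array([6, 4, 0, 20])\n",
    "\n",
    "# Divide only where the group is nonempty; elsewhere keep the 0.0 from out.\n",
    "rates = np.divide(survivors, totals,\n",
    "                  out=np.zeros(len(totals)), where=totals > 0)\n",
    "\n",
    "print([f'{r:.1%}' for r in rates])  # ['83.3%', '75.0%', '0.0%', '40.0%']\n",
    "```"
   ]
  },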
  {
   "cell_type": "markdown",
   "id": "b462f271-2cbc-4d4f-9821-c72f6c7b93cc",
   "metadata": {},
   "source": [
    "## Survivor percentages by sex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09177327-2b99-4726-b567-02e5fa7ef3e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "female_survivors = np.sum(counts[SURVIVED_YES, SEX_FEMALE, :, :])\n",
    "\n",
    "print(f'{female_survivors = }')\n",
    "print('% Female survivors: '\n",
    "      f'{female_survivors/female_count:.1%}') "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a5a7181-4e6c-49df-9db5-393ae95a5b07",
   "metadata": {},
   "outputs": [],
   "source": [
    "male_survivors = np.sum(counts[SURVIVED_YES, SEX_MALE, :, :])\n",
    "\n",
    "print(f'{male_survivors = }')\n",
    "print('% Male survivors: '\n",
    "      f'{male_survivors/male_count:.1%}') "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3250dc21-51f5-4e97-9f92-b83f50247c1b",
   "metadata": {},
   "outputs": [],
   "source": [
    "total_survivors = np.sum(counts[SURVIVED_YES, :, :, :])\n",
    "\n",
    "print(f'{total_survivors = }')\n",
    "print('% All survivors: '\n",
    "      f'{total_survivors/total_count:.1%}') "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2a08688-8c9f-40f8-bb64-133298d7ad44",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [female_survivors, female_count - female_survivors]\n",
    "\n",
    "plt.pie(y, labels=['females survived', 'females perished'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80333f09-01f7-4cf2-95c3-63da177b762b",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [male_survivors, male_count - male_survivors]\n",
    "\n",
    "plt.pie(y, labels=['males survived', 'males perished'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "115e4076-673c-48f6-a931-1292a4fd9af3",
   "metadata": {},
   "source": [
    "## Count of female survivors"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b233f22c-3da2-4ed8-8ecf-627695f38b0c",
   "metadata": {},
   "source": [
    "#### In 1st class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8913b51d-e5d1-45b7-8569-36a5033f328d",
   "metadata": {},
   "outputs": [],
   "source": [
    "female_survived_1st = np.sum(counts[SURVIVED_YES, SEX_FEMALE, :, CLASS_1])\n",
    "\n",
    "print(f'{female_survived_1st = }')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7a40217-a766-44dd-9ac6-4b4b7eaf6efc",
   "metadata": {},
   "source": [
    "#### In both 2nd and 3rd class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c11aa05a-5392-4a38-b4ff-97d815bc415f",
   "metadata": {},
   "outputs": [],
   "source": [
    "female_survived_2nd3rd = np.sum(counts[SURVIVED_YES, SEX_FEMALE, :, [CLASS_2, CLASS_3]])\n",
    "\n",
    "print(f'{female_survived_2nd3rd = }')"
   ]
  },
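  {
   "cell_type": "markdown",
   "id": "3f1c9a20-aa01-4d7e-8b11-0c5e00000010",
   "metadata": {},
   "source": [
    "#### The index `[CLASS_2, CLASS_3]` is a list, so `numpy` applies **advanced (fancy) indexing** to select both classes at once. Because the two classes are adjacent, the slice `CLASS_2:CLASS_3 + 1` selects the same cells. The two forms arrange the result's axes differently, which does not affect the sum. A sketch on a hypercube filled with random made-up counts:\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "SURVIVED_YES, SEX_FEMALE = 1, 1\n",
    "CLASS_2, CLASS_3 = 1, 2\n",
    "\n",
    "# Made-up counts; axes are (survival, sex, age group, class) = (2, 2, 9, 3).\n",
    "rng = np.random.default_rng(0)\n",
    "demo = rng.integers(0, 5, size=(2, 2, 9, 3))\n",
    "\n",
    "fancy  = demo[SURVIVED_YES, SEX_FEMALE, :, [CLASS_2, CLASS_3]]   # list index\n",
    "sliced = demo[SURVIVED_YES, SEX_FEMALE, :, CLASS_2:CLASS_3 + 1]  # slice\n",
    "\n",
    "print(fancy.shape)   # (2, 9): the list's axis comes first\n",
    "print(sliced.shape)  # (9, 2): the axes keep their original order\n",
    "print(fancy.sum() == sliced.sum())  # True\n",
    "```"
   ]
  },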
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61cc230a-fc35-43fd-add5-cd2ed39112aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ['1st class', '2nd and 3rd class']\n",
    "y = [female_survived_1st, female_survived_2nd3rd]\n",
    "\n",
    "plt.bar(x, y)\n",
    "plt.title('Female survivors')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae1d814c-9d92-4afc-b586-b59948948e7e",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.pie(y, labels=['1st', '2nd+3rd'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e934bc21-f2e7-4d64-b245-d533d90d878b",
   "metadata": {},
   "source": [
    "#### (C) Copyright 2023 by Ronald Mak"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}