{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "###
San Jose State University
Department of Applied Data Science
\n", "#
DATA 220
Mathematical Methods for Data Analysis
\n", "###
Spring 2021
Instructor: Ron Mak
\n", "#
MIDTERM EXAMINATION SOLUTIONS
\n", "####
Six problems, each worth 25 points, 150 points total.

Open book, notes, and internet. Individual work only!
Be sure to explain your work in comments or by printing intermediate results.
You can use Python code and any Python functions.

Don't forget the 25 multiple-choice questions in Canvas!
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 1. A set of 5000 exam scores are normally distributed with a mean of 72 and a standard deviation of 6. To the nearest integer value, how many scores are there between 63 and 75?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SOLUTION:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "z_lo = -1.50\n", "z_hi = 0.50\n" ] } ], "source": [ "mean = 72\n", "sigma = 6\n", "lo_score = 63\n", "hi_score = 75\n", "\n", "z_lo = (lo_score - mean)/sigma\n", "z_hi = (hi_score - mean)/sigma\n", "\n", "print(f'z_lo = {z_lo:5.2f}')\n", "print(f'z_hi = {z_hi:5.2f}')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "area = 0.62\n", "score count = 3123\n" ] } ], "source": [ "# From the standard normal distribution table:\n", "area_lo = 0.06681\n", "area_hi = 0.69146\n", "\n", "area = area_hi - area_lo\n", "score_count = int(area*5000)\n", "\n", "print(f'area = {area:.2f}')\n", "print(f'score count = {score_count}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### ALTERNATE SOLUTION:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "area = 0.62\n", "score count = 3123\n" ] } ], "source": [ "import scipy.stats\n", "\n", "mean = 72\n", "sigma = 6\n", "lo_score = 63\n", "hi_score = 75\n", "\n", "# No conversions to z-scores or table lookups required.\n", "distrib = scipy.stats.norm(mean, sigma);\n", "area = distrib.cdf(hi_score) - distrib.cdf(lo_score)\n", "score_count = int(area*5000)\n", "\n", "print(f'area = {area:.2f}')\n", "print(f'score count = {score_count}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "#### PROBLEM 2. How many distinct solutions does the following equation have?\n", "\n", "
x1 + x2 + x3 + x4 = 100
\n", "\n", "#### where x1 can be 1, 2, 3, ... and x2 can be 2, 3, 4, ... and x3 and x4 can each be 0, 1, 2, 3, ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SOLUTION:\n", "\n", "This problem is complicated by the fact that x1 starts at 1, x2 starts at 2, and x3 and x4 each start at 0. We can simplify the problem by setting y1 = x1 - 1, y2 = x2 - 2, y3 = x3, and y4 = x4. Substituting gives (y1 + 1) + (y2 + 2) + y3 + y4 = 100, so the problem is now to find how many distinct solutions there are of\n", "
y1 + y2 + y3 + y4 = 97
\n", "where y1, y2, y3, and y4 can each be 0, 1, 2, 3, ... By the stars-and-bars formula, the number of nonnegative integer solutions with n addends that sum to k is C(n + k - 1, k) = C(n + k - 1, n - 1)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "solution1 = 161,700 solutions\n", "solution2 = 161,700 solutions\n" ] } ], "source": [ "import scipy.special\n", "\n", "n = 4 # the number of addends\n", "k = 97 # the sum that the addends must reach\n", "\n", "# exact=True returns an exact Python integer instead of a float.\n", "solution1 = scipy.special.comb(n + k - 1, k, exact=True)\n", "solution2 = scipy.special.comb(n + k - 1, n - 1, exact=True)\n", "\n", "print(f'solution1 = {solution1:,d} solutions')\n", "print(f'solution2 = {solution2:,d} solutions')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
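#### CROSS-CHECK FOR PROBLEM 2:

The count for Problem 2 is small enough to confirm by direct enumeration of the original equation, without the change of variables. This is only a sketch for verification. It assumes Python 3.8 or later for math.comb.

```python
import math

# Enumerate x1 + x2 + x3 + x4 = 100 with x1 >= 1, x2 >= 2, x3 >= 0, x4 >= 0.
count = 0
for x1 in range(1, 101):
    for x2 in range(2, 101 - x1):
        remaining = 100 - x1 - x2      # what is left for x3 + x4
        for x3 in range(0, remaining + 1):
            # x4 = remaining - x3 is determined and nonnegative.
            count += 1

# Closed form C(n + k - 1, n - 1) with n = 4 and k = 97.
formula = math.comb(100, 3)

print(f'enumeration = {count:,d}')
print(f'formula     = {formula:,d}')
```

Both lines should print the same number of solutions.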
\n", "\n", "#### PROBLEM 3. Andy, Bernice, and Carl each take a turn shooting an arrow at a target. Andy hits the bullseye 1/2 of the time, Bernice hits it 1/3 of the time, and Carl hits it 1/4 of the time. Somebody hit the bullseye! What is the probability that it was Bernice?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SOLUTION:\n", "\n", "Solve using Bayes' Theorem. You are told that one arrow hit the bullseye, and you are asked to calculate the probability that the arrow belonged to Bernice.\n", "
\n", "
\n", "Let:
\n", "H = event that one arrow hit the bullseye (the observable event)
\n", "A = event that the arrow was Andy's
\n", "B = event that the arrow was Bernice's
\n", "C = event that the arrow was Carl's
\n", "
\n", "Calculate: P(B|H)\n", "
\n", "
\n", "With the initial information, the arrow could have come from any one of the three, so the prior probabilities are P(A) = P(B) = P(C) = 1/3 that the arrow came from Andy, Bernice, or Carl. Your additional information is the proportion of the time that each one hits the bullseye: P(H|A) = 1/2, P(H|B) = 1/3, and P(H|C) = 1/4. Therefore, the joint probabilities are:
\n", "P(A and H) = P(A)P(H|A) = (1/3)(1/2)
\n", "P(B and H) = P(B)P(H|B) = (1/3)(1/3)
\n", "P(C and H) = P(C)P(H|C) = (1/3)(1/4)
\n", "
\n", "The total probability that one arrow hit the bullseye is
\n", "P(H) = P(A)P(H|A) + P(B)P(H|B) + P(C)P(H|C)\n", "
\n", "
\n", "Therefore: P(B|H) = P(B)P(H|B) / P(H)\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P(A and H) = 0.1667\n", "P(B and H) = 0.1111\n", "P(C and H) = 0.0833\n", "P(H) = 0.3611\n", "\n", "P(B|H) = 0.3077\n" ] } ], "source": [ "# Joint probabilities: prior times likelihood.\n", "P_AH = (1/3)*(1/2) # P(A and H) = P(A)P(H|A)\n", "P_BH = (1/3)*(1/3) # P(B and H) = P(B)P(H|B)\n", "P_CH = (1/3)*(1/4) # P(C and H) = P(C)P(H|C)\n", "\n", "print(f'P(A and H) = {P_AH:.4f}')\n", "print(f'P(B and H) = {P_BH:.4f}')\n", "print(f'P(C and H) = {P_CH:.4f}')\n", "\n", "# P(H) = P(A)P(H|A) + P(B)P(H|B) + P(C)P(H|C)\n", "P_H = P_AH + P_BH + P_CH\n", "\n", "# P(B|H) = P(B)P(H|B) / P(H)\n", "P_BgH = P_BH / P_H\n", "\n", "print(f'P(H) = {P_H:.4f}')\n", "print()\n", "print(f'P(B|H) = {P_BgH:.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
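#### CROSS-CHECK FOR PROBLEM 3:

The Bayes calculation for Problem 3 can also be carried out in exact arithmetic, which shows the answer as a fraction rather than a rounded decimal. The sketch below keeps the same modeling assumption as the solution (equal priors of 1/3 for each archer). The dictionary names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Assumed prior: the arrow is equally likely to be from any of the three archers.
prior = Fraction(1, 3)

# Likelihoods P(H | archer) taken from the problem statement.
hit_rate = {
    'Andy': Fraction(1, 2),
    'Bernice': Fraction(1, 3),
    'Carl': Fraction(1, 4),
}

# Joint probabilities P(archer and H), then the total probability P(H).
joint = {name: prior * rate for name, rate in hit_rate.items()}
p_hit = sum(joint.values())

# Posterior probabilities P(archer | H) by Bayes' Theorem.
posterior = {name: j / p_hit for name, j in joint.items()}

print(f'P(H) = {p_hit}')
for name, p in posterior.items():
    print(f'P({name} | H) = {p} = {float(p):.4f}')
```

The three posteriors sum to 1, which is a quick sanity check on the normalization by P(H).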
\n", "\n", "#### PROBLEM 4. Suppose that at one of the annual pumpkin contests in Half Moon Bay, the weights of the entered pumpkins are approximately normally distributed with mean 125 pounds and standard deviation 18 pounds. Farmer Brown's pumpkin entry is at the 90th percentile in weight of all the pumpkins at the contest. What is the approximate weight of Farmer Brown's pumpkin?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SOLUTION:\n", "\n", "Farmer Brown's pumpkin is heavier than 90% of all the pumpkins at the contest. Therefore, from the standard normal distribution table, the z-score is about 1.28." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "weight = 148.04\n" ] } ], "source": [ "mean = 125\n", "sigma = 18\n", "z = 1.28\n", "\n", "# 1.28 = z = (x - mean)/sigma ==> x = z*sigma + mean\n", "x = z*sigma + mean\n", "\n", "print(f'weight = {x:.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### ALTERNATE SOLUTION:\n", "\n", "The function scipy.stats.norm.ppf() takes a cumulative probability (the area to the left) and returns by default the z-score of the standard normal distribution that gives that area. To use it with any normal distribution, set parameter loc to the mean and parameter scale to the standard deviation. The result differs slightly from the table-based answer because the table value z = 1.28 is rounded; the unrounded value is about 1.2816." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "weight = 148.07\n" ] } ], "source": [ "from scipy.stats import norm\n", "\n", "mean = 125\n", "sigma = 18\n", "\n", "# No conversions to z-scores or table lookups required.\n", "weight = norm.ppf(0.90, loc=mean, scale=sigma)\n", "\n", "print(f'weight = {weight:.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
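#### CROSS-CHECK FOR PROBLEM 4:

The 90th-percentile weight in Problem 4 can be cross-checked without SciPy. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist with an inv_cdf method (the inverse of the cumulative distribution function).

```python
from statistics import NormalDist

mean = 125
sigma = 18

# z-score whose left-tail area is 0.90 under the standard normal distribution.
z = NormalDist().inv_cdf(0.90)

# The same percentile taken directly from the pumpkin-weight distribution.
weight = NormalDist(mu=mean, sigma=sigma).inv_cdf(0.90)

print(f'z = {z:.4f}')
print(f'weight = {weight:.2f}')
```

Because z is not rounded to two decimals here, the weight comes out a few hundredths of a pound above the value obtained from the table.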
\n", "\n", "#### PROBLEM 5. A call center receives an average of 0.6 complaints per hour. Management's goal is to receive fewer than three complaints per hour. What is the probability that management will achieve that goal each hour for the next four hours?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SOLUTION:\n", "\n", "This is a Poisson distribution problem. Use the scipy.stats.poisson.pmf probability mass function to add the probabilities of receiving 0, 1, or 2 complaints in an hour. Assuming that the hours are independent of each other, the probability for the next four hours is the one-hour probability raised to the fourth power." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P(0 complaints) = 0.549\n", "P(1 complaints) = 0.329\n", "P(2 complaints) = 0.099\n", "\n", "P(fewer than 3 complaints per hour) = 0.977\n", "P(for the next 4 hours) = 0.911\n" ] } ], "source": [ "import scipy.stats as stats\n", "\n", "avg_complaints = 0.6\n", "max_complaints = 2\n", "hours = 4\n", "total = 0\n", "\n", "# Add the probabilities of 0, 1, and 2 complaints in one hour.\n", "for c in range(0, max_complaints + 1):\n", "    p = stats.poisson.pmf(c, avg_complaints)\n", "    total += p\n", "\n", "    print(f'P({c} complaints) = {p:.3f}')\n", "\n", "print()\n", "print(f'P(fewer than {max_complaints + 1} complaints per hour) = {total:.3f}')\n", "print(f'P(for the next {hours} hours) = {total**hours:.3f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
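#### CROSS-CHECK FOR PROBLEM 5:

The Poisson probabilities in Problem 5 follow from the formula P(k) = exp(-lam) * lam^k / k!, so they can be checked with the standard library alone. In the sketch below, the helper poisson_pmf and the variable names are hypothetical. As in the solution, the four hours are assumed to be independent.

```python
import math

def poisson_pmf(k, lam):
    # Probability of exactly k events when the mean number of events is lam.
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 0.6      # average complaints per hour
hours = 4

# Fewer than three complaints means 0, 1, or 2 complaints.
p_hour = sum(poisson_pmf(k, lam) for k in range(3))

# Independent hours: multiply the one-hour probability by itself.
p_all_hours = p_hour ** hours

print(f'P(fewer than 3 complaints in one hour) = {p_hour:.3f}')
print(f'P(goal met in each of {hours} hours) = {p_all_hours:.3f}')
```

The same one-hour probability is also what a Poisson cumulative distribution function evaluated at 2 would return.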
\n", "\n", "#### PROBLEM 6. Here's my NASA problem (although a very sad one indeed). After the Challenger space shuttle exploded on takeoff in 1986, it was calculated that the probability of a single O-ring failure was 0.003. There were twelve O-rings on the shuttle. What was the probability that at least one of them would fail?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SOLUTION:\n", "\n", "This is a binomial probability problem, assuming that the O-rings fail independently of each other. The event that at least one O-ring fails is the complement of the event that none fails, so P(at least one failure) = 1 - P(no failures). Use scipy.stats.binom(n, p) and its pmf probability mass function to calculate P(no failures)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P(no failures) = 0.9646 = 96.46%\n", "P(at least one failure) = 0.0354 = 3.54%\n" ] } ], "source": [ "import scipy.stats as stats\n", "\n", "n = 12 # number of O-rings\n", "p = 0.003 # probability of an O-ring failure\n", "x = 0 # number of failures\n", "\n", "# Probability of exactly x = 0 failures among the n O-rings.\n", "pmf = stats.binom(n, p).pmf(x)\n", "\n", "print(f'P(no failures) = {pmf:.4f} = {100*pmf:.2f}%')\n", "print(f'P(at least one failure) = {1 - pmf:.4f} = {100 - 100*pmf:.2f}%')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }