{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "###
San Jose State University
Department of Applied Data Science
\n", "#
DATA 220
Mathematical Methods for Data Analysis
\n", "###
Spring 2021
Instructor: Ron Mak
\n", "#
Assignment #5
Probability Problem Set
SOLUTIONS
\n", "####
100 points total (10 points each)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 1. Suppose San Jose State University accepts 75% of all applicants. What is the probability that it will accept exactly three of the next six applicants?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is a binomial probability problem. Success = acceptance.\n", "# Use scipy.stats.binom(n, p) and the pmf probability mass function.\n", "\n", "import scipy.stats as stats\n", "\n", "n = 6; # number of trials (applicants)\n", "p = 0.75 # probability of success in each trial\n", "x = 3 # number of successes\n", "\n", "print(f'stats.binom(n, p).pmf(x) = {stats.binom(n, p).pmf(x):.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 2. In DATA 220, if the probability is 0.20 that any one student will get an A, what is the probability that in a random sample of eight students in the class, exactly three of them will get A's?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is a binomial probability problem. Success = getting an A.\n", "# Use scipy.stats.binom(n, p) and the pmf probability mass function.\n", "\n", "import scipy.stats as stats\n", "\n", "n = 8; # number of trials (students)\n", "p = 0.20 # probability of success in each trial\n", "x = 3 # number of successes\n", "\n", "print(f'stats.binom(n, p).pmf(x) = {stats.binom(n, p).pmf(x):.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 3. You get an average rate of two visits per minute at your website. What is the probability that you will have at least one visit during a given minute?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# It's a Poisson distribution problem whenever you're given\n", "# the average rate of an event occurring within a given interval\n", "# of time and then you're asked for the probability of a certain\n", "# number of those events occurring within the same amount of time.\n", "# Use the scipy.stats.poisson.pmf probability mass function.\n", "# Since we're asked for at least one visit, compute the\n", "# probability of no visits and subtract that from 1.\n", "\n", "import scipy.stats as stats\n", "\n", "average_visits = 2\n", "no_visits = 0\n", "\n", "p = 1 - stats.poisson.pmf(no_visits, average_visits)\n", "\n", "print(f'The average number of visits per minute = {average_visits}')\n", "print(f'The probability of at least one visit = {p:.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is not sufficient simply to add the probability of 1 visit plus the probability of 2 visits. You must add the probabilities of 1 through an infinite number of visits! Of course, the probability decreases rapidly as the number of visits increases. Note how the running total converges toward the solution calculated above. We stop at 12 visits." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy.stats as stats\n", "\n", "average_visits = 2\n", "\n", "p = 0\n", "for visits_counted in range(1, 13):\n", "    p += stats.poisson.pmf(visits_counted, average_visits)\n", "    print(f'The probability of 1 to {visits_counted:2d} visits = {p:.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 4. An average of 3.6 students complain about each of Prof. Mak's midterms. What is the probability that no more than three students will complain about his next midterm?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Another Poisson distribution problem. Here, the time period\n", "# is one midterm. Use the scipy.stats.poisson.pmf probability \n", "# mass function and sum the probabilities of 0, 1, 2, or 3\n", "# students complaining.\n", "\n", "import scipy.stats as stats\n", "\n", "average_complaints = 3.6\n", "max_complaints = 3\n", "\n", "p_sum = 0\n", "for complaining_students in range(0, max_complaints + 1):\n", " p = stats.poisson.pmf(complaining_students, average_complaints)\n", " p_sum += p\n", " print(f'The probability of {complaining_students} complaints = {p:.4f}')\n", "\n", "print()\n", "print(f'The probability of at most {max_complaints} complaints = {p_sum:.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 5. A telephone political poll produced the following results regarding a proposed new law:\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| Party | In favor | Not in favor | Total |\n",
"| --- | --- | --- | --- |\n",
"| Republican | 98 | 54 | 152 |\n",
"| Democrat | 79 | 29 | 108 |\n",
"| Total | 177 | 83 | 260 |
\n", "\n", "#### What is the probability that a person on the phone favors the new law given that the person is a Democrat?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is a conditional probability problem.\n", "# Let Event F: The person favors the new law.\n", "# Event D: The person is a Democrat.\n", "\n", "# By table inspection, we can see that there are\n", "# a total of 108 Democrats, of whom 79 are in\n", "# favor of the law. Therefore, P(F|D) = 79/108.\n", "\n", "# To solve using conditional probability formulas:\n", "# P(F|D) = P(F and D)/P(D) = (79/260) / (108/260) = 79/108\n", "P_FgD = 79/108\n", "\n", "print(f'P(favor|Democrat) = {P_FgD:.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 6. Based on the above poll results, what is the probability that a person on the phone was a Republican, given that the person favors the new law?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is a conditional probability problem.\n", "# Let Event F: The person favors the new law.\n", "# Event R: The person is a Republican.\n", "\n", "# By table inspection, we can see that 177 people\n", "# favor the new law, of whom 98 are Republicans.\n", "# Therefore, P(R|F) = 98/177.\n", "\n", "# To solve using conditional probability formulas:\n", "# P(R|F) = P(R and F)/P(F) = (98/260) / (177/260) = 98/177\n", "P_RgF = 98/177\n", "\n", "print(f'P(Republican|favor) = {P_RgF:.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 7. A student randomly guesses at a 12-question multiple-choice exam where each question has five choices. What is the probability that the student will correctly answer exactly six questions?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is a binomial probability problem. 
Success = right answer.\n", "# Use scipy.stats.binom(n, p) and the pmf probability mass function.\n", "\n", "import scipy.stats as stats\n", "\n", "n = 12; # number of trials (questions)\n", "p = 0.20 # probability of success in each trial\n", "x = 6 # number of successes\n", "\n", "print(f'stats.binom(n, p).pmf(x) = {stats.binom(n, p).pmf(x):.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 8. 20% of the employees of a company are college graduates. Of the college graduates, 75% are managers. Of the non-college graduates, 20% are managers. What is the probability that a randomly selected manager is a college graduate?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can solve this one using Bayes' Theorem.\n", "# Let Event M: The employee is a manager.\n", "# Event G: The employee is a college graduate\n", "# Event N: The employee is not a college graduate\n", "\n", "# Prior probabilities: P(G) = 0.20\n", "# P(N) = 1 - P(G) = 0.80\n", "P_G = 0.20\n", "P_N = 0.80\n", "\n", "P_MgG = 0.75 # P(manager|graduate)\n", "P_MgN = 0.20 # P(manager|nongraduate)\n", "\n", "P_G_P_MgG = P_G*P_MgG # P(graduate)*P(manager|graduate)\n", "P_N_P_MgN = P_N*P_MgN # P(nongraduate)*P(manager|nongraduate)\n", "print(f'P(graduate) *P(manager|graduate) = {P_G_P_MgG:.2f}')\n", "print(f'P(nongraduate)*P(manager|nongraduate) = {P_N_P_MgN:.2f}')\n", "print()\n", "\n", "P_M = P_G_P_MgG + P_N_P_MgN # P(M) = P(G)P(M|G) + P(N)*P(M|N)\n", "print(f'P(manager) = {P_M:.2f}')\n", "\n", "P_GgM = P_G_P_MgG/P_M # P(G|M) = [P(G)P(M|G)]/P(M)\n", "print(f'P(graduate|manager) = {P_GgM:.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 9. You receive an average of 4.2 email messages per hour. What is the probability that you will receive more than two emails in the next hour?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Another Poisson distribution problem. Use stats.poisson.pmf.\n", "# The time period is one hour. We calculate the probability\n", "# p of receiving 0, 1, or 2 emails. Then the probability\n", "# of receiving more than 2 emails is 1 - p.\n", "\n", "import scipy.stats as stats\n", "\n", "average_mails = 4.2\n", "\n", "p_sum = 0\n", "for mails in range(0, 3):\n", "    p = stats.poisson.pmf(mails, average_mails)\n", "    p_sum += p\n", "    print(f'The probability of {mails} emails is {p:.4f}')\n", "\n", "print()\n", "print(f'The probability of 0, 1, or 2 emails is {p_sum:.4f}')\n", "print(f'The probability of more than 2 emails is {1 - p_sum:.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### PROBLEM 10. Alice, Bill, and Carol toss a fair coin but you don't see them do it. Then Alice reports to you that the coin came up heads, but Bill and Carol claim that it was tails. Alice tells the truth 4/5 of the time, Bill 3/5 of the time, and Carol 5/7 of the time. What is the probability that it was indeed heads?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is another problem for Bayes' Theorem. Let H be the event that the coin toss was heads, and T be the event that the coin toss was tails. Since the coin is fair, the prior probabilities are P(H) = 1/2 and P(T) = 1/2.\n", "\n", "Let A be the event that Alice tells the truth, B be the event that Bill tells the truth, and C be the event that Carol tells the truth. The additional information is then P(A) = 4/5, P(B) = 3/5, and P(C) = 5/7.\n", "\n", "What are the probabilities of the coin toss actually being heads or tails? Let E be the event that Alice reported the coin toss was heads, but Bill and Carol claimed that it was tails.\n", "If the toss was heads, P(E|H) is the probability that Alice told the truth and Bill and Carol both lied. Assuming the three reports are independent, P(E|H) = P(A)(1 - P(B))(1 - P(C)). 
But if the toss was tails, Alice lied and Bill and Carol each told the truth. Therefore, P(E|T) = (1 - P(A))P(B)P(C). And so P(E) = P(H)P(A)(1 - P(B))(1 - P(C)) + P(T)(1 - P(A))P(B)P(C).\n", "\n", "According to Bayes' Theorem, the posterior probability that the coin toss was indeed heads, given that Alice reported the coin toss was heads, but Bill and Carol claimed that it was tails, is P(H|E) = [P(H)P(E|H)]/P(E)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Events:\n", "# H = coin toss was heads\n", "# T = coin toss was tails\n", "# A = Alice told the truth\n", "# B = Bill told the truth\n", "# C = Carol told the truth\n", "# E = Alice reported heads but Bill and Carol claimed that it was tails\n", "\n", "# Prior probabilities:\n", "P_H = 0.5 # P(H)\n", "P_T = 0.5 # P(T)\n", "\n", "print('Prior:')\n", "print(f'P(H) = {P_H:.4f}')\n", "print(f'P(T) = {P_T:.4f}')\n", "\n", "# Additional information:\n", "print()\n", "print('Additional:')\n", "\n", "P_A = 4/5 # P(A)\n", "print(f'P(A) = {P_A:.4f}')\n", "\n", "P_B = 3/5 # P(B)\n", "print(f'P(B) = {P_B:.4f}')\n", "\n", "P_C = 5/7 # P(C)\n", "print(f'P(C) = {P_C:.4f}')\n", "\n", "# P(E|H): Alice told the truth, but Bill and Carol both lied.\n", "P_EgH = P_A*(1 - P_B)*(1 - P_C)\n", "print(f'P(E|H) = {P_EgH:.4f}')\n", "\n", "# P(E|T): Alice lied, but Bill and Carol both told the truth.\n", "P_EgT = (1 - P_A)*P_B*P_C\n", "print(f'P(E|T) = {P_EgT:.4f}')\n", "\n", "# P(E) = P(H)P(E|H) + P(T)P(E|T)\n", "P_E = P_H*P_EgH + P_T*P_EgT\n", "print(f'P(E) = {P_E:.4f}')\n", "\n", "# Posterior:\n", "# P(H|E) = [P(H)P(E|H)]/P(E)\n", "P_HgE = P_H*P_EgH/P_E\n", "\n", "print()\n", "print('Posterior:')\n", "print(f'P(H|E) = {P_HgE:.4f}')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", 
"nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }