San Jose State University Department of Applied Data Science
**DATA 200 Computational Programming for Data Analytics**
Spring 2023 Instructor: Ron Mak
**Assignment #5.2: Sets**
SAMPLE SOLUTIONS
"
]
},
{
"cell_type": "markdown",
"id": "7f054d51-5208-4463-a88f-3f7b78b65462",
"metadata": {},
"source": [
"#### **PROBLEM 2:** Great software design involves not only writing good code, but also choosing the right way to store your data in memory.\n",
"#### Suppose you have a large collection of unsorted random values and you need to search the collection for particular target values. Should you store the random values in a list or in a set? Which data structure is more efficient for searches?"
]
},
{
"cell_type": "markdown",
"id": "6b43ce8a-f435-4d48-85cc-ce15271f2d21",
"metadata": {},
"source": [
"#### **5.2. a.** [20 points] Generate 1,000,000 random integer values in the range 0 through 9,999,999 inclusive. Each time you generate a random value, append it to a list named `randoms_list` and also enter it into a set named `randoms_set`. Both the list and the set should initially be empty, and when you're done with this step, both will contain the same random values. Do not sort the list."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e401792a-859a-447e-9e0b-1d498610caca",
"metadata": {},
"outputs": [],
"source": [
"# Global constants\n",
"RANDOMS_COUNT = 1_000_000\n",
"RANDOMS_RANGE = 10_000_000"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "61f14c72-5d46-49c4-9cb6-16417d20b66f",
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"randoms_list = []\n",
"randoms_set = set()\n",
"\n",
"for _ in range(RANDOMS_COUNT):\n",
" r = random.randrange(RANDOMS_RANGE)\n",
" \n",
" randoms_list.append(r)\n",
" randoms_set.add(r)"
]
},
{
"cell_type": "markdown",
"id": "e0687f1b-56c7-4831-ad22-baadfb37d1ad",
"metadata": {},
"source": [
"#### **5.2. b.** [20 points] Initialize another list named `target_values`. Fill `target_values` with random values in the range 0 through 9,999,999 inclusive. Each time you generate a random target value, use the `in` operator to search for the value in `randoms_list` and in `randoms_set`. Since both `randoms_list` and `randoms_set` contain the same values, each target value should be found in both the list and the set, or in neither. Stop generating target values after you've generated 10 values that were found. At the end of this step, `target_values` will be a list of random values, 10 of which were found and the rest were not."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "782bdb30-797c-44b3-a292-96ffa3db3dfc",
"metadata": {},
"outputs": [],
"source": [
"target_values = []\n",
"found_count = 0\n",
"\n",
"while found_count < 10:\n",
" r = random.randrange(RANDOMS_COUNT)\n",
" target_values.append(r)\n",
" \n",
" found_in_list = r in randoms_list\n",
" found_in_set = r in randoms_set\n",
" \n",
" #print(f'{r:10} in list: {found_in_list}, in set: {found_in_set}')\n",
" \n",
" if found_in_list:\n",
" found_count += 1"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "666ff51b-2d68-4cce-ba84-a9b87c99f66f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"len(randoms_list) = 1,000,000\n",
"len(randoms_set) = 951,360\n",
"len(target_values) = 91\n"
]
}
],
"source": [
"# Just checking.\n",
"print(f'{len(randoms_list) = :,d}')\n",
"print(f'{len(randoms_set) = :,d}')\n",
"print(f'{len(target_values) = }')"
]
},
{
"cell_type": "markdown",
"id": "a4629164-0cf8-4420-9c6f-20511b1d3561",
"metadata": {},
"source": [
"#### **5.2. c.** [30 points] Time how long it takes to search `randoms_list` using the `in` operator for all the values in `target_values`, one at a time. Calculate and print the elapsed time in milliseconds that it took some code to execute like this:\n",
"``` python\n",
"import time\n",
"\n",
"start_time = time.process_time_ns()\n",
"\n",
"# Code to be timed here.\n",
" \n",
"end_time = time.process_time_ns()\n",
"\n",
"elapsed_time = round((end_time - start_time)/10**6, 3)\n",
"print(f\"Elapsed time: {elapsed_time:,.3f} ms\")\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c843b7a3-bdcc-4515-9ed9-f0b435a475f0",
"metadata": {},
"outputs": [],
"source": [
"import time"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b390653a-b35f-478c-b274-0738495ea2d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Elapsed time for list: 892.701 ms\n"
]
}
],
"source": [
"start_time_list = time.process_time_ns()\n",
"\n",
"# Search for each target value in the list.\n",
"for value in target_values:\n",
" value in randoms_list\n",
" \n",
"end_time_list = time.process_time_ns()\n",
"\n",
"elapsed_time_list = round((end_time_list - start_time_list)/10**6, 3)\n",
"print(f\"Elapsed time for list: {elapsed_time_list:,.3f} ms\")"
]
},
{
"cell_type": "markdown",
"id": "a4eb1353-9a9a-414f-ae51-08a570fec63e",
"metadata": {},
"source": [
"#### **5.2. d.** [30 points] Time how long it takes to search `randoms_set` using the `in` operator for all the values in `target_values`, one at a time. Calculate and print the elapsed time in milliseconds it took the code to execute."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c02f1e1f-85ea-46d7-84ca-09a1254e0155",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Elapsed time for set: 0.088 ms\n"
]
}
],
"source": [
"start_time_set = time.process_time_ns()\n",
"\n",
"# Search for each target value in the set.\n",
"for value in target_values:\n",
" value in randoms_set\n",
" \n",
"end_time_set = time.process_time_ns()\n",
"\n",
"elapsed_time_set = round((end_time_set - start_time_set)/10**6, 3)\n",
"print(f\"Elapsed time for set: {elapsed_time_set:,.3f} ms\")"
]
},
{
"cell_type": "markdown",
"id": "e02b6ea3-d5fb-4563-9f08-4776fda4c8fc",
"metadata": {},
"source": [
"#### Therefore ..."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1560b5a6-cb74-4bcf-a100-a90c9de6de72",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The elapsed list time is longer than the set time by a factor of 10,144.\n"
]
}
],
"source": [
"factor = round(elapsed_time_list/elapsed_time_set)\n",
"\n",
"print( 'The elapsed list time is longer than '\n",
" f'the set time by a factor of {factor:,d}.')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1b0f6d0-27ca-49f3-bcbb-619da7f195b6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}