{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "042469fa-ab68-47f1-93dd-22d1160b8ac7",
   "metadata": {},
   "source": [
    "### <center>San Jose State University<br>Department of Applied Data Science<br><br>**DATA 200<br>Computational Programming for Data Analytics**<br><br>Spring 2024<br>Instructor: Ron Mak</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51ffc2fa-af0f-4ab5-bce5-e0fcc762b7a9",
   "metadata": {},
   "source": [
    "# `pandas`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c853e87b-fab2-446c-a986-cff52c57ccf7",
   "metadata": {},
   "source": [
    "#### `numpy` arrays are optimized for homogeneous numeric data that's accessed via integer indices. But \"Big Data\" applications must support mixed datatypes, custom indexing, missing data, and data that's not structured consistently.\n",
    "#### `pandas` is a module from the Python Standard Library that offers data structures and methods to manipulate different types of data. They are easy to use and highly optimized for performance."
   ]
  },
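  {
   "cell_type": "markdown",
   "id": "sketch-mixed-data-md",
   "metadata": {},
   "source": [
    "#### As a minimal sketch of the points above, the next cell builds a small `DataFrame` with mixed datatypes, string row labels instead of integer positions, and one missing value. The readings, column names, and index labels are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-mixed-data-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical readings, invented for this sketch.\n",
    "readings = pd.DataFrame(\n",
    "    {\n",
    "        'city': ['San Jose', 'Fresno', 'Eureka'],   # text\n",
    "        'high_temp_f': [71.5, np.nan, 58.0],        # floats, one missing\n",
    "        'rainy': [False, False, True],              # booleans\n",
    "    },\n",
    "    index=['sjc', 'fat', 'acv'],                    # custom row labels\n",
    ")\n",
    "\n",
    "# Each column keeps its own datatype.\n",
    "print(readings.dtypes)\n",
    "\n",
    "# Look up a row by its label rather than by integer position.\n",
    "print(readings.loc['fat'])\n",
    "\n",
    "# Count the missing values; mean() skips them by default.\n",
    "missing_count = int(readings['high_temp_f'].isna().sum())\n",
    "mean_high = readings['high_temp_f'].mean()\n",
    "print(missing_count, mean_high)"
   ]
  },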
  {
   "cell_type": "markdown",
   "id": "a8ef7426-bf4b-4884-95eb-ca87f43e0dda",
   "metadata": {},
   "source": [
    "#### `pandas` has two key data collections, `Series` and `DataFrame`. Both are based on `numpy` arrays. Many `Series` and `DataFrame` operations can take `numpy` arrays as arguments, and many `numpy` operations can take `Series` and `DataFrame` arguments."
   ]
  },
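  {
   "cell_type": "markdown",
   "id": "sketch-numpy-interop-md",
   "metadata": {},
   "source": [
    "#### The interplay between `pandas` and `numpy` can be sketched as follows. This is only an illustration with made-up values and labels: a `Series` and a `DataFrame` are built from `numpy` arrays, a `numpy` function is applied to a `Series`, and a `DataFrame` is converted back to an array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-numpy-interop-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Build a Series from a numpy array, adding row labels.\n",
    "values = np.array([1.0, 4.0, 9.0])\n",
    "s = pd.Series(values, index=['a', 'b', 'c'])\n",
    "\n",
    "# A numpy function accepts the Series and returns a Series\n",
    "# that keeps the same labels.\n",
    "roots = np.sqrt(s)\n",
    "\n",
    "# A Series operation accepts a numpy array of the same length.\n",
    "total = s + np.array([10.0, 20.0, 30.0])\n",
    "\n",
    "# Build a DataFrame from a 2-D numpy array, then get an array back.\n",
    "df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])\n",
    "as_array = df.to_numpy()\n",
    "\n",
    "print(roots)\n",
    "print(total)\n",
    "print(as_array)"
   ]
  },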
  {
   "cell_type": "markdown",
   "id": "f376642f-1a5e-45d4-86d3-b7cf78c66d65",
   "metadata": {},
   "source": [
    "#### The original developer of `pandas` derived its name from \"panel data\" when he was working with data for measurements over time, such as stock prices and historical temperature readings."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ee4a57e-4b08-47c8-85e3-89a9e31a2a33",
   "metadata": {},
   "source": [
    "## Advantages of `pandas` over `numpy`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35d7b998-b622-46d1-a788-abcdd46e34b2",
   "metadata": {},
   "source": [
    "#### **Higher level of abstraction.** `pandas` offers a simplier API for developers by abstracting away some of the complex concepts.\n",
    "#### **Less intuition.** Many `pandas` methods require less intuition by developers but are still very powerful.\n",
    "#### **Faster processing.** Some `DataFrame` operations can be much faster, depending on the data and their structure.\n",
    "#### **Designed for \"Big Data\".** A `DataFrame` is ideal for operating on large datasets."
   ]
  },
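  {
   "cell_type": "markdown",
   "id": "sketch-abstraction-md",
   "metadata": {},
   "source": [
    "#### To illustrate the higher level of abstraction, here is a small sketch that computes a per-group average twice: once with hand-built `numpy` boolean masks and once with `DataFrame.groupby`. The department names and scores are invented, and the example says nothing about speed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-abstraction-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical data, invented for this sketch.\n",
    "depts = np.array(['math', 'physics', 'math', 'physics'])\n",
    "scores = np.array([90.0, 80.0, 70.0, 60.0])\n",
    "\n",
    "# numpy: build one boolean mask per department by hand.\n",
    "numpy_means = {\n",
    "    str(d): float(scores[depts == d].mean())\n",
    "    for d in np.unique(depts)\n",
    "}\n",
    "\n",
    "# pandas: name the grouping column and the aggregation.\n",
    "df = pd.DataFrame({'dept': depts, 'score': scores})\n",
    "pandas_means = df.groupby('dept')['score'].mean()\n",
    "\n",
    "print(numpy_means)\n",
    "print(pandas_means)"
   ]
  },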
  {
   "cell_type": "markdown",
   "id": "522db182-b5e9-4211-bc0a-c97218356dcb",
   "metadata": {},
   "source": [
    "## Disadvantages of `pandas` over `numpy`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da0ae4c8-c9c8-458f-9092-22f8011ff9f6",
   "metadata": {},
   "source": [
    "#### **Less applicable.** The higher level of abstraction can make `pandas` less applicable. Some operations can become more complex.\n",
    "#### **More momory and disk space.** `pandas` dataframes require more memory and disk space than `numpy` arrays.\n",
    "#### **Performance problems.** Heavy joins can cause performance and memory usage problems.\n",
    "#### **Hidden complexity.** The simple API can hide complexity from programmers and result in inefficient code."
   ]
  },
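  {
   "cell_type": "markdown",
   "id": "sketch-memory-md",
   "metadata": {},
   "source": [
    "#### The memory overhead can be checked with a rough sketch like the one below, which compares the bytes held by a `numpy` array with the bytes reported for a `DataFrame` that wraps the same values and adds string row labels. The array size and labels are arbitrary choices, and the exact numbers will vary with the `pandas` version and platform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-memory-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# 1000 64-bit integers: 8 bytes each in the raw array.\n",
    "arr = np.arange(1000, dtype=np.int64)\n",
    "array_bytes = arr.nbytes\n",
    "\n",
    "# The same values in a DataFrame with hypothetical string row labels.\n",
    "labels = [f'row{i}' for i in range(1000)]\n",
    "frame = pd.DataFrame({'value': arr}, index=labels)\n",
    "\n",
    "# memory_usage() reports bytes for the index and each column;\n",
    "# deep=True also counts the contents of object (e.g. string) data.\n",
    "frame_bytes = int(frame.memory_usage(deep=True).sum())\n",
    "\n",
    "print('numpy array:', array_bytes, 'bytes')\n",
    "print('DataFrame:  ', frame_bytes, 'bytes')"
   ]
  },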
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4630b19-b1ed-45d5-848b-a6c94372f01a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# (C) Copyright 2023 by Ronald Mak"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3affb551-5ad6-42bf-818a-d1faeb420d3f",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}