{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "###
San Jose State University
Department of Applied Data Science

**DATA 200
Computational Programming for Data Analytics**

Spring 2024
Instructor: Ron Mak
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read a CSV File into a `DataFrame`\n", "#### We'll use our familiar `TitanicSurvival.csv` file as an example. We can read the file off the internet via a URL rather than from a local file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic = pd.read_csv(\n", " 'https://vincentarelbundock.github.io/Rdatasets/csv/carData/TitanicSurvival.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Viewing Some of the Rows in the Titanic Dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.set_option('display.precision', 2) # format for floating-point values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Customizing the Column Names" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.columns = ['name', 'survived', 'sex', 'age', 'class']\n", "titanic.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9.12.4 Simple Data Analysis with the Titanic Disaster Dataset " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The statistics only include the numeric fields (i.e., the ages) and ignore the missing ages (the `NA`s). Of the 1309 passengers, 1046 had known ages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(titanic.survived == 'yes').describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The expression `titanic.survived == 'yes'` yielded two unique values, `True` (survived) and `False` (perished). The top (highest frequency) value was `False` -- most passengers perished. `freq` is the count value of the top value -- 809 passengers perished." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9.12.5 Passenger Age Histogram\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "histogram = titanic.hist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Delete a column\n", "#### If we don't care about a passenger's sex, we can delete that column from the dataframe. You can delete one column at a time." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "del titanic['sex']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Drop rows\n", "#### We can drop (delete) all the rows with `NA` ages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.head(20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.tail(20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.age.isna()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "na_indices = titanic[ titanic.age.isna() ].index\n", "na_indices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.drop(na_indices, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.age.isna()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.head(20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.tail(20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "##########################################################################\n", "# (C) Copyright 2019 by Deitel & Associates, Inc. and #\n", "# Pearson Education, Inc. All Rights Reserved. #\n", "# #\n", "# DISCLAIMER: The authors and publisher of this book have used their #\n", "# best efforts in preparing the book. These efforts include the #\n", "# development, research, and testing of the theories and programs #\n", "# to determine their effectiveness. The authors and publisher make #\n", "# no warranty of any kind, expressed or implied, with regard to these #\n", "# programs or to the documentation contained in these books. The authors #\n", "# and publisher shall not be liable in any event for incidental or #\n", "# consequential damages in connection with, or arising out of, the #\n", "# furnishing, performance, or use of these programs. #\n", "##########################################################################\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Additional material (C) Copyright 2024 by Ronald Mak" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }