### <center>San Jose State University<br>Department of Applied Data Science<br><br>**DATA 200<br>Computational Programming for Data Analytics**<br><br>Spring 2024<br>Instructor: Ron Mak</center>

# 7.14.2 `DataFrame`
#### A `DataFrame` is an **enhanced two-dimensional array**. It can have custom row and column indices, it has built-in methods for data analytics, and it supports missing data. Each column is a `Series`, and different columns can hold different datatypes.
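
As a minimal sketch of the points above, the following builds a small dataframe whose columns hold different datatypes and one missing value. The name `people` and its contents are made up for illustration and are not part of the grades example.

```python
import pandas as pd

# Hypothetical sample data: the column names and values are illustrative only.
people = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cy'],     # strings
    'age': [31, 45, 27],              # integers
    'height': [1.62, None, 1.80],     # floats with one missing value
})

print(people.dtypes)                  # one dtype per column
print(type(people['age']))            # each column is a Series
print(people['height'].isna().sum())  # number of missing heights
```

The missing height is stored as `NaN`, which is why that column ends up with a floating-point dtype.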

### Creating a `DataFrame` from a Dictionary

In [None]:
import pandas as pd

In [None]:
grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

In [None]:
grades = pd.DataFrame(grades_dict)
grades

### Customizing a `DataFrame`’s Indices with the `index` Attribute 
#### The one-dimensional collection of index labels must contain exactly one label for each row of the dataframe.

In [None]:
grades = pd.DataFrame(grades_dict, index=['Exam1', 'Exam2', 'Exam3'])
grades

In [None]:
grades.index = ['Test1', 'Test2', 'Test3']
grades
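
To illustrate the length requirement stated above, here is a small sketch of what happens when the number of labels does not match the number of rows. The names `scores` and `mismatch_error` are assumptions introduced for this example only.

```python
import pandas as pd

# A reduced copy of the grades data with three rows.
scores = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]})

mismatch_error = None
try:
    scores.index = ['Test1', 'Test2']   # two labels for three rows
except ValueError as error:
    mismatch_error = error
    print(f'Rejected: {error}')

# With one label per row, the assignment succeeds.
scores.index = ['Test1', 'Test2', 'Test3']
print(scores)
```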

### Accessing a `DataFrame`’s ***Columns***

In [None]:
grades

In [None]:
grades['Eva']

In [None]:
grades.Sam

In [None]:
type(grades.Sam)

In [None]:
grades[['Eva', 'Katie']]

In [None]:
type(grades[['Eva', 'Katie']])

### Selecting ***Rows*** via the `loc` and `iloc` Attributes

In [None]:
grades

#### Use `loc` to select a dataframe row by its label. Note the difference between the following two:

In [None]:
grades.loc['Test1']

In [None]:
grades.loc[['Test1']]

#### The first result is a `Series`. But when the label is passed inside a list, the result is a `DataFrame`.

In [None]:
print(f"{type(grades.loc['Test1'])   = }")
print(f"{type(grades.loc[['Test1']]) = }")

#### Use `iloc` to select rows by integer position (starting at 0), regardless of the row labels.

In [None]:
grades.iloc[1]

In [None]:
grades.iloc[[1]]

In [None]:
grades.iat[1, 2]
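
The difference between labels and positions is easiest to see when the row labels are themselves integers. The sketch below uses a hypothetical dataframe named `readings`, whose labels 10, 20, and 30 are chosen only to make the contrast visible.

```python
import pandas as pd

# Hypothetical data: integer row labels that do not start at 0.
readings = pd.DataFrame({'value': [1.5, 2.5, 3.5]}, index=[10, 20, 30])

by_label = readings.loc[10]       # the row whose label is 10
by_position = readings.iloc[0]    # the first row, whatever its label
print(by_label.equals(by_position))

print(readings.iloc[2])           # the third row (label 30)
# readings.loc[2] would raise a KeyError because no row is labeled 2.
```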

### Selecting Rows via Slices and Lists with the `loc` and `iloc` Attributes

In [None]:
grades.loc['Test1':'Test3']  # includes upper limit

In [None]:
grades.iloc[0:2]  # upper limit not included

In [None]:
grades.loc[['Test1', 'Test3']]

In [None]:
grades.iloc[[0, 2]]

### Selecting Subsets of the Rows and Columns 

#### When you use `loc` to slice a dataframe, the high index is ***included***.

In [None]:
grades.loc['Test1':'Test2', ['Eva', 'Katie']]

#### But if you use `iloc` to slice, the high index is ***excluded***.

In [None]:
grades.iloc[[0, 2], 0:3]
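
The two slicing rules can be compared side by side. This sketch rebuilds a reduced copy of the grades data under the assumed name `grades_demo` so that it runs on its own.

```python
import pandas as pd

grades_demo = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90]},
    index=['Test1', 'Test2', 'Test3'])

label_slice = grades_demo.loc['Test1':'Test2']   # Test1 and Test2
position_slice = grades_demo.iloc[0:1]           # Test1 only
print(list(label_slice.index))
print(list(position_slice.index))

# The same rule applies to column slices.
label_cols = grades_demo.loc[:, 'Wally':'Eva']   # Wally and Eva
position_cols = grades_demo.iloc[:, 0:1]         # Wally only
print(list(label_cols.columns))
print(list(position_cols.columns))
```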

### Boolean Indexing

In [None]:
grades

#### Display the grades that are >= 90:

In [None]:
grades[grades >= 90]

#### Test whether each grade is >= 90:

In [None]:
grades >= 90

#### `NaN` ("not a number") is the notation pandas uses for a missing value. Indexing a dataframe with a Boolean dataframe puts `NaN` in every cell that fails the test.

In [None]:
grades[(grades >= 80) & (grades < 90)]

In [None]:
grades[(grades >= 80) & (grades < 90)]['Bob']

In [None]:
grades[(grades >= 80) & (grades < 90)].Bob

In [None]:
grades[(grades >= 80) & (grades < 90)][['Bob']]

In [None]:
grades[(grades >= 80) & (grades < 90)][['Wally', 'Bob']]
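
Once Boolean indexing has introduced `NaN` values, it is often useful to count or drop them. The following is a sketch of one way to do that, using a reduced copy of the grades data under the assumed name `grades_demo`.

```python
import pandas as pd

grades_demo = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

high = grades_demo[grades_demo >= 90]   # grades below 90 become NaN
print(high)
print(high.isna())            # True wherever a value is missing
print(high.count())           # number of non-missing values per column
print(high['Eva'].dropna())   # only Eva's grades that passed the test
```

Because `NaN` is a floating-point value, the surviving grades are displayed as floats (for example, `100.0`).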

### Accessing a Specific `DataFrame` Cell by Row and Column

In [None]:
grades

In [None]:
grades.at['Test2', 'Eva']

In [None]:
grades.iat[2, 0]

In [None]:
grades.at['Test2', 'Eva'] = 100

In [None]:
grades.at['Test2', 'Eva']

In [None]:
grades.iat[1, 2] = 0

In [None]:
grades

### Descriptive Statistics

In [None]:
grades.describe()

In [None]:
pd.set_option('display.precision', 3)

In [None]:
grades.describe()

### What's each student's mean (average) score?

In [None]:
grades

In [None]:
grades.mean()

### Transposing the `DataFrame` with the `T` Attribute

In [None]:
grades

In [None]:
grades.T

In [None]:
grades.T.describe()

### What's the average score of each test?

In [None]:
grades.T.mean()
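
Transposing is one way to get per-test averages. As an alternative sketch, `mean()` also accepts `axis=1`, which averages across the columns of each row without transposing. The example uses the original grades (before the cell assignments made earlier in this notebook) under the assumed name `grades_demo`.

```python
import pandas as pd

grades_demo = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
     'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
     'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

per_test = grades_demo.mean(axis=1)   # average across the columns of each row
print(per_test)

# Raises an AssertionError if the two approaches disagree.
pd.testing.assert_series_equal(per_test, grades_demo.T.mean())
```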

### Sorting Rows by Their Indices

In [None]:
grades.sort_index(ascending=False)

### Sorting By Column Indices

In [None]:
grades.sort_index(axis=1)

#### The default is `axis=0`, which sorts the rows by their index labels. Use `axis=1` to sort the columns by their names.
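
The effect of the `axis` argument can be checked by looking at which labels are reordered. This is a small sketch on a reduced copy of the grades data, under the assumed name `grades_demo`.

```python
import pandas as pd

grades_demo = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

rows_sorted = grades_demo.sort_index(ascending=False)   # axis=0: reorder rows
cols_sorted = grades_demo.sort_index(axis=1)            # axis=1: reorder columns
print(list(rows_sorted.index))
print(list(cols_sorted.columns))
```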

### Sorting By Column Values

In [None]:
grades.sort_values(by='Test1', axis=1, ascending=False)

### Sorting by Row Values

In [None]:
grades.T.sort_values(by='Test1', ascending=False)  # default is axis=0

In [None]:
grades.loc['Test1'].sort_values(ascending=False)

### Copy vs. In-Place Sorting

#### By default, methods `sort_index()` and `sort_values()` each create and return a sorted ***copy*** of the dataframe, which can require substantial memory for large dataframes. To sort a dataframe in place, include the keyword argument `inplace=True`.

In [None]:
grades

In [None]:
grades.sort_index(axis=1, inplace=True)

In [None]:
grades
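
The difference between the two modes can be verified by comparing the column order before and after each call. The sketch below uses a reduced copy of the grades data; the names `grades_demo`, `original_order`, and `sorted_copy` are assumptions made for this example.

```python
import pandas as pd

grades_demo = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

original_order = list(grades_demo.columns)

# Default behavior: a sorted copy is returned, the original is untouched.
sorted_copy = grades_demo.sort_index(axis=1)
print(list(sorted_copy.columns))
print(list(grades_demo.columns) == original_order)

# In-place behavior: the original dataframe itself is reordered.
grades_demo.sort_index(axis=1, inplace=True)
print(list(grades_demo.columns))
```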

In [None]:
##########################################################################
# (C) Copyright 2019 by Deitel & Associates, Inc. and                    #
# Pearson Education, Inc. All Rights Reserved.                           #
#                                                                        #
# DISCLAIMER: The authors and publisher of this book have used their     #
# best efforts in preparing the book. These efforts include the          #
# development, research, and testing of the theories and programs        #
# to determine their effectiveness. The authors and publisher make       #
# no warranty of any kind, expressed or implied, with regard to these    #
# programs or to the documentation contained in these books. The authors #
# and publisher shall not be liable in any event for incidental or       #
# consequential damages in connection with, or arising out of, the       #
# furnishing, performance, or use of these programs.                     #
##########################################################################


In [None]:
# Additional material (C) Copyright 2024 by Ronald Mak