Th | 6:00 - 8:45 PM | room Health Building HB 106 |
# | Assigned | Due | Assignment |
---|---|---|---|
1 | Jan 23 | Jan 30 | CSV datasets and Jupyter notebooks; Jupyter notebooks: TitanicCSV.ipynb AirlineSafetyCSV.ipynb BostonCrimeCSV.ipynb; CSV files: crimes-in-boston.zip |
2 | Jan 30 | Feb 6 | Seaborn Bar Charts of Random Values |
3 | Feb 6 | Feb 13 | Analysis of a Dataset; Example analysis: TitanicAnalysis.ipynb TitanicSurvival.csv |
4 | Feb 13 | Feb 20 | Combinatorics and Probability Problem Set; Solutions: Assignment4-solutions.ipyn |
5 | Feb 20 | Feb 27 | Probability Problem Set; Solutions: Assignment5-solutions.ipyn |
6 | Feb 27 | Mar 5 | The Central Limit Theorem |
7 | Mar 19 | Mar 26 | Linear regression |
8 | Mar 26 | Apr 9 | Multiple Regression Analysis |
9 | Apr 9 | Apr 16 | Supervised and Unsupervised Machine Learning |
10 | Apr 16 | Apr 23 | Text analysis |
11 | Apr 25 | Apr 30 | Matrix Operations Problem Set; Solutions: Assignment11-solutions.ipyn |
12 | May 2 | May 7 | Polynomial Regression and Markov Chain Problem Set; Solutions: Assignment12-solutions.ipyn PolynomialRegression.py |
Week | Date | Content |
---|---|---|
1 | Jan 23 |
Slides:
Introduction to data analytics; What is Data Science?
history of data collection; history of data analysis;
Python libraries; datasets; load CSV files into dataframes;
statistics and machine learning; data scientist skillset;
Lab: Install Anaconda Lab: Load CSV datasets into dataframes Jupyter notebooks: TitanicCSV.ipynb AirlineSafetyCSV.ipynb BostonCrimeCSV.ipynb CSV files: crimes-in-boston.zip |
2 | Jan 30 |
Slides:
Big Data; Jupyter notebooks; IPython; lists; indexing;
length; mutable; two-dimensional; unpacking; sort;
search; list comprehension; list operations; tuples;
NumPy arrays; Seaborn; bar chart; random values
Lab: Seaborn Bar Charts Jupyter notebooks: AgeBarChart.ipynb 5.02-Lists.ipynb 5.03-Tuples.ipynb 5.05-Slicing.ipynb 5.06-Deletion.ipynb 5.08-Sorting.ipynb 5.09-Searching.ipynb 5.10-OtherMethods.ipynb 5.12-Comprehensions.ipynb |
3 | Feb 6 |
Slides:
Statistics; sums; measures of central tendency; mean;
weighted average; median; mode; measures of variability;
range; percentiles; quartiles; interquartile range (IQR);
variance; standard deviation; zip; data analysis with the
Titanic Survival dataset
Lab: Descriptive Statistics Jupyter notebooks: sum.ipynb mean.ipynb weighted_average.ipynb median.ipynb mode.ipynb range.ipynb quartiles.ipynb IQR.ipynb stdev.ipynb ziptest.ipynb TitanicAnalysis.ipynb TitanicSurvival.csv boxplot.png |
4 | Feb 13 |
Slides:
Counting principles; factorial notation; count the
complement; counting when order doesn't matter;
binomial coefficients; collections that allow repetitions;
permutations and combinations; uncertainty; probability:
classical and relative frequency interpretations; basic
probability laws; Venn diagrams
Lab: Combinatorics and probability problem set |
5 | Feb 20 |
Slides:
Conditional probability; independent vs. dependent events;
Bayes' Theorem; Thomas Bayes and Bayesian statistics;
disease test example; Monty Hall Problem;
discrete and continuous random variables; probability
distributions: uniform, normal, exponential,
binomial, Poisson; expected value; animated graphs
Lab: Probability problem set Python program: RollDieDynamic.py Jupyter notebook: PltAnimation.ipynb |
6 | Feb 27 |
Slides:
Computer simulations; Monty Hall simulation program;
Monty Hall with n doors and k cars; statistics;
sampling; random vs. biased sampling; sampling error;
estimates of the population mean; the Central Limit Theorem;
sampling distribution of the sample means; number of samples;
size of samples; standard error; sampling distributions
of the mean, median, and standard deviation
Lab: Central Limit Theorem Jupyter notebook: MontyHall.ipynb |
7 | Mar 5 |
Slides:
Discrete vs. continuous random variables; area under the
curve; normal probability distribution; standard normal
distribution; standard normal distribution probabilities;
confidence interval; critical values; level of significance;
margin of error; small sample estimates;
Student's t distribution; t confidence interval;
interpretation of confidence intervals; hypothesis testing;
test procedure; Type I and Type II errors; test statistic;
null hypothesis rejection regions; hypothesis testing examples
Jupyter notebook: StandardNormal.ipynb Fall 2019 midterm: Midterm-Fall2019.ipynb Midterm-Solutions-Fall2019 TitanicSurvival.csv |
8 | Mar 12 |
Midterm Video recording Slides: Null vs. alternative hypothesis; small sample hypothesis tests; testing two population means with large and small samples |
9 | Mar 19 |
Video recording
Slides: Tactics for solving probability problems; midterm solutions; hypothesis testing and experiments; different significance levels; using P-values; dependent and independent variables; scatter plots; regression analysis; regression line; slope-intercept; residual values; least-squares line; coefficient of determination; correlation coefficient; correlation and causation; Assignment #7 Midterm solutions: midterm solutions Jupyter notebooks: StandardNormal.ipynb LeastSquaresLine.ipynb CoeffOfDet.ipynb Correlation.ipynb |
10 | Mar 26 |
Video recording
Slides: Python regression analysis functions; NY City temperatures example; time-series analysis; linear trend over time; moving averages; exponential smoothing; another perspective on linear regression; multiple linear regression; normal equations; home prices example; introduction to machine learning; supervised; unsupervised; steps for doing ML; time-series analysis via ML example; split the data; train the estimator; test the model; make predictions; multiple regression via ML; California housing example; underfitting and overfitting; Jupyter notebooks: NYCTemps.ipynb TimeSeries.ipynb HomePrices.ipynb NYCTempsML.ipynb CaliforniaHousing.ipynb Least squares module: LeastSquaresLine.py Dataset: ave_hi_nyc_jan_1895-2018.csv |
11 | Apr 9 |
Video recording
Slides: scikit-learn machine learning algorithms; "Big Data"; supervised ML; k-nearest neighbor ML classification algorithm with the Digits dataset; training and testing the KNN model; confusion matrix; classification report; unsupervised ML; dimensionality reduction; TSNE estimator; k-means clustering algorithm with the Iris dataset; principal component analysis; Assignment #9 Jupyter notebooks: k-NearestNeighbors.ipynb k-DimensionalityReduction.ipynb k-MeansClustering.ipynb |
12 | Apr 16 |
Video recording Password B6*2$OE?
Slides: Natural language processing; NLP examples; Assignment #10; linear algebra; vectors; vector arithmetic; normalize a vector; vectors and NumPy; matrices; matrix arithmetic; matrix inverse; matrices and NumPy; Hilbert matrices; graphic transformation matrix Jupyter notebooks: nlp.zip matrices.zip |
13 | Apr 23 |
Video recording Password 2W!=1^O!
Slides: Linear equations; home prices example; solve a system of linear equations; equivalent systems; graphical solution; consistent and inconsistent systems; ill-conditioned systems; augmented matrix solution; row echelon form; Gaussian elimination solution; matrix representation; calculate a matrix inverse; solution using an inverse; singular matrix and system; least-squares solution with matrices; QR factorization; linearly independent; orthonormal columns; upper-triangular; Gram-Schmidt Orthonormalization Process Jupyter notebooks: LinearEquationPlots.ipynb MatrixInverse.ipynb HomePricesSolve.ipynb LeastSquaresMatrix.ipynb MultivariateMatrix.ipynb QR.ipynb Splines.ipynb |
14 | Apr 30 |
Video recording Password 0F%o5CoI
Slides: Nonlinear relationships; normal equations; polynomial regression; polynomial regression with matrices and QR factorization; Markov chain; Markov steady state Jupyter notebooks: PolynomialRegression.py PolynomiaRegression1.ipynb PolynomiaRegression2.ipynb PolynomiaRegression3.ipynb PolynomiaRegression4.ipynb PolynomialRegressionMatrices.ipynb MarkovChain.ipynb |
15 | May 7 |
Video recording Password 4O+Tvg1N
Slides: Eigenvalues and eigenvectors; uses; geometric interpretations; normalized eigenvectors; compute eigenvalues; compute eigenvectors; numpy.linalg.eig() function; matrix powers Jupyter notebooks: EigenFactorization.py MatrixPowers.py VectorPlots.ipynb TestEigenFactorization1.ipyn TestEigenFactorization2.ipyn TestEigenFactorization3.ipyn TimeEigen.ipynb TestMatrixPower.ipynb Fall 2019 final exam: Final-Fall2019.ipynb Final-Solutions-Fall2019 |
Instructor consent.
Introduction to Statistics & Data Analysis, 6th edition | Roxy Peck, Tom Short, and Chris Olsen | Cengage, 2019 | 978-1-337-79361-2 |
Linear Algebra: A Modern Introduction, 4th edition | David Poole | Cengage Learning, 2015 | 978-1-285-46324-7 |
Linear Algebra (Schaum’s Outline), 6th edition | Seymour Lipschutz and Marc Lipson | McGraw-Hill, 2018 | 978-1-260-01144-9 |
Data Science from Scratch: First Principles with Python, 2nd edition | Joel Grus | O’Reilly, 2019 | 978-1-492-04113-9 |
Python Data Science Handbook | Jake VanderPlas | O’Reilly, 2017 | 978-1-491-91205-8 |