CS298 Proposal
Clustering Organ Cell Types
Swathi M.V.S (venkatasatyaswathi.mattaparthi@sjsu.edu)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Robert Chun, Dr. Wendy Lee
Abstract:
Every organ in the human body is composed of specific tissues. In turn, each tissue is made of
specific cells. Cells are the basic building blocks of all living organisms. The human body
contains trillions of cells. Most cells have a nucleus, which contains DNA, the blueprint of all
organisms. DNA contains genes that carry genetic information, including the instructions that
organisms use to produce proteins. The Human Cell Atlas is an international project that aims to
create a comprehensive reference map of all human cells. One contributor of data to this effort
is Tabula Sapiens, a project conducted by the Chan Zuckerberg Biohub in California. This dataset
is a first-draft human cell atlas of nearly
500,000 cells from 24 organs of 15 human donors. It provides insights into the molecular
composition of different cell types, including their gene expression patterns and signaling
pathways. The Tabula Sapiens dataset is distributed in H5AD format (the HDF5-based file format
of the AnnData library) and contains single-cell transcriptomic data. Transcriptomic data
describes the RNA molecules present in each cell and therefore reflects gene expression.
My project will use the Human Cell Atlas as a reference while studying the Tabula Sapiens
dataset. I will cluster the cells present in various organs to gain insight into the different
cell types found in the human body.
CS297 Results
- Studied and explored the Heart data of the Tabula Sapiens dataset to understand the
different cells and features present in the data.
- Applied classification and clustering techniques to the Heart data to distinguish the
different types of cells.
- Implemented binary classification algorithms to identify a particular cell type in the
Heart data.
- Built a neural network classifier to identify the various cells present in the Heart data.
Proposed Schedule
Week 1 | Jan 30 - Feb 5 | Submit CS298 Proposal
Week 2 | Feb 6 - Feb 12 | Explore and preprocess Tabula Sapiens All Cells data
Week 3 - Week 4 | Feb 13 - Feb 26 | Perform k-means clustering and explore various methods of choosing k
Week 5 - Week 6 | Feb 27 - Mar 11 | Implement hierarchical clustering to identify the number of clusters
Week 7 - Week 9 | Mar 12 - Apr 1 | Compare the clustering of different organ types
Week 10 - Week 13 | Apr 9 - May 6 | Work on CS298 Report/Presentation
Key Deliverables:
- Software
  - Explore the All Cells dataset, use it to perform k-means clustering, and experiment
    with different methods of choosing k.
  - Use the All Cells dataset to implement hierarchical clustering and determine the
    number of clusters.
  - Compare the clusterings of different tissue (organ) types and check whether the
    clusters are similar across tissues, in order to identify any new cell types.
- Report
  - CS298 Report
  - CS298 Presentation
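As an illustration of the k-means deliverable above, the following is a minimal sketch of k-means (Lloyd's algorithm) combined with one possible way of choosing k, an elbow heuristic based on the largest discrete second difference of the inertia curve. It is not the project's implementation: it uses only the Python standard library, a hypothetical toy dataset of three 2-D blobs instead of gene expression data, and a deterministic farthest-first seeding chosen here purely so the example is reproducible. All function names are made up for this sketch.

```python
# Toy sketch: k-means plus an elbow heuristic for choosing k.
# Standard library only; data and helper names are hypothetical.

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def elbow_k(inertias):
    """inertias maps consecutive k values to inertia. Return the k at which
    the drop in inertia shrinks the most (largest second difference)."""
    ks = sorted(inertias)
    return max(
        ks[1:-1],
        key=lambda k: (inertias[k - 1] - inertias[k]) - (inertias[k] - inertias[k + 1]),
    )


# Hypothetical toy data: three well-separated blobs of five points each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

inertias = {k: kmeans(points, k)[2] for k in range(1, 7)}
best_k = elbow_k(inertias)
print(inertias)
print("elbow heuristic suggests k =", best_k)
```

On this toy data the heuristic selects k = 3, matching the three blobs. Other selection methods, such as silhouette scores or the gap statistic, may disagree on real data, which is why the proposal plans to compare several of them.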
Innovations and Challenges
- One challenge is understanding and preprocessing the single-cell human data for all
cells, since it contains biological features from many different organs.
- Selecting the optimal number of clusters in k-means and Hierarchical clustering is an
important task. Different methods may generate varying results, and identifying the most
suitable approach is challenging.
- The Human Cell Atlas is an ongoing research project, so few references are
available. We aim to see whether clustering can identify any new cell types. Any such
findings would be helpful to this field.
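To make the point about selecting the number of clusters in hierarchical clustering more concrete, here is a minimal sketch of agglomerative clustering with average linkage, where the number of clusters is suggested by the largest jump between consecutive merge distances (roughly, the longest vertical gap in a dendrogram). This is one heuristic among several and is not claimed to be the method the project will adopt. The code uses only the Python standard library, runs in cubic time, and operates on a hypothetical toy dataset. All names are illustrative.

```python
# Toy sketch: average-linkage agglomerative clustering with a
# largest-gap rule for the number of clusters. Names are hypothetical.
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, c1, c2):
    """Mean pairwise distance between two clusters of point indices."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters=1):
    """Repeatedly merge the two closest clusters until n_clusters remain.
    Returns (clusters, merge_distances)."""
    clusters = [[i] for i in range(len(points))]
    merge_distances = []
    while len(clusters) > n_clusters:
        d, a, b = min(
            (average_linkage(points, clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        merge_distances.append(d)
    return clusters, merge_distances


def clusters_from_largest_gap(merge_distances):
    """Cut the merge sequence where consecutive merge distances jump the
    most. Merge i (0-based) leaves n - 1 - i clusters."""
    n = len(merge_distances) + 1
    gaps = [
        merge_distances[i + 1] - merge_distances[i]
        for i in range(len(merge_distances) - 1)
    ]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return n - 1 - i


# Hypothetical toy data: three well-separated blobs of five points each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

_, merge_distances = agglomerate(points)
n_suggested = clusters_from_largest_gap(merge_distances)
clusters, _ = agglomerate(points, n_clusters=n_suggested)
print("suggested number of clusters:", n_suggested)
print(sorted(sorted(c) for c in clusters))
```

On the toy data the largest gap separates the within-blob merges from the between-blob merges, so the rule suggests three clusters. On real single-cell data the gaps are rarely this clean, which is part of the challenge described above.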
References:
[1] "A Cartography of Human Histology in the Making", The Economist, March 2023,
https://www.economist.com/science-and-technology/2023/03/08/a-cartography-of-human-histology-is-in-the-making
[2] O. Rozenblatt-Rosen, M. Stubbington, A. Regev, and S. Teichmann, "The Human Cell Atlas:
from vision to reality", Nature 550, 451-453, 2017, doi: 10.1038/550451a
[3] Tabula Sapiens Dataset https://tabula-sapiens-portal.ds.czbiohub.org/
[4] The Tabula Sapiens Consortium, "The Tabula Sapiens: A multiple-organ, single-cell
transcriptomic atlas of humans", Science 376, eabl4896, 2022, doi: 10.1126/science.abl4896