Deliverable 2: Classification and Clustering on Heart Data
Description: This deliverable implements classification and clustering techniques on heart data from Tabula Sapiens. Logistic Regression and Support Vector Machine classifiers were trained on the heart cell data, and K-Means and Hierarchical Clustering were applied to check for groups of similar cells.
Implementation Steps:
Results:
Classification
The Logistic Regression model, trained with 1000 iterations, reached an accuracy of 98.6% on the test data.
The Support Vector Machine with a linear kernel reached an accuracy of 98.8%, slightly better than Logistic Regression.
Clustering
K-Means with 6 clusters was trained on the genes of the heart data. Most of the cells were grouped into different clusters.
K-Means with 6 clusters was also run on the first 10 principal components of the genes of the heart data. These clusters can be distinguished better than the ones plotted on the actual data.
Hierarchical Clustering was done on the heart cell data with max_distance set to 2000. The clusters are not very clear from the plotted principal components. The image below shows the dendrogram of the clusters.
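The steps reported above could look roughly like the following sketch. It is not the deliverable's actual notebook. It assumes scikit-learn and SciPy were used, substitutes a synthetic cells-by-genes matrix from make_blobs for the Tabula Sapiens heart data, assumes "1000 iterations" refers to max_iter=1000, and assumes max_distance is the distance threshold passed to SciPy's fcluster. The Ward linkage, the 80/20 split, and all variable names are illustrative choices, not taken from the source.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the cells-by-genes matrix (NOT the Tabula Sapiens data):
# 600 "cells", 50 "genes", 6 underlying groups used as class labels.
X, y = make_blobs(n_samples=600, n_features=50, centers=6, random_state=0)

# --- Classification ---
# Assumed 80/20 train/test split; the source does not state the split used.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Assumption: "1000 iterations" corresponds to the solver's iteration cap.
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
svm = SVC(kernel="linear").fit(X_train, y_train)

log_reg_acc = log_reg.score(X_test, y_test)  # mean accuracy on held-out data
svm_acc = svm.score(X_test, y_test)
print("Logistic Regression accuracy:", log_reg_acc)
print("Linear SVM accuracy:", svm_acc)

# --- Clustering ---
# K-Means with 6 clusters on the raw features.
kmeans_raw_labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# K-Means with 6 clusters on the first 10 principal components.
X_pca = PCA(n_components=10, random_state=0).fit_transform(X)
kmeans_pca_labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X_pca)

# Hierarchical clustering: build the linkage tree (Ward linkage is an assumption),
# then cut it at a distance threshold. 2000 is the value quoted in the text; how
# many clusters it yields depends entirely on the scale of the data.
Z = linkage(X, method="ward")
max_distance = 2000
hier_labels = fcluster(Z, t=max_distance, criterion="distance")
print("Hierarchical clusters at threshold:", len(np.unique(hier_labels)))
```

The linkage matrix Z is also what scipy.cluster.hierarchy.dendrogram takes as input to draw a dendrogram. On the synthetic data, a threshold of 2000 will probably merge everything into very few clusters, so the threshold should be chosen relative to the distances in the real expression matrix.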