Deliverable 2: Classification and Clustering on Heart Data

Description: This deliverable implements classification and clustering techniques on the Heart data from Tabula Sapiens. Logistic Regression and a Support Vector Machine were used for classification, and K-Means and Hierarchical Clustering were run on the Heart cell data to check for groups of similar cells.
Implementation Steps:
Results:
Classification

The Logistic Regression model, trained with 1000 iterations, reached an accuracy of 98.6% on the test data. The Support Vector Machine with a linear kernel reached an accuracy of 98.8%, slightly better than Logistic Regression.

Clustering

K-Means with 6 clusters was trained on the genes of the Heart data. Most of the cells were grouped into different clusters.

K-Means with 6 clusters was also run on the first 10 principal components of the genes of the Heart data. These clusters can be distinguished better than the ones plotted on the original data.

Hierarchical Clustering was done on the Heart cell data with max_distance set to 2000. The clusters are not clearly separated in the plot of the principal components.

The image below shows the dendrogram of the clusters.
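
The classification and clustering steps reported above can be sketched roughly as follows. This is a minimal illustration, not the code used for the deliverable: it assumes scikit-learn and SciPy (the report does not name its libraries), replaces the Tabula Sapiens Heart data with a synthetic cells-by-genes matrix from `make_blobs`, and assumes an 80/20 train/test split, Ward linkage, and fixed random seeds. All variable names are hypothetical. The report's max_distance of 2000 applies to the real data; the sketch uses a smaller value because the synthetic matrix has a different scale.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the expression matrix (rows = cells, columns = genes).
# ASSUMPTION: this is NOT Tabula Sapiens data; y plays the role of cell-type labels.
X, y = make_blobs(n_samples=600, n_features=50, centers=6,
                  cluster_std=2.0, random_state=0)

# --- Classification ---
# ASSUMPTION: 80/20 stratified split; the report does not state its split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
svm = SVC(kernel="linear").fit(X_train, y_train)

log_reg_acc = log_reg.score(X_test, y_test)  # mean accuracy on held-out cells
svm_acc = svm.score(X_test, y_test)

# --- Clustering ---
# K-Means with 6 clusters on the full gene matrix.
kmeans_raw = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# K-Means with 6 clusters on the first 10 principal components.
X_pca = PCA(n_components=10, random_state=0).fit_transform(X)
kmeans_pca = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X_pca)

# Hierarchical clustering, cutting the tree at a maximum linkage distance.
# ASSUMPTION: Ward linkage; threshold scaled down from the report's 2000
# because the synthetic data has a much smaller range.
max_distance = 200.0
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=max_distance, criterion="distance")

print(f"Logistic Regression accuracy: {log_reg_acc:.3f}")
print(f"Linear SVM accuracy:          {svm_acc:.3f}")
print(f"Hierarchical clusters found:  {len(set(hier_labels))}")
```

The linkage matrix `Z` is also the input that `scipy.cluster.hierarchy.dendrogram` expects, so a dendrogram like the one mentioned above could be drawn from it. The accuracies printed by this sketch describe the synthetic data only and are not comparable to the 98.6% and 98.8% reported for the Heart data.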