Naive Bayes Classifier For Email Spam Classification

Aim
To implement a Naive Bayes classifier from scratch to classify spam emails.

Introduction
In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. In simple terms, a naive Bayes classifier assumes that, given the class, the presence (or absence) of a particular feature is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.

Description
My primary task was to understand the Naive Bayes classifier and apply it to spam email classification. The source code is available on GitHub or as a ZIP download.

Spam emails were manually pulled from a spam inbox and pre-classified into 3 classes:
1) I [Internet Advertising]
2) M [Medical Traps]
3) P [Phishing]

We make 2 datasets from the given set of emails:
1) Training Data
2) Test Data

On the Training Data, we apply the Naive Bayes classifier, implemented in Python, to build the classifier model. This model is then applied to the Test Data to predict the class of every spam email. Finally, we find the accuracy by comparing the actual classes with the predicted classes.
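The train, predict, and measure-accuracy workflow described above can be illustrated with a minimal Python sketch. This is not the project's actual source code: the class name NaiveBayes, the helper functions, the whitespace tokenizer, the choice of a multinomial model with add-one (Laplace) smoothing, and the six toy emails are all assumptions made for illustration. Only the class labels I, M, and P come from the description.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # emails seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Score = log P(class) + sum over words of log P(word | class).
            # Summing logs relies on the naive independence assumption.
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, documents, labels):
    """Fraction of emails whose predicted class equals the actual class."""
    correct = sum(model.predict(d) == y for d, y in zip(documents, labels))
    return correct / len(labels)


# Toy data invented for this sketch; the real emails are not reproduced here.
train_docs = [
    "cheap banner ads boost your website traffic",
    "discount pills no prescription needed",
    "verify your bank account password now",
]
train_labels = ["I", "M", "P"]

test_docs = [
    "buy pills without prescription",
    "confirm your account password",
    "ads for website traffic",
]
test_labels = ["M", "P", "I"]

model = NaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
print(predictions, accuracy(model, test_docs, test_labels))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word that never appeared in a class during training from forcing that class's probability to zero.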
Data Preprocessing and Data Format: