Aniket Mishra | Cluster Analysis for Concept Drift Detection in Malware
Jonathan Jiang | Multimodal Techniques for Malware Classification
Grace Li | The Art of Detecting AI-Generated Art |
This research addresses concept drift in malware detection,
that is, gradual or sudden changes in malware properties that lower detection accuracy. We propose and analyze a clustering-based approach to detect and adapt to concept drift. Using a subset of the KronoDroid dataset, we segment the data into temporal batches and analyze each batch with MiniBatch K-Means clustering. The silhouette coefficient is used to evaluate clustering quality and to identify drift by detecting significant changes in cluster patterns. We experiment with three scenarios: static models, periodic retraining, and drift-aware retraining.
In each case, we consider four supervised classifiers, namely,
Linear SVM, Random Forest, MLP neural networks, and XGBoost.
Experimental results show that drift-aware retraining guided by silhouette-score thresholds improves classification accuracy compared with static models or periodic retraining. This
provides strong evidence that our clustering-based approach is effective at detecting
concept drift, and it points to an automated path to improved malware detection.
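The following is a minimal sketch of the silhouette-guided drift detection and retraining loop described above, assuming scikit-learn and feature matrices already extracted from the KronoDroid temporal batches; the cluster count, the change threshold, and the train_fn helper are hypothetical placeholders rather than the exact settings used in the study.

    # Sketch: silhouette-based drift detection over temporal batches (assumptions above).
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.metrics import silhouette_score

    def batch_silhouette(X, n_clusters=2, seed=0):
        """Cluster one temporal batch and return its silhouette coefficient."""
        labels = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed).fit_predict(X)
        return silhouette_score(X, labels)

    def drift_aware_retraining(batches, train_fn, threshold=0.1):
        """Retrain only when the silhouette coefficient shifts by more than `threshold`.

        `batches` is a list of (X, y) arrays in temporal order; `train_fn(X, y)`
        returns a fitted classifier (e.g., linear SVM, Random Forest, MLP, XGBoost).
        """
        X0, y0 = batches[0]
        model = train_fn(X0, y0)                 # initial model, as in the static scenario
        prev_sil = batch_silhouette(X0)
        for X, y in batches[1:]:
            sil = batch_silhouette(X)
            if abs(sil - prev_sil) > threshold:  # significant change in cluster patterns
                model = train_fn(X, y)           # drift detected: retrain on the new batch
                prev_sil = sil
            yield model, sil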
Malware continues to be a significant threat to computer systems and networks.
This research utilizes structured information from PE files and
employs a multimodal machine learning approach to differentiate between
malware types. The proposed multimodal approach considers a variety of features
derived from PE headers and the malware body. We then train
several types of learning models independently on header features and body features,
and we combine the outputs of these models to obtain multimodal models.
We compare these multimodal models to models trained only on the PE header
and models trained only on the body, and we also compare them to models
trained on the entire file. We consider SVM, LSTM, and CNN models,
and combinations thereof in the multimodal cases. We find that the multimodal
approach yields a slight, but meaningful, improvement in accuracy.
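Below is a minimal late-fusion sketch of the header/body combination described above, assuming scikit-learn and pre-extracted PE-header and body feature matrices; the SVM base models and probability averaging are illustrative stand-ins, since the exact fusion rule and feature extraction are not specified here.

    # Sketch: train header and body models independently, then fuse their outputs.
    from sklearn.svm import SVC

    def train_multimodal(X_header, X_body, y):
        """Fit one model on PE-header features and one on body features."""
        header_model = SVC(probability=True).fit(X_header, y)
        body_model = SVC(probability=True).fit(X_body, y)
        return header_model, body_model

    def predict_multimodal(models, X_header, X_body):
        """Combine the two models by averaging their class probabilities."""
        header_model, body_model = models
        probs = (header_model.predict_proba(X_header)
                 + body_model.predict_proba(X_body)) / 2.0
        return probs.argmax(axis=1)              # fused multimodal prediction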
In this honors project, we first construct a large dataset of
human-generated art and AI-generated art, each comprising samples from
three different styles. We then consider the problem of distinguishing
the human-generated from the AI-generated art, both as a binary classification
problem and as a multiclass problem in which each sample is classified according to its type. We attain
high accuracies using various features and learning techniques. We
consider directions for future work on this research topic.