Stamp's Students' Defenses: Fall 2025

Who	When	Where	Title
Christofer Washington Berruz Chungata	December 5 @ 11:00am	ISB 130	Concept Drift Detection and Adaptive Retraining of Malware Classification Models
Jhanvi Lotwala	TBD	TBD	TBD
Nathan Durrant	TBD	TBD	TBD

Concept Drift Detection and Adaptive Retraining of Malware Classification Models

by Christofer Washington Berruz Chungata

Concept drift refers to changes over time in the statistical properties of data, as compared to the data that was used to train a learning model. Machine learning models for malware detection or classification are particularly susceptible to performance degradation caused by concept drift, as attackers constantly modify existing malware. In this paper, we analyze two machine learning-based approaches to automated concept drift detection: a novel approach based on One-Class Support Vector Machines (OCSVM) and a previously studied technique based on Minibatch K-Means (MK-Means). For comparison, we also consider Maximum Mean Discrepancy (MMD), a statistical technique for detecting changes in multidimensional data. We conduct an extensive series of experiments comparing the effectiveness of four learning models, namely, Multilayer Perceptron (MLP), Random Forest (RF), Support Vector Machines (SVM), and eXtreme Gradient Boosting (XGB). For each of these models, we consider three distinct scenarios: a static scenario where no model retraining occurs, a periodic scenario where models are constantly retrained irrespective of concept drift, and a drift-aware scenario where models are retrained only when concept drift is detected. Under the drift-aware scenario, we analyze the tradeoff between accuracy and training efficiency using Pareto Front analysis. We find that all three concept drift detection techniques achieve classification accuracy comparable to periodic retraining, while offering substantially greater efficiency in terms of the number of models that must be retrained. In addition, drift-aware retraining based on our OCSVM technique generally outperforms the MK-Means and MMD approaches. Overall, these results provide strong evidence that we are able to accurately detect concept drift in malware classification models. Furthermore, our concept drift detection techniques are efficient and practical, and the process of updating learning models can easily be fully automated.
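
As a rough illustration of the Pareto Front analysis mentioned in the abstract, the sketch below selects the non-dominated points when trading accuracy against the number of retrainings. It is not the author's implementation: the function names, the configuration labels, and every number are hypothetical placeholders, and the two objectives (maximize accuracy, minimize retraining count) are an assumption drawn from the abstract's wording.

```python
from typing import List, Tuple

# Each candidate is (label, accuracy, retrain_count).
# All values are made-up placeholders, not results from the thesis.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    (higher accuracy, fewer retrainings) and strictly better on one."""
    _, acc_a, cost_a = a
    _, acc_b, cost_b = b
    no_worse = acc_a >= acc_b and cost_a <= cost_b
    strictly_better = acc_a > acc_b or cost_a < cost_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical drift-detector settings with placeholder outcomes.
candidates = [
    ("config_a", 0.90, 12),
    ("config_b", 0.92, 20),
    ("config_c", 0.88, 15),  # dominated by config_a
    ("config_d", 0.92, 25),  # dominated by config_b
    ("config_e", 0.85, 5),
]

front = pareto_front(candidates)
print([label for label, _, _ in front])
```

Each surviving configuration is a tradeoff that cannot be improved on one objective without giving ground on the other, which is what makes the front a useful way to compare drift-aware settings against periodic retraining.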
