Stamp's Students' Defenses: Fall 2025






Who | When | Where | Title
Christofer Washington Berruz Chungata | December 5 @ 11:00am | ISB 130 | Concept Drift Detection and Adaptive Retraining of Malware Classification Models
Jhanvi Lotwala | TBD | TBD | TBD
Nathan Durrant | TBD | TBD | TBD






Concept Drift Detection and Adaptive Retraining of Malware Classification Models

by Christofer Washington Berruz Chungata

Concept drift refers to changes over time in the statistical properties of data, as compared to the data that was used to train a learning model. Machine learning models for malware detection or classification are particularly susceptible to performance degradation caused by concept drift, as attackers constantly modify existing malware. In this paper, we analyze two machine learning-based approaches to automated concept drift detection: a novel approach based on One-Class Support Vector Machines (OCSVM) and a previously studied technique based on Minibatch K-Means (MK-Means). For comparison, we also consider Maximum Mean Discrepancy (MMD), a statistical technique for detecting changes in multidimensional data. We conduct an extensive series of experiments comparing the effectiveness of four learning models, namely, Multilayer Perceptron (MLP), Random Forest (RF), Support Vector Machines (SVM), and eXtreme Gradient Boosting (XGB). For each of these models, we consider three distinct scenarios: a static scenario in which no model retraining occurs, a periodic scenario in which models are retrained on a regular schedule irrespective of concept drift, and a drift-aware scenario in which models are retrained only when concept drift is detected. Under the drift-aware scenario, we analyze the tradeoff between accuracy and training efficiency using Pareto Front analysis. We find that drift-aware retraining with any of the three concept drift detection techniques achieves classification accuracy comparable to periodic retraining, while requiring substantially fewer models to be retrained. In addition, drift-aware retraining based on our OCSVM technique generally outperforms the MK-Means and MMD approaches. Overall, these results provide strong evidence that we can accurately detect concept drift in malware classification models. Furthermore, our concept drift detection techniques are efficient and practical, and the process of updating learning models can easily be fully automated.
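
As an illustration of the accuracy-versus-retraining tradeoff mentioned in the abstract, the following is a minimal sketch of how a Pareto front over two objectives might be computed. It is not the author's implementation: the `Result` type, the configuration names, and all numbers are hypothetical and chosen only to show the idea. A configuration is kept if no other configuration is at least as good on both objectives (fewer retrained models, higher accuracy) and strictly better on one.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str        # hypothetical detector configuration label
    retrains: int    # number of models retrained (lower is better)
    accuracy: float  # classification accuracy (higher is better)


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.retrains <= b.retrains and a.accuracy >= b.accuracy
    strictly_better = a.retrains < b.retrains or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(results: list[Result]) -> list[Result]:
    """Return the non-dominated results, sorted by retraining cost."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.retrains)


# Made-up numbers for illustration only; these are not results from the thesis.
candidates = [
    Result("config_a", retrains=2, accuracy=0.80),
    Result("config_b", retrains=5, accuracy=0.90),
    Result("config_c", retrains=6, accuracy=0.88),   # dominated by config_b
    Result("config_d", retrains=12, accuracy=0.93),
    Result("config_e", retrains=12, accuracy=0.91),  # dominated by config_d
]

for r in pareto_front(candidates):
    print(r.name, r.retrains, r.accuracy)
```

With these made-up inputs, the front keeps `config_a`, `config_b`, and `config_d`. Each one trades additional retraining for higher accuracy, and none is beaten on both objectives at once.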




TBD

by Jhanvi Lotwala

TBD




TBD

by Nathan Durrant

TBD