|Dennis Dang||Malware classification using LSTMs|
|Jing Zhao||Malware classification with GMM-HMMs|
|Jason Do||A NEAT approach to malware classification|
Signature-based and anomaly-based detection have long been quintessential techniques in malware detection. However, these techniques have become increasingly ineffective as malware becomes more complex. Researchers have therefore turned to deep learning to construct better-performing models. In this project, we create four different long short-term memory (LSTM) models and train each model to classify malware by family type. Our data consists of opcodes extracted from malware executables. We employ techniques used in natural language processing (NLP), such as word embedding and bidirectional LSTMs (biLSTM). We also use convolutional neural networks (CNN). Our model consisting of word embedding, biLSTM, and CNN layers performed the best in classifying malware.
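As a rough illustration of how opcode sequences might be prepared for a word-embedding layer, the sketch below maps opcodes to integer ids and pads each sequence to a fixed length. The abstract does not describe the preprocessing, so the helper names (`build_vocab`, `encode`), the reserved ids, the vocabulary size, and the toy opcode lists are all assumptions made for illustration.

```python
from collections import Counter

PAD, UNK = 0, 1  # assumed reserved ids: padding and out-of-vocabulary opcodes


def build_vocab(sequences, max_size=30):
    """Map the most frequent opcodes to integer ids starting at 2."""
    counts = Counter(op for seq in sequences for op in seq)
    # Sort by descending count, then alphabetically, so ids are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {op: i + 2 for i, (op, _) in enumerate(ranked[:max_size])}


def encode(seq, vocab, length):
    """Convert opcodes to ids, then truncate or right-pad to a fixed length."""
    ids = [vocab.get(op, UNK) for op in seq[:length]]
    return ids + [PAD] * (length - len(ids))


# Toy opcode sequences (hypothetical, not taken from the project's data).
samples = [
    ["push", "mov", "call", "mov", "ret"],
    ["mov", "add", "jmp"],
]
vocab = build_vocab(samples)
batch = [encode(s, vocab, length=4) for s in samples]
```

Each row of `batch` has the same length, which is the shape an embedding layer followed by biLSTM or CNN layers would typically expect as input.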
Discrete hidden Markov models (HMM) are often applied to the malware detection and classification problems. However, the continuous analog of discrete HMMs, that is, Gaussian mixture model-HMMs (GMM-HMM), are rarely considered in the field of cybersecurity. In this study, we apply GMM-HMMs to the malware classification problem and we compare our results to those obtained using discrete HMMs. As features, we consider opcode sequences and entropy-based sequences. For our opcode features, GMM-HMMs produce results that are comparable to those obtained using discrete HMMs, whereas for our entropy-based features, GMM-HMMs generally improve on the classification results that we can attain with discrete HMMs.
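One plausible way to build an entropy-based sequence is to compute Shannon entropy over sliding windows of a file's bytes. The abstract does not specify how its entropy features are derived, so the window size, step, and function names below are assumptions, and the sketch is only meant to show why such a feature is continuous-valued.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy, in bits per byte, of a bytes-like window."""
    if not window:
        return 0.0
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def entropy_sequence(data, window=256, step=128):
    """Entropy of each sliding window; values lie in [0, 8] for byte data."""
    last_start = max(len(data) - window, 0)
    return [
        shannon_entropy(data[i:i + window])
        for i in range(0, last_start + 1, step)
    ]


# Toy input: a constant region followed by a region with every byte value once.
data = bytes(256) + bytes(range(256))
seq = entropy_sequence(data, window=256, step=256)
```

On this toy input the constant window has entropy 0 bits and the uniform window has entropy 8 bits. Real files would yield values in between, which is the kind of real-valued observation that a continuous-emission model such as a GMM-HMM can consume directly, whereas a discrete HMM would need the values binned first.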
Current malware detection software often relies on machine learning, which is seen as an improvement over signature-based techniques. Problems with a machine-learning-based approach can arise when malware writers modify their code with the intent to evade detection. This leads to a cat-and-mouse situation where new models must constantly be trained to detect new malware variants. In this research, we experiment with genetic algorithms as a means of evolving machine learning models to detect malware. Genetic algorithms, which simulate natural selection, provide a way for models to adapt to continuous changes in malware families, and thereby improve detection rates. Specifically, we use the Neuro-Evolution of Augmenting Topologies (NEAT) algorithm to optimize machine learning classifiers based on decision trees and neural networks. We compare the performance of our NEAT approach to standard models, including random forest and support vector machines.
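The selection, crossover, and mutation loop of a genetic algorithm can be sketched as follows. This is a generic genetic algorithm that evolves the weights of a fixed linear classifier on toy data. It is not NEAT, which also evolves network topology, and it is not the authors' implementation. The population size, mutation scale, and toy dataset are assumptions.

```python
import random


def accuracy(weights, data):
    """Fraction of (features, label) pairs a linear threshold classifier gets right."""
    correct = 0
    for x, label in data:
        score = sum(w * xi for w, xi in zip(weights, x))
        correct += int((score > 0) == label)
    return correct / len(data)


def evolve(data, n_weights, pop_size=20, generations=30, seed=0):
    """Evolve weight vectors (n_weights >= 2) with elitism, crossover, mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda w: accuracy(w, data), reverse=True)
        history.append(accuracy(pop[0], data))
        parents = pop[: pop_size // 2]         # selection: keep the fitter half
        children = [pop[0][:]]                 # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([w + rng.gauss(0, 0.1) for w in child])  # mutation
        pop = children
    best = max(pop, key=lambda w: accuracy(w, data))
    return best, history + [accuracy(best, data)]


# Toy data (hypothetical): two features plus a bias term; label is x1 > x2.
toy_rng = random.Random(1)
toy = []
for _ in range(40):
    x1, x2 = toy_rng.random(), toy_rng.random()
    toy.append(((x1, x2, 1.0), x1 > x2))

best, history = evolve(toy, n_weights=3)
```

Because the fittest individual is copied unchanged into each new generation, the best fitness in `history` never decreases. Rerunning `evolve` on fresh data is one simple way a population could be adapted as the samples change.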
Password (encrypted with a Caesar's cipher): 377355
Password (encrypted with a Caesar's cipher): 878354
Password (encrypted with a Caesar's cipher): 607249