Stamp's Master's Students' Defenses: Spring 2022






Who               | When             | Where                       | Title
------------------|------------------|-----------------------------|------
Huy Nguyen        | May 17 @ 10:00am | MH 225 Zoom (details below) | Generative adversarial networks for image-based malware classification
Andrew Miller     | May 18 @ noon    | MH 422                      | Hidden Markov models with momentum
Katrina Tran      | May 19 @ 2:00pm  | MH 225                      | Robustness of image-based malware analysis
Nhien Rust-Nguyen | May 24 @ 11:00am | MH 422                      | Darknet traffic classification
Xiaoli Tong       | May 24 @ noon    | MH 225                      | Concept drift and malware detection






Generative adversarial networks for image-based malware classification

by Huy Nguyen

Malware detection and analysis are important topics in cybersecurity. For efficient malware removal, determination of malware threat levels, and damage estimation, malware family classification plays a critical role. For example, Windows Defender often lists the malware type for detected malicious files so that the victim can assess the damage and search for information and removal tools online. With the rise in computing power and the advent of cloud computing, deep learning models for malware analysis have gained in popularity. In this research, we extract features from malware executable files and represent them as images using various approaches. We then focus on Generative Adversarial Networks (GAN) for multiclass classification and compare our GAN results to those of other popular machine learning techniques, including Support Vector Machine (SVM), XGBoost, and Restricted Boltzmann Machines (RBM). We also evaluate the utility of the GAN generative models for adversarial attacks on image-based malware detection. We find that the AC-GAN discriminator is competitive with other machine learning techniques.

Zoom details: Join from PC, Mac, Linux, iOS or Android:
https://sjsu.zoom.us/j/84981555463?pwd=chV2-pyqYBSCxwxheEG9YR9m4Fcd25.1
Password: 272138




Hidden Markov models with momentum

by Andrew Miller

Momentum is a popular technique for improving convergence rates during gradient descent. In this research, we experiment with adding momentum to the Baum-Welch expectation-maximization algorithm for training Hidden Markov Models. Discrete Hidden Markov Models with and without momentum are trained on English text and malware opcode data and compared. The performance of momentum is measured by the change in score and classification accuracy per iteration. Experiments indicate that adding momentum to Baum-Welch can reduce the number of iterations required for initial convergence during HMM training, particularly in cases where the model is slow to converge. However, momentum does not appear to improve the final model performance at higher numbers of iterations.
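
The abstract does not say how the momentum term enters the Baum-Welch update, so the following is only an illustrative sketch of one plausible formulation, not the method used in the thesis. It assumes the momentum is applied to a row-stochastic matrix (such as the HMM transition matrix): the change proposed by the ordinary Baum-Welch re-estimate is blended with the previous change, and each row is then clipped and renormalized so that it remains a probability distribution. The names momentum_step, mu, and eps are hypothetical.

```python
def momentum_step(old, reestimated, velocity, mu=0.5, eps=1e-8):
    """Blend a Baum-Welch re-estimate with the previous update direction.

    old, reestimated, velocity: lists of rows of equal shape.
    mu: momentum coefficient (mu = 0 gives the plain Baum-Welch update).
    Returns (updated_matrix, new_velocity).
    This is an assumed formulation, not taken from the thesis.
    """
    # New velocity: decayed previous change plus the change Baum-Welch proposes.
    new_velocity = [
        [mu * v + (r - o) for o, r, v in zip(old_row, re_row, vel_row)]
        for old_row, re_row, vel_row in zip(old, reestimated, velocity)
    ]
    updated = []
    for old_row, vel_row in zip(old, new_velocity):
        # Clip to a small positive value, then renormalize the row to sum to 1.
        row = [max(o + v, eps) for o, v in zip(old_row, vel_row)]
        total = sum(row)
        updated.append([x / total for x in row])
    return updated, new_velocity
```

With a zero initial velocity, the first call simply returns the Baum-Welch re-estimate; on later calls the update overshoots in the direction of recent changes, which is the intuition behind faster initial convergence.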




Robustness of image-based malware analysis

by Katrina Tran

Being able to identify malware is important in preventing attacks. Image-based malware analysis is the study of images that are created from malware. Analyzing these images can help identify patterns in malware families, such as evolutionary changes over time. In previous work, "gist descriptor" features extracted from images have been used in malware classification problems and have shown promising results. In this research, we determine whether gist descriptors are robust with respect to malware obfuscation techniques, as compared to Convolutional Neural Networks (CNN) that are trained directly on malware images. Using the Python Imaging Library (PIL), we create images from malware executables and also from malware that we have obfuscated. We conduct experiments to compare classifying these images directly with a CNN against classifying them based on extracted gist descriptor features. For the gist descriptors, we consider a variety of classification algorithms, including k-nearest neighbors (k-NN), random forest, support vector machine (SVM), and multi-layer perceptron (MLP). We find that gist descriptors are more robust with respect to our obfuscation techniques, as compared to a CNN.
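
As a rough illustration of how an executable can be turned into an image, the sketch below treats each byte of the file as one 8-bit grayscale pixel and lays the bytes out row by row. The fixed width of 256, the zero padding of the final row, and the function names are assumptions made for this example; the abstract does not state which layout was used.

```python
from __future__ import annotations

import math


def bytes_to_pixels(data: bytes, width: int = 256) -> tuple[bytes, int]:
    """Zero-pad raw bytes so they fill a width x height grid; return (pixels, height)."""
    if width <= 0:
        raise ValueError("width must be positive")
    # At least one row, even for an empty input.
    height = max(1, math.ceil(len(data) / width))
    padded = data.ljust(width * height, b"\x00")
    return padded, height


def to_pil_image(data: bytes, width: int = 256):
    """Build an 8-bit grayscale image from raw bytes (requires Pillow)."""
    from PIL import Image  # imported lazily so bytes_to_pixels works without Pillow

    padded, height = bytes_to_pixels(data, width)
    return Image.frombytes("L", (width, height), padded)
```

For example, reading a file with open(path, "rb").read() and passing the result to to_pil_image would give an image that could be saved or fed to a feature extractor.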




Darknet traffic classification

by Nhien Rust-Nguyen

The anonymous nature of darknets is commonly exploited for illegal activities. Previous research has employed machine learning and deep learning techniques to automate the detection of darknet traffic to block these criminal activities. This research aims to improve darknet traffic detection by assessing Support Vector Machines (SVM), Random Forest (RF), Convolutional Neural Networks (CNN), and Auxiliary-Classifier Generative Adversarial Networks (AC-GAN) for classification of network traffic and the underlying application types. We find that our RF model outperforms state-of-the-art machine learning techniques used in prior research with the CIC-Darknet2020 dataset. To evaluate the robustness of our RF classifier, we degrade its performance through an obfuscation scenario where we confuse application types by transforming their traffic features. We demonstrate that our best-performing classifier could be defeated by obfuscation, then show how to defend against such obfuscation.




Concept drift and malware detection

by Xiaoli Tong

In software development, new software is often based on a previous version with some improvements or new features. A similar practice holds for malware writers: they tend to add features to existing malware and release revised versions, which can be viewed as belonging to existing malware families. Therefore, a malware family typically evolves over time. In this paper, we build on recent research that has demonstrated that malware evolution can be detected using machine learning techniques. Specifically, we account for concept drift in the context of malware evolution, in the sense that we retrain our models whenever substantial evolution is detected. By accounting for concept drift, we obtain improved results as compared to models that do not consider concept drift.
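
The retrain-on-drift idea can be sketched as a simple loop over time-ordered windows of samples. Everything specific here is an assumption for illustration: the abstract does not describe how evolution is detected, so the drift score is left as a caller-supplied function, and the toy "model" below is just the mean of a window of numbers rather than a real malware classifier.

```python
from statistics import mean
from typing import Callable, Iterable, List, TypeVar

M = TypeVar("M")  # model type
W = TypeVar("W")  # window-of-samples type


def retrain_on_drift(
    windows: Iterable[W],
    train: Callable[[W], M],
    drift_score: Callable[[M, W], float],
    threshold: float,
) -> List[int]:
    """Return the indices of the windows at which the model was (re)trained."""
    retrained_at: List[int] = []
    model = None
    for i, window in enumerate(windows):
        # Train on the first window, then only when drift exceeds the threshold.
        if model is None or drift_score(model, window) > threshold:
            model = train(window)
            retrained_at.append(i)
    return retrained_at


# Toy stand-ins (hypothetical): the "model" is a window mean, and the
# drift score is how far the new window's mean has moved from it.
def mean_model(window: List[float]) -> float:
    return mean(window)


def mean_shift(model: float, window: List[float]) -> float:
    return abs(mean(window) - model)
```

A model that ignores concept drift corresponds to training once on the first window and never again; the loop above differs only in that it retrains whenever the score crosses the threshold.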