Stamp's Master's Students' Defenses: Spring 2022

Who	When	Where	Title
Huy Nguyen	May 17 @ 10:00am	~~MH 225~~ Zoom (details below)	Generative adversarial networks for image-based malware classification
Andrew Miller	May 18 @ noon	MH 422	Hidden Markov models with momentum
Katrina Tran	May 19 @ 2:00pm	MH 225	Robustness of image-based malware analysis
Nhien Rust-Nguyen	May 24 @ 11:00am	MH 422	Darknet traffic classification
Xiaoli Tong	May 24 @ noon	MH 225	Concept drift and malware detection

Generative adversarial networks for image-based malware classification

by Huy Nguyen

Malware detection and analysis are important topics in cybersecurity. For efficient malware removal, determination of malware threat levels, and damage estimation, malware family classification plays a critical role. For example, Windows Defender often lists the malware type for detected malicious files so that the victim can assess the damage and search for information and removal tools online. With the rise in computing power and the advent of cloud computing, deep learning models for malware analysis have gained in popularity. In this research, we extract features from malware executable files and represent them as images using various approaches. We then focus on Generative Adversarial Networks (GAN) for multiclass classification and compare our GAN results to other popular machine learning techniques, including Support Vector Machine (SVM), XGBoost, and Restricted Boltzmann Machines (RBM). We also evaluate the utility of the GAN generative models for adversarial attacks on image-based malware detection. We find that the AC-GAN discriminator is competitive with other machine learning techniques.

Zoom details: Join from PC, Mac, Linux, iOS or Android:
https://sjsu.zoom.us/j/84981555463?pwd=chV2-pyqYBSCxwxheEG9YR9m4Fcd25.1
Password: 272138

Hidden Markov models with momentum

by Andrew Miller

Momentum is a popular technique for improving convergence rates during gradient descent. In this research, we experiment with adding momentum to the Baum-Welch expectation-maximization algorithm for training Hidden Markov Models. Discrete Hidden Markov Models with and without momentum are trained on English text and malware opcode data and compared. The performance of momentum is measured by the change in score and classification accuracy per iteration. Experiments indicate that adding momentum to Baum-Welch can reduce the number of iterations required for initial convergence during HMM training, particularly in cases where the model is slow to converge. However, momentum does not appear to improve the final model performance at higher numbers of iterations.
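
The abstract does not say how momentum is combined with the Baum-Welch re-estimates, so the following is only a minimal sketch of one possible formulation. It assumes momentum is applied to a single row of a probability matrix (for example, one row of the transition matrix), with the previous iteration's change carried forward and the result renormalized. The function name `momentum_step`, the parameter `beta`, and the clipping floor are all hypothetical and not taken from the thesis.

```python
def momentum_step(current, reestimated, velocity, beta=0.5, floor=1e-12):
    """One momentum-augmented update for a single probability row.

    current     -- the row before this iteration
    reestimated -- the row proposed by the plain re-estimation step
    velocity    -- the change carried over from the previous iteration
    beta        -- momentum coefficient (assumed; 0 gives the plain update)
    """
    # Change proposed by the plain update in this iteration.
    delta = [r - c for r, c in zip(reestimated, current)]
    # Add a fraction of the previous change to the proposed change.
    new_velocity = [d + beta * v for d, v in zip(delta, velocity)]
    raw = [c + nv for c, nv in zip(current, new_velocity)]
    # Momentum can push entries out of [0, 1], so clip and renormalize
    # to keep the row a valid probability distribution.
    clipped = [max(x, floor) for x in raw]
    total = sum(clipped)
    return [x / total for x in clipped], new_velocity


# Example: the previous step moved in the same direction as the new
# re-estimate, so momentum moves the row further than the plain update.
row, vel = momentum_step([0.5, 0.5], [0.6, 0.4], [0.1, -0.1], beta=0.5)
```

With `beta=0` the function reduces to the plain re-estimate, which makes it easy to compare training with and without momentum under the same loop.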

Robustness of image-based malware analysis

by Katrina Tran

Being able to identify malware is important in preventing attacks. Image-based malware analysis is the study of images that are created from malware. Analyzing these images can help identify patterns in malware families, such as evolutionary changes over time. In previous work, "gist descriptor" features extracted from images have been used in malware classification problems and have shown promising results. In this research, we determine whether gist descriptors are robust with respect to malware obfuscation techniques, as compared to Convolutional Neural Networks (CNN) that are trained directly on malware images. Using the Python Imaging Library (PIL), we create images from malware executables and also from malware that we have obfuscated. We conduct experiments comparing classification of these images with a CNN to classification based on gist descriptor features extracted from the same images. For the gist descriptors, we consider a variety of classification algorithms, including k-nearest neighbors (k-NN), random forest, support vector machine (SVM), and multi-layer perceptron (MLP). We find that gist descriptors are more robust with respect to our obfuscation techniques, as compared to a CNN.
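
As a rough illustration of how an executable can be turned into an image, the sketch below treats each byte as one grayscale pixel and lays the bytes out in fixed-width rows. The row width of 256, the zero padding of the last row, and the helper names `bytes_to_rows` and `rows_to_pil_image` are assumptions made for this example; the abstract does not state the layout actually used.

```python
import math
from typing import List


def bytes_to_rows(data: bytes, width: int = 256) -> List[bytes]:
    """Split raw bytes into fixed-width rows, zero-padding the final row."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Pad with zero bytes so that every row has exactly `width` pixels.
    padded = data.ljust(height * width, b"\x00")
    return [padded[i * width:(i + 1) * width] for i in range(height)]


def rows_to_pil_image(rows: List[bytes]):
    """Build an 8-bit grayscale image from non-empty rows (needs Pillow)."""
    from PIL import Image  # imported lazily; install with `pip install Pillow`
    width, height = len(rows[0]), len(rows)
    return Image.frombytes("L", (width, height), b"".join(rows))


# Example with 10 bytes and a width of 4: three rows, the last one padded.
rows = bytes_to_rows(bytes(range(10)), width=4)
```

For a real file, the bytes would come from something like `open(path, "rb").read()`, and the resulting image could be saved with the image object's `save` method.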

Darknet traffic classification

by Nhien Rust-Nguyen

The anonymous nature of darknets is commonly exploited for illegal activities. Previous research has employed machine learning and deep learning techniques to automate the detection of darknet traffic so that these criminal activities can be blocked. This research aims to improve darknet traffic detection by assessing Support Vector Machines (SVM), Random Forest (RF), Convolutional Neural Networks (CNN), and Auxiliary-Classifier Generative Adversarial Networks (AC-GAN) for classification of network traffic and the underlying application types. We find that our RF model outperforms state-of-the-art machine learning techniques used in prior research with the CIC-Darknet2020 dataset. To evaluate the robustness of our RF classifier, we degrade its performance through an obfuscation scenario where we confuse application types by transforming their traffic features. We demonstrate that our best-performing classifier could be defeated by obfuscation, then show how to defend against such obfuscation.

Concept drift and malware detection

by Xiaoli Tong

In software development, new software is often based on a previous version with some improvements or new features. A similar practice holds for malware writers: hackers tend to add features to existing malware and release revised versions, which can be viewed as belonging to existing malware families. Therefore, a malware family typically evolves over time. In this paper, we build on recent research that has demonstrated that malware evolution can be detected using machine learning techniques. Specifically, we account for concept drift in the context of malware evolution, in the sense that we retrain our models whenever substantial evolution is detected. By accounting for concept drift, we obtain improved results as compared to models that do not consider concept drift.
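
The retrain-on-drift idea can be sketched as a simple loop over time-ordered batches of samples. Everything specific here is an assumption for illustration: the function `run_with_retraining`, the toy `train_mean` and `mean_shift` helpers, the threshold value, and the choice to retrain on only the newest batch are hypothetical and do not describe the models or drift measure used in the thesis.

```python
from statistics import mean


def run_with_retraining(batches, train, drift_score, threshold):
    """Process batches in time order, retraining when drift is detected.

    Returns the final model and the indices of the batches that
    triggered training (the first batch always does).
    """
    model = train(batches[0])
    retrained_at = [0]
    for i, batch in enumerate(batches[1:], start=1):
        # Retrain only when the new batch differs enough from the model.
        if drift_score(model, batch) > threshold:
            model = train(batch)
            retrained_at.append(i)
    return model, retrained_at


# Toy stand-ins: the "model" is a batch mean, and drift is the distance
# between that mean and the mean of the incoming batch.
def train_mean(batch):
    return mean(batch)


def mean_shift(model, batch):
    return abs(mean(batch) - model)


batches = [[1, 1, 1], [1.1, 0.9, 1.0], [5, 5, 5], [5.1, 4.9, 5.0]]
model, retrained_at = run_with_retraining(batches, train_mean, mean_shift, 1.0)
```

In this toy run, only the third batch differs substantially from the current model, so it is the only one after the first that triggers retraining.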