Huy Nguyen | Generative adversarial networks for image-based malware classification
Andrew Miller | Hidden Markov models with momentum
Katrina Tran | Robustness of image-based malware analysis
Nhien Rust-Nguyen | Darknet traffic classification
Xiaoli Tong | Concept drift and malware detection
Malware detection and analysis are important topics in cybersecurity.
Malware family classification plays a critical role in efficient malware
removal, threat-level determination, and damage estimation. For example,
Windows Defender often lists the malware type for
detected malicious files so that the victim can assess the damage and search for
information and removal tools online. With the rise in computing
power and the advent of cloud computing, deep learning models
for malware analysis have gained in popularity. In this research, we extract
features from malware executable files and represent them as images using
various approaches. We then focus on Generative Adversarial Networks (GAN)
for multiclass classification and compare our GAN results to other popular
machine learning techniques, including Support Vector Machine (SVM),
XGBoost, and Restricted Boltzmann Machines (RBM).
We also evaluate the utility of the GAN's generative models for
adversarial attacks on image-based malware detection. We find that the
AC-GAN discriminator is competitive with other machine learning techniques.
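The image-based representation underlying this work can be sketched in a few lines: the raw bytes of an executable are interpreted as grayscale pixel intensities and reshaped into rows. The sample bytes and the image width below are illustrative assumptions, not the exact preprocessing used in this research.

```python
# Sketch: represent a malware executable's raw bytes as a 2-D grayscale
# "image" (rows of pixel intensities in 0-255). The input bytes and the
# chosen width are toy assumptions for illustration.

def bytes_to_image(data: bytes, width: int = 16):
    """Pad the byte sequence and reshape it into rows of `width` pixels."""
    padded = data + b"\x00" * (-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]

sample = bytes(range(40))            # stand-in for an executable's contents
image = bytes_to_image(sample, width=8)   # 40 bytes -> 5 rows of 8 pixels
```

The resulting 2-D array can then be fed to image classifiers such as the AC-GAN discriminator or a CNN.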
Zoom details:
Join from PC, Mac, Linux, iOS or Android:
https://sjsu.zoom.us/j/84981555463?pwd=chV2-pyqYBSCxwxheEG9YR9m4Fcd25.1
Password: 272138
Momentum is a popular technique for improving convergence rates during
gradient descent. In this research, we experiment with adding momentum to the
Baum-Welch expectation-maximization algorithm for training Hidden Markov Models.
Discrete Hidden Markov Models with and without momentum are trained on English
text and malware opcode data and compared. The performance of momentum is measured
by the change in score and classification accuracy per iteration.
Experiments indicate that adding momentum to Baum-Welch can reduce the number
of iterations required for initial convergence during HMM training,
particularly in cases where the model is slow to converge. However,
momentum does not appear to improve the final model performance at
higher numbers of iterations.
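The momentum idea above can be sketched for a single parameter matrix: after each expectation-maximization iteration, rather than replacing the current transition matrix with the Baum-Welch re-estimate, we take the re-estimation step plus a fraction of the previous step. The blending factor `beta` and the toy matrices are illustrative assumptions, not the values used in this work.

```python
# Sketch of a momentum-style update for HMM parameters during Baum-Welch
# training. `A` is the current transition matrix, `A_new` the Baum-Welch
# re-estimate, and `velocity` the previous step; `beta` is assumed.

def momentum_update(A, A_new, velocity, beta=0.5):
    """Apply one momentum step and re-normalize each row to sum to 1."""
    updated, new_velocity = [], []
    for row, row_new, v_row in zip(A, A_new, velocity):
        # Step = plain Baum-Welch move plus momentum from the last step.
        step = [(an - a) + beta * v for a, an, v in zip(row, row_new, v_row)]
        raw = [max(a + s, 1e-10) for a, s in zip(row, step)]  # keep probabilities positive
        total = sum(raw)
        updated.append([x / total for x in raw])
        new_velocity.append(step)
    return updated, new_velocity

A = [[0.6, 0.4], [0.5, 0.5]]          # current transition estimates
A_new = [[0.7, 0.3], [0.4, 0.6]]      # Baum-Welch re-estimates
velocity = [[0.0, 0.0], [0.0, 0.0]]   # no previous step yet
A, velocity = momentum_update(A, A_new, velocity)
```

With zero initial velocity the first update reduces to the standard Baum-Welch re-estimate; momentum only alters subsequent iterations, which is where the faster initial convergence is observed.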
Being able to identify malware is important in preventing
attacks. Image-based malware analysis is the study of images
that are created from malware. Analyzing these images can help
identify patterns in malware families,
such as evolutionary changes over time.
In previous work, "gist descriptor" features
extracted from images have been used in malware classification
problems and have shown promising results. In this research,
we determine whether gist descriptors are robust with
respect to malware obfuscation techniques, as compared
to Convolutional Neural Networks (CNNs)
that are trained directly on malware images.
Using the Python Imaging Library (PIL), we create images from
malware executables and also from malware that we have obfuscated.
We conduct experiments comparing CNN classification of these images
against classifiers trained on gist descriptor features extracted
from the same images. For the gist descriptors,
we consider a variety of classification algorithms, including k-nearest
neighbors (k-NN), random forest, support vector machine (SVM),
and multi-layer perceptron (MLP). We find that gist descriptors
are more robust with respect to our obfuscation techniques,
as compared to a CNN.
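Once gist descriptors are extracted, the simplest of the classifiers named above, k-nearest neighbors, can be sketched directly. The feature vectors and family labels below are toy assumptions; real gist descriptors are fixed-length vectors extracted from the malware images.

```python
# Sketch: k-nearest-neighbor classification of fixed-length feature
# vectors (e.g., gist descriptors). Training data and labels are toy
# assumptions for illustration.

def knn_predict(train, labels, x, k=3):
    """Majority vote among the k training vectors closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, x)), lab)
        for vec, lab in zip(train, labels)
    )
    votes = [lab for _, lab in dists[:k]]
    return max(set(votes), key=votes.count)

train = [[0.1, 0.2], [0.0, 0.1], [0.9, 0.8], [1.0, 0.9]]  # toy descriptors
labels = ["familyA", "familyA", "familyB", "familyB"]      # malware families
```

A query vector near the first cluster, such as `[0.05, 0.15]`, is assigned to familyA by majority vote.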
The anonymous nature of darknets is commonly exploited for illegal activities.
Previous research has employed machine learning and deep learning techniques to
automate the detection of darknet traffic to block these criminal activities.
This research aims to improve darknet traffic detection by assessing
Support Vector Machines (SVM), Random Forest (RF), Convolutional Neural Networks (CNN),
and Auxiliary-Classifier Generative Adversarial Networks (AC-GAN) for classification
of network traffic and the underlying application types. We find that our RF model
outperforms state-of-the-art machine learning techniques used in prior research
with the CIC-Darknet2020 dataset. To evaluate the robustness of our RF classifier,
we degrade its performance through an obfuscation scenario where we confuse
application types by transforming their traffic features. We demonstrate that our
best-performing classifier could be defeated by obfuscation, then show how to
defend against such obfuscation.
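The obfuscation scenario described above can be sketched as a feature transformation: one application type's traffic features are shifted so they resemble another type's. The feature layout and target statistics below are illustrative assumptions, not the actual CIC-Darknet2020 features.

```python
# Sketch of feature-level obfuscation: shift one feature column of a set
# of traffic samples so its mean matches that of a target application
# type, confusing a classifier trained on the original distribution.
# Sample values and the target mean are toy assumptions.

def obfuscate(samples, feature_idx, target_mean):
    """Shift feature `feature_idx` so its mean matches `target_mean`."""
    current_mean = sum(s[feature_idx] for s in samples) / len(samples)
    delta = target_mean - current_mean
    return [s[:feature_idx] + [s[feature_idx] + delta] + s[feature_idx + 1:]
            for s in samples]

traffic = [[1.0, 10.0], [1.0, 20.0]]   # toy (duration, packet-size) samples
disguised = obfuscate(traffic, 1, 100.0)
```

A defense along the lines evaluated here would retrain the classifier on a mixture of clean and transformed samples so the shifted distribution is seen at training time.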
In software development, new software is often based on a previous version
with some improvements or new features. A similar software development
practice holds true for malware writers, that is, hackers tend to add
features to existing malware and release revised versions, which can
be viewed as belonging to existing malware families. Therefore, a malware
family typically evolves over time. In this paper, we build on
recent research that has demonstrated that malware evolution can be
detected using machine learning techniques. Specifically, we account for
concept drift in the context of malware evolution, in the sense that we
retrain our models whenever substantial evolution is detected.
By accounting for concept drift, we obtain improved results
as compared to models that do not consider concept drift.
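The retrain-on-drift strategy can be sketched as monitoring a rolling window of classification outcomes and flagging a retrain whenever windowed accuracy drops below a threshold. The window size and threshold below are illustrative assumptions, not the drift-detection criterion used in this work.

```python
# Sketch of drift-aware retraining: scan a stream of per-sample outcomes
# (1 = classified correctly, 0 = misclassified) and report the positions
# where rolling accuracy falls below a threshold, signalling that the
# model should be retrained. Window size and threshold are assumed.

def drift_monitor(outcomes, window=5, threshold=0.6):
    """Return stream positions where accuracy over the last `window`
    outcomes drops below `threshold`."""
    retrain_points = []
    for i in range(window, len(outcomes) + 1):
        acc = sum(outcomes[i - window:i]) / window
        if acc < threshold:
            retrain_points.append(i)
    return retrain_points

# A run of misclassifications mid-stream triggers retraining signals.
stream = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
signals = drift_monitor(stream)
```

In the full pipeline, each flagged position would trigger retraining on recent samples, after which monitoring resumes with the updated model.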