Stamp's Master's Students' Defenses: Spring 2021

Who	When	Where	Title
Lolitha Sresta Tupadha	May 5 @ 2:00pm	Zoom	Machine learning to detect malware evolution
Han-Chih Chang	May 13 @ 11:00am	Zoom	Keystroke dynamics based on machine learning
Jianwei Li	May 17 @ 3:00pm	Zoom	Keystroke dynamics for user authentication with fixed and free text
Rakesh Nagaraju	May 14 @ noon	Zoom	Malware analysis with auxiliary-classifier GAN
Xinxin Yang	May 7 @ 1:00pm	Zoom	Computer-aided diagnosis of low grade endometrial stromal sarcoma (LGESS)
Ruchira Gothankar	May 11 @ 11:00am	Zoom	Clickbait detection in YouTube videos
Shamli Singh	May 14 @ 1:00pm	Zoom	Hidden Markov model-based clustering for malware classification
Aditi Walia	May 17 @ noon	Zoom	Data augmentation of malware images
Deanne Charan	May 20 @ 2:00pm	Zoom	Classic cryptanalysis with GANs

Machine learning to detect malware evolution

by Lolitha Sresta Tupadha

Malware evolves over time, and antivirus software must adapt to such evolution. Hence, it is critical to detect those points in time where malware has evolved so that appropriate countermeasures can be undertaken. In this research, we perform a variety of experiments on a large number of malware families to determine when malware evolution is likely to have occurred. All of the evolution detection techniques that we consider are based on machine learning and can be fully automated; in particular, no reverse engineering or other labor-intensive manual analysis is required. Specifically, we consider analysis based on hidden Markov models and various word embedding techniques, including Word2Vec and HMM2Vec.
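
HMM2Vec, as we understand it, derives word embeddings from a trained hidden Markov model. With N hidden states and M observation symbols (here, opcodes), the emission matrix B is N x M, and the embedding of symbol j is column j of B, that is, the probability of emitting that symbol from each hidden state. The sketch below assumes B has already been trained and is given as a list of rows; the function name and the three-opcode vocabulary are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def hmm2vec(B: Matrix) -> List[List[float]]:
    """Return one embedding per observation symbol: column j of the N x M emission matrix B."""
    # zip(*B) walks the columns of B; each column has one entry per hidden state.
    return [list(column) for column in zip(*B)]


# Hypothetical 2-state HMM trained on a 3-opcode vocabulary.
opcodes = ["mov", "push", "call"]
B = [[0.5, 0.3, 0.2],
     [0.1, 0.1, 0.8]]
embeddings: Dict[str, List[float]] = dict(zip(opcodes, hmm2vec(B)))
print(embeddings["call"])  # [0.2, 0.8]
```

Each opcode vector has length N, so the embedding dimension is set by the number of hidden states chosen when training the HMM.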

Keystroke dynamics based on machine learning

by Han-Chih Chang

The development of active and passive biometric authentication and identification technology plays an increasingly important role in cybersecurity. Biometrics that utilize features derived from keystroke dynamics have been studied in this context. Keystroke dynamics can be used to analyze the way that a user types by monitoring keyboard input. Previous work has considered the feasibility of user authentication and classification based on keystroke dynamics. In this research, we analyze a wide variety of machine learning and deep learning models based on keystroke-derived features, we optimize the resulting models, and we compare our results to those obtained in related research. We find that a model combining a convolutional neural network (CNN) and a gated recurrent unit (GRU) performs best in our experiments. This model also outperforms previous results in this field.
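
Keystroke-derived features typically start from key press and release timestamps. The abstract does not list the exact features used, so this is only an illustrative sketch of the two most common timing features: dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). The event format and function name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical event format: (key, press_time_ms, release_time_ms), in typing order.
KeyEvent = Tuple[str, float, float]


def keystroke_features(events: List[KeyEvent]) -> List[float]:
    """Return dwell times followed by up-down flight times for one typed sequence."""
    # Dwell time: release minus press of the same key.
    dwell = [release - press for _, press, release in events]
    # Flight time: next key's press minus this key's release (negative if keys overlap).
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell + flight


sample = [("t", 0.0, 95.0), ("h", 130.0, 210.0), ("e", 260.0, 340.0)]
print(keystroke_features(sample))  # [95.0, 80.0, 80.0, 35.0, 50.0]
```

For fixed text, vectors like this have the same length for every sample, which makes them directly usable by classic classifiers.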

Keystroke dynamics for user authentication with fixed and free text

by Jianwei Li

In this research, we consider the problem of verifying user identity based on keystroke dynamics obtained from fixed text or free text. For fixed-text typing behavior, multiple machine learning and deep learning methods are employed, with XGBoost and multilayer perceptron (MLP) models performing best. For free-text typing, we employ a novel feature engineering method that generates image-like transition matrices. For these image-like features, a convolutional neural network (CNN) with cutout achieves results that are competitive with previous work. A hybrid model consisting of a CNN and a recurrent neural network (RNN) is shown to outperform previous research for free-text typing.
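
The abstract does not describe how the transition matrices are built, so the following is only one plausible construction under assumed names and an assumed event format. Each cell (i, j) of a 26 x 26 grid holds the average flight time observed when key j follows key i, which yields a fixed-size, image-like input for a CNN no matter how much free text was typed.

```python
import string
from typing import List, Tuple

# Hypothetical event format: (key, press_time_ms, release_time_ms), in typing order.
KeyEvent = Tuple[str, float, float]

KEYS = string.ascii_lowercase  # assumed key set: 26 letters -> 26 x 26 matrix
INDEX = {key: i for i, key in enumerate(KEYS)}


def transition_matrix(events: List[KeyEvent]) -> List[List[float]]:
    """Mean flight time (next press - current release) for each key-to-key transition.

    Transitions involving keys outside KEYS are skipped; unseen transitions stay 0.0.
    """
    n = len(KEYS)
    total = [[0.0] * n for _ in range(n)]
    count = [[0] * n for _ in range(n)]
    for (k1, _, release1), (k2, press2, _) in zip(events, events[1:]):
        if k1 in INDEX and k2 in INDEX:
            i, j = INDEX[k1], INDEX[k2]
            total[i][j] += press2 - release1
            count[i][j] += 1
    return [[total[i][j] / count[i][j] if count[i][j] else 0.0 for j in range(n)]
            for i in range(n)]


sample = [("t", 0.0, 95.0), ("h", 130.0, 210.0), ("e", 260.0, 340.0),
          ("t", 400.0, 480.0), ("h", 520.0, 600.0)]
M = transition_matrix(sample)
print(M[INDEX["t"]][INDEX["h"]])  # 37.5, the mean of 35.0 and 40.0
```

Other per-cell statistics (dwell times, transition counts, or several stacked channels) would fit the same layout.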

Malware analysis with auxiliary-classifier GAN

by Rakesh Nagaraju

Generative adversarial networks (GAN) are a powerful class of machine learning techniques in which a generative model and a discriminative model are trained simultaneously. A recent trend in malware research consists of treating executables as images and employing image-based analysis techniques. In this research, we generate fake malware images using GANs, and we consider the effectiveness of GANs for malware classification. Specifically, we consider the auxiliary-classifier GAN (AC-GAN), which enables us to work with multiclass data. We find that AC-GAN generates "deep fake" images, in the sense that convolutional neural networks (CNN) cannot reliably distinguish our GAN-generated malware images from real malware images. In addition, the detection capabilities of AC-GAN are shown to exceed those of other image-based techniques that have appeared in the literature.
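
As background, the objective below is the standard AC-GAN formulation from the original AC-GAN paper, not necessarily the exact training setup used in this thesis. The discriminator outputs both a source prediction S (real or fake) and a class prediction C, and two log-likelihoods are defined, where c is the class label of a real image or the label the generator was conditioned on for a fake one:

```latex
\begin{aligned}
L_S &= \mathbb{E}\left[\log P(S = \mathrm{real} \mid X_{\mathrm{real}})\right]
     + \mathbb{E}\left[\log P(S = \mathrm{fake} \mid X_{\mathrm{fake}})\right] \\
L_C &= \mathbb{E}\left[\log P(C = c \mid X_{\mathrm{real}})\right]
     + \mathbb{E}\left[\log P(C = c \mid X_{\mathrm{fake}})\right]
\end{aligned}
```

The discriminator is trained to maximize $L_S + L_C$ and the generator to maximize $L_C - L_S$. The class term $L_C$ is what lets a single model learn from, and generate, multiclass data such as several malware families.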

Computer-aided diagnosis of low grade endometrial stromal sarcoma (LGESS)

by Xinxin Yang

Low grade endometrial stromal sarcoma (LGESS) is a rare form of cancer, accounting for about 0.2% of all uterine cancer cases. Approximately 75% of LGESS patients are initially misdiagnosed with leiomyoma, a type of benign tumor that is also known as fibroids. In this research, uterine tissue biopsy images of potential LGESS patients are preprocessed using segmentation and stain normalization algorithms. A variety of classic machine learning and leading deep learning models are then applied to classify tissue images as either benign or cancerous. For the classic techniques considered, the highest classification accuracy we attain is 85%, while our best deep learning model achieves an accuracy of 87%. These results indicate that properly trained learning algorithms can play a useful role in the diagnosis of LGESS.

Clickbait detection in YouTube videos

by Ruchira Gothankar

YouTube videos often include captivating descriptions and intriguing thumbnails designed to increase the number of views, and thereby increase the revenue for the person who posted the video. This creates an incentive for people to post clickbait videos, in which the content might deviate significantly from the title, description, or thumbnail. In effect, users are tricked into clicking on clickbait videos. In this research, we consider the challenging problem of detecting clickbait YouTube videos. We experiment with multiple state-of-the-art machine learning techniques using a variety of textual features.

Hidden Markov model-based clustering for malware classification

by Shamli Singh

Automated techniques to classify malware samples into their respective families are of critical importance in cybersecurity. Previous research applied k-means clustering to scores generated by hidden Markov models (HMM) as a means of dealing with the malware classification problem. In this research, we follow an analogous approach, but instead of using HMMs to generate scores, we directly cluster trained HMMs. We carefully analyze the results obtained over a large and challenging malware dataset.
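
The abstract does not spell out how trained HMMs are compared, so the following is only a hypothetical sketch: each trained HMM is represented by its emission matrix B, flattened into a vector, and the vectors are clustered with k-means, the algorithm the earlier score-based work used. Because hidden states in an HMM carry no fixed labels, the sketch sorts the rows of B before flattening; a real implementation might instead use a distance defined directly between models. The function names and example matrices are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def hmm_feature(B: Matrix) -> List[float]:
    """Flatten an HMM emission matrix B (N states x M symbols) into one vector.

    Rows are sorted first so the vector does not depend on the arbitrary order of
    hidden states (a simplifying assumption of this sketch).
    """
    return [p for row in sorted(B) for p in row]


def kmeans(points: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if it has none).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


# Hypothetical trained 2-state, 3-symbol emission matrices for four samples.
models = [
    [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
    [[0.1, 0.1, 0.8], [0.7, 0.2, 0.1]],
    [[0.3, 0.4, 0.3], [0.4, 0.3, 0.3]],
    [[0.35, 0.3, 0.35], [0.3, 0.35, 0.35]],
]
labels = kmeans([hmm_feature(B) for B in models], k=2)
print(labels)  # [0, 0, 1, 1]
```

The second model is nearly the first with its states swapped; sorting the rows is what lets the two land in the same cluster.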

Data augmentation of malware images

by Aditi Walia

Machine learning and deep learning techniques for malware detection and classification play an important role in mitigating cybersecurity threats. However, such techniques are often limited by a lack of training data. Previous research has shown promising classification results by treating malware executables as images. In this research, we consider auxiliary-classifier GANs (AC-GAN) for data augmentation of malware images. We train convolutional neural networks (CNN) to determine how accurately our generated images model the original malware samples. We also consider adversarial scenarios, in which our augmented data can be used to corrupt the CNN training process.
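
Treating an executable as an image usually means reading its raw bytes as 8-bit grayscale pixel values and wrapping them into rows of a fixed width. The sketch below assumes a width of 256 and zero padding of the last row; real pipelines often pick the width based on file size and resize images to a common shape before training a CNN, and the preprocessing used in this thesis may differ. The function name is hypothetical.

```python
import math
from typing import List


def bytes_to_image(data: bytes, width: int = 256) -> List[List[int]]:
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255).

    The final row is zero-padded so every row has the same width.
    """
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Usage with a tiny byte string and a narrow width so the result is easy to read.
image = bytes_to_image(bytes(range(10)), width=4)
print(image)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

In practice the input would come from reading an executable in binary mode, for example `open(path, "rb").read()`.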

Classic cryptanalysis with GANs

by Deanne Charan

The necessity of protecting critical information has been understood for millennia. Although classic ciphers have inherent weaknesses in comparison to modern ciphers, many classic ciphers are extremely challenging to break in practice. Machine learning techniques, such as hidden Markov models (HMM), have recently been applied with success to various classic cryptanalysis problems. In this research, we consider the effectiveness of the deep learning technique CipherGAN—which is based on the well-established generative adversarial network (GAN) architecture—for classic cipher cryptanalysis. We experiment extensively with CipherGAN on a number of classic ciphers, and we compare our results to those obtained using HMMs.
