Stamp's Master's Students' Defenses: Spring 2021






Who | When | Where | Title
Lolitha Sresta Tupadha | May 5 @ 2:00pm | Zoom | Machine learning to detect malware evolution
Han-Chih Chang | May 13 @ 11:00am | Zoom | Keystroke dynamics based on machine learning
Jianwei Li | May 17 @ 3:00pm | Zoom | Keystroke dynamics for user authentication with fixed and free text
Rakesh Nagaraju | May 14 @ noon | Zoom | Malware analysis with auxiliary-classifier GAN
Xinxin Yang | May 7 @ 1:00pm | Zoom | Computer-aided diagnosis of low grade endometrial stromal sarcoma (LGESS)
Ruchira Gothankar | May 11 @ 11:00am | Zoom | Clickbait detection in YouTube videos
Shamli Singh | May 14 @ 1:00pm | Zoom | Hidden Markov model-based clustering for malware classification
Aditi Walia | May 17 @ noon | Zoom | Data augmentation of malware images
Deanne Charan | May 20 @ 2:00pm | Zoom | Classic cryptanalysis with GANs






Machine learning to detect malware evolution

by Lolitha Sresta Tupadha

Malware evolves over time and antivirus must adapt to such evolution. Hence, it is critical to detect those points in time where malware has evolved so that appropriate countermeasures can be undertaken. In this research, we perform a variety of experiments on a significant number of malware families to determine when malware evolution is likely to have occurred. All of the evolution detection techniques that we consider are based on machine learning and can be fully automated—in particular, no reverse engineering or other labor-intensive manual analysis is required. Specifically, we consider analysis based on hidden Markov models and various word embedding techniques, including Word2Vec and HMM2Vec.


Keystroke dynamics based on machine learning

by Han-Chih Chang

The development of active and passive biometric authentication and identification technology plays an increasingly important role in cybersecurity. Biometrics that utilize features derived from keystroke dynamics have been studied in this context. Keystroke dynamics can be used to analyze the way that a user types by monitoring various keyboard inputs. Previous work has considered the feasibility of user authentication and classification based on keystroke dynamics. In this research, we analyze a wide variety of machine learning and deep learning models based on keystroke-derived features, we optimize the resulting models, and we compare our results to those obtained in related research. We find that a model that combines a convolutional neural network (CNN) and a gated recurrent unit (GRU) performs best in our experiments. This model also outperforms previous research in this field.


Keystroke dynamics for user authentication with fixed and free text

by Jianwei Li

In this research, we consider the problem of verifying user identity based on keystroke dynamics obtained from fixed text or free text. For fixed-text typing behavior, multiple machine learning and deep learning methods are employed, with XGBoost and Multi-Layer Perceptron (MLP) performing best. For free-text typing, we employ a novel feature engineering method that generates image-like transition matrices. For these image-like features, a convolutional neural network (CNN) with cutout achieves results that are competitive with previous work. A hybrid model consisting of a CNN and a recurrent neural network (RNN) is shown to outperform previous research for free-text typing.
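
The abstract does not say how the image-like transition matrices are constructed. As a rough illustration only, the following Python sketch shows one possible construction, assuming each cell holds the average press-to-press latency between an ordered pair of keys. The function name, the event format, and the choice of latency as the feature are all assumptions made for this example, not details taken from the thesis.

```python
def transition_matrix(events, keys):
    """Average key-to-key latency (ms) for each ordered pair of keys.

    events: list of (key, press_time_ms) tuples in typing order (hypothetical format).
    keys:   ordered list of keys that index the matrix rows and columns.
    """
    index = {k: i for i, k in enumerate(keys)}
    n = len(keys)
    total = [[0.0] * n for _ in range(n)]
    count = [[0] * n for _ in range(n)]
    # Walk over consecutive keystroke pairs and accumulate their latencies.
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        if k1 in index and k2 in index:
            i, j = index[k1], index[k2]
            total[i][j] += t2 - t1
            count[i][j] += 1
    # Cells with no observed transition are left at 0.0.
    return [[total[i][j] / count[i][j] if count[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Made-up keystrokes for the text "theth".
events = [("t", 0), ("h", 120), ("e", 210), ("t", 400), ("h", 500)]
matrix = transition_matrix(events, keys=["e", "h", "t"])
```

A matrix of this kind has a fixed size regardless of how much text was typed, which is what would allow it to be fed to an image model such as a CNN.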


Malware analysis with auxiliary-classifier GAN

by Rakesh Nagaraju

Generative adversarial networks (GAN) are a class of powerful machine learning techniques in which a generative model and a discriminative model are trained simultaneously. A recent trend in malware research consists of treating executables as images and employing image-based analysis techniques. In this research, we generate fake malware images using GANs, and we consider the effectiveness of GANs for malware classification. Specifically, we consider auxiliary classifier GAN (AC-GAN), which enables us to work with multiclass data. We find that AC-GAN generates "deep fake" images, in the sense that our GAN-generated malware images cannot be reliably distinguished from real malware images by convolutional neural networks (CNN). In addition, the detection capabilities of AC-GAN are shown to exceed those of other image-based techniques that have appeared in the literature.
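
To make the idea of treating executables as images concrete, here is a minimal Python sketch of one common approach: each byte is interpreted as a grayscale pixel value and the byte stream is wrapped into rows of a fixed width. The function name, the default width, and the zero padding are assumptions chosen for illustration; the abstract does not describe the conversion actually used in this research.

```python
def bytes_to_image(data, width=16):
    """Reshape raw bytes into rows of `width` grayscale pixels (values 0-255)."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # zero-pad the final, possibly short row
        rows.append(row)
    return rows


# Stand-in for the contents of an executable file (made-up data).
sample = bytes(range(40))
image = bytes_to_image(sample, width=16)
```

In practice the resulting rows would typically be resized to a fixed height and width before being passed to an image classifier.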


Computer-aided diagnosis of low grade endometrial stromal sarcoma (LGESS)

by Xinxin Yang

Low grade endometrial stromal sarcoma (LGESS) is a rare form of cancer, accounting for about 0.2% of all uterine cancer cases. Approximately 75% of LGESS patients are initially misdiagnosed with leiomyoma, which is a type of benign tumor that is also known as fibroids. In this research, uterine tissue biopsy images of potential LGESS patients are preprocessed using segmentation and staining normalization algorithms. A variety of classic machine learning and leading deep learning models are then applied to classify tissue images as either benign or cancerous. For the classic techniques considered, the highest classification accuracy we attain is 85%, while our best deep learning model achieves an accuracy of 87%. These results clearly indicate that properly trained learning algorithms can play a useful role in the diagnosis of LGESS.


Clickbait detection in YouTube videos

by Ruchira Gothankar

YouTube videos often include captivating descriptions and intriguing thumbnails designed to increase the number of views, and thereby increase the revenue for the person who posted the video. This creates an incentive for people to post clickbait videos, in which the content might deviate significantly from the title, description, or thumbnail. In effect, users are tricked into clicking on clickbait videos. In this research, we consider the challenging problem of detecting clickbait YouTube videos. We experiment with multiple state-of-the-art machine learning techniques using a variety of textual features.


Hidden Markov model-based clustering for malware classification

by Shamli Singh

Automated techniques to classify malware samples into their respective families are of critical importance in cybersecurity. Previous research applied k-means clustering to scores generated by hidden Markov models (HMM) as a means of dealing with the malware classification problem. In this research, we follow an analogous approach, but instead of using HMMs to generate scores, we directly cluster trained HMMs. We carefully analyze the results obtained over a large and challenging malware dataset.


Data augmentation of malware images

by Aditi Walia

Machine learning and deep learning techniques for malware detection and classification play an important role in the mitigation of cybersecurity threats. However, such techniques are often limited by a lack of training data. Previous research has shown promising classification results by treating malware executables as images. In this research, we consider auxiliary classifier GANs (AC-GAN) for data augmentation of malware images. We train convolutional neural networks (CNN) to determine how accurately our generated images model the original malware samples. We also consider adversarial scenarios, where our augmented data can be used to corrupt the CNN training process.


Classic cryptanalysis with GANs

by Deanne Charan

The necessity of protecting critical information has been understood for millennia. Although classic ciphers have inherent weaknesses in comparison to modern ciphers, many classic ciphers are extremely challenging to break in practice. Machine learning techniques, such as hidden Markov models (HMM), have recently been applied with success to various classic cryptanalysis problems. In this research, we consider the effectiveness of the deep learning technique CipherGAN—which is based on the well-established generative adversarial network (GAN) architecture—for classic cipher cryptanalysis. We experiment extensively with CipherGAN on a number of classic ciphers, and we compare our results to those obtained using HMMs.




Zoom details for Lolitha's talk

https://sjsu.zoom.us/j/89264948001?pwd=Tjg0Rk40SVNOVFgyZ010SkpHVlAwQT09

Password (encrypted with a Caesar's cipher): 875485

Zoom details for Han-Chih's talk

https://sjsu.zoom.us/j/84774068125?pwd=Wi9uOUZRL2ZtUU84RzRiUFpvK0JvZz09

Password (encrypted with a Caesar's cipher): 083283

Zoom details for Jianwei's talk

https://sjsu.zoom.us/j/81883032995?pwd=U1o0SnQzUUIzb1MrZmIrb2QraGtVZz09

Password (encrypted with a Caesar's cipher): 887446

Zoom details for Rakesh's talk

https://sjsu.zoom.us/j/82472298802?pwd=bTRzcVZkMjBUd0hsUi8wSzFUOHpaZz09

Password (encrypted with a Caesar's cipher): 081368

Zoom details for Xinxin's talk

https://sjsu.zoom.us/j/85927634979?pwd=Rks0VElpOWFyakdaV09zRTMwblNIZz09

Password (encrypted with a Caesar's cipher): 522389

Zoom details for Ruchira's talk

https://sjsu.zoom.us/j/85775665384?pwd=RERFVVdHb09qdnA2Vkdvd0FUN0NKQT09

Password (encrypted with a Caesar's cipher): 570953

Zoom details for Shamli's talk

https://sjsu.zoom.us/j/88234305389?pwd=NWE5YzN3WkFtNzUzQk1XUWxNVXIwUT09

Password (encrypted with a Caesar's cipher): 247448

Zoom details for Aditi's talk

https://sjsu.zoom.us/j/85053362127?pwd=Mlhob2piNHFKeDJQd2pXWXU0RnIwZz09

Password (encrypted with a Caesar's cipher): 809697

Zoom details for Deanne's talk

https://sjsu.zoom.us/j/89452422530?pwd=VEx5dUxXSVRmWFduc0dnclg2d1A4UT09

Password (encrypted with a Caesar's cipher): 618255
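
The password lines above state only that the values are encrypted with a Caesar cipher. As a minimal sketch, assuming the cipher shifts each digit by a fixed amount modulo 10 (the page does not say this explicitly), the following Python helper lists all ten candidate decryptions of a numeric string. The function names and the example value are hypothetical.

```python
DIGITS = "0123456789"


def shift_digits(text, shift):
    """Shift every decimal digit in `text` by `shift` positions, modulo 10."""
    return "".join(
        str((int(c) + shift) % 10) if c in DIGITS else c
        for c in text
    )


def candidates(ciphertext):
    """Return the 10 possible plaintexts, one per shift from 0 to 9."""
    return [shift_digits(ciphertext, -s) for s in range(10)]


# Hypothetical example: encrypt a made-up value with shift 3, then list candidates.
example = shift_digits("123456", 3)
options = candidates(example)
```

With only ten possible shifts, trying each candidate in turn is enough to recover the original value.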