Stamp's Master's Students' Defenses: Spring 2020

Sunhera Paul
May 8 @ 11:00am
Detection and Analysis of Malware Evolution
Aniket Chandak
May 11 @ 10:00am
Word Embedding Techniques for Malware Classification
Aparna Kale
May 13 @ 10:00am
Malware Classification Based on Hidden Markov Model and Word2Vec Features
Zidong Jiang
May 13 @ 11:00am
Troll Detection on Weibo using Sentiment Analysis

Detection and Analysis of Malware Evolution

by Sunhera Paul

Malware is malicious software that causes disruption, allows access to unapproved resources, or performs other unauthorized activity. Developing effective malware detection techniques is a critical aspect of information security. One difficulty that arises is that malware often evolves over time, due to changing goals of malware developers, or to counter advances in detection. This evolution can occur through various modifications in malware code. To maintain effective malware detection, it is necessary to detect and analyze malware evolution so that appropriate countermeasures can be taken. We perform a variety of experiments to detect points in time where a malware family has likely evolved. We then conduct further experiments to confirm that such evolution has actually occurred. We validate our approach by considering a number of malware families, each of which includes a significant number of samples collected over an extended period of time. All of our experiments are based on machine learning models, and hence our techniques require minimal human intervention and can easily be automated.

Word Embedding Techniques for Malware Classification

by Aniket Chandak

Word embeddings are often used in natural language processing as a means to quantify relationships between words. More generally, these same word embedding techniques can be used to quantify relationships between features. In this paper, we conduct a series of experiments that are designed to determine the effectiveness of word embedding techniques in the context of malware classification. First, we conduct experiments where hidden Markov models (HMM) are directly applied to opcode sequences. These results serve to establish a baseline for comparison with our subsequent word embedding experiments. We then experiment with word embedding vectors derived from HMMs—a technique that we refer to as HMM2Vec. In another set of experiments, we generate vector embeddings based on principal component analysis, which we refer to as PCA2Vec. And, for a third set of word embedding experiments, we consider the well-known neural network based technique, Word2Vec. In each of these word embedding experiments, we derive feature embeddings based on opcode sequences for malware samples from a variety of different families. We show that in most cases, we obtain improved classification accuracy using feature embeddings, as compared to our baseline HMM experiments. These results provide strong evidence that word embedding techniques can play a useful role in feature engineering within the field of malware analysis.
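
As a rough illustration of the idea that embeddings can quantify relationships between features such as opcodes, here is a minimal count-based sketch. It is not HMM2Vec, PCA2Vec, or Word2Vec as used in the paper. It only builds co-occurrence vectors and compares them with cosine similarity. The function names, the window size, and the toy opcode sequences are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_embeddings(sequences, window=2):
    """Map each opcode to a sparse vector counting its neighbors within `window`."""
    vectors = defaultdict(Counter)
    for seq in sequences:
        for i, op in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[op][seq[j]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy opcode sequences, not data from the paper.
toy_sequences = [
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "call", "pop", "ret"],
    ["mov", "add", "mov", "sub", "jmp"],
]

emb = cooccurrence_embeddings(toy_sequences)
sim_add_sub = cosine(emb["add"], emb["sub"])  # both appear next to "mov"
sim_add_ret = cosine(emb["add"], emb["ret"])  # no shared neighbors
```

In this toy data, `add` and `sub` share a neighboring opcode while `add` and `ret` do not, so the first similarity is larger. The learned embeddings described in the abstract serve a similar purpose, but they are trained rather than counted.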

Malware Classification Based on Hidden Markov Model and Word2Vec Features

by Aparna Kale

Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on a wide variety of features, including opcode sequences, API calls, and byte n-grams, among many others. In this research, we implement hybrid machine learning techniques, where we train hidden Markov models (HMM) and compute Word2Vec encodings based on opcode sequences. The resulting trained HMMs and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and deep neural network (DNN) classifiers. We conduct substantial experiments over a variety of malware families. Our results surpass those of comparable classification experiments.
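
To make the final classification step concrete, the following is a minimal k-NN sketch over feature vectors, assuming each sample has already been reduced to a fixed-length numeric vector (for example, derived from a trained HMM or from Word2Vec embeddings). The function name, the two-dimensional toy vectors, and the family labels are hypothetical and are not taken from the experiments.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_vectors, train_labels, query, k=3):
    """Return the majority label among the k training vectors nearest to `query`."""
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: dist(train_vectors[i], query),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors standing in for HMM- or Word2Vec-derived features.
train_vectors = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25),  # family_a
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.75),  # family_b
]
train_labels = ["family_a"] * 3 + ["family_b"] * 3

predicted_near_a = knn_predict(train_vectors, train_labels, (0.2, 0.2))
predicted_near_b = knn_predict(train_vectors, train_labels, (0.8, 0.8))
```

The other classifiers named in the abstract (SVM, RF, DNN) would take the same kind of feature vectors as input. k-NN is shown here only because it needs no training step.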

Troll Detection on Weibo using Sentiment Analysis

by Zidong Jiang

The impact of social media on the modern world is difficult to overstate. Virtually all companies and public figures have social media accounts on popular platforms, such as Twitter and Facebook. In China, the micro-blogging service provider Sina Weibo is the most popular such service. To overcome negative publicity, Weibo trolls—the so-called Water Army—can be hired to post deceptive comments.

In recent years, troll detection and sentiment analysis have been studied, but we are not aware of any research that considers troll detection based on sentiment analysis. In this research, we focus on troll detection via sentiment analysis combined with other user activity data gathered on the Sina Weibo platform, where the content is mainly in Chinese. We implement techniques for Chinese sentence segmentation, word embeddings, and sentiment score calculations. We employ the resulting techniques to develop and test a sentiment analysis approach for troll detection, based on a variety of machine learning strategies. Experimental results are generated and analyzed. A Chrome extension is presented that implements our proposed technique, which enables real-time troll detection when a user browses Sina Weibo tweets and comments.
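
One simple way to picture a sentiment score calculation is a lexicon lookup over segmented tokens. The sketch below assumes the comment has already been segmented into words and that a polarity lexicon is available. The abstract does not specify the scoring method, so the function, the lexicon, and the example comments are all hypothetical.

```python
def sentiment_score(tokens, lexicon):
    """Average polarity of the tokens found in the lexicon; 0.0 if none match."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical polarity lexicon: +1.0 for positive words, -1.0 for negative words.
toy_lexicon = {"好": 1.0, "喜欢": 1.0, "差": -1.0, "讨厌": -1.0}

# Hypothetical, already-segmented comments.
positive_comment = ["这个", "产品", "很", "好"]
negative_comment = ["服务", "很", "差", "讨厌"]

positive_score = sentiment_score(positive_comment, toy_lexicon)
negative_score = sentiment_score(negative_comment, toy_lexicon)
```

A score of this kind could then be combined with user activity data as one input feature for a machine learning classifier.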

Zoom details for Sunhera Paul's talk

Password (encrypted with a Caesar's cipher): 230223

Zoom details for Aniket Chandak's talk

Password (encrypted with a Caesar's cipher): 062368

Zoom details for Aparna Kale's talk

Password (encrypted with a Caesar's cipher): 799804

Zoom details for Zidong Jiang's talk

Password (encrypted with a Caesar's cipher): 233056
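
The page does not state the key or the alphabet used for the Caesar cipher. A minimal sketch for recovering a numeric password, assuming the shift is applied to each decimal digit modulo 10, is to list all ten candidate shifts and try each one. The function names and the example value are hypothetical, and the example is not one of the passwords above.

```python
from string import digits


def shift_digits(text, shift):
    """Shift each decimal digit by `shift` modulo 10; leave other characters alone."""
    return "".join(
        str((int(ch) + shift) % 10) if ch in digits else ch
        for ch in text
    )


def candidate_plaintexts(ciphertext):
    """Return the ten possible decryptions, one per shift from 0 to 9."""
    return [shift_digits(ciphertext, -s) for s in range(10)]


# Hypothetical example: encrypt "1290" with shift 3, then list all candidates.
example_cipher = shift_digits("1290", 3)
example_candidates = candidate_plaintexts(example_cipher)
```

Since there are only ten possible shifts under this assumption, trying every candidate by hand is practical.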