Stamp's Master's Students' Defenses: Spring 2018

Who	When	Where	Title
Rachel Gonsalves	May 11 @ 9:00am	CL 100H	Data Poisoning Attacks on HMMs
Wei-Chung Huang	May 15 @ noon	MH 422	Image Robust Hashing for Malware Detection
Samuel Kim	May 11 @ 10:00am	MH 320	PE Headers for Malware Classification
Aditya Raghavan	May 11 @ noon	MH 425	Boosted HMMs for Malware Detection
Anish Singh Shekhawat	May 14 @ 1:00pm	MH 225	Analysis of Encrypted Malicious Traffic
Supraja Suresh	May 15 @ 11:00am	MH 225	Analyzing Android Adware

Data Poisoning Attacks on Hidden Markov Models

by Rachel Gonsalves

With ever-increasing volumes of data, machine learning systems that require minimal human oversight have become crucial for classification and analysis tasks. Machine learning algorithms used for such purposes have revolutionized the way we sort, classify, and analyze data. The accuracy of any machine learning algorithm depends heavily on the data it is trained on. In some circumstances, an attacker can attempt to poison the training data to subvert a machine learning system. In this research, we analyze the effects of training data poisoning attacks on hidden Markov models (HMMs) in the context of malware classification. We find that HMMs are surprisingly sensitive to such attacks.
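The abstract does not say how the training data is poisoned. One simple attack model, assumed here purely for illustration, is for the attacker to replace a fraction of the malware training samples (for example, opcode sequences) with benign samples, so that the HMM partly learns benign behavior. The function name `poison_training_set` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def poison_training_set(malware: Sequence[List[str]],
                        benign: Sequence[List[str]],
                        fraction: float,
                        seed: int = 0) -> List[List[str]]:
    """Return a copy of `malware` in which round(fraction * n) samples are
    replaced by randomly chosen benign samples (hypothetical attack model)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    poisoned = [list(s) for s in malware]  # copy, so the input is not modified
    n_poison = round(fraction * len(poisoned))
    # Pick distinct training positions for the attacker to overwrite.
    for idx in rng.sample(range(len(poisoned)), n_poison):
        poisoned[idx] = list(rng.choice(benign))
    return poisoned
```

An HMM would then be trained on the poisoned set as usual, and its scores compared against a model trained on clean data as `fraction` grows.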

Image Robust Hashing for Malware Detection

by Wei-Chung Huang

Robust hashing is a technique that has been successfully used to detect similarity in images. In this research, we consider a novel robust-hashing-inspired approach for detecting malware families. Specifically, we treat each executable file as a two-dimensional image and use robust hashing techniques to determine whether a given executable belongs to a particular family. The robust hashing stage comprises two steps, namely, feature extraction and compression, while the classification stage is based on machine learning. We compare our robust hashing approach to other machine-learning-based malware classification techniques.
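A minimal sketch of the two robust hashing steps, assuming the executable's bytes are laid out row by row as a grayscale image of fixed width. Here "feature extraction" is assumed to be block averaging over a coarse grid and "compression" is assumed to be one bit per block (set when the block mean exceeds the median). This is an average-hash-style scheme chosen for illustration, not necessarily the features used in the thesis. Similar files should then have hashes at a small Hamming distance.

```python
from statistics import median
from typing import List


def bytes_to_image(data: bytes, width: int) -> List[List[int]]:
    """Lay out raw bytes row by row as a grayscale image (0-255),
    zero-padding the final row."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


def robust_hash(image: List[List[int]], grid: int = 4) -> int:
    """Feature extraction: mean of each cell in a grid x grid partition.
    Compression: one bit per cell, set if the cell mean exceeds the median."""
    h, w = len(image), len(image[0])
    if h < grid or w < grid:
        raise ValueError("image smaller than hash grid")
    means = []
    for bi in range(grid):
        for bj in range(grid):
            r0, r1 = bi * h // grid, (bi + 1) * h // grid
            c0, c1 = bj * w // grid, (bj + 1) * w // grid
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            means.append(sum(block) / len(block))
    threshold = median(means)
    bits = 0
    for m in means:
        bits = (bits << 1) | (1 if m > threshold else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Flipping a single byte barely moves any cell mean, so the hash is usually unchanged. This robustness to small edits is the property that makes the approach attractive for grouping variants of a family.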

PE Headers for Malware Classification

by Samuel Kim

Recent research indicates that effective malware detection can be based on analyzing portable executable (PE) file headers. Such research typically relies on prior knowledge of the header format to extract relevant features. However, it is also possible to treat the header as a whole and use it directly to determine whether a file is malware. In this research, we collect a large and diverse malware dataset. We then analyze the effectiveness of various machine learning techniques based on PE headers for classifying the malware samples, and we compare the accuracy and efficiency of each technique considered.
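Treating "the header as a whole" can be sketched as taking a fixed number of raw bytes starting at the PE signature and using them directly as a feature vector, with no field-level parsing. The `MZ` magic, the `e_lfanew` offset stored at `0x3C`, and the `PE\0\0` signature are part of the PE format; the function name and the 256-byte default length are assumptions for illustration.

```python
import struct
from typing import List


def pe_header_features(data: bytes, length: int = 256) -> List[int]:
    """Return `length` raw bytes starting at the PE signature, zero-padded,
    as a fixed-size feature vector (one integer 0-255 per byte)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS header")
    # e_lfanew: little-endian 32-bit file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    header = data[pe_offset:pe_offset + length]
    return list(header) + [0] * (length - len(header))
```

Because every sample yields a vector of the same length, the output can be fed to any standard classifier without knowledge of individual header fields.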

Boosted Hidden Markov Models for Malware Detection

by Aditya Raghavan

Digital security is an important issue today, and efficient malware detection is at the forefront of research into building secure digital systems. As in many other fields, malware detection has recently seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has found widespread application in pattern matching in general, and malware detection in particular, is the hidden Markov model (HMM). Since HMM training relies on a hill-climbing technique, we can often significantly improve a model by training multiple times with different initial values. Boosting, by contrast, is a general technique for combining weaker models to yield a stronger model. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained using multiple random restarts, in the context of the malware detection problem. These techniques are applied to a variety of challenging malware datasets, and we analyze and compare the results in terms of effectiveness and efficiency.
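The sketch below shows discrete AdaBoost over a fixed pool of weak classifiers. It assumes each HMM has already been turned into a +1/-1 classifier, for example by thresholding its score, and it represents those classifiers only by their precomputed predictions. That representation and the function names are assumptions for illustration. Each round picks the classifier with the lowest weighted error, gives it weight alpha = 0.5 ln((1 - err) / err), and shifts weight toward misclassified samples.

```python
import math
from typing import List, Sequence, Tuple


def adaboost(predictions: Sequence[Sequence[int]],
             labels: Sequence[int],
             rounds: int) -> List[Tuple[int, float]]:
    """predictions[k][i] is weak classifier k's output (+1/-1) on sample i;
    labels[i] is the true label (+1 malware, -1 benign).
    Returns the ensemble as a list of (classifier index, alpha)."""
    n = len(labels)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every weak classifier under the current weights.
        errors = [sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                  for preds in predictions]
        best = min(range(len(predictions)), key=errors.__getitem__)
        err = errors[best]
        if err >= 0.5:
            break  # no remaining classifier beats chance
        err = max(err, 1e-10)  # guard against log of infinity for a perfect classifier
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((best, alpha))
        # Misclassified samples (y * p = -1) gain weight; then renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, predictions[best], labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(ensemble: List[Tuple[int, float]], sample_preds: Sequence[int]) -> int:
    """sample_preds[k] is weak classifier k's output on one new sample."""
    score = sum(alpha * sample_preds[k] for k, alpha in ensemble)
    return 1 if score >= 0 else -1
```

Random restarts, in contrast, would simply keep the single best of several independently trained HMMs rather than combining them with weights.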

Analysis of Encrypted Malicious Traffic

by Anish Singh Shekhawat

In recent years, there has been a dramatic increase in the number of malware attacks that use encrypted HTTP traffic for propagation and communication. Due to the volume of legitimate encrypted data, it can be difficult to filter encrypted malicious traffic from the vast background of benign traffic. Since antivirus software and firewalls typically do not have access to encryption keys, this poses a serious challenge for such systems. Hence, detection techniques are needed that do not require decrypting the traffic. In this research, we apply a variety of machine learning techniques to the problem of distinguishing malicious from benign encrypted HTTP traffic. We show that we can obtain high accuracy with practical systems.
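Detection without decryption has to rely on metadata that stays visible on the wire. As a hedged sketch, assume each flow is available as a list of `(timestamp, payload size, outbound?)` tuples. The summary features below (packet sizes, inter-arrival gaps, direction ratio) are illustrative assumptions and are not the feature set reported in the thesis.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# (timestamp in seconds, payload size in bytes, True if client-to-server)
Packet = Tuple[float, int, bool]


def flow_features(packets: List[Packet]) -> Dict[str, float]:
    """Summarize one encrypted flow using only unencrypted metadata."""
    if len(packets) < 2:
        raise ValueError("need at least two packets")
    packets = sorted(packets)  # order by timestamp
    sizes = [size for _, size, _ in packets]
    gaps = [b[0] - a[0] for a, b in zip(packets, packets[1:])]
    out_bytes = sum(size for _, size, outbound in packets if outbound)
    total = sum(sizes)
    return {
        "num_packets": float(len(packets)),
        "mean_size": float(mean(sizes)),
        "std_size": float(pstdev(sizes)),
        "mean_gap": float(mean(gaps)),
        "duration": packets[-1][0] - packets[0][0],
        "outbound_ratio": out_bytes / total if total else 0.0,
    }
```

Each flow becomes a fixed-length numeric vector that any of the machine learning techniques mentioned above could consume.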

Analyzing Android Adware

by Supraja Suresh

Most Android smartphone apps are free; to generate revenue, app developers embed ad libraries so that advertisements are displayed while the app is in use. Billions of dollars are lost annually to ad fraud on Android devices. In this research, we propose a machine-learning-based scheme to detect Android adware. We consider static features, dynamic features, and combinations thereof. Specifically, we collect static features from the manifest file, while our dynamic features are derived from network traffic. Using these features, we develop and analyze a tiered approach, in which we first classify Android applications into broad categories (e.g., adware, malware, benign) and then classify each application into a more specific family. We employ a variety of machine learning techniques, including neural networks, random forests, AdaBoost, and support vector machines.
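The tiered approach can be sketched as two levels of classifiers: one that assigns a broad category, and a per-category classifier that assigns a family. The sketch assumes the manifest has already been decoded to text XML (APK manifests are stored in a binary format) and uses requested permissions as the static features. It uses a nearest-centroid classifier as a stand-in for the neural networks, random forests, AdaBoost, and SVMs named above. All function names and the family labels are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Sequence, Tuple

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
Classifier = Callable[[List[int]], str]


def permission_vector(manifest_xml: str, vocabulary: Sequence[str]) -> List[int]:
    """Binary static features: 1 if the decoded manifest requests the permission."""
    root = ET.fromstring(manifest_xml)
    requested = {el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")}
    return [1 if perm in requested else 0 for perm in vocabulary]


def nearest_centroid(train: Dict[str, List[List[int]]]) -> Classifier:
    """Stand-in classifier: predicts the label whose class mean is closest."""
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in train.items()}

    def predict(x: List[int]) -> str:
        return min(centroids, key=lambda lbl: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[lbl])))
    return predict


def tiered_classify(features: List[int],
                    category_clf: Classifier,
                    family_clfs: Dict[str, Classifier]) -> Tuple[str, Optional[str]]:
    """Tier 1 picks a broad category; tier 2 picks a family within it, if any."""
    category = category_clf(features)
    family_clf = family_clfs.get(category)
    family = family_clf(features) if family_clf else None
    return category, family
```

Splitting the problem this way lets each family-level classifier specialize on the applications that share a category.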