Detection and Analysis of Malware Evolution
Word Embedding Techniques for Malware Classification
Malware Classification Based on HMM and Word2Vec Features
Troll Detection on Weibo using Sentiment Analysis

Malware is malicious software that causes disruption,
allows access to unapproved resources,
or performs other unauthorized activity. Developing effective malware detection
techniques is a critical aspect of information security. One difficulty
that arises is that malware often evolves over time, due
to changing goals of malware developers, or to counter advances in detection.
This evolution can occur through various modifications in malware code.
To maintain effective malware detection, it is necessary to detect
and analyze malware evolution so that appropriate countermeasures can be taken.
We perform a variety of experiments to detect points in time where a malware family
has likely evolved. We then conduct further experiments to confirm that such
evolution has actually occurred. We validate our approach by
considering a number of malware families, each of which
includes a significant number of samples collected over an extended
period of time. All of our experiments are based on machine learning models,
and hence our techniques require minimal human intervention and can easily

Word embeddings are often used in natural language processing
as a means to quantify relationships between words. More generally,
these same word embedding techniques can be used to quantify relationships
between features. In this paper, we conduct a series of experiments
that are designed to determine the effectiveness of word embedding
techniques in the context of malware classification.
First, we conduct experiments where hidden Markov models (HMM)
are directly applied to opcode sequences.
These results serve to establish a baseline for
comparison with our subsequent word embedding experiments.
We then experiment with word embedding vectors derived from
HMMs—a technique that we refer to as HMM2Vec.
In another set of experiments,
we generate vector embeddings based on principal component analysis,
which we refer to as PCA2Vec. And, for a third set of word embedding
experiments, we consider the well-known neural network based
technique, Word2Vec. In each of these word embedding experiments,
we derive feature embeddings based on opcode sequences
for malware samples from a variety of different families.
We show that in most cases, we obtain improved classification accuracy
using feature embeddings, as compared to our baseline HMM experiments.
These results provide strong evidence that word embedding techniques
can play a useful role in feature engineering within the field of

Malware classification is an important and challenging problem in
information security. Modern malware classification
techniques rely on machine learning models that can be
trained on a wide variety of features, including opcode sequences,
API calls, and byte n-grams, among many others. In this research,
we implement hybrid machine learning techniques, where we train
hidden Markov models (HMM) and compute Word2Vec encodings based on
opcode sequences. The resulting trained HMMs and Word2Vec embedding vectors
are then used as features for classification algorithms. Specifically, we consider
support vector machine (SVM), k-nearest neighbor (k-NN),
random forest (RF), and deep neural network (DNN) classifiers.
We conduct substantial experiments over a variety of malware families.
Our results surpass those of comparable classification experiments.
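The general pipeline described above (derive a feature vector from each opcode sequence, then hand it to a classifier) can be illustrated with a minimal sketch. The assumptions here are substantial: the opcode vocabulary `VOCAB`, the toy samples, and the family labels are hypothetical, and a normalized co-occurrence matrix stands in for the trained HMM and Word2Vec features, so this is not the authors' method. Only the k-NN step corresponds directly to one of the classifiers named in the abstract.

```python
from collections import Counter
import math

# Hypothetical opcode vocabulary; real experiments would likely use the
# most frequent opcodes observed in the dataset.
VOCAB = ["mov", "push", "call", "add", "jmp"]
INDEX = {op: i for i, op in enumerate(VOCAB)}


def cooccurrence_features(opcodes, window=1):
    """Flattened, row-normalized opcode co-occurrence matrix for one sample.

    Stand-in for a learned embedding (this is NOT Word2Vec): row i records
    which opcodes appear within `window` positions after opcode i.
    """
    n = len(VOCAB)
    counts = [[0.0] * n for _ in range(n)]
    for pos, op in enumerate(opcodes):
        if op not in INDEX:
            continue  # ignore opcodes outside the vocabulary
        for nxt in opcodes[pos + 1 : pos + 1 + window]:
            if nxt in INDEX:
                counts[INDEX[op]][INDEX[nxt]] += 1.0
    features = []
    for row in counts:
        total = sum(row)
        features.extend(c / total if total else 0.0 for c in row)
    return features


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up opcode sequences for two hypothetical families.
samples = [
    (["mov", "push", "call", "mov", "push", "call"], "familyA"),
    (["push", "call", "mov", "push", "call", "mov"], "familyA"),
    (["add", "jmp", "add", "jmp", "add", "jmp"], "familyB"),
    (["jmp", "add", "jmp", "add", "jmp", "add"], "familyB"),
]
train = [(cooccurrence_features(seq), label) for seq, label in samples]
query = cooccurrence_features(["mov", "push", "call", "mov", "push"])
print(knn_predict(train, query, k=3))
```

In a fuller experiment, `cooccurrence_features` would be replaced by the HMM-derived or Word2Vec-derived vectors, and the k-NN step could be swapped for an SVM, random forest, or DNN without changing the overall structure.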
The impact of social media on the modern world is difficult to overstate. Virtually all companies and public figures have social media accounts on popular platforms, such as Twitter and Facebook. In China, the micro-blogging service provider Sina Weibo is the most popular such service. To overcome negative publicity, Weibo trolls—the so-called Water Army—can be hired to post deceptive comments.
In recent years, troll detection and sentiment analysis have
been studied, but we are not aware of any research that considers
troll detection based on sentiment analysis. In this research,
we focus on troll detection via sentiment analysis
combined with other user activity data gathered on the Sina Weibo
platform, where the content is mainly in Chinese. We
implement techniques for Chinese sentence segmentation, word embeddings, and
sentiment score calculations. We employ the resulting techniques to
develop and test a sentiment analysis approach for troll detection,
based on a variety of machine learning strategies.
Experimental results are generated
and analyzed. A Chrome extension is presented that implements
our proposed technique, which enables real-time troll detection
when a user browses Sina Weibo tweets and comments.
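As a rough illustration of how a sentiment score might be combined with user activity data into a feature vector, the following sketch uses a simple lexicon-based score. Everything specific here is an assumption: the tiny `LEXICON`, the `NEGATIONS` set, the negation rule, and the chosen activity features are hypothetical, and the input is assumed to be already segmented into tokens. The abstract does not specify the scoring method actually used.

```python
# Hypothetical sentiment lexicon: token -> polarity in [-1, 1].
LEXICON = {"好": 1.0, "喜欢": 0.8, "差": -1.0, "讨厌": -0.8}
# Hypothetical negation tokens that flip the polarity of the next token.
NEGATIONS = {"不", "没"}


def sentiment_score(tokens):
    """Average polarity of lexicon tokens in one segmented comment.

    A negation token immediately before a lexicon token flips its sign.
    Returns 0.0 when no token is found in the lexicon.
    """
    total, hits = 0.0, 0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        polarity = LEXICON[tok]
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        total += polarity
        hits += 1
    return total / hits if hits else 0.0


def user_features(comments, posts_per_day, followers, following):
    """Combine mean comment sentiment with activity data into one vector."""
    scores = [sentiment_score(c) for c in comments]
    mean = sum(scores) / len(scores) if scores else 0.0
    ratio = followers / following if following else 0.0
    return [mean, posts_per_day, ratio]


# Two made-up, pre-segmented comments from one hypothetical account.
comments = [["我", "喜欢", "这个"], ["不", "好"]]
print(user_features(comments, posts_per_day=12.0, followers=30, following=600))
```

A vector of this kind could then be passed to any of the machine learning strategies mentioned above. The score and the activity features would need tuning against labeled troll and non-troll accounts.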