Ritik Mehta | A Natural Language Processing Approach to Malware Classification
Anant Shukla | Social Media Bot Detection using Dropout-GAN
Many different machine learning and deep learning techniques
have been successfully employed for malware detection and classification.
Examples of popular learning techniques in the malware domain include
Hidden Markov Models (HMM), Random Forests (RF), Convolutional Neural Networks (CNN),
Support Vector Machines (SVM), and Recurrent Neural Networks (RNN)
such as Long Short-Term Memory (LSTM) networks.
In this research, we consider a hybrid architecture,
where HMMs are trained on opcode sequences, and the resulting hidden states
of these trained HMMs are used as feature vectors in various classifiers.
In this context, extracting the HMM hidden state sequences can be viewed as
a form of feature engineering that is somewhat analogous to techniques
that are commonly employed in Natural Language Processing (NLP).
We find that this NLP-based approach outperforms other popular techniques
on a challenging malware dataset, with an HMM-Random Forest model
yielding the best results.
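
The feature-engineering step described above can be illustrated with a minimal sketch. It assumes the hidden state sequence is obtained by Viterbi decoding, which the abstract does not specify, and it uses hand-set toy parameters (`start_p`, `trans_p`, `emit_p`) in place of an HMM actually trained on opcode sequences. The opcode alphabet and all names here are hypothetical.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for obs (log-space Viterbi)."""
    n_states = len(start_p)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    # score[s] = best log-probability of any state path ending in state s
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at time t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][o]))
        back.append(ptr)
    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical opcode alphabet and toy 2-state HMM (not trained, for illustration only)
OPCODES = {"mov": 0, "push": 1, "call": 2, "xor": 3}
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]
emit_p = [[0.45, 0.45, 0.05, 0.05],   # state 0 favors mov/push
          [0.05, 0.05, 0.45, 0.45]]   # state 1 favors call/xor

opcode_seq = ["mov", "mov", "push", "xor", "call", "xor"]
obs = [OPCODES[op] for op in opcode_seq]

# The decoded hidden state sequence serves as the feature vector
features = viterbi(obs, start_p, trans_p, emit_p)
print(features)
```

In a full pipeline, feature vectors of this kind, one per sample, would be passed to a downstream classifier such as a Random Forest.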
Bot activity on social media platforms is a pervasive problem,
undermining the credibility of online discourse and potentially
leading to cybercrime. We propose an approach to bot detection using
Generative Adversarial Networks (GAN). We discuss how we overcome the
issue of mode collapse by training multiple discriminators against a
single generator, and how we decouple the discriminator to perform
social media bot detection while using the generator for data augmentation.
In terms of classification accuracy, our approach outperforms state-of-the-art
techniques in this field. We also show how the generator in the GAN can be
used to evade such a classification technique.
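
The multiple-discriminator idea can be sketched as follows. The abstract does not describe the mechanism in detail, so this is only an illustration of the general Dropout-GAN idea as commonly described: on each batch, the feedback of each discriminator is dropped with some probability, at least one discriminator is always kept, and the generator loss is averaged over the survivors. The function name, the `drop_rate` parameter, and the example scores are assumptions.

```python
import math
import random

def generator_loss(disc_scores, drop_rate, rng):
    """Average the generator loss -log(D_k(G(z))) over the discriminators
    that survive dropout; at least one discriminator is always kept."""
    kept = [d for d in disc_scores if rng.random() >= drop_rate]
    if not kept:
        # Every discriminator was dropped, so fall back to one chosen at random
        kept = [rng.choice(disc_scores)]
    return sum(-math.log(d) for d in kept) / len(kept)

# Hypothetical outputs D_k(G(z)) of three discriminators on one generated sample
scores = [0.2, 0.5, 0.8]
rng = random.Random(0)

loss_all = generator_loss(scores, drop_rate=0.0, rng=rng)  # no dropout: all three kept
loss_one = generator_loss(scores, drop_rate=1.0, rng=rng)  # full dropout: exactly one kept
print(loss_all, loss_one)
```

Because the generator sees a changing subset of discriminators, it cannot settle on fooling a single one, which is the intuition behind using this setup against mode collapse.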