Stamp's Master's Students' Defenses: Spring 2024






Who                  When              Where   Title
Gauri Anil Godghase  May 17 @ 3:00pm   MH 225  Distinguishing Chatbot from Human
Atharva Khadilkar    May 14 @ 11:00am  MH 210  Malware Detection Using QR and Aztec Code Representations
Rohit Mapakshi       May 13 @ 9:00am   MH 229  An Empirical Analysis of Adversarial Attacks in Federated Learning
Sarvagya Bhargava    May 16 @ 11:00am  MH 229  Robustness of Learning Models to Label Flipping Attacks






Distinguishing Chatbot from Human

by Gauri Anil Godghase

There have been many recent advances in the field of Generative Artificial Intelligence and Large Language Models (LLMs), with ChatGPT being one of the frontrunners. These LLMs have become so powerful that they can produce human-like text. In this research, we consider the problem of distinguishing chatbot-generated text from human-generated text using machine learning. We collect a large dataset of human-generated paragraphs, and we use ChatGPT to generate a comparable collection of paragraphs. We then train a wide range of machine learning models on various features extracted from this data. We find that it is surprisingly easy to distinguish chatbot-generated text from human-generated text, with our best models yielding accuracies of about 99%. We analyze our models and features so as to better understand why chatbot text stands out from human text.
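
To make the pipeline concrete, here is a minimal sketch of one plausible setup: TF-IDF features feeding standard scikit-learn classifiers. The placeholder texts, the feature choice, and the two models shown are illustrative assumptions, not the configuration evaluated in the thesis.

# Minimal sketch: classify paragraphs as chatbot (1) or human (0)
# using TF-IDF features and off-the-shelf classifiers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data; the thesis uses large collections of real paragraphs
texts = ["Sample human paragraph ...", "Sample ChatGPT paragraph ..."] * 50
labels = [0, 1] * 50

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels)

vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
    model.fit(X_train_vec, y_train)
    preds = model.predict(X_test_vec)
    print(type(model).__name__, accuracy_score(y_test, preds))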




Malware Detection Using QR and Aztec Code Representations

by Atharva Khadilkar

In recent years, the use of image-based techniques for malware detection has gained prominence, with numerous studies demonstrating the efficacy of deep learning approaches such as convolutional neural networks (CNNs) in classifying images derived from executable files. In this research, we consider an innovative method based on an image conversion process that transforms executable files into QR and Aztec codes. These codes capture structural patterns in a format that may enhance the learning capabilities of CNNs. We design and implement CNN architectures tailored to the unique properties of these codes and apply them in a comprehensive analysis involving two extensive malware datasets, alongside a significant corpus of benign executables. Our results, which surpass those of comparable studies, suggest that the choice of image conversion strategy is crucial, and that QR and Aztec codes offer a promising direction for future research in malware detection.
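
As a rough illustration of the conversion step, the sketch below encodes a prefix of an executable's raw bytes as a QR code image using the third-party qrcode package (pip install qrcode[pil]); an Aztec encoding would proceed analogously with a library such as aztec_code_generator. The chunk size, file path, and image parameters are assumptions, and the thesis's exact pipeline may differ.

# Sketch: convert a chunk of an executable's bytes into a QR code image
# that can later be resized/normalized as input to a CNN.
import qrcode

CHUNK = 1024  # QR capacity caps binary payloads near 3 KB, so take a prefix

with open("sample.exe", "rb") as f:  # hypothetical input file
    payload = f.read(CHUNK)

qr = qrcode.QRCode(
    version=None,  # let the library pick the smallest version that fits
    error_correction=qrcode.constants.ERROR_CORRECT_L,
    box_size=2,
    border=1,
)
qr.add_data(payload)
qr.make(fit=True)

img = qr.make_image(fill_color="black", back_color="white")
img.save("sample_qr.png")  # resize to a fixed shape before feeding the CNN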




An Empirical Analysis of Adversarial Attacks in Federated Learning

by Rohit Mapakshi

In this research, we experimentally analyze the susceptibility of selected Federated Learning (FL) systems to the presence of adversarial clients. We find that temporal attacks significantly affect model performance in FL, especially when the adversaries are active throughout the process and during its final rounds. We consider a wide variety of machine learning models, including Multinomial Logistic Regression, Support Vector Classifier (SVC), neural network models such as the Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM), as well as tree-based models, specifically Random Forest and XGBoost. Our results highlight the effectiveness of temporal attacks and the need to develop strategies that make the FL process more robust. We also explore defense mechanisms, including outlier detection in the aggregation algorithm.
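
The sketch below is a minimal NumPy simulation of the attack scenario described above: a FedAvg loop in which one client flips its labels during the final rounds. The client count, round count, attack window, and logistic-regression local model are illustrative assumptions, not the thesis's experimental setup.

# Minimal FedAvg simulation with a temporal label-flipping adversary.
import numpy as np

rng = np.random.default_rng(0)
n_clients, rounds, d = 10, 50, 20

# Synthetic binary classification data, split evenly across clients
X = rng.normal(size=(2000, d))
true_w = rng.normal(size=d)
y = (X @ true_w > 0).astype(float)
splits = np.array_split(np.arange(2000), n_clients)

def local_update(w, Xc, yc, lr=0.1, steps=5):
    # A few steps of logistic-regression gradient descent on one shard
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xc @ w)))
        w = w - lr * Xc.T @ (p - yc) / len(yc)
    return w

w_global = np.zeros(d)
for r in range(rounds):
    updates = []
    for c, idx in enumerate(splits):
        yc = y[idx]
        # Client 0 is adversarial: it flips its labels in the last 10 rounds
        if c == 0 and r >= rounds - 10:
            yc = 1.0 - yc
        updates.append(local_update(w_global.copy(), X[idx], yc))
    w_global = np.mean(updates, axis=0)  # plain FedAvg aggregation

acc = np.mean(((X @ w_global) > 0) == y)
print(f"accuracy after {rounds} rounds with late-round flipping: {acc:.3f}")

Replacing the plain mean with a trimmed mean or coordinate-wise median over client updates is one simple outlier-robust aggregation of the kind alluded to above.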




Robustness of Learning Models to Label Flipping Attacks

by Sarvagya Bhargava

In this research, we compare traditional machine learning and deep learning models trained on a malware dataset when subjected to adversarial attacks based on label flipping. We investigate the robustness of the various models when faced with misleading labels, assessing their ability to maintain accuracy under such adversarial manipulation. We find that the models differ substantially in their robustness to label flipping attacks.
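
As a simple illustration of this experimental setup, the sketch below flips a fraction of training labels and compares how two model families hold up. Synthetic data stands in for the malware dataset, and the flip rates and model choices are illustrative assumptions.

# Sketch: measure test accuracy as an increasing fraction of
# training labels is adversarially flipped.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
for flip_rate in (0.0, 0.1, 0.25, 0.4):
    y_poisoned = y_tr.copy()
    n_flip = int(flip_rate * len(y_poisoned))
    idx = rng.choice(len(y_poisoned), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # flip the selected labels
    for model in (RandomForestClassifier(random_state=0), SVC()):
        model.fit(X_tr, y_poisoned)
        print(f"flip={flip_rate:.2f} {type(model).__name__}: "
              f"{model.score(X_te, y_te):.3f}")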