
CS298 Proposal

Quantifying Deep Fake Detection Accuracy for a Variety of Natural Settings.

Pratikkumar Prajapati (pratikkumar.prajapati@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Teng Moh, Dr. Mark Stamp

Abstract:

Advances in artificial intelligence have made it possible to generate fake videos that look very realistic. The techniques
used to generate such videos are collectively called Deepfakes. Deepfake videos can have catastrophic consequences for
society. This research focuses on quantifying whether a given video is a Deepfake or real, irrespective of the technique
used to generate it. We will design and implement a Neural Network model to detect Deepfakes and analyze the accuracy of
the model.

CS297 Results

  • Learned to develop various Neural Networks in PyTorch, including ANNs, CNNs, GANs, and AutoEncoders. Applied
    these models to classify and generate Sanskrit characters.
  • Explored Deepfake generation techniques like entire face synthesis, face identity swap, facial attribute manipulation,
    and facial expression manipulation.
  • Studied techniques to generate images of fake human faces and customized the existing DeepPrivacy model to generate
    such images.
  • Explored recurrent convolutional strategies for face manipulation detection in videos.
  • Collected the Deepfake Detection Challenge (DFDC) dataset and the FaceForensics++ (FF++) dataset.

Proposed Schedule

Week 1:
Aug 25 - Aug 31
  • Kick-off meeting; discuss candidate model designs, specifically using pre-trained CNN models such as
    EfficientNet to extract features and an RNN to find temporal inconsistencies across consecutive
    frames of the videos.
Week 2-3:
Sept 1 - Sept 14
  • Develop a PyTorch-based framework to parse the DFDC dataset: extract frames from videos, apply data
    augmentation such as rotation, random noise, and distractors like text and shapes, and prepare
    data loaders to feed the model.
  • Verify that frames are extracted, augmentation is applied to the frames, and all data is passed in batches. A
    dummy model will be used to begin with.
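As a concrete illustration of this step, the sketch below builds a clip-level dataset and data loader in PyTorch. Real DFDC frames are replaced by random tensors, and only two simple augmentations (a clip-consistent horizontal flip and additive Gaussian noise) stand in for the full rotation/noise/distractor pipeline; the class and parameter names are illustrative assumptions, not the project's actual code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FrameDataset(Dataset):
    """Toy stand-in for a DFDC frame dataset: each item is a (frames, label)
    pair, where `frames` is a clip tensor of shape (T, C, H, W)."""
    def __init__(self, num_videos=8, frames_per_video=4, size=32, noise_std=0.05):
        self.data = torch.rand(num_videos, frames_per_video, 3, size, size)
        self.labels = torch.randint(0, 2, (num_videos,))  # 0 = real, 1 = fake
        self.noise_std = noise_std

    def __len__(self):
        return len(self.data)

    def augment(self, frames):
        # Random horizontal flip, applied consistently across the whole clip.
        if torch.rand(1).item() < 0.5:
            frames = torch.flip(frames, dims=[-1])
        # Additive Gaussian noise, one simple noise-style augmentation.
        frames = frames + self.noise_std * torch.randn_like(frames)
        return frames.clamp(0.0, 1.0)

    def __getitem__(self, idx):
        return self.augment(self.data[idx]), self.labels[idx]

loader = DataLoader(FrameDataset(), batch_size=4, shuffle=True)
clips, labels = next(iter(loader))
print(clips.shape)  # torch.Size([4, 4, 3, 32, 32])
```

Batching clips (rather than individual frames) keeps each video's frames together, which the later RNN stage needs to look for temporal inconsistencies.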
Week 4-5:
Sept 15 - Sept 28
  • Develop a basic Neural Network model; implement train, test, and validation methods, as well as methods
    to quantify model accuracy.
  • The model will use pre-trained CNN models such as EfficientNet to extract features from video frames and
    an RNN to find hidden features of fake and real videos across consecutive frames. The RNN will support
    multiple layers and both uni- and bi-directional architectures.
  • Implement methods to save model artifacts such as epoch state and model weights, so the framework can run
    for a longer duration.
  • Train the model on a subset of the DFDC data and quantify the model's accuracy.
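A minimal sketch of this CNN-plus-RNN architecture is given below. A small custom CNN stands in for the pre-trained EfficientNet backbone so the example stays self-contained (no weight downloads); all names and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Small CNN standing in for a pre-trained EfficientNet feature extractor."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class DeepfakeDetector(nn.Module):
    """Per-frame CNN features fed to a GRU (multi-layer, optionally
    bidirectional); the last time step is classified as real vs. fake."""
    def __init__(self, feat_dim=64, hidden=32, num_layers=1, bidirectional=True):
        super().__init__()
        self.encoder = FrameEncoder(feat_dim)
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=num_layers,
                          batch_first=True, bidirectional=bidirectional)
        out_dim = hidden * (2 if bidirectional else 1)
        self.head = nn.Linear(out_dim, 1)

    def forward(self, clips):                     # clips: (B, T, C, H, W)
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1)) # (B*T, feat_dim)
        feats = feats.view(b, t, -1)              # (B, T, feat_dim)
        out, _ = self.rnn(feats)                  # (B, T, out_dim)
        return self.head(out[:, -1])              # (B, 1) logit

model = DeepfakeDetector()
logits = model(torch.rand(2, 4, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 1])
```

The logit would be trained with `nn.BCEWithLogitsLoss`; swapping the encoder for a real pre-trained backbone changes only `FrameEncoder`, not the temporal half of the model.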
Week 6-8:
Sept 29 - Oct 19
  • Enhance the model with techniques such as attention-based focusing, facial expression discrepancy detection,
    temporal discrepancy detection across video frames, and classifier optimization via GAN training.
  • Optimize the loss function.
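One simple form the attention-based enhancement could take is a learned attention pool over per-frame features, sketched below under the assumption that per-frame feature vectors are already available (e.g. from the CNN or RNN stage); the class name and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Learned attention over per-frame features: frames whose features look
    suspicious can receive higher weight than a plain last-step readout."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar relevance score per frame

    def forward(self, feats):                             # feats: (B, T, dim)
        weights = torch.softmax(self.score(feats), dim=1) # (B, T, 1), sums to 1 over T
        return (weights * feats).sum(dim=1)               # (B, dim) weighted average

pool = AttentionPool(dim=64)
pooled = pool(torch.rand(2, 4, 64))
print(pooled.shape)  # torch.Size([2, 64])
```

This pooled vector would replace `out[:, -1]` in the basic detector, letting the classifier attend to whichever frames carry the discrepancy.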
Week 9-11:
Oct 20 - Nov 02
  • Further tune the model by optimizing the feature-selection methods, hyper-parameters, and loss function.
    Train for a longer duration on the entire DFDC dataset.
  • Quantify the output of the model.
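Quantifying the output could, for example, report accuracy together with binary log loss, the metric used to score the DFDC challenge. The sketch below is a minimal stand-alone implementation on toy predictions; the function names and sample values are illustrative.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy (log loss), with probabilities clipped away
    from 0 and 1 for numerical safety."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    correct = sum(int((p >= threshold) == bool(y)) for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

labels = [1, 0, 1, 0]          # ground truth: fake = 1, real = 0
probs = [0.9, 0.2, 0.6, 0.4]   # model's predicted probability of "fake"
print(round(binary_log_loss(labels, probs), 4))  # 0.3375
print(accuracy(labels, probs))                    # 1.0
```

Log loss penalizes confident wrong answers heavily, so it distinguishes models that accuracy alone would rank equally.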
Week 12-16:
Nov 03 - Dec 07
  • Complete project report and slides.

Key Deliverables:

  • Design
    • A Neural Network architecture that learns to classify whether a given video is fake, irrespective of the Deepfake
      technique used to generate it.
  • Software
    • A PyTorch-based framework to parse the DFDC dataset: extract frames from videos, apply data augmentation such as
      rotation, random noise, and distractors like text and shapes, and prepare data loaders to feed a dummy model.
    • Implementation of a basic model using a pre-trained EfficientNet CNN to extract features and an RNN to learn
      hidden features of the video. Enhancement of the framework with train, test, and validation loops. Accuracy
      quantified by training the model on a subset of the DFDC dataset.
    • The enhanced model, with techniques such as attention-based focusing on parts of the video, facial expression
      discrepancy detection, temporal discrepancy detection across video frames, and classifier optimization via GAN training.
    • A fully functional and tuned PyTorch-based Neural Network framework that quantifies whether the input videos are real
      or Deepfake, irrespective of the method used to generate them.
  • Report
    • CS 298 Report.
    • CS 298 Presentation.

Innovations and Challenges

  • A PyTorch-based framework to parse the DFDC dataset is innovative and challenging because data augmentation such as
    random noise and distractors like text and shapes requires deeper knowledge of various augmentation techniques and of the PyTorch framework.
  • Using pre-trained models, e.g., EfficientNet, to extract features from a given image involves understanding state-of-the-art
    models and applying transfer learning to utilize them effectively. This can improve overall training time and efficiency but
    requires knowledge of advanced PyTorch framework methods to apply it to the given problem.
  • Applying the latest state-of-the-art techniques, such as attention modeling and detecting facial expression and temporal
    discrepancies across consecutive frames, is very difficult. Moreover, the latest GAN-generated Deepfakes do not leave
    GAN watermarks, so detecting them is even more challenging.
  • Various techniques, such as entire face synthesis, face identity swap, facial attribute manipulation, and facial expression
    manipulation, are used to generate fake images and videos. Designing a model to detect Deepfakes is challenging because a
    model trained on Deepfakes generated with one technique does not work well on Deepfakes generated with other techniques [4].
    Designing an overall model architecture that classifies Deepfakes irrespective of the generation technique used is innovative
    and complex.

References:

[1] E. Sabir, J. Cheng, A. Jaiswal, W. AbdAlmageed, I. Masi, and P. Natarajan, "Recurrent Convolutional Strategies for Face
Manipulation Detection in Videos," in Proc. Conference on Computer Vision and Pattern Recognition Workshops, 2019.

[2] Deepfake Detection Challenge (DFDC) dataset, https://ai.facebook.com/datasets/dfdc/. Accessed: 2020-01-30.

[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.

[4] R. Tolosana, et al., "DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection," arXiv preprint
arXiv:2001.00179, 2020.

[5] A. Vaswani, et al., "Attention Is All You Need," arXiv preprint arXiv:1706.03762, 2017.

[6] S. Li and W. Deng, "Deep Facial Expression Recognition: A Survey," arXiv preprint arXiv:1804.08348, 2018.