CS 297 Proposal

Synthesizing Video from Given Frames

Lei Zhang (lei.zhang01@sjsu.edu)

Advisor: Dr. Chris Pollett

Description:

Since Ian Goodfellow and colleagues introduced the Generative Adversarial Network (GAN) in 2014 [7], GANs have been an active research area. Researchers have published many papers that use GANs to create fake human faces, turn an image of a horse into an image of a zebra, and generate animations from screenplays. In this project, we will develop an app that takes a single photo as input and generates a short output video of a particular kind. For example, if the input is a photo of a head, one kind of output might be a video of the head turning. The available kinds are learned from a training set of videos. With this technology, AI could be used to generate longer fake video sequences. This work extends the earlier work of [1] and [3], in which a start keyframe and an end keyframe were used to generate videos. Here, rather than supplying the end frame, we supply the category of video we want.
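
The difference between the two kinds of input can be sketched as follows. This is only an illustration under stated assumptions: the function names `inbetween_input` and `category_input`, the category list, and the added noise vector are hypothetical and are not taken from [1] or [3], and frames are stand-in nested lists rather than real images.

```python
import random

# Hypothetical list of video kinds learned from the training set.
CATEGORIES = ["head_turning", "nodding", "smiling"]


def one_hot(category, categories=CATEGORIES):
    """Encode a category name as a one-hot list."""
    if category not in categories:
        raise ValueError(f"unknown category: {category}")
    return [1.0 if c == category else 0.0 for c in categories]


def inbetween_input(start_frame, end_frame):
    """Input for the earlier setting: a start and an end keyframe."""
    return {"start": start_frame, "end": end_frame}


def category_input(start_frame, category, noise_dim=8, seed=None):
    """Input for the proposed setting: one frame plus a video category.

    The noise vector is an assumption, added so that one photo and one
    category could yield more than one plausible video.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return {"start": start_frame, "label": one_hot(category), "noise": noise}


if __name__ == "__main__":
    photo = [[0.0] * 4 for _ in range(4)]  # stand-in 4x4 grayscale image
    cond = category_input(photo, "head_turning", seed=0)
    print(cond["label"], len(cond["noise"]))
```

In this sketch the end keyframe is replaced by a label vector, so the generator would have to learn the motion for each category from the training videos instead of interpolating toward a known target.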

Schedule:

Week 1: Aug 21 - Aug 27	First meeting.
Week 2: Aug 28 - Sep 3	Read paper [7]
Week 3: Sep 4 - Sep 10	Read GANs in Action, chapters 1-4
Week 4: Sep 11 - Sep 17	Deliverable 1 due
Week 5: Sep 18 - Sep 24	Read paper [6]
Week 6: Sep 25 - Oct 1	Read paper [5]
Week 7: Oct 2 - Oct 8	Deliverable 2 due
Week 8: Oct 9 - Oct 15	Read paper [4]
Week 9: Oct 16 - Oct 22	Work on Deliverable 3
Week 10: Oct 23 - Oct 29	Deliverable 3 due
Week 11: Oct 30 - Nov 5	Use Keras ConvLSTM to predict the next frame
Week 12: Nov 6 - Nov 12	Read paper [8]
Week 13: Nov 13 - Nov 19	Try to improve video generation with techniques similar to TGAN and MoCoGAN
Week 14: Nov 20 - Nov 26	Deliverable 4 due
Week 15: Nov 27 - Dec 3	Work on CS 297 report
Week 16: Dec 4 - Dec 10	CS 297 report due
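
For the Week 11 next-frame task, a training set is commonly built by pairing each window of consecutive frames with the frame that follows it. A minimal sketch, assuming a video is a plain Python list of frames; the helper name `next_frame_pairs` and the window size are illustrative and are not part of Keras.

```python
def next_frame_pairs(frames, window=3):
    """Return (input_window, target_frame) pairs from one video.

    Each input is `window` consecutive frames; the target is the frame
    immediately after that window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for start in range(len(frames) - window):
        pairs.append((frames[start:start + window], frames[start + window]))
    return pairs


if __name__ == "__main__":
    video = ["f0", "f1", "f2", "f3", "f4"]  # stand-ins for image arrays
    for inputs, target in next_frame_pairs(video, window=3):
        print(inputs, "->", target)
```

A video shorter than or equal to the window length yields no pairs, so very short clips would need to be filtered out or padded before training.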

Deliverables:

The full project will be completed by the end of CS 298. The following will be done by the end of CS 297:

1. Implement a simple GAN in Python and use it to generate Chinese numeral characters

2. Implement a video GAN to create fake videos

3. Explore creating video with pix2pix

4. Create a GAN-based framework that generates fake videos from a single input picture

5. Final CS 297 report
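
The simple GAN in Deliverable 1 rests on the two-player objective from [7]. A minimal sketch of the per-batch losses, assuming the discriminator outputs probabilities strictly between 0 and 1; the function names are illustrative, and the generator loss shown is the non-saturating variant suggested in [7] rather than the original minimax form.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def discriminator_loss(d_real, d_fake):
    """Negative of mean log D(x) plus mean log(1 - D(G(z))).

    d_real: discriminator outputs on real samples.
    d_fake: discriminator outputs on generated samples.
    """
    real_term = _mean([math.log(p) for p in d_real])
    fake_term = _mean([math.log(1.0 - p) for p in d_fake])
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating generator loss: negative mean log D(G(z))."""
    return -_mean([math.log(p) for p in d_fake])


if __name__ == "__main__":
    # A discriminator that cannot tell real from fake outputs 0.5 everywhere.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    print(generator_loss([0.5, 0.5]))
```

When the discriminator outputs 0.5 for every sample, the discriminator loss equals log 4 (about 1.386), which gives a reference value to watch for while training.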

References:

[1] Li, Yunpeng, Dominik Roblek, and Marco Tagliasacchi. "From Here to There: Video Inbetweening Using Direct 3D Convolutions." arXiv preprint arXiv:1905.10240 (2019).

[2] E. Zakharov, A. Shysheya, E. Burkov, and V. Lempitsky, “Few-shot adversarial learning of realistic neural talking head models,” 2019.

[3] Clark, Aidan, Jeff Donahue, and Karen Simonyan. "Efficient Video Generation on Complex Datasets." arXiv preprint arXiv:1907.06571 (2019).

[4] Wang, Ting-Chun, et al. "High-resolution image synthesis and semantic manipulation with conditional gans." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

[5] Ji, Shuiwang, et al. "3D convolutional neural networks for human action recognition." IEEE transactions on pattern analysis and machine intelligence 35.1 (2012): 221-231.

[6] Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." Advances In Neural Information Processing Systems. 2016.

[7] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In NIPS’2014.

[8] M. Saito, E. Matsumoto, and S. Saito, "Temporal generative adversarial nets with singular value clipping," in IEEE International Conference on Computer Vision (ICCV), 2017.