CS297 Proposal
Synthesizing Video From Given Frames
Lei Zhang (lei.zhang01@sjsu.edu)
Advisor: Dr. Chris Pollett
Description:
Since Ian Goodfellow introduced the Generative Adversarial Network (GAN) in 2014, GANs have been an active research area. Researchers have published many papers that use GANs to create fake human faces, turn an image of a horse into an image of a zebra, and generate animations from screenplays. In this project, we will develop an app that takes a single photo as input and generates a short output video of a particular kind. For example, if the input is a photo of a head, one kind of video might show the head turning. The available kinds are learned from a training set of videos. With this technology, AI can be used to generate longer fake video sequences. This work extends earlier work [1] [3], in which a start keyframe and an end keyframe were used to generate videos. Here, rather than an end frame, we supply the category of video we want.
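As a rough illustration of the intended interface (one photo plus a video category in, a short clip out), the following Python sketch shows one possible way to combine the two inputs. Everything in it is an assumption made for illustration: the category names, the choice to append the category as constant one-hot channels, the 16-frame clip length, and the placeholder generator, which only fixes the output shape and does not synthesize motion. It is not the model this project will necessarily use.

```python
import numpy as np

# Hypothetical category names; the real set would come from the training videos.
CATEGORIES = ["head_turning", "smiling", "nodding"]


def one_hot(category, categories=CATEGORIES):
    """Encode a category name as a one-hot vector."""
    vec = np.zeros(len(categories), dtype=np.float32)
    vec[categories.index(category)] = 1.0
    return vec


def build_generator_input(photo, category, categories=CATEGORIES):
    """Attach the category to the photo as extra constant channels.

    photo: array of shape (height, width, channels).
    Returns an array of shape (height, width, channels + len(categories)).
    """
    height, width, _ = photo.shape
    label = one_hot(category, categories)
    # Broadcast the one-hot label to one constant plane per category.
    label_planes = np.broadcast_to(label, (height, width, len(categories)))
    return np.concatenate([photo, label_planes], axis=-1)


def placeholder_generator(generator_input, num_frames=16, image_channels=3):
    """Stand-in for a trained generator: repeats the photo for every frame.

    A real model would synthesize motion; this only fixes the output shape
    (num_frames, height, width, image_channels).
    """
    photo = generator_input[..., :image_channels]
    return np.repeat(photo[np.newaxis, ...], num_frames, axis=0)


photo = np.random.rand(64, 64, 3).astype(np.float32)
gen_input = build_generator_input(photo, "head_turning")
video = placeholder_generator(gen_input)
print(gen_input.shape, video.shape)
```

Appending the category as extra channels is only one common way to condition a generator; an embedding of the category fed into a later layer would be an equally plausible alternative.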
Schedule:
Week 1: Aug 21 - Aug 27 | First meeting
Week 2: Aug 28 - Sep 3 | Read paper [7]
Week 3: Sep 4 - Sep 10 | Read GANs in Action, chapters 1-4
Week 4: Sep 11 - Sep 17 | Deliverable 1 due
Week 5: Sep 18 - Sep 24 | Read paper [6]
Week 6: Sep 25 - Oct 1 | Read paper [5]
Week 7: Oct 2 - Oct 8 | Deliverable 2 due
Week 8: Oct 9 - Oct 15 | Read paper [4]
Week 9: Oct 16 - Oct 22 | Work on Deliverable 3
Week 10: Oct 23 - Oct 29 | Deliverable 3 due
Week 11: Oct 30 - Nov 5 | Use Keras ConvLSTM to predict the next frame
Week 12: Nov 6 - Nov 12 | Read paper [8]
Week 13: Nov 13 - Nov 19 | Try to improve video generation with techniques similar to TGAN and MoCoGAN
Week 14: Nov 20 - Nov 26 | Deliverable 4 due
Week 15: Nov 27 - Dec 3 | Work on CS 297 report
Week 16: Dec 4 - Dec 10 | CS 297 report due
Deliverables:
The full project will be done when CS298 is completed. The following will
be done by the end of CS297:
1. Implement a simple GAN in Python and use it to generate images of digits written as Chinese characters
2. Implement a video GAN to create fake videos
3. Explore creating video with pix2pix
4. Create a GAN-based framework that generates fake videos from a single input picture
5. Final CS 297 report.
References:
[1] Li, Yunpeng, Dominik Roblek, and Marco Tagliasacchi. "From Here to There: Video Inbetweening Using Direct 3D Convolutions." arXiv preprint arXiv:1905.10240 (2019).
[2] Zakharov, E., A. Shysheya, E. Burkov, and V. Lempitsky. "Few-shot adversarial learning of realistic neural talking head models." 2019.
[3] Clark, Aidan, Jeff Donahue, and Karen Simonyan. "Efficient Video Generation on Complex Datasets." arXiv preprint arXiv:1907.06571 (2019).
[4] Wang, Ting-Chun, et al. "High-resolution image synthesis and semantic manipulation with conditional gans." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[5] Ji, Shuiwang, et al. "3D convolutional neural networks for human action recognition." IEEE transactions on pattern analysis and machine intelligence 35.1 (2012): 221-231.
[6] Vondrick, Carl, Hamed Pirsiavash, and Antonio Torralba. "Generating videos with scene dynamics." Advances in Neural Information Processing Systems. 2016.
[7] Goodfellow, I. J., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.
[8] Saito, M., E. Matsumoto, and S. Saito. "Temporal generative adversarial nets with singular value clipping." IEEE International Conference on Computer Vision (ICCV). 2017.