
CS298 Proposal

Sign Language Assistant.

Akshay Kajale (akshay.kajale@sjsu.edu)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Robert Chun, Mr. Kiran Salte

Abstract:

Emotion recognition is one of the most actively researched topics in modern machine learning. Emotions can be recognized in several ways, for example from facial expressions, body posture, or tone of speech. The focus of this research project is to develop a prototype emotion recognition system as a hybrid model that combines computer vision and natural language processing techniques. The hybrid system will use video feeds of different facial expressions together with speech features to recognize emotions. The resulting machine learning model will be deployed in an Android application to predict emotion. The application can access the front and back cameras simultaneously and can capture audio features of the people speaking in front of the cameras. The hybrid model is implemented as a neural network. Our prototype will operate on videos of a Unity 3D humanoid model performing different facial expressions.

CS297 Results:

  • Developed a sequential and a depth-wise convolutional model to detect facial expressions (see the sketch after this list).
  • Created a synthetic dataset of different facial expressions, generated in Unity, to be used for model training and testing.
  • Developed an Android application on which the deep learning model will be deployed. The application can access the front and rear cameras simultaneously so that it can detect the expressions of the people facing each camera.
  • Deployed the initial model in the Android application. The model was first converted to a TensorFlow Lite model and then deployed on an Android device.
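
A minimal sketch of the kind of depth-wise convolutional classifier and TensorFlow Lite conversion described above. The 48x48 grayscale input shape and the seven emotion classes are assumptions for illustration; the actual values come from the CS297 dataset and its label set.

    # Sketch of a depth-wise separable CNN for facial-expression classification,
    # followed by conversion to TensorFlow Lite for Android deployment.
    # Input shape (48x48 grayscale) and the 7 emotion classes are assumptions.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_expression_model(input_shape=(48, 48, 1), num_classes=7):
        model = models.Sequential([
            layers.Input(shape=input_shape),
            layers.SeparableConv2D(32, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            layers.SeparableConv2D(64, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            layers.SeparableConv2D(128, 3, padding="same", activation="relu"),
            layers.GlobalAveragePooling2D(),
            layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    model = build_expression_model()
    # model.fit(train_images, train_labels, epochs=10)  # train on the Unity dataset

    # Convert the trained Keras model to TensorFlow Lite for the Android app.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    with open("expression_model.tflite", "wb") as f:
        f.write(tflite_model)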

Proposed Schedule:

Week 1 (Jan 27 - Feb 3): Kick-off meeting and review of the CS298 proposal.
Week 2-4 (Feb 4 - Feb 16):
  • Continue enhancing the Android application so that the live video stream is passed to the ML model for prediction.
  • Analyze and experiment with existing audio-based emotion recognition techniques (a feature-extraction sketch follows this schedule).
Week 4-6 (Feb 17 - Mar 9):
  • Finalize the technique based on the experiments.
  • Generate a synthetic dataset that includes both facial expressions and speech.
Week 7-8 (Mar 10 - Mar 24): Combine both ML models into a hybrid model that predicts emotion from both facial expressions and audio.
Week 9-11 (Mar 25 - Apr 13): Improve the accuracy of the model through further experiments and hyperparameter tuning.
Week 12-16 (Apr 14 - May 11): Complete the final report and the project slides.
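
As a starting point for the audio experiments above, a minimal sketch of speech feature extraction using MFCCs with the librosa library. The library choice, file name, and parameter values are assumptions for illustration, not decisions made in this proposal.

    # Sketch of extracting speech features (MFCCs) for a speech-emotion model.
    # librosa, the file name, and the parameter values are illustrative assumptions.
    import numpy as np
    import librosa

    def extract_mfcc(wav_path, sample_rate=16000, n_mfcc=40):
        # Load the audio clip and compute per-frame MFCCs.
        audio, sr = librosa.load(wav_path, sr=sample_rate)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
        # Average over time to get one fixed-length feature vector per clip.
        return np.mean(mfcc, axis=1)

    features = extract_mfcc("sample_clip.wav")  # shape: (40,)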

Key Deliverables:

  • Design
    • Design a neural network architecture to detect emotion based on facial expressions and audio.
  • Software
    • Implement a hybrid model that can detect emotions from both facial expressions and speech (see the fusion sketch after this list).
    • Develop an Android application that can access both cameras simultaneously. The model will be deployed in this application, which will detect the emotions of the people facing the front and back cameras.
    • Update the application to record the conversation between the people so that emotion can be detected from speech.
    • Evaluate and improve the accuracy of the model on synthetic and real-world datasets.
  • Report
    • Document the code.
    • Write final report.
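
A minimal sketch of one possible fusion architecture for the hybrid deliverable, assuming a 48x48 grayscale face crop for the vision branch and a 40-dimensional MFCC vector for the speech branch. The shapes, layer sizes, and seven emotion classes are illustrative assumptions, not the final design.

    # Sketch of a two-branch (facial expression + speech) fusion network in Keras.
    # Input shapes, layer sizes, and the 7 emotion classes are illustrative assumptions.
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    def build_hybrid_model(num_classes=7):
        # Vision branch: depth-wise separable convolutions over a face crop.
        face_in = layers.Input(shape=(48, 48, 1), name="face")
        x = layers.SeparableConv2D(32, 3, padding="same", activation="relu")(face_in)
        x = layers.MaxPooling2D()(x)
        x = layers.SeparableConv2D(64, 3, padding="same", activation="relu")(x)
        x = layers.GlobalAveragePooling2D()(x)

        # Speech branch: dense layer over a fixed-length MFCC feature vector.
        audio_in = layers.Input(shape=(40,), name="mfcc")
        y = layers.Dense(64, activation="relu")(audio_in)

        # Late fusion: concatenate both branches and classify the emotion.
        merged = layers.Concatenate()([x, y])
        out = layers.Dense(num_classes, activation="softmax")(merged)

        model = Model(inputs=[face_in, audio_in], outputs=out)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    hybrid = build_hybrid_model()
    # hybrid.fit({"face": face_batch, "mfcc": mfcc_batch}, labels, epochs=10)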

Innovations and Challenges:

  • Building an Android application that can access both cameras simultaneously and predict the emotions of the people facing the front and back cameras at the same time.
  • Developing a hybrid model that considers both facial expressions and speech when predicting a person's emotion is challenging.
  • Developing a model architecture that remains accurate on real-life videos.
  • Deploying the hybrid model in the Android application is challenging.

References

[1] L. Zhang, Y. Yang, W. Li, S. Dang and M. Zhu, "Research of Facial Expression Recognition Based on Deep Learning," 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 2018, pp. 1-4, doi: 10.1109/ICSESS.2018.8663777.

[2] M. Xu, W. Cheng, Q. Zhao, L. Ma and F. Xu, "Facial expression recognition based on transfer learning from deep convolutional networks," 2015 11th International Conference on Natural Computation (ICNC), Zhangjiajie, 2015, pp. 702-708, doi: 10.1109/ICNC.2015.7378076.

[3] A. Fathallah, L. Abdi and A. Douik, "Facial Expression Recognition via Deep Learning," 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), Hammamet, 2017, pp. 745-750, doi: 10.1109/AICCSA.2017.124.

[4] D. Kalita, "Designing of Facial Emotion Recognition System Based on Machine Learning," 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 2020, pp. 969-972, doi: 10.1109/ICRITO48877.2020.9197771.

[5] L. B. Letaifa, M. I. Torres and R. Justo, "Adding dimensional features for emotion recognition on speech," 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 2020, pp. 1-6, doi: 10.1109/ATSIP49331.2020.9231766.

[6] P. Ekman, "Darwin's contributions to our understanding of emotional expressions," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 364, no. 1535, pp. 3449-3451, 2009, doi: 10.1098/rstb.2009.0189.