An AI-powered application that acts as a personal public speaking advisor. Upload a video of yourself speaking, and the AI will perform a holistic analysis of your communication skills, providing a comprehensive report with actionable feedback to help you improve.
Multimodal Analysis: The coach analyzes both verbal and non-verbal communication channels:
- 🎭 Visual Analysis: Detects facial expressions and emotions through computer vision techniques
- 👁️ Eye Contact: Measures how consistently you look at the camera
- 📝 Speech Analysis: Transcribes speech and analyzes pacing (WPM), use of filler words, and sentiment
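As a rough illustration, the pacing (WPM) and filler-word metrics can be derived from the transcript alone. This is a hedged sketch, not the app's actual code: the `speech_metrics` helper and the filler list are illustrative.

```python
from collections import Counter

# An illustrative set of verbal fillers; the app's actual list may differ.
FILLER_WORDS = {"um", "uh", "like", "so", "actually", "basically"}

def speech_metrics(transcript, duration_seconds):
    """Compute word count, words-per-minute, and filler-word counts."""
    words = transcript.lower().split()
    wpm = len(words) / (duration_seconds / 60)
    fillers = Counter(w for w in words if w in FILLER_WORDS)
    return {"word_count": len(words), "wpm": round(wpm, 1), "fillers": dict(fillers)}

# 10 words spoken in 10 seconds -> 60.0 WPM, well under a typical 120-160 range.
report = speech_metrics("so um I think this feature is um basically ready", 10.0)
```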
Comprehensive Reports: Generate detailed assessments with:
- 📊 Emotion Distribution: Visual radar charts showing expression variety
- 📈 Speech Metrics: Word count, pace, filler word frequency
- 🖼️ Expression Samples: Visual examples of your different facial expressions
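For instance, the per-frame emotion labels behind the radar chart reduce to a simple frequency distribution before plotting. A minimal sketch, assuming one emotion label per analyzed frame (the label names are illustrative):

```python
from collections import Counter

def emotion_distribution(frame_labels):
    """Turn per-frame emotion labels into percentage shares for a radar chart."""
    counts = Counter(frame_labels)
    total = sum(counts.values())
    return {emotion: round(100 * n / total, 1) for emotion, n in counts.items()}

# Hypothetical per-frame output of the vision stage.
labels = ["neutral", "happy", "neutral", "surprised", "happy", "happy"]
dist = emotion_distribution(labels)
```

Feeding these percentages to matplotlib's polar axes yields the radar chart described above.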
Actionable Feedback: Receive personalized suggestions for improvement in:
- 🗣️ Speaking Style: Pace, filler word reduction, speech clarity
- 😊 Emotional Expression: Expressiveness, variety, appropriateness
- 📹 Camera Presence: Positioning, eye contact, engagement
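Eye-contact consistency, for example, can be summarized as the fraction of frames in which the detected gaze falls on the camera. The per-frame booleans below are stand-ins for the vision model's output, not the app's real data structures:

```python
def eye_contact_score(looking_at_camera):
    """Share of analyzed frames (0-100) where gaze was on the camera."""
    if not looking_at_camera:
        return 0.0
    return round(100 * sum(looking_at_camera) / len(looking_at_camera), 1)

# Hypothetical per-frame detections: True = gaze on camera in that frame.
frames = [True, True, False, True, False, True, True, True]
score = eye_contact_score(frames)  # 6 of 8 frames -> 75.0
```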
This project orchestrates several state-of-the-art open-source technologies:
- Backend: Python
- AI / Machine Learning:
  - Speech Recognition: `openai-whisper` for accurate speech-to-text
  - Sentiment Analysis: Hugging Face `transformers` for text sentiment evaluation
  - Computer Vision: `OpenCV` for face detection and expression analysis
- Visualization: `matplotlib` for data visualization and charting
- Frontend Interface: `gradio` for the interactive web UI
- Media Processing: `ffmpeg-python` for audio extraction
The application follows a streamlined data processing pipeline:
```mermaid
graph TD
    A[🎥 Video Upload] --> B{Extract Audio};
    B --> C[🎧 Audio File];
    A --> D[🎞️ Video Frames];
    subgraph "Parallel Analysis"
        D --> E[👁️ OpenCV: Facial Analysis & Emotion Detection];
        C --> G[📝 Whisper: Speech-to-Text];
    end
    G --> H[🧐 Text Analysis: Pace, Fillers, Sentiment];
    subgraph "Data Integration"
        E --> I{🧠 Analysis Engine};
        H --> I;
    end
    I --> J[📊 Generate Visual Reports & Suggestions];
    J --> K[🖥️ Display in Gradio Interface];
```

Prerequisites:

- Python 3.8+
- FFmpeg installed on your system
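The data-flow diagram above can be approximated with stubbed stages. Every function body here is a placeholder for the real ffmpeg-python/Whisper/OpenCV/transformers calls, shown only to make the branching and merge points concrete:

```python
def extract_audio(video_path):          # ffmpeg-python in the real app
    return video_path + ".wav"

def transcribe(audio_path):             # openai-whisper in the real app
    return "hello everyone um welcome"

def analyze_text(transcript):           # pace, fillers, sentiment
    words = transcript.split()
    return {"word_count": len(words), "fillers": sum(w == "um" for w in words)}

def analyze_frames(video_path):         # OpenCV facial/emotion analysis
    return {"dominant_emotion": "neutral", "eye_contact_pct": 72.0}

def run_pipeline(video_path):
    """Fan out into audio and visual branches, then merge into one report."""
    audio = extract_audio(video_path)          # A -> B -> C
    transcript = transcribe(audio)             # C -> G
    report = {"transcript": transcript}
    report.update(analyze_text(transcript))    # G -> H -> I
    report.update(analyze_frames(video_path))  # D -> E -> I
    return report                              # I -> J

report = run_pipeline("talk.mp4")
```

In the real app the two branches can run in parallel; this sketch runs them sequentially for clarity.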
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/ai-communication-coach.git
  cd ai-communication-coach
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the application:

  ```bash
  python app.py
  ```

The web interface will be available at http://localhost:7860.
- Upload Your Video: Use the upload button or record directly in your browser
- Wait for Processing: The AI will analyze your video (this may take a few moments)
- Review Your Report: Explore the comprehensive analysis of your presentation
- Implement Feedback: Apply the personalized suggestions to improve your skills
The AI Communication Coach provides a detailed assessment including:
- Speech Transcription: Full text of your presentation
- Sentiment Analysis: The emotional tone of your words
- Speaking Pace: Words per minute compared to ideal ranges
- Filler Word Detection: Frequency and types of verbal fillers
- Facial Expression Analysis: Distribution of emotions detected
- Eye Contact Measurement: Consistency of camera engagement
- Visual Presence: Assessment of positioning and framing
Contributions are welcome! If you'd like to improve the AI Communication Coach:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add some amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for speech recognition
- Hugging Face Transformers for sentiment analysis
- OpenCV for computer vision capabilities
- Gradio for the interactive web interface
- Matplotlib for data visualization
For questions or feedback, please open an issue on the GitHub repository.