Detects speech in media playing on a mobile phone in real time and automatically generates subtitles in the corresponding language.
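The description does not say how speech is detected, so the following is only a minimal sketch of one common first step: energy-based voice activity detection over fixed-size audio frames. The function names (`frame_rms`, `detect_speech_segments`), the 20 ms frame size, and the `0.02` threshold are all illustrative assumptions, not part of the project; samples are assumed to be mono floats in the range -1.0 to 1.0.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each consecutive full frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech_segments(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame RMS reaches the threshold.

    Hypothetical helper: the threshold and frame size are assumed values.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_rms(samples, frame_len)
    segments = []
    start = None
    for idx, energy in enumerate(energies):
        t = idx * frame_len / sample_rate  # start time of this frame
        if energy >= threshold and start is None:
            start = t  # a loud run begins
        elif energy < threshold and start is not None:
            segments.append((start, t))  # the loud run ended at this frame
            start = None
    if start is not None:
        # Audio ended while still loud: close the last segment.
        segments.append((start, len(energies) * frame_len / sample_rate))
    return segments
```

In a full pipeline, each detected span would presumably be passed to a speech recognizer, and the recognized text would be shown as a subtitle for that time range. Those stages are outside this sketch.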