Abstract: Speech conveys not only the intended linguistic message but also the emotional state of the speaker. Recent findings suggest that emotion is integral to rational and intelligent decision making and helps us express and relate our feelings. Speech prosody is one of the most important communicative channels for conveying emotion, and the pitch contour is among the speech features most strongly affected by emotional modulation, which makes it useful for deciding whether an utterance is neutral or emotional. In this project, the pitch contour is analyzed to extract salient features for emotion detection. Statistics of the fundamental frequency F0 are computed from the pitch contour: mean, minimum, maximum, standard deviation, kurtosis, skewness, upper and lower quartiles, and number of inflexions. F0 is a prosodic property of the speech source; computing global statistics of the pitch contour over an entire utterance or sentence is referred to as sentence-level analysis, while extracting pitch features over the voiced regions only is referred to as voiced-level analysis. A probability density function is estimated for each feature, for every emotional database and for the reference neutral database. The Kullback-Leibler divergence (KLD) is then used to discriminate neutral from emotional speech: features whose emotional distributions diverge most from the neutral reference are ranked as the best features, while those with little divergence rank lowest. A Gaussian mixture model (GMM), a parametric model, is trained on these best features and tested to identify emotional speech. The results are expected to indicate that emotional modulation is not uniformly distributed over time within an utterance or across different languages.
Keywords: Emotional speech analysis, emotional speech recognition, expressive speech, KLD, pitch contour analysis.
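
As an illustration of the pipeline sketched in the abstract, the following Python fragment computes sentence-level pitch statistics, ranks each statistic by its KL divergence from a neutral reference, and classifies utterances with per-class GMMs. This is a minimal sketch, not the paper's implementation: it assumes F0 contours have already been extracted by an external pitch tracker, all function names are hypothetical, and the inflexion count is omitted for brevity.

import numpy as np
from scipy.stats import skew, kurtosis, gaussian_kde
from sklearn.mixture import GaussianMixture

def contour_features(f0):
    """Sentence-level statistics of one F0 contour (unvoiced frames = 0)."""
    voiced = f0[f0 > 0]                          # keep voiced frames only
    q1, q3 = np.percentile(voiced, [25, 75])
    return np.array([voiced.mean(), voiced.min(), voiced.max(),
                     voiced.std(), kurtosis(voiced), q3, q1, skew(voiced)])

def kld(p_samples, q_samples, grid_size=200):
    """Approximate KL divergence D(p||q) between two 1-D sample sets via KDE."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    x = np.linspace(lo, hi, grid_size)
    p = gaussian_kde(p_samples)(x) + 1e-12       # smoothed density estimates
    q = gaussian_kde(q_samples)(x) + 1e-12
    p, q = p / p.sum(), q / q.sum()              # normalize on the grid
    return float(np.sum(p * np.log(p / q)))

def rank_features(neutral_feats, emotional_feats):
    """KLD per feature column; larger divergence = more emotionally salient."""
    return [kld(emotional_feats[:, j], neutral_feats[:, j])
            for j in range(neutral_feats.shape[1])]

def train_gmms(neutral_feats, emotional_feats, n_components=4):
    """Fit one GMM per class on (n_utterances, n_features) matrices."""
    gmm_neu = GaussianMixture(n_components).fit(neutral_feats)
    gmm_emo = GaussianMixture(n_components).fit(emotional_feats)
    return gmm_neu, gmm_emo

def classify(feats, gmm_neu, gmm_emo):
    """Label a test utterance by the higher GMM log-likelihood."""
    feats = feats.reshape(1, -1)
    return "emotional" if gmm_emo.score(feats) > gmm_neu.score(feats) else "neutral"

In practice, only the top-ranked features from rank_features would be kept before fitting the GMMs, mirroring the feature-selection step described in the abstract.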