Emotions & Emotional Prosody
Understanding how sound encodes and conveys emotion.
🌐 Emotions
Emotional interpretation can vary by language, country, culture, age, and more. There are so many variables that it is a wonder we can communicate emotions at all. Yet some emotions, such as anger, seem to be interpreted almost universally, and even across animal species (your dog surely knows when you are angry).
Adding to the variability, each emotion spans its own range of intensity. Take anger: it runs from Low Anger (low pitch, slow/deliberate rate, monotone) → Hot Anger (high/varied pitch, yelling). Try starting with low anger and building to hot anger; it is a familiar progression.
🎙 Emotions in Singing
Beyond the lyrics, emotion is conveyed through the "paralinguistic" aspects of speech. Lyrics are what you say; how you say them is what conveys the emotion. This layer of expression is called Emotional Prosody: all the non-verbal aspects of your voice, such as pitch, tone, and loudness or softness.
- Prosody: The patterning of sound that shapes meaning in spoken or sung communication, including both linguistic structure and emotional expression.
- Linguistic Prosody: The encoding of language through variations in sound (phrase and word structure, intonation, emphasis).
- Emotional Prosody: The conveying of emotional meaning through variations in sound, independent of the words themselves.
📊 Emotional Prosody - General Characteristics
How does sound alone convey emotion? Listeners interpret the following general characteristics:
🧠 Three Acoustic Dimensions
Emotional meaning in sound is realized when the brain interprets three acoustic dimensions:
Together, the brain's evaluation of these aspects of the sound and the resulting bodily response are what we feel as an emotion. This happens in humans and many animals alike.
If you have ever tried to communicate with a pet, you were using emotional prosody. The meaning of the words is irrelevant: the pet recognizes the audible characteristics and identifies patterns, and the sounds trigger its own emotional response.
These extremes of arousal and valence are interpreted fairly universally. But even when an emotion is obvious to the ear, it can be difficult to quantify the characteristics that convey it. Take sarcasm: it may be easy to identify in familiar company but hard to pin down to exact inflections, and some listeners miss it entirely. Mapping specific sound characteristics to discrete emotions is complex because so many variables are involved.
🗺 Mapping Emotional Prosody
Emotional Prosody has been studied for many years, yet only a very general mapping exists. Ongoing research is training AI models to identify and use emotion in speech.
Efforts to map common acoustic properties to emotion categories examine pitch, temporal features (articulation, rhythm, rate), loudness, and timbre (intonation, vocal quality).
Most research on Emotional Prosody focuses on spoken communication, not singing.
Acoustic Emotion Mapping
Figure: academic graph of vocal emotion mappings, circa 1996 (illustration not yet included).
○ Unemotional - A Contrast
Some songs or performances are “emotionally blank.” They may convey an energy level (excited, animated, sedate, lethargic) but no specific emotion. These states of arousal are not themselves emotions.
Sometimes this is the desired effect. Punk is fast and loud but not necessarily angry; Emo can sound emotionally drained even when the lyrics are emotional. But sometimes the audible cues for a more specific emotion are simply missing. The singer is expressing something, but the cues for emotional valence (positive/negative) or appraisal (the reason behind the energy) aren’t clear enough to name it.
Non-Emotions / Unemotional Arousal States
High arousal and low arousal are not emotions in themselves; they are physiological states that can carry emotion but do not inherently encode one. A voice that is loud and fast reads as energized; a voice that is soft and slow reads as calm or sedate. Without the additional cues that signal valence and appraisal, the listener recognizes only the arousal level, not the specific feeling behind it.