Emotions & Emotional Prosody

Understanding how sound encodes and conveys emotion.

🌐 Emotions

Emotional interpretation can vary by language, country, culture, age, and more. With so many variables, it is a wonder we can communicate emotions at all. It is fascinating that some emotions, such as anger, are interpreted almost universally, and even across animal species (your dog surely knows when you are angry).

Adding to the variability, each emotion has its own spectrum. Take anger, for example: Low Anger (low pitch, slow/deliberate rate, monotone) → Hot Anger (high/varied pitch, yelling). Try starting with low anger and building to hot anger; it is a familiar progression.

🎙 Emotions in Singing

Beyond lyrics, emotions are conveyed through various paralinguistic aspects of the voice. Lyrics are what you say, but how you say it is what conveys the emotion. This nuance is called Emotional Prosody: all the non-verbal aspects of your voice, such as pitch, tone, and loudness or softness.

Prosody: The patterning of sound that shapes meaning in spoken or sung communication, including both linguistic structure and emotional expression.
Linguistic Prosody: The encoding of language through variations in sound (phrase and word structure, intonation, emphasis).
Emotional Prosody: The conveying of emotional meaning through variations in sound outside the use of language.

We can all identify whether a voice is "angry" without understanding the language or context. Consider a baby's cry, a dog's whine, a cat's hiss, a lion's roar: these all convey emotion using sound but without words. In Star Wars (1977), R2-D2 communicated using only beeps and chirps, yet conveyed emotion very effectively. That is pure Emotional Prosody, by design.

📊 Emotional Prosody - General Characteristics

How does sound alone convey emotion? The following general characteristics are being interpreted:

01 Tempo
Fast → excited, frantic
Slow → calm, tired, disappointed

02 Intensity
Loud → urgent, angry
Soft → shy, uncertain, affectionate

03 Timbre (tone color; overtones and harmonics)
Bright → cheerful
Distorted → distressed
Pure tone → neutral

04 Pitch Contour
Rising → hopeful, questioning
Falling → sad, resigned
Jagged leaps → panic or alarm
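
The four characteristics can be written down as a simple lookup table. The sketch below is only an illustration: the dictionary layout and the function name `associations` are assumptions made for this example, and the entries are copied from the list rather than from any measured data.

```python
# Illustrative lookup of the cue-to-feeling associations listed in this section.
# The structure and names here are assumptions for the sketch, not a standard.
CUE_ASSOCIATIONS = {
    "tempo": {
        "fast": ["excited", "frantic"],
        "slow": ["calm", "tired", "disappointed"],
    },
    "intensity": {
        "loud": ["urgent", "angry"],
        "soft": ["shy", "uncertain", "affectionate"],
    },
    "timbre": {
        "bright": ["cheerful"],
        "distorted": ["distressed"],
        "pure tone": ["neutral"],
    },
    "pitch contour": {
        "rising": ["hopeful", "questioning"],
        "falling": ["sad", "resigned"],
        "jagged leaps": ["panic", "alarm"],
    },
}


def associations(observed):
    """Return the feelings suggested by each observed cue.

    `observed` maps a characteristic (e.g. "tempo") to a value (e.g. "fast").
    """
    return {
        characteristic: CUE_ASSOCIATIONS[characteristic][value]
        for characteristic, value in observed.items()
    }


if __name__ == "__main__":
    # A fast, loud voice: each cue suggests several feelings, none decisive alone.
    print(associations({"tempo": "fast", "intensity": "loud"}))
```

Looking up a fast, loud voice returns several candidate feelings per cue, which is one way to see why a single characteristic rarely pins down a specific emotion.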

🧠 Three Acoustic Dimensions

Emotional meaning in sound is realized when the brain interprets acoustic cues along three dimensions:

1. Arousal: Fast/slow and loud/soft induce a level of stimulation.
2. Valence: Bright/dark timbre generates a positive or negative feeling.
3. Appraisal: Pitch contour and intensity elicit an emotional stance based on hardwired or learned response.

The evaluation of these aspects of the sound, together with the resulting bodily response, is what we feel as an emotion. This occurs in humans and many animals alike.
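
As a rough illustration of the first two dimensions, the sketch below turns three cues into an arousal and a valence reading. Everything in it is an assumption made for the example: the 0-to-1 scales, the `Cues` class, the plain averaging, and the 0.5 threshold are not taken from any study. Appraisal is left out because it depends on hardwired or learned responses that a simple formula cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Cues:
    """Hypothetical cue values, each on an assumed 0.0 to 1.0 scale."""
    tempo: float       # 0.0 = very slow, 1.0 = very fast
    loudness: float    # 0.0 = very soft, 1.0 = very loud
    brightness: float  # 0.0 = dark timbre, 1.0 = bright timbre


def arousal(cues: Cues) -> float:
    # Arousal: fast/slow and loud/soft combined (a plain average, by assumption).
    return (cues.tempo + cues.loudness) / 2


def valence(cues: Cues) -> float:
    # Valence: bright/dark timbre (used directly, by assumption).
    return cues.brightness


def describe(cues: Cues) -> str:
    """Name the arousal/valence quadrant the cues fall into."""
    a = "high arousal" if arousal(cues) >= 0.5 else "low arousal"
    v = "positive valence" if valence(cues) >= 0.5 else "negative valence"
    return f"{v}, {a}"


if __name__ == "__main__":
    # A fast, loud, dark-toned utterance.
    print(describe(Cues(tempo=0.8, loudness=0.9, brightness=0.2)))
    # A slow, soft, fairly bright utterance.
    print(describe(Cues(tempo=0.2, loudness=0.2, brightness=0.7)))
```

The first call prints "negative valence, high arousal" and the second prints "positive valence, low arousal". The quadrant is only a coarse reading; naming a specific emotion would still require the appraisal cues.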

If you have ever talked to a pet, you have used emotional prosody. The meaning of the words is irrelevant: the pet recognizes the audible characteristics and identifies patterns, and the sounds trigger its own emotional responses.

Pet examples: A pet truly becomes happy when it hears its name (positive valence, high arousal, bright tone). A dog reacts to "No!", especially if a harsh tone is used (negative valence, high arousal, dark tone). Pets can be soothed by "it's okay" in a soft voice (positive valence, low arousal, warm tone).

These extremes of arousal and valence are interpreted almost universally. But even when an emotion is obvious to the ear, it can be difficult to quantify the characteristics that convey it. Take sarcasm: it may be easy to identify in familiar company but difficult to describe in terms of exact inflections, and some people miss it entirely. Mapping specific sound characteristics to discrete emotions is complex because there are so many variables.

🗺 Mapping Emotional Prosody

Emotional Prosody has been studied for many years, yet only a very general mapping exists. Studies are currently underway to train AI models to identify and use emotion in speech.

Efforts to map common acoustic properties to emotion categories examine Pitch, Temporal features (articulation, rhythm, rate), Loudness, and Timbre (intonation, vocal quality).
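
To make "acoustic properties" more concrete, here is a minimal sketch of measuring two of them, loudness and pitch, on a synthetic tone using only the Python standard library. The function names and the 8000 Hz sample rate are assumptions for the example. Counting zero crossings is a deliberately crude pitch estimate that only works on a clean single tone; real voices need far more robust pitch tracking, and timbre and temporal features are not covered here.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def sine_wave(freq_hz, amplitude, duration_s=0.5):
    """Generate a synthetic pure tone as a list of samples."""
    n = int(duration_s * SAMPLE_RATE)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def rms_loudness(samples):
    """Root-mean-square level: a simple stand-in for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_pitch(samples):
    """Estimate pitch in Hz by counting sign changes (two per cycle)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = len(samples) / SAMPLE_RATE
    return crossings / (2 * duration_s)


if __name__ == "__main__":
    quiet_low = sine_wave(freq_hz=220, amplitude=0.5)
    loud_high = sine_wave(freq_hz=440, amplitude=0.9)
    for name, tone in [("quiet/low", quiet_low), ("loud/high", loud_high)]:
        print(name, round(rms_loudness(tone), 3), round(zero_crossing_pitch(tone)))
```

For the 220 Hz tone the loudness comes out at about 0.35 (the amplitude divided by the square root of 2) and the pitch estimate lands close to 220 Hz. Turning such numbers into emotion categories is the hard part that the mapping research addresses.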

Most of the study of Emotional Prosody is focused on spoken communication - not singing.

📈 Acoustic Emotion Mapping

Academic graph: vocal emotion mappings circa 1996 - illustration coming soon

Unemotional - A Contrast

Some songs or performances are “emotionally blank.” They may convey an energy level - excited, animated, sedate, lethargic - but don’t include a specific emotion. These states of arousal are not themselves emotions.

Sometimes, this is the desired effect. Punk is fast and loud but not necessarily angry; Emo can be emotionally drained while lyrically emotional. But sometimes the audible cues for a more specific emotion are simply missing - the singer is expressing something, but the cues for emotional valence (positive/negative) or appraisal (the reason behind the energy) aren’t clear enough to name it.

Non-Emotions / Unemotional Arousal States

High arousal and low arousal are not emotions in themselves - they are physiological states that can carry emotion, but don’t inherently encode one. A voice that is loud and fast reads as energized; a voice that is soft and slow reads as calm or sedate. Without the additional cues that signal valence and appraisal, the listener recognizes only arousal level, not the specific feeling behind it.
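
The distinction can be sketched in code: with only an arousal reading, the best available description is an energy level, and a named emotion becomes possible only once a valence cue is present. The function `read_voice` and the four emotion labels are hypothetical choices for this illustration; a real reading would also depend on appraisal cues.

```python
from typing import Optional

# Illustrative labels only (an assumption of this sketch); actual emotions
# also depend on appraisal, which is not modeled here.
CANDIDATES = {
    ("high", "positive"): "happy",
    ("high", "negative"): "angry",
    ("low", "positive"): "soothed",
    ("low", "negative"): "sad",
}


def read_voice(arousal: str, valence: Optional[str] = None) -> str:
    """Describe a voice from its arousal level and, if available, its valence.

    arousal: "high" or "low"
    valence: "positive", "negative", or None when the cues are missing
    """
    if arousal not in ("high", "low"):
        raise ValueError("arousal must be 'high' or 'low'")
    energy = "energized" if arousal == "high" else "calm or sedate"
    if valence is None:
        # Arousal alone: an energy level, not an emotion.
        return f"{energy} (no specific emotion)"
    if valence not in ("positive", "negative"):
        raise ValueError("valence must be 'positive', 'negative', or None")
    return f"{energy}, possibly {CANDIDATES[(arousal, valence)]}"


if __name__ == "__main__":
    print(read_voice("high"))             # loud and fast, no valence cue
    print(read_voice("high", "negative")) # same energy, dark timbre added
```

The first call prints "energized (no specific emotion)"; adding a negative valence cue changes the result to "energized, possibly angry".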

Listening exercise: Compare the Energized and Relaxed examples on the Examples page. Notice how each conveys a clear arousal level but remains emotionally unspecific - and consider what vocal choices would begin to shift each toward a named emotion.