Face perception is a fundamental component of our cognitive system and, arguably, a core ability that allowed humans to create the large, advanced societies of today. When we look at someone else’s face, we recognize who they are, whether they are female or male, whether they are attractive or unattractive, and whether they are happy or sad; that is, their affective state. Correctly interpreting these signals is essential for a functional, cooperative society. For example, when looking at the faces in Fig. 1, most people identify a sad female on the left and an angry male on the right. But while identity and other attributes are recognized quite accurately, affect is not. To see this, look at the images in Fig. 2A and B. What would you now say these two individuals are expressing? Most of us classify them as showing excitement or euphoria; that is, positive emotions. What is behind this radical change in our interpretation of these images? Context. Our interpretation of a facial configuration depends on the context in which the expression is situated.

In an ambitious new study in PNAS, Chen and Whitney show that people make reasonably good predictions of others’ affect when only the contextual information is known; that is, when the face is not observable (Fig. 2C). This inference remains accurate even when the whole body of the person is masked (Fig. 2D), ruling out an inference based on body pose. Context, therefore, is not only necessary for a correct interpretation of how others feel but, in some instances, sufficient. This surprising result should spur renewed interest in the role context plays in our interpretation of how others feel.
Fig. 1. When asked to identify the emotions shown in these images, most people agree that the face on the left expresses sadness, while the one on the right is a clear display of anger. If asked whether these expressions communicate positive or negative valence, most people agree that both are negative. The problem with these assessments is that the context is not observable, which may lead to incorrect interpretations. Images courtesy of (Left) Imgflip and (Right) Getty Images/Michael Steele.
Fig. 2. Adding context to the facial expressions shown in Fig. 1 radically changes our interpretation of the emotion the person is experiencing. (A and B) In these two images, most observers agree that the people shown are experiencing a joyful event (i.e., positive valence). (C and D) Even when the face and body are blurred out, inferring valence and arousal is still possible. Images courtesy of (Upper Left, Lower Left, and Lower Right) Imgflip and (Upper Right) Getty Images/Michael Steele.
The abstract of the Chen and Whitney study reads:
Emotion recognition is an essential human ability critical for social functioning. It is widely assumed that identifying facial expression is the key to this, and models of emotion recognition have mainly focused on facial and bodily features in static, unnatural conditions. We developed a method called affective tracking to reveal and quantify the enormous contribution of visual context to affect (valence and arousal) perception. When characters’ faces and bodies were masked in silent videos, viewers inferred the affect of the invisible characters successfully and in high agreement based solely on visual context. We further show that the context is not only sufficient but also necessary to accurately perceive human affect over time, as it provides a substantial and unique contribution beyond the information available from face and body. Our method (which we have made publicly available) reveals that emotion recognition is, at its heart, an issue of context as much as it is about faces.
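The heart of the authors’ affective tracking paradigm is a continuous rating task whose output can be scored against a reference signal. Below is a minimal sketch of how such an analysis could be quantified, assuming observers continuously rate the valence and arousal of a masked character and each observer’s time series is scored by its Pearson correlation with a consensus track obtained when the character is fully visible. The function names, data shapes, and the choice of correlation as the agreement measure are illustrative assumptions for this sketch, not the authors’ published pipeline.

```python
# Sketch (not the authors' code) of scoring continuous affect ratings.
# Assumed setup: ratings are valence/arousal time series sampled at fixed
# intervals; accuracy is the Pearson correlation between an observer's track
# for a masked video and the consensus track from fully visible videos.

import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two equally long time series."""
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def affect_tracking_accuracy(masked_ratings: np.ndarray,
                             consensus: np.ndarray) -> np.ndarray:
    """
    masked_ratings: shape (n_observers, n_timepoints, 2); last axis is
                    (valence, arousal) rated while the character is masked.
    consensus:      shape (n_timepoints, 2); mean ratings collected with the
                    character fully visible (the reference track).
    Returns per-observer correlations, shape (n_observers, 2).
    """
    n_obs = masked_ratings.shape[0]
    scores = np.empty((n_obs, 2))
    for i in range(n_obs):
        for dim in range(2):  # 0 = valence, 1 = arousal
            scores[i, dim] = pearson_r(masked_ratings[i, :, dim],
                                       consensus[:, dim])
    return scores

# Toy demonstration with simulated data: observers track a common signal
# plus noise, so the correlations should come out clearly positive.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 300)
truth = np.stack([np.sin(t), np.cos(t)], axis=-1)         # (300, 2)
obs = truth[None] + 0.5 * rng.standard_normal((10, 300, 2))
print(affect_tracking_accuracy(obs, truth).mean(axis=0))  # ~[0.8, 0.8]
```

In this toy run, simulated observers track a shared signal corrupted by noise, so the mean per-dimension correlations are well above chance; that qualitative pattern, observers agreeing with a reference track despite the character being invisible, is the kind of result the abstract describes for context-only viewing.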