One Step Closer to Reading Minds: The Representation of Words in the Brain

By Neuroamer @Neuroamer

We can’t quite read thoughts yet, but we’ve gotten pretty good at figuring out what sorts of things you’re hearing about.

Check out this amazing video recently released alongside the publication from Huth et al. of the Gallant Lab, and then read my summary below of how they achieved this. This is the first study to map the semantic system of the brain using a data-driven approach that determines semantic categories in an unbiased way.

Words appearing in the Moth podcasts: 3,000 words listened to in the fMRI scanner, e.g. Twin.

Common English-language words (from Wikipedia’s then inaccurately titled List of 1000 Basic Words): 985 words used as ‘semantic features’ to categorize each of the 3,000 words. (If a Moth word often co-occurred with a feature word in a large corpus of text, the Moth word would have a high value for that feature.) E.g. Brother, Sister, Truck, War.

10,470 common words (10,000 of the most commonly occurring words from the large text corpus, plus 470 words that appeared in the Moth podcasts but were not among the common 10,000): these 10,470 words were also analyzed for co-occurrence and categorized with the 985 semantic features.

Reduced feature space to simplify further analysis: 4 dimensions explained significant variance in the 10,470 words in 985 dimensions, as shown by Principal Component Analysis (PCA).

E.g. In the first dimension one end “… favours categories related to humans and social interaction, including ‘social’, ‘emotional’, ‘violent’ and ‘communal’. The other end favours categories related to perceptual descriptions, quantitative descriptions and setting, including ‘tactile’, ‘locational’, ‘numeric’ and ‘visual’.”

Categories of words within the 4 dimensions: 12 clusters were found from the 10,470 words projected onto the 4 dimensions.

“The labels assigned to the 12 categories were ‘tactile’ (a cluster containing words such as ‘fingers’), ‘visual’ (words such as ‘yellow’), ‘numeric’ (‘four’), ‘locational’ (‘stadium’), ‘abstract’ (‘natural’), ‘temporal’ (‘minute’), ‘professional’ (‘meetings’), ‘violent’ (‘lethal’), ‘communal’ (‘schools’), ‘mental’ (‘asleep’), ‘emotional’ (‘despised’) and ‘social’ (‘child’)”
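To make the dimensionality-reduction idea concrete, here is a minimal sketch of finding a first principal component with power iteration, using only the Python standard library. The tiny feature table, the word labels in the comments, and the function name `first_principal_component` are all made up for illustration; the study worked with 985 features and kept 4 dimensions, and this is not their code.

```python
import random

def first_principal_component(rows, iterations=200, seed=0):
    """Estimate the first principal component of `rows` by power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each column so the component describes variance, not the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        # Multiply v by the (unnormalized) covariance matrix: X^T (X v).
        proj = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        v = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    # Project every (centered) word onto the component.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores

# Hypothetical co-occurrence features for four words (columns are three
# made-up feature words; the real feature space had 985 columns).
rows = [
    [5, 4, 0],  # a "social" word
    [4, 5, 0],  # another "social" word
    [0, 1, 5],  # a "perceptual" word
    [1, 0, 4],  # another "perceptual" word
]
component, scores = first_principal_component(rows)
print(component)
print(scores)
```

In this toy example the two "social" words land on one side of the first dimension and the two "perceptual" words on the other, which is the kind of separation the quoted description of the first dimension refers to. Which side comes out positive is arbitrary.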

a. Subjects lay in an fMRI scanner and listened to 2 hours of The Moth podcast, which contained 3,000 words. By data-mining texts like Wikipedia, the experimenters first determined the co-occurrence of these 3,000 Moth words with 985 of the most common words in the English language. So, for example, if the word ‘Twins’ was in the set of Moth words, it might have a high co-occurrence with words like ‘Brother,’ ‘Sister,’ or ‘Pregnant,’ but a low co-occurrence with words like ‘Truck’ or ‘War.’ Each of the Moth’s 3,000 words was categorized like this by how often it co-occurred with the 985 common words.
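The co-occurrence idea can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the toy corpus, the 5-token window, and the function name `cooccurrence_features` are hypothetical, and the study's actual corpus and counting details are not described in this post.

```python
from collections import Counter

def cooccurrence_features(tokens, target_words, basis_words, window=5):
    """Count how often each target word appears within `window` tokens
    of each basis (feature) word."""
    basis = set(basis_words)
    targets = set(target_words)
    counts = {t: Counter() for t in target_words}
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in basis:
                counts[tok][tokens[j]] += 1
    # One feature vector per target word, one entry per basis word.
    return {t: [counts[t][b] for b in basis_words] for t in target_words}

# A made-up miniature "corpus" standing in for something like Wikipedia.
corpus = ("the twins have a brother and sister . "
          "the truck drove off to the war . "
          "my brother and the twins play").split()

basis_words = ["brother", "sister", "truck", "war"]
features = cooccurrence_features(corpus, ["twins"], basis_words)
print(dict(zip(basis_words, features["twins"])))
```

In this toy corpus ‘twins’ co-occurs with ‘brother’ twice and ‘sister’ once, and never with ‘truck’ or ‘war’, so its feature vector is high on the family words and zero on the others.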

As subjects listened to the Moth, their brains were scanned using fMRI. Based on which areas were activated by which words, the experimenters created a model that estimated which areas responded most selectively to each word. So an area that responds strongly to words that co-occur with ‘Brother,’ but very little to words that co-occur with ‘Truck,’ ‘War,’ or other concepts, would be highly selective for the meaning, or semantic concept, of a brother.

A computational model was created of which voxels (3D volume pixels, see what they did there?), each representing a small volume of brain, responded to which semantic features.
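One common way to build this kind of per-voxel model is a regularized linear regression from semantic features to the voxel's response. The sketch below assumes ridge (L2-regularized) regression, which this post does not specify, and uses made-up numbers for a single hypothetical voxel; `ridge_fit` is my own helper name, not something from the study.

```python
def ridge_fit(X, y, alpha=1.0):
    """Solve (X^T X + alpha * I) w = X^T y for w by Gauss-Jordan elimination."""
    n, d = len(X), len(X[0])
    # Build the augmented system [X^T X + alpha * I | X^T y].
    A = [
        [sum(X[i][a] * X[i][b] for i in range(n)) + (alpha if a == b else 0.0)
         for b in range(d)]
        + [sum(X[i][a] * y[i] for i in range(n))]
        for a in range(d)
    ]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(d):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]

# Rows are time points; columns are two hypothetical semantic features
# ("brother"-like and "truck"-like) for the words heard at that moment.
X = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2], [1, 0]]
# Made-up responses of one voxel that only cares about the first feature.
y = [2, 0, 2, 4, 0, 2]

weights = ridge_fit(X, y, alpha=0.01)
print(weights)
```

With this toy data the fitted weight is close to 2 for the ‘brother’-like feature and close to 0 for the ‘truck’-like feature, which is what "selective for the semantic concept of a brother" would look like for one voxel. A full model would repeat the fit for every voxel.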

b. Experimenters tested their computational models by having the same subject listen to a new 10-minute story they had never heard before (and which therefore had not been included in the data used to create the model). They used their model to predict which voxels would be activated by which words in the new story, and calculated the performance of the model as the correlation between their predicted responses and the subject’s actual fMRI responses.
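The evaluation step can be sketched as follows: predict a voxel's response to the held-out story from its fitted weights, then compute the Pearson correlation with what was actually measured. The weights, features, and "measured" responses here are invented for illustration, and `predict` and `pearson_r` are hypothetical helper names.

```python
from math import sqrt

def predict(features, weights):
    """Predicted voxel response at each time point: a weighted sum of features."""
    return [sum(f * w for f, w in zip(row, weights)) for row in features]

def pearson_r(predicted, actual):
    """Pearson correlation between predicted and actual responses."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / sqrt(var_p * var_a)

# Hypothetical weights for one voxel, as if fitted on the training stories.
weights = [2.0, 0.0]
# Semantic features of the words in the new, held-out story (made up).
new_story_features = [[1, 0], [0, 1], [2, 1], [0, 0]]
# Made-up fMRI responses of that voxel while hearing the new story.
actual = [2.1, 0.3, 3.8, -0.1]

predicted = predict(new_story_features, weights)
r = pearson_r(predicted, actual)
print(predicted, r)
```

A correlation near 1 means the model predicts that voxel well on a story it never saw; a correlation near 0 means the voxel's responses are not explained by these semantic features.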


The main takeaways from this study are:

  1. Semantic maps are more consistent across individuals than we might have thought (at least among English-speaking, academic-elite right-handers).
  2. Semantic concepts are represented in a widespread network throughout the brain, across both hemispheres, rather than just in the left temporal cortex as was classically thought. This study adds to the evidence for that view.
  3. This study could provide an important map against which to assess specific patterns of brain damage from strokes or dementias that cause anomic aphasias.

Experiment with the maps yourself, and see which semantic concepts are localized where! Also, if you appreciated this article, please do me a solid and share it, and check out my article about brain scans of lucid dreams, which includes another video from the Gallant Lab showing that fMRI scans can be used to reconstruct visual data as well!

If you want to see more articles like this, write a comment below and follow me to stay updated on new posts!