Historical Linguistics and Population Genetics

Reich et al. provide a model of two ancient populations in India that are ancestral to modern populations: Ancestral North Indians (ANI) and Ancestral South Indians (ASI). According to Reich et al., ANI is, on average, genetically closer to Middle Easterners, Central Asians, and Europeans than ASI is. ASI, on the other hand, is distinct from ANI as well as from East Asian populations. This same study found that “ANI ancestry ranges from 39–71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers.” Furthermore, Reich et al. showed that the Indian caste system is old and historically rigid: high F_ST values (a measure of genetic differentiation between groups) indicate that “strong endogamy must have shaped marriage patterns in India for thousands of years.” This seriously contradicts the claims of Edward Said, Nicholas Dirks, and others who have argued that caste in India was more fluid and less systematized before British imperial rule.
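For readers unfamiliar with F_ST, here is a minimal sketch of the textbook (Wright) definition at a single two-allele locus, comparing two equally weighted populations. This is not the estimator Reich et al. used, which works across many markers and corrects for sample size. The function name and the allele frequencies are hypothetical, chosen only for illustration.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's F_ST at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population
    (illustrative inputs, not values from any study).
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if both populations were pooled into one.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # The allele is fixed or absent everywhere, so there is no variation to partition.
        return 0.0
    # Average expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic(0.5, 0.5))  # identical frequencies: no differentiation
    print(fst_biallelic(0.4, 0.6))  # modest differentiation
    print(fst_biallelic(0.0, 1.0))  # fixed for different alleles: complete differentiation
```

The value runs from 0 (the two groups have the same allele frequencies) to 1 (they share no variation at that locus). Groups that intermarry freely drift toward 0, which is why persistently high values between neighboring groups are read as a signature of long-standing endogamy.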

However, a more recent paper (Moorjani et al. 2013) does show substantial admixture between Indian groups somewhere between 1,900 and 4,200 years ago:

Our analysis documents major mixture between populations in India that occurred 1,900 – 4,200 years BP, well after the establishment of agriculture in the subcontinent. We have further shown that groups with unmixed ANI and ASI ancestry were plausibly living in India until this time. This contrasts with the situation today in which all groups in mainland India are admixed. These results are striking in light of the endogamy that has characterized many groups in India since the time of admixture. For example, genetic analysis suggests that the Vysya from Andhra Pradesh have experienced negligible gene flow from neighboring groups in India for an estimated 3,000 years. Thus, India experienced a demographic transformation during this time, shifting from a region where major mixture between groups was common and affected even isolated tribes such as the Palliyar and Bhil to a region in which mixture was rare.

As the researchers go on to indicate, ~2,000 to 3,000 years ago corresponds to the major transitions attendant to the end of the Harappan civilization and the influx of the Indo-Aryans. Can these genetic studies shed any light on the controversies of Indian language history?

Emeneau’s famous 1956 paper, “India as a Linguistic Area,” holds up reasonably well to contemporary scrutiny. The Indo-Aryan, Dravidian, and Munda language families have obviously influenced one another. Dravidian influence on Indo-Aryan is well attested. But this seems odd given the correlation, discovered by Reich et al. and others, among ANI ancestry, Indo-European speech, and upper caste status in India. Another population genetics study (Bamshad et al. 2001) puts it this way:

Indo-European-speaking people from West Eurasia entered India from the Northwest and diffused throughout the subcontinent. They purportedly admixed with or displaced indigenous Dravidic-speaking populations. Subsequently they may have established the Hindu caste system and placed themselves primarily in castes of higher rank.

These “Indo-European-speaking people” probably have something to do with Reich et al.’s Ancestral North Indians. But if these “invaders” were strong enough to admix with and displace the indigenous Dravidic-speaking populations, why does Emeneau find Dravidian influence on Indo-Aryan? Imagine Cherokee influencing English on the scale of 5%. It’s just not going to happen. Most linguistic history shows that dominant languages influence less dominant languages; the opposite rarely occurs, and if it does, its influence on the dominant language is minimal. In another paper, Emeneau has this to say:

[There has long been the assumption] that the Sanskrit-speaking invaders of Northwest India were people of a high, or better, a virile, culture, who found in India only culturally feeble barbarians, and that consequently the borrowings that patently took place from Sanskrit and later Indo-Aryan languages into Dravidian were necessarily the only borrowings that could have occurred . . . It was but natural to operate with the hidden, but anachronistic, assumption that the earliest speakers of Indo-European languages were like the classical Greeks or Romans—prosperous, urbanized bearers of a high civilization destined in its later phases to conquer all Europe and then a great part of the earth—rather than to recognize them for what they doubtless were–nomadic, barbarous looters and cattle-reivers whose fate it was through the centuries to disrupt older civilizations but to be civilized by them.

Rather than the image of Indo-European “invaders” whose civilized power subjugated indigenous Indian populations, Emeneau instead imagines barbarians at the gates. Certainly, the language of nomads would be more socially susceptible to indigenous Dravidian, but how does this picture fit with the recent discovery of early population admixture? Would indigenous Dravidians have been more likely to breed freely with uncivilized nomads roaming and slowly penetrating the borderlands? Possibly.

Michael Witzel might have a different solution. Setting aside the Harappan script itself, the oldest Indian text is the Rigveda, a collection of sacred Vedic Sanskrit hymns. Witzel finds in the earliest sections of the Rigveda several hundred lexical items and a few morphological features that are clearly not of Sanskrit (and therefore not of Indo-European) origin. His analysis of these features leads him to believe that the language spoken before the arrival of Indo-Europeans (i.e., the language of the Harappan civilization) was more closely related to the Munda languages and the Austroasiatic language family. In other words, Witzel’s analysis suggests that an Indo-European “invasion” and domination of indigenous Dravidian speakers is probably not an accurate historical picture. A sacred Indo-European text like the Rigveda would not contain so many non-IE loanwords if its speakers had entered the scene as dominant bringers of hierarchy. And given that the non-IE loanwords and morphological features are more likely Austroasiatic than Dravidian, Witzel envisions a time when Indo-European speakers and Dravidian speakers both immigrated slowly into Harappan civilization, neither as dominant invaders nor as barbarous raiders. This would explain the cross-linguistic influence in the Indian subcontinent. It would also explain Moorjani et al.’s recent paper showing major mixture between groups in India prior to the rise of the caste system several thousand years ago.

Or maybe not. Witzel’s theory is not well accepted among historical linguists. And if Indo-Aryan and Dravidian immigration was so gradual and perhaps even egalitarian (Witzel imagines that Harappan urban centers may have been trilingual), where did a caste system that so clearly favors one ancestral group over the others come from? There’s also a nagging question about timing: one study suggests that Reich’s ANI might not fit within the purported timeline of Indo-European speakers’ migration. Then there’s the issue of linguistic distribution. Razib Khan notes:

It seems an almost default position by many that the Austro-Asiatics are the most ancient South Asians, marginalized by Dravidians, and later Indo-Europeans. I would not be surprised if it was actually first Dravidians, then Austro-Asiatics and finally Indo-Europeans. Dravidians are found in every corner of the subcontinent (Brahui in Pakistan, a few groups in Bengal, and scattered through the center) while the Austro-Asiatics exhibit a more restricted northeastern range.

It’s all quite messy, but my point is that linguists interested in language contact and linguistic evolution should be reading work in population genetics, too. Papers on population genetics often reference work in historical linguistics; however, I rarely see historical linguists citing population genetics.

Languages Magazine
