Paul Heggarty and colleagues present a new framework for the chronology and divergence of languages in the Indo-European family, which places the family’s origin at around 8100 years before present, older than previous analyses had estimated. Their study also reconciles current linguistic and ancient DNA evidence to suggest that Indo-European languages first arose south of the Caucasus and subsequently branched northward to the Steppe regions before expanding throughout Eurasia. The origins and spread of Indo-European languages, which are spoken by nearly half of the world’s population, have long been debated. Much of the dispute centers on where the language group originated, with some scholars supporting an origin in the eastern Fertile Crescent followed by a spread alongside agriculture, and others supporting an origin in the Steppe region with a spread facilitated by horse-based pastoralism. The new analysis by Heggarty et al. uses a dataset of 109 modern languages and 52 non-modern languages, examining shared word origins in the core vocabulary of these languages. The new dataset increases language sampling and does not assume that modern spoken languages derive directly from ancient written languages; the authors say that sparser sampling and this assumption have hampered previous analyses. The resulting phylolinguistic family trees do not fully support either an agriculture-based or a pastoralism-based origin for the language family, but instead support a “hybrid” hypothesis that contains elements of both scenarios in the spread of Indo-European languages, the authors write.
For over two hundred years, the origin of the Indo-European languages has been disputed. Two main theories have recently dominated this debate: the ‘Steppe’ hypothesis, which proposes an origin in the Pontic-Caspian Steppe around 6000 years ago, and the ‘Anatolian’ or ‘farming’ hypothesis, suggesting an older origin tied to early agriculture around 9000 years ago. Previous phylogenetic analyses of Indo-European languages have come to conflicting conclusions about the age of the family, due to the combined effects of inaccuracies and inconsistencies in the datasets they used and limitations in the way that phylogenetic methods analyzed ancient languages.
To solve these problems, researchers from the Department of Linguistic and Cultural Evolution at the Max Planck Institute for Evolutionary Anthropology assembled an international team of over 80 language specialists to construct a new dataset of core vocabulary from 161 Indo-European languages, including 52 ancient or historical languages. This more comprehensive and balanced sampling, combined with rigorous protocols for coding lexical data, rectified the problems in the datasets used by previous studies.
Indo-European estimated to be around 8100 years old
The team used recently developed ancestry-enabled Bayesian phylogenetic analysis to test whether ancient written languages, such as Classical Latin and Vedic Sanskrit, were the direct ancestors of modern Romance and Indic languages, respectively. Russell Gray, Head of the Department of Linguistic and Cultural Evolution and senior author of the study, emphasized the care they had taken to ensure that their inferences were robust. “Our chronology is robust across a wide range of alternative phylogenetic models and sensitivity analyses”, he stated. These analyses estimate the Indo-European family to be approximately 8100 years old, with five main branches already split off by around 7000 years ago.
These results are not entirely consistent with either the Steppe or the farming hypotheses. The first author of the study, Paul Heggarty, observed that “Recent ancient DNA data suggest that the Anatolian branch of Indo-European did not emerge from the Steppe, but from further south, in or near the northern arc of the Fertile Crescent — as the earliest source of the Indo-European family. Our language family tree topology, and our lineage split dates, point to other early branches that may also have spread directly from there, not through the Steppe.”
New insights from genetics and linguistics
The authors of the study therefore proposed a new hybrid hypothesis for the origin of the Indo-European languages, with an ultimate homeland south of the Caucasus and a subsequent branch northwards onto the Steppe, as a secondary homeland for some branches of Indo-European entering Europe with the later Yamnaya and Corded Ware-associated expansions. “Ancient DNA and language phylogenetics thus combine to suggest that the resolution to the 200-year-old Indo-European enigma lies in a hybrid of the farming and Steppe hypotheses”, remarked Gray.
Wolfgang Haak, a Group Leader in the Department of Archaeogenetics at the Max Planck Institute for Evolutionary Anthropology, summarized the implications of the new study: “Aside from a refined time estimate for the overall language tree, the tree topology and branching order are most critical for the alignment with key archaeological events and shifting ancestry patterns seen in the ancient human genome data. This is a huge step forward from the mutually exclusive, previous scenarios, towards a more plausible model that integrates archaeological, anthropological and genetic findings.”
JOURNAL
Science