Margaret Wilson has a guest post at Language Log questioning a recent article which argues that marmoset vocal interactions show a style of turn-taking similar to that of humans. In the course of that argument, she gives the following brief summary of the literature on human conversational turn-taking, which strongly implies that people are entrained to one another's rhythms:
When humans take turns, there is a cyclic structure to the extremely short gaps between speakers' utterances (Sacks, Schegloff, & Jefferson, 1974; Wilson & Wilson, 2005; Wilson & Zimmerman, 1986). A between-turn gap of, say, 200 milliseconds is more likely to be broken by the second speaker at certain regular intervals (say, odd multiples of 50 ms) than during the "troughs" between those intervals. That is, short silences are not of arbitrary length, but reflect a cyclic passing back and forth of who has the "right" to speak next (Wilson & Zimmerman, 1986). The troughs represent moments when the right to speak has shifted back to the original speaker, hence the second speaker inhibits speech during those fractions of a second. And this is happening at the order of tens of milliseconds. This "structured silence" can only be explained by extremely tight coupling — entrainment — of some oscillatory mechanism in the brains of the two speakers. (For further research on this framework, see O'Dell, Nieminen & Lennes, 2012; Stivers et al., 2009).