It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.
— Andrej Karpathy (@karpathy) September 14, 2024
Note that some time ago I pointed out that a transformer operates on strings of colored beads exactly as it does on strings of word tokens; the sketch below makes the point concrete.
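Here is a minimal sketch, assuming PyTorch, of what that claim means mechanically. The model and its names (`TinyAutoregressiveTransformer`, the bead and word vocabularies) are illustrative assumptions, not code from either source: the point is only that a decoder-only transformer sees integer token IDs, so a vocabulary of bead colors and a vocabulary of words are indistinguishable to it.

```python
# Minimal sketch, assuming PyTorch. All names here are illustrative
# assumptions for this post, not code from Karpathy or the original source.
import torch
import torch.nn as nn

class TinyAutoregressiveTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=64, nhead=4, num_layers=2, max_len=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # token IDs -> vectors
        self.pos = nn.Embedding(max_len, d_model)        # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)       # next-token logits

    def forward(self, ids):
        t = ids.size(1)
        x = self.embed(ids) + self.pos(torch.arange(t, device=ids.device))
        causal = nn.Transformer.generate_square_subsequent_mask(t).to(ids.device)
        return self.head(self.blocks(x, mask=causal))

# Two "vocabularies" over the same three IDs. The model cannot tell them
# apart: it receives the same tensor of integers either way.
beads = {"red": 0, "green": 1, "blue": 2}
words = {"the": 0, "cat": 1, "sat": 2}

model = TinyAutoregressiveTransformer(vocab_size=3)
ids = torch.tensor([[beads[c] for c in ["red", "blue", "green", "red"]]])
logits = model(ids)  # shape (1, 4, 3): a next-token distribution at each position
```

Swapping the lookup table from bead colors to words changes nothing about the computation; only our interpretation of the IDs differs, which is exactly why "language" model undersells the generality of the machinery.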