Perplexing Language Models
https://arxiv.org/pdf/2303.18223.pdf
Theory and Principle. To understand the underlying working mechanism of LLMs, one of the greatest mysteries is how information is distributed, organized, and utilized throughout these very large, deep neural networks. It is important to reveal the basic principles or elements that establish the foundation of the abilities of LLMs. In particular, scaling seems to play an important role in increasing the capacity of LLMs [31, 55, 59].
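For reference, this scaling behavior is often summarized by empirically fitted power laws. A minimal sketch, assuming the parameterization reported by Kaplan et al. (the constants below come from that work, not from this survey), relates test loss to the number of non-embedding parameters N:

\[
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13},
\]

i.e., under such a fit, loss decreases smoothly and predictably as N grows, with no abrupt jumps.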
It has been shown that some emergent abilities can occur in an unexpected way (a sudden performance leap) when the parameter scale of language models increases to a critical size (e.g., 10B) [31, 33]; typical examples include in-context learning, instruction following, and step-by-step reasoning. These emergent abilities are fascinating yet perplexing: when and how they are acquired by LLMs are not yet clear.
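One toy account of how such a sudden leap can coexist with smooth underlying scaling (a hedged sketch for intuition only, not a mechanism established by the studies cited here; all numbers below are invented): if per-step accuracy improves gradually with scale but the evaluation metric only credits fully correct multi-step answers, the measured success rate can jump sharply near a critical scale.

```python
# Toy illustration (not from the survey or the cited studies): a smooth,
# gradual improvement in per-step accuracy can still look like a sudden
# "emergent" leap on a task that only credits fully correct answers.
import math

def per_step_accuracy(n_params: float) -> float:
    """Hypothetical per-step accuracy, rising log-linearly with model size.

    The ramp (0.5 at 1e8 params, 1.0 at 1e12 params) is invented purely
    for illustration; it is smooth in log-scale, with no jumps.
    """
    return min(1.0, 0.5 + 0.125 * math.log10(n_params / 1e8))

def full_task_success(n_params: float, n_steps: int = 30) -> float:
    """All-or-nothing task: every one of n_steps steps must be correct."""
    return per_step_accuracy(n_params) ** n_steps

for n in [1e9, 1e10, 1e11, 3e11, 1e12]:
    print(f"{n:8.0e} params  per-step={per_step_accuracy(n):.3f}  "
          f"full-task={full_task_success(n):.4f}")
```

With these assumed numbers, per-step accuracy climbs gradually from 0.625 to 1.0, while full-task success stays near zero across three orders of magnitude and then leaps within the last one; whether real emergent abilities arise in this way remains part of the open question above.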
Recent studies either conduct extensive experiments to investigate the effects of emergent abilities and the factors that contribute to them [250, 267, 451], or explain some specific abilities with existing theoretical frameworks [60, 261]. An insightful technical post also specifically discusses this topic [47], taking the GPT-series models as the target. However, more formal theories and principles
to understand, characterize, and explain the abilities or
behaviors of LLMs are still missing. Since emergent abilities
bear a close analogy to phase transitions in nature [31, 58],
cross-disciplinary theories or principles (e.g., whether LLMs can be considered a kind of complex system) might be useful for explaining and understanding the behaviors of LLMs.
These fundamental questions are worth exploring by the research community and are important for developing the next generation of LLMs.