corpus
A large, organized collection of texts used for study.
A corpus is a large, organized collection of written or spoken texts gathered for a specific purpose. The word comes from the Latin for “body,” which fits: a corpus is a body of examples of something, collected in one place.
Linguists build a corpus by collecting thousands of books, articles, and recordings so they can study how people actually use language. For instance, a corpus of children's literature might include every Newbery Medal winner, allowing researchers to analyze which words appear most frequently in books for young readers. A corpus of Shakespeare's works would contain all his plays and poems, including the sonnets, helping scholars study his vocabulary and writing patterns.
Researchers outside linguistics build them too. A medical corpus might contain millions of patient records (with personal information removed) so researchers can spot patterns in disease. A corpus of historical newspapers helps historians understand how people talked and thought in different eras.
The plural of corpus is corpora. When you search online dictionaries or use translation tools, you're often relying on technology built from massive corpora that show how words are used in millions of real sentences. A corpus turns scattered texts into organized evidence, like sorting a pile of puzzle pieces until a picture of how language works begins to emerge.