How Some Words Get Forgetted

12:24   |   Aug 14, 2018




  • Hey smart people, Joe here.
  • In your whole life, how many books have you readed?
  • Sorry, I mean read. But not red, like the color. Read, like the
  • past tense of reeead. I just misspeaked! Misspoke. This… reminds me of a poem:
  • The verbs in English are a fright.
  • How can we learn to read and write?
  • Today we write, but first we wrote;
  • We bite our tongues, but never bote.
  • This tale I tell; this tale I told;
  • I smell the flowers, but never smold.
  • If I still do as once I did,
  • Then do cows moo, as they once mid?
  • That was penned by linguist Richard Lederer. And it’s proof that English is…weird.
  • We can blame all this confusion on irregular verbs.
  • Most verbs in English are “regular”. We make their past tense by adding a letter or
  • two on the end. They’re the difference between what happens now and what happened. But irregular
  • verbs are… well, not regular. Like the difference between what is and what was.
  • It’s cute when kids say “I breaked my toy.” But why do the rest of us say “broke”?
  • Because that’s just what everyone else says, right? We say it how it’s always been said.
  • But if we were thinking scientifically, we’d ask “How did it get this way?”
  • And I don’t know about you, but I prefer to think scientifically.
  • A biologist studies how things are by looking at how they used to be. We find fossils. But
  • how does one go about finding a fossil… of language?
  • Well luckily, people tend to write language down.
  • James Joyce’s Ulysses contains 265,222 words. I totally counted, and didn’t just google
  • that.
  • Of those words the word “time” is the 74th most frequent, used 376 times.
  • The word “the” is the most frequently used: 14,877 times. We know that thanks to
  • another type of book: a “concordance”, an index of words that lists every instance
  • of every word in a written work.
  • There are concordances for Thoreau’s Walden. He enjoyed the woods more than the forest.
  • The poetry of Edgar Allan Poe, where we find the raven more than Eldorado.
  • The writings of Descartes (in the original French), medieval recipes, and even the Bible.
  • A linguist named George Kingsley Zipf looked at these ranked lists of written language
  • and noticed something funny: Not all words are created equal. Some get used a lot, while
  • most almost never get used. Like how we say “the” all the time, but almost never say
  • “hallux,” the anatomical name for your big toe.
  • When it comes to a trait like height, most people are pretty close to average, while
  • the very tallest people? Are only maybe three times taller than the shortest. We don’t
  • vary very much. Height is… normal, it’s literally a “normal distribution.”
  • But Zipf realized words aren’t normal. Only a few words are very common, while most words
  • are very uncommon. For instance, in Ulysses, there are a thousand words used more than
  • 26 times, a hundred words used more than 265 times, but only ten words used more than 2,653
  • times. Another way to say this: the 10th most frequently used word is ten times more common
  • than the 100th most used. This peculiar trend is called Zipf’s Law.
  • What the…
  • I’m Tacky! It looks like you’re talking about Zipf’s Law. Did you know Vsauce already
  • did a video about that?
  • Yeah, it’s a great video… it’s actually what got me thinking about this! But I’m
  • gonna tell them more than just about Zipf's Law. I want to…
  • Would you like me to help you click over to that video…
  • No! I want you to watch THIS video. But if you DID watch Michael’s video on Vsauce,
  • perhaps by clicking a link in the description–LATER–you’d learn that Zipf’s Law applies to tons of
  • stuff: Like wealth, the population of cities, how long audiences clap, web traffic, the
  • size of holes in Swiss cheese, and–especially–language.
  • Wherever people look (newspapers, other languages, even randomly generated words), pretty much
  • everything in language obeys Zipf’s Law… well, everything except irregular verbs.
  • The 12 most common verbs in the English language are be, have, do, say, get, make, go, know,
  • take, see, come, and think. All irregular. But irregulars are a tiny fraction of all
  • verbs. English only has around 200 irregular verbs, a mere 3 percent of total verbs.
  • Instead of having a few commonly used irregular verbs and lots of rare ones, as Zipf’s
  • Law predicts, almost all irregular verbs are common, and almost none are rare. Irregular
  • verbs… are a Zipf exception.
  • Where do irregular verbs come from? They’re the oldest ones we have. Around
  • four to six thousand years ago, people stretching from Europe to Western Asia spoke an ancient
  • language known as Proto-Indo-European. A staggering number of modern languages descend from this.
  • In PIE, the meaning and tense of words could be changed through a system where vowel sounds
  • were swapped. This system, the ablaut, can still be heard today in irregular verbs: Dig,
  • dug. Sing, sang, sung.
  • At the time, it was just one of many competing systems for changing verbs. But a bit later,
  • people speaking Proto-Germanic, a dialect descended from PIE, began adding verbs to
  • the language that didn’t fit these old patterns, so they invented a new way of signifying the
  • past tense by simply adding “-t” or “-ed” sounds to the end. Back then, these new “regular”
  • verbs were actually the exception.
  • As English grew from this Proto-Germanic language, newly added words became automatically regular:
  • they followed this new rule. And many older verbs began to switch from the old way to
  • the new. Like how long ago, the knight slew the dragon, but Beyoncé slayed at her last
  • show.
  • By the time the Old English story of Beowulf was written, three out of every four verbs
  • had been “regularized.” There WERE a handful of verbs that moved in the other direction,
  • going from regular to irregular, but for every havED or makED that became had or made, there
  • were dozens of verbs like holp that got helped along. Regular was no longer the exception,
  • it was the rule.
  • So why did some irregular verbs go extinct, while others have survived? We all know that
  • language evolves, similar to how living things do, changing slightly over time. Could language
  • also undergo some kind of natural selection? Is there something about a word that decides
  • whether it’s strong enough to live on?
  • We can test this! We just need a bigger data set than one book.
  • Using ancient grammar textbooks along with databases of millions of written words, researchers
  • tracked the evolution of 177 verbs that were irregular at the time Beowulf was written.
  • By the time Chaucer wrote The Canterbury Tales, 32 of these had become regular. By the time
  • we hit modern English, 79 had regularized. The trait that predicted whether or not a
  • verb would become regular was how often we use it.
  • The most frequently used verbs tend to stay irregular. The most rarely used become regular.
  • Surprisingly, there was a sort of hidden Zipfian pattern there after all. If a verb is used
  • 100 times less frequently, it will regularize 10 times as fast. If it’s used 10,000
  • times less frequently, it’ll regularize 100 times as fast.
  • Researchers were able to estimate the likely lifespan of irregular verbs. A word like “stink,”
  • which is used once every 10,000 to 100,000 words, has a 50% chance of regularizing within 700
  • years. Drink, a more common word, will take more like 5,000 years. We can find words today
  • in the process of going extinct. Do you tend to say dived or dove? Now is your last chance
  • to be newly wed. Pretty soon, you might be newly wedded. “Wed” is the irregular verb
  • we think will most likely disappear next.
  • This seems to be natural selection for language. Usage frequency affects a word’s survival,
  • and this makes sense. Regular verbs follow a rule. When we encounter a word we don’t
  • know, we can still figure out its past tense, without memorizing each and every one. Irregular
  • verbs, on the other hand, have to be memorized. If we don’t use them, we lose them. As they’re
  • slowly forgotten, the “regular” rule is used in their place.
  • In 1980, after thirty years of work, IBM was able to digitize the complete works of Thomas
  • Aquinas. Today, this is something that you, or anyone who knows how to code, can do in
  • a few minutes, with a few keystrokes. Concordances, the indexes of language that inspired Zipf
  • and others to ask these questions, no one really writes those anymore. Except… maybe
  • they do. It’s called “Google”. A search engine is basically a list of words and phrases,
  • from around the web, and the pages where they appear. Concordances were just analog Google.
  • The Google Books project now contains 25 million scanned books stretching back more than 500
  • years. No matter how many books you read, you could never read every book, or even a
  • fraction of them, in a lifetime. If you tried to read just the English-language books from
  • the year 2000 in this collection, at a reasonable pace, without stopping, it would take you
  • 80 years.
  • But what could we learn if we made computers read for us? The Google Ngram Viewer is a
  • search tool we can use to study how human culture has changed over the centuries. It
  • plots the frequency of strings of one or more words, by year, found in those millions of
  • digitized books.
  • We can see when people stopped talking about the Great War, and started calling it World
  • War I instead. “Evolution” was on the decline until “DNA”
  • came along. Einstein took physics to the next level.
  • People like pizza more than hamburgers, but less than ice cream.
  • What’s the most interesting one you can find?
  • Of course, as much data as we can pull from millions of digitized books, we haven’t
  • read them. A computer has. And while it gives us access to an immense amount of data, it
  • doesn’t tell us perhaps the most important part: The story.
  • Stay curious.
  • If you thought that Ngram was pretty cool… Sarah over at The Art Assignment used it to look at how different artists
  • got famous… or not. Link in the description to that one too.
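
A concordance, as described in the transcript, lists every place each word appears. The comparison to a search engine works because a search index stores the same kind of word-to-location mapping. The Python sketch below is a toy built on stated assumptions: the input is a hypothetical list of page strings, the tokenizer simply treats runs of letters and apostrophes as words, the sample text is made up, and the function names (`build_concordance`, `rank_by_frequency`, `pages_containing`) are invented for illustration. It is not how the historical concordances or Google were actually built.

```python
import re
from collections import defaultdict

def build_concordance(pages: list) -> dict:
    """Map each lowercase word to every (page, position) where it occurs.
    `pages` is a hypothetical input: a list of strings, one per page."""
    index = defaultdict(list)
    for page_no, text in enumerate(pages, start=1):
        # Crude tokenizer (an assumption): runs of letters and apostrophes are words.
        for pos, word in enumerate(re.findall(r"[a-z']+", text.lower())):
            index[word].append((page_no, pos))
    return dict(index)

def rank_by_frequency(concordance: dict) -> list:
    """(word, count) pairs, most frequent first; ties broken alphabetically."""
    return sorted(((w, len(locs)) for w, locs in concordance.items()),
                  key=lambda pair: (-pair[1], pair[0]))

def pages_containing(concordance: dict, word: str) -> list:
    """Search-engine-style lookup: the sorted page numbers where `word` appears."""
    return sorted({page for page, _ in concordance.get(word.lower(), [])})

# Made-up sample text, two "pages" long.
pages = ["The raven sat on the bust.", "Eldorado, the raven said."]
conc = build_concordance(pages)
print(conc["the"])                      # [(1, 0), (1, 4), (2, 1)]
print(rank_by_frequency(conc)[:2])      # [('the', 3), ('raven', 2)]
print(pages_containing(conc, "Raven"))  # [1, 2]
```

Ranking the entries by count gives the kind of frequency-ranked word list Zipf studied, and keeping only the page numbers turns the same structure into a tiny search index.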
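
In its simplest form, Zipf's Law says a word's count is roughly a constant divided by its frequency rank. The sketch below assumes that simple form (exponent 1). It calibrates the constant from the transcript's figure that the 10th-ranked word in Ulysses appears about 2,653 times. The names `zipf_count` and `C_ULYSSES` are hypothetical.

```python
def zipf_count(rank: int, c: float) -> float:
    """Expected count of the word at `rank` under the simple Zipf form c / rank."""
    return c / rank

# Assumption: calibrate c from the transcript's Ulysses figure that the
# 10th most frequent word appears about 2,653 times, so c = 10 * 2653.
C_ULYSSES = 10 * 2653

for rank in (10, 100, 1000):
    print(rank, round(zipf_count(rank, C_ULYSSES), 1))
# 10 2653.0
# 100 265.3
# 1000 26.5
```

These values line up with the transcript's thresholds of roughly 2,653, 265, and 26 uses. The same formula predicts about 26,530 uses for the top-ranked word, well above the 14,877 uses of "the" quoted in the transcript. Treat the law as an approximation rather than an exact fit.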
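
The transcript's scaling rule is that a verb used 100 times less often regularizes 10 times as fast, and one used 10,000 times less often regularizes 100 times as fast. That means the speed of regularization is inversely proportional to the square root of usage frequency. Equivalently, a verb's half-life grows with the square root of its frequency. The sketch below encodes that rule. As an extra assumption not stated in the transcript, it models regularization as exponential decay with a constant yearly risk, so "a 50% chance within 700 years" becomes a 700-year half-life. The function names and the comparison verb are hypothetical.

```python
import math

def relative_half_life(freq_ratio: float) -> float:
    """Factor by which a verb's half-life grows when it is used `freq_ratio`
    times as often; half-life scales with the square root of frequency."""
    return math.sqrt(freq_ratio)

def p_regularized(years: float, half_life: float) -> float:
    """Chance a verb has regularized after `years`, assuming exponential decay
    (a constant yearly risk) with the given half-life."""
    return 1 - 0.5 ** (years / half_life)

STINK_HALF_LIFE = 700  # transcript: "stink" has a 50% chance of regularizing within 700 years

print(relative_half_life(100))              # 10.0
print(relative_half_life(10_000))           # 100.0
print(p_regularized(700, STINK_HALF_LIFE))  # 0.5
# A hypothetical verb used 100 times as often as "stink":
print(STINK_HALF_LIFE * relative_half_life(100))  # 7000.0
```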
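
The transcript explains that regular verbs follow a rule while irregular ones must be memorized. A memorized lookup table plus a default "-ed" rule illustrates the idea, loosely in the spirit of Steven Pinker's Words and Rules (listed in the sources). This is a toy, not a model of the brain. The table is a small hypothetical sample that stores one past-tense form per verb. The spelling rule also ignores cases like consonant doubling ("stop" to "stopped") and "-y" to "-ied".

```python
# A small, hypothetical sample of memorized irregular past tenses
# (one form per verb; "holp" is the old past tense of "help").
IRREGULAR_PAST = {
    "be": "was", "have": "had", "do": "did", "say": "said",
    "go": "went", "sing": "sang", "tell": "told", "help": "holp",
}

def regular_past(verb: str) -> str:
    """Simplified '-ed' rule: add 'd' after a final 'e', otherwise add 'ed'."""
    return verb + "d" if verb.endswith("e") else verb + "ed"

def past_tense(verb: str, memory: dict = IRREGULAR_PAST) -> str:
    """Use a memorized form if there is one; otherwise fall back to the rule."""
    return memory.get(verb, regular_past(verb))

print(past_tense("sing"))    # sang
print(past_tense("smell"))   # smelled (no memorized form, so the rule applies)
print(past_tense("google"))  # googled (new verbs get the rule automatically)
print(past_tense("help"))    # holp

# "Forgetting" a memorized form lets the rule take its place.
forgotten = {verb: past for verb, past in IRREGULAR_PAST.items() if verb != "help"}
print(past_tense("help", forgotten))  # helped
```

Dropping an entry from the table is this toy's version of "if we don't use them, we lose them": the rule silently fills the gap.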
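
The transcript does not say what "a reasonable pace" means. This example assumes 200 words per minute and reading around the clock (both assumptions made here for illustration). Under those assumptions, the 80-year figure can be inverted to estimate how many words it implies:

```python
WORDS_PER_MINUTE = 200               # assumption: "a reasonable pace"
YEARS = 80                           # the transcript's figure
MINUTES_PER_YEAR = 365.25 * 24 * 60  # reading nonstop, with no breaks

implied_words = WORDS_PER_MINUTE * MINUTES_PER_YEAR * YEARS
print(f"{implied_words:,.0f}")  # 8,415,360,000
```

That puts the English-language books from 2000 at roughly 8.4 billion words. A different assumed reading speed scales the estimate proportionally.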
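
According to the transcript, the Ngram Viewer plots how often a string of one or more words appears in each year's books. The sketch below makes several assumptions: whitespace tokenization, and normalizing each phrase's count by the total number of n-grams of the same length in that year. The toy corpus and the function names are hypothetical. This is a rough illustration of the idea, not Google's actual pipeline.

```python
from collections import Counter

def ngrams(tokens: list, n: int) -> list:
    """All contiguous n-word sequences in a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def yearly_frequency(books_by_year: dict, phrase: str) -> dict:
    """For each year, the share of all n-grams (n = number of words in
    `phrase`) that equal the phrase. `books_by_year` maps year -> list of texts."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in sorted(books_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result

# A tiny hypothetical corpus, not real Google Books data.
books = {
    1920: ["the great war ended", "after the great war"],
    1950: ["world war i and world war ii", "the great war"],
}
print(yearly_frequency(books, "great war"))    # {1920: 0.3333333333333333, 1950: 0.125}
print(yearly_frequency(books, "world war i"))  # {1920: 0.0, 1950: 0.16666666666666666}
```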



It’s the Great American Read!
Vote for America's favorite novel: https://to.pbs.org/2Jes2X5
↓↓↓ More info and sources below ↓↓↓

English is a confusing language for many reasons. But the irregular verbs might be the most confusing part. Why is “told” the past tense of “tell,” but “smold” not the past tense of “smell”? It turns out that the study of irregular verbs can teach us a lot about how languages evolve. This week, we look at how the era of Big Data is unlocking secrets behind the weirdness of words.

“The Zipf Mystery” - Vsauce /watch?v=fCn8zs912OE
“Trending Artists of the 17th Century” - The Art Assignment /watch?v=7eq3D9Q9lUA


Uncharted: Big Data as a Lens on Human Culture - Erez Aiden and Jean-Baptiste Michel https://amzn.to/2MLBEHF

Words and Rules - Steven Pinker https://amzn.to/2vKL1kf

Lieberman, Erez, et al. "Quantifying the evolutionary dynamics of language." Nature 449.7163 (2007): 713.

Michel, Jean-Baptiste, et al. "Quantitative analysis of culture using millions of digitized books." Science (2010): 1199644.

Hanley, Miles L., et al. Word Index to James Joyce's Ulysses. Madison: University of Wisconsin Press, 1937.


Twitter: @DrJoeHanson @okaytobesmart
Instagram: @DrJoeHanson
Merch: https://store.dftba.com/collections/its-okay-to-be-smart
Facebook: http://www.facebook.com/itsokaytobesmart


It’s Okay To Be Smart
PO Box 303356
Austin, TX 78703


It’s Okay To Be Smart is hosted by Joe Hanson, Ph.D.
Writer: Joe Hanson
Creative Director/Director: David Schulte
Editor/animator: Derek Borsheim
Producers: Stephanie Noone and Amanda Fox

Produced by PBS Digital Studios
Music via APM
Stock images from Shutterstock http://www.shutterstock.com
