
Vocabulary Coverage Calculator

Estimate reading comprehension from vocabulary size using lexical frequency research.
See your percentage and how many unknown words appear per page of text.


Why vocabulary size matters so much

In language learning, vocabulary size is one of the strongest predictors of reading comprehension and listening ability. Learners with accurate grammar can still fail to understand a text full of unfamiliar words. Learners with weak grammar can often follow a text if they know most of its words.

Paul Nation, Emeritus Professor at Victoria University of Wellington (New Zealand), is one of the most influential researchers on vocabulary in second language acquisition. His decades of research helped establish the relationship between vocabulary size and the lexical coverage of texts.

Lexical coverage = percentage of word tokens in a text that you already know. If a 100-word passage contains 95 known words and 5 unknown words, your lexical coverage is 95%.
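The definition can be computed directly by counting tokens. The sketch below is minimal and rests on a few assumptions. It uses a crude regex tokenizer that keeps only ASCII letters and apostrophes. The known-word list is a plain set of lowercase forms. There is no lemmatization, so "runs" and "run" are treated as different words. A real calculator would likely group tokens into lemmas or word families first.

```python
import re

def lexical_coverage(text: str, known_words: set[str]) -> float:
    """Return the fraction of word tokens in `text` that appear in `known_words`.

    Assumption: a token is a run of ASCII letters/apostrophes, compared in lowercase.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in known_words)
    return known / len(tokens)

# Usage: 4 of the 5 tokens are known, so coverage is 80%.
known = {"the", "cat", "sat", "on"}
print(lexical_coverage("The cat sat on mats", known))  # 0.8
```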

Coverage thresholds and comprehension

The relationship between vocabulary size and text coverage is non-linear. Common everyday words are very frequent; rare words are very rare. This creates dramatic returns on the first 5,000 words and diminishing returns afterward.

Vocabulary size | Text coverage | Comprehension level
500 words | 65-70% | Very difficult; can’t follow most text
1,000 words | 75-78% | Heavy gaps; struggle with everyday content
2,000 words | 84-87% | Manageable with effort; about 1 unknown per 7 words
3,000 words | 90-92% | Challenging; about 1 unknown per 10-12 words
5,000 words | 95-96% | Readable with some effort; about 1 unknown per 20-25 words
8,000 words | 97-98% | Near-native; about 1 unknown per 33-50 words
10,000 words | 98-99% | Full native-level for everyday text
15,000+ words | 99%+ | Approaches educated native vocabulary
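The "1 unknown per N words" figures follow from simple arithmetic. If coverage is c, the unknown share is 1 − c, so on average one word in every 1/(1 − c) is unknown. The sketch below also estimates unknown words per page. Its default of 250 words per page is an assumption, since page density varies widely and the figure does not come from the research.

```python
def unknown_word_rate(coverage: float) -> float:
    """Average number of running words per unknown word: 1 / (1 - coverage)."""
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    return 1.0 / (1.0 - coverage)

def unknowns_per_page(coverage: float, words_per_page: int = 250) -> float:
    """Expected unknown words on one page (250 words/page is an assumed default)."""
    return words_per_page * (1.0 - coverage)

for cov in (0.85, 0.90, 0.95, 0.98):
    print(f"{cov:.0%}: 1 unknown per {unknown_word_rate(cov):.0f} words, "
          f"~{unknowns_per_page(cov):.0f} unknowns per 250-word page")
```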

Why 98% is the critical threshold

Hu and Nation (2000) conducted a widely cited study of how lexical coverage affects reading comprehension. Its findings are commonly summarized as:

  • At 80% coverage: incomprehensible — too many unknowns
  • At 90% coverage: difficult — frequent dictionary use needed
  • At 95% coverage: possible — significant effort but feasible
  • At 98% coverage: comfortable — minimal dictionary use
  • At 99% coverage: essentially native — rare unknowns easily inferred

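Their conclusion: about 98% lexical coverage is needed for most learners to understand a text adequately without outside help such as a dictionary. Below this level, unknown words appear so often that guessing from context breaks down, and stopping to look words up disrupts the flow of reading.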

For English, Nation’s estimates put 98% coverage of everyday written text at approximately 8,000-9,000 word families.
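A calculator can map a coverage figure to the threshold bands listed above. The sketch below makes several assumptions. The band labels paraphrase this page's summary list. Any coverage below 90% is grouped into a single "incomprehensible or nearly so" band. Real comprehension also depends on the text type, topic knowledge, and reading skill.

```python
# (minimum coverage, label) pairs, checked from highest to lowest threshold.
BANDS = [
    (0.99, "essentially native: rare unknowns easily inferred"),
    (0.98, "comfortable: minimal dictionary use"),
    (0.95, "possible: significant effort but feasible"),
    (0.90, "difficult: frequent dictionary use needed"),
]

def comprehension_band(coverage: float) -> str:
    """Return the first band whose threshold the coverage meets."""
    for threshold, label in BANDS:
        if coverage >= threshold:
            return label
    return "incomprehensible or nearly so: too many unknowns"

print(comprehension_band(0.985))  # comfortable: minimal dictionary use
```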

The frequency principle

A small number of words account for most of the word tokens in any text. Approximate figures:

Words known (by frequency rank) | Coverage of typical text
Most common 100 words | ~50%
Most common 500 words | 70-72%
Most common 1,000 words | 75-78%
Most common 2,000 words | 84-87%
Most common 3,000 words | 90-92%
Most common 5,000 words | 95-96%
Most common 10,000 words | 98-99%
Most common 20,000 words | 99.5%+

This pattern reflects Zipf’s Law: the frequency of a word is approximately inversely proportional to its frequency rank. The most common word in English ("the") appears about twice as often as the second most common ("of") and about three times as often as the third.
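Zipf's Law can be turned into an idealized coverage model. Give rank r a weight of 1/r**s; the share of tokens covered by the top k ranks is then those weights summed up to k, divided by the total. The sketch below uses hypothetical parameters: s = 1 and 100,000 word types. It counts individual word forms rather than word families, so its numbers come out well below the family-based table above. The point is the shape of the curve, not the exact values.

```python
def zipf_coverage(top_k: int, vocab_size: int, s: float = 1.0) -> float:
    """Share of tokens covered by the top_k ranks when frequency ∝ 1 / rank**s.

    Assumption: an idealized Zipf distribution over `vocab_size` word types.
    """
    weights = [1.0 / r ** s for r in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)

for k in (100, 1_000, 2_000, 5_000, 10_000):
    print(f"top {k:>6,}: {zipf_coverage(k, vocab_size=100_000):.1%}")
```

Under this model, each doubling of the number of words known adds roughly the same amount of coverage. The gain per individual word therefore keeps shrinking as vocabulary grows.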

For language learners, this means:

  • Learning the first 2,000 words gives you roughly 85% coverage
  • Learning the next 3,000 words (to 5,000) adds about 8-12 more percentage points
  • Learning 5,000 more words (to 10,000) adds only about 2-4 more percentage points
  • Beyond 15,000 words, returns become very marginal
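The diminishing returns can be quantified from the frequency table itself. The sketch below takes the midpoint of each range in that table, an approximation, and computes the coverage gained (in percentage points) per 1,000 extra words in each band.

```python
# Approximate midpoints of the frequency table above:
# words known (by frequency rank) -> % coverage of typical text.
COVERAGE_MIDPOINTS = {100: 50.0, 500: 71.0, 1_000: 76.5, 2_000: 85.5,
                      3_000: 91.0, 5_000: 95.5, 10_000: 98.5}

def gain_per_thousand_words(table: dict[int, float]) -> list[tuple[int, int, float]]:
    """Coverage gained (percentage points) per 1,000 extra words, band by band."""
    sizes = sorted(table)
    return [(lo, hi, (table[hi] - table[lo]) * 1_000 / (hi - lo))
            for lo, hi in zip(sizes, sizes[1:])]

for lo, hi, rate in gain_per_thousand_words(COVERAGE_MIDPOINTS):
    print(f"{lo:>6,} -> {hi:>6,} words: +{rate:.2f} points per 1,000 words")
```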

Native speaker vocabulary

Educated native English speakers know approximately:

  • 8,000-12,000 word families for casual conversation
  • 15,000-20,000 word families for adult literacy
  • 20,000-35,000+ word families for university-educated adults
  • 50,000+ words known passively in specialized fields

Children typically learn 1,000-2,000 word families per year through age 10. By age 18, most native speakers know ~20,000 word families, mostly as receptive (recognition) vocabulary.

For comparison, second language learners reaching C2 (near-native) typically have 12,000-15,000 word families — sufficient for most contexts but rarely matching educated native vocabulary in specialized domains.

Word families vs word forms

Vocabulary counts can be confusing because studies count different units:

  • Word form: each distinct spelling (“run”, “running”, “ran” = 3 word forms)
  • Lemma: a base form plus its inflections (“run”, “running”, “ran” = 1 lemma)
  • Word family: a base form plus its inflections and derivatives (“run”, “runs”, “ran”, “running”, “runner”, “rerun” = 1 family)

Most vocabulary research uses word families. A 5,000-word-family vocabulary means knowing 5,000 base words along with their inflected and derived forms. That amounts to several times as many distinct word forms (roughly 15,000-25,000).
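The three counting units differ only in how forms are grouped. The sketch below applies them to the "run" example above. The two mapping tables are hypothetical and hand-built for this one example. Real tools would derive them from a lemmatizer and a published word-family list.

```python
# Hypothetical hand-built mappings for the "run" example only.
FORM_TO_LEMMA = {
    "run": "run", "runs": "run", "ran": "run", "running": "run",
    "runner": "runner", "runners": "runner",
    "rerun": "rerun",
}
LEMMA_TO_FAMILY = {"run": "run", "runner": "run", "rerun": "run"}

def count_units(forms: list[str]) -> dict[str, int]:
    """Count distinct word forms, lemmas, and word families in a word list."""
    distinct_forms = set(forms)
    lemmas = {FORM_TO_LEMMA[form] for form in distinct_forms}
    families = {LEMMA_TO_FAMILY[lemma] for lemma in lemmas}
    return {"word forms": len(distinct_forms), "lemmas": len(lemmas),
            "families": len(families)}

print(count_units(["run", "runs", "ran", "running", "runner", "rerun"]))
# {'word forms': 6, 'lemmas': 3, 'families': 1}
```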

Frequency lists for different languages

Reliable frequency lists exist for major languages:

English:

  • New General Service List (NGSL): 2,801 high-frequency words covering 92% of text
  • Academic Word List (AWL): 570 word families common across academic texts, beyond the General Service List’s 2,000 words
  • General Service List (West, 1953): historic 2,000-word list

Spanish:

  • “A Frequency Dictionary of Spanish” by Mark Davies (5,000 most common words)
  • Real Academia Española frequency studies

French:

  • “Tableau de Fréquence” - widely used in French language schools
  • Larousse frequency dictionaries

Mandarin Chinese:

  • HSK 1-6 vocabulary lists (5,000 words cumulative for HSK 6)
  • 5,000-word frequency lists for native literacy

Japanese:

  • JLPT N5-N1 vocabulary lists (unofficial since 2010; N1 is commonly estimated at ~10,000 words)
  • 2,136 Joyo kanji for daily literacy

Specialized domain vocabulary

Beyond general vocabulary, professional contexts require domain-specific words:

Domain | Vocabulary needed
Medicine | 8,000-15,000 additional medical terms
Law | 5,000-10,000 additional legal terms
Engineering | 5,000-10,000 additional technical terms
Finance/business | 3,000-5,000 additional specialized terms
Academic writing | 570 AWL word families plus discipline-specific terms
Daily news | Standard 5,000+ general vocabulary is sufficient
Children’s literature | 3,000-5,000 simpler core words
Adult literary fiction | 10,000-20,000 word families
Specialized scientific papers | 3,000-8,000 field-specific terms

Reaching B2 in general language doesn’t mean you can read technical documents in your field. Depending on the field, a separate domain vocabulary of several thousand terms may be needed.

How to grow vocabulary efficiently

Research-supported methods:

Spaced repetition flashcards (Anki, Memrise)

  • 20 new words/day sustainable for serious learners
  • 5,000 words in 8-12 months at this pace
  • Active recall, not passive review
  • Include context sentences
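Anki and Memrise use their own scheduling algorithms. The sketch below instead uses the simpler Leitner box system to show the core idea of spaced repetition: each successful active-recall attempt pushes the next review further out, and a miss resets the card. The interval values and the `Card` structure are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Days until the next review for each Leitner box (assumed values).
INTERVALS = [1, 2, 4, 8, 16, 32]

@dataclass
class Card:
    word: str
    context: str = ""            # example sentence shown with the word
    box: int = 0                 # 0 = new or recently forgotten
    due: date = field(default_factory=date.today)

def review(card: Card, recalled: bool, today: date) -> Card:
    """Apply one recall attempt: promote on success, reset to box 0 on failure."""
    if recalled:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card

def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Cards whose next review date has arrived."""
    return [c for c in cards if c.due <= today]

start = date(2024, 1, 1)
card = Card("perro", context="El perro duerme.", due=start)
review(card, recalled=True, today=start)      # box 1, next review in 2 days
review(card, recalled=True, today=card.due)   # box 2, next review in 4 days
print(card.box, card.due)                     # 2 2024-01-07
```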

Extensive reading

  • Read material at 95%+ coverage (i+1 zone)
  • Look up only the most important unknown words
  • Read enormous quantities for incidental vocabulary
  • Graded readers for beginners; novels for advanced

Listening to comprehensible audio

  • Podcasts, audiobooks at your level
  • Native speech (with subtitles) at higher levels
  • 1-2 hours daily produces vocabulary growth

Active production

  • Writing journal entries that force vocabulary use
  • Speaking with natives who introduce new words
  • Translation exercises (especially native → target, which forces production)

Frequency lists

  • Learn top 1,000 most-frequent words first
  • Then top 2,000, top 5,000
  • Lists exist for every major language

Vocabulary “decay” and review

Without use, vocabulary fades quickly:

  • After 1 week without exposure: 20-30% forgotten
  • After 1 month: 50-60% forgotten
  • After 6 months: 70-80% forgotten

But relearning is much faster than initial learning. With spaced-repetition reviews, well-learned words become very durable and need only occasional refreshing.
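A common simplified way to picture why review helps is an exponential forgetting curve, R = exp(−t / S). Here t is the number of days since the last review and S is memory "stability". The sketch below assumes that each successful review multiplies S by a fixed factor. Both the starting stability and the growth factor are hypothetical, and the percentages on this page are not derived from this model.

```python
import math

def retention(days: float, stability: float) -> float:
    """Share of items still recalled `days` after the last review: R = exp(-t / S)."""
    return math.exp(-days / stability)

def stability_after(reviews: int, initial: float = 3.0, growth: float = 2.5) -> float:
    """Assumed: each successful spaced review multiplies stability by `growth`."""
    return initial * growth ** reviews

for n in range(5):
    s = stability_after(n)
    print(f"{n} reviews: stability {s:.0f} days, "
          f"{retention(30, s):.0%} recalled after 30 more days")
```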

Without consistent practice, vocabulary plateaus or declines. This is why language ability deteriorates when not used.

Reading vs listening vocabulary

Most learners have larger reading vocabulary than listening vocabulary:

  • Reading: you can pause, re-read, look up
  • Listening: continuous flow; instant recognition needed
  • Speed: native speakers talk at ~150-180 words/minute, so second language listeners need words familiar enough to recognize at that pace
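Reading lets you pause; speech does not. The sketch below estimates how many unfamiliar words a listener meets per minute, using the speech rates above. It assumes that coverage is spread evenly across the audio.

```python
def unknowns_per_minute(coverage: float, words_per_minute: float = 150.0) -> float:
    """Expected unfamiliar words per minute of speech at a given lexical coverage."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return words_per_minute * (1.0 - coverage)

for wpm in (150, 180):
    for cov in (0.95, 0.98):
        print(f"{wpm} wpm at {cov:.0%} coverage: "
              f"~{unknowns_per_minute(cov, wpm):.1f} unknown words per minute")
```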

To improve listening, focus on:

  • Audio-only training (no subtitles)
  • Native-pace content
  • Repeated listening of same content
  • Shadowing exercises (repeat what you hear)

The “fluency illusion”

Some learners with a 2,000-3,000 word vocabulary appear “fluent” in casual conversation because they have optimized for common topics:

  • Daily activities
  • Travel
  • Food
  • Weather
  • Family
  • Work (basic terms)

But:

  • Reading newspapers: difficult
  • Following nuanced TV: lost
  • Specialized topics: helpless
  • Educated native conversation: outclassed

True fluency requires 5,000+ word families minimum, ideally 8,000-10,000+.

Vocabulary testing

Several reliable vocabulary tests exist:

  • Vocabulary Size Test (Nation): estimates word-family knowledge
  • LexTALE: yes/no test for English, Spanish, German, Dutch
  • Test Your Vocab: web-based estimate
  • HSK/JLPT/DELE: official exam scores indicate vocabulary level

These tests typically estimate a learner’s vocabulary size to within roughly 500-1,000 words.
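Two of the tests above have simple scoring rules. As we understand them, the 140-item Vocabulary Size Test treats each item as standing for 100 word families. The English LexTALE (40 real words, 20 nonwords) reports the average of the two percentage-correct scores. The sketch below works under those assumptions; check each test's own documentation before relying on it.

```python
def vst_estimate(correct: int, items: int = 140, families_per_item: int = 100) -> int:
    """Vocabulary Size Test estimate, assuming the 140-item, 14,000-family version."""
    if not 0 <= correct <= items:
        raise ValueError("correct must be between 0 and items")
    return correct * families_per_item

def lextale_score(words_correct: int, nonwords_correct: int,
                  n_words: int = 40, n_nonwords: int = 20) -> float:
    """Averaged % correct: mean of % real words accepted and % nonwords rejected."""
    return (words_correct / n_words * 100 + nonwords_correct / n_nonwords * 100) / 2

print(vst_estimate(70))        # 7000
print(lextale_score(30, 10))   # 62.5
```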

Bottom line

Lexical coverage = percentage of words in a text that you know. The commonly cited thresholds: 80% incomprehensible, 90% difficult, 95% feasible, 98% comfortable, 99% native-like. Hu and Nation (2000) identified about 98% as the coverage needed to read comfortably without a dictionary. Because of Zipf’s Law, the 1,000 most frequent words cover ~75-78% of text, and the next 4,000 add roughly another 17-20 percentage points. Most adult native English speakers know 15,000-20,000 word families; second language learners reaching C2 typically have 12,000-15,000. Vocabulary grows fastest through spaced repetition + extensive reading + active production. The big payoff in coverage comes from the first 5,000 words; beyond 10,000, returns are increasingly marginal.

