Vocabulary Coverage Calculator
Estimate reading comprehension from vocabulary size using lexical frequency research.
See your percentage and how many unknown words appear per page of text.
Why vocabulary size matters so much
In language learning, vocabulary size is one of the strongest predictors of reading and listening comprehension. You can have perfect grammar and still fail to understand a text full of unfamiliar words. You can have weak grammar and still understand a text if you know most of its words.
Paul Nation, Emeritus Professor at Victoria University of Wellington (New Zealand), is one of the most influential researchers on vocabulary in second language acquisition. His decades of research helped establish the relationship between vocabulary size and lexical coverage of texts.
Lexical coverage = percentage of word tokens in a text that you already know. If a 100-word passage contains 95 known words and 5 unknown words, your lexical coverage is 95%.
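The definition above is easy to compute directly. Below is a minimal sketch, assuming a crude tokenizer (runs of ASCII letters and apostrophes, compared case-insensitively) and a set of lowercase word forms you already know; real coverage studies lemmatize or group tokens into word families first, which this sketch does not do.

```python
import re


def lexical_coverage(text: str, known_words: set[str]) -> float:
    """Return the fraction of word tokens in `text` that appear in `known_words`.

    Simplifying assumptions: tokens are runs of ASCII letters or apostrophes,
    matching is case-insensitive, and `known_words` holds lowercase word forms.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


# The 100-word example above: 95 known tokens plus 5 unknown ones.
passage = " ".join(["known"] * 95 + ["unknown"] * 5)
print(lexical_coverage(passage, {"known"}))  # 0.95
```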
Coverage thresholds and comprehension
The relationship between vocabulary size and text coverage is non-linear. A small set of everyday words is extremely frequent, while most words in the language are rare. As a result, the first few thousand words bring large gains in coverage, and returns diminish steadily beyond about 5,000 words.
| Vocabulary size | Text coverage | Comprehension level |
|---|---|---|
| 500 words | 65-70% | Very difficult; can’t follow most text |
| 1,000 words | 75-78% | Heavy gaps; struggle with everyday content |
| 2,000 words | 84-87% | Manageable with effort; about 1 unknown per 7 words |
| 3,000 words | 90-92% | Challenging; about 1 unknown per 10-12 words |
| 5,000 words | 95-96% | Readable with effort; about 1 unknown per 20-25 words |
| 8,000 words | 97-98% | Comfortable, near-native reading; about 1 unknown per 33-50 words |
| 10,000 words | 98-99% | Native-like for everyday text |
| 15,000+ words | 99%+ | Approaches educated native vocabulary |
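A calculator like the one this page describes can be approximated from the table above. The sketch below assumes the midpoint of each coverage range is representative, interpolates linearly between rows, clamps outside the table, and assumes 250 running words per printed page; all of these are simplifications chosen for illustration, not values taken from the research.

```python
from bisect import bisect_left

# (vocabulary size in word families, coverage %) using the midpoint of each range
# in the table above; midpoints and linear interpolation are simplifying assumptions.
COVERAGE_TABLE = [
    (500, 67.5),
    (1_000, 76.5),
    (2_000, 85.5),
    (3_000, 91.0),
    (5_000, 95.5),
    (8_000, 97.5),
    (10_000, 98.5),
    (15_000, 99.0),
]


def estimate_coverage(vocab_size: int) -> float:
    """Linearly interpolate coverage (%) between table rows; clamp outside the table."""
    sizes = [size for size, _ in COVERAGE_TABLE]
    if vocab_size <= sizes[0]:
        return COVERAGE_TABLE[0][1]
    if vocab_size >= sizes[-1]:
        return COVERAGE_TABLE[-1][1]
    i = bisect_left(sizes, vocab_size)
    (x0, y0), (x1, y1) = COVERAGE_TABLE[i - 1], COVERAGE_TABLE[i]
    return y0 + (y1 - y0) * (vocab_size - x0) / (x1 - x0)


def words_per_unknown(coverage_pct: float) -> float:
    """Average number of running words per unknown word (e.g. 95% -> 1 in 20)."""
    if coverage_pct >= 100:
        return float("inf")
    return 100 / (100 - coverage_pct)


def unknown_per_page(coverage_pct: float, words_per_page: int = 250) -> float:
    """Expected unknown words on one page; 250 words per page is an assumed default."""
    return words_per_page * (100 - coverage_pct) / 100


cov = estimate_coverage(4_000)
print(f"{cov:.1f}% coverage, about 1 unknown in {words_per_unknown(cov):.0f} words, "
      f"{unknown_per_page(cov):.1f} unknown words per page")
```

Linear interpolation is the simplest choice here; since real coverage curves bend smoothly, estimates between widely spaced rows (for example between 10,000 and 15,000) are rough.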
Why 98% is the critical threshold
Hu and Nation (2000) studied how well learners understood a fiction text at different levels of lexical coverage. The thresholds are commonly summarized as:
- At 80% coverage: incomprehensible; too many unknowns
- At 90% coverage: difficult — frequent dictionary use needed
- At 95% coverage: possible — significant effort but feasible
- At 98% coverage: comfortable — minimal dictionary use
- At 99% coverage: essentially native — rare unknowns easily inferred
Their conclusion: about 98% lexical coverage is needed for adequate comprehension when reading without a dictionary. Below that level, unknown words appear more often than once every 50 words, and repeatedly stopping to look them up or guess them breaks the flow of comprehension.
For English, Nation’s later estimates put 98% coverage of written text at roughly 8,000-9,000 word families, and of spoken text at roughly 6,000-7,000.
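The threshold list can be turned into a lookup that labels any coverage value. This is a minimal sketch under one stated assumption: each listed percentage is treated as the lower bound of a band, so a value between two thresholds takes the label of the lower one (for example, 96% is labeled "possible"). The band names are copied from the list; the function name is illustrative.

```python
# Thresholds from the list above, highest first. Treating each value as the lower
# bound of a band (so 96% gets the 95% label) is a simplifying assumption.
COMPREHENSION_BANDS = [
    (99.0, "essentially native: rare unknowns easily inferred"),
    (98.0, "comfortable: minimal dictionary use"),
    (95.0, "possible: significant effort but feasible"),
    (90.0, "difficult: frequent dictionary use needed"),
]


def comprehension_band(coverage_pct: float) -> str:
    """Return the label of the highest threshold that coverage_pct reaches."""
    for threshold, label in COMPREHENSION_BANDS:
        if coverage_pct >= threshold:
            return label
    return "incomprehensible: too many unknowns"


for pct in (80, 92, 96, 98.5):
    unknown_per_100 = 100 - pct
    print(f"{pct}% -> {unknown_per_100:g} unknown per 100 words, {comprehension_band(pct)}")
```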
The frequency principle
A small number of words account for most of the running words in any text. Approximate figures for English:
| Word frequency rank | Coverage of typical text |
|---|---|
| Most common 100 words | ~50% |
| Most common 500 words | 70-72% |
| Most common 1,000 words | 75-78% |
| Most common 2,000 words | 84-87% |
| Most common 3,000 words | 90-92% |
| Most common 5,000 words | 95-96% |
| Most common 10,000 words | 98-99% |
| Most common 20,000 words | 99.5%+ |
This pattern follows from Zipf’s Law: the frequency of a word is approximately inversely proportional to its frequency rank. The most common word (“the” in English) appears about twice as often as the second-most-common (“of”) and about three times as often as the third, and so on.
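Zipf’s Law can be simulated to show where the diminishing returns come from. The sketch below assumes an idealized distribution in which the frequency of rank r is proportional to 1/r^s over a fixed number of word types (50,000 by default, an arbitrary choice), and reports the share of running words covered by the top k ranks.

```python
def zipf_coverage(top_k: int, vocab_size: int = 50_000, s: float = 1.0) -> float:
    """Share of running words covered by the top_k ranks when the frequency of
    rank r is proportional to 1 / r**s over vocab_size word types (idealized Zipf)."""
    weights = [1 / r**s for r in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)


# With s = 1, rank 1 is twice as frequent as rank 2 and three times as frequent as rank 3.
for k in (100, 1_000, 5_000, 10_000):
    print(k, round(zipf_coverage(k), 3))
```

With s = 1 and 50,000 types, this toy model gives noticeably lower coverage than the table above (about two-thirds for the top 1,000 ranks), likely in part because the table counts word families, which fold many forms into a single entry, and because real corpora deviate from a pure power law. What it does reproduce is the shape: each additional block of ranks adds less than the one before.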
For language learners, this skewed distribution means:
- Learning the first 2,000 words gives you about 85% coverage
- Learning the next 3,000 words (to 5,000) adds roughly 8-12 percentage points
- Learning 5,000 more words (to 10,000) adds only about 2-4 percentage points
- Beyond 15,000 words, returns become very marginal
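Dividing each step of the frequency table by the number of extra words makes these diminishing returns explicit. This sketch uses the midpoint of each coverage range and adds two assumptions of its own: a starting point of 0 words = 0% coverage, and 99.5 as the value for "99.5%+".

```python
# (rank, cumulative coverage %) using midpoints of the frequency table above;
# the (0, 0.0) starting point and the 99.5 value for "99.5%+" are assumptions.
FREQUENCY_TABLE = [
    (0, 0.0), (100, 50.0), (500, 71.0), (1_000, 76.5), (2_000, 85.5),
    (3_000, 91.0), (5_000, 95.5), (10_000, 98.5), (20_000, 99.5),
]


def gain_per_thousand(table):
    """Coverage gained (percentage points) per 1,000 extra words in each band."""
    return [
        (lo, hi, (cov_hi - cov_lo) * 1_000 / (hi - lo))
        for (lo, cov_lo), (hi, cov_hi) in zip(table, table[1:])
    ]


for lo, hi, gain in gain_per_thousand(FREQUENCY_TABLE):
    print(f"words {lo:>6,}-{hi:<6,}: {gain:7.2f} points per 1,000 words")
```

By these midpoints, the 1,000 words between ranks 1,000 and 2,000 add about 9 points, while each 1,000 words between 5,000 and 10,000 adds well under 1 point.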
Native speaker vocabulary
Educated native English speakers know approximately:
- 8,000-12,000 word families for casual conversation
- 15,000-20,000 word families for adult literacy
- 20,000-35,000+ word families for university-educated adults
- 50,000+ words known passively in specialized fields
Children typically learn 1,000-2,000 word families per year through age 10. By age 18, most natives recognize ~20,000 word families.
For comparison, second language learners reaching C2 (near-native) typically have 12,000-15,000 word families — sufficient for most contexts but rarely matching educated native vocabulary in specialized domains.
Word families vs word forms
Vocabulary counts can be confusing because studies count different units:
- Word form: any distinct spelling (“run”, “running”, “ran” = 3 word forms)
- Lemma: a base form plus its inflections (“run”, “running”, “ran” = 1 lemma)
- Word family: a lemma plus its derived forms (“run”, “runs”, “ran”, “running”, “runner”, “rerun” = 1 family)
Most vocabulary research counts word families. A 5,000-word-family vocabulary means knowing 5,000 headwords together with their inflected and derived forms, which adds up to roughly 15,000-25,000 distinct word forms.
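The three counting units can be compared on the “run” example. The mappings below are hypothetical and hand-built for illustration only; real studies rely on curated word-family lists rather than a few entries like these.

```python
# Hypothetical, hand-built mappings for illustration only; real studies rely on
# curated word-family lists rather than a few dictionary entries like these.
LEMMA_OF = {
    "run": "run", "runs": "run", "ran": "run", "running": "run",
    "runner": "runner", "runners": "runner",
    "rerun": "rerun", "reruns": "rerun",
}
FAMILY_OF = {"run": "run", "runner": "run", "rerun": "run"}


def count_units(forms):
    """Return (word forms, lemmas, word families) for a collection of known forms."""
    distinct_forms = set(forms)
    lemmas = {LEMMA_OF[form] for form in distinct_forms}
    families = {FAMILY_OF[lemma] for lemma in lemmas}
    return len(distinct_forms), len(lemmas), len(families)


print(count_units(["run", "running", "ran"]))                            # (3, 1, 1)
print(count_units(["run", "runs", "ran", "running", "runner", "rerun"]))  # (6, 3, 1)
```

The same six known forms count as 6, 3, or 1 depending on the unit, which is why vocabulary figures from different sources are only comparable when they use the same unit.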
Frequency lists for different languages
Reliable frequency lists exist for major languages:
English:
- New General Service List (NGSL): 2,801 high-frequency words covering 92% of text
- Academic Word List (AWL): 570 word families that occur frequently across academic texts
- General Service List (West, 1953): historic 2,000-word list
Spanish:
- “5,000 Most Common Spanish Words” by Federación Española
- Real Academia Española frequency studies
French:
- “Tableau de Fréquence” - widely used in French language schools
- Larousse frequency dictionaries
Mandarin Chinese:
- HSK 1-6 vocabulary lists (5,000 words cumulative for HSK 6 in the six-level version)
- 5,000-word frequency lists for native literacy
Japanese:
- JLPT N5-N1 vocabulary lists (unofficial since 2010; commonly estimated at ~10,000 words for N1)
- 2,136 Jōyō kanji (characters, not words) for daily literacy
Specialized domain vocabulary
Beyond general vocabulary, professional contexts require domain-specific words:
| Domain or text type | Approximate vocabulary needed |
|---|---|
| Medicine | 8,000-15,000 medical terms |
| Law | 5,000-10,000 legal terms |
| Engineering | 5,000-10,000 technical terms |
| Finance/business | 3,000-5,000 specialized terms |
| Academic writing | 570 AWL word families plus discipline-specific terms |
| Daily news | Standard 5,000+ vocabulary sufficient |
| Children’s literature | 3,000-5,000 simpler core words |
| Adult literary fiction | 10,000-20,000 word families |
| Specialized scientific papers | Field-specific 3,000-8,000 |
Reaching B2 in general language doesn’t mean you can read technical documents in your field. A separate domain vocabulary, often several thousand terms, may be needed.
How to grow vocabulary efficiently
Research-supported methods:
Spaced repetition flashcards (Anki, Memrise)
- 20 new words/day sustainable for serious learners
- 5,000 words in 8-12 months at this pace
- Active recall, not passive review
- Include context sentences
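The pace figures above are simple arithmetic. This sketch assumes a fixed number of new cards per study day, counts whole calendar weeks (rest days included), and ignores forgetting, failed cards, and the growing daily review load, so it gives a lower bound on real time.

```python
import math


def days_to_target(target_words: int, new_per_day: int, study_days_per_week: int = 7) -> int:
    """Calendar days needed to introduce `target_words` new cards at a steady pace.

    Assumes each card is introduced once and retained; forgetting, failed cards,
    and the growing daily review load are ignored. Whole weeks count all 7 days.
    """
    study_days = math.ceil(target_words / new_per_day)
    full_weeks, extra_days = divmod(study_days, study_days_per_week)
    return full_weeks * 7 + extra_days


print(days_to_target(5_000, 20))     # 250 days, a little over 8 months
print(days_to_target(5_000, 20, 5))  # 350 days, about 11.5 months
```

Studying every day lands near the low end of the 8-12 month range; studying five days a week lands near the high end.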
Extensive reading
- Read material at 95%+ coverage (i+1 zone)
- Look up only the most important unknown words
- Read enormous quantities for incidental vocabulary
- Graded readers for beginners; novels for advanced
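Choosing material at 95%+ coverage can be automated if you keep a list of words you know. The sketch below reuses the same crude tokenizer as a plain coverage count (an assumption; it does not group forms into families) and returns the titles that clear the threshold, highest coverage first. The function names and sample texts are illustrative.

```python
import re


def coverage(text: str, known: set[str]) -> float:
    """Fraction of word tokens in `text` found in the lowercase set `known`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(token in known for token in tokens) / len(tokens) if tokens else 0.0


def readable_texts(texts: dict[str, str], known: set[str], threshold: float = 0.95) -> list[str]:
    """Titles of texts at or above `threshold` coverage, highest coverage first."""
    scored = sorted(((coverage(body, known), title) for title, body in texts.items()),
                    reverse=True)
    return [title for cov, title in scored if cov >= threshold]


library = {
    "graded reader": "the boy saw the dog and the dog saw the boy",
    "news article": "the minister announced unprecedented fiscal consolidation",
}
known = {"the", "boy", "saw", "dog", "and", "minister"}
print(readable_texts(library, known))  # ['graded reader']
```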
Listening to comprehensible audio
- Podcasts, audiobooks at your level
- Native speech (with subtitles) at higher levels
- Regular listening (for example, 1-2 hours daily) supports steady vocabulary growth
Active production
- Writing journal entries that force vocabulary use
- Speaking with natives who introduce new words
- Translation exercises (especially native → target, which forces production)
Frequency lists
- Learn top 1,000 most-frequent words first
- Then top 2,000, top 5,000
- Lists exist for every major language
Vocabulary “decay” and review
Newly learned words that are not reviewed or encountered again fade quickly. Rough figures often quoted:
- After 1 week without exposure: 20-30% forgotten
- After 1 month: 50-60% forgotten
- After 6 months: 70-80% forgotten
But re-learning is much faster than initial learning. With spaced repetition reviews and regular exposure through reading and listening, learned words become highly durable.
Without consistent practice, vocabulary plateaus or declines. This is why language ability deteriorates when not used.
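Spaced repetition counters decay by reviewing each word just before it is likely to be forgotten, with intervals that grow after every successful recall. The sketch below is a simple Leitner-box scheduler; it is not the scheduling algorithm Anki or Memrise actually use, and the interval values are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Days until the next review for each box; these particular intervals are assumed.
INTERVALS = [1, 2, 4, 8, 16, 32]


@dataclass
class Card:
    word: str
    box: int = 0      # index into INTERVALS
    due_day: int = 0  # day on which the card should next be reviewed


def review(card: Card, today: int, recalled: bool) -> Card:
    """Leitner-style update: a correct recall moves the card up one box (longer
    interval, capped at the last box); a failure sends it back to the first box."""
    card.box = min(card.box + 1, len(INTERVALS) - 1) if recalled else 0
    card.due_day = today + INTERVALS[card.box]
    return card


card = Card("gato")
for day, recalled in [(0, True), (2, True), (6, False), (7, True)]:
    review(card, day, recalled)
    print(f"day {day}: box {card.box}, next review on day {card.due_day}")
```

Sending a forgotten word back to the first box is the harshest common policy; gentler variants move it down only one box.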
Reading vs listening vocabulary
Most learners have larger reading vocabulary than listening vocabulary:
- Reading: you can pause, re-read, look up
- Listening: continuous flow; instant recognition needed
- Speed: native speakers talk at roughly 150-180 words/minute, so second-language listeners must recognize words instantly, with no time to puzzle them out
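Combining coverage with speech rate shows why listening feels harder than reading at the same coverage. This sketch assumes a steady rate (150 words/minute by default, the low end of the range above) and unknown words spread evenly through the speech.

```python
def unknown_words_per_minute(coverage_pct: float, words_per_minute: float = 150) -> float:
    """Expected unknown words per minute of speech at a given lexical coverage."""
    return words_per_minute * (100 - coverage_pct) / 100


for pct in (95, 98):
    rate = unknown_words_per_minute(pct)
    print(f"{pct}% coverage at 150 wpm: {rate:.1f} unknown words/min, "
          f"one every {60 / rate:.0f} seconds")
```

At 95% coverage that is about 7.5 unknown words a minute, roughly one every 8 seconds, with no chance to pause or re-read.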
To improve listening, focus on:
- Audio-only training (no subtitles)
- Native-pace content
- Repeated listening of same content
- Shadowing exercises (repeat what you hear)
The “fluency illusion”
Some learners with 2,000-3,000-word vocabularies appear “fluent” in casual conversation. They’ve optimized for common topics:
- Daily activities
- Travel
- Food
- Weather
- Family
- Work (basic terms)
But:
- Reading newspapers: difficult
- Following nuanced TV: lost
- Specialized topics: helpless
- Educated native conversation: outclassed
Broad fluency generally requires at least 5,000 word families, and ideally 8,000-10,000 or more.
Vocabulary testing
Several reliable vocabulary tests exist:
- Vocabulary Size Test (Nation): estimates word-family knowledge
- LexTALE: yes/no test for English, Spanish, German, Dutch
- Test Your Vocab: web-based estimate
- HSK/JLPT/DELE: official exam levels indicate approximate vocabulary level
These tests can typically estimate a learner’s vocabulary size to within about 500-1,000 words.
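Level-sampled tests such as Nation’s Vocabulary Size Test estimate size by sampling a fixed number of words from each 1,000-word-family frequency level. In the widely used 14-level, 140-item version, each correct answer counts as 100 word families. The function below is a generic sketch of that scoring idea; its name and parameters are illustrative, and the published scoring instructions for whichever test version you use take precedence.

```python
def level_sampled_estimate(correct_per_level: list[int], items_per_level: int = 10,
                           families_per_level: int = 1_000) -> int:
    """Estimate vocabulary size (word families) from a frequency-level-sampled test.

    Each level samples `items_per_level` words from a band of `families_per_level`
    families, so every correct answer stands for families_per_level / items_per_level
    families (100 with the defaults).
    """
    for correct in correct_per_level:
        if not 0 <= correct <= items_per_level:
            raise ValueError(f"{correct} is out of range for a {items_per_level}-item level")
    return sum(correct_per_level) * families_per_level // items_per_level


# Hypothetical results for 14 levels (1st 1,000 families through 14th 1,000).
scores = [10, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0]
print(level_sampled_estimate(scores))  # 6500
```

Because each item stands for 100 families, a single lucky guess or slip shifts the estimate by 100, which is one reason these estimates carry a margin of several hundred words.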
Bottom line
Lexical coverage = percentage of words in a text that you know. The thresholds: 80% incomprehensible, 90% difficult, 95% feasible, 98% comfortable, 99% native-like. Hu and Nation’s (2000) research established 98% as the minimum for reading without a dictionary. Because word frequencies follow Zipf’s Law, the 1,000 most-frequent words cover ~75-78% of text, and the next 4,000 add roughly 17-21 percentage points more (to 95-96%). Most educated native English speakers know 15,000-20,000 word families; second language learners reaching C2 typically have 12,000-15,000. Vocabulary grows fastest through spaced repetition + extensive reading + active production. The big payoff in coverage comes from the first 5,000 words; beyond 10,000, returns are increasingly marginal.