site address: wordfrequency.info redirected to: www.wordfrequency.info

site title: Word frequency: based on one billion word COCA corpus

Our opinion (on Monday 22 June 2026 2:59:13 UTC):

- no comments

After content analysis of this website we propose the following hashtags:

Meta tags:
keywords=word frequency English American lists lemmas lemma genres genre n-grams collocates collocate;
description=Most accurate word frequency data for English. Only lists based on a large, recent, balanced corpora of English;

Headings (most frequently used words):

word, frequency, data,

Text of the page (most frequently used words):
the (29), word (20), data (15), #frequency (11), 2285 (9), and (8), 2283 (8), web (7), each (7), for (7), 115 (7), 372 (7), texts (6), rank (6), words (6), #corpus (6), are (6), genres (6), 2286 (6), coca (6), you (6), acad (5), news (5), mag (5), fic (5), spok (5), blog (5), freq (5), shows (5), top (5), 000 (5), squeak (5), pos (5), lemma (5), deduce (5), rehabilitate (5), english (5), that (4), not (4), most (4), 13171 (4), 2284 (4), 13168 (4), headline (4), 13164 (4), 1765 (4), more (4), dataset (3), billion (3), different (3), form (3), what (3), see (3), toad (3), naming (3), blue (3), collar (3), 13166 (3), subprime (3), academic (3), 100 (3), disp (3), 101 (3), 1766 (3), just (3), based (3), samples (3), purchase (3), iweb (3), compare (3), caps (2), dentzer (2), daymond (2), this (2), list (2), other (2), forms (2), formal (2), useful (2), below (2), clockwise (2), 2282 (2), 13173 (2), 13172 (2), stand (2), oats (2), wordfreq (2), lemfreq (2), lemmarank (2), 13167 (2), 343 (2), sub (2), 13165 (2), lemmas (2), only (2), eight (2), main (2), also (2), blogs (2), short (2), 107 (2), 104 (2), 330 (2), 172 (2), entries (2), have (2), datasets (2), can (2), free (2), one (2), such (2), generated (2), druggy, 100060, drina, 100059, drams, 100058, drainer, 100057, dowler, 100056, doster, 100055, dort, 100054, dorrie, 100053, doozies, 100052, doohickey, 100051, dollarization, 100050, docter, 100049, overs, 100048, dispersant, 100047, disgracefully, 100046, disasterous, 100045, dimitroff, 100044, dilator, 100043, diigo, 100042, digged, 100041, despatch, 100040, denys, 100039, 100038, demoed, 100037, delp, 100036, deductively, 100035, 100034, datatypes, 100033, final, 219, occurs, least, times, lemmatized, listed, separately, from, tagged, part, speech, which, common, again, show, percent, capitalized, determining, proper, noun, 13174, toads, 800, 1483, namings, 2254, squeaks, 213, 583, squeaking, 593, squeaked, 894, 13170, 13169, deduces, deducing, 136, deduced, 
965, 1088, bluecollar, 2262, headlining, headlined, 943, 999, prime, 207, 2079, rehabilitates, rehabilitating, 452, rehabilitated, 749, 1033, third, another, but, nearly, magazine, sports, newspaper, finance, medical, reviews, personal, comedies, etc, piedmontese, 60032, phytic, 60031, selling, 60030, sheerly, 60029, shearling, 60028, shakti, 60027, exudative, 60026, saliency, 45009, ejector, 45008, endoplasmic, 45007, 112, energizing, 45006, fold, 45005, sugarless, 45004, 102, thawing, 45003, 319, strafe, 30011, 188, 323, wuss, 30010, 305, firebird, 30009, 134, 114, 331, peppery, 30008, 301, lyricism, 30007, 341, twisty, 30006, 126, 159, glutamate, 30005, 412, 393, 315, 125, 317, 1170, macaroni, 15030, 218, 131, 164, 230, 510, 1450, complicit, 15029, 214, 239, 498, 106, 232, 209, 1510, childlike, 15028, 135, 276, 206, 322, 292, 437, 1580, disservice, 15027, 185, 203, 252, 261, 384, 293, 1420, pretentious, 15026, 150, 127, 290, 250, 1637, despair, 15025, 167, 761, 432, 1209, redhead, 15024, 14948, 51981, 28196, 5197, 19383, 5554, 18521, 14248, 79105, 158028, director, 620, 9126, 17933, 22532, 29098, 14647, 26696, 19451, 18711, 95115, 158194, soon, 619, 911, 8432, 13766, 25394, 19934, 66877, 10313, 12884, 44697, 158511, mom, 618, 39817, 18954, 26220, 4764, 11870, 5171, 28366, 23426, 74656, 158588, source, 617, 22789, 17835, 21726, 10416, 16776, 17114, 23742, 28879, 82417, 159277, choice, 616, 1552, 5453, 5878, 23413, 28378, 57719, 15355, 21706, 74761, 159454, guess, 615, 49650, 18065, 20583, 7866, 15796, 4270, 25573, 17718, 81551, 159521, describe, 614, basic, unlike, pages, lets, across, know, informal, movies, subtitles, following, few, levels, when, access, four, use, whichever, ones, given, these, much, every, tenth, entry, well, copies, complete, site, contains, probably, large, date, balanced, between, many, contemporary, american, accurate, why, generate, meet, certain, criteria, because, does, very, poor, job, modeling, happens, actual, language, 
information, human, 5000, get, vocabulary, wordandphrase, grams, collocates, full, text, corpora, org, related, sites, portuguese, spanish, non, wordlists, faqs, convert, txt, excel, file, format, columns, using, overview, introduction,

Text of the page (random words):
Why not just have AI generate word frequency data, such as the top 100 words that meet certain criteria? It's because AI-generated word frequency data does a very poor job of modeling what happens in actual human-generated language, such as the COCA corpus.

This site contains what is probably the most accurate word frequency data for English. The data is based on the one-billion-word Corpus of Contemporary American English (COCA), the only corpus of English that is large, up to date, and balanced between many genres.

When you purchase the data, you have access to four different datasets, and you can use whichever ones are the most useful for you. Short samples are given below for each of these datasets, and you can also see much more complete samples (every tenth entry), as well as free copies of the top 5,000 entries for each list.

1. The most basic data shows the frequency of each of the top 60,000 words (lemmas) in each of the eight main genres in the corpus. Unlike word frequency data that is just based on web pages, the COCA data lets you see the frequency across genres, to know if the word is more informal (e.g. blogs or TV and movie subtitles) or more formal (e.g. academic). The following are just a few entries of words at different frequency levels (rank 1-60,000).

rank   lemma          pos    freq  texts  disp   blog    web   tv/m   spok    fic    mag   news   acad
614    describe       v    159521  81551  0.94  17718  25573   4270  15796   7866  20583  18065  49650
615    guess          v    159454  74761  0.96  21706  15355  57719  28378  23413   5878   5453   1552
616    choice         n    159277  82417  0.98  28879  23742  17114  16776  10416  21726  17835  22789
617    source         n    158588  74656  0.95  23426  28366   5171  11870   4764  26220  18954  39817
618    mom            n    158511  44697  0.95  12884  10313  66877  19934  25394  13766   8432    911
619    soon           r    158194  95115  0.98  18711  19451  26696  14647  29098  22532  17933   9126
620    director       n    158028  79105  0.94  14248  18521   5554  19383   5197  28196  51981  14948

15024  redhead        n      1766   1209  0.90     96    101    432     86    761    167     95     28
15025  despair        v      1766   1637  0.95    250    290    127    104    330    343    172    150
15026  pretentious    j      1766   1420  0.94    293    384    261     93    252    203    185     95
15027  disservice     n      1765   1580  0.94    437    292     53    322     44    206    276    135
15028  childlike      j      1765   1510  0.94    209    232     94    106    498    239    214    172
15029  complicit      j      1765   1450  0.93    510    330     83    230     99    164    131    218
15030  macaroni       n      1765   1170  0.92     84    101    317    125    315    393    412     18

30005  glutamate      n       372    159  0.77     40     67     11     19      0    101      8    126
30006  twisty         j       372    341  0.89     50     42     26      6    100    104     36      8
30007  lyricism       n       372    301  0.87     35     49      5     19     23     67     67    107
30008  peppery        j       372    331  0.86     17     16      5     15     68    114    134      3
30009  firebird       n       372     69  0.15     15      7     21      1    305      8     11      4
30010  wuss           n       372    323  0.90     55     36    188     21     35     28      8      1
30011  strafe         v       372    319  0.89     24     53     32     24     79     83     62     15

45003  thawing        n       115    102  0.81      5     11      2     10     18     26     21     22
45004  sugarless      j       115     97  0.82     12      7     20      1     21     38     12      4
45005  fold-up        j       115    107  0.83      5      7      6      5     41     29     20      2
45006  energizing     j       115    112  0.84     14     14      2     10      3     44     12     16
45007  endoplasmic    j       115     65  0.64      5     36      4      0      0     14      0     56
45008  ejector        n       115     93  0.80     10      9     27      3      8     41      7     10
45009  saliency       n       115     76  0.69      3      9      0      6      0      1      1     95

60026  exudative      j        45     16  0.25      1      8      2      0      1      5      0     28
60027  shakti         n        45     21  0.44     25      1      0      0      0     10      3      6
60028  shearling      j        45     41  0.73      2      1      1      3     15     18      4      1
60029  sheerly        r        45     45  0.77      3      8      1      6      9      8      3      7
60030  short-selling  n        45     37  0.67      4     11      1      1      0     10     14      4
60031  phytic         j        45     19  0.48     10     16      0      0      0      4      0     15
60032  piedmontese    j        45     31  0.68      2     13      1      0      5      6     10      8

2. Another dataset shows the frequency not only in the eight main genres, but also in nearly 100 sub-genres (Magazine-Sports, Newspaper-Finance, Academic-Medical, Web-Reviews, Blogs-Personal, TV-Comedies, etc.).

3. A third dataset shows the frequency of the word forms of the top 60,000 lemmas.

lemmarank  lemma         pos  lemfreq  wordfreq  word form
13164      rehabilitate  v       2286      1033  rehabilitate
13164      rehabilitate  v       2286       749  rehabilitated
13164      rehabilitate  v       2286       452  rehabilitating
13164      rehabilitate  v       2286        52  rehabilitates
13165      subprime      j       2286      2079  subprime
13165      subprime      j       2286       207  sub-prime
13166      headline      v       2285       999  headline
13166      headline      v       2285       943  headlined
13166      headline      v       2285       343  headlining
13167      blue-collar   j       2285      2262  blue-collar
13167      blue-collar   j       2285        23  bluecollar
13168      deduce        v       2285      1088  deduce
13168      deduce        v       2285       965  deduced
13168      deduce        v       2285       136  deducing
13168      deduce        v       2285        96  deduces
13169      oats          n       2284      2284  oats
13170      stand-up      j       2284      2284  stand-up
13171      squeak        v       2283       894  squeaked
13171      squeak        v       2283       593  squeaking
13171      squeak        v       2283       583  squeak
13171      squeak        v       2283       213  squeaks
13172      naming        n       2283      2254  naming
13172      naming        n       2283        29  namings
13173      toad          n       2283      1483  toad
13173      toad          n       2283       800  toads
13174      clockwise     r       2282      2282  clockwise

4. A final dataset shows the top 219,000 words in the billion-word corpus (each word that occurs at least 20 times and in 5 different texts). In this list, the words are not lemmatized (e.g. each form of a word is listed separately from other forms) and the words are not tagged for part of speech. For each word, it shows in which genres it is the most common (again, to show how formal or informal it is) and what percent are capitalized (useful for determining proper nouns; see daymond and dentzer below).

word rank  word           freq  texts  caps  blog   web  tv/m  spok   fic   mag  news  acad
100033     datatypes        89     20  0.18     8    74     0     0     0     0     0     7
100034     daymond          89     40  1.00    13     9     0    30     1    16    17     3
100035     deductively      89     68  0.03     4    18     3     0     0     3     1    60
100036     delp             89     25  1.00     2     5     2     0    52    10    12     6
100037     demoed           89     81  0.02    24    18     7     4     2    33     0     0
100038     dentzer          89     40  1.00     0    20     0    46     0    20     2     1
100039     denys            89     53  0.94     2    18     1     0     2    28    14    23
100040     despatch         89     50  0.17     6    38     6     0    17    12     1     9
100041     digged           89     33  0.04     6    69     6     1     2     1     1     0
100042     diigo            89     32  0.98    10    12     0     0     0     0     0    67
100043     dilator          89     25  0.02     0     2     5     1     6     5     0    70
100044     dimitroff        89     43  1.00     4     4     0     0     0    23    54     4
100045     disasterous      89     86  0.01    41    42     1     0     1     0     3     1
100046     disgracefully    89     83  0.12    18    11     7    10    11    18     5     8
100047     dispersant       89     43  0.07     9    25     8    26     0     6     3    12
100048     do-overs         89     72  0.11    18    11    11    15     2    10    12     4
100049     docter           89     48  0.88     7    22    12    12     0     9    27     0
100050     dollarization    89     20  0.10     2    52     0     2     0     3    10    20
100051     doohickey        89     72  0.11     9     8    53     0     9     7     2     0
100052     doozies          89     80  0.01    21     9    13    10     9    14     7     1
100053     dorrie           89     26  1.00     1     2     7     2    64     2    11     0
100054     dort             89     39  0.82     1     6    20     7     5     2    16    29
100055     doster           89     41  1.00     1     7     0     0     0     6    62    13
100056     dowler           89     40  1.00     4    14     0    13     6    17    20    15
100057     drainer          89     71  0.12     3    11    14     4    40     9     3     5
100058     drams            89     56  0.51    12     2     4     1    11    16     5    38
100059     drina            89     42  1.00     1     1     4    14    33    12    19     5
100060     druggy           89     74  0.10    11     3     8     8    17    21    11     1
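As a rough illustration of how the per-genre columns in the lemma samples can be used, the following Python sketch parses a few of the rows quoted above and reports the genre where each lemma is most frequent. It is only a sketch under stated assumptions: the function names (parse_lemma_row, top_genre, genre_share) are hypothetical, rows are assumed to be whitespace-separated with single-token lemmas and a decimal dispersion value such as 0.94, and the actual purchased files may be laid out differently. The counts are raw, not normalized by genre size.

```python
# Genre column order as it appears in the sample rows (assumed layout).
GENRES = ["blog", "web", "tv/m", "spok", "fic", "mag", "news", "acad"]


def parse_lemma_row(line):
    """Parse one whitespace-separated lemma row into a dict.

    Assumes a single-token lemma and a decimal dispersion value (e.g. 0.94).
    """
    parts = line.split()
    expected = 6 + len(GENRES)
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}")
    return {
        "rank": int(parts[0]),
        "lemma": parts[1],
        "pos": parts[2],
        "freq": int(parts[3]),
        "texts": int(parts[4]),
        "disp": float(parts[5]),
        "genres": dict(zip(GENRES, (int(p) for p in parts[6:]))),
    }


def top_genre(row):
    """Return the genre with the highest raw count for this lemma."""
    return max(row["genres"], key=row["genres"].get)


def genre_share(row, genre):
    """Fraction of the lemma's total frequency that falls in one genre."""
    return row["genres"][genre] / row["freq"]


# Three rows copied from the sample data, with the decimal point restored.
SAMPLE_ROWS = [
    "614 describe v 159521 81551 0.94 17718 25573 4270 15796 7866 20583 18065 49650",
    "615 guess v 159454 74761 0.96 21706 15355 57719 28378 23413 5878 5453 1552",
    "618 mom n 158511 44697 0.95 12884 10313 66877 19934 25394 13766 8432 911",
]

if __name__ == "__main__":
    for line in SAMPLE_ROWS:
        row = parse_lemma_row(line)
        best = top_genre(row)
        print(row["lemma"], best, round(genre_share(row, best), 2))
```

For these three rows the eight genre counts add up to the freq column, and the largest count falls in acad for describe and in tv/m for guess and mom, which matches the formal versus informal contrast the page describes.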


Verified site has: 15 subpage(s).

The site also has 7 references to external domain(s).

english-corpora.org
corpusdata.org
collocates.info
ngrams.info
wordandphrase.info
academicvocabulary.info
ucrel.lancs.ac.uk

The site also has 3 references to other resources (not html/xhtml )

www.wordfrequency.info/freqTxtToExcel.pdf
www.english-corpora.org/ai-llms/words.pdf
www.wordfrequency.info/sub_categories.txt
