A monolingual comparable corpus
CompareIT_it

Word list options


Corpus:
Search attribute:
Value of n: from / to
Filter options:
Filter word list by:
Regular expression:
Minimum frequency:
Maximum frequency: (0 = no maximum frequency)
Whitelist:
Blacklist:
Word list whitelists and blacklists must be plain text (.txt) files, encoded in UTF-8, with one item per line. The items must correspond to the selected attribute: for example, if 'lemma' is selected from the attribute menu, the file should be a list of lemmas. Items read from a file are matched exactly, not as regular expressions.
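As an illustration of the file format and the exact-matching rule, here is a minimal Python sketch. It is not the tool's own implementation: the function names (parse_items, load_items, filter_word_list) and the sample frequencies are hypothetical, and stripping surrounding whitespace and skipping blank lines are assumptions about how list files are read.

```python
from pathlib import Path


def parse_items(text: str) -> set[str]:
    """Parse list content: one item per line (blank lines skipped, an assumption)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_items(path: str) -> set[str]:
    """Read a plain-text (.txt) whitelist or blacklist encoded in UTF-8."""
    return parse_items(Path(path).read_text(encoding="utf-8"))


def filter_word_list(freqs, whitelist=None, blacklist=None, min_freq=0, max_freq=0):
    """Filter a {item: frequency} mapping.

    List items are compared by exact string equality, not as regular
    expressions. max_freq=0 means "no maximum frequency".
    """
    kept = {}
    for item, count in freqs.items():
        if whitelist is not None and item not in whitelist:
            continue
        if blacklist is not None and item in blacklist:
            continue
        if count < min_freq:
            continue
        if max_freq and count > max_freq:
            continue
        kept[item] = count
    return kept


# Hypothetical lemma frequencies, for illustration only.
freqs = {"casa": 12, "case": 3, "andare": 40}
whitelist = parse_items("casa\nandare\n")
print(filter_word_list(freqs, whitelist=whitelist, min_freq=5))
# prints {'casa': 12, 'andare': 40}
```

Because matching is exact, the whitelist entry "casa" does not match "case"; a pattern-based selection would go in the regular expression field instead.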
Output options:
Frequency figures:
Output type:
Reference (sub)corpus
Prefer: rare words / common words

You can select one or more output attributes. Please note that this option can be time-consuming.

Part-of-speech Tagset

ADJ             adjective
ADV             adverb (excluding -mente forms)
ADV:mente       adverb ending in -mente
ART             article
ARTPRE          preposition + article
AUX:fin         finite form of auxiliary
AUX:fin:cli     finite form of auxiliary with clitic
AUX:geru        gerundive form of auxiliary
AUX:geru:cli    gerundive form of auxiliary with clitic
AUX:infi        infinitival form of auxiliary
AUX:infi:cli    infinitival form of auxiliary with clitic
AUX:ppast       past participle of auxiliary
AUX:ppre        present participle of auxiliary
CHE             che
CLI             clitic
CON             conjunction
DET:demo        demonstrative determiner
DET:indef       indefinite determiner
DET:num         numeral determiner
DET:poss        possessive determiner
DET:wh          wh determiner
NEG             negation
NOCAT           non-linguistic element
NOUN            noun
NPR             proper noun
NUM             number
PRE             preposition
PRO:demo        demonstrative pronoun
PRO:indef       indefinite pronoun
PRO:num         numeral pronoun
PRO:pers        personal pronoun
PRO:poss        possessive pronoun
PUN             non-sentence-final punctuation mark
SENT            sentence-final punctuation mark
VER2:fin        finite form of modal/causal verb
VER2:fin:cli    finite form of modal/causal verb with clitic
VER2:geru       gerundive form of modal/causal verb
VER2:geru:cli   gerundive form of modal/causal verb with clitic
VER2:infi       infinitival form of modal/causal verb
VER2:infi:cli   infinitival form of modal/causal verb with clitic
VER2:ppast      past participle of modal/causal verb
VER2:ppre       present participle of modal/causal verb
VER:fin         finite form of verb
VER:fin:cli     finite form of verb with clitic
VER:geru        gerundive form of verb
VER:geru:cli    gerundive form of verb with clitic
VER:infi        infinitival form of verb
VER:infi:cli    infinitival form of verb with clitic
VER:ppast       past participle of verb
VER:ppast:cli   past participle of verb with clitic
VER:ppre        present participle of verb
WH              wh word
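
The tags in this list are built from colon-separated parts (a main category, an optional subtype or verb form, and an optional "cli" suffix for forms with a clitic). The Python sketch below shows one way to take a tag apart. It is based only on the pattern visible in the list; the function name split_tag and the returned field names are hypothetical.

```python
def split_tag(tag: str) -> dict:
    """Split a tag such as 'VER:infi:cli' into its colon-separated parts."""
    category, *features = tag.split(":")
    return {
        "category": category,                              # e.g. VER, AUX, DET
        "features": [f for f in features if f != "cli"],   # e.g. fin, infi, demo
        "clitic": "cli" in features,                       # True for ':cli' tags
    }


print(split_tag("VER:infi:cli"))
# prints {'category': 'VER', 'features': ['infi'], 'clitic': True}
```

Note that the standalone tag CLI (clitic) has no colon-separated parts, so it is returned as a category with "clitic" set to False.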

Document name format

Each document in the COMPARE-IT corpora is a newspaper article.
Document names are unique 18-character strings made up of 5 fields separated by underscores, in the following format:
[Collection name]_[Corpus country]_[Newspaper]_[Section]_[ID]

For example, the document cmp_ch_gio_eco_005 belongs to the COMPARE-IT Italian corpus of Switzerland (cmp_ch), comes from the newspaper Giornale del Popolo (gio), section Economics (eco), and has the ID 005.
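
The naming scheme above can be unpacked programmatically. The following Python sketch splits a document name into its five fields; the class and function names (DocName, parse_doc_name) are hypothetical, and the checks cover only what is stated above (18 characters, 5 underscore-separated fields).

```python
from typing import NamedTuple


class DocName(NamedTuple):
    collection: str   # e.g. "cmp"
    country: str      # e.g. "ch"
    newspaper: str    # e.g. "gio"
    section: str      # e.g. "eco"
    doc_id: str       # e.g. "005"


def parse_doc_name(name: str) -> DocName:
    """Split a document name into its five underscore-separated fields."""
    if len(name) != 18:
        raise ValueError(f"expected 18 characters, got {len(name)}: {name!r}")
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {name!r}")
    return DocName(*parts)


doc = parse_doc_name("cmp_ch_gio_eco_005")
print(doc.country, doc.newspaper, doc.section, doc.doc_id)
# prints ch gio eco 005
```

The ID is kept as a string so that leading zeros are preserved.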