A monolingual comparable corpus

Part-of-speech Tagset

Tag              Description
ADV              adverb (excluding -mente forms)
ADV:mente        adverb ending in -mente
ARTPRE           preposition + article
AUX:fin          finite form of auxiliary
AUX:fin:cli      finite form of auxiliary with clitic
AUX:geru         gerundive form of auxiliary
AUX:geru:cli     gerundive form of auxiliary with clitic
AUX:infi         infinitival form of auxiliary
AUX:infi:cli     infinitival form of auxiliary with clitic
AUX:ppast        past participle of auxiliary
AUX:ppre         present participle of auxiliary
DET:demo         demonstrative determiner
DET:indef        indefinite determiner
DET:num          numeral determiner
DET:poss         possessive determiner
DET:wh           wh determiner
NOCAT            non-linguistic element
NPR              proper noun
PRO:demo         demonstrative pronoun
PRO:indef        indefinite pronoun
PRO:num          numeral pronoun
PRO:pers         personal pronoun
PRO:poss         possessive pronoun
PUN              non-sentence-final punctuation mark
SENT             sentence-final punctuation mark
VER2:fin         finite form of modal/causal verb
VER2:fin:cli     finite form of modal/causal verb with clitic
VER2:geru        gerundive form of modal/causal verb
VER2:geru:cli    gerundive form of modal/causal verb with clitic
VER2:infi        infinitival form of modal/causal verb
VER2:infi:cli    infinitival form of modal/causal verb with clitic
VER2:ppast       past participle of modal/causal verb
VER2:ppre        present participle of modal/causal verb
VER:fin          finite form of verb
VER:fin:cli      finite form of verb with clitic
VER:geru         gerundive form of verb
VER:geru:cli     gerundive form of verb with clitic
VER:infi         infinitival form of verb
VER:infi:cli     infinitival form of verb with clitic
VER:ppast        past participle of verb
VER:ppast:cli    past participle of verb with clitic
VER:ppre         present participle of verb
WH               wh word
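The tags in this list are colon-separated: a main category (ADV, AUX, DET, PRO, VER, VER2, ...), optionally followed by a subtype (fin, geru, ppast, demo, mente, ...) and, for some verbal tags, a final cli marker for an attached clitic. The Python sketch below splits a tag into those parts, which can make it easier to group tags, for instance to collect every verbal form. The names TagParts and split_tag are hypothetical, and the category[:subtype][:cli] layout is inferred from the table above rather than taken from an official tagset specification.

```python
from typing import NamedTuple, Optional


class TagParts(NamedTuple):
    category: str           # main category, e.g. "VER", "VER2", "AUX", "DET"
    subtype: Optional[str]  # e.g. "fin", "ppast", "demo", "mente"; None if absent
    clitic: bool            # True when the tag ends in ":cli"


def split_tag(tag: str) -> TagParts:
    """Split a colon-separated tag such as "VER:ppast:cli" into its parts.

    Hypothetical helper: the category[:subtype][:cli] layout is inferred
    from the tagset table, not from an official specification.
    """
    parts = tag.split(":")
    clitic = len(parts) > 1 and parts[-1] == "cli"
    if clitic:
        parts = parts[:-1]  # drop the clitic marker before reading the subtype
    subtype = parts[1] if len(parts) > 1 else None
    return TagParts(parts[0], subtype, clitic)


# Example: keep only verbal tags (main verbs, modal/causal verbs, auxiliaries).
tags = ["VER:fin", "VER2:infi:cli", "AUX:ppast", "NPR", "ADV:mente"]
verbal = [t for t in tags if split_tag(t).category in {"VER", "VER2", "AUX"}]
print(verbal)  # ['VER:fin', 'VER2:infi:cli', 'AUX:ppast']
```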

Document name format

Each document in the COMPARE-IT corpora is a newspaper article.
Document names are unique 18-character strings consisting of five fields separated by underscores, in the following format:
[Collection name]_[Corpus country]_[Newspaper]_[Section]_[ID]

For example, document cmp_ch_gio_eco_005 belongs to the COMPARE-IT Italian corpus of Switzerland (cmp_ch), the newspaper Giornale del Popolo (gio) and the Economics section (eco); its ID is 005.
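Because the fields are underscore-separated and the total length is fixed, a document name can be split mechanically. The following is a minimal Python sketch that assumes only what the format description states (five underscore-separated fields, 18 characters in total). The names DocName and parse_doc_name are hypothetical, and per-field widths are deliberately not checked, since they can only be inferred from the single example.

```python
from typing import NamedTuple


class DocName(NamedTuple):
    collection: str  # collection name, e.g. "cmp"
    country: str     # corpus country, e.g. "ch"
    newspaper: str   # newspaper code, e.g. "gio"
    section: str     # section code, e.g. "eco"
    doc_id: str      # document ID, kept as a string so "005" keeps its leading zeros

    @property
    def corpus(self) -> str:
        """Corpus prefix, e.g. "cmp_ch" for the Italian corpus of Switzerland."""
        return f"{self.collection}_{self.country}"


def parse_doc_name(name: str) -> DocName:
    """Hypothetical helper: split collection_country_newspaper_section_ID."""
    if len(name) != 18:
        raise ValueError(f"expected 18 characters, got {len(name)}: {name!r}")
    fields = name.split("_")
    if len(fields) != 5 or not all(fields):
        raise ValueError(f"expected 5 non-empty underscore-separated fields: {name!r}")
    return DocName(*fields)


doc = parse_doc_name("cmp_ch_gio_eco_005")
print(doc.corpus, doc.newspaper, doc.section, doc.doc_id)  # cmp_ch gio eco 005
```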