A monolingual comparable corpus
CompareIT_it

COMPARE-IT Italy

Counts
Tokens      360,265
Words       303,510
Sentences   16,640
Documents   531

General info
Language    Italian
Encoding    UTF-8
Compiled    10/30/2018 11:52:55

Lexicon sizes
word        33,114
pos         50
lemma       12,673


Part-of-speech tagset

Tag            Description
ADJ            adjective
ADV            adverb (excluding -mente forms)
ADV:mente      adverb ending in -mente
ART            article
ARTPRE         preposition + article
AUX:fin        finite form of auxiliary
AUX:fin:cli    finite form of auxiliary with clitic
AUX:geru       gerundive form of auxiliary
AUX:geru:cli   gerundive form of auxiliary with clitic
AUX:infi       infinitival form of auxiliary
AUX:infi:cli   infinitival form of auxiliary with clitic
AUX:ppast      past participle of auxiliary
AUX:ppre       present participle of auxiliary
CHE            the word "che"
CLI            clitic
CON            conjunction
DET:demo       demonstrative determiner
DET:indef      indefinite determiner
DET:num        numeral determiner
DET:poss       possessive determiner
DET:wh         wh determiner
NEG            negation
NOCAT          non-linguistic element
NOUN           noun
NPR            proper noun
NUM            number
PRE            preposition
PRO:demo       demonstrative pronoun
PRO:indef      indefinite pronoun
PRO:num        numeral pronoun
PRO:pers       personal pronoun
PRO:poss       possessive pronoun
PUN            non-sentence-final punctuation mark
SENT           sentence-final punctuation mark
VER2:fin       finite form of modal/causal verb
VER2:fin:cli   finite form of modal/causal verb with clitic
VER2:geru      gerundive form of modal/causal verb
VER2:geru:cli  gerundive form of modal/causal verb with clitic
VER2:infi      infinitival form of modal/causal verb
VER2:infi:cli  infinitival form of modal/causal verb with clitic
VER2:ppast     past participle of modal/causal verb
VER2:ppre      present participle of modal/causal verb
VER:fin        finite form of verb
VER:fin:cli    finite form of verb with clitic
VER:geru       gerundive form of verb
VER:geru:cli   gerundive form of verb with clitic
VER:infi       infinitival form of verb
VER:infi:cli   infinitival form of verb with clitic
VER:ppast      past participle of verb
VER:ppast:cli  past participle of verb with clitic
VER:ppre       present participle of verb
WH             wh word
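Most tags combine a base category with optional colon-separated qualifiers (for example VER:fin:cli). The following is a minimal Python sketch of how such a tag could be split, assuming every tag follows the pattern BASE[:subtype][:cli] that the list suggests. ParsedTag, parse_tag and is_verbal are hypothetical helpers for illustration, not part of any corpus tool.

```python
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    base: str               # e.g. "VER", "VER2", "AUX", "DET", "ADV"
    subtype: Optional[str]  # e.g. "fin", "demo", "mente"; None if absent
    clitic: bool            # True if the tag ends in ":cli"


def parse_tag(tag: str) -> ParsedTag:
    """Split a colon-separated POS tag into base, subtype and clitic flag.

    Assumes the pattern BASE[:subtype][:cli]; this is an illustration,
    not an official description of the tagset.
    """
    parts = tag.split(":")
    # Only a trailing lowercase "cli" qualifier marks a clitic; the
    # standalone tag "CLI" is its own base category.
    clitic = len(parts) > 1 and parts[-1] == "cli"
    if clitic:
        parts = parts[:-1]
    base = parts[0]
    subtype = parts[1] if len(parts) > 1 else None
    return ParsedTag(base, subtype, clitic)


def is_verbal(tag: str) -> bool:
    """True for lexical (VER), modal/causal (VER2) and auxiliary (AUX) tags."""
    return parse_tag(tag).base in {"VER", "VER2", "AUX"}
```

For example, parse_tag("VER2:infi:cli") yields base "VER2", subtype "infi" and clitic True, which makes it easy to group all verbal tags regardless of form.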

Document name format

Each document in the COMPARE-IT corpora is a newspaper article.
Document names are unique 18-character strings made up of 5 fields separated by underscores, in the following format:
[Collection name]_[Corpus country]_[Newspaper]_[Section]_[ID]

For example, document cmp_ch_gio_eco_005 belongs to the COMPARE-IT Italian-language corpus of Switzerland (cmp_ch), the newspaper Giornale del Popolo (gio) and the Economics section (eco); its ID is 005.
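Because the fields are underscore-separated, a document name can be split mechanically. The following is a minimal Python sketch, assuming names follow exactly the format described here; DocName and parse_doc_name are hypothetical names, and the checks cover only the stated total length (18 characters) and field count (5), not the width of individual fields.

```python
from typing import NamedTuple


class DocName(NamedTuple):
    collection: str  # collection name, e.g. "cmp"
    country: str     # corpus country, e.g. "ch"
    newspaper: str   # newspaper code, e.g. "gio"
    section: str     # section code, e.g. "eco"
    doc_id: str      # kept as a string so leading zeros ("005") survive


def parse_doc_name(name: str) -> DocName:
    """Split a document name into its five underscore-separated fields."""
    if len(name) != 18:
        raise ValueError(f"expected 18 characters, got {len(name)}: {name!r}")
    fields = name.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 underscore-separated fields: {name!r}")
    return DocName(*fields)
```

For instance, parse_doc_name("cmp_ch_gio_eco_005").newspaper returns "gio", which could be used to group documents by newspaper or section.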