MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis

Sentence-layer annotation is the most coarse-grained annotation in this corpus. We adhere to the definitions of objectivity and subjectivity introduced by Wiebe et al. (2005). Additionally, we followed guidelines drawn from Balahur & Steinberger (2009). Their clarifications proved quite effective, raising inter-annotator agreement in a sentence-layer polarity annotation task from about 50% to over 80%. All sentences were annotated in two dimensions.

The first dimension covers the factual nature of the sentence, i.e. whether it provides objective information or is intended to express an opinion, belief or subjective argument. A sentence is therefore either objective or subjective. The second dimension covers the semantic orientation of the sentence, i.e. its polarity, which is either positive, negative or neutral.
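The two sentence-layer dimensions can be pictured as a small record type. The following is only a sketch: the names `Factuality`, `Polarity`, `SentenceAnnotation` and `make_annotation` are hypothetical and do not come from the corpus files, and the example sentence is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Factuality(Enum):
    """First dimension: factual nature of the sentence."""
    OBJECTIVE = "objective"
    SUBJECTIVE = "subjective"


class Polarity(Enum):
    """Second dimension: semantic orientation of the sentence."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class SentenceAnnotation:
    """Hypothetical container for one sentence-layer annotation."""
    text: str
    factuality: Factuality
    polarity: Polarity


def make_annotation(text: str, factuality: str, polarity: str) -> SentenceAnnotation:
    """Build an annotation from plain strings; unknown labels raise ValueError."""
    return SentenceAnnotation(text, Factuality(factuality), Polarity(polarity))


# Invented example sentence, not taken from the corpus.
example = make_annotation("Das Konzert war großartig.", "subjective", "positive")
print(example.factuality.value, example.polarity.value)
```

Using enumerations rather than free strings keeps each dimension restricted to the values listed above.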

In the second layer, we model the contextually interpreted sentiments on the levels of words and NP/PP phrases. That is, the annotation decisions are based on the meaning of the words in the context of the sentence.

Word sentiment markers: The sentiments on the level of individual words are expressed by single character markers added at the end of the words.

A word might be positive (+), negative (-), neutral (no marker), a shifter (~), an intensifier (^), or a diminisher (%).

If a word ends with a hyphen (e.g., "beziehungs-_" in "auf beziehungs-_ bzw. partnerschaftliche Probleme-"), an underscore is appended to the word to prevent the hyphen from being misinterpreted as a negative marker.
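The marker scheme can be decoded as in the sketch below. The function name `parse_word` and the label strings are hypothetical. The sketch also assumes that a word ending in "-_" carries no further sentiment marker, which the description above does not state explicitly.

```python
# Single-character markers appended to words, as listed in the description.
MARKERS = {
    "+": "positive",
    "-": "negative",
    "~": "shifter",
    "^": "intensifier",
    "%": "diminisher",
}


def parse_word(token: str) -> tuple[str, str]:
    """Split an annotated token into (surface form, sentiment label)."""
    if token.endswith("-_"):
        # Escaped hyphen: the underscore marks the hyphen as part of the word.
        # Assumption: such a word is treated as neutral here.
        return token[:-1], "neutral"
    if len(token) > 1 and token[-1] in MARKERS:
        return token[:-1], MARKERS[token[-1]]
    # No marker means the word is neutral.
    return token, "neutral"


for tok in "auf beziehungs-_ bzw. partnerschaftliche Probleme-".split():
    print(parse_word(tok))
```

Checking for the escaped hyphen first is what keeps "beziehungs-_" from being read as a negative word.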

Currently, only words that are part of an NP/PP are marked with sentiment markers. Annotated words are nouns, adjectives, negation particles, prepositions, and adverbs.

The word-level annotation was done by three annotators working individually. The individual results were harmonized into a single reference annotation.

Phrase level markers:

Each phrase is marked up textually by brackets, e.g. "[auf beziehungs-_ bzw. partnerschaftliche Probleme-]". The type of a phrase (NP/PP) is not recorded in the brackets. We largely follow the annotation model of TIGER for structuring embedded NPs and PPs.

Currently, the following limitations with regard to TIGER exist: (1) adjectival phrases are not marked up; (2) relative or infinitival clauses are not included in NPs/PPs if they appear at the end of a phrase or if they are discontiguous. We do not annotate only the phrases that immediately contain words marked up as polar. Any dependent subphrase (NP/PP) is integrated into all its dominating NPs/PPs, e.g. "[Die tieferen Ursachen [der Faszination+]]". Dependent subphrases without any polar words are also included; however, there is no internal bracketing for them, e.g. "[hohe+ Ansprüche an Qualität und Lage]".
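The nested bracketing can be recovered with a simple stack-based scan, sketched below. The function name `bracketed_phrases` is hypothetical. Because this description does not say where a phrase-level marker is written relative to the brackets, the sketch extracts only the phrase spans and their nesting depth.

```python
def bracketed_phrases(text: str) -> list[tuple[int, str]]:
    """Return (depth, phrase text) for every bracketed phrase, in order of opening.

    Depth 0 is an outermost phrase. Inner brackets are removed from the
    returned phrase text; word-level markers are left untouched.
    """
    open_positions = []  # stack of indices of currently open "["
    spans = []           # (start index, depth, phrase text)
    for i, ch in enumerate(text):
        if ch == "[":
            open_positions.append(i)
        elif ch == "]":
            if not open_positions:
                raise ValueError("closing bracket without opening bracket")
            start = open_positions.pop()
            depth = len(open_positions)
            inner = text[start + 1:i].replace("[", "").replace("]", "")
            spans.append((start, depth, inner))
    if open_positions:
        raise ValueError("opening bracket without closing bracket")
    spans.sort()  # order phrases by where they open
    return [(depth, inner) for _, depth, inner in spans]


for depth, phrase in bracketed_phrases("[Die tieferen Ursachen [der Faszination+]]"):
    print(depth, phrase)
```

The outer phrase is reported with the text of its embedded subphrase included, mirroring the rule that a dependent subphrase is integrated into all its dominating NPs/PPs.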

At the level of phrases, we distinguish the following markers: positive (+), negative (-), neutral (0), bipolar (#). The category 'bipolar' is used mainly for coordinations in which the writer keeps negative and positive sentiments in balance. This is quite common in binomial constructions such as "Krieg und Frieden" ("war and peace").

Data and Resources

- Diagram of the MLSA linked data model (png)
- RDF dump (text/n3): 2 layers of MLSA corpus, languages linked to wals, glottolog and lexvo, words...
- SPARQL endpoint (api/sparql)
- Example sentence resource (example/turtle)

Additional Info

Field	Value
Source	http://iggsa.sentimental.li/index.php/downloads/
Author	Interest Group on German Sentiment Analysis (IGGSA)
links:glottolog-langdoc	2373
links:lexvo	2373
links:wals	2373
links:wiktionary-dbpedia-org	1371
shortname	MLSA
triples	21000