Datasets - Mannheim Linked Data Catalog

Ontos News Portal

The Ontos News Portal extracts facts (objects such as persons or organizations, as well as relations between them, e.g. a person working for an organization or living at a...
- text/turtle
- RDF
WordNet 3.0 (VU Amsterdam)

RDF conversion of Princeton's package:wordnet, version 3.0, with many links to package:w3c-wordnet, package:lexvo and the Dutch package:cornetto.
- HTML
- application/rdf+xml
- XML
- meta/void
Glottolog

Glottolog provides information about descriptive literature for all the world's languages. It also provides a language classification as well as knowledge bases for names,...
- zip:csv
- zip:bib
- example/rdf+xml
- application/x-ntriples
- RDF
- n3
GeoWordNet

GeoWordNet is a semantic resource built from the full integration of WordNet, GeoNames and the Italian part of MultiWordNet. GeoWordNet Public Dataset contains 3,698,238...
- meta/void
- RDF
- meta/sitemap
- CSV
- example/rdf+xml
- HTML
- wordnet
linked hypernyms

This Linked Hypernym dataset links entity articles in the English, German and Dutch Wikipedia to a DBpedia resource or a DBpedia ontology concept as their type. The types are...
- HTML
- application/x-ntriples
FiESTA

FiESTA (short for "Format for extensive spatiotemporal annotations") is a generic format for linguistic and behavioral annotations.
- text/turtle
American National Corpus - Open Portion
- jar
- ZIP
Ontologies of Linguistic Annotations (OLiA)

The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models...
- HTML
- rdf, owl
- application/x-zip-compressed
- example/rdf+xml
ConceptNet

WordNet-like concept network developed at MIT. ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually...
- sql
- HTML
SemanticQuran

The Semantic Quran dataset is a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two different semi-structured...
- gz:ttl
- gz:ttl:owl
- PDF
Phonetics Information Base and Lexicon (PHOIBLE)

Phonetics Information Base and Lexicon (PHOIBLE) is a data set of phonological inventories with additional linguistic and non-linguistic information.
- datapkg/git
- HTML
- api/sparql
Multext-East

From the web site: Version 4 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. This dataset contains, for Bulgarian,...
Chat Game corpus

A corpus resulting from an object arrangement game using a computer-mediated setting.
- text/turtle
Linked Old Germanic Dictionaries

Lexical resources (word lists, etymological dictionaries) for Germanic languages in different historical stages: pre-1100 (incl. Gothic, Old High German, Old English),...
- HTML
- zip:ttl
Leipzig Corpora Collection (LCC)

Deutscher Wortschatz contains data generated from newspapers and web resources that are publicly available. The data were collected per language and encompass statistics about...
- RDF
- api/sparql
- example/rdf+xml
WikiWord Thesaurus Data

The WikiWord-Thesaurus is a multilingual thesaurus derived from Wikipedia by extracting lexical and semantic information. It was originally developed for a...
ISOcat

ISO 12620 provides a framework for defining data categories compliant with the ISO/IEC 11179 family of standards. According to this model, each data category is assigned a...
- html, rdf, dcif
- example/rdf+xml
MExiCo

MExiCo (short for "Multimodal Experiment Corpora") is a data model for data collections containing multimodal linguistic and interaction annotations.
- text/turtle
- example/turtle
French TimeBank

The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language....
- ol
- iso-timeml
Syntactic Reference Corpus of Medieval French (SRCMF)

The SRCMF contains 15 Old French texts with about 280,000 words. It has a high-quality manual annotation, based on a linguistically adequate dependency grammar. Annotation...
- HTML
- example/rdf+xml

You can also access this registry using the API (see API Docs).
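
As a rough illustration of API access, the sketch below assumes the registry is a CKAN-style catalog exposing the `package_search` action of the CKAN Action API. That assumption is not confirmed by this page, so check the API Docs first. The base URL `https://catalog.example.org/` is a placeholder, and the sample response is hand-written in the shape CKAN returns, not real output from the registry.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode, urljoin

# Placeholder base URL (assumption): replace with the registry's real address.
BASE_URL = "https://catalog.example.org/"


def build_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a dataset search URL, assuming a CKAN-style Action API layout."""
    params = urlencode({"q": query, "rows": rows})
    return urljoin(base_url, "api/3/action/package_search") + "?" + params


def summarize(response_text: str) -> List[Tuple[str, List[str]]]:
    """Return (dataset title, resource formats) pairs from a search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("the API reported a failure")
    summary = []
    for dataset in payload["result"]["results"]:
        formats = [res.get("format", "") for res in dataset.get("resources", [])]
        summary.append((dataset.get("title", dataset.get("name", "")), formats))
    return summary


# Hand-written sample in the assumed response shape, using one entry
# from the listing above; it is not actual output from the registry.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {"name": "fiesta", "title": "FiESTA",
             "resources": [{"format": "text/turtle"}]},
        ],
    },
})

if __name__ == "__main__":
    print(build_search_url(BASE_URL, "wordnet", rows=5))
    for title, formats in summarize(SAMPLE_RESPONSE):
        print(title, formats)
```

To query a live catalog, the URL from `build_search_url` could be fetched with `urllib.request.urlopen` and the decoded body passed to `summarize`. The network call is left out here so the sketch runs offline.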

32 datasets found