Version 0.4, 08/30/2014
Max Schmachtenberg
Christian Bizer
Heiko Paulheim


This document provides statistics about the structure and content of the crawlable subset of the Linked Open Data (LOD) cloud in April 2014 and analyzes the extent to which crawlable Linked Data sources implement the Linked Data best practices.

This document updates the findings of the original State of the LOD Cloud report published in 2011. The 2011 report was based on information that was provided by the dataset publishers themselves via the datahub.io Linked Data catalog. This report is based on a crawl of the Web of Linked Data conducted in April 2014. This document is a shortened version of the ISWC 2014 paper Adoption of the Linked Data Best Practices in Different Topical Domains. The paper provides more details about the crawling process as well as a deeper discussion of the results. In contrast, this document links the statistics to the Mannheim Linked Data catalog and enables the reader to drill down and explore information about the datasets behind each statistical result.

1. The Linked Data Crawl

In order to discover as many Linked Data sources as possible, we crawled a snapshot of the Web of Linked Data using the LDSpider Linked Data crawler. We seeded LDSpider with 560 thousand seed URIs originating from the datahub.io dataset catalog, the Billion Triple Challenge 2012 dataset, and datasets advertised on the public-lod@w3.org mailing list. With those seeds, we performed crawls during April 2014 to retrieve entities from every dataset using a breadth-first crawling strategy. Datasets that do not allow crawlers were not included in our corpus. Altogether, we crawled 900,129 documents describing 8,038,396 resources. In general, we assume that all data from one pay-level domain (PLD) belongs to one dataset. As an exception to this rule, we partitioned datasets in the datahub.io lod-cloud group where multiple datasets per PLD were defined. Furthermore, we removed datasets that only contained vocabulary definitions. We provide the crawled data for download, so that all results presented in the following can be verified.
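
The breadth-first strategy described above can be illustrated with a minimal sketch. It is not LDSpider's implementation: the function name breadth_first_crawl, the fetch_links callback, and the max_documents limit are hypothetical names introduced for this example, and a document that may not be crawled is modeled by the callback returning None.

```python
from collections import deque


def breadth_first_crawl(seeds, fetch_links, max_documents):
    """Visit URIs level by level, starting from the seed URIs.

    fetch_links is a hypothetical callable: it returns the URIs linked
    from a document, or None if the document cannot or may not be crawled.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    queue = deque(unique_seeds)
    seen = set(unique_seeds)
    crawled = []
    while queue and len(crawled) < max_documents:
        uri = queue.popleft()           # FIFO order gives breadth-first traversal
        links = fetch_links(uri)
        if links is None:               # assumption: None means "skip this document"
            continue
        crawled.append(uri)
        for target in links:
            if target not in seen:      # enqueue every URI at most once
                seen.add(target)
                queue.append(target)
    return crawled
```

In a test setting, fetch_links can simply be the get method of a dictionary that maps each URI to its outgoing links.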

2. Linked Data by Domain

Linked Data technologies are being used to share data covering a wide range of different topical domains. The table below gives an overview of the topical domains of the 1014 datasets that were discovered by our crawl.

Datasets by topical domain.
Topic | Datasets | Share
Government | 183 | 18.05%
Publications | 96 | 9.47%
Life sciences | 83 | 8.19%
User-generated content | 48 | 4.73%
Cross-domain | 41 | 4.04%
Media | 22 | 2.17%
Geographic | 21 | 2.07%
Social web | 520 | 51.28%
Total | 1014 |

 

3. Crawlable LOD Cloud Diagram

The image below gives an overview of the linkage relationships between datasets. Clicking the image will take you to an image map, which allows you to explore the metadata for each dataset in the Mannheim Linked Data Catalog.

Crawlable LOD Cloud in April 2014

4. Best Practices

The central idea of Linked Data is that data publishers support applications in discovering and integrating data by complying with a set of best practices in the areas of linking, vocabulary usage, and metadata provision. This section analyzes the extent to which the crawled data sources implement these best practices.

4.1 Interlinking Best Practice

By setting RDF links, data providers connect their datasets into a single global data graph which can be navigated by applications and enables the discovery of additional data by following RDF links.

In total, 56.11% of the crawled datasets link to at least one other dataset. The remaining datasets are only targets of RDF links.

The table below categorizes the datasets by the number of other data sources that are targets of their outgoing RDF links.

Categorization by number of linked datasets
Number of linked datasets | Number of datasets
more than 10 | 79 (7.79%)
6 to 10 | 81 (7.99%)
5 | 31 (3.06%)
4 | 42 (4.14%)
3 | 54 (5.33%)
2 | 106 (10.45%)
1 | 176 (17.36%)
0 | 445 (43.89%)

The tables below show the ten datasets with the highest in- and outdegrees.

Datasets with the ten highest indegrees
Dataset | Category | Indegree
dbpedia.org | cross-domain | 207
geonames.org | geographic | 141
w3.org | cross-domain | 117
quitter.se | social web | 64
status.net | social web | 63
postblue.info | social web | 56
skilledtests.com | social web | 55
reference.data.gov.uk | government | 45
data.semanticweb.org | publications | 44
fragdev.com | social web | 41
lexvo.org | cross-domain | 37

Datasets with the ten highest outdegrees
Dataset | Category | Outdegree
bibsonomy.org | publications | 91
semanlink.net | user-generated content | 88
deri.org | social web | 71
harth.org | social web | 68
quitter.se | social web | 67
semanticweb.org | user-generated content | 64
skilledtests.com | social web | 60
postblue.info | social web | 59
status.net | social web | 47
w3.org | cross-domain | 45
data.semanticweb.org | publications | 45
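
As a rough illustration of how in- and outdegrees of this kind can be derived, the sketch below counts, for each dataset, the distinct other datasets it links to and is linked from. The function name dataset_degrees and the input format (one pair of dataset names per RDF link) are assumptions made for this example, as is the choice to ignore links that stay within a dataset.

```python
from collections import defaultdict


def dataset_degrees(links):
    """Compute dataset-level in- and outdegrees.

    links is an iterable of (source_dataset, target_dataset) pairs,
    one pair per RDF link (hypothetical input format).
    """
    linked_to = defaultdict(set)
    linked_from = defaultdict(set)
    for source, target in links:
        if source == target:
            continue  # assumption: links within one dataset do not count
        linked_to[source].add(target)
        linked_from[target].add(source)
    datasets = set(linked_to) | set(linked_from)
    # .get avoids creating empty entries in the defaultdicts
    indegree = {d: len(linked_from.get(d, ())) for d in datasets}
    outdegree = {d: len(linked_to.get(d, ())) for d in datasets}
    return indegree, outdegree
```

Datasets that neither set nor receive any link never appear in the input pairs, so they would have to be added with degree 0 from a separate dataset list.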

The table below lists the most frequently used linking predicates for each topical domain.

Three most used predicates for interlinking by category.
Category | Predicate | Usage
social web | foaf:knows | 60.27%
social web | foaf:based_near | 35.69%
social web | sioc:follows | 34.34%
publications | owl:sameAs | 32.20%
publications | dct:language | 25.42%
publications | rdfs:seeAlso | 23.73%
user-generated content | owl:sameAs | 53.13%
user-generated content | rdfs:seeAlso | 21.88%
user-generated content | dct:source | 18.75%
media | owl:sameAs | 81.25%
media | rdfs:seeAlso | 18.75%
media | foaf:based_near | 18.75%
life sciences | owl:sameAs | 52.17%
life sciences | rdfs:seeAlso | 48.48%
life sciences | dct:creator | 21.74%
government | dct:publisher | 47.57%
government | dct:spatial | 30.10%
government | owl:sameAs | 24.27%
geographic | owl:sameAs | 64.29%
geographic | skos:exactMatch | 21.43%
geographic | skos:closeMatch | 21.43%
cross-domain | owl:sameAs | 80.00%
cross-domain | rdfs:seeAlso | 52.00%
cross-domain | dct:creator | 20.00%

4.2 Vocabulary Best Practices

In order to make it easier for applications to understand Linked Data, data providers should use terms from widely deployed vocabularies to represent data wherever possible.

4.2.1 Usage of Proprietary Vocabularies

We define a vocabulary as non-proprietary if there are at least two datasets using the vocabulary.

Of all 649 vocabularies encountered, 378 (58.24%) are proprietary according to our definition, while 271 (41.76%) are non-proprietary.

In total, 241 (23.17%) datasets use proprietary vocabularies, while nearly all (99.87%) datasets use non-proprietary vocabularies.
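
The definition above (a vocabulary is non-proprietary if at least two datasets use it) can be sketched as follows. The function name split_vocabularies and the input format, a mapping from dataset name to the vocabulary namespaces it uses, are assumptions made for this illustration.

```python
from collections import Counter


def split_vocabularies(usage):
    """Split vocabularies into proprietary and non-proprietary ones.

    usage maps a dataset name to the vocabulary namespaces it uses
    (hypothetical input format). A vocabulary counts as non-proprietary
    if at least two datasets use it.
    """
    # set(...) makes sure each dataset is counted once per vocabulary
    counts = Counter(
        namespace
        for namespaces in usage.values()
        for namespace in set(namespaces)
    )
    non_proprietary = {ns for ns, n in counts.items() if n >= 2}
    proprietary = set(counts) - non_proprietary
    return proprietary, non_proprietary
```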

The table below lists the non-proprietary vocabularies that are used by at least 1% of the datasets and provides links to the data sources that use a specific vocabulary.

Vocabulary Prefix | Vocabulary Link | Number of Datasets | Datasets that use the Vocabulary
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | 996 (98.22%) | Datasets that use rdf
rdfs | http://www.w3.org/2000/01/rdf-schema# | 736 (72.58%) | Datasets that use rdfs
foaf | http://xmlns.com/foaf/0.1/ | 701 (69.13%) | Datasets that use foaf
dcterm | http://purl.org/dc/terms/ | 568 (56.02%) | Datasets that use dcterm
owl | http://www.w3.org/2002/07/owl# | 370 (36.49%) | Datasets that use owl
pos | http://www.w3.org/2003/01/geo/wgs84_pos# | 254 (25.05%) | Datasets that use pos
sioc | http://rdfs.org/sioc/ns# | 179 (17.65%) | Datasets that use sioc
admin | http://webns.net/mvcb/ | 157 (15.48%) | Datasets that use admin
skos | http://www.w3.org/2004/02/skos/core# | 143 (14.10%) | Datasets that use skos
void | http://rdfs.org/ns/void# | 137 (13.51%) | Datasets that use void
bio | http://purl.org/vocab/bio/0.1/ | 125 (12.33%) | Datasets that use bio
cube | http://purl.org/linked-data/cube# | 114 (11.24%) | Datasets that use cube
rss | http://purl.org/rss/1.0/ | 99 (9.76%) | Datasets that use rss
w3con | http://www.w3.org/2000/10/swap/pim/contact# | 77 (7.59%) | Datasets that use w3con
doap | http://usefulinc.com/ns/doap# | 65 (6.41%) | Datasets that use doap
bibo | http://purl.org/ontology/bibo/ | 62 (6.11%) | Datasets that use bibo
dcat | http://www.w3.org/ns/dcat# | 59 (5.82%) | Datasets that use dcat
cert | http://www.w3.org/ns/auth/cert# | 51 (5.03%) | Datasets that use cert
sdmxd | http://purl.org/linked-data/sdmx/2009/dimension# | 48 (4.73%) | Datasets that use sdmxd
airport | http://www.daml.org/2001/10/html/airport-ont# | 45 (4.44%) | Datasets that use airport
wot | http://xmlns.com/wot/0.1/ | 44 (4.34%) | Datasets that use wot
content | http://purl.org/rss/1.0/modules/content/ | 43 (4.24%) | Datasets that use content
cc | http://creativecommons.org/ns# | 39 (3.85%) | Datasets that use cc
ref | http://purl.org/vocab/relationship/ | 36 (3.55%) | Datasets that use ref
wn | http://xmlns.com/wordnet/1.6/ | 33 (3.25%) | Datasets that use wn
tsioc | http://rdfs.org/sioc/types# | 33 (3.25%) | Datasets that use tsioc
vcard2006 | http://www.w3.org/2006/vcard/ns# | 29 (2.86%) | Datasets that use vcard2006
sdmxa | http://purl.org/linked-data/sdmx/2009/attribute# | 29 (2.86%) | Datasets that use sdmxa
gn | http://www.geonames.org/ontology# | 27 (2.66%) | Datasets that use gn
swc | http://data.semanticweb.org/ns/swc/ontology# | 27 (2.66%) | Datasets that use swc
dctypes | http://purl.org/dc/dcmitype/ | 26 (2.56%) | Datasets that use dctypes
hartigprov | http://purl.org/net/provenance/ns# | 26 (2.56%) | Datasets that use hartigprov
sd | http://www.w3.org/ns/sparql-service-description# | 25 (2.47%) | Datasets that use sd
open | http://open.vocab.org/terms/ | 22 (2.17%) | Datasets that use open
prov | http://www.w3.org/ns/prov# | 21 (2.07%) | Datasets that use prov
resource | http://purl.org/vocab/resourcelist/schema# | 20 (1.97%) | Datasets that use resource
rda | http://rdvocab.info/elements/ | 19 (1.87%) | Datasets that use rda
prvt | http://purl.org/net/provenance/types# | 18 (1.78%) | Datasets that use prvt
c4dm | http://purl.org/NET/c4dm/event.owl# | 18 (1.78%) | Datasets that use c4dm
gr | http://purl.org/goodrelations/v1# | 17 (1.68%) | Datasets that use gr
rsa | http://www.w3.org/ns/auth/rsa# | 17 (1.68%) | Datasets that use rsa
aiiso | http://purl.org/vocab/aiiso/schema# | 17 (1.68%) | Datasets that use aiiso
pingback | http://purl.org/net/pingback/ | 16 (1.58%) | Datasets that use pingback
time | http://www.w3.org/2006/time# | 14 (1.38%) | Datasets that use time
org | http://www.w3.org/ns/org# | 14 (1.38%) | Datasets that use org
wdrs | http://www.w3.org/2007/05/powder-s# | 13 (1.28%) | Datasets that use wdrs
vs | http://www.w3.org/2003/06/sw-vocab-status/ns# | 12 (1.18%) | Datasets that use vs
vann | http://purl.org/vocab/vann/ | 12 (1.18%) | Datasets that use vann
icaltzd | http://www.w3.org/2002/12/cal/icaltzd# | 11 (1.08%) | Datasets that use icaltzd
frbrcore | http://purl.org/vocab/frbr/core# | 11 (1.08%) | Datasets that use frbrcore
xhv | http://www.w3.org/1999/xhtml/vocab# | 11 (1.08%) | Datasets that use xhv
lcy | http://purl.org/vocab/lifecycle/schema# | 10 (0.99%) | Datasets that use lcy
rdfg | http://www.w3.org/2004/03/trix/rdfg-1/ | 10 (0.99%) | Datasets that use rdfg
mo | http://purl.org/ontology/mo/ | 9 (0.89%) | Datasets that use mo
cal | http://www.w3.org/2002/12/cal/ical# | 9 (0.89%) | Datasets that use cal
sdmx | http://purl.org/linked-data/sdmx# | 9 (0.89%) | Datasets that use sdmx
skosxl | http://www.w3.org/2008/05/skos-xl# | 8 (0.79%) | Datasets that use skosxl
visit | http://purl.org/net/vocab/2004/07/visit# | 8 (0.79%) | Datasets that use visit
timeline | http://purl.org/NET/c4dm/timeline.owl# | 8 (0.79%) | Datasets that use timeline
coun | http://www.daml.org/2001/09/countries/iso-3166-ont# | 7 (0.69%) | Datasets that use coun
wn20schema | http://www.w3.org/2006/03/wn/wn20/schema/ | 6 (0.59%) | Datasets that use wn20schema
spatial | http://geovocab.org/spatial# | 6 (0.59%) | Datasets that use spatial
dcam | http://purl.org/dc/dcam/ | 5 (0.49%) | Datasets that use dcam
adms | http://www.w3.org/ns/adms# | 5 (0.49%) | Datasets that use adms
voaf | http://purl.org/vocommons/voaf# | 5 (0.49%) | Datasets that use voaf
xkos | http://purl.org/linked-data/xkos# | 5 (0.49%) | Datasets that use xkos
rev | http://purl.org/stuff/rev# | 5 (0.49%) | Datasets that use rev
api | http://purl.org/linked-data/api/vocab# | 4 (0.39%) | Datasets that use api
rdarel | http://rdvocab.info/RDARelationshipsWEMI/ | 3 (0.30%) | Datasets that use rdarel
geom | http://geovocab.org/geometry# | 3 (0.30%) | Datasets that use geom
oo | http://purl.org/openorg/ | 3 (0.30%) | Datasets that use oo
log | http://www.w3.org/2000/10/swap/log# | 3 (0.30%) | Datasets that use log
wordnet | http://purl.org/vocabularies/princeton/wordnet/schema# | 3 (0.30%) | Datasets that use wordnet
formats | http://www.w3.org/ns/formats/ | 3 (0.30%) | Datasets that use formats
exif | http://www.w3.org/2003/12/exif/ns# | 3 (0.30%) | Datasets that use exif
wlo | http://purl.org/ontology/wo/ | 3 (0.30%) | Datasets that use wlo
gold | http://purl.org/linguistics/gold/ | 3 (0.30%) | Datasets that use gold
xtypes | http://purl.org/xtypes/ | 3 (0.30%) | Datasets that use xtypes
doc | http://www.w3.org/2000/10/swap/pim/doc# | 3 (0.30%) | Datasets that use doc
book | http://purl.org/NET/book/vocab# | 2 (0.20%) | Datasets that use book
po | http://purl.org/ontology/po/ | 2 (0.20%) | Datasets that use po
rdag1 | http://rdvocab.info/Elements/ | 2 (0.20%) | Datasets that use rdag1
taxo | http://purl.org/rss/1.0/modules/taxonomy/ | 2 (0.20%) | Datasets that use taxo
label | http://purl.org/net/vocab/2004/03/label# | 2 (0.20%) | Datasets that use label
wv | http://vocab.org/waiver/terms/ | 2 (0.20%) | Datasets that use wv
daml | http://www.daml.org/2001/03/daml+oil# | 2 (0.20%) | Datasets that use daml
ctorg | http://purl.org/ctic/infraestructuras/organizacion# | 2 (0.20%) | Datasets that use ctorg
prog | http://purl.org/prog/ | 2 (0.20%) | Datasets that use prog
cs | http://purl.org/vocab/changeset/schema# | 2 (0.20%) | Datasets that use cs
opmv | http://purl.org/net/opmv/ns# | 2 (0.20%) | Datasets that use opmv
coin | http://purl.org/court/def/2009/coin# | 2 (0.20%) | Datasets that use coin
admssw | http://purl.org/adms/sw/ | 2 (0.20%) | Datasets that use admssw
library | http://purl.org/library/ | 1 (0.10%) | Datasets that use library
fresnel | http://www.w3.org/2004/09/fresnel# | 1 (0.10%) | Datasets that use fresnel
scv | http://purl.org/NET/scovo# | 1 (0.10%) | Datasets that use scv
re | http://www.w3.org/2000/10/swap/reason# | 1 (0.10%) | Datasets that use re
cidoccrm | http://purl.org/NET/cidoc-crm/core# | 1 (0.10%) | Datasets that use cidoccrm
grddl | http://www.w3.org/2003/g/data-view# | 1 (0.10%) | Datasets that use grddl
lyou | http://purl.org/linkingyou/ | 1 (0.10%) | Datasets that use lyou
te | http://www.w3.org/2006/time-entry# | 1 (0.10%) | Datasets that use te
gadm | http://gadm.geovocab.org/ontology# | 1 (0.10%) | Datasets that use gadm
being | http://purl.org/ontomedia/ext/common/being# | 1 (0.10%) | Datasets that use being
ann | http://www.w3.org/2000/10/annotation-ns# | 1 (0.10%) | Datasets that use ann
bookmark | http://www.w3.org/2002/01/bookmark# | 1 (0.10%) | Datasets that use bookmark
rad | http://www.w3.org/ns/rad# | 1 (0.10%) | Datasets that use rad
link | http://www.w3.org/2006/link# | 1 (0.10%) | Datasets that use link
oa | http://www.w3.org/ns/oa# | 1 (0.10%) | Datasets that use oa
asn | http://purl.org/ASN/schema/core/ | 1 (0.10%) | Datasets that use asn
swid | http://semanticweb.org/id/ | 1 (0.10%) | Datasets that use swid
radion | http://www.w3.org/ns/radion# | 1 (0.10%) | Datasets that use radion
gbv | http://purl.org/ontology/gbv/ | 1 (0.10%) | Datasets that use gbv
ssn | http://www.w3.org/2005/Incubator/ssn/ssnx/ssn# | 1 (0.10%) | Datasets that use ssn
wdr | http://www.w3.org/2007/05/powder# | 1 (0.10%) | Datasets that use wdr
gso | http://www.w3.org/2006/gen/ont# | 1 (0.10%) | Datasets that use gso
amalgame | http://purl.org/vocabularies/amalgame# | 1 (0.10%) | Datasets that use amalgame
emp | http://purl.org/ctic/empleo/oferta# | 1 (0.10%) | Datasets that use emp
conversion | http://purl.org/twc/vocab/conversion/ | 1 (0.10%) | Datasets that use conversion
acl | http://www.w3.org/ns/auth/acl# | 1 (0.10%) | Datasets that use acl
psych | http://purl.org/vocab/psychometric-profile/ | 1 (0.10%) | Datasets that use psych
places | http://purl.org/ontology/places# | 1 (0.10%) | Datasets that use places
hcard | http://purl.org/uF/hCard/terms/ | 1 (0.10%) | Datasets that use hcard
cito | http://purl.org/spar/cito/ | 1 (0.10%) | Datasets that use cito
rov | http://www.w3.org/ns/regorg# | 1 (0.10%) | Datasets that use rov
identity | http://purl.org/twc/ontologies/identity.owl# | 1 (0.10%) | Datasets that use identity
flow | http://www.w3.org/2005/01/wf/flow# | 1 (0.10%) | Datasets that use flow
b2bo | http://purl.org/b2bo# | 1 (0.10%) | Datasets that use b2bo
swrl | http://www.w3.org/2003/11/swrl# | 1 (0.10%) | Datasets that use swrl
transit | http://vocab.org/transit/terms/ | 1 (0.10%) | Datasets that use transit
rdafrbr | http://rdvocab.info/uri/schema/FRBRentitiesRDA/ | 1 (0.10%) | Datasets that use rdafrbr
fowl | http://www.w3.org/TR/2003/PR-owl-guide-20031209/food# | 1 (0.10%) | Datasets that use fowl
cv | http://purl.org/captsolo/resume-rdf/0.2/cv# | 1 (0.10%) | Datasets that use cv
sim | http://purl.org/ontology/similarity/ | 1 (0.10%) | Datasets that use sim
wordmap | http://purl.org/net/ns/wordmap# | 1 (0.10%) | Datasets that use wordmap
frbre | http://purl.org/vocab/frbr/extended# | 1 (0.10%) | Datasets that use frbre

The following table shows, for each topical domain, the three most used vocabularies, excluding the ubiquitous vocabularies rdf, rdfs, and owl. The prefix odc denotes the vocabulary from opendatacommunities.org.

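Category | Vocabulary | Usage
social web | foaf | 86.12%
social web | dct | 40.65%
social web | wgs84 | 36.99%
publications | dct | 81.73%
publications | foaf | 69.23%
publications | bibo | 41.34%
user-generated content | dct | 81.91%
user-generated content | foaf | 74.55%
user-generated content | sioc | 43.63%
media | foaf | 75.67%
media | dct | 54.05%
media | mo | 18.91%
life sciences | dct | 66.29%
life sciences | foaf | 41.57%
life sciences | void | 31.46%
government | dct | 63.98%
government | cube | 60.75%
government | odc* | 46.24%
geographic | dct | 82.93%
geographic | foaf | 65.85%
geographic | skos | 48.78%
cross-domain | dct | 72.73%
cross-domain | foaf | 72.73%
cross-domain | skos | 38.63%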

4.2.2 Usage of Dereferencable Vocabularies

In particular for proprietary vocabularies, it is essential that they are dereferencable and linked to other vocabularies, so that agents can interpret their semantics.

To assess whether a vocabulary is dereferencable, we collected the terms for each proprietary vocabulary encountered in our corpus. For every term, we requested its URI via an HTTP GET request. We define the dereferencability quota of a vocabulary as the number of dereferencable terms divided by all terms collected from the vocabulary.
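
The quota defined above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the analysis: the function names are hypothetical, and the helper http_get_succeeds treats a term as dereferencable when an HTTP GET returns status 200, which is a simplification of checking that a definition could actually be retrieved. The Accept header value is likewise an assumption.

```python
import http.client
import urllib.request


def http_get_succeeds(uri, timeout=10):
    """Return True if an HTTP GET on the URI answers with status 200.

    Assumption: a 200 response is taken as "dereferencable".
    """
    request = urllib.request.Request(
        uri, headers={"Accept": "application/rdf+xml, text/turtle"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # network errors, HTTP error codes, and malformed URIs
        return False


def dereferencability_quota(term_uris, is_dereferencable=http_get_succeeds):
    """Number of dereferencable terms divided by all collected terms."""
    terms = list(term_uris)
    if not terms:
        raise ValueError("no terms collected for this vocabulary")
    return sum(1 for term in terms if is_dereferencable(term)) / len(terms)


def dereferencability_class(quota):
    """Map a quota to the three classes used in the text."""
    if quota == 1.0:
        return "full"
    if quota == 0.0:
        return "none"
    return "partial"
```

Passing a custom is_dereferencable callable keeps the quota computation testable without network access.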

In total, 19.25% of all proprietary vocabularies are fully dereferencable (i.e., their quota is 1.0). On the other hand, 72.75% of all proprietary vocabularies are not dereferencable at all. The remaining vocabularies, which are 8.00% of all proprietary ones, are partially dereferencable, meaning that for some terms, but not for all, a definition could be retrieved.

The dereferencability of proprietary vocabularies, broken down by category, is shown in the following table.

Usage and Dereferencability of Proprietary Vocabularies per Category
Category | Different Prop. Vocabs Used (% of all Prop. Vocabs) | # of Datasets Using Prop. Vocabs (% of all datasets) | Dereferencability: Full | Dereferencability: Partial | Dereferencability: None
social web | 128 (33.86%) | 83 (15.99%) | 16.41% | 6.25% | 77.78%
government | 48 (12.70%) | 35 (18.82%) | 20.83% | 12.50% | 66.67%
publications | 58 (15.34%) | 35 (33.65%) | 20.69% | 6.90% | 72.41%
life sciences | 35 (9.25%) | 26 (29.21%) | 28.57% | 5.71% | 65.71%
user-gen. content | 30 (7.93%) | 26 (47.27%) | 13.33% | 10.00% | 76.67%
cross-domain | 55 (14.55%) | 16 (36.36%) | 27.27% | 10.91% | 61.82%
media | 22 (5.82%) | 21 (56.76%) | 0.00% | 9.09% | 90.91%
geographic | 24 (6.34%) | 16 (39.02%) | 20.83% | 4.17% | 75.00%
Total | 378 (58.24%) | 241 (23.17%) | 19.25% | 8.00% | 72.75%

4.3 Adoption of Metadata Best Practices

Metadata helps make datasets self-descriptive. Best practices for providing metadata as Linked Data include provenance and licensing information, dataset-level metadata, and information about additional access methods.

4.3.1 Providing Provenance Information

Our analysis is based on a set of 26 vocabularies that we identified as usable for providing provenance information. The set was assembled from information provided by the W3C working group on provenance, the LOV vocabulary catalog, and our own experience. We searched each dataset for triples that use one of these vocabularies and have a document's URI as the subject.
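
The search described above can be sketched as a simple filter over triples. The namespace list below is only an illustrative subset (Dublin Core, MetaVocab, prov, and prv), not the full set of 26 vocabularies; the function name and the representation of triples as (subject, predicate, object) tuples of strings are assumptions made for this example.

```python
# Illustrative subset only; the analysis used 26 provenance vocabularies.
PROVENANCE_NAMESPACES = (
    "http://purl.org/dc/elements/1.1/",
    "http://purl.org/dc/terms/",
    "http://webns.net/mvcb/",
    "http://www.w3.org/ns/prov#",
    "http://purl.org/net/provenance/ns#",
)


def uses_provenance_vocabulary(triples, document_uri,
                               namespaces=PROVENANCE_NAMESPACES):
    """Check for a triple about the document whose predicate comes
    from one of the provenance vocabularies."""
    return any(
        subject == document_uri and predicate.startswith(namespaces)
        for subject, predicate, _obj in triples
    )
```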

In summary, 35.77% of all datasets use some provenance vocabulary. Looking at individual vocabularies, 28.37% of all datasets use DC or DCTerms, 10.77% use MetaVocab, and 0.77% use prv or prov.

The following table shows an overview of provenance vocabulary use for different topical domains.

Datasets Providing Provenance Information By Category Including the Vocabulary Used
Category | Any provenance vocabulary | Using Dublin Core | Using admin | Using prv or prov
social web | 169 (32.56%) | 56.21% | 58.58% | 1.18%
government | 77 (41.40%) | 100.00% | 0.00% | 1.30%
publications | 39 (37.50%) | 94.87% | 5.13% | 2.56%
life sciences | 21 (23.60%) | 100.00% | 0.00% | 2.56%
user-gen. content | 11 (20.00%) | 90.91% | 54.55% | 0.00%
cross-domain | 8 (18.18%) | 100.00% | 12.50% | 0.00%
media | 5 (13.51%) | 100.00% | 0.00% | 0.00%
geographic | 4 (9.76%) | 100.00% | 0.00% | 25.00%
Total | 372 (35.77%) | 28.37% | 10.77% | 0.77%

4.3.2 Providing Licensing Information

With the help of licensing information, agents can assess whether they may use the data for the purpose at hand.

To evaluate whether a dataset provides license information, we again searched for triples that have the document as their subject and a predicate containing the string "licen". To the resulting list of predicates, we added all predicates containing dc:rights or dct:rights as well as the waiver vocabulary, which leads to a total of 47 terms.
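
A minimal sketch of this predicate test is shown below. It is an approximation under stated assumptions rather than the list of 47 terms used in the analysis: the function names are hypothetical, triples are assumed to be (subject, predicate, object) tuples of full URIs, and "containing dc:rights or dct:rights" is interpreted as a prefix match on the expanded URIs.

```python
# Expanded forms of dc:rights and dct:rights, matched as prefixes (assumption).
DC_RIGHTS_PREFIXES = (
    "http://purl.org/dc/elements/1.1/rights",
    "http://purl.org/dc/terms/rights",
)
WAIVER_NAMESPACE = "http://vocab.org/waiver/terms/"


def is_license_predicate(predicate_uri):
    """True for predicates that contain "licen", are dc/dct rights
    predicates, or belong to the waiver vocabulary."""
    return (
        "licen" in predicate_uri.lower()
        or predicate_uri.startswith(DC_RIGHTS_PREFIXES)
        or predicate_uri.startswith(WAIVER_NAMESPACE)
    )


def provides_license(triples, document_uri):
    """Check for a license-related triple with the document as subject."""
    return any(
        subject == document_uri and is_license_predicate(predicate)
        for subject, predicate, _obj in triples
    )
```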

In total, 7.85% of all datasets provide licensing information in RDF. The most important predicates for indicating the license are dc/dct:license (7.98%), cc:license (2.02%) and dc/dct:rights (1.63%).

Category | Licensing Information
social web | 5.20%
government | 29.57%
publications | 3.85%
life sciences | 3.37%
user-gen. content | 10.91%
cross-domain | 11.36%
media | 5.41%
geographic | 0.00%
Total | 7.85%

4.3.3 Providing Dataset Level Metadata

Dataset level metadata is provided by using the VoID vocabulary, either as inline statements in the dataset or in a separate VoID file.

In the latter case, that file has to be linked from the data via a backlink, or provided at a well-known location, as defined by RFC 5785, which is created by appending /.well-known/void to the host part of the URI. As the latter condition is often too strict for data providers due to missing root-level access to the servers, we relax the search for VoID files at well-known locations, appending /.well-known/void to any portion of the URI.
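
The relaxed lookup can be illustrated by generating the candidate locations for a given URI. This is a sketch of one possible reading of "any portion of the URI", namely every path prefix from the host root down to the full path; the function name void_candidates is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def void_candidates(uri):
    """List candidate VoID locations for a URI.

    The first candidate is the strict RFC 5785 location at the host
    root; the others append /.well-known/void to each longer path
    prefix (assumed reading of the relaxed search).
    """
    parts = urlsplit(uri)
    segments = [segment for segment in parts.path.split("/") if segment]
    candidates = []
    for depth in range(len(segments) + 1):
        prefix = "/".join(segments[:depth])
        path = ("/" + prefix if prefix else "") + "/.well-known/void"
        # query string and fragment are dropped on purpose
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates
```

A crawler would then request these candidates in order and stop at the first one that returns a VoID description.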

In total, 140 (13.46%) of all datasets use the VoID vocabulary. Of all datasets, 48 (4.62%) use a backlinking mechanism, 34 of which link to a retrievable VoID file.

Category | Total | Link | Well-known | Inline
social web | 6 (1.16%) | 0.58% | 0.19% | 0.58%
government | 75 (40.32%) | 6.99% | 3.23% | 31.18%
publications | 14 (13.46%) | 6.73% | 2.88% | 5.77%
life sciences | 29 (32.58%) | 19.10% | 4.49% | 12.36%
user-gen. content | 6 (10.91%) | 5.45% | 0.00% | 5.45%
cross-domain | 5 (11.36%) | 9.09% | 2.27% | 2.27%
media | 2 (5.41%) | 2.70% | 0.00% | 2.70%
geographic | 15 (36.59%) | 14.63% | 12.20% | 12.20%
Total | 140 (13.46%) | 4.62% | 1.44% | 8.27%

4.3.4 Providing Alternative Access Methods

When looking at the availability of alternative access methods, we restricted ourselves to those access methods which are stated in the dataset-level metadata, namely VoID files and triples with VoID statements.
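
As an illustration, access methods stated in VoID metadata can be detected by looking for the VoID properties void:sparqlEndpoint and void:dataDump. That these two properties were the ones checked is an assumption of this sketch, as are the function name and the representation of triples as (subject, predicate, object) tuples of full URIs.

```python
VOID_NAMESPACE = "http://rdfs.org/ns/void#"

# Assumed mapping from VoID properties to access method labels.
ACCESS_PROPERTIES = {
    VOID_NAMESPACE + "sparqlEndpoint": "sparql",
    VOID_NAMESPACE + "dataDump": "dump",
}


def access_methods(triples):
    """Return the set of access methods announced in the given triples."""
    return {
        ACCESS_PROPERTIES[predicate]
        for _subject, predicate, _obj in triples
        if predicate in ACCESS_PROPERTIES
    }
```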

In total, we found alternative access methods for 48 (5.89%) of all datasets. SPARQL endpoints are denoted by 4.54% of all datasets, while dumps are denoted by 3.80%. The following table gives an overview of SPARQL endpoints and dumps by category.

Category | Any | SPARQL | Dump
social web | 6 (1.16%) | 1.16% | 0.39%
government | 61 (32.80%) | 30.11% | 30.65%
publications | 10 (10.58%) | 9.62% | 3.85%
life sciences | 19 (21.35%) | 20.22% | 16.85%
user-gen. content | 3 (5.45%) | 5.45% | 1.82%
cross-domain | 4 (9.09%) | 4.55% | 6.82%
media | 1 (2.70%) | 0.00% | 2.70%
geographic | 8 (19.51%) | 12.20% | 12.20%
Total | 48 (5.89%) | 4.54% | 3.80%

5. Downloads

The crawl dumps and other files which are the basis of this analysis can be downloaded here.

6. Feedback

For feedback, please contact Max Schmachtenberg or Chris Bizer.

7. Credits

The work was supported by the EU research project PlanetData.
