RDF Tree revisited: Developer-friendly JSON or XML from RDF graphs

In my previous post I talked about RDF Tree, an approach to building JSON or XML data from RDF graphs. Having received a number of useful comments, particularly from those involved with JSON-LD, I have revisited the approach and would like to present a revised version.

What is RDF Tree?

RDF Tree is an approach (and a Java library, currently in development, that implements it) to producing developer-friendly serialisations of RDF graphs. It is not a serialisation format in itself, like JSON-LD, but simply an approach to building predictable, stable JSON and XML representations of graph data.

The aims of this approach are as follows:

  • RDF Tree serialisations are non-semantic
    • Designed to power data-driven visual representation of data such as HTML
    • Designed to be lossy: the RDF graph cannot be recovered from the data
      • It is best practice to offer the data as RDF also for clients that require semantic data
  • RDF Tree is designed to be flexible
    • Whilst there are core principles, different rules, syntax and algorithms can be used to tailor the approach to a specific domain or use-case
  • RDF Trees are either single trees or multiple trees in an ordered list
    • Tree root(s) are indicated in the RDF using the tree ontology (see previous post)
    • For single trees, a specific root resource is known
    • For multiple trees, an ordered list of root resources is known (duplicates allowed)
  • RDF Trees can be built according to different rules
    • The four general rules for constructing the abstract tree from a graph structure are outlined in the previous blog post
    • As the rules can vary, there is no one canonical RDF Tree for a given graph input
    • Given a fixed set of rules, RDF Trees are produced as a function of a graph input
    • Rules include:
      • When to stop traversing the graph when building the tree
      • How to ‘canonicalise’ the resulting RDF Tree (e.g. deterministic property ordering)
  • The JSON or XML produced using this approach is largely indistinguishable from ‘vanilla’ JSON or XML
    • No superfluous meta or reference data is provided in order to extract the original graph or understand the specific semantics of the data
    • Designed for use with generic JSON or XML parsing libraries
  • Where naming conflicts exist, stable prefixes are used to distinguish between properties
  • Assumptions are made to optimise the approach
    • All data is considered single-language; different languages can be requested using the Accept-Language header
    • Datatype handling is minimal – datatypes are expected to be predictable
      • No datatypes in XML
      • JSON value types are respected
  • Where possible, the JSON syntax is aligned with JSON-LD, with the principal difference being the absence of the “@context” metadata
  • Inverse properties are included with the “^” prefix in JSON and an inverse=”true” attribute in XML
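
To make the idea of "a tree as a function of a graph and a set of rules" more concrete, here is a minimal Python sketch. It is not the RDF Tree library's code, and it does not reproduce the four rules from the previous post. The stopping rule (stop at a resource that is already an ancestor, or that has no outgoing triples) and the ordering rule (alphabetical by local name) are assumptions chosen for illustration, as are the example.org IRIs and the four-element tuple representation of a triple. Naming conflicts and inverse properties are ignored.

```python
from collections import defaultdict

def local_name(iri):
    """Return the part of an IRI after its last '#' or '/'."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]

def build_tree(triples, root, path=()):
    """Build a nested dict for `root` from (subject, predicate, object, is_resource) tuples."""
    outgoing = defaultdict(list)
    for subject, predicate, obj, is_resource in triples:
        if subject == root:
            outgoing[predicate].append((obj, is_resource))
    # Assumed stopping rule: a resource that is already an ancestor on this
    # path, or that has no outgoing triples, is emitted as a plain IRI string.
    if not outgoing or root in path:
        return root
    node = {"@id": root}
    # Assumed canonicalisation rule: properties ordered by local name.
    for predicate in sorted(outgoing, key=local_name):
        values = [
            build_tree(triples, obj, path + (root,)) if is_resource else obj
            for obj, is_resource in sorted(outgoing[predicate], key=lambda v: str(v[0]))
        ]
        # A single value is emitted on its own, several values as a list.
        node[local_name(predicate)] = values[0] if len(values) == 1 else values
    return node

# Hypothetical four-triple graph containing a cycle back to the root.
EX = "http://example.org/"
triples = [
    (EX + "ben", EX + "name", "Ben Ainslie", False),
    (EX + "ben", EX + "discipline", EX + "sailing", True),
    (EX + "sailing", EX + "name", "Sailing", False),
    (EX + "sailing", EX + "practisedBy", EX + "ben", True),
]
tree = build_tree(triples, EX + "ben")
```

Because the rules are fixed, the same graph and root always yield the same nested structure. Changing either rule gives a different, equally valid tree.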

What does RDF Tree look like?

For the given RDF Turtle input:

@prefix par:     <http://purl.org/vocab/participation/schema#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo:     <http://www.bbc.co.uk/ontologies/geopolitical/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix domain:  <http://www.bbc.co.uk/ontologies/domain/> .
@prefix oly:     <http://www.bbc.co.uk/ontologies/2012olympics/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sport: <http://www.bbc.co.uk/ontologies/sport/> .
@prefix tree:  <http://purl.org/rdf-tree/> .

tree:tree tree:root <http://www.bbc.co.uk/things/4e40ce40-b632-4a42-98d7-cf97067f7bf9#id> .

<http://www.bbc.co.uk/things/7ef7ffdf-f101-4470-adc0-38a5abac9122#id> a sport:CompetitiveSportingOrganisation ;
      oly:territory <http://www.bbc.co.uk/things/territories/gb#id> ;
      domain:document <http://www.bbc.co.uk/sport/olympics/2012/countries/great-britain> ;
      domain:shortName "Great Britain & N. Ireland"^^xsd:string ;
      domain:name "Team GB"^^xsd:string .

<http://www.bbc.co.uk/things/4e40ce40-b632-4a42-98d7-cf97067f7bf9#id> a sport:Person ;
      par:role_at <http://www.bbc.co.uk/things/7ef7ffdf-f101-4470-adc0-38a5abac9122#id> ;
      oly:dateOfBirth "1976-10-24"^^xsd:date ;
      oly:gender "M"^^xsd:string ;
      oly:height "172.0"^^xsd:float ;
      oly:weight "72.0"^^xsd:float ;
      domain:name "Ben Ainslie"^^xsd:string ;
      sport:competesIn <http://www.bbc.co.uk/things/2012/sam002#id>, <http://www.bbc.co.uk/things/2012/sam005#id> ;
      sport:discipline <http://www.bbc.co.uk/things/d65c5dce-f5e4-4340-931b-16ca1848d092#id> ;
      domain:document <http://www.bbc.co.uk/sport/olympics/2012/athletes/4e40ce40-b632-4a42-98d7-cf97067f7bf9>, <http://www.facebook.com/pages/Ben-Ainslie/108182689201922> ;
      foaf:familyName "Ainslie"^^xsd:string ;
      foaf:givenName "Ben"^^xsd:string .

<http://www.facebook.com/pages/Ben-Ainslie/108182689201922> a domain:Document ; 
   domain:documentType <http://www.bbc.co.uk/things/document-types/external> , <http://www.bbc.co.uk/things/document-types/facebook> .

<http://www.bbc.co.uk/sport/olympics/2012/athletes/4e40ce40-b632-4a42-98d7-cf97067f7bf9> a domain:Document ;
   domain:domain <http://www.bbc.co.uk/things/domains/olympics2012> ;
   domain:documentType <http://www.bbc.co.uk/things/document-types/bbc-document> .

<http://www.bbc.co.uk/things/2012/sam002#id> a sport:MedalCompetition ;
      domain:name "Sailing - Men's Finn"^^xsd:string ;
      domain:shortName "Men's Finn"^^xsd:string ;
      domain:externalId <urn:ioc2012:SAM002000> ;
      domain:document <http://www.bbc.co.uk/sport/olympics/2012/sports/sailing/events/mens-finn> .

<http://www.bbc.co.uk/things/2012/sam005#id> a sport:MedalCompetition ;
      domain:name "Sailing - Men's 470"^^xsd:string ;
      domain:shortName "Men's 470"^^xsd:string ;
      domain:externalId <urn:ioc2012:SAM005000> ;
      domain:document <http://www.bbc.co.uk/sport/olympics/2012/sports/sailing/events/mens-470> .

<http://www.bbc.co.uk/things/d65c5dce-f5e4-4340-931b-16ca1848d092#id> a sport:SportsDiscipline ;
      domain:document <http://www.bbc.co.uk/sport/olympics/2012/sports/sailing> ;
      domain:name "Sailing"^^xsd:string .

<http://www.bbc.co.uk/things/territories/gb#id> a geo:Territory ;
      domain:name "the United Kingdom of Great Britain and Northern Ireland"^^xsd:string ;
      geo:isInGroup <http://www.bbc.co.uk/things/81b14df8-f9d2-4dff-a676-43a1a9a5c0a5#id> .

<http://www.bbc.co.uk/things/81b14df8-f9d2-4dff-a676-43a1a9a5c0a5#id> a geo:Group ;
      domain:name "Europe"^^xsd:string ;
      geo:groupType <http://www.bbc.co.uk/things/group-types/bbc-news-geo-regions> .

The following JSON is produced:

{
  "@id": "http://www.bbc.co.uk/things/4e40ce40-b632-4a42-98d7-cf97067f7bf9#id",
  "@type": "http://www.bbc.co.uk/ontologies/sport/Person",
  "dateOfBirth": "1976-10-24",
  "familyName": "Ainslie",
  "gender": "M",
  "givenName": "Ben",
  "height": 172.0,
  "name": "Ben Ainslie",
  "weight": 72.0,
  "competesIn": [
    {
      "@id": "http://www.bbc.co.uk/things/2012/sam002#id",
      "@type": "http://www.bbc.co.uk/ontologies/sport/MedalCompetition",
      "name": "Sailing - Men\u0027s Finn",
      "shortName": "Men\u0027s Finn",
      "document": "http://www.bbc.co.uk/sport/olympics/2012/sports/sailing/events/mens-finn",
      "externalId": "urn:ioc2012:SAM002000"
    },
    {
      "@id": "http://www.bbc.co.uk/things/2012/sam005#id",
      "@type": "http://www.bbc.co.uk/ontologies/sport/MedalCompetition",
      "name": "Sailing - Men\u0027s 470",
      "shortName": "Men\u0027s 470",
      "document": "http://www.bbc.co.uk/sport/olympics/2012/sports/sailing/events/mens-470",
      "externalId": "urn:ioc2012:SAM005000"
    }
  ],
  "discipline": {
    "@id": "http://www.bbc.co.uk/things/d65c5dce-f5e4-4340-931b-16ca1848d092#id",
    "@type": "http://www.bbc.co.uk/ontologies/sport/SportsDiscipline",
    "name": "Sailing",
    "document": "http://www.bbc.co.uk/sport/olympics/2012/sports/sailing"
  },
  "document": [
    {
      "@id": "http://www.facebook.com/pages/Ben-Ainslie/108182689201922",
      "@type": "http://www.bbc.co.uk/ontologies/domain/Document",
      "documentType": [
        "http://www.bbc.co.uk/things/document-types/facebook",
        "http://www.bbc.co.uk/things/document-types/external"
      ]
    },
    {
      "@id": "http://www.bbc.co.uk/sport/olympics/2012/athletes/4e40ce40-b632-4a42-98d7-cf97067f7bf9",
      "@type": "http://www.bbc.co.uk/ontologies/domain/Document",
      "documentType": "http://www.bbc.co.uk/things/document-types/bbc-document",
      "domain": "http://www.bbc.co.uk/things/domains/olympics2012"
    }
  ],
  "role_at": {
    "@id": "http://www.bbc.co.uk/things/7ef7ffdf-f101-4470-adc0-38a5abac9122#id",
    "@type": "http://www.bbc.co.uk/ontologies/sport/CompetitiveSportingOrganisation",
    "name": "Team GB",
    "shortName": "Great Britain \u0026 N. Ireland",
    "document": "http://www.bbc.co.uk/sport/olympics/2012/countries/great-britain",
    "territory": {
      "@id": "http://www.bbc.co.uk/things/territories/gb#id",
      "@type": "http://www.bbc.co.uk/ontologies/geopolitical/Territory",
      "name": "the United Kingdom of Great Britain and Northern Ireland",
      "isInGroup": {
        "@id": "http://www.bbc.co.uk/things/81b14df8-f9d2-4dff-a676-43a1a9a5c0a5#id",
        "@type": "http://www.bbc.co.uk/ontologies/geopolitical/Group",
        "name": "Europe",
        "groupType": "http://www.bbc.co.uk/things/group-types/bbc-news-geo-regions"
      }
    }
  }
}
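
Since this is plain JSON, a generic parser is all a client needs. One thing the output above suggests is that a property with one value appears as a single value ("discipline"), while a property with several appears as an array ("competesIn"). The Python sketch below assumes that convention and uses a small hypothetical helper, as_list, so client code can treat both cases the same way. The input is an abbreviated fragment of the JSON above.

```python
import json

def as_list(value):
    """Wrap a single value in a list; treat a missing (None) value as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# Abbreviated fragment of the JSON shown above.
doc = json.loads("""
{
  "name": "Ben Ainslie",
  "competesIn": [
    {"name": "Sailing - Men's Finn"},
    {"name": "Sailing - Men's 470"}
  ],
  "discipline": {"name": "Sailing"}
}
""")

# Both properties can be iterated the same way, whatever their cardinality.
competitions = [c["name"] for c in as_list(doc.get("competesIn"))]
disciplines = [d["name"] for d in as_list(doc.get("discipline"))]
```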

And the following XML is produced:

<Person id="http://www.bbc.co.uk/things/4e40ce40-b632-4a42-98d7-cf97067f7bf9#id">
  <dateOfBirth>1976-10-24</dateOfBirth>
  <familyName>Ainslie</familyName>
  <gender>M</gender>
  <givenName>Ben</givenName>
  <height>172.0</height>
  <name>Ben Ainslie</name>
  <weight>72.0</weight>
  <competesIn>
    <MedalCompetition id="http://www.bbc.co.uk/things/2012/sam002#id">
      <name>Sailing - Men's Finn</name>
      <shortName>Men's Finn</shortName>
      <document id="http://www.bbc.co.uk/sport/olympics/2012/sports/sailing/events/mens-finn"/>
      <externalId id="urn:ioc2012:SAM002000"/>
    </MedalCompetition>
  </competesIn>
  <competesIn>
    <MedalCompetition id="http://www.bbc.co.uk/things/2012/sam005#id">
      <name>Sailing - Men's 470</name>
      <shortName>Men's 470</shortName>
      <document id="http://www.bbc.co.uk/sport/olympics/2012/sports/sailing/events/mens-470"/>
      <externalId id="urn:ioc2012:SAM005000"/>
    </MedalCompetition>
  </competesIn>
  <discipline>
    <SportsDiscipline id="http://www.bbc.co.uk/things/d65c5dce-f5e4-4340-931b-16ca1848d092#id">
      <name>Sailing</name>
      <document id="http://www.bbc.co.uk/sport/olympics/2012/sports/sailing"/>
    </SportsDiscipline>
  </discipline>
  <document>
    <Document id="http://www.facebook.com/pages/Ben-Ainslie/108182689201922">
      <documentType id="http://www.bbc.co.uk/things/document-types/facebook"/>
      <documentType id="http://www.bbc.co.uk/things/document-types/external"/>
    </Document>
  </document>
  <document>
    <Document id="http://www.bbc.co.uk/sport/olympics/2012/athletes/4e40ce40-b632-4a42-98d7-cf97067f7bf9">
      <documentType id="http://www.bbc.co.uk/things/document-types/bbc-document"/>
      <domain id="http://www.bbc.co.uk/things/domains/olympics2012"/>
    </Document>
  </document>
  <role_at>
    <CompetitiveSportingOrganisation id="http://www.bbc.co.uk/things/7ef7ffdf-f101-4470-adc0-38a5abac9122#id">
      <name>Team GB</name>
      <shortName>Great Britain &amp; N. Ireland</shortName>
      <document id="http://www.bbc.co.uk/sport/olympics/2012/countries/great-britain"/>
      <territory>
        <Territory id="http://www.bbc.co.uk/things/territories/gb#id">
          <name>the United Kingdom of Great Britain and Northern Ireland</name>
          <isInGroup>
            <Group id="http://www.bbc.co.uk/things/81b14df8-f9d2-4dff-a676-43a1a9a5c0a5#id">
              <name>Europe</name>
              <groupType id="http://www.bbc.co.uk/things/group-types/bbc-news-geo-regions"/>
            </Group>
          </isInGroup>
        </Territory>
      </territory>
    </CompetitiveSportingOrganisation>
  </role_at>
</Person>
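
The XML can be read just as directly. The following Python sketch parses an abbreviated fragment of the output above with the standard library's ElementTree. Nothing RDF-specific is needed, and repeated elements such as <competesIn> are simply iterated.

```python
import xml.etree.ElementTree as ET

# Abbreviated fragment of the XML shown above.
xml_fragment = """
<Person id="http://www.bbc.co.uk/things/4e40ce40-b632-4a42-98d7-cf97067f7bf9#id">
  <name>Ben Ainslie</name>
  <competesIn>
    <MedalCompetition id="http://www.bbc.co.uk/things/2012/sam002#id">
      <name>Sailing - Men's Finn</name>
    </MedalCompetition>
  </competesIn>
  <competesIn>
    <MedalCompetition id="http://www.bbc.co.uk/things/2012/sam005#id">
      <name>Sailing - Men's 470</name>
    </MedalCompetition>
  </competesIn>
</Person>
""".strip()

person = ET.fromstring(xml_fragment)
person_name = person.findtext("name")

# Each <competesIn> wraps one typed child element that carries the id.
competitions = person.findall("competesIn/*")
competition_names = [c.findtext("name") for c in competitions]
competition_ids = [c.get("id") for c in competitions]
```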

Property names

RDF Tree uses the property’s local name as the JSON field name. If a name conflict exists (more than one IRI exists for the same local name), then the IRI prefix is used to distinguish the properties, e.g. “foaf:name” where another “name” exists. A namespace priority list is used to determine which IRI can be expressed as just the local name, and which requires the prefix.

Essentially, no two properties can have the same name. However, property names can vary depending on the presence of other properties with the same local name.

The same approach is used for XML element names, except that the separator character is “-”, resulting in disambiguated element names like <foaf-name/>.
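
The naming scheme can be sketched in a few lines of Python. This is an illustration only: the prefix table, the priority order (domain before foaf) and the function names are assumptions, not taken from the library.

```python
from collections import defaultdict

# Hypothetical prefix table and namespace priority list (earlier wins the bare name).
PREFIXES = {
    "http://www.bbc.co.uk/ontologies/domain/": "domain",
    "http://xmlns.com/foaf/0.1/": "foaf",
}
PRIORITY = ["domain", "foaf"]

def split_iri(iri):
    """Split an IRI into (prefix label, local name) using the longest matching namespace."""
    for namespace, prefix in sorted(PREFIXES.items(), key=lambda kv: -len(kv[0])):
        if iri.startswith(namespace):
            return prefix, iri[len(namespace):]
    raise ValueError(f"no prefix registered for {iri}")

def field_names(property_iris, separator=":"):
    """Map each property IRI to a unique field name."""
    by_local = defaultdict(list)
    for iri in dict.fromkeys(property_iris):  # drop duplicates, keep order
        prefix, local = split_iri(iri)
        by_local[local].append((PRIORITY.index(prefix), prefix, iri))
    names = {}
    for local, candidates in by_local.items():
        candidates.sort()  # highest-priority namespace first
        for rank, (_, prefix, iri) in enumerate(candidates):
            # Only the highest-priority IRI keeps the bare local name.
            names[iri] = local if rank == 0 else f"{prefix}{separator}{local}"
    return names

DOMAIN_NAME = "http://www.bbc.co.uk/ontologies/domain/name"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_GIVEN = "http://xmlns.com/foaf/0.1/givenName"

json_names = field_names([DOMAIN_NAME, FOAF_NAME, FOAF_GIVEN])
xml_names = field_names([DOMAIN_NAME, FOAF_NAME], separator="-")
alone = field_names([FOAF_NAME])
```

Note the last call: when foaf:name is the only "name" present, it is emitted as plain "name". This is the sense in which a property's name depends on which other properties are present.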

Stable property set

Even though naming inconsistencies will be rare, the potential for them can be reduced further by adding properties to a list of ‘stable’ IRIs, each with a prefix and a unique local name. This set will contain the definitive set of unambiguous local names. This set will never be visible to users of the data, and is simply there to ensure the stability of the data.
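
One possible reading of the stable set, sketched in Python: each stable IRI permanently owns its bare local name, and any other property with that local name is always prefixed, whether or not the stable property appears in the data. This interpretation, the STABLE table and the function name are all assumptions made for illustration.

```python
# Hypothetical stable set: each IRI permanently owns its bare local name.
STABLE = {
    "http://www.bbc.co.uk/ontologies/domain/name": "name",
}

def stable_field_name(iri, prefix, local):
    """Name a property without looking at which other properties are present."""
    if iri in STABLE:
        return STABLE[iri]
    if local in STABLE.values():
        # The bare name is reserved for the stable IRI, so always add the prefix.
        return f"{prefix}:{local}"
    return local

owner = stable_field_name("http://www.bbc.co.uk/ontologies/domain/name", "domain", "name")
other = stable_field_name("http://xmlns.com/foaf/0.1/name", "foaf", "name")
unrelated = stable_field_name("http://xmlns.com/foaf/0.1/givenName", "foaf", "givenName")
```

Under this reading a given property always gets the same name, so client code does not break when a conflicting property appears in the data later.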

Introducing Tripliser

I recently had to solve the problem of how to take XML in a predefined format and create RDF representing the semantics of the data. I began using XSLT, but the edge cases needed to handle inconsistencies in the input XML gradually made the XSLT verbose and incomprehensible (a mix of syntax handling and business logic). Errors were hard to diagnose, and there was no effective recovery from failures. I decided to write a library to help me with this problem, called Tripliser…


Tripliser is a Java library and command-line tool for creating triple graphs, and RDF serialisations, from XML source data. It is particularly suitable for data exhibiting any of the following characteristics:

  • Messy – missing data, badly formatted data, changeable structure
  • Bulky – large volumes of data
  • Volatile – ongoing changes to data and structure, e.g. feeds

Other non-RDF source data, such as CSV and SQL databases, may be supported in future.

It is designed as an alternative to XSLT conversion, providing the following advantages:

  • Easy-to-read mapping format – concisely describing each mapping
  • Robust – error or partial failure tolerant
  • Detailed reporting – comprehensive feedback on the successes and failures of the conversion process
  • Extensible – custom functions, flexible API
  • Efficient – facilities for processing data in large volumes with minimal memory usage

XML files are read in, and XPath is used to extract values which can be inserted into a triple graph. The graph can be serialised in various RDF formats and is accompanied by meta-data and a property-by-property report to indicate how successful or unsuccessful the mapping process was.
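
The general shape of that process (query the XML, emit triples, record what worked and what did not) can be sketched in Python. This is not Tripliser's API, which is Java. The function name, the report format, the input element and the property names are all illustrative assumptions. ElementTree only supports a subset of XPath, so attribute queries such as "@id" are handled by hand here.

```python
import xml.etree.ElementTree as ET

def map_resource(element, subject, property_queries):
    """Return (triples, report) for one XML element.

    `property_queries` maps a predicate name to a simple query: either a
    child path such as "name" or an attribute reference such as "@id".
    """
    triples, report = [], []
    for predicate, query in property_queries.items():
        if query.startswith("@"):
            value = element.get(query[1:])
        else:
            value = element.findtext(query)
        if value is None or not value.strip():
            # Tolerate the failure: record it and carry on with the other properties.
            report.append((predicate, "FAILED", f"no value for query '{query}'"))
            continue
        triples.append((subject, predicate, value.strip()))
        report.append((predicate, "OK", ""))
    return triples, report

# Hypothetical input element with one property missing.
star = ET.fromstring('<star id="sun"><name>Sun</name></star>')
subject = "http://objects.theuniverse.org/sun#star"
triples, report = map_resource(
    star,
    subject,
    {"dc:title": "name", "universe:id": "@id", "universe:spectralClass": "spectralClass"},
)
```

The missing spectralClass does not abort the conversion. Two triples are still produced, and the report shows which property failed and why.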

Data flow in Tripliser

Here’s what a typical mapping format looks like…

<?xml version="1.0" encoding="UTF-8"?>
<rdf-mapping xmlns="http://www.daverog.org/rdf-mapping" strict="false">
	<constants>
		<constant name="objectsUri" value="http://objects.theuniverse.org/" />
	</constants>
	<namespaces>
		<namespace prefix="xsd" url="http://www.w3.org/2001/XMLSchema#" />
		<namespace prefix="rdfs" url="http://www.w3.org/2000/01/rdf-schema#" />
		<namespace prefix="dc" url="http://purl.org/dc/elements/1.1/" />
		<namespace prefix="universe" url="http://theuniverse.org/" />
	</namespaces>
	<graph query="//universe-objects" name="universe-objects" comment="A graph for objects in the universe">
		<resource query="stars/star">
			<about prepend="${objectsUri}" append="#star" query="@id" />
			<properties>
				<property name="rdf:type" resource="true" value="universe:Star"/>
				<property name="dc:title" query="name" />
				<property name="universe:id" query="@id" />
				<property name="universe:spectralClass" query="spectralClass" />
			</properties>
		</resource>
		<resource query="planets/planet">
			<about prepend="${objectsUri}" append="#planet" query="@id" />
			<properties>
				<property name="rdf:type" resource="true" value="universe:Planet"/>
				<property name="dc:title" query="name" />
				<property name="universe:id" query="@id" />
				<property name="universe:adjective" query="adjective" />
				<property name="universe:numberOfSatellites" dataType="xsd:int" query="satellites" />
			</properties>
		</resource>
	</graph>
</rdf-mapping>
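
To show how the <about> elements in this mapping might turn into subject URIs, here is a small Python sketch. The input document is invented to match the queries in the mapping, and the semantics (substitute ${...} constants, then concatenate prepend, the queried value and append) are my reading of the attribute names rather than a statement of how Tripliser implements them.

```python
import re
import xml.etree.ElementTree as ET

CONSTANTS = {"objectsUri": "http://objects.theuniverse.org/"}

def expand(template, constants):
    """Replace ${name} placeholders with the matching constant value."""
    return re.sub(r"\$\{(\w+)\}", lambda m: constants[m.group(1)], template)

def about_uri(element, prepend, query, append, constants):
    """Build a subject URI as prepend + queried value + append (assumed semantics)."""
    value = element.get(query[1:])  # only "@attribute" queries are handled here
    return expand(prepend, constants) + value + append

# Hypothetical source document shaped to fit the mapping's queries.
source = ET.fromstring("""
<universe-objects>
  <stars>
    <star id="sun"><name>Sun</name><spectralClass>G2V</spectralClass></star>
  </stars>
  <planets>
    <planet id="earth"><name>Earth</name><satellites>1</satellites></planet>
  </planets>
</universe-objects>
""".strip())

star_uris = [
    about_uri(s, "${objectsUri}", "@id", "#star", CONSTANTS)
    for s in source.findall("stars/star")
]
planet_uris = [
    about_uri(p, "${objectsUri}", "@id", "#planet", CONSTANTS)
    for p in source.findall("planets/planet")
]
```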

Go to the Homepage or to GitHub to find out more.