I recently had to work out how to take XML in a predefined format and create RDF representing the semantics of the data. I began with XSLT, but handling the edge cases and inconsistencies in the input XML gradually made the XSLT verbose and incomprehensible, a mix of syntax handling and business logic. Errors were hard to diagnose, and the conversion did not recover well from failures. I decided to write a library to help with this problem, called Tripliser…
Tripliser is a Java library and command-line tool for creating triple graphs, and RDF serialisations, from XML source data. It is particularly suitable for data exhibiting any of the following characteristics:
- Messy – missing data, badly formatted data, changeable structure
- Bulky – large volumes of data
- Volatile – ongoing changes to data and structure, e.g. feeds
Other non-RDF source data, such as CSV and SQL databases, may be supported in future.
It is designed as an alternative to XSLT conversion, providing the following advantages:
- Easy-to-read mapping format – concisely describing each mapping
- Robust – tolerant of errors and partial failures
- Detailed reporting – comprehensive feedback on the successes and failures of the conversion process
- Extensible – custom functions, flexible API
- Efficient – facilities for processing data in large volumes with minimal memory usage
XML files are read in, and XPath is used to extract values, which are inserted into a triple graph. The graph can be serialised in various RDF formats and is accompanied by metadata and a property-by-property report indicating how successful or unsuccessful the mapping process was.
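To make that flow concrete, here is a minimal sketch of the general idea using only the JDK's built-in XPath support. It is not Tripliser's API or implementation. The class name, the sample XML, the URIs and the N-Triples-style output are all assumptions chosen for illustration. It extracts values with XPath, turns them into triples, and records a warning rather than failing when a value is missing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: plain JDK XPath, NOT the Tripliser API.
class XPathTripleSketch {

    // Hypothetical input; the second star deliberately lacks a spectralClass.
    static final String SAMPLE_XML =
        "<universe-objects><stars>"
        + "<star id=\"sun\"><name>The Sun</name><spectralClass>G2V</spectralClass></star>"
        + "<star id=\"sirius\"><name>Sirius</name></star>"
        + "</stars></universe-objects>";

    // Assumed URIs for the example.
    static final String BASE = "http://objects.theuniverse.org/";
    static final String DC_TITLE = "http://purl.org/dc/elements/1.1/title";
    static final String SPECTRAL_CLASS = "http://theuniverse.org/spectralClass";

    final List<String> triples = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    // Parses the XML, selects each star, and emits one triple per value found.
    void convert(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList stars = (NodeList) xpath.evaluate(
                "//universe-objects/stars/star", doc, XPathConstants.NODESET);
            for (int i = 0; i < stars.getLength(); i++) {
                Node star = stars.item(i);
                String subject = "<" + BASE + xpath.evaluate("@id", star) + "#star>";
                addLiteral(xpath, star, subject, DC_TITLE, "name");
                addLiteral(xpath, star, subject, SPECTRAL_CLASS, "spectralClass");
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not convert XML", e);
        }
    }

    // A missing value becomes a warning instead of aborting the whole run.
    private void addLiteral(XPath xpath, Node context, String subject,
                            String predicate, String query) throws Exception {
        String value = xpath.evaluate(query, context).trim();
        if (value.isEmpty()) {
            warnings.add("No value for '" + query + "' on " + subject);
            return;
        }
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        triples.add(subject + " <" + predicate + "> \"" + escaped + "\" .");
    }

    public static void main(String[] args) {
        XPathTripleSketch sketch = new XPathTripleSketch();
        sketch.convert(SAMPLE_XML);
        sketch.triples.forEach(System.out::println);
        sketch.warnings.forEach(w -> System.out.println("WARN: " + w));
    }
}
```

Doing this by hand for every property quickly becomes repetitive, which is the kind of boilerplate a declarative mapping is meant to replace.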
Here’s what a typical mapping format looks like…
<?xml version="1.0" encoding="UTF-8"?>
<rdf-mapping xmlns="http://www.daverog.org/rdf-mapping" strict="false">
  <constants>
    <constant name="objectsUri" value="http://objects.theuniverse.org/" />
  </constants>
  <namespaces>
    <namespace prefix="xsd" url="http://www.w3.org/2001/XMLSchema#" />
    <namespace prefix="rdfs" url="http://www.w3.org/2000/01/rdf-schema#" />
    <namespace prefix="dc" url="http://purl.org/dc/elements/1.1/" />
    <namespace prefix="universe" url="http://theuniverse.org/" />
  </namespaces>
  <graph query="//universe-objects" name="universe-objects" comment="A graph for objects in the universe">
    <resource query="stars/star">
      <about prepend="${objectsUri}" append="#star" query="@id" />
      <properties>
        <property name="rdf:type" resource="true" value="universe:Star"/>
        <property name="dc:title" query="name" />
        <property name="universe:id" query="@id" />
        <property name="universe:spectralClass" query="spectralClass" />
      </properties>
    </resource>
    <resource query="planets/planet">
      <about prepend="${objectsUri}" append="#planet" query="@id" />
      <properties>
        <property name="rdf:type" resource="true" value="universe:Planet"/>
        <property name="dc:title" query="name" />
        <property name="universe:id" query="@id" />
        <property name="universe:adjective" query="adjective" />
        <property name="universe:numberOfSatellites" dataType="xsd:int" query="satellites" />
      </properties>
    </resource>
  </graph>
</rdf-mapping>
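The prepend, append and dataType attributes in the mapping above hint at how resource URIs and typed literals are assembled. The sketch below is my reading of those semantics, written in plain Java. It is not taken from Tripliser's source. The class and method names are hypothetical, and it assumes that a URI is built as prepend + queried value + append, with ${...} constants expanded first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a guess at the mapping semantics, NOT the Tripliser API.
class MappingValueSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} in the template with the matching constant.
    static String expand(String template, Map<String, String> constants) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = constants.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unknown constant: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Assumed rule: resource URI = prepend + queried value + append.
    static String aboutUri(String prepend, String queried, String append,
                           Map<String, String> constants) {
        return expand(prepend, constants) + queried + expand(append, constants);
    }

    // Turns a value plus a prefixed type such as xsd:int into an N-Triples typed literal.
    static String typedLiteral(String value, String prefixedType,
                               Map<String, String> namespaces) {
        int colon = prefixedType.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected prefix:name, got " + prefixedType);
        }
        String namespace = namespaces.get(prefixedType.substring(0, colon));
        if (namespace == null) {
            throw new IllegalArgumentException("Unknown prefix in " + prefixedType);
        }
        return "\"" + value + "\"^^<" + namespace + prefixedType.substring(colon + 1) + ">";
    }

    public static void main(String[] args) {
        Map<String, String> constants = new HashMap<>();
        constants.put("objectsUri", "http://objects.theuniverse.org/");
        Map<String, String> namespaces = new HashMap<>();
        namespaces.put("xsd", "http://www.w3.org/2001/XMLSchema#");

        // "earth" and "1" stand in for values an XPath query might return.
        System.out.println(aboutUri("${objectsUri}", "earth", "#planet", constants));
        System.out.println(typedLiteral("1", "xsd:int", namespaces));
    }
}
```

Under these assumptions, a planet with id "earth" would get the subject URI http://objects.theuniverse.org/earth#planet, and its satellite count would be emitted as an xsd:int literal rather than a plain string.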