
News

Linking Data

Nov 26, 2010

One of the great advantages of the World Wide Web has been the ability for anyone to make a website and link it to any other. This openness, and the ground-up approach, has led to the mass of interconnected web pages we see today. However, as more and more data makes its way online, computers struggle to understand the connections between different data sets. That is where the idea of Linked Data comes in. Linked Data is simply a way of describing and storing data on the web so that it states how data on web page A is connected to data on web page B.

For a while I've been trying to get my head around Linked Data: not so much what it is, but how to create and use it, preferably by hand. Most websites about Linked Data are very technical and launch into talk of specifications, schemas and ontologies at the slightest provocation. Most of that material scares me because it is often written for people who already understand it rather than for those starting out. I had also lacked relevant examples to play with and learn the basics from. Then, a couple of weeks ago, a conversation with Doug Burke made me notice that schools in England and Wales have been included in the UK government's first foray into Linked Data (Scotland and Northern Ireland have separate education systems and aren't included yet). LCOGT, through the Faulkes Telescope Project, has many school users in the UK, so that seemed like a good place for me to get my feet wet.

Education.data.gov.uk provides a web address (a URI) for each school. On the page for a specific school (e.g. Clifton High School), data about that school can be seen and, importantly, understood by software. My first experiment was to find the URI for each school in the LCOGT database. This involved learning some SPARQL, a query language for Linked Data whose syntax resembles SQL (the language used by databases such as MySQL), so that I could search their school database. It turned out that the data quality of our own database wasn't great, with some schools listed under slightly different names, numbers or postcodes from those in the government database. However, after a bit of effort, we now have URIs for 684 schools. That meant we could start doing some interesting things.
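A rough Python sketch of that lookup step follows. The endpoint URL is a placeholder, and the `sch:` prefix plus the `establishmentName` and `postcode` property names are assumptions about the dataset's vocabulary rather than details confirmed from its schema. Because names were the least reliable field in our records, the sketch queries by a normalised postcode and then picks the closest name among the results.

```python
from difflib import SequenceMatcher
from urllib.parse import urlencode

# Placeholder endpoint (hypothetical): substitute the dataset's real SPARQL endpoint.
SPARQL_ENDPOINT = "https://example.org/education/sparql"

# Assumed vocabulary: verify the prefix and property names against the dataset's schema.
PREFIXES = "PREFIX sch: <http://education.data.gov.uk/def/school/>\n"


def normalise_postcode(postcode):
    """Upper-case a UK postcode and put one space before the 3-character inward code."""
    compact = "".join(postcode.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) > 3 else compact


def school_query(postcode):
    """Build a SPARQL SELECT listing school URIs and names at one postcode."""
    pc = normalise_postcode(postcode).replace('"', "")  # keep the literal well-formed
    return PREFIXES + (
        "SELECT ?school ?name WHERE {\n"
        "  ?school sch:establishmentName ?name ;\n"
        f'          sch:postcode "{pc}" .\n'
        "}"
    )


def query_url(query):
    """Encode the query as an HTTP GET URL using the SPARQL protocol's 'query' parameter."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


def best_name_match(our_name, candidates, cutoff=0.6):
    """Pick the (uri, name) candidate whose name best resembles ours, or None."""
    def score(candidate):
        return SequenceMatcher(None, our_name.lower(), candidate[1].lower()).ratio()

    best = max(candidates, key=score, default=None)
    return best if best is not None and score(best) >= cutoff else None
```

For example, `best_name_match("Clifton High Sch", [("uri-1", "Clifton High School"), ("uri-2", "Clifton College")])` picks the first pair. The 0.6 cutoff is an arbitrary starting point, and borderline matches would still need a human check.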

The first thing I did was to download the longitude and latitude of every school that we had a Linked Data address for. I then gridded these and made a heat map for English and Welsh schools (the redder an area, the more schools are in that part of the country). The result looks fairly similar to a map of population density, so the good news is that we don't appear to have much bias in which parts of England and Wales register with us.
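The gridding step could look something like the following Python sketch. The 0.5° cell size and the function names are illustrative choices, not necessarily what was used for our map.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.5):
    """Count (latitude, longitude) points falling in square cells cell_deg degrees wide."""
    counts = Counter()
    for lat, lon in points:
        counts[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))] += 1
    return counts


def intensities(counts):
    """Scale each cell's count to 0..1; 1.0 marks the busiest (reddest) cell."""
    peak = max(counts.values(), default=0)
    return {cell: n / peak for cell, n in counts.items()} if peak else {}


def cell_centre(cell, cell_deg=0.5):
    """Latitude and longitude of the centre of a grid cell, for plotting."""
    row, col = cell
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg)
```

Cells of equal angular size are not equal in area: at the latitude of England a degree of longitude is only about 60% as long as a degree of latitude, so a more careful map might bin on a projected grid instead.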

That done, we added some Linked Data within the web pages for observations and users. Although it isn't visible to someone viewing the page, it does show up in software that reads Linked Data.
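The exact markup isn't described here; one common way to embed Linked Data in an ordinary page is RDFa, which adds attributes to existing HTML elements. The Python sketch below assumes RDFa 1.1 with Dublin Core terms. The choice of `dcterms:contributor` to link an observation to a school, the function name and the example URLs are all hypothetical.

```python
from html import escape

# Hypothetical example: RDFa 1.1 markup using Dublin Core terms (dcterms).
DCTERMS = "http://purl.org/dc/terms/"


def observation_rdfa(obs_url, title, school_uri):
    """Return an HTML fragment describing an observation with RDFa attributes.

    The visible heading doubles as the dcterms:title; the empty <span> carries
    a link to the school's URI that browsers do not display.
    """
    return (
        f'<div prefix="dcterms: {DCTERMS}" about="{escape(obs_url)}">\n'
        f'  <h2 property="dcterms:title">{escape(title)}</h2>\n'
        f'  <span rel="dcterms:contributor" resource="{escape(school_uri)}"></span>\n'
        "</div>"
    )


print(observation_rdfa(
    "https://example.org/observations/1",     # hypothetical observation page
    "M51 & NGC 5195",
    "https://example.org/id/school/000000",   # hypothetical school URI
))
```

Escaping every value keeps the attributes well-formed even when a title contains characters such as `&` or quotes.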

In the past couple of days I've been experimenting with sharing our data properly through RDF (the Resource Description Framework). I'm not yet sure of the best way to put our information into this format, but I'm creating examples of how it might look and hoping some Linked Data experts can give me some pointers. At the same time, I'm also experimenting with making our data accessible to JavaScript through JSON (or, more correctly, JSONP).
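To make the RDF and JSONP ideas concrete, here is a small Python sketch that writes one statement as an N-Triples line (one of several RDF serialisations) and wraps a JSON payload in a JSONP callback. The function names, the example URIs and the callback-name rule are assumptions for illustration, not our actual output format.

```python
import json
import re


def ntriple(subject, predicate, obj, literal=False):
    """Serialise one RDF statement (subject, predicate, object) as an N-Triples line."""
    if literal:
        # Escape backslashes first, then quotes and line breaks, as N-Triples requires.
        text = (obj.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        obj_term = f'"{text}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# Only allow plain JavaScript identifiers (optionally dotted) as callback names,
# so a caller cannot inject arbitrary script through the callback parameter.
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*")


def jsonp(data, callback):
    """Wrap data as JSON inside a call to callback, i.e. JSONP."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"invalid callback name: {callback!r}")
    return f"{callback}({json.dumps(data)});"
```

A page elsewhere could load such a JSONP response in a `<script>` tag, and the browser would call its own `callback` function with the data, which is how JSONP gets around the restriction on reading JSON from another site.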

These are just the first baby steps towards making Linked Data at LCOGT. I'm still learning all this stuff, but what we offer should improve with time.