I think this comment that Danny Ayers left on my In defence of the RDBMS post deserves to be discussed in a post of its own:
The point about relational databases as an integration technology is well made, but I’m curious to know why you consider RDF worse as a storage medium. It has definite advantages over OO/RDBs when it comes to integration on the web (thanks to the use of URIs as keys, and the open world model).
For persistence I can’t see any way it’s worse than OO/RDBs (in fact quite a few RDF stores use RDBs for persistence under the hood). What’s more, RDF has well-defined serializations (such as RDF/XML) which means that not only is the data portable between stores, it can also be dumped in a *standard* form. (For persistence it’s perfectly reasonable to divide the data up into manageable chunks and distribute them across RDF/XML files).
If “data lives much longer than applications”, then isn’t it better to take advantage of a clear standard, rather than the quasi-standards found in SQL implementations, or for that matter the more proprietary models found in OO DBs..?
Well… yes and no.
Let me first point out the fact that I wrote about “using RDF as a persistent storage medium just because it’s more flexible than a RDBMS”. That’s what I was objecting to (and before you ask: no, it’s not a hypothetical scenario) and not the usage of RDF per se. As Gavin King wrote in the post I was responding to: “Database refactoring is possible and practical.” and you shouldn’t be using some new, unproven technology just because refactoring and maintaining SQL databases is hard.
Second, RDF data might be portable when it is serialized as XML or N3, but once it is persistently stored, it is usually in a proprietary format that can only be accessed with a proprietary API. If I have, say, a Jena model stored in an RDBMS, all I have is essentially a single table with three columns (subject, predicate and object) where all values have been mangled so much that the number 42 becomes Lv:0:42:http://www.w3.org/2001/XMLSchema#nonNegativeInteger4 and so on.
Contrast that to an SQL database schema where, if the designer didn’t purposefully obfuscate it, it’s usually possible to reverse engineer it, sometimes just by looking at the names of tables and columns, and at foreign keys to infer relationships. There are also mature tools to move data between different databases.
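To make the contrast concrete, here is a minimal sketch of what reading data back out of a bare three-column triple table involves. The table layout, the `ex:` identifiers and the sample values are all hypothetical, and this is deliberately simpler than what Jena actually stores; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified triple table: NOT Jena's actual schema or encoding.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:p1", "ex:age", "30"),
        ("ex:p1", "ex:salary", "60000"),
        ("ex:p2", "ex:age", "30"),
        ("ex:p2", "ex:salary", "70000"),
        ("ex:p3", "ex:age", "40"),
        ("ex:p3", "ex:salary", "40000"),
    ],
)

# Rebuilding one "row" per person takes one self-join per attribute,
# plus casts, because every value is stored as text in the object column.
rows = conn.execute(
    """
    SELECT a.subject,
           CAST(a.object AS INTEGER) AS age,
           CAST(s.object AS INTEGER) AS salary
    FROM triples a
    JOIN triples s ON s.subject = a.subject
    WHERE a.predicate = 'ex:age' AND s.predicate = 'ex:salary'
    ORDER BY a.subject
    """
).fetchall()

print(rows)
```

Nothing in the table itself says that a "person" has an age and a salary; that knowledge lives in the data and in whoever writes the query, whereas a conventional schema would spell it out in table and column names.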
You could argue that I am comparing things that are at different levels, that I should be looking at N3 serialization format as an equivalent of SQL, and that complaining about the non-portability of Jena models is equivalent to complaining about not being able to move MySQL data files to Oracle. If you did that… well, I’d concede you have a point.
But the fact remains that, as long as I have my data served by a reasonably well-known RDBMS and I am using a reasonably well-designed schema, I’ll be able to find an (oftentimes cheap or free) tool that allows me to make sense of that data, analyze it, transform it, plot it, report it, you name it.
Without even thinking much about it, I can fire up mysql, psql or sqlplus from the command line and type:
select avg(salary) as a from person group by age having avg(salary) > 50000 order by a desc;
I’m not really up to speed with SPARQL, but I don’t think it’s able to do that just yet. And even if it could, there is the question of how efficient it would be, whereas RDBMSs have been optimized for 30 years in order to be blazingly fast at doing joins, sorts, groupings, projections, and the like. You know, the kind of things business people tend to ask from a data store.
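For anyone who wants to try the query above without a database server at hand, the following sketch runs it unchanged against an in-memory SQLite database from Python. The `person` table definition and the four sample rows are made up for illustration; only the query text comes from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample data; the post only names the table
# "person" and the columns "age" and "salary".
conn.execute("CREATE TABLE person (name TEXT, age INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("Ann", 30, 60000),
        ("Bob", 30, 70000),
        ("Cat", 40, 40000),
        ("Dan", 50, 90000),
    ],
)

# The query exactly as typed in the post.
query = (
    "select avg(salary) as a from person "
    "group by age having avg(salary) > 50000 order by a desc;"
)
result = conn.execute(query).fetchall()

# One row per age group whose average salary exceeds 50000, highest first.
print(result)
```

With this sample data the 40-year-old group is filtered out by the `having` clause, and the two remaining averages come back in descending order.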
So, to sum it up, RDF does really shine “when it comes to integration on the web”, especially when we are doing integration between really heterogeneous systems, without much in the way of predefined agreements between them. But given the current maturity of the tools, I wouldn’t design a system that had an RDF storage system at its core, unless I had some compelling, specific reason for doing so.


You said:
““Database refactoring is possible and practical.” and you shouldn’t be using some new, unproven technology just because refactoring and maintaining SQL databases is hard.”
Urgh - that’s a terrible argument to be trying to make. If something is hard it’s time to investigate/try other options which can be either revolution or evolution. Both are about as valid as each other and which you try will be determined by your project risk profile.