Thursday, April 16, 2009

Easy semantic linking for authors

I've been playing with RDF a bit lately to see what I can make of it in terms of practical applications. The first hurdle is the rather long specs. Now, I won't pretend I'm someone who can pick up a spec and read it cover to cover. I like to play with some code as I read so that I can sort it out in my head. So, as part of my reading I put together semanto. It's wrong in a couple of ways and generally basic, but it's my live learning.

This got me thinking about people who don't want to read the W3 specs and hunt for schemas that suit their needs. Peter Sefton discussed a method for authors to embed a triple into a document's link. Once the article is completed, the publisher can pass the document to a system that will turn these links into RDF/RDFa and output a webpage.

As he's my boss, I tend to agree with Peter. Actually, no, I tend to agree with the idea as it provides part of an "easy in" for authors.

Having played with the various RDF stuff out there, I can see that an essential part of the "easy in" is to remove the chase for RDF schemas. Basically, I want to author something and then have an easy-to-use UI for classifying the information. If that system can provide me with standard predicates for my items then I don't really need to think too much about semantics.

I'll base my thoughts on this workflow:
  1. Do research
  2. Write article
  3. Indicate document predicates/objects
  4. (Maybe) Determine other predicates/objects
  5. Publish
Steps 1 & 2 are really in your court (though you may want to keep an eye on The Fascinator).

I pick up Peter's idea in step 3. You can go through your document and add links to useful information. For example, you can assert that "Jim Smith" is a dc:creator and the dc:title is "My Weekend" etc. In Peter's model, these all appear as hyperlinks. You could even highlight the abstract and create a dc:description link. It'd be ugly and (from my experience) unwieldy, but it is possible and it works across applications. You could even do some fancy grouping *.
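As a rough sketch of what the publishing system might do with one of these links: the code below treats any link pointing at a marker host as an embedded triple and rewrites it as an RDFa span. The host name, the "p" query parameter and the function name are all hypothetical, made up for illustration; this is not Peter's actual link format.

```python
from html import escape
from urllib.parse import urlparse, parse_qs

# Hypothetical marker host: links pointing here are read as embedded triples.
PROXY_HOST = "triplink.example.org"


def link_to_rdfa(href, link_text):
    """Return an RDFa span for a proxy link, or None for an ordinary link."""
    parts = urlparse(href)
    if parts.netloc != PROXY_HOST:
        return None
    # Hypothetical convention: the predicate travels in the "p" parameter.
    predicates = parse_qs(parts.query).get("p")
    if not predicates:
        return None
    # The linked text becomes the literal object of the triple.
    return '<span property="{}">{}</span>'.format(
        escape(predicates[0], quote=True), escape(link_text)
    )


print(link_to_rdfa("http://triplink.example.org/?p=dc:creator", "Jim Smith"))
# prints: <span property="dc:creator">Jim Smith</span>
```

The subject is left implicit (the document itself), and the output assumes the surrounding page already declares the dc prefix.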

What Step 3 needs is a predefined set of terms for you to plug into. For example, we would cherry-pick the various schema elements and provide those best suited to the work being produced. You could base this on an EPrints-style workflow:

What sort of publication are you describing?

... then we present the usual

The following properties are available for an article:

From that session we could produce an RDF document for the article using Dublin Core and the Bibliographic Ontology. The user will get a generated RDF file that has all the info and no need for them to work out which namespaces/schemas are the most appropriate. This isn't new - it's a little like the FOAF-a-matic.
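To make that concrete, here is a minimal sketch of turning the answers from such a session into a Turtle file. The field names, the field-to-predicate table and the example article URI are assumptions for illustration; a real system would offer whatever cherry-picked set suits the publication type. I've used DC Elements to match the dc: examples above, though the DC Terms namespace could be swapped in.

```python
# Namespaces the author never has to look up themselves.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "bibo": "http://purl.org/ontology/bibo/",
}

# Hypothetical mapping from form fields to predicates.
FIELD_TO_PREDICATE = {
    "title": "dc:title",
    "creator": "dc:creator",
    "description": "dc:description",
}


def quote(value):
    """Escape a string for use as a short Turtle literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def article_to_turtle(uri, answers):
    """Build a Turtle document describing one article from form answers."""
    lines = ["@prefix {}: <{}> .".format(p, ns) for p, ns in PREFIXES.items()]
    lines.append("")
    statements = ["a bibo:Article"]
    for field, value in answers.items():
        statements.append("{} {}".format(FIELD_TO_PREDICATE[field], quote(value)))
    lines.append("<{}> {} .".format(uri, " ;\n    ".join(statements)))
    return "\n".join(lines)


print(article_to_turtle(
    "http://example.org/articles/my-weekend",  # hypothetical article URI
    {"title": "My Weekend", "creator": "Jim Smith"},
))
```

The author only ever sees "title" and "creator"; the choice of schema stays inside the tool.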

We could also provide an interface with something like

What are you describing?

The system can then spit out RDF triples or a link for Peter's word processor. What matters here is that, again, the author can be largely unaware of the underlying RDF complexities.

This last point leads to Step 4, in which we could throw the article at a system like OpenCalais to find content/metadata in the article that may be worth describing in RDF/RDFa. The author can select/deselect elements as they deem sensible and those that remain are either linked via RDFa or put into the associated RDF file.

Now, all I need is to find the time to try this out....

* I'm not completely across the spec, but RDFa does seem to be limited in terms of some aspects of academic publishing. The issue of author order comes to mind. Using the basic RDFa examples, I can link the authors but can't contain them à la an rdf:Seq. This is discussed in RDFa Containers and is solvable - even in word processors, as they have (un)ordered lists....
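For the separate RDF file at least, author order is easy to keep. The sketch below writes an ordered author list out as an rdf:Seq in Turtle. Hanging the sequence off bibo:authorList is my assumption about how you'd attach it, and the second author name and article URI are made up for the example.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBO_NS = "http://purl.org/ontology/bibo/"


def authors_to_seq(article_uri, authors):
    """Describe the authors of an article as an rdf:Seq, preserving order."""
    members = ["a rdf:Seq"]
    # rdf:_1, rdf:_2, ... are the container membership properties.
    for position, name in enumerate(authors, start=1):
        literal = name.replace("\\", "\\\\").replace('"', '\\"')
        members.append('rdf:_{} "{}"'.format(position, literal))
    return (
        "@prefix rdf: <{}> .\n".format(RDF_NS)
        + "@prefix bibo: <{}> .\n\n".format(BIBO_NS)
        + "<{}> bibo:authorList [\n    ".format(article_uri)
        + " ;\n    ".join(members)
        + "\n] ."
    )


# "Ann Lee" is a made-up second author for illustration.
print(authors_to_seq("http://example.org/articles/my-weekend", ["Jim Smith", "Ann Lee"]))
```

An ordered list in the word processor could feed the authors argument directly, which is the same trick the RDFa Containers discussion points at.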


  1. Maybe an ideal location for this type of system would be within Zotero.

  2. Three quick things: 1) you're using the incorrect DC property; use the new dcterms one, 2) Why encode this all in the URI when you could just use RDFa (add the property to the rel attribute)?, and 3) take a look at a little photo app Norm Walsh was once experimenting with.

  3. Hi there anonymous,

    1) Good point - I was using DC Elements as that's what they discuss in the RDFa primer. I'll certainly use dc terms from now. That being said, the new schema maps to the old one.

    2) My team develops word processing solutions that allow documents to be converted into HTML. At present, Word and OpenOffice lack the ability to set rel attributes, so we're looking at the URI as a "proxy" representation that we can resolve before rendering to XHTML (+RDFa). This is important as many academic authors don't write articles in HTML editing tools - nor do they do much investigating into how RDFa mechanisms work.

    3) Norm Walsh's tool and FOAF-a-matic are good examples of what I'm suggesting. It's just a basic/early start on tools that help generate Linked Data without getting in the way of authoring. I would agree that it's still cumbersome but at least we could remove the issue of selecting the correct schemas for well-known structures (such as journal articles).

    Thanks for the feedback!