Thursday, April 16, 2009

Easy semantic linking for authors

I've been playing with RDF a bit lately to see what I can make of it in terms of practical applications. The first hurdle is the rather long specs. Now, I won't pretend I'm someone who can pick up a spec and read it cover to cover. I like to play with some code as I read so that I can sort it all out in my head. So, as part of my reading I put together semanto - it's wrong in a couple of ways and generally basic, but it's my live learning.

This got me thinking about people who don't want to read the W3C specs and hunt for a schema that suits their needs. Peter Sefton discussed a method for authors to embed a triple into a document's link. Once the article is completed, the publisher can pass the document to a system that will turn these links into RDF/RDFa and output a webpage.

As he's my boss, I tend to agree with Peter. Actually, no, I tend to agree with the idea as it provides part of an "easy in" for authors.

Having played with the various RDF tools out there, I can see that an essential part of the "easy in" is to remove the chase for RDF schemas. Basically, I want to author something and then have an easy-to-use UI for classifying the information. If that system can provide me with standard predicates for my items then I don't really need to think too much about semantics.

My thinking is based on this workflow:
  1. Do research
  2. Write article
  3. Indicate document predicates/objects
  4. (Maybe) Determine other predicates/objects
  5. Publish
Steps 1 & 2 are really in your court (though you may want to keep an eye on The Fascinator).

I pick up Peter's idea in step 3. You can go through your document and add links to useful information. For example, you can assert that "Jim Smith" is a dc:creator and the dc:title is "My Weekend", etc. In Peter's model, these all appear as hyperlinks. You could even highlight the abstract and create a dc:description link. It'd be ugly and (from my experience) unwieldy, but it is possible and it is cross-app. You could even do some fancy grouping*.
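
To make that concrete, here's a rough sketch of what the publisher's system might do with such a link. The URL scheme (predicate and object carried as query parameters) is my own guess at an encoding, not necessarily Peter's:

```python
# A minimal sketch of the publisher-side step: pull a triple back out of an
# author-inserted hyperlink. The query-parameter encoding is my own assumption.
from urllib.parse import urlparse, parse_qs

def link_to_triple(doc_uri, href):
    """Turn a hyperlink like
    http://example.org/semantic?predicate=dc:creator&object=Jim%20Smith
    into a (subject, predicate, object) triple about the document."""
    query = parse_qs(urlparse(href).query)
    predicate = query["predicate"][0]   # e.g. "dc:creator"
    obj = query["object"][0]            # e.g. "Jim Smith"
    return (doc_uri, predicate, obj)

print(link_to_triple(
    "http://example.org/articles/my-weekend",
    "http://example.org/semantic?predicate=dc:creator&object=Jim%20Smith"))
# -> ('http://example.org/articles/my-weekend', 'dc:creator', 'Jim Smith')
```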

What Step 3 needs is a predefined set of terms for you to plug into. For example, we would cherry-pick the various schema elements and provide those best suited to the work being produced. You could base this on an EPrints-style workflow:

[Mock-up: a form asking "What sort of publication are you describing?"]
... then we present the usual:

[Mock-up: a form headed "The following properties are available for an article:"]

From that session we could produce an RDF document for the article using Dublin Core and the Bibliographic Ontology. The user gets a generated RDF file that has all the info, with no need for them to work out which namespaces/schemas are the most appropriate. This isn't new - it's a little like the FOAF-a-matic.
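
As a rough sketch (using rdflib, with made-up field names and an assumed document URI), the back end for that form might look something like this:

```python
# Sketch only: turn the form answers into an RDF description using Dublin Core
# and the Bibliographic Ontology. Field names and URIs are illustrative.
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF

DC = Namespace("http://purl.org/dc/elements/1.1/")
BIBO = Namespace("http://purl.org/ontology/bibo/")

answers = {                       # what the author typed into the form
    "type": "article",
    "title": "My Weekend",
    "creator": "Jim Smith",
}

g = Graph()
g.bind("dc", DC)
g.bind("bibo", BIBO)

doc = URIRef("http://example.org/articles/my-weekend")  # assumed identifier
g.add((doc, RDF.type, BIBO.Article))
g.add((doc, DC.title, Literal(answers["title"])))
g.add((doc, DC.creator, Literal(answers["creator"])))

print(g.serialize(format="xml"))   # the generated RDF file for the author
```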

We could also provide an interface with something like:

[Mock-up: a form asking "What are you describing?"]

The system can then spit out RDF triples or a link for Peter's word processor. What matters here is that, again, the author can be largely unaware of the underlying RDF complexities.
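
Something like the following is what I have in mind - the plain-English choices and their mapping to Dublin Core terms are just examples I've made up:

```python
# Sketch of hiding the RDF from the author: the UI offers plain-English
# choices and the system picks the predicate behind the scenes.
FRIENDLY_TERMS = {
    "the person who wrote this": "http://purl.org/dc/elements/1.1/creator",
    "the title of the work":     "http://purl.org/dc/elements/1.1/title",
    "a short summary":           "http://purl.org/dc/elements/1.1/description",
}

def describe(doc_uri, choice, value):
    """Return an N-Triples line for the author's plain-English choice."""
    predicate = FRIENDLY_TERMS[choice]
    return '<%s> <%s> "%s" .' % (doc_uri, predicate, value)

print(describe("http://example.org/articles/my-weekend",
               "the person who wrote this", "Jim Smith"))
```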

This last point leads to Step 4, in which we could throw the article at a system like OpenCalais to find content/metadata in the article that may be worth describing in RDF/RDFa. The author can select/deselect elements as they deem sensible and those that remain are either linked via RDFa or put into the associated RDF file.
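
A sketch of how that review step might work; the candidate entities and predicate URIs here are invented, and I haven't modelled the real OpenCalais response format:

```python
# Step 4 sketch: the extraction service hands back candidate entities, the
# author keeps or drops each one, and whatever remains is emitted as triples.
candidates = [
    {"text": "Toowoomba", "predicate": "http://example.org/terms/mentionsPlace"},
    {"text": "Jim Smith", "predicate": "http://example.org/terms/mentionsPerson"},
]

def review(candidates, rejected):
    """Drop the entities the author deselected, keep the rest."""
    return [c for c in candidates if c["text"] not in rejected]

doc = "http://example.org/articles/my-weekend"
for entity in review(candidates, rejected={"Toowoomba"}):
    print('<%s> <%s> "%s" .' % (doc, entity["predicate"], entity["text"]))
```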

Now, all I need is to find the time to try this out....

* I'm not completely across the spec, but RDFa does seem to be limited in terms of some aspects of academic publishing. The issue of author order comes to mind. Using the basic RDFa examples, I can link the authors but can't contain them à la an rdf:Seq. This is discussed in RDFa Containers and is solvable - even in word processors, as they have (un)ordered lists....
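
For what it's worth, building the ordered container itself is straightforward - here's a small rdflib sketch with placeholder authors and document URI; whether it maps neatly onto RDFa markup is the open question above:

```python
# Footnote sketch: preserving author order with an rdf:Seq, built with rdflib.
# The document URI and author names are placeholders.
from rdflib import Graph, Namespace, Literal, URIRef, BNode

RDFNS = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
DC = Namespace("http://purl.org/dc/elements/1.1/")

g = Graph()
doc = URIRef("http://example.org/articles/my-weekend")
authors = BNode()

g.add((doc, DC.creator, authors))
g.add((authors, RDFNS.type, RDFNS.Seq))
for i, name in enumerate(["Jim Smith", "Jane Jones"], start=1):
    g.add((authors, RDFNS["_%d" % i], Literal(name)))  # rdf:_1, rdf:_2, ...

print(g.serialize(format="xml"))
```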

Tuesday, April 14, 2009

Attempt 1: URLs with semantics

Having read Peter Sefton's Journal 2.0 post, I thought I'd have a play and create a basic URL encoder for such information. The result is the semant-o-matic; it's basic, but it's a start.
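
It amounts to little more than the following (not the actual semant-o-matic code, just the general shape of the idea):

```python
# Sketch of packing a subject/predicate/object into a URL's query string so
# the triple can travel inside an ordinary hyperlink. Values are examples.
from urllib.parse import urlencode

def triple_to_link(base, subject, predicate, obj):
    return base + "?" + urlencode(
        {"subject": subject, "predicate": predicate, "object": obj})

print(triple_to_link(
    "http://example.org/semantic",
    "http://example.org/articles/my-weekend",
    "dc:creator",
    "Jim Smith"))
```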

Excuse the poor formatting - I just wanted to have a play with some code (I get only rare chances).

Monday, April 6, 2009

Accessing the Personal Knowledge Network

I get my RSS/Atom feeds through Google Reader and can't always get to reading *everything*. This is where the search tool is such an excellent component. Like Gmail, Reader allows those of us who don't tag everything to recall articles and posts that we glanced at but didn't tag/store/Zotero, etc.

Working on The Fascinator has made me start to think about where my pool of "knowledge" comes from. Naturally, there are a few things in my head, but I really rely on my various data sources to form my personal knowledge network.

Whilst The Fascinator desktop edition will scan sections of my drive for things that I've saved, I often don't save articles and posts to my drive. If I know I want to keep something, I put it into my poorly organised Zotero library. Otherwise, I might tag it via Delicious. If it's a blog post I sometimes tag or star it but usually I am happy to know that it's somewhere in that mess of posts.

So, based on this, I think The Fascinator would benefit from allowing this personal knowledge network to be aggregated - even if only at the search level. This would mean that we can allow users to access their full network and tag/comment/associate across it.

My initial targets are selfish ones - Zotero, Delicious and Google Reader.
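
The shape of what I'm imagining, not a working integration - the source classes and their wiring to the real Zotero/Delicious/Reader APIs are entirely hypothetical here:

```python
# Hypothetical sketch: each source exposes a search() and the aggregator
# fans a query out across the personal knowledge network.
class Source:
    name = "source"
    def search(self, query):
        raise NotImplementedError

class ZoteroSource(Source):
    name = "zotero"
    def search(self, query):
        return []   # would query the local Zotero library

class DeliciousSource(Source):
    name = "delicious"
    def search(self, query):
        return []   # would query Delicious bookmarks

def search_network(sources, query):
    """Aggregate results from every source in the personal knowledge network."""
    return {s.name: s.search(query) for s in sources}

print(search_network([ZoteroSource(), DeliciousSource()], "article 2.0"))
```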

Thursday, April 2, 2009

Scholarly HTML and Article 2.0

I wanted to respond to Peter Sefton's blog about Scholarly HTML in light of the Article 2.0 competition winners, so, instead of doing it here, I posted a long-winded response on Peter's blog.