Friday, November 20, 2009

Fascinator 0.3.0 and 2009 in review

Well, Linda has just released Version 0.3.0 of The Fascinator and I thought I'd take the opportunity to sit back and reflect on our work in 2009.

So, to keep it short, here's my Top 5 "big things" from our Fascinator work this year:

1. Complete Fascinator redesign
We took a look at The Fascinator as it had been built for the ARROW project and started to reconceptualise it as a desktop system. Our work has led us to adopt a plugin architecture throughout the system, which gives us the flexibility to plug in different harvesters, storage layers and transformers.
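To give a flavour of the idea, here's a hand-waving sketch (the names are illustrative, not the actual Fascinator API): each subsystem boils down to a small Java interface, with concrete plugins selected via configuration.

    import java.io.File;

    // A hand-waving sketch of the plugin idea - the names are
    // illustrative, not the actual Fascinator API.
    public interface Transformer {
        // Unique id used to select this plugin from configuration
        String getId();

        // Take a harvested file and return a transformed rendition
        File transform(File source) throws Exception;
    }

    // A concrete plugin; harvesters and storage layers follow the same pattern.
    class FlvTransformer implements Transformer {
        public String getId() { return "flv"; }

        public File transform(File source) throws Exception {
            // e.g. shell out to ffmpeg to render the video as FLV
            return new File(source.getParentFile(), "rendition.flv");
        }
    }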

Maybe calling it a desktop eResearch system is a misnomer. We are certainly working to ensure that it runs on the desktop in a friendly manner but it should also scale so that it can be the faculty eResearch system, the institutional eResearch system and so on. eResearch - isn't it all about data and collaboration?

2. Advancing our Agile approach
The team (esp. Linda) has been working to fulfil more of the Agile approach. We're getting a lot better at scoping specific releases and actually shipping them. We're also finding a good balance in our documentation and knowledge sharing, with the aim that all the developers can work across the system.

The hope now is to draw a line at some point near Easter and get The Fascinator 2.0 out there.

3. Embracing Maven

Oliver had utilised Maven in the original Fascinator and we continued this work. It's quite a learning curve, but the investment of time has meant that we can easily get instances of TF up and running and (hopefully) be more open to external developers wanting to take a crack at the code. Complementing this is our Nexus repository, which allows us to manage dependencies without chunks of doco and frustration.
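For anyone unfamiliar with the Nexus side, it amounts to a repository entry in the POM, after which dependencies resolve automatically (a minimal sketch; the URL is illustrative, not our actual server):

    <!-- Minimal sketch: point the build at a Nexus instance so
         dependencies resolve without manual setup. URL is illustrative. -->
    <repositories>
      <repository>
        <id>fascinator-nexus</id>
        <url>http://example.org/nexus/content/groups/public</url>
      </repository>
    </repositories>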

This aspect is still developing and we hope to have a Continuous Integration service (Hudson) up and running in the new year. This will allow us to release daily snapshots, keep a live Maven site constantly updated and allow for Darth Tater to be quickly relocated to anyone who breaks the build.

4. Working with RDF
This proved a steep learning curve. We picked up RDF2Go and RDFReactor on the back of implementing the Aperture system. As we started to develop new harvesters and indexing rules we found the need to read and write RDF. I even got in and developed an RDFReactor plugin for Eclipse in the hope of easing the development of my long-overdue feed reader plugin.
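For a flavour of what that looks like, here's a minimal RDF2Go sketch (the URIs are made up, and it assumes an RDF2Go adapter such as the Jena binding is on the classpath):

    import org.ontoware.rdf2go.RDF2Go;
    import org.ontoware.rdf2go.model.Model;
    import org.ontoware.rdf2go.model.Syntax;
    import org.ontoware.rdf2go.model.node.URI;

    // A minimal RDF2Go sketch - the URIs are made up for illustration.
    public class Rdf2GoExample {
        public static void main(String[] args) {
            Model model = RDF2Go.getModelFactory().createModel();
            model.open();

            URI subject = model.createURI("http://example.org/object/1");
            URI title = model.createURI("http://purl.org/dc/elements/1.1/title");
            model.addStatement(subject, title, "My research notes");

            // Dump the triples, e.g. to feed the indexer
            System.out.println(model.serialize(Syntax.RdfXml));
            model.close();
        }
    }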

The area of RDF development still has a long way to go before it matches the abstraction provided to RDBMS developers.

We're also getting a grip on SPARQL and may even have a triple store running at the back of TF sometime in 2010.
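To give a taste, here's a sketch of RDF2Go's SPARQL support, carrying on from the model in the snippet above (the vocabulary is illustrative):

    import org.ontoware.aifbcommons.collection.ClosableIterator;
    import org.ontoware.rdf2go.model.QueryResultTable;
    import org.ontoware.rdf2go.model.QueryRow;

    // Carrying on with the model from the snippet above: pull back
    // every Dublin Core title in the store.
    QueryResultTable results = model.sparqlSelect(
            "PREFIX dc: <http://purl.org/dc/elements/1.1/> "
          + "SELECT ?title WHERE { ?obj dc:title ?title }");

    ClosableIterator<QueryRow> rows = results.iterator();
    while (rows.hasNext()) {
        System.out.println(rows.next().getValue("title"));
    }
    rows.close();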

5. Presenting Fascinator
Well, this was big for me - I presented at eResearch Australasia 2009 and had a room full of people.

I'm off to the UK for the DCC and IEEE eScience conferences in the first half of December. I'm looking forward to meeting a raft of new people and (hopefully) showing off our work over a pint.

Importantly, it's not just me or Peter working on this stuff - Bron, Linda, Oliver, Ron and Cynthia have been developing, designing, planning and dealing with our flights of fancy (including harried IMs from conferences). Thanks guys!

Monday, November 16, 2009

eResearch Australasia 2009

Well, it was a busy week last week as Peter Sefton and I attended eResearch Australasia 2009. This post is my report on the main items of interest for our development work - it's not a blow-by-blow account.

I'm told that slides and video will be available online shortly.

Monday (Workshops)
Two workshops on Monday. The first was "Tools and Technologies for the Social Sciences and Humanities" and focussed on the ASSDA (Australian Social Science Data Archive). Main points of interest were:
  • ASSDA is working to incorporate more qualitative research artefacts
  • Work is being done to provide quantitative analysis tools via the Nesstar tool
  • The Historical Census and Colonial Data Archive discussed the difficulty of digitising older texts (in this case fiche) - a fair bit of manual work is involved, esp. for tabular data. This work was outsourced to India.
The afternoon saw me trying out R, the statistical computing environment. It was interesting, if a little outside my normal mode of operation.

Tuesday (Conference)
The presentation on the Black Loyalist repository was an interesting look at a project that took historical documents and attempted to map the lives of little-known slaves in the US. Of interest to me was the user interface, which provides timeline, map and network visualisations that help you discover an individual's movements and relationships, all backed up through links to the original sources. The project team is also working to crowdsource contributions by allowing others to comment on and add to the project. Behind the scenes is the timeline from SIMILE Widgets; I'm not sure where the network map comes from.

Mitchell Whitelaw's visualisations of archival datasets were very interesting. Of note was the A1 Explorer, which provides a tag cloud that would be really interesting to see within The Fascinator. See http://visiblearchive.blogspot.com/

I presented The Fascinator and that seemed to go well. I really feel that we're working on "new" stuff here and was encouraged by people's interest in the project and the various technologies we've been utilising.

Wednesday (Conference)
I attended Anne Cregan's introduction to Linked Open Data in the morning and a BOF by Peter Sefton, Anna Gerber and Peter Murray-Rust on the same topic in the afternoon. Peter describes the BOF in his blog. I'll only add that the W3C's Media Fragments work was mentioned; it looks to provide a method for linking to segments of a video. I haven't looked into this standard (yet) and am interested in how it relates to SMIL.
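For reference, the temporal form of a Media Fragment URI addresses a time range directly in the fragment; for example, seconds 10 through 20 of a video:

    http://example.org/interview.ogv#t=10,20
    http://example.org/interview.ogv#t=npt:10,20   (explicit "normal play time")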

Ron Chernich's presentation on the Atlas of Living Australia was a good look at Danno, an RDF-based annotation server for text and images. It's completely browser-based and I'm really interested in setting this system up on my PC to annotate my local Fascinator. Now, if we could annotate media fragments....

Thursday (Conference)
The sessions were generally informative but not specifically related to our work. I did enjoy the text mining session by Calum Robertson. With the NLA putting newspapers online, it would be interesting to mine old news for emerging patterns. Specifically for The Fascinator, text mining could bring out patterns in content such as interviews.

Friday (Workshop)
This was an eResearch Project Management session that covered a lot of material in a few hours. It was a generally OK session, but a lot of our work is of a size where I feel a weighty PM approach would slow us down. I'm a big fan of our work in the Maven space and our ongoing work to refine our development practice. I can see the need to scope our projects to a reasonably formal level but, beyond that, PM starts to dominate the actual work.

Wednesday, November 11, 2009

The Fascinator @ eResearch Australasia

Well, I gave my first substantial conference presentation today and, although I felt really nervous, I'm told it didn't show. The presentation ran for 15 minutes with 5 minutes on top for questions.

I was asked 2 questions - one by Andrew Treloar on the design and another by Jim Richardson about tagging. Andrew asked about the code model and I thought I'd add some extra clarification: the whole system is built on a plugin model, so it already runs as components. What it lacks, however, is an asynchronous, parallel form of communication. As it stands, when you harvest a file, the system transforms and indexes it in serial and then looks at the next file. What we want to do is move towards a message queue system (e.g. RabbitMQ or Apache ActiveMQ) that allows the system to break up and spread out work such as transformation. This is very useful when you hit a 1GB video that you want to transform to FLV. Time, however, is always challenging us...
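To sketch the shape of it (using the standard JMS API against ActiveMQ; the broker URL, queue name and payload are invented for illustration), the harvester would publish one message per file and any number of workers would consume them in parallel:

    import javax.jms.Connection;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    // A sketch only: the harvester publishes "please transform this"
    // messages; workers (possibly on other machines) consume them.
    public class TransformPublisher {
        public static void main(String[] args) throws Exception {
            Connection conn = new ActiveMQConnectionFactory("tcp://localhost:61616")
                    .createConnection();
            conn.start();
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("fascinator.transform");
            MessageProducer producer = session.createProducer(queue);

            // One message per harvested file; a worker turns the video into FLV
            producer.send(session.createTextMessage("/data/videos/interview.mpg"));

            session.close();
            conn.close();
        }
    }

The appeal is that the workers need not live on the same box, so a heavyweight transformation can be farmed out without blocking the harvest.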

Jim's question was handy as I had forgotten to show off the tagging system. We're using the CommonTag schema (http://commontag.org/) so we can point to endpoints. We're currently creating a user endpoint using their email as the URI, but we hope to have you linking to ontologies and places like DBpedia soon(ish).
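In RDF terms, a CommonTag tag hangs off the object something like this (a hand-written sketch using RDF2Go; the object and user URIs are invented, and my reading of the ctag vocabulary may be off):

    import org.ontoware.rdf2go.RDF2Go;
    import org.ontoware.rdf2go.model.Model;
    import org.ontoware.rdf2go.model.node.URI;
    import org.ontoware.rdf2go.vocabulary.RDF;

    // A hand-written sketch of a CommonTag-style tag; the object and
    // user URIs are invented for illustration.
    public class CommonTagExample {
        public static void main(String[] args) {
            Model model = RDF2Go.getModelFactory().createModel();
            model.open();

            String CTAG = "http://commontag.org/ns#";
            URI object = model.createURI("http://example.org/object/42");
            URI tag = model.createURI("http://example.org/tag/1");

            model.addStatement(object, model.createURI(CTAG + "tagged"), tag);
            model.addStatement(tag, RDF.type, model.createURI(CTAG + "AuthorTag"));
            model.addStatement(tag, model.createURI(CTAG + "label"), "duncan");
            // The tag's meaning points at an endpoint - today a mailto: user
            // URI, later (hopefully) an ontology term or DBpedia resource.
            model.addStatement(tag, model.createURI(CTAG + "means"),
                    model.createURI("mailto:duncan@example.org"));

            model.close();
        }
    }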

On the tagging front, I'd like to see us build an ontology/taxonomy/thesaurus builder. This may be based on SKOS and would allow the user to create their own thesaurus. For example, in our current work, Leonie could create a list of participants for use in tags. Peter's also interested in hierarchical tagging (e.g. people/duncan) that doesn't require you to define anything formally. With this data we could create at least a basic SKOS vocabulary for the user at publication time.
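As a sketch of that publication-time step (my own illustration, not existing TF code), a hierarchical tag like people/duncan maps quite naturally onto skos:broader:

    import org.ontoware.rdf2go.RDF2Go;
    import org.ontoware.rdf2go.model.Model;
    import org.ontoware.rdf2go.model.node.URI;
    import org.ontoware.rdf2go.vocabulary.RDF;

    // Sketch: turn a hierarchical tag like "people/duncan" into SKOS
    // concepts, each level broader than the next. Base URI is invented.
    public class TagToSkos {
        static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
        static final String BASE = "http://example.org/thesaurus/";

        public static void main(String[] args) {
            Model model = RDF2Go.getModelFactory().createModel();
            model.open();

            URI broader = null;
            for (String level : "people/duncan".split("/")) {
                URI concept = model.createURI(BASE + level);
                model.addStatement(concept, RDF.type,
                        model.createURI(SKOS + "Concept"));
                model.addStatement(concept,
                        model.createURI(SKOS + "prefLabel"), level);
                if (broader != null) {
                    model.addStatement(concept,
                            model.createURI(SKOS + "broader"), broader);
                }
                broader = concept;
            }
            model.close();
        }
    }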

At some point in the near future (once it's been cleared) you'll be able to check out the slides via USQ ePrints: http://eprints.usq.edu.au/6090.