Wednesday, January 28, 2009

EPrints: Random EPrint

Having an eprint of the day on your IR site is a useful tool. Authors like to see their work come up, and it gives visitors something new to look at. I created a basic script that selects a random full-text eprint and either redirects the browser to it or produces a citation. The redirect suits a link along the lines of "Find a random eprint", though that may not be especially useful. Outputting a citation lets you embed the information in a web page. We needed two types of citation output: one is a snippet of HTML for use in other web pages; the other displays the citation within the archive's template.

So, the code below lives in a file called "random" in the cgi folder. You have some options:

  • http://myeprints.org/cgi/random: Displays the random eprint within the archive template

  • http://myeprints.org/cgi/random?insert=1: Displays an HTML snippet

  • http://myeprints.org/cgi/random?redirect=1: Redirects the browser to the full abstract of a random eprint



You may also notice that, in the code, I search eprints for items with public full text.

Code


Now, here's that code:


######################################################################
#
# Returns a random eprint
#
######################################################################

use EPrints;

use strict;

my $session = new EPrints::Session;
exit(0) unless ( defined $session );

# Load the archive dataset
my $ds = $session->get_repository->get_dataset("archive");

# Search for eprints with publicly available full text
my $searchexp = EPrints::Search->new(
    satisfy_all => 1,
    session     => $session,
    dataset     => $ds,
);
$searchexp->add_field( $ds->get_field("full_text_status"), 'public' );
my $results = $searchexp->perform_search;

# Pick a random index: int( rand(N) ) gives an integer from 0 to N-1
my $offset = int( rand( $results->count ) );

my @ids = @{ $searchexp->get_ids };
$searchexp->dispose;

# Nothing to pick from if no eprint has public full text
unless (@ids) {
    $session->terminate();
    exit;
}

# The parameters may be absent, so default them to an empty string
my $redirect = $session->param("redirect") || '';
my $insert   = $session->param("insert")   || '';

if ( $redirect eq '1' ) {
    $session->redirect( "/" . $ids[$offset] );
    $session->terminate();
    exit;
}

# Prepare a citation string
my $ep = EPrints::DataObj::EPrint->new( $session, $ids[$offset] );
my $citation = $ep->render_citation_link("default");

if ( $insert eq '1' ) {
    $session->send_http_header( content_type => "text/plain" );
    print $citation->toString;
}
else {

    # Build a display page
    my $title = $session->html_phrase("cgi/random:title");
    my $page  = $session->make_doc_fragment();
    $page->appendChild($citation);
    $session->build_page( $title, $page, "latest" );
    $session->send_page();
}

$session->terminate();
exit;
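
The random pick can be tried on its own, outside EPrints. This is only a minimal sketch: the list of ids is made up and stands in for the search results, and pick_random is a hypothetical helper, not part of the EPrints API.

```perl
use strict;
use warnings;

# Hypothetical ids standing in for the result of the EPrints search
my @ids = ( 101, 205, 317, 420 );

# Return one element of the list chosen at random, or undef if it is empty
sub pick_random {
    my (@list) = @_;
    return unless @list;

    # rand(N) returns a float in [0, N); int() truncates it to 0 .. N-1
    my $offset = int( rand( scalar @list ) );
    return $list[$offset];
}

my $id = pick_random(@ids);
print "Redirect target: /$id\n";
```

Wrapping rand in int makes the intent explicit rather than relying on Perl truncating a fractional array index.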

How?


So, how can you use this? Well, we could have set up a cron job to wget a random citation snippet. One of the library pages accessed this via JSP and inserted it into their page.

However, we also wanted the eprint of the day on our eprints home page. After some thought on the best way to do this, I settled on phrases. So, in the code below, I request a random eprint as an HTML snippet and do two things. Firstly (and easily), I write it to a text file that can be grabbed over the web. Secondly, I create a phrase file with the citation in it. I can then use this phrase in any xpage with <epc:phrase ref="eprint_of_the_day" />

This is the script that does the job:

#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new( env_proxy => 1, keep_alive => 1, timeout => 30, );

# Fetch a random citation as an HTML snippet
my $response = $ua->request(
    HTTP::Request->new( 'GET', 'http://my.eprints/cgi/random?insert=1' ) );

# Leave the existing files untouched if the request failed
die "Could not fetch citation: " . $response->status_line . "\n"
    unless $response->is_success;

open( PHRASE_FILE,
    ">/usr/local/eprints/archives/myeprints/cfg/lang/en/phrases/eprint_of_the_day.xml"
) or die "Cannot write phrase file: $!";

open( INCLUDE_FILE,
    ">/usr/local/eprints/archives/myeprints/cfg/lang/en/static/random.txt" )
    or die "Cannot write include file: $!";

# Wrap the citation in a one-phrase phrase file
print PHRASE_FILE "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<!DOCTYPE phrases SYSTEM \"entities.dtd\">
<epp:phrases xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:epp=\"http://eprints.org/ep3/phrase\"
xmlns:epc=\"http://eprints.org/ep3/control\">
<epp:phrase id=\"eprint_of_the_day\">"
;

print PHRASE_FILE $response->content;
print INCLUDE_FILE $response->content;

print PHRASE_FILE "</epp:phrase></epp:phrases>";

close(PHRASE_FILE);
close(INCLUDE_FILE);

# Regenerate the static pages so the new phrase is picked up
`su - eprints -c '/usr/local/eprints/bin/generate_static myeprints'`;

print $response->content;
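
If you want to check the phrase file before it lands in the cfg folder, the wrapping step can be pulled out into a function. This is a rough sketch only: build_phrase_file is a hypothetical helper and the citation snippet is made up; the XML header and namespaces are copied from the script above. It does not check that the snippet itself is well-formed XHTML.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap an XHTML snippet in a one-phrase phrase file.
# The XML header and namespaces are copied from the script above.
sub build_phrase_file {
    my ( $phrase_id, $xhtml ) = @_;

    return <<"END_XML";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE phrases SYSTEM "entities.dtd">
<epp:phrases xmlns="http://www.w3.org/1999/xhtml" xmlns:epp="http://eprints.org/ep3/phrase"
xmlns:epc="http://eprints.org/ep3/control">
<epp:phrase id="$phrase_id">$xhtml</epp:phrase></epp:phrases>
END_XML
}

# Made-up citation snippet standing in for the /cgi/random?insert=1 response
my $snippet = '<a href="http://my.eprints/123/">An example citation</a>';
print build_phrase_file( 'eprint_of_the_day', $snippet );
```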







Tuesday, January 27, 2009

Eprints: Context sensitive labels and help

As a part of the QUT ePrints upgrade we held several meetings to discuss the submission workflow. As we were integrating the data with other systems, we had to make sure that those undertaking the data entry (researchers, admin officers) wouldn't be thrown off by the language. One thing that came up was the need to give fields a different label based on the resource type. For example, a book has an author but a painting may have an artist. You don't want to create an extra field because the data is the same; it's just the human interface that needs to be flexible. Likewise, the help text should change based on resource type.

So, the idea was to provide a sub-phrase within the phrases file. The system would default to a base phrase if none was found.

This would mean that we can have the following in our phrases file:


<epp:phrase id="eprint_fieldname_volume">Volume</epp:phrase>

<epp:phrase id="eprint_fieldname_volume#book">Series Volume</epp:phrase>

<epp:phrase id="eprint_fieldhelp_volume">
Enter the volume number of the journal or series in which your item appeared. Please just use the number, do not include text such as "vol".
</epp:phrase>
<epp:phrase id="eprint_fieldhelp_volume#book">
If this book is a part of a series, please provide the volume number here. Please just use the number, do not include text such as "vol".
</epp:phrase>

As I hope you can see, if you're entering the information for a book, you get a contextual label and help. Any other resource types that use the volume field will fall back to the default text.
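
The fallback rule can be sketched on its own, away from EPrints. This is just an illustration under assumptions: the %phrases hash is a made-up stand-in for the phrases file and resolve_phrase_id is a hypothetical helper, not an EPrints function.

```perl
use strict;
use warnings;

# Made-up phrase table standing in for the repository's phrases file
my %phrases = (
    'eprint_fieldname_volume'      => 'Volume',
    'eprint_fieldname_volume#book' => 'Series Volume',
);

# Return the type-specific phrase id if one exists, else the base id
sub resolve_phrase_id {
    my ( $base_id, $type ) = @_;
    if ( defined $type && exists $phrases{"$base_id#$type"} ) {
        return "$base_id#$type";
    }
    return $base_id;
}

for my $type ( 'book', 'article' ) {
    my $id = resolve_phrase_id( 'eprint_fieldname_volume', $type );
    print "$type: $phrases{$id}\n";
}
```

A book picks up "Series Volume", while an article, which has no phrase of its own, falls back to "Volume".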

As usual, when the doco falls short, I hit the mailing list. This can go one of two ways. You can get a good answer or you can get ignored. My question got a discussion going and a solution was reached. My final message can be read here.

The biggest difficulty we faced was working out which objects were available from within the given piece of code...

So, in perl_lib/EPrints/MetaField.pm, I changed the render_name and render_help to check for a resource type specific phrase. I've used the hash (#) to denote the separation but you could change this.

sub render_name {
    my ( $self, $session ) = @_;

    if ( defined $self->{title_xhtml} ) {
        return $self->{title_xhtml};
    }
    my $phrasename = $self->{confid} . "_fieldname_" . $self->{name};

    # START: Changes made to provide context sensitive names
    if ( defined $session->{query} ) {

        # Look for the eprint id in the CGI query object
        my $eprintid = $session->{query}->{eprintid}->[0];
        if ( !defined $eprintid || $eprintid eq "" ) {
            $eprintid = $session->{query}->{param}->{eprintid}->[0];
        }

        my $ep = EPrints::DataObj::EPrint->new( $session, $eprintid );
        if ($ep) {

            # Use the type-specific phrase only if one has been defined
            my $eptype = $ep->get_type;
            $phrasename .= "#$eptype"
                if $session->get_lang->has_phrase("$phrasename#$eptype");
        }
    }

    # END: Changes made to provide context sensitive names

    return $session->html_phrase($phrasename);
}

sub render_help {
    my ( $self, $session ) = @_;

    if ( defined $self->{help_xhtml} ) {
        return $self->{help_xhtml};
    }
    my $phrasename = $self->{confid} . "_fieldhelp_" . $self->{name};

    # START: Changes made to provide context sensitive help
    # Same guard as render_name: there may be no query object
    if ( defined $session->{query} ) {

        my $eprintid = $session->{query}->{eprintid}->[0];
        if ( !defined $eprintid || $eprintid eq "" ) {
            $eprintid = $session->{query}->{param}->{eprintid}->[0];
        }

        my $ep = EPrints::DataObj::EPrint->new( $session, $eprintid );
        if ($ep) {

            # Use the type-specific phrase only if one has been defined
            my $eptype = $ep->get_type;
            $phrasename .= "#$eptype"
                if $session->get_lang->has_phrase("$phrasename#$eptype");
        }
    }

    # END: Changes made to provide context sensitive help

    return $session->html_phrase($phrasename);
}






The code base modified was 3.1.1.

Naturally, no warranty is offered.

Thursday, January 22, 2009

Omeka Man

Just been checking out Omeka. Very interesting. Installing it now so will try it and post my thoughts soonish.

Tuesday, January 13, 2009

Upgrading QUT ePrints

One of the main reasons I wanted to start this blog was to document the work I recently completed in upgrading the QUT ePrints system. I did not do this alone so will state outright that there was a team of Librarians, a couple of developers and other QUT staff. I won't name them here for privacy reasons.

So, this is my attempt to feed back to the eprints community with some code and discussion about the work.

What is QUT ePrints?
QUT operates its institutional repository out of the Library. There is high-level support for the repository - not just rhetoric. The institution also has some key people - one of whom lives and breathes this stuff - they're fun to work with :)

Chances are that if you don't know what ePrints is then you have moved away from this page for now. QUT was a member of the now defunct ARROW group and bought into the VITAL software. However, due to a variety of reasons, we chose to upgrade from eprints v2 to v3. I won't go into the whys...

What we did
A fair bit... Whilst the eprints team provides a complete out of the box solution, the beauty of its open sourciness is that we could adapt it to QUT and Australian higher-ed requirements.

So the main items we produced were:
  • Integrate QUT's ESOE infrastructure for single sign on
  • Transfer the data from ePrints 2 to a new server running ePrints 3
  • Bring over the ADT theses to be served from our IR
  • Expand the metadata to capture data for the HERDC collection - this could reduce a fair bit of work for researchers
  • ... and some general user interface work
The data transfer gave us a chance to normalise our metadata - the system had grown "organically" over several years so needed a little landscaping.
What we didn't do
Well, there were a few things dropped along the way:
  • Oracle integration: We'd thought about using the university's corporate Oracle infrastructure but, after battling with the rather new database layer, killed the idea off so as to meet the deadline.
  • We'd really wanted to link into QUT's Identity system but it wasn't ready yet. This would give us a name authority that could ensure that systems with which we share data were all on the same page.
On that last point, I received a bit of flak as I was talking of using a QUT-local name authority. I tended to disagree with people on this one. For one, we actually had a name authority at QUT (though the SOAP interface was delayed). The NLA have the People Australia work that would meet our needs but it isn't there yet. There was talk of other projects but it all started to sound rather crowded in terms of scope. Secondly, the SOA interface could easily provide any info needed once given that unique QUT ID - we all know that one ID won't be enough, especially for the public sector.
Moving on
Well, enough of a ramble. I'll start posting code and overviews shortly.

Obligatory first post

I guess there has to be a first post and this is it.

Well, ok then