See SALT – a demo

A further set of sample data from JRUL, comprising 100,000 loan transactions this time, has been processed and used to test a prototype web API.  Signs are encouraging.

The process begins with data being extracted from the Talis library management system (LMS) at JRUL in CSV format.  This data is parsed by a PHP script which separates it into two tables in a MySQL database: the bibliographic details describing an item go into a table called items, and the loan-specific data, including borrower ID, goes into a table called, you’ve guessed it, loans.  A further PHP script then processes the data into two additional MySQL tables, nloans and nborrowers: nloans contains the total number of times each item has been borrowed, and nborrowers contains, for each combination of two items, a count of the number of unique library users to have borrowed both items.
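
For anyone curious what that second processing step looks like, here’s a rough sketch in PHP.  The table and column names (item_id, borrower_id and so on) are placeholders rather than the actual JRUL schema, but the aggregation is the one described above: count total loans per item, and count distinct borrowers per pair of items.

<?php
// Sketch only: column names are placeholders, not the real JRUL schema.
$db = new PDO('mysql:host=localhost;dbname=salt', 'user', 'password');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// nloans: total number of times each item has been borrowed.
$db->exec("
    CREATE TABLE nloans AS
    SELECT item_id, COUNT(*) AS total_loans
    FROM loans
    GROUP BY item_id
");

// nborrowers: for each pair of items, the number of unique borrowers
// who have borrowed both.  The self-join pairs up loans by the same
// borrower; item_a < item_b stops each pair being counted twice.
$db->exec("
    CREATE TABLE nborrowers AS
    SELECT a.item_id AS item_a,
           b.item_id AS item_b,
           COUNT(DISTINCT a.borrower_id) AS unique_borrowers
    FROM loans a
    JOIN loans b
      ON a.borrower_id = b.borrower_id
     AND a.item_id < b.item_id
    GROUP BY a.item_id, b.item_id
");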

With the above steps complete, additional processing is performed on demand by the web API.  When called for a given item, say item_1, the API returns a list of items for suggested reading, derived as follows.  From the nborrowers table, a list of items is compiled from all combinations featuring item_1.  For each item in this list, the number of unique borrowers (from the nborrowers table) is divided by the total number of loans for that item (from the nloans table), following the logic used by Dave Pattern at the University of Huddersfield.  The resulting values are ranked in descending order and the details associated with each suggested item are returned by the API.
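
Expressed as a single query, the ranking might look something like the sketch below.  This is illustrative rather than the production API code, and it reuses the placeholder column names from the previous sketch, but it shows the "unique borrowers divided by total loans" calculation and the descending sort.

<?php
// Sketch of the on-demand ranking for one item; not the production code.
$db = new PDO('mysql:host=localhost;dbname=salt', 'user', 'password');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$itemId = 'item_1';  // the item suggestions are being generated for

$stmt = $db->prepare("
    SELECT i.*,
           nb.unique_borrowers / nl.total_loans AS score
    FROM nborrowers nb
    JOIN nloans nl
      ON nl.item_id = CASE WHEN nb.item_a = ? THEN nb.item_b ELSE nb.item_a END
    JOIN items i
      ON i.item_id = nl.item_id
    WHERE ? IN (nb.item_a, nb.item_b)
    ORDER BY score DESC
");
$stmt->execute([$itemId, $itemId]);
$suggestions = $stmt->fetchAll(PDO::FETCH_ASSOC);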

For a bit of light relief here’s an image.

A screenshot of a demonstrator for SALT.

This is a screenshot from a piece of code written to demonstrate the web API.  For a given item, identified by its ISBN, the details are retrieved from the items table in the MySQL database and displayed in [A].  An asynchronous call is made to the web API, which accepts the ISBN as a parameter along with threshold and format values set using the controls in [B]; threshold is the minimum number of unique borrowers that any given combination of items must have to be considered, and format specifies how the returned data should be formatted (either xml or json).  Results from the web API are displayed in [C], with the actual output from the API reproduced in [D].  Note that the API returns all available results, but the test code only shows the number set by the third control in [B].
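
To give a flavour of how a prototype interface might call the API, here’s a hedged PHP example.  The endpoint URL and the fields in the response are placeholders; only the isbn, threshold and format parameters come from the description above.

<?php
// Hypothetical client call: example.org URL and response fields are
// placeholders, not the real SALT endpoint or output format.
$endpoint = 'http://example.org/salt/api'
          . '?isbn=' . urlencode('9780140449136')  // example ISBN (placeholder)
          . '&threshold=5'
          . '&format=json';

$ch = curl_init($endpoint);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

// Assume the JSON decodes to a list of suggested items, each with a
// (hypothetical) title field; show only the top ten, as the demonstrator
// does with the third control in [B].
$suggestions = json_decode($response, true);
foreach (array_slice($suggestions, 0, 10) as $item) {
    echo $item['title'] . PHP_EOL;
}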

The exact format of the output is yet to be ratified, but the API is in a state where it can now be incorporated into prototype interfaces at JRUL and in Copac.  In addition, the remaining 3 million or so loan transactions from JRUL will be loaded and processed in readiness for user testing.

What do the library users think?

As the SALT project and the Activity Data programme progress, I’m finding the results of the various user engagement exercises really interesting.  As Janine’s already mentioned, we’re planning a structured user evaluation of our recommender tool with subject librarians and researchers, but before that we wanted to talk to some students to test some of our assumptions and understand library users’ experiences a little better.

So, last week I took myself off to the JRUL and interviewed four students (three postgraduates and one undergraduate).  In the main, I was (of course) interested in their opinions about recommenders, and whether they would find such a tool useful in the JRUL library catalogue and in Copac.  There is a lot of evidence to suggest that researchers would find the introduction of a recommender beneficial (not least from the other blogs on this programme), but what would the Manchester students and researchers think?  I was also interested in their research behaviour – did they always know exactly what they were looking for, or did they do subject and keyword searches?  And finally, I wanted to sound them out about privacy.

So what did they tell me?

On recommendations

There was varied use of recommenders through services like Amazon, but all of the students could see the potential of getting recommendations through the library catalogue, Copac, and eventually through the Library Search (powered by Primo).  There were some concerns about distractions, with one student worried that she would spend her limited research time following a never-ending cycle of recommendations that took her further and further away from her original purpose.  However, the biggest concern for all four was the possibility of irrelevant material being pushed to them – something that they would all find frustrating.  A recommender could certainly help to widen reading choices, but all of them wanted to know how we were going to make sure that the suggestions were relevant.  I noticed that the postgraduate participants in the Rise focus groups needed to trust the information, and were interested to know where the recommendation had come from.  It’s clear that trust is a big issue, and this is something we’ll definitely be re-visiting when we run the user evaluation workshops.

On research behaviour

On the whole, the participants knew what they were looking for when they opened the catalogue, and suggestions of material came from the usual suspects – supervisors, tutors, citations, or specific authors they needed to read.  All of them felt that recommendations would be interesting and especially useful during extended research projects such as dissertations.  However, what was most interesting to me was that, although they all said they would be interested to look at the suggestions, they all seemed unconvinced they would actually borrow the recommended books because, on the whole, they visited the catalogue in order to find specific items.  So what does this mean for our hypothesis – that using circulation data can support research by surfacing underused library materials?  These students didn’t have the opportunity to try the recommender, so you could argue that some scepticism is inevitable, and Huddersfield’s experience suggests that underused books will resurface.  However, again we need to explore this further once we can show some students a working prototype.

On privacy

I wasn’t sure whether privacy would be an issue, but none of the students I spoke to had any concerns about the library collecting circulation data and using it to power recommendations.  They considered this to be a good use of the data, as long as anonymity was taken into consideration.  On the whole, the students’ responses backed up the findings of the 2008 Developing Personalisation for the Information Environment Final Report, which found that “students place their trust in their college or university. They either think that they have no alternative to disclosing information to their institution, or believe that the institution will not misuse the information.”  They felt that, by introducing a recommender, the library was doing “a good thing” by trying to improve their search experience.  No concerns here.

Next Steps

Obviously, these were only the views of four students, and we need to do more work to test the usefulness of the tool.  We’re now planning the user testing and evaluation of the recommender prototype, and recruiting postgraduate humanities researchers to take part.  As Janine outlined, we’ll be introducing the tool to subject librarians at JRUL and to humanities researchers to see if the recommendations are meaningful and useful.

I’m looking forward to finding out what they think, and we’ll let you know the results in a later blog post.