Evaluating the recommender – undergraduate focus groups

We held more focus groups over the summer holidays, which is kind of tricky when your audience for testing is undergraduates! Not many are left on campus during the summer months, but we did manage to find some undergraduates still around the Manchester Metropolitan University (MMU) campus. This might be a reflection of the heavy rainfall we’ve been having in Manchester this summer.

We chose MMU because we wanted to test the recommender on undergraduates who didn’t already have a recommender on their own library catalogue; students used to one might have treated the tests as a comparison. We spoke to 11 undergraduates in total, and they tested the recommender 42 times between them.

The Results

The students were positive about the concept of the book recommender and were keen to use it as another tool in their armoury for finding out about new resources. A key bonus for them was how little input the recommender needed in order to give maximum output. To a time-poor, pressured undergraduate this is a huge plus point.

‘Yeah, I would use it, I don’t have to do anything’

‘I would always look at it if it was on the MMU library catalogue’.

The recommender also offered an alternative source of materials to the ubiquitous reading list. This is absolutely crucial, because it quickly became apparent that our participants struggled to find resources:

‘I go off the reading list, all those books are checked out anyway’

‘I’ve used Google Scholar when I’ve had nowhere else to go, but it returned stuff I couldn’t get hold of and that just frustrated me’.

So in theory it offered more substance to their reading lists. The additional books it found were more likely to be in the library, and came with the advantage that they were suggested on the basis of student borrowing patterns. Our respondents liked having this insider knowledge about what their peers had read in previous years.

‘It would be useful as I had to read a book for a topic area and when we got the topic area there were already 25 reservations on the book, so if I could click on something and see what the person who did this last year read, that would be very useful’.

Testing the prototype

In testing it proved difficult to conclude whether the recommender was useful or not, as some testers seemed to have more luck than others in finding resources that were useful to them. Obviously some margin of error within the data collection method needs to be accounted for.

Of course, you could argue that whether a book is useful or not is a highly subjective decision. One person’s wildcard might be another’s valuable and rare find, and whereas one tester might be searching for similar books, others might be looking for tangentially linked ones. As an example, in our group, History students wanted old texts and Law students wanted new ones.

Positively, 91.4% of searches returned recommendations that looked useful, and only 3 searches returned nothing at all of any use to the user. 88.6% of searches generated at least one item that the user wanted to borrow, and only 4 searches resulted in nothing the user would borrow. Even allowing for some deviation due to subjectivity, these are compelling results. Because the recommender requires the user to submit nothing substantial in order to get results, such a low proportion of empty returns was acceptable to all the users we interviewed.

Privacy concerns?

As in previous research, none of the undergraduates attending the focus groups expressed any concern about privacy. They understood that the library collects circulation data and that the book recommender’s results are generated from that circulation data.

‘I would benefit from everyone else’s borrowing as they are benefitting from mine, so I haven’t got a problem’.

‘It would be nice to be informed about it and given the option to opt out, but I don’t have a problem with it. No.’

Although more than one attendee said it would be ‘nice to be asked’, they wouldn’t want this to delay the development of the book recommender.

In conclusion, the time-poor, pressured student struggling to find reading-list books still in the library would welcome another way of finding precious resources. The majority of students in our groups would use the recommender and, although some recommendations are better than others, they would be willing to forgive this if it gave them just one more resource when the coursework deadline looms!

User Feedback Results – Super 8

In an effort to find the magic number, the SALT team opened its testing labs again this week. Another 6 University of Manchester postgraduate students spent the afternoon interrogating the Copac and John Rylands library catalogues to evaluate the recommendations thrown back by the SALT API.

With searches ranging from ‘The Archaeology of Islam in Sub-Saharan Africa’ to ‘Volunteering and Society: Principles and Practice’, no stone in the Arts and Humanities was left unturned, or at least it felt that way. We tried to find students with diverse interests within Arts and Humanities to test the recommendations from as many angles as possible. Using the same format as the previous groups (documented in our earlier blog post ‘What do users think of the SALT recommender?’), the library users were asked to complete an evaluation of the recommendations they were given. Previously the users tested SALT with the threshold set at 3 (that is, at least 3 people had borrowed a book for it to be eligible to be thrown back as a recommendation), but we felt the results could be improved. Although 77.5% of users found at least one recommendation useful, too many recommendations were rated as ‘not that useful’ (see the charts in ‘What do users think of the SALT recommender?’).

This time, we set the threshold at 15 in the John Rylands library catalogue and 8 in Copac. Like the LIDP team at Huddersfield (http://library.hud.ac.uk/blogs/projects/lidp/2011/08/30/focus-group-analysis/), we have a lot of data to work with now, and we’d like to spend some more time interrogating the results to find out whether clear patterns emerge. Although our initial analysis has raised some further questions, it has also revealed some interesting and encouraging results. Here are the highlights of what we found out.
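For readers wondering what the threshold actually does: it is simply the minimum number of distinct borrowers that must stand behind a candidate book before it can be surfaced. Here is a minimal sketch in Python of that filtering step; the function name and toy data are our illustration, not SALT’s actual code.

```python
# A minimal sketch of threshold filtering. We assume that for a given
# seed book we already hold a mapping from candidate books to the number
# of distinct borrowers who took out both the seed and the candidate.
# Names and data shapes here are hypothetical, not SALT's real code.

def filter_by_threshold(co_borrow_counts, threshold):
    """Keep candidates co-borrowed by at least `threshold` people,
    strongest signal first."""
    eligible = {book: n for book, n in co_borrow_counts.items() if n >= threshold}
    return sorted(eligible, key=eligible.get, reverse=True)

candidates = {"Book A": 21, "Book B": 9, "Book C": 3, "Book D": 15}

print(filter_by_threshold(candidates, 3))   # all four books survive
print(filter_by_threshold(candidates, 15))  # only the strongest links survive
```

Raising the threshold trades quantity for confidence: fewer recommendations come back, but each one is backed by more borrowers, which is the effect we were hoping for in moving from 3 to 15 (JRUL) and 8 (Copac).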

The Results

On initial inspection, the John Rylands University Library (JRUL), with its threshold of 15, improved on previous results:

Do any of the recommendations look useful:

92.3% of the searches returned at least one item the user thought was useful; however, when users were asked whether they would borrow at least one item, only 56.2% said they would.

When asked, many users said that they already knew the book and so wouldn’t need to borrow it again, or that, although the book was useful, their area of research was so niche that it wasn’t specifically useful to them, though they would deem it ‘useful’ to others in their field.

One of the key factors that came up in the discussions with users was the year the book had been published. The majority of researchers need up-to-date material, many preferring journals to monographs, and this was taken into account when deciding whether a book was worth borrowing. Many users wouldn’t borrow anything more than 10 years old:

‘Three of the recommendations are ‘out of date’ 1957, 1961, 1964 as such I would immediately discount them from my search’ 30/08/11 University of Manchester, Postgraduate, Arts and Humanities, SALT testing group.

So a book could be a key text, and ‘useful’, but it wouldn’t necessarily be borrowed. Quite often, one user explained, rather than reading a key text she would search for journal articles about it, to get up-to-date discussion and analysis. This has an impact on our hypothesis, which is about discovering the long tail: quite often the long tail that is discovered includes older texts, which some users discount.

Copac, with a threshold of 8, was also tested. Results here were encouraging:

Do any of the recommendations look useful:

Admittedly, further tests would need to be done on both thresholds, as the number of searches conducted (25) does not give enough results to draw concrete conclusions from, but it does seem as if the results are vastly improved by increasing the threshold.

No concerns about privacy

The issue of privacy was raised again. Many of the postgraduate students are studying niche areas and seemed to understand how this could affect them should the recommendations be attributed back to them. However, as much as they were concerned about their research being followed, they were also keen to use the tool themselves and so their concerns were outweighed by the perceived benefits. As a group they agreed that a borrowing rate of 5 would offer them enough protection whilst still returning interesting results. The group had no concerns about the way in which the data was being used and indeed trusted the libraries to collect this data and use it in such a productive way.

‘It’s not as if it is being used for commercial gain, then what is the issue?’ 30/08/11 University of Manchester, Postgraduate, Arts and Humanities, SALT testing group.
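To make the group’s suggestion concrete, the ‘borrowing rate of 5’ works as a simple privacy floor: a link between two books is only surfaced when enough distinct people stand behind it that no individual’s borrowing can be singled out. A tiny hypothetical sketch of that guard (the names and the check are our illustration, not the project’s code):

```python
# Hypothetical privacy guard reflecting the group's suggestion: only
# surface a book-to-book link when at least five distinct borrowers
# stand behind it, so no single reader's history is identifiable.

MIN_DISTINCT_BORROWERS = 5  # the 'borrowing rate of 5' the group settled on

def safe_to_recommend(borrower_ids):
    """True when enough distinct people link the two books."""
    return len(set(borrower_ids)) >= MIN_DISTINCT_BORROWERS

print(safe_to_recommend(["u1", "u2", "u3"]))              # False: too traceable
print(safe_to_recommend(["u1", "u2", "u3", "u4", "u5"]))  # True
```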

Unanimous support for the recommender

The most encouraging outcome from the group was the uniform support for the book recommender. Every person in the group agreed that the principle of the book recommender was a good one, and they gave their resolute approval for their data being collected and used in this positive way.

All of them would use the book recommender if it was available. Indeed one researcher asked, ‘can we have it now?’

Janine Rigby and Lisa Charnock 31/08/11

What do the users think of the SALT recommender?

Following internal in-house testing, the recommender was opened up to users. In the last week of July, 18 humanities postgraduates passed through the SALT testing labs (11 PhD students, 3 taught Masters students and 4 research students). Lisa and I held three focus groups and grilled our potential users about the SALT recommender. The research methods were designed to answer our objectives: an informal discussion to begin with, to find out how postgraduate students approach library research and to gauge the potential support for the book recommender. Following the discussion we began testing the actual recommender to answer our other research objectives, which were:

  • Does SALT give you recommendations which are logical and useful?
  • Does it make you borrow more library books?
  • Does it suggest books and materials that you may not have known about but that are useful and interesting?

As a team we agreed to set the threshold of the SALT recommender deliberately low, with a view to increasing it and testing again if results were not good. As our hypothesis is based on discovering the hidden long tail of library research, we wanted the recommender to return results that were unexpected: research gems, treasured and worthy items that had somehow been lost and borrowed only a few times.

A total of 42 searches were done on the SALT recommender, and of those 42, 77.5% returned at least one recommendation (usually many more) that participants said would be useful. (As an aside, one of the focus group participants found something so relevant that she went to borrow it immediately after the group had finished!)

However, the deliberately low threshold may have caused some illogical returns. The groups were asked to comment on the relevance of the first 5 recommendations, but quite often it was the books further down the list that were of more relevance and interest. One respondent referred to his results as a ‘curate’s egg’, though he assured me this was in reference to some being good and some bad. His first five were of little relevance, ‘only tangentially linked’, but his 6th, 7th, 8th, 9th, 11th and even 17th recommendations were all ‘very relevant’. Unfortunately this gave disappointing results when the first 5 suggested texts were rated for relevance, as demonstrated in the pie chart below.
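Rating only the first five suggestions is, in effect, measuring precision at a fixed cutoff, and that respondent shows how sensitive the verdict is to where the cutoff falls. A small illustrative calculation in Python, using judgements that mirror his account (the function and data are our illustration, not focus-group output):

```python
# Precision at rank k: the fraction of the top-k recommendations judged
# relevant. The judgements below mirror the respondent quoted above:
# ranks 1-5 only tangentially linked; ranks 6, 7, 8, 9, 11 and 17
# 'very relevant'. Illustrative data, not the real evaluation forms.

def precision_at_k(judgements, k):
    """Fraction of the top k items judged relevant (True)."""
    return sum(judgements[:k]) / k

judgements = [False] * 5 + [True] * 4 + [False, True] + [False] * 5 + [True]

print(precision_at_k(judgements, 5))    # 0.0  - the cutoff the groups rated
print(precision_at_k(judgements, 17))   # ~0.35 - the fuller picture
```

A rating form that only looks at the top five will miss exactly the deep, long-tail finds the project is after.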

However, the likelihood of borrowing these items gave slightly more encouraging results:

Clearly the threshold count is something we need to get right. Lessons need to be learnt about the threshold number, and this perhaps reflects on our initial hypothesis. We think there would be much merit in increasing the threshold number and retesting.

On a positive note, initial discussions with the researchers (and just a reminder, these are seasoned researchers, experts in their chosen fields and familiar, long-term users of the John Rylands University Library) told us that the recommender would be a welcome addition to Copac and the library catalogue. 99% of the researchers in the groups had used and were familiar with Amazon’s recommender function, and 100% would welcome a similar function on the catalogues based on circulation records.

Another very pertinent point, and I cannot stress this strongly enough, was the reaction expressed in regard to privacy and the collection and subsequent use of this data. The groups were slightly bemused by questions regarding privacy. No one expressed any concern about the collection of activity data and its use in the recommender. In fact most assumed this data was collected anyway, and they encouraged us to use it in this way, as ultimately it is being used to develop a tool which helps them to research more effectively and efficiently.

Overwhelmingly, the groups found the recommender useful. They were keen that their comments be fed back to the developers and that work should continue on getting the results right, as they wanted to use the recommender and hoped it would be available soon.

But is this what the users want?

I’ll admit it, I’m prepared to out myself: I’ve just finished a postgraduate research degree, and more than once I have used the Amazon book recommender. In fact, when I say more than once, over the course of my studies we’ll be getting into double figures. I’m not ashamed (I may be about using Wikipedia, but let’s not go there), because I did it and so did many of my peers. There may be more traditional methods of conducting academic research, but sometimes, with a deadline looming and very little time for a physical trip to the library to speak to a librarian, finding resources in one or two clicks is just too attractive. My hunch is that many other scholars also use this method to conduct research.

Recently, on another Copac project, we facilitated some focus groups. The participants were postgraduate researchers, a mix of humanities and STEM. Some had used Copac before, others had not. Although the focus groups were addressing another hypothesis, I couldn’t resist asking the gathered group whether they would find merit in a book recommender on Copac based on 10 years of library circulation data from a world-class research library. It’s not often you see a group of students become visibly excited at the thought of a new research tool, but they did that night. A book recommender would make a positive impact on their research practices and was greeted with enthusiasm by the group. I thought it was worth mentioning this incident because, when the going gets tough and we are drowning under data, it might be worth remembering that users really want this to happen.


Working through the SALT hypothesis

I’m currently project managing SALT, but my own area of interest is evaluation and user behaviour, so I’m going to be taking an active role in putting what we develop in front of the right users (we’re thinking academics here at the University) to see what their reactions might be. As I think this over, a number of questions and issues come to mind. Are we more likely to look on things favourably if they are recommended by a friend? If we think about what music we listen to, films we go and see, TV we watch and books we read, are we far more likely to do any of those things if we receive a recommendation from someone we trust, or someone we know likes the same things that we like? If you think the answer is yes, then is there any reason we wouldn’t do the same should a colleague or peer recommend a book that would help us in our research? In fact, more so? Going to see a film a friend recommended that turns out to be, well, average has far less lasting consequences than completing a dissertation that fails to acknowledge some key texts. As a researcher, would you value a service which could suggest other books related to the books you’ve just searched for in your library?

We know library users very rarely take out one book. Researchers borrowing library books tend to search for them centrifugally: one book leads to another as they dig deeper into the subject area, finding rarer items and more niche materials. So if those materials have been of use to them, could they not also be of use to other people researching the same area? The University of Manchester’s library is stocked with rare and niche collections, but are they turning up within traditional searching, or are they hidden down at the long end of the tail? By recommending books to humanities researchers that other humanities researchers have borrowed from the library, I’m really hoping we can help improve the quality of research. We know that solid research means going beyond the prescribed reading list and discussing new or different works. Maybe a recommender function can support this (even if it potentially undermines the authority of the supervisor-prescribed list; as one academic recently suggested to us: “isn’t this the role of the supervisor?”).
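To make the co-borrowing idea concrete, here is a minimal sketch, assuming circulation data can be reduced to (borrower, book) pairs; the names and toy records are illustrative, not the project’s actual pipeline:

```python
from collections import Counter, defaultdict

# Toy loan records as (borrower_id, book_id) pairs distilled from
# circulation data; real data would span years of transactions.
loans = [
    ("u1", "maps"), ("u1", "atlases"), ("u1", "cartography"),
    ("u2", "maps"), ("u2", "cartography"),
    ("u3", "maps"), ("u3", "atlases"),
]

# First group books by borrower...
books_by_borrower = defaultdict(set)
for borrower, book in loans:
    books_by_borrower[borrower].add(book)

# ...then count, for every book, which other books the same people
# borrowed. These co-occurrence counts are what the threshold (and the
# ranking of recommendations) is applied to.
co_borrowed = defaultdict(Counter)
for books in books_by_borrower.values():
    for book in books:
        for other in books - {book}:
            co_borrowed[book][other] += 1

print(co_borrowed["maps"].most_common())
# e.g. [('atlases', 2), ('cartography', 2)] - tie order may vary
```

Because the counts come from whole borrowing histories rather than single searches, rarely borrowed items can still surface whenever the few people who did borrow them also borrowed the seed book, which is exactly the long-tail behaviour we want to test for.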

Here’s how I’m thinking we’ll run our evaluation. Once the recommender tool is ready, we’ll ask a number of subject librarians to run the first tests, to see if it recommends what they would expect to see linked to their original search. They will be asked to search the library catalogue for something they know well: when the catalogue returns their search, does the recommender suggest further reading that seems like a good choice to them? As they choose more unusual books, does the recommender then start suggesting things which are logically linked, but also more underused materials? Does it start to suggest collections which are rarely used, but nevertheless just as valuable? Or does it just recommend randomly unrelated items? And can some of the randomness support serendipity?

We’ll then run the same test with humanities researchers (it’ll be interesting to see if librarians and academics have similar responses). As testing facilitators, we’ll also be gauging people’s reactions to the way in which their activity data is used. The question is, do users see this as an invasion of their privacy, or as a good way to use the data? Do the benefits of the recommender tool outweigh the concerns over privacy?

The testing of the hypothesis will be a crucial indicator of the legitimacy of the project. Positive results from the user testing will (hopefully) take this project on to the next level and help us move towards some kind of shared service. But we really need to gauge whether this segment of more ‘advanced’ users can see the benefit: if they believe the tool has the ability to make a positive impact on their research, then we hope to extend the project and encourage further libraries to participate. With more support from other libraries, researchers will hopefully be one step closer to receiving a library book recommender.