The data source is currently static, but nonetheless yields excellent results. Please experiment and let us know how you get on. Stay tuned for a future post detailing some work we have planned for continuing this project, which will include assessing additional use cases, aggregating more data sources (and adding them to the API) and producing a shared service feasibility report for JISC.
In this final post I’m going to sum up what this project has produced, potential next steps, key lessons learned, and what we’d pass on to others working in this area.
In the last five months, the SALT project has produced a number of outputs:
- Data extraction recipe: http://salt11.wordpress.com/recipe-data-extraction-from-talis/
- Details on how the algorithm can support recommendations (courtesy Dave Pattern): http://www.daveyp.com/blog/archives/1453
- Technical processes documentation for processing the data and supporting the recommender API (though the API itself is not yet published): http://salt11.wordpress.com/technical-processes/
- An open licensing statement from JRUL which means the data can be made available for reuse (we’ve yet to determine how to make this happen, given the size of the dataset; and we also need to explore whether CC-BY is the most appropriate license going forward): http://salt11.wordpress.com/2011/07/26/agreeing-licensing-of-data/
- A trial recommender functionality in the live Copac prototype: http://salt11.files.wordpress.com/2011/07/copac_recommender.jpg
- A recommender function the JRUL library search interface prototype: http://salt11.files.wordpress.com/2011/08/salt_jrul.jpg
- User testing instruments:SALT Postgraduate User Discussion Guide SALT user response sheet and results
- Feedback from collections managers & potential data contributors helping us consider weaknesses and opportunities, as well as possible sustainable next steps.
There are a number of steps that can be taken as a result of this project – some imminent ‘quick wins’ which we plan to take on after the official end, and then others that are ‘bigger’ than this project.
What we plan to do next anyway:
- Adjust the threshold to a higher level (using the ‘usefulness’ benchmark given to us as users as a basis) so as to suppress some of the more off-base recommendations our users were bemused by.
- Implement the recommender in the JRUL library search interface
- Once the threshold has been reset, consider implementing the recommender as an option feature in the new Copac interface. We’d really like to, but we’d need to assess if the results are too JRUL-centric.
- Work with JRUL to determine most appropriate mechanisms for hosting the data and supporting the API in the longer term (decisions here are dependent on how, if at all, we continue with this work from a Shared Services perspective)
- Work with JRUL to assess the impact of this in the longer term (on user satisfaction, and on borrowing behaviour)
The Big Picture (what else we’d like to see happen):
1. Aggregate more data. Combine the normalised data from JRUL with processed data from additional libraries that represent a wider range of institutions, including learning and teaching. Our hunch is that only a few more would make the critical difference in ironing out some of the skewed results we get from focusing on one data set (i.e. results skewed to JRUL course listings)
2. Assess longer term impact. Longer-term analysis of the impact of the recommender functionality on JRUL user satisfaction and borrowing behaviour. Is there, as with Huddersfield, more borrowing from ‘across the shelf’? Is our original hypothesis borne out?
3. Requirements and costs gathering for a shared service. Establish the requirements and potential costs for a shared service to support processing, aggregation, and sharing of activity data via an API. Based on this project, we have a fair idea of what those requirements might be, but our experience with JRUL indicates that such provision need to adequately support the handling and processing of large quantities of data. How much FTE, processing power, and storage would we need if we scaled to handling more libraries? Part of this requirements gathering exercise would involve identifying additional contributing libraries, and the size of their data.
4. Experiment with different UI designs and algorithm thresholds to support different use cases. For example, undergraduate users vs ‘advanced’ researcher users might benefit from the thresholds being set differently; in addition, there are users who want to see items held elsewhere and how to get them vs those who don’t. Some libraries will be keen to manage user expectations if they are ‘finding’ stock that’s not held at the home institution.
5. Establish more recipes to simplify data extraction from the more common LMS’s beyond Talis (Horizon, ExLibris Voyager, and Innovative).
6. Investigate how local activity data can help collections managers identify collection strengths and recognise items that should be retained because of association with valued collections. We thought about this as a form of “stock management by association.” Librarians might treat some long-tail items (e.g. items with limited borrowing) with caution if they were aware of links/associations to other collections (although there is also the caveat that this wouldn’t be possible with local activity data reports in isolation)
7. More ambitiously, investigate how nationally aggregated activity data could support activities such as stock weeding by revealing collection strengths or gaps and allowing librarians to cross check against other collections nationally. This could also inform the number of copies a library should buy, and which books from reading lists are required in multiple copies.
8. Learning and teaching support. Explore the relationship between recommended lists and reading lists, and how it can be used as a tool to support academic teaching staff.
9. Communicate the benefits to decision-makers. If work were to continue along these lines, then a recommendation that has come out strongly from our collaborators is the need to accompany any development activity with a targeted communications plan, which continually articulates the benefits of utilising activity data to support search to decision-makers within libraries. While within our community a significant amount of momentum is building in this area, our meetings with librarians indicates that the ‘why should I care?’ and more to the point ‘why should I make this a priority?’ questions are not adequately answered. In a nutshell, ‘leveraging activity data’ can easily fall down or off the priority lists of most library managers. It would be particularly useful to tie these benefits to the strategic aims and objectives of University libraries as a means to get such work embedded in annual operational planning.
What can other institutions do to benefit from our work?
- For those using the Talis LMS (and with a few years of data stored), institutions can extract data, and create their own API to pull in as a recommender function using these recipes.
- Institutions can benefit from the work we did with users to understand their perceptions of the function, and can be assured that students (undergraduates and postgraduates) can see the immediate benefit (as long as we get rid of some of the odd stuff by setting the threshold higher)
- Use the findings of this project to support a business case for this work to their colleagues
How can they go about this?
- Assess the quality and quantity of the data stored in your LMS to determine if there’s potential there. For this project (and for the simple recommender based on ‘people who borrowed) you only need data that ties unique individuals to borrowed items (see more from Andy Land on the data extraction process and how anonymisation is handled here: http://salt11.wordpress.com/recipe-data-extraction-from-talis/).
- To understand how the recommender algorithm works, see this post Dave Pattern wrote for us: http://www.daveyp.com/blog/archives/1453
- To follow our steps in terms of data format, loading, processing, and setting up an API see Dave Chaplin’s explanation: http://salt11.wordpress.com/technical-processes/
- To conduct user-testing and focus groups to assess the recommender, feel free to draw from our SALT Postgraduate User Discussion Guide and SALT user response sheet.
Our most significant lessons:
- A lower threshold may throw up ‘long tail’ items, but they are likely to not be deemed relevant or useful by users (although they might be seen as ‘interesting’ and something they might look into further). Set a threshold of ten or so, as the University of Hudderfield has, and the quality of recommendations is relatively sound.
- Concerns over anonymisation and data privacy are not remotely shared by the users we spoke to. While we might question this response as potentially naive, this does indicate that users trust libraries to handle their data in a way that protects them and also benefits them.
- You don’t necessarily need a significant backlog of data to make this work locally. Yes, we had ten years worth from JRUL, which turned out to be a vast amount of data to crunch. But interestingly in our testing phases when we worked with only 5 weeks of data, the recommendations were remarkably good. Of course, whether this is true elsewhere, depends on the nature and size of the institution. But it’s certainly worth investigating.
- If the API is to work on the shared service level, then we need more (but potentially not many more) representative libraries to aggregate data from in order to ensure that recommendations aren’t skewed to represent one institution’s holdings, course listings or niche research interests, and can support different use cases (i.e. learning and teaching).