Saturday, June 15, 2013

8 things we know about web scale discovery systems in 2013

Web scale discovery systems, as a class of product, have existed for over 4 years, and academic libraries around the world have adopted them rapidly. We are currently well past the early adopter phase, and probably deep into, if not past, the early majority phase.

Some of the early leaders in this space, like Summon, are even announcing a "2.0" version. This may or may not be marketing hype, but I guess it is symbolic, signalling that products in this class have reached a certain maturity.

Today in 2013, Summon alone has over 500 libraries using it, and many more are using WorldCat Local, Primo Central, EBSCO Discovery Service etc. As usual, this has led to a growing professional literature on the topic (see the list I curate here, as well as my Flipboard custom magazine), covering a host of areas including
  • impact on resource usage (full-text downloads, print catalogue items, A&I usage, articles not indexed in discovery)
  • impact on workflow for management of eresources
  • proper marketing and positioning of discovery products for users
  • impact on teaching of information literacy by librarians
  • surveys on attitudes of librarians, undergraduates, graduate students and faculty towards discovery services vs databases
  • usability testing & integration of discovery services into library websites
and many more.

With all this literature out there, what do we really know about web scale discovery services in 2013 that we didn't know in 2009, and what are some issues where the jury is still out?

Some qualifiers.

First, I don't profess to know all the answers, or to have read (let alone remembered) every study done on discovery services. Nor am I an "expert", though I have kept an eye on this interesting area.

Second, I am far more familiar with Summon (which we tested and implemented at our institution in 2011-2012) and, to some extent, EDS, so some of what I write might apply only to Summon. (For example, I wonder whether the EDS interface, with more advanced features at the cost of a more crowded user interface, leaves advanced users more satisfied.) Still, at a high level, I suspect the web scale discovery services on the market are similar enough that most statements apply to all of them.

Third, I am going to speculate based not just on the literature but also on my own knowledge and sense of what the general consensus is (which might be wrong).

I hope this post can lead to some fruitful discussion, even if you disagree with what I have written.

1. Web scale discovery services increase the accessibility of eresources and, on the whole, will definitely increase full-text downloads

This seems to be the most robust and uncontroversial result. Every library that has implemented a discovery service and reported on it has found that, on the whole, usage of eresources went up.

Detractors might say that users might be downloading more, but do they actually find what they need? Or, even if they find something good enough, is it the best? That's a (possibly) fair but different point.

2. Undergraduates generally love discovery services
This is another point that I believe is mostly accepted. Survey after survey has shown that undergraduates are generally happy with discovery services because they function somewhat like Google and so mostly fit their mental models. Are the services perfect, and do all undergraduates like them? Of course not, but on the whole, libraries that have surveyed users have mostly received positive feedback compared to their existing catalogue or search tools. This is, of course, unlike the results for federated search in the past.

3. Librarians' reactions towards discovery services are mixed at best.

The earliest study I am aware of that surveyed librarians' reactions to Summon reported "culture shock". This seems to be the default reaction of librarians who encounter discovery tools for the first time. Of course, this was one early adopter library in Australia, back when the concept of discovery tools was still novel to the profession, and the study itself suggests, based on a follow-up survey 6 months later, that librarians become more positive towards the tool as they get used to the concept.

However, more recent studies of librarians' attitudes relating to information literacy, such as this one and this one, suggest that librarians' attitudes towards discovery tools are still polarized or ambivalent, whether in using them and recommending them to users at the reference desk or in teaching classes. Attitudes range from enthusiastic support (see the series of free recorded webinars on Summon and information literacy by librarians who adopted it) to acceptance (sometimes grudging) to extreme opposition, for instance the claim that teaching Summon is "a dereliction of duty reference librarians have towards their users" (one of the more extreme statements found in the literature).

Based on discussions with librarians both within and outside my institution, I can also confirm that there are many highly qualified reference librarians who dislike discovery services intensely, and not out of mere ignorance or resistance to change.

There have been a couple of blog posts and papers trying to explain this resistance. Here's my take, combining reasons given in various papers with my own thoughts:
  • Relevancy ranking results can be inconsistent if not awful (opinions vary on how bad this issue is, possibly depending on expectations, implementation and discipline). 
  • Lack of advanced search features 
  • Worry that important material is missing from the index, or that coverage in some disciplines is totally inadequate. Related is the view that a subject-specific database (e.g. PubMed) is almost always better.
  • Worry that users are unaware they are missing material not found in the index, and may settle for good enough instead of the best available
  • Worry that discovery services are damaging information literacy skills by misleading users into thinking research is easy
  • Technical issues such as unstable linking to full text, unclear labels in the interface etc.
  • Uncertainty about how to position discovery systems next to databases and how to teach them
  • Worry that libraries are handing over too much power to discovery services because of lock-in by discovery service providers who are simultaneously content providers (example of recent dispute).
Each point can of course be expanded further. Relevancy alone is a big area, with some librarians unhappy about the weighting of content types (e.g. newspaper articles appearing too often instead of books), while others are unhappy with the overall relevancy ranking for known item searches.

4. Advanced searchers generally mirror the attitudes of librarians and are not as satisfied 

As expected, experienced researchers and faculty generally mirror the opinions of librarians and are, in general, a lot less enthusiastic than undergraduates, because they are familiar with what databases offer and are more demanding about what they should get.

That said, the Ithaka Faculty Survey 2012 speculates that libraries' heavy investment in discovery services is paying off, with more faculty starting their searches from the library catalogue in 2012, the first time this figure has increased since the survey started in 2003.

As Barbara Fister points out, however, faculty are often searching "for known items, something discovery systems seem to handle rather badly", so this explanation seems off.

5. Relevancy ranking can still be improved

This differs from service to service, with some vendors claiming superiority in this area.

Head-to-head tests give mixed results: this one gives victory to EDS over Summon, this one to Summon, this simple one ranked A&I > Summon > Google Scholar, but this one favoured Google Scholar over Summon.

But I doubt most librarians would say that Summon or any other discovery service is as good as it can be; most would yearn for better relevancy.

I am personally more sympathetic towards discovery systems in this area, though, having spent countless hours studying and re-running thousands of user searches since June 2012, I am well aware of how poor Summon's relevancy ranking can be for some searches (I have also done limited testing on other systems).

Lest I be accused of not giving examples, here's one: Singapore "national service", where the first 9 results are currently totally irrelevant. One example hardly proves a pattern, but I am sure any librarian familiar with discovery services can give dozens of similar examples. Of course, relevancy isn't an easy problem to solve, and to be fair, in this case doing the same search without quotes gives better, though still poor, results.

Compared to the early days, when discovery services raced to sign up content providers and boasted of the size of their indexes (they still do, I guess), there is an increasing realization among all parties, librarians and discovery providers alike, that all this content can be counter-productive if the relevancy ranking isn't capable enough to surface the right, or at least decent, content.

Also, as mentioned earlier, there were doubts in the early days about how well such systems handle known item searching, particularly for catalogue items, and these doubts persist to this day despite improvements.

6. Adding federated search does not add much to web scale discovery (currently)

This is somewhat more controversial. But I believe the current consensus is moving towards the idea that tacking federated search onto web scale discovery is not that useful, at least with current implementations. An early debate was sparked in 2009 on the Federated Search Blog by the post Beyond Federated Search and its follow-ups, which critiqued Summon for lacking federated search and argued that a hybrid solution, indexing what you can and doing a broadcast search (federated search) over what you can't, was the way to go.
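The hybrid design argued for in that debate can be illustrated with a small sketch. This is a hypothetical model, not how any vendor implements it: `search_central_index` and `search_remote` are made-up stand-ins, and the slowness of remote sources is simulated with `time.sleep`. It shows the core trade-off: the central index answers immediately, while broadcast results only count if they arrive before a timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical stand-in for the pre-built central index: a fast, local lookup.
def search_central_index(query):
    return [f"index: {query} #{i}" for i in range(3)]

# Hypothetical stand-in for a remote database that is not in the index.
# `delay` simulates how long the source takes to answer a broadcast query.
def search_remote(source, query, delay):
    time.sleep(delay)
    return [f"{source}: {query}"]

def hybrid_search(query, remote_sources, timeout):
    """Return indexed results plus any remote results that arrive in time.

    remote_sources is a list of (name, simulated_delay_seconds) pairs.
    Returns (results, number_of_sources_that_timed_out).
    """
    results = search_central_index(query)  # available immediately
    pool = ThreadPoolExecutor(max_workers=max(1, len(remote_sources)))
    futures = [pool.submit(search_remote, name, query, delay)
               for name, delay in remote_sources]
    # Broadcast step: wait at most `timeout` seconds for the remote sources.
    done, not_done = wait(futures, timeout=timeout)
    for future in done:
        results.extend(future.result())
    # Don't block on slow sources; their results are simply dropped.
    pool.shutdown(wait=False)
    return results, len(not_done)

results, timed_out = hybrid_search("national service",
                                   [("FastDB", 0.0), ("SlowDB", 0.5)],
                                   timeout=0.2)
print(results)
print("sources that timed out:", timed_out)
```

Raising `timeout` recovers more remote results at the cost of making every search slower, which is essentially the choice libraries faced when deciding whether to keep federated search switched on.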

I could be wrong, but my impression is that many libraries that implemented EBSCO Discovery Service, which does offer federated search, have chosen to turn off the federated search portion, basically because it wasn't used and/or was counterproductive.

"Federated Search is Dead -- and Good Riddance!", a piece explaining why James Madison University (JMU) turned off the EBSCO Integrated Search federated search add-on included in EBSCO Discovery Service, is perhaps a typical reaction.

Essentially, the sheer size of the index in EBSCO Discovery Service and similar services means that students have no incentive to wait 30 seconds for more results; the problem they typically face is too many results, not too few. Scholars will already be using traditional databases (e.g. Scopus) as their primary search tools and may use web scale discovery tools only as a final round-up of what they have missed, so they have no pressing need to see results from those traditional databases there.

I would say even EBSCO is downplaying the federated search option in EDS: their pages on EDS do not mention federated search at all (though to be fair, it's a separate product, EHIS), and there is even a page on platform blending (which, frankly, I don't quite understand despite a vendor explaining it to me) where they go out of their way to state that it is "not federation".

Of course, one could argue (correctly, I think) that the idea of a hybrid system is sound but the implementation needs a lot of work to make it worthwhile. Currently, however, none of the 4 major players in the market seems to have cracked this problem, and they may not do so in the foreseeable future, as it is perhaps not a priority.

I would also add that many of the issues raised back then, about the dangers of ceding control over what content can be found to your discovery provider if you don't do federation, may still have teeth (again, see the recent spat). But at least on the practical front, including a federated search option in a web scale discovery system generally isn't considered critical by most librarians now.

7. Content providers are generally eager to cooperate with discovery vendors to have their content indexed. 

One of the reasons the need for federated search seems to have diminished is that more and more content is getting indexed. In 2009, there was still uncertainty about how content providers would react: would they want to be included? Discovery vendors had to work hard to get content included. If most providers agreed, federated search would be of limited value except for currency of results (a broadcast search queries the live source rather than a copy in the index). If most couldn't be indexed, federation would be crucially important for getting at those resources.

As of 2013, the situation has clarified. As more libraries have released data showing that usage tends to fall for anything not in discovery services, and conversely rises for anything indexed in them, content providers have become more and more eager to be indexed rather than risk being cut out of the game.

The James Madison University piece mentioned earlier is perhaps instructive. Of the sources the author described in 2010 as accessible via federated search, many, such as JSTOR, Sage and ScienceDirect, are now indexed in Summon and probably other discovery services.

More interestingly, even A&I services like Scopus, Web of Science, MLA and ERIC are now included in many discovery services, though with appropriate safeguards to ensure their records are shown only to authenticated users.

That said, there are still hold-outs: well-known A&I databases such as PsycINFO and EconLit, which work only with EBSCO Discovery Service, are perhaps the most gaping hole currently.

And of course, the above refers only to publishers. Aggregator databases have in general been less willing (Gale seems to be an exception here, having been included in Summon since 2009 and recently added to EBSCO Discovery Service as well as others). In particular, those owned by ProQuest and EBSCO are typically off limits to competing discovery services, barring some special agreement.

8. Broken links are still an issue, though the problem is less serious and likely to become even less so in future

One of the greatest issues with discovery services is that they typically rely heavily on OpenURL to get to the full text. As is well known, OpenURL linking is not 100% reliable, so discovery services have put in place alternate routes to full text.

For example, Summon implemented "Index-Enhanced Direct Linking", and EDS has its smart links (if the content is in EBSCOhost databases) or custom links (which I believe are equivalent to Summon's Index-Enhanced Direct Linking in most cases).

That said, linking to newspaper articles, non-journal items and free content can still be iffy.

Still, new efforts like KBART and Improving OpenURLs Through Analytics (IOTA) are underway, so hopefully this issue will diminish in time.
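To make the reliance on OpenURL concrete, here is a minimal sketch of the kind of link a discovery layer hands to a library's link resolver: an OpenURL 1.0 (Z39.88-2004) query in key/encoded-value form for a journal article. The resolver address is hypothetical, and `missing_fields` with its "minimum" field list is an illustrative assumption, not any real resolver's rule. It shows one common failure mode: if the index record lacks a field such as the start page, the resolver may not be able to reach the article itself.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical link resolver address; every library's resolver has its own base URL.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Article metadata keys that a journal-format OpenURL can carry.
ARTICLE_KEYS = ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date", "aulast")

def build_openurl(citation):
    """Build an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for an article."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for key in ARTICLE_KEYS:
        value = citation.get(key)
        if value:  # skip fields the discovery record doesn't have
            params["rft." + key] = value
    return RESOLVER_BASE + "?" + urlencode(params)

# Illustrative assumption: fields a resolver needs for an article-level link.
MINIMUM_FOR_ARTICLE_LINK = ("rft.jtitle", "rft.volume", "rft.spage")

def missing_fields(openurl):
    """List which of the assumed minimum fields are absent from an OpenURL."""
    query = parse_qs(urlsplit(openurl).query)
    return [field for field in MINIMUM_FOR_ARTICLE_LINK if field not in query]

url = build_openurl({
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "issn": "1234-5678",
    "volume": "12",
    "date": "2012",
})
print(url)
print(missing_fields(url))  # → ['rft.spage']
```

The same article can produce a working or a broken link depending on which fields the index record carries, which is part of why vendors built the alternate direct-linking routes described above.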


Some other less important findings,


I confess it took quite a bit of effort and courage to get this piece written and posted. Sometimes I wondered if I was getting the general consensus totally wrong, and at other times I thought what I wrote was so trite and obvious that people knew it even at the start of 2009.

I suspect the latter is more likely to be correct, because I decided to err on the side of caution, listing the statements I thought were definitely agreed upon and bumping the ones I was unsure about to a follow-up blog post, "X things we still are unsure about web scale discovery systems in 2013".

But what do you think? What else do we now know about discovery services that was in doubt in 2009?

BTW, if you want to keep up with articles, blog posts, videos etc. on web scale discovery, do consider subscribing to the custom magazine I curate on Flipboard, or looking at the bibliography on web scale discovery services.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.