IL 2011: Discovery Engines

Advances in Discovery Engines and Services
Greg Notess, Montana State University and Marshall Breeding, Vanderbilt University

Marshall Breeding

Academic libraries very interested in this lately – 800 total sales of Primo so far; no comparable numbers for the others. BiblioCommons adopted by major municipal libraries.

AquaBrowser – pioneer, but currently losing ground. Was originally marketed by companies that now have their own products.

Turning into Mega Indexes – heterogeneous mix of resources. 1 billion items is not so much – well within the capacity of the technology.

Tech providers and content providers merging. Relationship between your own content and third-party content when exposed in a discovery service. Example — EBSCO.

Open source discovery interfaces (VuFind, Blacklight) – but no open-content mega index yet, and one is unlikely at larger scale. So how are they doing this? Some places purchase a commercial discovery service just so they can use its content with an open source interface.

Current setup doesn’t eliminate publishers – another mode of access to their content

What’s in the index? Will you be less likely to subscribe to things that aren’t in the index because your users can’t find it through the discovery interface?

Open Discovery Initiative – address issues between producers of content and producers of tech. Tech protocols, business rules, transparency of what’s indexed, displayed.

HathiTrust will expose its Solr index to discovery services. Brings in full-text books.
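A rough sketch of what a discovery layer's query against an exposed Solr index might look like. The endpoint path, the `ocr` full-text field name, and the base URL are illustrative assumptions, not HathiTrust's actual schema:

```python
from urllib.parse import urlencode

def build_solr_query(base_url, phrase, rows=10):
    """Build a Solr select URL for a phrase search over full-text OCR.

    Field and endpoint names are hypothetical, for illustration only.
    """
    params = {
        "q": 'ocr:"%s"' % phrase,  # phrase search against the full-text field
        "rows": rows,              # number of results to return
        "wt": "json",              # ask Solr for a JSON response
    }
    return base_url + "/select?" + urlencode(params)

url = build_solr_query("http://solr.example.org/hathi", "carbon sequestration")
```

The point is only that once the index is exposed over standard Solr APIs, any discovery interface can pull full-text book hits into its result set.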

Technically possible to deal with hundreds of millions or billions of records, but how do you order the records in a way that makes sense? Rely on use-based and social factors to improve relevancy?

Moving from discovery to management – ILS now being pulled in. Serials Solutions and OCLC have web-scale management services. Ex Libris Alma for ILS. Integration with e-book lending? More than just MARC records.

Must be device-agnostic! Everything we do for patrons can be used on all the devices. Silos for mobile apps are fragmenting things.

Greg Notess

Vendors say “web-scale” – but what does that mean? You can’t say that to a patron. They all claim to be comprehensive, but they’re not. How much of the marketing holds up? Pros and cons:

  • Single search box helps patrons know where to start – true! But is it the best starting point? Not always.
  • Looks like Google – good and bad.
  • Simplifies search — oversimplifies it though?
  • Faster searching with single index. Not always fast. Also, special characters break it!! UGH.
  • Faster updating — by provider. Still complex though. E-books released faster than the MARC records are. Not in the discovery service, not in the catalog, but you can get it if you know where to go.
  • Combines all of the resources. But students may just want a single type of resource. “I need to get a journal article from the last two years . . . in print.”
  • Full-text searching! But it finds too much . . . relevance algorithm can’t interpret what they actually want from their poor search string.
  • Finds mentions of things when not included in metadata. But this can junk up search results with review articles.
  • Does broad searches, but citation-only records may get buried because there’s not as much text to match against.

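On the "special characters break it" complaint: a defensive front end could escape Lucene/Solr query specials before passing a patron's raw string to the index. The character set below is Lucene's documented specials; whether any given discovery product actually needs this is an assumption:

```python
import re

# Lucene/Solr query-syntax special characters (documented set).
LUCENE_SPECIALS = r'+-&|!(){}[]^"~*?:\/'

def escape_query(raw):
    """Backslash-escape Lucene specials in a raw patron search string."""
    return re.sub(r'([%s])' % re.escape(LUCENE_SPECIALS), r'\\\1', raw)

print(escape_query('C++ (programming)'))  # C\+\+ \(programming\)
```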
LJ article in March – The Next Generation of Discovery.

Overall reactions to discovery services: some faculty love it, some hate it. Some librarians love it, some hate it. Different groups have different problems/reactions. A negative experience with a product turns people away.

Coverage – what are you really including? What’s full-text, what’s citation-only? Summon posted a list recently.

Local results: looking at actual search strings.

  • They’re putting in database names. Using a database recommender module, results are improving . . . though with ERIC they get sent to the free version, which has no OpenURL linking to full text.
  • Known-item searches. Last year 58% success rate, this year 100%.
  • Full-text linking. 42% failure rate last year when testing results listed as “full-text available”; varies by discipline/topic, and users still have to jump through hoops to get to it. 28% this year, with analysis ongoing. Serials Solutions says it’s the 360 product, not the discovery service – but students don’t care, and what matters is that it’s broken!
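For context on where those full-text links can fail: a “full-text available” click typically fires an OpenURL 1.0 (Z39.88) request at a link resolver. A minimal sketch, assuming a hypothetical resolver base URL and the journal-article metadata format:

```python
from urllib.parse import urlencode

def build_openurl(resolver_base, issn, volume, issue, start_page):
    """Build an OpenURL 1.0 query string for a journal article.

    Resolver URL is hypothetical; keys follow the Z39.88 journal format.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": start_page,
    }
    return resolver_base + "?" + urlencode(params)
```

If the metadata the discovery index hands to the resolver is wrong or incomplete, the resolver builds a dead link – which is why the failure can sit in the linking product rather than the discovery service, and why the patron can’t tell the difference.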

Student success. Last year, the most common searches on Summon were navigational strings that should have gone into the browser bar. This year, lots of empty searches (mistakes? crawlers?), and many searches appear assignment-based, since groups of students search the same keywords.