5/30/2004

Trumping Google? Metasearching's Promise

Library Journal

By Judy Luther -- 10/1/2003



Metasearch promises to give patrons one-stop access to the many and various resources at the heart of the library digital collection


We know it's true—library patrons prefer Google. Usability studies conducted by both librarians and vendors repeatedly confirm that today's library users start with Google to answer their questions—and they often finish there.

Google's popularity is easy to understand. It's ubiquitous—accessible wherever a searcher can get Internet access. It's simple. Searching with Google is as easy as entering keywords in a single search box. By relying on algorithms that include relevancy ranking by the popular culture, Google provides "good enough" answers. So good that 30 percent of all web searches are conducted through Google.

Like it or not, Google and its competitor search engines have created a model that librarians, as information providers, must meet head on. The question is, will librarians (who are Google users themselves) embrace the new technology to simplify access to their own resources? Metasearch technology, also known as federated or broadcast search, creates a portal that could allow the library to become the one-stop shop their users and potential users find so attractive.

Hiding our wealth

Google has radically changed users' expectations and redefined the experience of those seeking information. For many searchers, the quality of the results matters less than the process; they just expect the process to be quick and easy.

If our users make it to the library's web site at all, chances are they are confronted with library terminology they don't understand and a long list of databases they have to decipher and choose among. The result? Libraries are losing potential users. Librarians license valuable and costly full-text databases that we know contain the information researchers are seeking. But in a three-click world, each vendor's database remains a separate silo of information that our users don't find. Even if patrons are familiar with searching the OPAC, that won't help them retrieve articles. Library services that require training or require the user to come to the library undermine the advantages of licensing electronic content.

The metasearch promise
Some librarians believe that metasearch could be a way to meet the expectations and needs of "the Google generation." This software allows the user to enter keywords in a simple interface and retrieve articles from multiple full-text and bibliographic databases simultaneously.

A few academic administrators see metasearch as a way to serve a broader audience than the library may typically reach. According to one library director, all users should be considered novices. Recently a VP of academic affairs challenged his staff by pointing out that 85 percent of their students only wanted a passing grade and were more concerned with having the financial resources needed to support their lifestyle. Both perspectives point to organizing library resources so they can be used to meet the basic needs of the casual information user.

Jill Emery, director, electronic resources program at the University of Houston Libraries, sees metasearch as a way for librarians to move beyond teaching specific databases or publisher interfaces, into the realm of ubiquitous article discovery. "We must continue to develop ways for the end user to discover resources independent of us, while making this process as seamless and easy as Amazon.com or eBay," says Emery.

How it works
Although metasearch may offer the user a Google-like experience, what actually transpires is entirely different. Google sends its robots to spider or collect data from millions of web pages in advance, so the user is actually searching a cross-file index and not the content used to create the index. Users connect to the full-text pages when they click on a hyperlink in the results set. It's easier for Google to pre-index homogeneous web sites that are referenced with standard URLs.
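To make the distinction between searching a prebuilt index and searching the content itself concrete, here is a minimal sketch of an inverted index in Python. The sample pages, URLs, and function names are hypothetical, and this is a toy illustration of pre-indexing in general, not a description of how Google's system actually works.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (illustrative data only).
PAGES = {
    "http://example.org/a": "library metasearch portal",
    "http://example.org/b": "google search engine",
    "http://example.org/c": "library search tips",
}

def build_index(pages):
    """Built in advance: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """At query time only the index is consulted, never the page text."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "library search"))
```

The query touches only `INDEX`; the page text is read once, when the index is built, which is why the approach depends on content that can be collected ahead of time.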

In contrast, metasearch executes a cross-file query across both bibliographic and full-text databases that do not share a common thesaurus or index, notes Maureen Kelly, industry consultant. A different search protocol is also required for each database, as some use variations of Z39.50, which was devised prior to the web; a few use XML, which identifies the data elements being used (similar to tagging on a MARC record); and many leave it to the metasearch vendor to figure out their methodology.
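One way to picture the per-database protocols described above is as a set of connectors behind a single search function. The sketch below is an assumption-laden illustration: the database names, the connector functions, and the record layout are all hypothetical, and the connectors return canned records rather than speaking Z39.50, XML, or HTTP for real.

```python
from typing import Callable, Dict, List

# Each connector translates the user's keywords into whatever its target
# understands. All three are stand-ins that return canned records; a real
# connector would speak Z39.50, parse XML, or read an HTTP results page.
def z3950_connector(keywords: str) -> List[dict]:
    return [{"source": "catalog", "title": f"Z39.50 hit for {keywords}"}]

def xml_connector(keywords: str) -> List[dict]:
    return [{"source": "ejournals", "title": f"XML hit for {keywords}"}]

def http_connector(keywords: str) -> List[dict]:
    return [{"source": "fulltext", "title": f"HTTP hit for {keywords}"}]

# Hypothetical registry: database name -> connector for its protocol.
CONNECTORS: Dict[str, Callable[[str], List[dict]]] = {
    "catalog": z3950_connector,
    "ejournals": xml_connector,
    "fulltext": http_connector,
}

def metasearch(keywords: str, databases: List[str]) -> Dict[str, List[dict]]:
    """Send one query to every selected database via its own connector."""
    results = {}
    for name in databases:
        try:
            results[name] = CONNECTORS[name](keywords)
        except Exception as exc:  # one failing target should not sink the search
            results[name] = [{"source": name, "error": str(exc)}]
    return results

RESULTS = metasearch("climate change", ["catalog", "fulltext"])
for name, records in RESULTS.items():
    print(name, len(records))
```

Keeping each protocol behind its own function means that a change on one content provider's side affects a single connector, which hints at why connector upkeep is an ongoing cost.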

One of the major advantages of metasearch is that results can be obtained from multiple databases without having to repeat a search. Large public and academic libraries typically subscribe to between 100 and 400 databases. Given the multidisciplinary nature of research, users looking for "the" answer will find that metasearching facilitates discovery of databases that they otherwise may not have consulted. The novice user looking for "an" answer can find results in multiple sources with a single search.

Metasearch isn't a new concept. Dialog in the 1970s and subsequently SilverPlatter executed a single search simultaneously across multiple bibliographic databases. In the web era, metasearch engines such as MetaCrawler merge, dedupe, and rank the results of multiple web search engines such as Ask Jeeves, FAST (which is used by LexisNexis), and Overture (which is being acquired by Yahoo).

What the user sees

Libraries implementing metasearch have flexibility in how their initial screen is configured. The user will either see a single search box that defaults to a designated group of databases or will see the list of databases grouped into subjects. If the system can detect information about the user, it may present a specific range of options customized to that user's discipline. Some librarians choose to include the library's OPAC as one of the databases; others offer the user the option of including it.

Results are downloaded in batches. The user sees either a message telling him or her that the search is in process or a scoreboard with the total number of records being returned from each database. Since records appear from the fastest servers first, results sets can be limited to a specified number downloaded per set. Results are either displayed by the source searched, or merged into a single list with duplicate records eliminated. The records can serve as active links, allowing the user to navigate in the native system and to download records to view.

How results should be presented is an area of debate. WebFeat and Ex Libris believe that results must be presented separately by database. Users don't want to wait for all the results from each database to be downloaded so the completed results sets can be merged and then deduped, they argue. However, MuseGlobal offers a single results set that it merges and then dedupes "on the fly." In a recent study for the National Library of New Zealand, the first item on its wish list of desirable features was "to unify the results…then present them in a useful order and deduplicate the results."
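The "on the fly" alternative can be sketched as a stream that passes records along as each batch arrives and drops any record it has already seen. This is only an illustration under stated assumptions: the record fields, the sample batches, and the match key (normalized title plus year) are hypothetical, and nothing here describes how MuseGlobal or any other vendor actually implements the feature.

```python
import re

def dedupe_key(record):
    """Hypothetical match key: lowercased title stripped of punctuation, plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return (" ".join(title.split()), record.get("year"))

def merge_on_the_fly(batches):
    """Yield records as batches arrive, skipping any already seen."""
    seen = set()
    for batch in batches:
        for record in batch:
            key = dedupe_key(record)
            if key not in seen:
                seen.add(key)
                yield record

# Illustrative batches, in the order the (imaginary) databases responded.
BATCHES = [
    [{"db": "A", "title": "Metasearch: A Primer", "year": 2003}],
    [{"db": "B", "title": "metasearch a primer", "year": 2003},
     {"db": "B", "title": "Federated Search Basics", "year": 2002}],
]

MERGED = list(merge_on_the_fly(BATCHES))
print([r["title"] for r in MERGED])
```

Because the generator never waits for the slowest database, the first records can be shown immediately; the trade-off is that whichever copy of a duplicate arrives first is the one the user sees.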

While search results from bibliographic databases typically appear in reverse chronological order, Oliver Pesch, CTO at EBSCO, notes that determining the order of documents in full-text databases involves a calculation that relates the number of times each keyword appears in the record to the number of times that keyword appears in the database. Google built its reputation on its ability to relevance-rank its search results. Since metasearches are performed across both bibliographic and full-text databases, the order in which the results are presented in a merged set will need to be determined by new criteria.
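The calculation Pesch describes can be illustrated with a deliberately literal sketch: for each keyword, divide the number of times it appears in a record by the number of times it appears in the whole database, then add up the ratios. The sample records and the function name are hypothetical, and production systems use more refined weighting schemes; this is not EBSCO's or Google's formula.

```python
from collections import Counter

# Hypothetical three-record "database" (illustrative text only).
RECORDS = {
    "r1": "library search library portal",
    "r2": "search engine search results search",
    "r3": "portal design",
}

def rank(records, query):
    """Score = sum over keywords of (count in record / count in database)."""
    per_record = {rid: Counter(text.lower().split()) for rid, text in records.items()}
    database = Counter()
    for counts in per_record.values():
        database.update(counts)
    scores = {}
    for rid, counts in per_record.items():
        score = 0.0
        for word in query.lower().split():
            if database[word]:
                score += counts[word] / database[word]
        if score > 0:
            scores[rid] = score
    # Highest score first; ties broken by record id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

RANKED = rank(RECORDS, "library search")
print(RANKED)
```

In this toy data, r1 outranks r2 even though r2 repeats "search" three times, because "library" is rarer in the database and so counts for more. A bibliographic record with only a title and abstract offers far fewer words to count, which is one reason a merged set needs ranking criteria of its own.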

Additional features for postsearch processing allow the user to store or save searches and export the results through email or to bibliographic software such as ProCite. Personalization features can also allow the user to have the search executed on a schedule and be alerted to new search results.

While each integrated library system (ILS) vendor offers metasearch or has plans to offer it, the majority of libraries are using software created by one of two vendors (WebFeat and MuseGlobal) that have developed this capability and licensed it to the library systems market (see "The Players," p. 37).

The range of available features and stability of functions varies by vendor. The same metasearch software licensed by different ILS vendors will have varying capabilities based on their contract and the way in which it has been integrated with their own systems.

Sweet and salty
Metasearch has great potential, but librarians involved in implementing the technology report that it is a slow process. George Machovec, Colorado Alliance of Research Libraries, notes that implementation typically ranges from six to 12 months, and the number of resources included initially ranges from 30 to 90 databases.

Machovec cautions that librarians must spend considerable time deciding how to present the search screen to the user, confirming that authentication issues are addressed, and verifying that connectors for the databases are working. Machovec adds, "The set-up and maintenance of these interfaces are substantial and librarians should be aware of the necessary staff commitments."

The flexibility of these new tools presents libraries with an opportunity to rethink how users look for information. Kristen Hewitt, manager of support systems, Westerville Public Library, OH, where they have implemented III's MetaFind, says that the library staff had to agree on whether they wanted this tool to look like their web OPAC, which resources to present, and how to organize them. Todd Miller, president and CEO of WebFeat, notes that librarians need to decide how to group their resources: by type of content, by subjects, or by audience.

Librarians are concerned that users may not know what databases they are using or what fields are being searched, according to Machovec. But do our users care? Some librarians are also concerned about the number of results presented to their users; they look to usage data to determine the success of the system.

Morphing models
The University of Rochester, NY, is implementing Endeavor's ENCompass, and its experience is fairly common for academic libraries. Eighty percent of the databases included in the metasearch function are working out of the box; over half are based on HTTP, slightly less than half are Z39.50, and only a couple are XML. Stanley Wilder, associate dean of the library, states that they have customized the user interface based on usability studies, reducing the number of clicks it takes to reach full text. It is labor-intensive in the setup phase, but Wilder remains enthusiastic about metasearch's potential to "help us reconnect with novice users who prefer Google and draw them into…the literature in their chosen disciplines."

Tom Wilson, director of information technology at the University of Maryland, where they are implementing MetaLib by Ex Libris, is not sure how effective or relevant a search will be across a diverse group of databases. Although it is attractive to merge and dedupe the search results, every database is different. He recognizes the challenge of deduping search results when databases use different abbreviations for the same journal title. The school has opted to group its databases into related subject areas for searching and present results by database.

The University of Arizona is busy implementing the Scholar's Portal (www.arl.org/access/scholarsportal). Kris Maloney, team leader for the digital library and information systems, voices a common complaint for an academic library: it's labor-intensive. Arizona has configured 60 of its 400 databases, with plans to continue the process to include all its licensed databases. Staff must ensure that remote users have access and organize the selected resources into subject-related profiles. Nevertheless, Maloney is convinced that metasearch will "fundamentally change the way that we deliver information."

The root of librarian resistance
At its heart, metasearch is about providing easy access for the user to complex resources. It is not a tool that allows librarians, or other expert users, to search with greater precision. It's not for us; it's for the average user.

In academic libraries nationwide, the same conversation is taking place between librarians who don't want the interface "dumbed down" and librarians with usability practice who know that patrons basically want the Google experience. It's time for librarians to accept that library users are not interested in being more like us. If we don't understand that the majority of our users are novice searchers who may wish to remain that way, we are missing the opportunity to serve the pragmatic user who is happy with a "good enough" answer.

The resistance makes sense when you look at the effect metasearch will have on the reference process. After all, metasearch technology further removes the user from interacting with a librarian who is committed to teaching the user how to find information. Many librarians also enjoy the reference interview. It is what attracted them to the profession in the first place and is part of their continuing commitment to provide good service.

Driven by convenience, however, users are increasingly accessing library resources remotely or opting to search the web, thereby eliminating the opportunity for a reference interview. Anne Lipow, director of Library Solutions Institute and Press and author of The Virtual Reference Librarian's Handbook (Neal-Schuman), shifts the perspective when she recognizes that it is librarians who are remote from their users. Lipow recommends an "in-your-face" approach to reference service that is as obvious and as convenient to the user as is web-based access to information. At a minimum, she proposes a link "click here to talk with a librarian" on the user's browser or as part of the library's web page to provide the user with help at point-of-need.

Perhaps more libraries will find their experience to be similar to that of the Houston Public Library. The library, which implemented WebFeat, experienced staff resistance in the first six months, according to Judith Hiott, assistant coordinator of materials selection. But the staff became more supportive when usage statistics revealed that full-text retrievals increased by 69 percent. "Both staff and experienced users continue to stress the importance of the native interface and the ability to easily switch from WebFeat to the native database," says Hiott.

The impact on publishers
The web has provided large publishers with the opportunity to deliver a database of journals directly to academic libraries (such as Academic Press's Ideal, Elsevier's Science Direct, Wiley's Interscience) through negotiations referred to as the "Big Deal." These agreements, some argue, threaten smaller publishers by dominating significant portions of the library budget, leaving smaller publishers to compete for what remains.

For publishers, it is also difficult to be identified as the appropriate resource when their database is just one of many on a growing list of licensed resources. Increasingly, many libraries won't license single titles or small publisher-based files. Users don't search by publisher, but they are more likely to be referred to a large database such as Science Direct or Web of Science. As Tom Sanville, executive director of OhioLINK, noted at the Charleston Conference several years ago, "size matters." The larger databases win the publisher a level of brand recognition with both the librarian and the user.

Metasearch can level the playing field by eliminating the need to select a database—it's already done for the user by the librarian, often by subject groups. This gives the smaller publisher the same chance of being searched as the larger publisher. In the process it also bypasses branding at the database level. If users only see exported metadata records or links directly to full text, publishers will need to ensure that they have labeled their content at the citation and page level.

Metasearch also bypasses the sophisticated interface developed for that unique content, such as a thesaurus that is useful in providing alternate terms for searching. In an era where the value of content depends on the functionality of the interface, metasearch compromises this value and potentially diminishes its usefulness.

It is easy when using a metasearch tool, especially if you are a novice user, to choose to search all files. After all, more can only be better, right? This has caused concern among publishers, aggregators, and other content providers that their systems can't handle the greatly increased usage that occurs when every search is run against every database. The likelihood of system overload grows with each library that adds a metasearch tool, impacting system resources and performance. This can be particularly problematic for smaller, specialized databases that were never designed to support such high traffic.
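A rough back-of-the-envelope calculation shows why content providers worry. Every figure below is an illustrative assumption (200 libraries, 1,000 searches per library per day, and a small database that only 5 percent of native searches would have selected), not measured data.

```python
def queries_reaching_provider(libraries, searches_per_library, share_selecting_db):
    """Estimate daily queries one content provider receives.

    Every argument is an illustrative assumption, not measured data.
    """
    return round(libraries * searches_per_library * share_selecting_db)

# Native searching: suppose only 5% of searches pick this small database.
NATIVE = queries_reaching_provider(200, 1000, 0.05)
# Metasearch with "search all": every search hits every database.
SEARCH_ALL = queries_reaching_provider(200, 1000, 1.0)

print(NATIVE, SEARCH_ALL, SEARCH_ALL // NATIVE)
```

Under these assumptions the small database goes from 10,000 to 200,000 queries a day, a twentyfold increase, without any growth in the number of users who actually want its content.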

How results are displayed also concerns vendors. Will their metadata records be deduped and eliminated from the final list? If they are included, what criteria will determine their ranking in the search results? Managing the results of a search becomes more important as the number of resources continues to grow.

The road ahead
Most library system vendors and content providers believe that, as it matures, metasearch will become an integral part of our industry. "Within 18 months, all academic and consortia RFPs will include requirements for the basic functionality for metasearch," says Peter Noerr, CTO of MuseGlobal.

As large academic libraries develop institutional repositories to host their local content and digital collections, metasearch offers the capability to provide simple access to the growing array of resources the library offers, from databases of e-journals to e-book collections to reference tools online. As more libraries digitize special collections, metasearch can enable libraries to provide simple access to a wider range of locally and commercially developed products and services.

Three big issues remain for librarians. They must understand metasearch's potential role in serving their users, rethink how the library's resources are presented, and develop realistic expectations of this evolving technology. As operational issues are addressed by the technology vendors in conjunction with the NISO initiative, the library can play a more active role in reaching and serving users who have been lost between the library's web page and the growing array of licensed databases.

NISO Addresses Industry Issues
The NISO Metasearch Initiative (www.niso.org) comprises ILS vendors, aggregators, and publishers that are collaborating to make their systems interoperable. The following issues are being addressed:

Metasearch queries need to be identifiable so that aggregators can manage the load on their systems.
Databases need common descriptors that include data format, language, content tags, and taxonomies.
User authentication requires a method of presenting the library's IP address even though metasearch functions are hosted on a vendor's server.
Search protocols need to be standardized with an XML version of Z39.50 and clear search strategies used by the content providers.
Results sets require a core set of data elements for each record and for the entire results set, along with a mechanism to communicate the information.
Usage statistics need to be coordinated with Project Counter, ARL E-Metrics, and ICOLC to determine how to treat usage data resulting from metasearch—what will be counted and who will count it?

--------------------------------------------------------------------------------
Author Information
Judy Luther is President of Informed Strategies, Ardmore, PA, a consulting firm focused on successful market relations. With both an MBA and an MLS, Luther provides insights to publishers and libraries on the development and delivery of electronic products and services.
