Joseph E. Stiglitz -- "Intellectual Property Rights and Wrongs"
I've been thinking a lot about intellectual property for the other class I'm taking, Kip Currier's Legal Issues - Copyright class. So yes, I agree with Stiglitz's contention that sharing intellectual property is crucial for advances in medicine, science, technology, and other research. The author raises some of the key issues we've been considering about how abuses of legal power may impede innovation merely in order to benefit corporations or other legal institutions. As a prospective digital librarian, I of course tend to favor open access to research and information. But in the last few months I've become aware of many unfortunately thorny issues that librarians must face in attempting to provide information to the public.
Clifford Lynch -- "Where Do We Go from Here? The Next Decade for Digital Libraries"
Preservation concerns are also something I've been thinking about lately, with regard to both physical and digital materials. One angle of this article that I found interesting was the idea that long-term preservation of intellectual property is too important to be entrusted to librarians alone, since they may be considered only "one group among a broad array of stakeholders." I'm glad to see that funding for digitization initiatives has increased over the last decade or so, thereby "validating" the mission and forming communities among diverse organizations, as the article points out. And indeed, the article reminds us that digital collection creation and management are essential to a huge range of industries and institutions, such as (just for example) engineering firms, homeland security, museums, personal archives, schools, laboratories, and historical societies.
I appreciate the author's effort to consider a "long time horizon perspective" in integrating digital information management technologies for multiple purposes across people's lifetimes. As a side note, this makes me think of the excellent book "The Clock of the Long Now" by Stewart Brand, which I read as an optional assignment for Dr. Richard Cox's class in archival ethics. The book was a fascinating meditation on ultra-long-term preservation. Highly recommended!
Sunday, December 6, 2009
Sunday, November 29, 2009
Reading notes for Security and Economics
W. Y. Arms -- Implementing Policies for Access Management
This article addresses issues in access management of electronic documents. Many institutions wish to restrict access to online documents for reasons of privacy, security, or payment. The access model's framework places policies at the center, such that every user and collection has an associated policy. However, under this model any policy change would require altering every document, a time-consuming and error-prone prospect. Alternatively, another approach uses containers of information to encapsulate policies and more easily transmit, change, and enforce them. The article's table in section two was helpful for illustrating a simple breakdown of users, attributes, and operations, which together comprise a policy.
Of course, digital materials' metadata allows for many attributes to be associated with every item, and users' logins may easily demarcate different populations. However, interoperability is still a challenge when dealing with multiple libraries' collections. The article shows why it's best to keep attributes, policies, operations, etc. separate for easy management.
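To make the users/attributes/operations breakdown concrete, here's a minimal sketch of how such a policy might be modeled. This is my own illustration, not code from the article; all the names and rules are hypothetical.

```python
# Hypothetical sketch of the policy model described above: each policy maps
# a user attribute (e.g. "student") to the set of operations it permits.
# Keeping attributes, operations, and policies separate means a rule change
# never requires touching the documents themselves.

class Policy:
    def __init__(self, rules):
        # rules: dict of user attribute -> set of permitted operations
        self.rules = rules

    def allows(self, user_attributes, operation):
        # A user may perform an operation if any of their attributes grants it.
        return any(operation in self.rules.get(attr, set())
                   for attr in user_attributes)

# Illustrative policy for one collection (all values made up):
journal_policy = Policy({
    "faculty": {"view", "print", "download"},
    "student": {"view", "print"},
    "guest": {"view"},
})

print(journal_policy.allows({"student"}, "download"))            # False
print(journal_policy.allows({"student", "faculty"}, "download"))  # True
```

Because the policy is looked up at access time rather than stamped on each document, revoking or extending a permission is a one-line change to the rules.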
Lesk Chapter Nine
The chapter begins by pointing out that traditional libraries have often been financially extravagant, and questioning whether digital libraries offer a more economically reasonable alternative.
Funding models for digital libraries include:
* institutional support
* charging users
* advertisers
* other, such as pledge drives for donations
Of course, the traditional library has not been monetized, so users may be resistant to paying for digital library services. Although costs for digital copying are low to nil, if consumers expect instant and unlimited copies, publishers stand to lose money.
Costs of academic texts and journals are particularly high, causing many libraries to reduce their offerings -- a loss to scholars. Sometimes libraries switch to on-demand acquisition only. I was interested in the notion of libraries as "buyers' clubs," wherein people pool their money to buy a single copy of something, but the article mentions many problems (and even paradoxes) with this approach.
Subscription libraries are one option, with parallels both in history and in organizations such as video lending services. A per-item or a per-month/year fee model may be used.
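Out of curiosity, here's a toy back-of-the-envelope comparison of the two fee models mentioned above. The numbers are entirely my own and purely illustrative.

```python
# Toy comparison (hypothetical numbers) of per-item vs flat-subscription
# pricing: which model is cheaper for a given level of use?

def cheaper_model(items_per_year, per_item_fee, annual_fee):
    per_item_total = items_per_year * per_item_fee
    return "per-item" if per_item_total < annual_fee else "subscription"

# A light user: 30 items x $2 = $60, under a $100 annual subscription.
print(cheaper_model(items_per_year=30, per_item_fee=2.00, annual_fee=100.0))
```

Obvious, but it shows why heavy users gravitate toward subscriptions while occasional users resist them.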
One problem libraries may face with digital materials is the loss of access to previously acquired materials. When a library cancels a subscription to a print journal, it still owns the issues it has already bought; in the digital realm, all access to back issues may be denied with the cancellation of a subscription.
Yet another issue is the difficulty of obtaining access to copyrighted work in order to digitize it. This process can consume far too much time and money to be worthwhile.
Worse than a muddy point...
This past week I compiled the metadata for all of my group's documents on Greenstone, and everything at first seemed to be going fine. However, at (what I thought would be) the very end of the project, when I tried to build the digital library, Greenstone kept giving me messages such as "The file [filename.pdf] was recognised but could not be processed by any plugin." Then at the end of processing, it gave me the following message:
120 documents were considered for processing:
46 documents were processed and included in the collection.
74 were rejected.
From then on, I no longer had access to the rejected documents via Greenstone. They were still on my computer, but had been relegated to an "archives" folder in the Greenstone file structure, with names like Hash9bb4.dir in place of the more recognizable file or folder names. I have no idea what to do about this! Are two thirds of our metadata-enriched PDFs totally unusable in Greenstone? And why??? I tried looking up the issue online, but only found a single site with a vague suggestion to convert the PDFs to HTML using third-party software.
If anyone reading this has any ideas, please comment or email me...
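In the meantime, here's my own guess at a first troubleshooting step (not an official Greenstone fix): one common reason a PDF plugin rejects files is that they aren't well-formed PDFs at all. This quick check flags files in a folder whose first bytes lack the "%PDF-" magic number, so they could be re-exported before rebuilding the collection.

```python
# Hedged troubleshooting sketch (my own assumption, not from Greenstone's
# docs): flag files with a .pdf extension that don't start with the PDF
# magic number "%PDF-", since such files may be rejected by PDF plugins.
import os

def looks_like_pdf(path):
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"

def suspect_pdfs(folder):
    # Return the .pdf filenames in the folder that fail the magic-number check.
    return [name for name in sorted(os.listdir(folder))
            if name.lower().endswith(".pdf")
            and not looks_like_pdf(os.path.join(folder, name))]
```

Of course, a file can pass this check and still trip up a particular plugin, but it at least separates obviously broken files from subtle ones.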
Sunday, November 8, 2009
Reading notes for Evaluation
The link to Arms' chapter didn't work, so I looked it up elsewhere and found a 1999 version at the following web address: http://www.cs.cornell.edu/wya/DigLib/MS1999/Chapter8.html
I assume it's not too outdated!
It gives a good basic introduction to the principles of a user interface, and the ways that an interface should change along with technology over time. The conceptual model that this chapter proposed was pretty straightforward, as was its review of browser technology. Most of the chapter's points are ones I was already familiar with, e.g. that a web designer must balance effective use of advanced or sophisticated features with the ability to offer simplicity and speed for less well-equipped users. I was also already familiar with mirroring and caching -- and was interested to see that when this article was written (presumably in 1999) video skimming was still mostly an idea for future development. What I found most interesting were the chapter's brief references to the writer's own experiences, such as the fact that his online magazine redesigned its interface yearly.
Kling and Elliott's article brings a focus to usability concerns in designing an interface; they recognize that ease of use improves users' performance. They break ease of use down into four components:
Learnability - which also concerns the speed with which a user can begin using the software
Efficiency - how productively a user can make use of the system
Memorability - whether the user can easily return to using the system after an absence
Low error rate - no catastrophic errors and easy recovery from the minor ones
Clearly an intuitive system organization that works well on a server will lead to the best results. Given that users of digital libraries will have all sorts of different goals and intentions, it's probably best for system developers to survey users frequently to determine the areas in need of improvement.
For organizations, the authors break down concerns as follows:
Accessibility - the ease with which people can locate specific systems and content, both physically and administratively
Compatibility - of file transfers between systems
Integrability into work practices - how smoothly the system fits existing practices
Social-organizational expertise - how well people can obtain training and consulting to learn to use systems and troubleshoot
Unsurprisingly, many of the authors' recommendations for digital libraries involve testing systems, surveying users, exploring multiple design alternatives, etc. They implore us to pay attention to cultural models of user bases, reminding us that a system appropriate for elementary schoolchildren will not be as appropriate for graduate-level science laboratories.
Finally, Tefko Saracevic's article evaluates evaluation: analyzing the methods and contexts of the (relatively rare) evaluation of several different digital libraries. The article goes into details about the evaluative methods that were used, and highlights the variety of approaches possible: usability-centered (as with the article above), ethnographic, anthropological, sociological, and economic. The article highlights many distinct matrices of assessment, and briefly acknowledges that despite many factual criteria there is also the role of human judgment in certain evaluations. Digital libraries are fairly new, so it is understandable that not much evaluation has been done on them, but one of the take-home messages of this article is that despite apparent lack of interest and definite lack of funding, evaluation is important and should become a bigger part of digital library culture.
Labels:
design,
digital libraries,
evaluation,
interface
A note on "muddy points"
I've found all of the recent lectures to be clear and comprehensive; thanks for that! I'll post if I think of any questions over the course of the project I'm working on, but for now, I don't really have any muddy points that would be useful to address in tomorrow's lecture. I almost feel disappointed about that!
Tuesday, November 3, 2009
I didn't mean to fall behind...
My poor, neglected blog. Well, assuming that a late posting is better than none, here are my reading notes for the Preservation of Digital Materials unit.
Preservation in the Age of Large-Scale Digitization -- A White Paper by Oya Y. Rieger
I appreciated the wide scope and long-term perspective of this paper. Given the pace at which technologies change, it's essential to ask questions such as "who will ensure that digital content remains accessible over time?".
The article points out the difference between digital backups (to ensure against destruction of physical texts) and a bona fide digital library (searchable, indexed, copyright-cleared, etc.). It's a reminder of considerations to keep in mind when transitioning a "backup" repository into a digital library.
The paper gives a rundown of some of the key players in digitization, including OCLC, the OCA, Google, Microsoft, and the Million Book Project. I appreciated Table 1, which lays out the essential aspects of various initiatives (their distinguishing features & goals) -- personally I'm most interested in Google Book Search, simply because of its hugely ambitious intentions; I'll continue to follow news about it.
As a side note, I was interested to hear the figure that (at least in Cornell's study) about 10% of library books accounted for about 90% of circulation. While this may make digitization priorities a bit easier, on another level it makes me a bit sad; I'd like to think of a majority of people reading more different things!
I agree with criticisms (mentioned in the article) of the Google Book project over uploading scans with poor image quality, missing text, or other defects. For example, when I recently looked up a Chaucer poem on Google Books, the copy that came up was covered in extensive handwritten notes; it seemed odd to me that this adulterated text was selected as the copy to be scanned. Surely they could have easily found a cleaner copy? (Or at least digitally removed the margin notes?) I know Chaucer's words won't ever be lost, but as for lesser-known texts, I do fear -- as the article mentioned -- that once digital copies are uploaded, some originals will be discarded, even if their contents were not always properly preserved.
The article details some storage and retrieval concerns, as well as some security and environmental considerations. Throughout the article, I was aware of how today's decisions will affect tomorrow's library conditions; it's worth making careful quality assessments while we're at this crucial point of transition into digitized formats.
Finally, all of the registry and copyright concerns that were mentioned here dovetail interestingly with the other class that I'm taking this semester, Legal Issues and Copyright. I've become increasingly aware of the ways in which legal concerns can curtail a library's practices, and I hope to find ways to circumvent perceived restrictions and allow access of library texts as widely as is legally feasible.
Research Challenges in Digital Archiving and Long-term Preservation by Margaret Hedstrom
This article starts with an interesting point, that "many of the digital resources we are creating today will be re-purposed and re-used for reasons that we cannot imagine today" -- while at the same time, evolving technologies make traditional paradigms obsolete. Just as preservation of library materials has always focused on the very long term, digital preservation should be enacted with an eye to long-term feasibility, including the adaptability of metadata, ease of restructuring, and sustainability of infrastructure.
Actualized Preservation Threats -- Practical Lessons from Chronicling America by Justin Littman
Chronicling America is an initiative to digitize several historical American newspapers and provide public access. This article focused on several of the things that can go wrong in digital preservation efforts, such as software errors, operator errors, hardware failure, and problems with media drives and file corruption. At best, such issues slow down a data transfer process, and at worst, data is lost. But this article led me to believe that even the worst cases of data loss are remediable as long as operators are paying close attention to issues of data integrity.
Labels:
Chronicling America,
Google Books,
preservation
Sunday, October 25, 2009
Reading notes for Access in Digital Libraries II
Chapter 1. Definition and Origins of OAI-PMH --
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): a relatively simple protocol for sharing descriptive data, broadly useful (esp. for digital libraries)
-Created to aid the development of services across similar items (e.g. journal articles, video clips, etc.)
-Allows transfer of metadata online
-It's important not to assume institutional context that outsiders would lack: once a collection is shared, there will be no metadata to convey what was obvious only within the institution
-The OAI technical committee worked throughout 2001 to establish the metadata issues most in need of consideration
-While OAI-PMH enables searches across repositories, it is not itself a protocol for searching
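One thing that helped me appreciate the protocol's simplicity: OAI-PMH requests are just HTTP GETs, with the operation named by a "verb" parameter. Here's a minimal sketch of forming a harvesting request (the base URL is a placeholder, not a real repository).

```python
# Minimal illustration of how an OAI-PMH request is formed: a plain HTTP
# GET whose "verb" parameter names the operation (Identify, ListRecords,
# etc.). The base URL below is a placeholder for illustration only.
from urllib.parse import urlencode

def oai_request(base_url, verb, **params):
    # e.g. verb="ListRecords", metadataPrefix="oai_dc" (unqualified
    # Dublin Core, the metadata format every repository must support)
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"

url = oai_request("https://example.org/oai", "ListRecords",
                  metadataPrefix="oai_dc")
print(url)  # https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

The repository answers with an XML document of metadata records, which a harvester can then index -- which is exactly why OAI-PMH enables cross-repository search services without itself being a search protocol.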
Todd Miller -- Federated Searching: Put It in Its Place --
This article posits the idea that "only librarians like to search; everyone else likes to find." (This may be an oversimplification; just for example, surely many users benefit from playing around with search terms, or find interesting new materials within a search for other items...) It points out that library searches limited to cataloged book metadata are insufficient for the twenty-first century, when searchability should extend into the full text of a broader range of materials (especially digital documents). Thus, the article draws a distinction between catalog searches of books (relying on metadata) and Google searches, which can more thoroughly index a text's entire content. The article argues for simplicity and access, claiming that efforts to make information more secure usually make it less accessible. This would seem to be an obvious point.
The Truth About Federated Searching--
The article debunks the five most common myths about federated searching. In doing so, it highlights the importance for libraries of using their own authentication when possible in order to keep authentication problems from preventing effective searches for remote users. The article also helped me see that federated searching is not just software, but a service that constantly updates itself and helps a library avoid the need to update translators for its search terms (which can result in disruption of service).
The Z39.50 Information Retrieval Standard--
This article gives a helpful overview and history of Z39.50. I was most interested in the section about the role of content semantics, which allow for more abstract associations in searching. There are endless classes of information mapping, and I'm wondering how consensus is reached regarding the structure of content semantics. Also, this is a fairly old article; I'm wondering what may have changed in the last twelve years?
Search Engine Technology and Digital Libraries-
One of the interesting things this article pointed out was that libraries still see themselves as repositories of collections, rather than "gateways" to information that already exists online. The article highlights the importance of libraries' awareness of existing digital resources, and argues for their role evolving to include serving as portals to the academic web. It claims that the younger generations express a strong preference for "Google-like" access to information over traditional catalogs, and examines libraries' resistance to commercial search engines while suggesting ways in which such search technologies could be integrated into sustainable system architecture for library collections and digital materials. Are more libraries indeed creating their own local search engine infrastructures in order to build further indexes? And if so, is interoperability a great concern?
Labels:
digital libraries,
federated searching,
OAI-PMH