Digital Library Explorations: October 2009

Sunday, October 25, 2009

Reading notes for Access in Digital Libraries II

Chapter 1. Definition and Origins of OAI-PMH --

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): a relatively simple protocol for sharing descriptive data, broadly useful (esp. for digital libraries)
-Created to aid the development of services across similar items (e.g. journal articles, video clips, etc.)
-Allows transfer of metadata online

-Metadata shouldn't assume context that is obvious within an institution but not to outsiders; once a collection is shared, nothing in the metadata will convey what was obvious only inside the institution

-The OAI technical committee worked throughout 2001 to establish the metadata issues most in need of consideration

-While OAI-PMH enables searches across repositories, it is not itself a protocol for searching
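
To make the "transfer of metadata online" point concrete: a harvester talks to a repository through plain HTTP requests that name one of six protocol verbs. Here is a minimal sketch that only builds such a request URL and sends nothing. The base URL and the function name build_oai_request are my own placeholders, not something from the reading; the verbs and the metadataPrefix/from parameters come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

def build_oai_request(base_url, verb, **params):
    """Return the GET URL for one OAI-PMH request (nothing is sent)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for key, value in params.items():
        # A trailing underscore lets callers pass the reserved word "from" as from_.
        query[key.rstrip("_")] = value
    return base_url + "?" + urlencode(query)

url = build_oai_request(
    "http://repository.example.org/oai",  # hypothetical repository address
    "ListRecords",
    metadataPrefix="oai_dc",  # unqualified Dublin Core, the format every repository must support
    from_="2009-10-01",       # only harvest records changed since this date
)
print(url)
```

Note that the request asks for records by date, not by keyword, which fits the last point above: the protocol moves metadata around, and any searching happens later in whatever service the harvester builds.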

Todd Miller -- Federated Searching: Put It in Its Place --

This article posits that "only librarians like to search; everyone else likes to find." (This may be an oversimplification; just for example, surely many users benefit from playing around with search terms, or find interesting new materials within a search for other items...) It points out that library searches limited to cataloged metadata about books are insufficient for the twenty-first century, when searchability should extend to the full text of a broader range of materials (especially digital documents). Thus, the article draws a distinction between catalog searches of books (relying on metadata) and Google searches, which can more thoroughly index a text's entire content. The article argues for simplicity and access, claiming that efforts to make information more secure usually make it less accessible. This would seem to be an obvious point.

The Truth About Federated Searching--

The article debunks the five most common myths about federated searching. In doing so, it highlights the importance for libraries of using their own authentication when possible in order to keep authentication problems from preventing effective searches for remote users. The article also helped me see that federated searching is not just software, but a service that constantly updates itself and helps a library avoid the need to update translators for its search terms (which can result in disruption of service).
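
To picture what a "translator" does, here is a toy sketch of a federated search: one user query is rewritten for each source, sent out, and the answers are merged. Everything in it is an assumption for illustration (the two sources, their query syntaxes, and the canned results are made up), and a real service would make network calls and handle authentication, which this does not.

```python
# Hypothetical translators: each rewrites a plain keyword query into the
# syntax that one (made-up) remote source expects.
def to_source_a(query):
    return f'title="{query}"'

def to_source_b(query):
    return "kw:" + query.replace(" ", "+")

# Stand-ins for remote databases; they return canned records instead of
# contacting a server.
def search_source_a(translated):
    return [{"title": "Metadata Harvesting Basics", "source": "A"}]

def search_source_b(translated):
    return [
        {"title": "Metadata Harvesting Basics", "source": "B"},
        {"title": "Federated Search in Practice", "source": "B"},
    ]

SOURCES = [(to_source_a, search_source_a), (to_source_b, search_source_b)]

def federated_search(query):
    """Send one query to every source and merge results, dropping duplicate titles."""
    merged, seen = [], set()
    for translate, search in SOURCES:
        for record in search(translate(query)):
            key = record["title"].lower()
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged

results = federated_search("metadata harvesting")
for record in results:
    print(record["source"], record["title"])
```

If source B changed its query syntax, only to_source_b would need updating, which is the maintenance burden the article says a hosted service takes off the library's hands.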

The Z39.50 Information Retrieval Standard--

This article gives a helpful overview and history of Z39.50. I was most interested in the section about the role of content semantics, which allow for more abstract associations in searching. There are endless classes of information mapping, and I wonder how consensus is reached regarding the structure of content semantics. Also, this is a fairly old article; I wonder what may have changed in the last twelve years.

Search Engine Technology and Digital Libraries --

One of the interesting things this article pointed out was that libraries still see themselves as repositories of collections, rather than "gateways" to information that already exists online. The article highlights the importance of libraries' awareness of existing digital resources, and argues for their role evolving to include serving as portals to the academic web. It claims that the younger generations express a strong preference for "Google-like" access to information over traditional catalogs, and examines libraries' resistance to commercial search engines while suggesting ways in which such search technologies could be integrated into sustainable system architecture for library collections and digital materials. Are more libraries indeed creating their own local search engine infrastructures in order to build further indexes? And if so, is interoperability a great concern?

Monday, October 19, 2009

Week 7 reading notes, etc.

I'm back from Pittsburgh, and although it was a rushed trip, I'm glad to have had the chance to meet with some of the people in my classes.

And yes, I'm a bit late in posting these reading notes, but better late than never, I suppose.

David Hawking, Web Search Engines Part I--

I had never heard the term "politeness" to describe a way to prevent too many server requests from forming a bottleneck, but it makes sense to introduce delays when necessary in order for a server not to be overwhelmed (in the same way highway on-ramps at rush hour only allow one car per green light). Parallelism is clearly important for making maximal use of the server's capacities; I was surprised to hear how complex (and prone to crash) these systems are.
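
Here is a minimal sketch of the politeness idea, assuming a fixed delay per host (the class name, the two-second delay, and the example URLs are mine, not Hawking's). It keeps no real clock and fetches nothing; it only works out the earliest moment each request may go out, so requests to the same host are spaced apart while other hosts are not held up.

```python
from urllib.parse import urlparse

class PolitenessScheduler:
    """Tracks when each host may next be contacted."""

    def __init__(self, delay_seconds=2.0):
        self.delay = delay_seconds
        self.next_allowed = {}  # host -> earliest time of the next request

    def schedule(self, url, now):
        """Return the time at which url may be fetched, given the clock value now."""
        host = urlparse(url).netloc
        fetch_at = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = fetch_at + self.delay
        return fetch_at

scheduler = PolitenessScheduler(delay_seconds=2.0)
times = [
    scheduler.schedule("http://example.org/a", now=0.0),
    scheduler.schedule("http://example.org/b", now=0.0),  # same host: must wait
    scheduler.schedule("http://example.com/x", now=0.0),  # different host: no wait
    scheduler.schedule("http://example.org/c", now=1.0),  # still queued behind /b
]
print(times)  # [0.0, 2.0, 0.0, 4.0]
```

This is the on-ramp meter from the analogy: each host gets its own light, and parallelism comes from working on other hosts while one is waiting.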

David Hawking, Web Search Engines Part II--

I was interested to read about some of the ways that algorithms aim to improve result quality within search engines. This reminded me of some of the things Professor He said in the lecture when I was on campus about how the highest number of hits doesn't always equal the "right" search result: for example, "Java" might mean a programming language, an island, or coffee, and a good search engine will show all three of those on the front page of results. (Similarly, the article mentioned the importance of distinguishing a search for the political satire magazine "The Onion" from countless recipes and gardening sites.) Programmers clearly have to think of many subtleties when designing search engines; I was glad to learn about some aspects that I hadn't considered (skipping, caching, assigning document numbers intentionally, etc.).
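
One simple way to get every meaning of an ambiguous query onto the front page is to interleave the ranked lists for each meaning. This is only a sketch of that one idea, assuming the results have already been grouped by sense; the grouping, the result titles, and the function name are all invented here, and real engines surely use something more sophisticated.

```python
from itertools import zip_longest

def diversify(results_by_sense, page_size=6):
    """Interleave ranked lists so that each sense appears near the top."""
    page = []
    # Take the best hit from every sense, then the second best, and so on.
    for tier in zip_longest(*results_by_sense.values()):
        page.extend(hit for hit in tier if hit is not None)
    return page[:page_size]

# Hypothetical results for the query "Java", already grouped by meaning.
ranked = {
    "programming": ["Java downloads", "Java tutorial", "Java API docs"],
    "place": ["Java travel guide"],
    "coffee": ["Java coffee history", "Brewing guide"],
}
front_page = diversify(ranked, page_size=4)
print(front_page)
```

Without the interleaving, a ranking by hit count alone would fill the page with the programming results and push the other two meanings out of sight.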

Lesk Chapter 4 --

Searching text seems utterly simple compared with searching the content of other digital media files. I know the technology is still developing, but it's remarkable to me that automatic recognition programs (for indexing images) work at all, and it's unsurprising that they currently work only imperfectly. I'm also not surprised that media formats whose content unfolds over time, such as video and audio, are still harder to search for content than images are. I know that textual tagging helps greatly, and can be used in numerous ways, as when Pandora.com uses musically relevant textual tags such as "downtempo beats" and "female vocalist" to home in on a user's musical preferences.

Sunday, October 18, 2009

Lest I forget to mention...

The recent lecture yielded no "muddy points" for me.

Friday, October 9, 2009

"Muddy point" for Week 5

A fairly basic question this week. I've never worked with Dublin Core, so I'm not sure how far its purview extends. For example, does the metadata by default feed into search engines, if the digital library exists online? Could the cataloger specify that as an option, or is Dublin Core's searchability always limited only to an institution's own network? It seems like it could be especially helpful for a commercial search engine to pick up on Dublin Core's metadata, but I could also understand why search engines might want to avoid metadata that has been pre-programmed by users (for example, to avoid spammers using false metadata as hooks to give their own sites a higher priority).
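
For what it's worth, one documented way to expose Dublin Core to the open web is to embed it in a page's HTML head as meta tags with a "DC." prefix, following the DCMI convention. The sketch below just generates those tags from a made-up record; whether any particular search engine actually reads them is a separate question that it doesn't answer.

```python
from html import escape

def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    # The link element declares which namespace the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        lines.append(
            f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)

record = {  # hypothetical record, for illustration only
    "title": "Sunset over South Mountain",
    "creator": "Example Photographer",
    "date": "2009-10-04",
}
html_head = dublin_core_meta(record)
print(html_head)
```

Because the page author controls these tags entirely, the spam worry above applies directly: nothing stops someone from putting false values in them.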

Sunday, October 4, 2009

Assignment 2, Part 1

For the first part of Assignment 2 I chose five digital images I had taken from the upstairs windows of my house in New Mexico, and saved the large master copies in their own folder. I then used Microsoft Office Picture Manager and Pixlr.com (a great site; I'm glad to know about it now!) to reduce the photographs to smaller versions and create extra-small thumbnails. Finally, I uploaded the reduced-dpi photos, the thumbnails, and (just for good measure) the master copies all to my Flickr page, creating a set for this assignment. Here's the link to the whole set:

http://www.flickr.com/photos/43242999@N03/sets/72157622394273731/

And here are copies of the images, just to enhance the look of my web page!

First, a sunset viewed from the balcony:

Next, a sunset detail:

Thirdly, a small rainstorm sweeping over South Mountain at sunset:

Fourthly, the full moon setting over the Sandia Mountains one morning:

And finally, a rainbow ending at South Mountain:

Enjoy!

Week 4 reading notes

Witten 2.2 --

It appears that information overload was a problem even as far back as 1674; the quotation from Hyde illustrates that it has always been problematic to try to condense vast amounts of information into mere subject headings and organize the headings helpfully. I'm glad to learn the term collocation, a more specific term for the methods of organizing information in a library, and I'm interested in the author's implication that confirmation is now an essential middle stage in digital information retrieval, along with increased priorities on acquisition and navigation.

The article also points out the more fluid boundaries of digital objects, which can be so easily copied/altered. This makes me think about how important it is to annotate versions and provide adequate metadata. Other interesting points were raised, too... for instance, the chart showing the dozens of spellings of Muammar Qaddafi's name illustrates how difficult it can be to acquire comprehensive metadata when so many variables are in place, as well as the need for programming the variants as cross-references.
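
Here is a small sketch of what "programming the variants as cross-references" might look like: every known spelling points to one preferred heading. The handful of variants and the choice of preferred form are mine, picked for illustration; the chart in the reading lists far more, and a real authority file would be maintained by catalogers, not hard-coded.

```python
PREFERRED = "Qaddafi, Muammar"  # heading chosen for this illustration

# A few illustrative variant spellings; the chart in the reading has dozens.
VARIANTS = [
    "Muammar Qaddafi",
    "Moammar Gadhafi",
    "Muammar al-Gaddafi",
    "Mu'ammar Qadhafi",
]

def normalize(name):
    """Lowercase and strip punctuation so trivial differences do not matter."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()

# The cross-reference table: normalized variant -> preferred heading.
CROSS_REFERENCES = {normalize(variant): PREFERRED for variant in VARIANTS}

def lookup(name):
    """Return the preferred heading for a known variant, else None."""
    return CROSS_REFERENCES.get(normalize(name))

print(lookup("MOAMMAR GADHAFI"))
print(lookup("Muammar Al-Gaddafi"))
print(lookup("Someone Else"))
```

The table only catches spellings somebody has already entered, which is exactly the difficulty the chart illustrates: the list of variants is never quite complete.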

Witten 5.4-5.7 --

Part of this chapter reiterates what I've learned in previous classes about how different bibliographic metadata format standards fill different needs: some (e.g. MARC) provide rich details (for example, for the unique documents held by archivists) and some (e.g. Dublin Core) aim more for breadth and interoperability. I was interested to learn more about BibTeX and Refer, as well, since I had never even heard of these standards before. So many standards! I'm glad the article addressed the possibilities for their interoperability. This article also gave a good rundown of multimedia file formats, so I'll keep it in mind as a useful resource.

I was interested to hear that key phrase metadata can be obtained automatically from digital documents with some degree of success. This surely solves some of the problems Hyde was worrying about more than 300 years ago! And the article illustrated why it's helpful to build a key phrase hierarchy for enhanced data retrieval.
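
To get a feel for how key phrases could be pulled out automatically, here is a deliberately crude sketch: count two-word phrases that contain no stopwords and keep the most frequent. This is a simple frequency heuristic of my own choosing, not the method described in the chapter, and the stopword list and sample text are made up.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on",
             "the", "to", "with"}

def key_phrases(text, top_n=3):
    """Return the most frequent two-word phrases that contain no stopwords."""
    # Punctuation is discarded, so this sketch ignores sentence boundaries.
    words = re.findall(r"[a-z]+", text.lower())
    pairs = [
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]
    return [phrase for phrase, _ in Counter(pairs).most_common(top_n)]

sample = (
    "Digital libraries rely on metadata. Good metadata makes digital libraries "
    "searchable, and digital libraries with poor metadata are hard to use."
)
phrases = key_phrases(sample, top_n=1)
print(phrases)  # ['digital libraries']
```

Even this toy version shows why the results are only partly successful: frequency finds what a document talks about most, not necessarily what a cataloger would choose as its subject.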

Gilliland - Introduction to Metadata --

This article provided a pretty good summary of metadata, finding aids, and the structure of information systems. I'm especially interested in what the author said about user-created metadata systems that are flourishing on the Web. I understand that lack of quality control is a concern with such grassroots-level tagging, but I also see how this is a helpful way for huge amounts of data to be collectively organized by multiple people.

Much of what the author said (e.g. about the value of metadata) was already clear to me after 1.5 years of library school, but I do appreciate the author's charts illustrating the various types/characteristics of metadata, which help me to see categorical distinctions among descriptors.

Weibel - Border Crossings: Reflections on a Decade of Metadata Consensus Building --

I was glad to read (albeit briefly; there weren't many details) about an attempt to involve representatives from so many different communities in bringing together disparate metadata standards. Interacting and networking among various professions and disciplines is crucial to effective Information Science practices, and this was a brief glimpse into some of the concerns/confusions/challenges that arise when various institutions attempt to collaborate, each employing its own standards and assumptions. As I said, I might have liked more details rather than just summary statements, but I'm glad at least to see that collaboration is happening.

Friday, October 2, 2009

No muddiness so far...

My small group has tentatively decided to use DSpace for our digital library project, so I've been poking around DSpace.org and taking notes on this week's guest lecture, preparing to install and experiment with it. It seems like a great way to archive materials once we learn how to use it, so I'm willing to put in the initial work that it will apparently require to make it happen.

So far I've been gathering materials for our project, but it hasn't yet come time to post them... I'll make another blog post if or when I run into any DSpace difficulties.
