Digital Library Explorations: September 2009

Friday, September 25, 2009

Just wondering...

I realize that Prof. He said at the end of lecture we didn't have to write "muddiest points" for this week; I'm just making a quick blog post while the course is on my mind. I'm still not clear on the logistics of the exam next month -- how it will be administered, how much time we'll have, etc. -- but I suppose there will be enough time to figure it out before it occurs. And I assume there's only one exam this semester?

The lecture clarified a lot of points for me about digitization, the pros and cons of various standards and formats, and hosting content online. I'll have to check out the PURL examples at the OCLC site.

Friday, September 18, 2009

Week 3 reading notes

Michael Lesk, Understanding Digital Libraries - sections 2.1, 2.2, 2.7, and Chapter 3

2.1 Computer Typesetting

The history of text standards for both software and printing... WYSIWYG formats developed in the 1970s led to more user-friendly transparency between screen and page... PostScript & SGML were both important developments for computer typesetting

2.2 Text Formats

Discussion of tags and searches within the three most notable standards (MARC, SGML, and HTML).... Difficulty of entering SGML labels in a WYSIWYG document... HTML, of course, supports hypertext links. Certain questions and issues arise when information professionals try to describe content, deduce keywords, rank search terms, etc.

2.7 Document Conversion

There are two primary strategies for text: scanning w/ Optical Character Recognition, and keying in materials (original or transcribed, with original errors usually included in transcriptions).

I was struck by the number of errors in certain scans of the newspaper text, but I see that progress is being made in OCR systems and software. This is clearly very important for digital archiving of older materials.
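To put a number on "how many errors," one common approach is to compare OCR output against a hand-keyed transcription and count the character-level edits needed to turn one into the other. The sketch below is my own illustration, not something from Lesk's text; the function names and the sample strings are made up for the example.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, and substitutions that turn reference into hypothesis."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # add a hypothesis character
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical sample: a keyed transcription vs. a noisy OCR reading.
keyed_text = "The quick brown fox"
ocr_text = "Tbe qnick brown f0x"

errors = edit_distance(keyed_text, ocr_text)
rate = character_error_rate(keyed_text, ocr_text)
print(f"{errors} character errors, rate = {rate:.3f}")
```

Here three substituted characters out of nineteen already give an error rate near 16%, which hints at why even small per-character error rates hurt full-text search on scanned newspapers.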

Chapter 3: Images of Pages

There are sizable differences in the quality of scanned images (depending on equipment) and in the costs of such efforts (keying vs. machine scanning). Review of specific equipment, etc...

Various image formats (compression algorithms, etc.)

Display requirements should allow for easy legibility on a variety of computer equipment

PDFs help retain the author's originally intended format & appearance.
PostScript --> Adobe Acrobat, PDFs

Indexing: thumbnails are helpful for users, but not machine-readable

Print indexes can be based on other standards

Again, OCR helps for searchability

For old newspapers' digitization, sometimes clipping story-by-story as image files works best

Some files share texts & images, tables, other formats, etc. -- a page can be broken into separate regions

There are three library storage options: scanning, on-campus storage, and an off-site depository, each w/ pros & cons (off-site for rarer stuff)

Arms, Chapter 9: http://www.cs.cornell.edu/wya/DigLib/MS1999/Chapter9.html

Again, proof that OCR is vital to searchability of scanned texts.

Week 2 - the muddiest point from lecture

I wouldn't say any points from lecture were particularly "muddy," but certain things did surprise me. For example, I was disappointed to hear that there's not much funding these days for digital library development from the government; I would have hoped that such funding would be ongoing, given how very much printed material there is to put online.

In lecture we also heard a bit more about the aspect of "community" that I had questioned in the paper I wrote last week. It seems that community is considered a central starting-point for digital library development: you start with the society you want to serve, and develop everything according to that community's social, economic, and legal issues. The lecture helped me understand a few of the questions I raised about the necessity of developing a DL for a specific community, but I still wonder whether it's always essential.

Monday, September 14, 2009

A couple of odds and ends

The link within the syllabus to one of the Week 2 readings didn't work, so I had to get there through another route: http://www.cs.cornell.edu/wya/DigLib/MS1999/Chapter2.html. The article provided a brief introduction to the birth of the internet, which I had already read about, but it also clarified a few of the finer points regarding the particulars of the internet, such as the difference between IP and TCP and how these protocols work together to send packets. I was interested in the sidebar about the relatively early formation of the Los Alamos E-Print Archives, since it sounds like such a seamless way for researchers themselves to contribute to a digital library without having to make any adjustments for format, protocol, or tools. Its use of an open archive format, wherein the authors retain copyright of their materials, makes good sense and seems an interesting precursor to the World Wide Web as we now know it.

In other LIS 2670 news, I didn't see an assignment uploading tool within CourseWeb for this class, so I emailed my first brief essay to the professor, and I also hosted a copy of the file via Google Docs, just to be sure. It can be accessed at
http://docs.google.com/View?id=dfddpdf6_2fhffsfcx.

Friday, September 11, 2009

Week 2 Reading Notes

Just a few notes on the articles I read for this week...

A Framework for Building Open Digital Libraries

The technology behind Digital Libraries is evolving at a faster pace than the already well-established practices of library science, so standards such as the Dublin Core Metadata Element Set and the Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH) aim to make library and computer/data systems more interoperable.

Digital libraries are often custom-built, and therefore not designed to be interoperable with/within other systems. Program logic (which can be complex) varies according to community needs. As of the article date (Dec. 2001), few software tool kits existed for the express purpose of building digital libraries; in the absence of such unified starting-points, the OAI Protocol for Metadata Harvesting treats multiple DLs as searchable Open Archives. The Open Digital Library design allows for interoperability at the functional level across physically separate collections. Various components of an ODL network allow users to browse by categories, combine metadata from multiple sources, search and filter search results, sort by date, and so on. Ultimately, the authors hope to influence the development of DLs starting from the design stage.
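To make the harvesting idea a little more concrete, here is a minimal sketch of what an OAI-PMH exchange can look like: a harvester builds a ListRecords request and pulls Dublin Core fields out of the XML that comes back. This is my own illustration rather than code from the article. The base URL, the sample record, and the helper names are hypothetical, and the namespaces follow version 2.0 of the protocol, which is later than the article itself. Nothing here touches the network; the response is a hard-coded sample.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical response from an imaginary repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2001-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Record</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, titles, and creators from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [el.text for el in record.iterfind(".//dc:title", NS)],
            "creators": [el.text for el in record.iterfind(".//dc:creator", NS)],
        })
    return records


print(list_records_url("http://example.org/oai"))
for rec in parse_records(SAMPLE_RESPONSE):
    print(rec["identifier"], rec["titles"], rec["creators"])
```

Because every repository answers the same small set of requests in the same XML shape, a service that merges or searches across several collections only needs to loop over a list of base URLs, which seems to be the kind of interoperability the ODL components build on.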

Interoperability for Digital Objects and Repositories

This article describes a joint effort by Cornell and the Corporation for National Research Initiatives to develop and employ a design for interoperable digital repositories. The authors' approach is a hybrid of several traditional approaches to achieving interoperability, such as standardization (e.g., schema definition, data models, protocols), distributed object request architectures (e.g., CORBA), remote procedure calls, mediation (e.g., gateways, wrappers), and mobile computing (e.g., Java applets).

The article mentions the authors' focus on the preservation of digital items, standardization of format, and attention to access issues. The article lists certain components of accessing digital objects: disseminator types (usually outside operations), servlets (executable programs), and the notion of extensibility (how easily the new digital object can be used with additional functionality).

The authors performed experiments to test extensibility, interoperability, functionality, and other access/compatibility issues. The tests were mostly successful, and will guide the authors' future research on the implementation of programs designed to increase interoperability.

An Architecture for Information in Digital Libraries

This article gives an overview of the structure of stored information in digital libraries. I aim to fill in details as soon as I get the chance!

Tuesday, September 8, 2009

An Initial Post...

And so commences my blog for LIS 2670, Digital Libraries, at Pitt. Welcome!

We were asked to submit our "muddiest points" from lectures and/or readings. Most of the first week's readings were pretty straightforward, although in the Dewey Meets Turing article, I wasn't entirely clear about how or whether the grant money for the natural sciences translated into adequate funding for the digital library initiative. I assume that the natural sciences grants were stretched to fund the DLI, but that makes me wonder about funding for digital libraries in disciplines that don't traditionally receive as much grant money.

I'll post again soon with reactions to this week's readings.
