I'm back from Pittsburgh, and although it was a rushed trip, I'm glad to have had the chance to meet with some of the people in my classes.
And yes, I'm a bit late in posting these reading notes, but better late than never, I suppose.
David Hawking, Web Search Engines Part I--
I had never heard the term "politeness" used to describe a crawler limiting how often it sends requests to the same web server, but it makes sense to introduce delays when necessary so that a server is not overwhelmed (in the same way that highway on-ramps at rush hour let only one car through per green light). Parallelism is clearly important for making maximal use of the available server capacity; I was surprised to read how complex (and prone to crashing) these systems are.
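To make the politeness idea concrete for myself, here is a minimal Python sketch of a per-host delay. It assumes a fixed minimum gap (2 seconds here) between requests to the same host. That number, the class name, and its methods are all hypothetical illustrations, not anything specified in Hawking's article. Real crawlers are far more elaborate.

```python
import time
from urllib.parse import urlparse


class PolitenessScheduler:
    """Enforce a minimum gap between requests to the same host.

    Hypothetical sketch: the fixed per-host delay is an assumption,
    not a value taken from Hawking's article.
    """

    def __init__(self, min_delay_seconds=2.0, clock=time.monotonic):
        self.min_delay = min_delay_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_request = {}      # host -> time of the most recent request

    def wait_time(self, url):
        """Return how many seconds to wait before fetching this URL."""
        host = urlparse(url).netloc
        last = self.last_request.get(host)
        if last is None:
            return 0.0              # never contacted this host: no wait
        elapsed = self.clock() - last
        return max(0.0, self.min_delay - elapsed)

    def record_request(self, url):
        """Note that a request to this URL's host is being sent now."""
        self.last_request[urlparse(url).netloc] = self.clock()

    def fetch_politely(self, url, fetch):
        """Sleep if needed, record the request, then call fetch(url)."""
        delay = self.wait_time(url)
        if delay > 0:
            time.sleep(delay)
        self.record_request(url)
        return fetch(url)
```

Like the on-ramp light, the delay applies per "on-ramp" (host), so requests to different sites are not held back by each other.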
David Hawking, Web Search Engines Part II--
I was interested to read about some of the ways that algorithms aim to improve result quality within search engines. This reminded me of something Professor He said in lecture when I was on campus: the result with the most hits isn't always the "right" one. For example, "Java" might refer to a programming language, an island, or coffee, and a good search engine will show all three on the first page of results. (Similarly, the article mentioned the importance of distinguishing a search for the political satire magazine "The Onion" from countless recipes and gardening sites.) Programmers clearly have to think through many subtleties when designing search engines; I was glad to learn about some aspects I hadn't considered (skipping, caching, assigning document numbers deliberately, etc.).
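Skipping was the least intuitive of those to me, so here is a minimal Python sketch of the general idea: when intersecting two sorted lists of document numbers (one per query word), you can jump ahead several entries at a time instead of stepping one by one. The fixed stride of about the square root of the list length is a common textbook heuristic that I'm assuming here; the function names are hypothetical, and this is not presented as the exact method in Hawking's article.

```python
import math


def _advance(postings, pos, target, stride):
    """Move pos past entries smaller than target, jumping by stride when safe.

    A jump is safe when the entry stride positions ahead is still <= target,
    because everything skipped over is then smaller than target.
    """
    jumped = False
    while pos + stride < len(postings) and postings[pos + stride] <= target:
        pos += stride
        jumped = True
    # If no jump was possible, postings[pos] < target, so step forward by one.
    return pos if jumped else pos + 1


def intersect_with_skips(a, b):
    """Intersect two strictly increasing lists of document numbers."""
    stride_a = max(1, int(math.sqrt(len(a))))
    stride_b = max(1, int(math.sqrt(len(b))))
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])   # document contains both query words
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = _advance(a, i, b[j], stride_a)
        else:
            j = _advance(b, j, a[i], stride_b)
    return result
```

For example, `intersect_with_skips([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], [2, 3, 8, 9, 19, 20])` returns `[3, 9, 19]`. The payoff grows when one list is much longer than the other, since long runs of irrelevant document numbers are jumped over.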
Lesk, Chapter 4--
Searching text seems utterly simple compared with searching other kinds of digital media. I know the technology is still developing, but it's remarkable to me that automatic recognition programs (for indexing images) work at all, and it's unsurprising that they currently work only imperfectly. I'm also not surprised that media whose content unfolds over time, such as video and audio, are even harder to search by content than images are. I know that textual tagging helps greatly and can be used in numerous ways, as when Pandora.com uses musically relevant textual tags such as "downtempo beats" and "female vocalist" to home in on a user's musical preferences.
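Part of why tags help so much is that they turn a hard media problem back into an easy text problem. Here is a minimal Python sketch of that idea: an index from each tag to the items carrying it, with results ranked by how many query tags match. The track names and the ranking rule are hypothetical, and this is not a description of how Pandora actually works.

```python
from collections import defaultdict


def build_tag_index(items):
    """Map each (lowercased) tag to the set of item IDs carrying that tag."""
    index = defaultdict(set)
    for item_id, tags in items.items():
        for tag in tags:
            index[tag.lower()].add(item_id)
    return index


def search_by_tags(index, query_tags):
    """Rank items by how many query tags they carry; ties broken by ID."""
    scores = defaultdict(int)
    for tag in query_tags:
        # .get avoids inserting empty entries for unknown tags
        for item_id in index.get(tag.lower(), ()):
            scores[item_id] += 1
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical example tracks
songs = {
    "track_a": ["downtempo beats", "female vocalist"],
    "track_b": ["female vocalist", "acoustic guitar"],
    "track_c": ["downtempo beats"],
}
index = build_tag_index(songs)
```

Here `search_by_tags(index, ["downtempo beats", "female vocalist"])` returns `["track_a", "track_b", "track_c"]`: the track matching both tags comes first, and no audio analysis is needed at query time.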
Monday, October 19, 2009