Posts

Showing posts from February, 2010

2010-02-17: Using Web Page Titles to Rediscover Lost Web Pages

The object of my project was to determine whether a web page's title could be used to find the resource in the Yahoo search engine's cache. Lost pages for this project are pages that return a 404. A 404 response code is an error message indicating that the client was able to communicate with the server but the server could not find what was requested. There are many reasons why a page or an entire web site may disappear. These pages may survive only in the caches of search engines or in web archives, or they may simply have moved from one URI to another. In the context of this experiment, titles are denoted by the TITLE element within a web page. There can only be one title in a web page. The title may not contain anchors, highlighting, or paragraph marks. What would be most desirable for this experiment would be to take all URIs as our collection set. Regrettably, using the entire web as our test set is unrealistic. Capturing a representative sample set of web-sites
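As a rough illustration of the two definitions in this excerpt (a "lost" page is one that returns a 404, and the title is the single TITLE element), here is a minimal Python sketch. It is not the code used in the project: the helper names `extract_title`, `is_lost`, and `title_query` are hypothetical, and wrapping the title in quotes as a phrase query is an assumption, since the excerpt does not say how titles were submitted to the search engine.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collects the text of the first TITLE element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> arrives as "title".
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later, invalid TITLE elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace; return None when no title text was found.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html: str) -> Optional[str]:
    """Return the page title, or None if the page has no TITLE text."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def is_lost(status_code: int) -> bool:
    """A page counts as lost for this experiment when it returns a 404."""
    return status_code == 404


def title_query(title: str) -> str:
    """Hypothetical helper: turn a title into a quoted phrase query."""
    return '"' + " ".join(title.replace('"', " ").split()) + '"'
```

The parser keeps only the first TITLE element, which matches the rule that a page has a single title.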

2010-02-11: Memento and OAC at the CNI Fall 2009 Membership Meeting

Herbert, Rob, and I were at the Coalition for Networked Information Fall 2009 Membership Meeting in Washington DC, December 14-15, 2009. The CNI meetings are always good and this one was no exception. We gave a presentation about Memento (direct link on Vimeo): Memento: Time Travel for the Web from CNI Video Editor on Vimeo. Note that this presentation was based on the initial version of Memento first presented in November 2009, not the slightly updated version from February 2010. While we were there, we were also interviewed by Gerry Bayne of EDUCAUSE. Here's an embedded version of the interview: Also at CNI Fall 2009, Rob gave a presentation about the Open Annotation Collaboration (OAC), of which I am on the technical committee. Rob's presentation is also available: Interoperable Annotation: Perspectives from the Open Annotation Collaboration from CNI Video Editor on Vimeo. We also did a short interview about OAC with EDUCAUSE: Rob

2010-02-08: Memento Meeting, San Francisco, Feb 2-3 2010

The entire Memento team went to San Francisco, CA February 2-3, 2010 to meet with representatives from the Internet Archive, California Digital Library, Microsoft Research, Library of Congress, LOCKSS, and WebBase. The full attendee list and agenda are available at the Memento site, including six detailed presentations. Based on the excellent feedback from the representatives, we ended up with two significant changes in our approach. The first change is simply moving the URI of the original resource (URI-R) from the Alternates: response header to a separate Link: header. The information returned from the TimeGate (URI-G) and Memento (URI-M) is the same; it has just moved from one header to another. The second change represents a larger change from the previous model. Instead of URI-R redirecting (302 response code) to URI-G when it sees an Accept-Datetime header, URI-R always returns one or more Link: response headers pointing to one or more TimeGates (whether or not the
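To make the second change concrete, here is a minimal Python sketch of how a client might pull TimeGate URIs out of the Link: response headers returned by URI-R. It rests on assumptions not stated in this excerpt: the relation name `timegate` is borrowed from the later Memento specification (RFC 7089) and may differ from what the February 2010 draft used, the example.org and example.net URIs are made up, and the parser handles only simple `<uri>; param=value` links rather than the full Link header grammar.

```python
import re
from typing import Dict, List

# One link-value: <uri> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)'
)
# One parameter: ;name="quoted value" or ;name=token
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value: str) -> List[Dict[str, str]]:
    """Split a Link header value into dicts with 'uri' plus its parameters."""
    links = []
    for match in _LINK_RE.finditer(value):
        link = {"uri": match.group(1)}
        for param in _PARAM_RE.finditer(match.group(2)):
            quoted, token = param.group(2), param.group(3)
            link[param.group(1).lower()] = quoted if quoted is not None else token
        links.append(link)
    return links


def timegates(value: str, rel: str = "timegate") -> List[str]:
    """Return every link target whose rel list contains `rel`.

    The default relation name is an assumption (see the note above).
    """
    return [
        link["uri"]
        for link in parse_link_header(value)
        if rel in link.get("rel", "").split()
    ]


# Hypothetical header, as URI-R might send it under the revised model.
example = (
    '<http://example.org/timegate/http://example.com/>; rel="timegate", '
    '<http://example.net/tg/http://example.com/>;rel=timegate'
)
gates = timegates(example)
```

Because URI-R can advertise more than one TimeGate, `timegates` returns a list and leaves the choice of which one to follow to the client.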

2010-02-06: Superbowl XLIV

Regardless of which team you are rooting for, this is going to be a good football game. Both teams have explosive offenses captained by quarterbacks who are destined to be inducted into the Hall of Fame. Peyton Manning is one cool character, and if he can figure out the Saints' defense the Colts are going to pull away and not look back. The Colts have been consistently good all season and they have a good chance of continuing that trend on Sunday. If the Colts have a weakness, it is their running game. Both offensively and defensively, the Colts' run game has performed below the league average. The Saints, with Drew Brees, have the league's best offense without question. They have more yards per attempt and fewer interceptions than the Colts. They can pass and run the ball very well, and if they want to win they had better use that to their advantage. The Saints' handicap is their defense. They are below the league average and the Saints' secondary against Peyton makes me shudder. That being