Using the Web Infrastructure for Just-In-Time Recovery of Missing Web Pages [ Extended
| Content Provider | Semantic Scholar |
|---|---|
| Author | Klein, Martin |
| Copyright Year | 2007 |
| Abstract | The Internet provides access to a great number of web sites, but the structure of the web is constantly changing. Missing web pages remain a pervasive problem that users experience every day. This dissertation develops a method to overcome this problem by automatically mapping between Uniform Resource Identifiers (URIs) and the textual content of web pages using lexical signatures (LSs) and tags. We introduce a “just-in-time” approach to support the preservation of web content that relies on the “living” web. We propose a method to harness the collective behavior of the Web Infrastructure and investigate the suitability of lexical signatures and tags to give a “good enough” description of the “aboutness” of missing pages. Querying Internet search engines with these LSs is expected to return the replacement page or a very similar page, which can then be provided to the user. We investigate the evolution of lexical signatures over time and propose a framework to aid in the creation of LSs. Analyzing snapshots of the web from recent years will enable us to investigate the decay of such lightweight descriptions and also the characteristics of missing pages (HTTP error code 404). We propose to evaluate and measure the quality of the framework with information retrieval methods such as precision and recall. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.cs.odu.edu/~mklein/publications/jcdl2007-doctoral_consortium.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
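
The abstract does not say how a lexical signature is computed or how precision and recall are applied. The following is only a minimal sketch under stated assumptions: it takes a lexical signature to be the top-k terms of a page ranked by a smoothed TF-IDF score, and it computes precision and recall over sets of retrieved and relevant items. The function names (`tokenize`, `lexical_signature`, `precision_recall`), the tokenizer, the smoothing, and the default `k` are all hypothetical choices for illustration, not the dissertation's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page, corpus, k=5):
    """Return the k terms of `page` with the highest smoothed TF-IDF score.

    `corpus` is a list of background documents used to estimate document
    frequency. This is an illustrative stand-in, not the author's framework.
    """
    tf = Counter(tokenize(page))
    doc_terms = [set(tokenize(doc)) for doc in corpus]
    n = len(corpus)
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for terms in doc_terms if term in terms)
        # Add-one smoothing so terms absent from the corpus do not divide by zero.
        idf = math.log((n + 1) / (df + 1))
        scores[term] = freq * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In a setting like the one the abstract describes, the terms returned by `lexical_signature` would be joined into a search-engine query, and the returned results could then be scored with `precision_recall` against pages judged to be the missing page or a close replacement. How such relevance judgments are obtained is left open here.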