
A Hybrid Revisit Policy For Web Search

Vipul Sharma, Mukesh Kumar, and Renu Vig
UIET, Panjab University Chandigarh, INDIA

Abstract – A crawler is a program that retrieves and stores pages from the Web, commonly for a Web search engine. A crawler often has to download hundreds of millions of pages in a short period and must constantly monitor and refresh them. Once a significant number of pages have been downloaded, the crawler must begin revisiting them to keep the local collection current. Due to resource constraints, search engines usually have difficulty keeping the entire local repository synchronized with the Web: given the size of the Web today, re-crawling too frequently wastes bandwidth, while re-crawling too infrequently degrades the quality of the search engine. In this paper, a hybrid approach is built by which a web crawler keeps the retrieved pages “fresh” in the local collection. Toward this goal, the PageRank and the age of a web page are used. A higher PageRank indicates that more users visit the page and that it has higher link popularity, while the age of a web page measures how outdated the local copy is. Using these two parameters, a hybrid approach is proposed that can identify important pages at an early stage of the crawl, so that the crawler revisits these important pages with higher priority.
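The abstract does not state the exact rule for combining the two signals; the sketch below is one plausible reading, assuming a normalized linear blend of PageRank and age drives a max-priority revisit queue. The names revisit_priority, alpha, and max_age_days are hypothetical tuning knobs introduced for illustration, not values from the paper.

```python
import heapq
import time

SECONDS_PER_DAY = 86_400

def revisit_priority(page_rank: float, age_seconds: float,
                     alpha: float = 0.5, max_age_days: float = 30.0) -> float:
    """Blend link popularity (PageRank) with staleness (age).

    Both terms are scaled to [0, 1] before mixing; alpha and
    max_age_days are assumed parameters, not taken from the paper.
    """
    staleness = min(age_seconds / (max_age_days * SECONDS_PER_DAY), 1.0)
    return alpha * page_rank + (1.0 - alpha) * staleness

# Local repository entries: (url, page_rank, timestamp of last refresh).
now = time.time()
pages = [
    ("http://example.com/news", 0.90, now - 1 * SECONDS_PER_DAY),
    ("http://example.com/archive", 0.10, now - 14 * SECONDS_PER_DAY),
]

# Max-heap via negated score: the stalest/most popular page pops first.
queue = [(-revisit_priority(pr, now - ts), url) for url, pr, ts in pages]
heapq.heapify(queue)

while queue:
    neg_score, url = heapq.heappop(queue)
    print(f"revisit {url} (score {-neg_score:.3f})")
```

Under this reading, a high-PageRank page that has just been refreshed can still rank below a low-PageRank page whose local copy is very stale, which matches the paper's stated trade-off between bandwidth and collection freshness.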

Index Terms – Revisit Policy, Search Engines, Web Crawler

Cite: Vipul Sharma, Mukesh Kumar, and Renu Vig, "A Hybrid Revisit Policy For Web Search," Journal of Advances in Information Technology, Vol. 3, No. 1, pp. 36-47, February 2012. doi: 10.4304/jait.3.1.36-47