Multilingual Context Ontology Rule Enhanced Focused Web Crawler

Mukesh Kumar and Renu Vig
University Institute of Engineering and Technology, Panjab University, Chandigarh, India

Abstract—The rapidly growing size of the World Wide Web and its increasing number of non-English resources pose unprecedented challenges for general-purpose crawlers and search engines. It is impossible for any search engine to index the complete Web. Focused crawlers cope with this growth by selectively seeking out pages that are relevant to a predefined set of topics and avoiding irrelevant regions of the Web. Rather than collecting and indexing all accessible Web documents, a focused crawler analyses its crawl boundary to find the links likely to be most relevant to the crawl. This paper presents a focused crawler whose crawl strategy is based on scores calculated from context ontologies and adaptive classification rules, and which is capable of dealing with intermediate multilinguity situations: situations in which the query language is the same as the target language, but the intermediate path may pass through pages written in a mixture of the query language and some other language. This improves the quality of the pages retrieved, because the English meaning of a word sequence in the other language may itself be relevant to the query, or may point to pages that are highly relevant to it. Such pages should be included in the results, yet existing crawlers leave them untouched.
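The general idea can be illustrated with a minimal best-first crawl sketch. It is not the authors' implementation: the ontology weights, the toy bilingual lexicon, the scoring formula, and the names `ONTOLOGY_WEIGHTS`, `TOY_LEXICON`, `relevance`, `crawl`, and `fetch` are all assumptions made for illustration, and the adaptive classification rules are omitted entirely.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical context ontology: topic terms mapped to illustrative weights.
ONTOLOGY_WEIGHTS: Dict[str, float] = {"crawler": 1.0, "ontology": 0.8, "search": 0.6}

# Hypothetical lexicon standing in for a real translation step
# (other-language word -> query-language word).
TOY_LEXICON: Dict[str, str] = {"buscar": "search", "ontología": "ontology"}

# fetch(url) returns the outgoing links of a page as (anchor_text, target_url) pairs.
Fetch = Callable[[str], Iterable[Tuple[str, str]]]


def to_query_language(tokens: List[str]) -> List[str]:
    """Map known other-language tokens to the query language; keep the rest."""
    return [TOY_LEXICON.get(token, token) for token in tokens]


def relevance(text: str) -> float:
    """Average ontology weight of the (translated) tokens in the text."""
    tokens = to_query_language(text.lower().split())
    if not tokens:
        return 0.0
    return sum(ONTOLOGY_WEIGHTS.get(token, 0.0) for token in tokens) / len(tokens)


def crawl(seeds: List[str], fetch: Fetch, limit: int = 10,
          threshold: float = 0.0) -> List[str]:
    """Visit pages in order of decreasing link score, starting from the seeds."""
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited: List[str] = []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor_text, target in fetch(url):
            if target in seen:
                continue
            seen.add(target)
            score = relevance(anchor_text)
            # Links scoring at or below the threshold are never queued.
            if score > threshold:
                heapq.heappush(frontier, (-score, target))
    return visited
```

In a toy link graph where the seed page links out with the anchor texts "buscar ontología", "weather news", and "crawler", this sketch visits the "crawler" link first, then the translated Spanish link, and never queues the off-topic one. A crawler that scored only query-language words would assign the Spanish anchor a score of zero and skip it, which is the situation the abstract describes.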

Index Terms—Focused Crawler, Search Engines, Information Retrieval, Ontology, Adaptive Rules

Cite: Mukesh Kumar and Renu Vig, "Multilingual Context Ontology Rule Enhanced Focused Web Crawler," Journal of Advances in Information Technology, Vol. 1, No. 1, pp. 21-25, February 2010. doi:10.4304/jait.1.1.21-25