A Novel System for Document Classification Using Genetic Programming

Saad M. Darwish, Adel A. EL-Zoghabi, and Doaa B. Ebaid
Institute of Graduate Studies and Researches, Alexandria University, Egypt

Abstract—With the increasing availability of electronic documents and the rapid growth of the World Wide Web, automatic document categorization has become a key method for organizing information and discovering knowledge. Document retrieval, categorization, routing, and filtering can all be formulated as classification problems. The complexity of natural language and the extremely high dimensionality of the document feature space make this classification problem very difficult. The proposed work mitigates this difficulty with an algorithm that classifies documents into more than two categories simultaneously (multi-class classification) by combining a multi-objective technique with genetic programming of classifiers based on a multi-tree representation of documents. This combination has the potential to attain lower error because the classification accuracy on each class is treated as a distinct objective. Empirical evaluations show encouraging results and indicate that the proposed algorithm is feasible and effective.
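
The idea of treating the accuracy on each class as a distinct objective can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`per_class_accuracy`, `dominates`, `pareto_front`), the toy labels, and the use of plain Pareto dominance to compare candidate classifiers are all assumptions made for illustration, and the genetic programming search itself is omitted.

```python
from typing import List, Sequence


def per_class_accuracy(y_true: Sequence[str], y_pred: Sequence[str],
                       classes: Sequence[str]) -> List[float]:
    """Return one objective per class: the fraction of that class's
    documents that the classifier labelled correctly."""
    scores = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        if not idx:
            # No documents of this class: score it 0.0 (illustrative choice).
            scores.append(0.0)
            continue
        correct = sum(1 for i in idx if y_pred[i] == c)
        scores.append(correct / len(idx))
    return scores


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a is at least as good as b on every
    class and strictly better on at least one (maximization)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the candidates that no other candidate dominates."""
    return [i for i, a in enumerate(vectors)
            if not any(dominates(b, a)
                       for j, b in enumerate(vectors) if j != i)]


# Hypothetical toy data: six documents, three classes, three candidates.
classes = ["sport", "politics", "tech"]
y_true = ["sport", "sport", "politics", "politics", "tech", "tech"]
candidates = [
    ["sport", "sport", "politics", "tech", "tech", "tech"],
    ["sport", "politics", "politics", "politics", "tech", "sport"],
    ["sport", "politics", "politics", "tech", "tech", "sport"],
]
objectives = [per_class_accuracy(y_true, p, classes) for p in candidates]
front = pareto_front(objectives)
```

In this toy setup the third candidate is no better than either of the others on any class, so only the first two remain on the front. A single averaged accuracy would hide the fact that those two candidates trade off performance on different classes.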

Index Terms—document classification, genetic programming, multi-objective techniques, multi-tree representation

Cite: Saad M. Darwish, Adel A. EL-Zoghabi, and Doaa B. Ebaid, "A Novel System for Document Classification Using Genetic Programming," Vol. 6, No. 4, pp. 194-200, November, 2015. doi: 10.12720/jait.6.4.194-200