Abstract—Owing to the large volume of data and the complex, dynamic properties of data instances, several data mining algorithms have been applied to mining complex data streams in recent decades. Nowadays, knowledge extraction from data streams is becoming more complex because the structure of a data instance does not match its attribute values when considering tabulated data, text, web data, images, or videos. In this paper, we address some difficulties of mining complex data streams, such as dealing with continuous attributes, selecting input attributes, and constructing classifiers. The proposed discretization algorithm finds the possible cut points in continuous attributes that can separate the class distributions, using the information gain heuristic and a naïve Bayesian classifier. We evaluate the proposed algorithms on several benchmark data sets from the UCI Machine Learning Repository. The experimental results demonstrate that the proposed method improves the quality of discretization of continuous attributes and improves classification accuracy for different types of classification problems.
Index Terms—cut point, contradictory example, data stream, data mining, interval border, redundant attribute
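As an illustration of the cut-point idea mentioned in the abstract, the following is a minimal sketch of choosing a single cut point for one continuous attribute by maximizing information gain. It is not the authors' algorithm: it covers only the information gain heuristic and leaves out the naïve Bayesian classifier step, and the function names `entropy` and `best_cut_point` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def best_cut_point(values, labels):
    """Return (cut, gain) for the binary split with the highest information gain.

    Candidate cuts are midpoints between adjacent distinct sorted values.
    Returns (None, 0.0) when no split yields a positive gain.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best_cut, best_gain = None, 0.0
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between equal attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_cut, best_gain = (low + high) / 2, gain
    return best_cut, best_gain
```

Calling `best_cut_point` on an attribute whose low values all belong to one class and whose high values all belong to another returns the midpoint between the two groups as the cut; a full discretizer would typically apply such a search recursively to each resulting interval.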
Cite: Dewan Md. Farid and Chowdhury Mofizur Rahman, "Mining Complex Data Streams: Discretization, Attribute Selection and Classification," Journal of Advances in Information Technology, Vol. 4, No. 3, pp. 129-135, August 2013. doi:10.4304/jait.4.3.129-135
Copyright © 2013-2020. JAIT. All Rights Reserved
This work is licensed under the Creative Commons Attribution License (CC BY-NC-ND 4.0)