Discrete Characterization of Domain Using Semantic Clustering

Sanjay Madan¹ and Shalini Batra²
1. Comviva Technologies Ltd., MBS-PACS, Gurgaon, India
2. Computer Science and Engineering Department, Thapar University, Patiala, Punjab, India

Abstract—Many approaches have been developed to understand software source code, and the majority focus on the program's structural information. As a result, they lose the crucial domain semantics contained in the text and symbols of the source code. To understand software as a whole, these approaches need to be enriched with conceptual insights gained from the domain semantics. This paper proposes mapping the domain to the code by applying information retrieval techniques to linguistic information, such as identifier names and comments in the source code. The paper introduces the concept of Semantic Clustering and provides an algorithm that groups source artifacts based on how synonymy and polysemy are related. After the clusters are detected, the program code is labeled automatically based on semantic similarity and is explored visually in three dimensions for discrete characterization. The approach works at the textual level of the source code, which makes it language independent. It correlates the semantics with structural information and applies at different levels of abstraction (e.g., packages, classes, methods).
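The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea: extract terms from identifiers and comments, compare source artifacts by vocabulary, group similar ones, and label each group with its most frequent terms. Everything here is an assumption made for illustration, not the authors' method: the function names (`extract_terms`, `cosine`, `cluster`, `label`), the stop-word list, the plain term-frequency cosine similarity, the greedy threshold clustering, and the sample artifacts. In particular, plain term counts do not handle synonymy or polysemy, which the paper's algorithm is said to address.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list: language keywords and filler words to ignore.
STOP_WORDS = {"class", "void", "int", "string", "public", "return",
              "the", "and", "for", "into"}


def extract_terms(source):
    """Turn identifiers and comment words into a bag of lowercase terms."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", source):
        # Split camelCase / PascalCase, e.g. "sendHttpRequest" -> send, Http, Request.
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        terms.extend(part.lower() for part in parts)
    return Counter(t for t in terms if len(t) > 2 and t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(artifacts, threshold=0.3):
    """Greedily assign each artifact to the most similar existing cluster,
    or start a new cluster if no similarity reaches the threshold."""
    clusters = []
    for name, text in artifacts.items():
        vec = extract_terms(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(name)
            best["centroid"].update(vec)  # accumulate term counts
        else:
            clusters.append({"members": [name], "centroid": Counter(vec)})
    return clusters


def label(found_cluster, k=3):
    """Label a cluster with its k most frequent terms."""
    return [term for term, _ in found_cluster["centroid"].most_common(k)]


# Made-up source artifacts (name -> source text), purely for illustration.
artifacts = {
    "AccountService": "class AccountService { // deposit money into account\n"
                      " void depositAmount(Account account, int amount) {} }",
    "AccountReport": "// print account balance\n"
                     " void printAccountBalance(Account account) {}",
    "HttpClient": "// send http request to server\n"
                  " void sendHttpRequest(String url) {}",
    "HttpServer": "// handle http request on server\n"
                  " void handleHttpRequest(Request request) {}",
}

clusters = cluster(artifacts)
for c in clusters:
    print(c["members"], "->", label(c))
```

Because the sketch reads only text (identifiers and comments), it does not depend on the programming language of the artifacts; the same functions could be applied to whole packages, classes, or individual methods by changing what each entry in `artifacts` contains.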

Index Terms— Information retrieval, Semantic clustering, Software reverse engineering

Cite: Sanjay Madan and Shalini Batra, "Discrete Characterization of Domain Using Semantic Clustering," Journal of Advances in Information Technology, Vol. 1, No. 3, pp. 127-132, August 2010. doi:10.4304/jait.1.3.127-132