JAIT 2026 Vol.17(5): 914-928
doi: 10.12720/jait.17.5.914-928

Considering Cluster Validity in Attribute Extension for Small Data Set Predictions

Luu-Ly Tran 1, Chih-Chieh Chang 2,*, and Hsiang-An Yu 3
1. Department of Information Management, National Taiwan University of Science and Technology, Taiwan
2. School of Management, National Taiwan University of Science and Technology, Taiwan
3. Business Administration Department, National Taiwan University of Science and Technology, Taiwan
Email: M11309813@mail.ntust.edu.tw (L.-L.T.); ccchang@mail.ntust.edu.tw (C.-C.C.); m11121026@mail.ntust.edu.tw (H.-A.Y.)
*Corresponding author

Manuscript received October 17, 2025; revised November 24, 2025; accepted February 6, 2026; published May 13, 2026.

Abstract—In recent years, cluster validity has been widely used to determine the optimal number of clusters for large data samples. However, cluster validity for small data sets has received less attention. This study presents a new approach that uses cluster validity to improve predictive ability in small data set problems. In the first step, the proposed method applies K-means clustering with seven cluster validity indices to determine the optimal number of clusters. In the second step, it builds an attribute extending function for each attribute in each cluster and generates new attributes by computing the membership possibility. Finally, cross-validation and t-tests on two real manufacturing cases, Thin-Film Transistor Liquid Crystal Display (TFT-LCD) quality and Photo-Spacer Height (PSH), verify the effectiveness of the proposed method with backpropagation neural network (BPNN) and support vector machine for regression (SVR) forecasting methods. The results show that the combination of the C-index and attribute extension yields significantly lower forecasting errors, reduced variance, and statistically validated superiority over the other cluster validity indices and over baseline BPNN, SVR, and linear regression (LR) models without attribute extension.
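
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names (kmeans, c_index, triangular_membership, extend_attributes), the deterministic K-means initialisation, and the candidate range k = 2..4 are all assumptions made for illustration. The C-index follows its standard definition (lower is better). The membership step uses a plain triangular function over each cluster's observed range as a stand-in, because the abstract does not specify the attribute extending function.

```python
import math
from itertools import combinations

def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic, evenly spaced initialisation (assumption)."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

def c_index(points, labels):
    """C-index = (S_w - S_min) / (S_max - S_min); lower means better separation."""
    pairs = list(combinations(range(len(points)), 2))
    dists = sorted(math.dist(points[i], points[j]) for i, j in pairs)
    within = sorted(math.dist(points[i], points[j])
                    for i, j in pairs if labels[i] == labels[j])
    n_w = len(within)
    if n_w == 0:
        return math.inf  # undefined when no two points share a cluster
    s_w, s_min, s_max = sum(within), sum(dists[:n_w]), sum(dists[-n_w:])
    return math.inf if s_max == s_min else (s_w - s_min) / (s_max - s_min)

def triangular_membership(x, values):
    """Stand-in membership: 1 at the cluster mean, falling to 0 at the min and max."""
    lo, hi = min(values), max(values)
    mid = min(max(sum(values) / len(values), lo), hi)  # clamp against rounding
    if x < lo or x > hi:
        return 0.0
    if x == mid:
        return 1.0
    if x < mid:
        return (x - lo) / (mid - lo)
    return (hi - x) / (hi - mid)

def extend_attributes(points, labels):
    """Append one membership value per (cluster, attribute) pair to every sample."""
    clusters = sorted(set(labels))
    extended = []
    for p in points:
        extra = []
        for c in clusters:
            for j in range(len(p)):
                column = [q[j] for q, lab in zip(points, labels) if lab == c]
                extra.append(triangular_membership(p[j], column))
        extended.append(list(p) + extra)
    return extended

# Hypothetical toy data: two well-separated groups of four samples each.
data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]

labels_by_k = {k: kmeans(data, k) for k in (2, 3, 4)}
scores = {k: c_index(data, labels) for k, labels in labels_by_k.items()}
best_k = min(scores, key=scores.get)  # smallest C-index wins; ties go to the smaller k
extended = extend_attributes(data, labels_by_k[best_k])
print(best_k, len(extended[0]))
```

In this sketch each sample keeps its original attributes and gains one membership value per cluster and attribute, so the extended table could then be passed to any regressor. Only one validity index is shown; comparing several indices, as the study does, would mean swapping c_index for other scoring functions.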
 
Keywords—cluster validity index, small data set, k-means, support vector regression, mega-trend diffusion

Cite: Luu-Ly Tran, Chih-Chieh Chang, and Hsiang-An Yu, "Considering Cluster Validity in Attribute Extension for Small Data Set Predictions," Journal of Advances in Information Technology, Vol. 17, No. 5, pp. 914-928, 2026. doi: 10.12720/jait.17.5.914-928

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
