Document Type: Original Paper
Authors
1 Department of Statistics, Tarbiat Modares University
2 Department of Statistics, Tarbiat Modares University, Tehran, Islamic Republic of Iran
Abstract
Clustering, a fundamental multivariate statistical method, is a valuable tool for extracting meaningful insights from complex datasets. Analyzing high-dimensional data, however, presents challenges, notably the curse of dimensionality. While various methods have been developed for dimensionality reduction, most overlook the role of the response (dependent) variable. Supervised clustering, in contrast, leverages the information carried by the response variable, which substantially reduces data dimensionality and speeds up clustering computations. This paper evaluates the efficacy of supervised clustering for analyzing images of Persian handwritten digits. Exploiting the multi-class nature of these data, the method identifies the important variables for each digit. This reduces dimensionality without compromising the accuracy of predictions for new observations at any stage of the algorithm. The approach also achieves relatively high accuracy in predicting the response variable. This study offers a new perspective on clustering methods, highlighting how integrating supervised techniques can improve performance in high-dimensional data analysis.
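The abstract does not specify the algorithm, so the following is only a minimal sketch of the general idea of selecting important variables per class (per digit) and then predicting from the reduced variable set. It assumes a simple standardized mean-difference score for variable importance and a nearest-centroid rule for prediction. Neither is claimed to be the paper's method, and the function names `class_feature_scores`, `select_features`, `fit_centroids` and `predict` are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical sketch: per-class variable selection followed by
# nearest-centroid prediction. Not the paper's algorithm.


def class_feature_scores(X, y):
    """For each class label, score every feature by how far the class mean
    lies from the overall mean, in units of the feature's overall spread."""
    n_features = len(X[0])
    overall_mean = [mean(row[j] for row in X) for j in range(n_features)]
    # A constant feature has zero spread; use 1.0 to avoid division by zero.
    overall_sd = [pstdev([row[j] for row in X]) or 1.0 for j in range(n_features)]
    scores = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        scores[label] = [
            abs(mean(r[j] for r in rows) - overall_mean[j]) / overall_sd[j]
            for j in range(n_features)
        ]
    return scores


def select_features(X, y, k):
    """Keep the top-k features for each class and return their sorted union."""
    selected = set()
    for s in class_feature_scores(X, y).values():
        top = sorted(range(len(s)), key=lambda j: s[j], reverse=True)[:k]
        selected.update(top)
    return sorted(selected)


def fit_centroids(X, y, features):
    """Compute each class centroid using only the selected features."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict(x, centroids, features):
    """Assign x to the class whose reduced centroid is nearest
    in squared Euclidean distance."""
    reduced = [x[j] for j in features]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(reduced, centroids[c])),
    )
```

For instance, on a toy dataset of six observations where each of three classes is marked by a high value in a different feature and a fourth feature carries no class information, `select_features(X, y, 1)` keeps features 0, 1 and 2 and drops the fourth. Taking the union of the per-class selections keeps a single reduced space, so distances to all class centroids remain comparable.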