A New Approach for Normalizing Continuous Data, Applicable in Parametric and Nonparametric Continuous Studies

Document Type: Original Paper

Authors

1 Department of Statistics, Higher Education Center of Eghlid, Eghlid, Islamic Republic of Iran.

2 Department of Statistics, Faculty of Sciences, University of Qom, Qom, Islamic Republic of Iran.

3 Department of Statistics, Mathematics and Insurance, Faculty of Commerce, Benha University, Egypt.

Abstract

Many statistical inference procedures rest on the normality assumption, and violating it can significantly affect the validity and reliability of the conclusions drawn from the data. In this paper, we introduce a novel method for normalizing continuous data that applies in both parametric and nonparametric settings. The method is grounded in a refined version of the empirical distribution function (EDF), which makes it more flexible and accurate than traditional normalization techniques. With this new EDF formulation, our approach addresses common weaknesses of existing methods, such as sensitivity to outliers and inefficient handling of skewed distributions. A key advantage of the technique is its reversibility: normalized data can be transformed back to their original form, which preserves the raw data for further analysis or interpretation. To demonstrate the method, we evaluate its performance on several real-world examples, including datasets related to the COVID-19 pandemic. These datasets, characterized by their complexity and variability, provide a rigorous test of the proposed approach. The results confirm that our method normalizes the data while maintaining their underlying structure and relationships, which improves the robustness of subsequent statistical analyses. The method thus adds to the tools available for data preprocessing and extends the applicability of standard statistical techniques to a broader range of real-life datasets.
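
The abstract does not state the refined EDF itself, so the following is only an illustrative sketch of the general idea of an EDF-based, reversible normalization, not the authors' formulation. It assumes the familiar continuity-adjusted EDF value (rank - 0.5)/n as a stand-in for the refined EDF, and the function names `normalize` and `denormalize` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def normalize(values):
    """Map each value to a normal score through an adjusted EDF.

    Assumption: (rank - 0.5) / n stands in for the paper's refined EDF.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # The adjustment keeps the probability strictly inside (0, 1),
        # so the inverse normal CDF is always finite.
        scores[i] = STD_NORMAL.inv_cdf((rank - 0.5) / n)
    return scores


def denormalize(scores, sorted_original):
    """Invert normalize() using the sorted original sample."""
    n = len(sorted_original)
    restored = []
    for z in scores:
        # Recover the rank from the normal score, then look up the value.
        rank = round(STD_NORMAL.cdf(z) * n + 0.5)
        rank = min(max(rank, 1), n)  # guard against rounding at the edges
        restored.append(sorted_original[rank - 1])
    return restored


# Hypothetical right-skewed sample with one large outlier.
data = [3.2, 150.0, 0.7, 12.5, 4.1]
scores = normalize(data)
restored = denormalize(scores, sorted(data))
```

Because the scores depend only on ranks, the outlier 150.0 receives the same score it would if it were 15.0, and keeping the sorted original sample is what makes the mapping reversible in this sketch.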
