Document Type: Original Paper

Authors

1 Department of Computer Science, Kosar University of Bojnord, Iran

2 Student Member, IEEE

3 Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran

4 Science and Research Branch, Islamic Azad University, Kerman, Iran

Abstract

Spam is unsolicited email that harms communication worldwide. It is a growing problem for personal email, so detecting it is essential. Machine learning is well suited to this task because its adaptive nature lets it learn the patterns required for classification. In spam detection, however, a large number of features must be considered, and the choice of features strongly affects detection performance. In this article, we propose a feature selection method for email spam detection. The approach is a hybrid of optimization algorithms and machine learning classifiers: the Binary Whale Optimization (BWO) and Binary Grey Wolf Optimization (BGWO) algorithms are used for feature selection, and the K-Nearest Neighbor (KNN) and Fuzzy K-Nearest Neighbor (FKNN) algorithms serve as the classifiers. The proposed method is tested on the "SPAMBASE" dataset from the UCI Machine Learning Repository, where it achieves a highest accuracy of 97.61%. The results indicate that the proposed method performs well in comparison with other methods.
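The hybrid idea described in the abstract, in which an optimizer proposes a binary feature mask and a classifier scores it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy data, the leave-one-out evaluation, the weighted fitness with `alpha = 0.99`, the choice `k = 1`, and the function names. An exhaustive search over masks stands in for the BWO and BGWO search, which is only feasible here because the toy problem has two features.

```python
import itertools
import math
from collections import Counter


def knn_predict(train_X, train_y, x, mask, k=1):
    """Majority vote among the k nearest training points, with distance
    computed only on the features whose mask entry is 1."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2
                             for ai, bi, m in zip(a, b, mask) if m))
    neighbours = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def loo_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of KNN restricted to the masked features."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], mask, k) == y[i]:
            hits += 1
    return hits / len(X)


def fitness(X, y, mask, alpha=0.99, k=1):
    """Assumed wrapper fitness (lower is better): a weighted sum of the
    classification error and the fraction of features kept."""
    error = 1.0 - loo_accuracy(X, y, mask, k)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1 - alpha) * ratio


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

# Brute-force search over all masks; a binary metaheuristic such as BWO or
# BGWO would replace this step when the feature count makes enumeration
# impractical.
best = min(itertools.product([0, 1], repeat=2),
           key=lambda mask: fitness(X, y, mask))
print(best)  # (1, 0): only the informative feature is kept
```

In this sketch the noisy feature misleads the nearest-neighbor vote, so the mask that drops it has the lowest fitness. The same fitness function could be handed to any binary search procedure without change.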

Keywords
