Text classification of traditional and national songs using Naïve Bayes algorithm

(1) Triyanti Simbolon (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(2) * Aji Prasetya Wibawa (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(3) Ilham Ari Elbaith Zaeni (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(4) Amelia Ritahani Ismail (Department of Computer Science, International Islamic University Malaysia, Malaysia)
*corresponding author

Abstract


In this research, we investigate the effectiveness of the multinomial Naïve Bayes algorithm for text classification, focusing on distinguishing traditional (folk) songs from national songs. We chose this method because it evaluates word frequencies both within individual documents and across the entire dataset, which can improve accuracy and stability. Our dataset contains 480 folk songs and 90 national songs. We test six scenarios: classification into two, four, and 31 labels, each with and without the Synthetic Minority Over-sampling Technique (SMOTE). The workflow begins with pre-processing: case folding, punctuation removal, tokenization, and TF-IDF transformation. The text is then classified with the multinomial Naïve Bayes algorithm and evaluated using k-fold cross-validation and SMOTE resampling. The best result, an accuracy of 93.75%, is obtained in the two-label scenario with SMOTE. These findings suggest that the multinomial Naïve Bayes algorithm is effective for classification tasks with a small number of label categories.
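The pre-processing and classification stages named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the toy lyrics, the label names, the function names, the smoothed IDF formula, and the Laplace smoothing value `alpha=1.0` are all assumptions made for illustration, and the k-fold cross-validation and SMOTE stages are left out.

```python
import math
import string
from collections import Counter, defaultdict

def preprocess(text):
    """Case folding, punctuation removal, and whitespace tokenization."""
    folded = text.lower()
    cleaned = folded.translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def tfidf(docs_tokens):
    """Return ({term: tf * idf} per document, idf table).

    The smoothed idf used here is an assumed convention, not taken from the paper.
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter(t for tokens in docs_tokens for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    weighted = [{t: c * idf[t] for t, c in Counter(tokens).items()}
                for tokens in docs_tokens]
    return weighted, idf

def train_mnb(weighted_docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on TF-IDF weights with Laplace smoothing."""
    vocab = {t for doc in weighted_docs for t in doc}
    class_totals = defaultdict(Counter)
    for doc, label in zip(weighted_docs, labels):
        class_totals[label].update(doc)  # accumulate per-class term weights
    model = {}
    for label, totals in class_totals.items():
        denom = sum(totals.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(label) / len(labels))
        log_lik = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
        model[label] = (log_prior, log_lik)
    return model

def predict(model, idf, text):
    """Pick the label with the highest log-posterior; unseen terms are ignored."""
    counts = Counter(preprocess(text))
    weights = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scores = {label: log_prior + sum(w * log_lik[t] for t, w in weights.items())
              for label, (log_prior, log_lik) in model.items()}
    return max(scores, key=scores.get)

# Invented toy lyrics, used only to exercise the functions above.
train_texts = [
    "Rice fields, river, and village harvest dance!",
    "Mountain village, river song, harvest moon.",
    "Our nation, our flag; freedom and homeland.",
    "Heroes of the nation defend the homeland flag.",
]
train_labels = ["traditional", "traditional", "national", "national"]

weighted_docs, idf_table = tfidf([preprocess(t) for t in train_texts])
mnb_model = train_mnb(weighted_docs, train_labels)
print(predict(mnb_model, idf_table, "The flag of our homeland"))
```

In practice, a library implementation of each stage would normally replace these hand-written functions; the sketch only shows how term weights feed the per-class log-likelihoods.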

Keywords


Traditional songs; National songs; Multinomial Naïve Bayes; SMOTE; Text classification

   

DOI

https://doi.org/10.31763/sitech.v3i2.1215
      






Copyright (c) 2022 Aji Prasetya Wibawa, Triyanti Simbolon, Ilham Ari Elbaith Zaeni, Amelia Ritahani Ismail

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
Science in Information Technology Letters
ISSN 2722-4139
Published by Association for Scientific Computing Electrical and Engineering (ASCEE)
W : http://pubs2.ascee.org/index.php/sitech
E : sitech@ascee.org, andri@ascee.org, andri.pranolo.id@ieee.org
