Performance analysis of random forest on journal quartile classification

(1) Cornaldo Beliarding Sucahyo (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(2) Fajriwati Qoyyum Rizqini (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(3) Ayyub Naufal (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(4) Hengky Yandratama (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(5) Jabar Ash Shiddiqy (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(6) Agung Bella Putra Utama (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
(7) Nastiti Susetyo Fanany Putri (Department of Information Science and Engineering, Faculty of Science and Engineering, Saga University, Japan)
(8) * Aji Prasetya Wibawa (Department of Electrical Engineering and Informatics, Faculty of Engineering, Universitas Negeri Malang, Indonesia)
*corresponding author

Abstract


Journals play a pivotal role in disseminating scientific knowledge, hosting a wealth of valuable research articles. In the digital age, evaluating journals and their quality is essential. The SCImago Journal Rank (SJR) is one of the most prominent journal-ranking platforms, categorizing journals into five index classes: Q1, Q2, Q3, Q4, and NQ. Determining these index classes often relies on classification methods. Drawing on the Cross-Industry Standard Process for Data Mining (CRISP-DM), this research applies the Random Forest method to classify journals, contributing to the refinement of journal-ranking processes. Random Forest is a robust choice because of its ability to mitigate overfitting, a common challenge in machine learning classification. When approximating SJR index classes, Random Forest with the Gini index shows promise, achieving an initial accuracy of 62.12%. The Gini index, an impurity measure, guides the splits that Random Forest uses to assign journals to their respective SJR index classes. This accuracy is a starting point, and further refinement and feature engineering may improve the model's performance. The study underscores the significance of machine learning techniques for journal classification and journal-ranking systems. By harnessing Random Forest, it aims to enable more accurate and efficient categorization of journals, helping researchers, academics, and institutions identify and access high-quality scientific literature.
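
The abstract names Random Forest with the Gini index (Gini impurity G = 1 - sum_k p_k^2 over the class proportions at a node) but does not include implementation details. The sketch below is a minimal, hedged illustration of how such a five-class quartile classifier could be set up with scikit-learn; the CSV file name, feature columns, and target column are hypothetical assumptions, not the authors' actual dataset or pipeline.

# Minimal sketch (not the authors' pipeline): Random Forest with the Gini
# criterion for five-class journal quartile classification (Q1-Q4, NQ).
# The CSV path and feature/target columns below are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("scimago_journals.csv")                             # hypothetical dataset
features = ["sjr_score", "h_index", "cites_per_doc", "total_docs"]   # assumed columns
X, y = df[features], df["quartile"]                                  # target: Q1, Q2, Q3, Q4, NQ

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# criterion="gini" makes each tree choose splits by Gini impurity reduction,
# the impurity measure named in the abstract.
model = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=42)
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

Accuracy on the held-out split plays the same role as the 62.12% figure reported in the abstract, though the actual value depends on the real features and preprocessing used in the study.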

Keywords


Journal; Random Forest; SCImago Journal Rank; CRISP-DM


DOI

https://doi.org/10.31763/aet.v3i1.1189


Copyright (c) 2024 Cornaldo Beliarding Sucahyo, Fajriwati Qoyyum Rizqini, Ayyub Naufal, Hengky Yandratama, Jabar Ash Shiddiqy, Agung Bella Putra Utama, Nastiti Susetyo Fanany Putri, Aji Prasetya Wibawa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Applied Engineering and Technology
ISSN: 2829-4998
Email: aet@ascee.org | andri.pranolo.id@ieee.org
Published by: Association for Scientific Computing Electronics and Engineering (ASCEE)
Organized by: Association for Scientific Computing Electronics and Engineering (ASCEE), Universitas Negeri Malang, Universitas Ahmad Dahlan
