Hybrid approach redefinition with progressive boosting for class imbalance problem

(1) * Hartono Hartono (Department of Computer Science, STMIK IBBI, Medan, Indonesia)
(2) Erianto Ongko (Department of Informatics, Akademi Teknologi Industri Immanuel, Medan, Indonesia)
*corresponding author

Abstract


Class imbalance in data classification has received attention from many researchers because it degrades the accuracy of classification results. When classes are imbalanced, a classifier tends to ignore the minority class, the class with fewer instances, even though the minority class is often the one of greatest interest. Overcoming the class imbalance problem requires attention to data diversity, the number of classifiers, and classification performance. Several methods have been proposed for this problem, one of which is the Hybrid Approach Redefinition method. It is a hybrid ensemble method that handles class imbalance well, providing useful data diversity with a smaller number of classifiers. This research modifies the Hybrid Approach Redefinition by replacing SMOTEBoost with Progressive Boosting in order to obtain better data diversity, fewer classifiers, and better performance. The method is tested on imbalanced datasets from the KEEL-Dataset Repository. The results indicate that the Hybrid Approach Redefinition with Progressive Boosting gives better results in the number of classifiers, data diversity, and classification performance.
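As a rough illustration of why imbalance distorts accuracy, the sketch below scores a trivial classifier that always predicts the majority class. The 95:5 toy labels and the helper names `accuracy` and `recall` are assumptions made for this example; they are not taken from the study or from the KEEL datasets.

```python
# Minimal sketch with hypothetical toy data (not from the paper):
# a classifier that ignores the minority class still scores high accuracy.

def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of instances of class `positive` that were predicted correctly."""
    preds_for_class = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_class:
        return 0.0
    return sum(p == positive for p in preds_for_class) / len(preds_for_class)

# Assumed toy labels: 95 majority instances (0) and 5 minority instances (1).
y_true = [0] * 95 + [1] * 5
# Trivial classifier: always predict the majority class.
y_pred = [0] * len(y_true)

imbalance_ratio = y_true.count(0) / y_true.count(1)
acc = accuracy(y_true, y_pred)
majority_recall = recall(y_true, y_pred, positive=0)
minority_recall = recall(y_true, y_pred, positive=1)
# Balanced accuracy averages per-class recall, so it exposes the failure.
balanced_accuracy = (majority_recall + minority_recall) / 2

print(f"imbalance ratio:   {imbalance_ratio:.1f}")
print(f"accuracy:          {acc:.2f}")
print(f"minority recall:   {minority_recall:.2f}")
print(f"balanced accuracy: {balanced_accuracy:.2f}")
```

Plain accuracy looks strong here even though no minority instance is recognized, which is why imbalance-aware measures are generally preferred when comparing methods on such data.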

Keywords


Class Imbalance; Classification; Hybrid Approach Redefinition; Hybrid Ensembles; Progressive Boosting

   

DOI

https://doi.org/10.31763/sitech.v1i1.34
      





Copyright (c) 2020 Hartono Hartono, Erianto Ongko

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
Science in Information Technology Letters
ISSN 2722-4139
Published by Association for Scientific Computing Electrical and Engineering (ASCEE)
W : http://pubs2.ascee.org/index.php/sitech
E : sitech@ascee.org, andri@ascee.org, andri.pranolo.id@ieee.org
