Semi-supervised labelling of chest x-ray images using unsupervised clustering for ground-truth generation

(1) * Victor Ikechukwu Agughasi (Maharaja Institute of Technology Mysore, India)
(2) Murali Srinivasiah (Maharaja Institute of Technology Mysore, India)
*corresponding author

Abstract


Supervised classifiers need large amounts of accurately labelled data to learn to recognize chest X-ray (CXR) images. However, manually labelling an extensive collection of CXR images is time-consuming and costly. To address this issue, a method for the semi-supervised labelling of extensive CXR collections is proposed that leverages unsupervised clustering with minimal expert knowledge to generate ground-truth images. The proposed methodology has four steps. First, the images are grouped using unsupervised clustering techniques such as K-Means and Self-Organizing Maps. Second, each image is described by five different feature vectors, so that the differences between features can be exploited to their full advantage. Third, each data point receives the label of the center of the cluster to which it belongs. Finally, a majority vote decides the ground-truth image. The number of clusters created by the chosen method strictly limits the amount of human involvement. To evaluate the effectiveness of the proposed method, experiments were conducted on two publicly available CXR datasets, VinDr-CXR and Montgomery. The experiments showed that, for a KNN classifier, manually labelling only 1% (VinDr-CXR) or 10% (Montgomery) of the training data gives performance similar to labelling the whole dataset. To our knowledge, this is the first study to use the VinDr-CXR and Montgomery datasets for ground-truth image generation. Extensive experimental analysis using machine learning and statistical techniques shows that the proposed methodology efficiently generates ground-truth images from publicly available CXR datasets.
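The four steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the expert labels one representative per cluster (the sample nearest the cluster center), that this label is propagated to every member of the cluster, and that the majority vote is taken across the labelings obtained from the different feature vectors. The toy data, the three feature views (the paper uses five), the deterministic initialization, and the names `kmeans`, `label_view`, `majority_vote`, and `ask_expert` are all hypothetical.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=50):
    """Plain Lloyd's k-means with evenly spaced initial centers.

    Initialization is deterministic so the toy example is reproducible;
    real code would normally call a library routine instead.
    """
    n = len(points)
    centers = [points[i * n // k] for i in range(k)]
    assign = None
    for _ in range(max_iters):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign, centers


def label_view(points, k, ask_expert):
    """Cluster one feature view and propagate one expert label per cluster."""
    assign, centers = kmeans(points, k)
    cluster_label = {}
    for c in set(assign):
        members = [i for i, a in enumerate(assign) if a == c]
        # The expert only sees the sample closest to the cluster center.
        rep = min(members, key=lambda i: sq_dist(points[i], centers[c]))
        cluster_label[c] = ask_expert(rep)
    return [cluster_label[a] for a in assign]


def majority_vote(labelings):
    """Combine the per-view labelings sample by sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*labelings)]


# Toy data: six "images", each described by three hypothetical feature views.
true_labels = [0, 0, 0, 1, 1, 1]  # known only to the simulated expert
views = [
    [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)],
    [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (4.0, 4.0), (4.1, 3.9), (3.9, 4.2)],
    [(0.0,), (0.2,), (4.8,), (5.0,), (5.2,), (5.4,)],  # sample 2 is misleading
]

queries = []  # indices the expert was asked to label


def ask_expert(index):
    queries.append(index)
    return true_labels[index]


per_view = [label_view(v, k=2, ask_expert=ask_expert) for v in views]
ground_truth = majority_vote(per_view)
print(per_view)
print(ground_truth)
```

In this toy run the third view places sample 2 in the wrong cluster, and the vote across the three views corrects it. The expert is queried once per cluster per view, which is why the number of clusters bounds the manual effort.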


Keywords


Chest x-ray; Ground-truth generation; Semi-supervised classifier; Unsupervised Clustering; VinDr-CXR dataset

   

DOI

https://doi.org/10.31763/aet.v2i3.1143
      




Copyright (c) 2023 Victor Ikechukwu Agughasi, Murali Srinivasiah

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Applied Engineering and Technology
ISSN: 2829-4998
Email: aet@ascee.org | andri.pranolo.id@ieee.org
Published by: Association for Scientific Computing Electronics and Engineering (ASCEE)
Organized by: Association for Scientific Computing Electronics and Engineering (ASCEE), Universitas Negeri Malang, Universitas Ahmad Dahlan
