Semi-supervised labelling of chest x-ray images using unsupervised clustering for ground-truth generation

Victor Ikechukwu Agughasi; Murali Srinivasiah

doi:10.31763/aet.v2i3.1143



(1)* Victor Ikechukwu Agughasi
(Maharaja Institute of Technology Mysore, India)

(2) Murali Srinivasiah
(Maharaja Institute of Technology Mysore, India)

* Corresponding author

Abstract

Supervised classifiers need large amounts of accurately labeled data to learn to recognize chest X-ray (CXR) images. However, manually labeling an extensive collection of CXR images is time-consuming and costly. To address this issue, a method for the semi-supervised labelling of extensive collections of CXR images is proposed, leveraging unsupervised clustering with minimal expert knowledge to generate ground-truth images. The proposed methodology has four steps. First, unsupervised clustering techniques such as K-Means and Self-Organizing Maps are applied. Second, each image is described by five different feature vectors so that the differences between features can be exploited to their full advantage. Third, each data point receives the label of the center of the cluster to which it belongs. Finally, a majority vote is used to decide the ground-truth image. The amount of human involvement is strictly bounded by the number of clusters created by the chosen method. To evaluate the effectiveness of the proposed method, experiments were conducted on two publicly available CXR datasets, VinDr-CXR and Montgomery. The experiments showed that, for a KNN classifier, manually labeling only 1% (VinDr-CXR) or 10% (Montgomery) of the training data gives performance similar to labeling the whole dataset. To our knowledge, this is the first study to use the VinDr-CXR and Montgomery datasets for ground-truth image generation. Extensive experimental analysis using machine learning and statistical techniques shows that the proposed methodology efficiently generates ground-truth images from CXR datasets.
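The four steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain K-Means with a deterministic farthest-first initialisation, three synthetic feature views instead of the five used in the study, and a simulated expert (`ask_expert`) who labels only the image closest to each cluster center. All function names and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, iters=20):
    """Lloyd's K-Means with farthest-first initialisation (an assumption
    made here so that the toy example is reproducible)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def propagate_labels(points, k, ask_expert):
    """Query the expert once per cluster (for the member closest to the
    cluster center) and copy that label to every member of the cluster."""
    centroids, assign = kmeans(points, k)
    cluster_label = {}
    for c in set(assign):
        members = [i for i, a in enumerate(assign) if a == c]
        rep = min(members, key=lambda i: sq_dist(points[i], centroids[c]))
        cluster_label[c] = ask_expert(rep)
    return [cluster_label[a] for a in assign]


def majority_vote(label_sets):
    """Combine the per-view labels into one consensus label per image."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_sets)]


# Toy data: 40 "images", two classes, three hypothetical feature views.
rng = random.Random(0)
truth = [0] * 20 + [1] * 20


def informative_view(dim):
    """Features whose value depends on the class, plus small noise."""
    return [[10 * y + rng.uniform(-1, 1) for _ in range(dim)] for y in truth]


noise_view = [[rng.uniform(-1, 1) for _ in range(2)] for _ in truth]
views = [informative_view(2), informative_view(3), noise_view]

queries = []  # indices of the images the simulated expert had to label


def ask_expert(i):
    queries.append(i)
    return truth[i]


consensus = majority_vote([propagate_labels(v, 2, ask_expert) for v in views])
print("consensus matches truth:", consensus == truth)
print("expert queries:", len(queries), "of", len(truth) * len(views))
```

In this sketch the expert is asked for at most one label per cluster and per view, which is how the number of clusters bounds the human effort. The uninformative third view shows why the vote helps: its propagated labels are unreliable, but the two informative views outvote it.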

Keywords

Chest x-ray; Ground-truth generation; Semi-supervised classifier; Unsupervised clustering; VinDr-CXR dataset

DOI

https://doi.org/10.31763/aet.v2i3.1143




This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Applied Engineering and Technology
ISSN: 2829-4998
Email: aet@ascee.org | andri.pranolo.id@ieee.org
Published by: Association for Scientific Computing Electronics and Engineering (ASCEE)
Organized by: Association for Scientific Computing Electronics and Engineering (ASCEE), Universitas Negeri Malang, Universitas Ahmad Dahlan
