An Integrated Deep Learning Framework Combining LSTM-CRF, GRU-CRF, and CNN-CRF with Word Embedding Techniques for Arabic Named Entity Recognition

Mahdi Ahmed Ali; Ahmed Bahaaulddin A. Alwahhab; Yagoub Farjami

doi:10.31763/ijrcs.v5i2.1752


⁽¹⁾* Mahdi Ahmed Ali

(Middle Technical University, Iraq)
⁽²⁾ Ahmed Bahaaulddin A. Alwahhab

(Middle Technical University, Iraq)
⁽³⁾ Yagoub Farjami

(University of Qom, Iran)
*Corresponding author

Abstract

Named entity recognition (NER) is a core task in natural language processing (NLP) with many applications. Arabic NER systems identify and classify named entities (NEs) in Arabic text, which poses particular challenges because of the language's complex morphology and syntactic structure. This paper presents an integrated deep learning framework that combines three architectures (LSTM-CRF, GRU-CRF, and CNN-CRF) with three word embedding techniques (GloVe, Word2Vec, and FastText), all trained on an Arabic corpus. To advance the state of the art in Arabic NER, the framework follows a three-stage process: preprocessing, feature extraction, and a combination of several deep network models. In the preprocessing stage, operations such as removing irrelevant words and word correction are applied to improve the system's efficiency. In the feature extraction stage, the three embedding methods, each trained on Arabic text, are used to represent the input words. Finally, each of the three models is trained with each word embedding, and their outputs are combined. Experimental results on the ANERcorp benchmark dataset show that the approach is effective, achieving an accuracy of 94.39% and outperforming other state-of-the-art methods. However, combining multiple deep learning models with several word embeddings increases computational complexity and resource requirements, which may complicate deployment in resource-constrained settings. Future work will focus on optimizing the framework to reduce computational cost while maintaining good performance.
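The abstract names the building blocks but not the decoding or combination details. As a rough illustration only, the sketch below shows two pieces: standard Viterbi decoding, which is how a linear-chain CRF output layer (as in LSTM-CRF, GRU-CRF and CNN-CRF) selects a whole tag sequence from per-token encoder scores plus learned tag-transition scores, and per-token majority voting as one possible way to combine the outputs of several models. Majority voting is an assumption: the abstract says only that the results are combined, not how. The function names (`viterbi_decode`, `majority_vote`), the BIO tag set and all scores are hypothetical toy values, not taken from the paper, and the neural encoders themselves are omitted.

```python
from collections import Counter
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j for token t, as an encoder (LSTM, GRU
        or CNN over word embeddings) would produce; not computed here.
    transitions[i][j]: learned CRF score for tag i followed by tag j.
    """
    n_tags = len(tags)
    # score[j]: best total score of any path that ends in tag j so far
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # best previous tag i for reaching tag j at token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[j] for j in path]


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tag sequences from several models by majority vote.

    Assumption: the paper's combination rule is not given in the abstract;
    voting is used here for illustration. Ties go to the earliest model in
    `predictions` that proposed a tied tag. Voting is per token, so the
    result is not guaranteed to be a valid BIO sequence.
    """
    combined = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        top = max(counts.values())
        combined.append(next(tag for tag in token_tags if counts[tag] == top))
    return combined


# Toy example: "قال محمد علي" ("Mohammed Ali said"), with made-up scores.
TAGS = ["O", "B-PER", "I-PER"]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O: O -> I-PER is heavily penalised
    [0.0, 0.0, 1.0],    # from B-PER: B-PER -> I-PER is encouraged
    [0.0, 0.0, 0.0],    # from I-PER
]
EMISSIONS = [
    [2.0, 0.5, 0.1],  # قال
    [0.3, 2.0, 0.4],  # محمد
    [1.2, 0.2, 1.0],  # علي: O scores highest on its own
]

crf_path = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(crf_path)  # ['O', 'B-PER', 'I-PER']

# Hypothetical outputs from three of the nine model/embedding combinations.
votes = majority_vote([
    crf_path,                  # e.g. LSTM-CRF + FastText
    ["O", "B-PER", "O"],       # e.g. GRU-CRF + Word2Vec
    ["O", "B-LOC", "I-PER"],   # e.g. CNN-CRF + GloVe
])
print(votes)  # ['O', 'B-PER', 'I-PER']
```

In the toy example, the last token's emission scores alone favour `O`, but the `B-PER -> I-PER` transition score makes `I-PER` the better overall path; this sentence-level decoding is the usual reason for placing a CRF on top of a neural encoder. Because the vote is taken token by token, a real system would likely need to check or repair the combined sequence for BIO consistency.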

Keywords

Arabic NLP; Conditional Random Field; Deep Learning (DL); Named Entity Recognition; Word Embedding

DOI

https://doi.org/10.31763/ijrcs.v5i2.1752

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Robotics and Control Systems
e-ISSN: 2775-2658
Website: https://pubs2.ascee.org/index.php/IJRCS
Email: ijrcs@ascee.org
Organized by: Association for Scientific Computing Electronics and Engineering (ASCEE), Peneliti Teknologi Teknik Indonesia, Department of Electrical Engineering, Universitas Ahmad Dahlan and Kuliah Teknik Elektro
Published by: Association for Scientific Computing Electronics and Engineering (ASCEE)
Office: Jalan Janti, Karangjambe 130B, Banguntapan, Bantul, Daerah Istimewa Yogyakarta, Indonesia
