A Hidden Markov Model-based Part of Speech Tagger for Shekki’noono Language

Authors

  • Alebachew Chiche
  • Hiwot Kadi
  • Tibebu Bekele

DOI:

https://doi.org/10.47839/ijc.20.4.2448

Keywords:

Parts of speech tagger, HMM, NLP, Shekki’noono language, Bigram

Abstract

Natural language processing (NLP) plays a major role in providing an interface for human-computer communication: it enables people to communicate with computers in their own natural language rather than in machine language. This study presents a part-of-speech (POS) tagger that assigns a word class to each word in a given sentence. Researchers have developed POS taggers for languages such as English, Amharic, Afan Oromo, and Tigrigna, but many other languages, including Shekki’noono, still lack one. A POS tagger is a basic component of most NLP tools, such as machine translation and information extraction systems. A language therefore needs one before more advanced NLP applications, which enhance machine-to-machine, machine-to-human, and human-to-human communication, can be built for it. However, a POS tagger developed for one language cannot be applied directly to another. To develop the Shekki’noono POS tagger, we used a stochastic Hidden Markov Model (HMM). The study uses 1,500 sentences collected from sources such as newspapers (covering social, economic, and political topics), modules, textbooks, radio programs, and bulletins. Language experts labeled each word in the collected sentences with its appropriate part of speech. In the experiments, the tagger was trained on the training set using the HMM, and the HMM-based tagger achieved an accuracy of 92.77% for Shekki’noono. The model is also compared with previous HMM-based experiments reported in related work. As future work, the proposed approach can be evaluated on a larger corpus.
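
The pipeline summarized in the abstract (a bigram HMM trained on expert-labeled sentences, then scored by tagging accuracy) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the add-alpha smoothing, the `<s>` start symbol, the function names `train_hmm`, `log_trans`, `log_emit`, `viterbi`, and `accuracy`, and the toy English tagged sentences are all hypothetical, since the abstract does not give the paper's tagset, smoothing method, or corpus data.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical pseudo-tag marking the start of a sentence


def train_hmm(tagged_sentences, alpha=1.0):
    """Count bigram tag transitions and word emissions from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return {"trans": trans, "emit": emit, "tags": sorted(emit),
            "vocab": vocab, "alpha": alpha}


def log_trans(model, prev, tag):
    """Add-alpha smoothed log P(tag | prev): the bigram transition probability (assumed smoothing)."""
    counts, a = model["trans"][prev], model["alpha"]
    return math.log((counts[tag] + a) /
                    (sum(counts.values()) + a * len(model["tags"])))


def log_emit(model, tag, word):
    """Add-alpha smoothed log P(word | tag); the +1 reserves mass for unseen words."""
    counts, a = model["emit"][tag], model["alpha"]
    return math.log((counts[word] + a) /
                    (sum(counts.values()) + a * (len(model["vocab"]) + 1)))


def viterbi(model, words):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = model["tags"]
    # best[i][t]: log-probability of the best tag path for words[:i + 1] ending in t
    best = [{t: log_trans(model, START, t) + log_emit(model, t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_trans(model, p, t))
            best[i][t] = (best[i - 1][prev] + log_trans(model, prev, t)
                          + log_emit(model, t, words[i]))
            back[i][t] = prev
    # Pick the best final tag, then follow back-pointers to the first word.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def accuracy(model, test_sentences):
    """Token-level accuracy: correctly tagged words divided by all words."""
    correct = total = 0
    for sentence in test_sentences:
        words = [w for w, _ in sentence]
        gold = [t for _, t in sentence]
        predicted = viterbi(model, words)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


# Toy English data standing in for an expert-labeled corpus (hypothetical tags).
train = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("a", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("the", "DET"), ("cat", "N"), ("runs", "V")],
]
model = train_hmm(train)
print(viterbi(model, ["a", "dog", "sleeps"]))  # ['DET', 'N', 'V']
print(accuracy(model, train))                  # 1.0
```

Viterbi decoding keeps only the best-scoring path into each tag at each position, so tagging a sentence of n words with T tags costs O(nT²) instead of enumerating all Tⁿ tag sequences. Token-level accuracy is assumed here to be the kind of measure behind a figure such as 92.77%; the abstract does not spell out the exact evaluation protocol.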

Published

2021-12-31

How to Cite

Chiche, A., Kadi, H., & Bekele, T. (2021). A Hidden Markov Model-based Part of Speech Tagger for Shekki’noono Language. International Journal of Computing, 20(4), 587-595. https://doi.org/10.47839/ijc.20.4.2448

Issue

Vol. 20, No. 4 (2021)

Section

Articles