Google/Yandex Translation Detection in the Patterns Identifying System of Multilingual Texts
Keywords: Google Translate, Yandex.Translate, English, Russian, Kazakh, FuzzyWuzzy
The objective of this work is to develop a script that evaluates how well online translators translate text from one language to another. Two services were evaluated: Google Translate and Yandex.Translate. The analysis covered 147 news items (about 1800 sentences) in English, Kazakh, and Russian, taken from the Internet resource astana.gov.kz, from which a corpus of parallel texts in the three languages was created. The script was developed for the “sentence” pattern, with the “text” pattern planned as further development. Errors were analyzed in the following categories: untranslated or omitted words, extra words, incorrect word endings, incorrect word order, punctuation errors, mutilated translation, and incorrect translation. Based on the analysis of the obtained data, we conclude that Yandex.Translate translates Russian text into Kazakh or English better than Google Translate. The developed comparison script and error analysis script are openly available on the Internet.
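The comparison idea can be illustrated with a minimal sketch. It is not the authors' script: the abstract does not describe its internals, so the normalization step (lowercasing and stripping punctuation), the 0-100 score scale, and the function names `normalize`, `similarity`, and `word_differences` are all assumptions made for illustration. Python's standard `difflib.SequenceMatcher` stands in for a fuzzy-matching library such as FuzzyWuzzy, and the word-level diff only approximates two of the error categories listed above (omitted words and extra words).

```python
import re
from difflib import SequenceMatcher


def normalize(sentence):
    """Lowercase, replace punctuation with spaces, and split into word tokens.

    Assumption: this preprocessing is illustrative, not the authors' exact steps.
    """
    # \w is Unicode-aware in Python 3, so Cyrillic and Kazakh letters are kept.
    cleaned = re.sub(r"[^\w\s]", " ", sentence.lower())
    return cleaned.split()


def similarity(reference, candidate):
    """Return a 0-100 character-level similarity score for two sentences."""
    ref = " ".join(normalize(reference))
    cand = " ".join(normalize(candidate))
    return round(100 * SequenceMatcher(None, ref, cand).ratio())


def word_differences(reference, candidate):
    """Return (missing, extra): reference words absent from the machine
    translation, and machine-translation words absent from the reference."""
    ref_tokens = normalize(reference)
    cand_tokens = normalize(candidate)
    missing, extra = [], []
    matcher = SequenceMatcher(None, ref_tokens, cand_tokens)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(ref_tokens[i1:i2])
        if tag in ("insert", "replace"):
            extra.extend(cand_tokens[j1:j2])
    return missing, extra


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the astana.gov.kz corpus.
    human = "The mayor opened a new school."
    machine = "The mayor opened new big school"
    print(similarity(human, machine))
    print(word_differences(human, machine))
```

In such a setup, each machine translation of a source sentence would be scored against the human translation from the parallel corpus, and the service with the higher average score would be preferred for that language pair. Categories such as incorrect word endings or incorrect word order would need additional logic (for example stemming) that this sketch does not attempt.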
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.