Contextual Interpretation of Nasa Yuwe Language Expressions into Spanish Using Natural Language Processing Techniques
DOI:
https://doi.org/10.18041/1900-3803/entramado.1.13152
Keywords:
Indigenous languages, Natural Language Processing, Machine translation, Bilingual corpus, Nasa Yuwe, Diacritics
Abstract
The Nasa Páez Indigenous community, located mainly in the department of Cauca, Colombia, faces a cultural risk from the progressive loss of its ancestral language, Nasa Yuwe, particularly among younger generations. Because the language is the core of the community's worldview, oral tradition, and community life, this study draws on Natural Language Processing techniques to propose technological tools aimed at preserving Nasa Yuwe. While Western languages such as English, Spanish, and French benefit from numerous automatic translation solutions, implementing such technologies for Nasa Yuwe is a significant challenge because, in digital terms, it is an ultra-low-resource language by comparison. To keep the collection and systematization of Nasa Yuwe expressions coherent, the research is limited to the San Lorenzo de Caldono Indigenous Reserve, so that the study focuses on a single linguistic variant. The developed interpreter was evaluated with an intent confusion matrix, which yielded an efficiency of 100%, measured as the F1-score, and a precision of 100% in the recognition of cultural expressions over a total of 144 evaluated samples. In addition, cultural entity extraction, evaluated with the confusion matrix generated by the DIETClassifier model, showed a macro-averaged precision of 91.1%, a recall of 95.3%, and an overall efficiency of 92.8% over 955 annotated entities, indicating that the system performs effectively in identifying semantic components inherent to the Nasa Yuwe language.
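The precision, recall, and F1 figures reported in the abstract are derived from confusion matrices. As a minimal sketch of how macro-averaged metrics are commonly computed from such a matrix, the Python example below uses a small invented matrix; the values and the function names `per_class_metrics` and `macro_average` are illustrative assumptions, not the study's data or code. It assumes the common convention in which macro F1 is the mean of the per-class F1 values, which need not equal the harmonic mean of macro precision and macro recall.

```python
from typing import Dict, List


def per_class_metrics(confusion: List[List[int]]) -> List[Dict[str, float]]:
    """Compute precision, recall and F1 for each class.

    Assumed layout: rows are true classes, columns are predicted classes.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        # Column total minus the diagonal: predicted as k but truly another class.
        fp = sum(confusion[i][k] for i in range(n)) - tp
        # Row total minus the diagonal: truly k but predicted as another class.
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def macro_average(metrics: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric across classes (macro average)."""
    return sum(m[key] for m in metrics) / len(metrics)


# Hypothetical 3-class confusion matrix; NOT data from the study.
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]

toy_metrics = per_class_metrics(toy_confusion)
macro_precision = macro_average(toy_metrics, "precision")
macro_recall = macro_average(toy_metrics, "recall")
macro_f1 = macro_average(toy_metrics, "f1")
print(f"macro P={macro_precision:.3f} R={macro_recall:.3f} F1={macro_f1:.3f}")
```

Because macro averaging weights every class equally, a rare entity type with a few errors can lower the macro score noticeably even when most of the annotated entities are labeled correctly.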
License
Copyright (c) 2026 Entramado

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.