Issues in the Design and Implementation of Chatbots for Oral Language Assessment

Document Type : Research Article


1 Filología Moderna-Instituto Franklin, College of Education, Universidad de Alcalá, Alcalá de Henares, Spain

2 Lingüística Aplicada, E.T.S. de Ing. de Caminos Canales, Universidad Politécnica de Madrid, Madrid, Spain

3 Department, E.T.S.I. Aeroespacial y Diseño Industrial, Universidad Politécnica de Valencia, Valencia, Spain


Oral assessment is one of the best-known technical and implementation challenges for computer-assisted language learning in official language certification. The case of Spain is especially critical, since the government has for years delayed implementing the listening comprehension and oral expression test in the University Access Test (EVAU). This article first presents the evolution of oral tests at a general level, then offers a SWOT analysis of the potential of implementing such tests, and finally discusses how they could be implemented. The paper concludes that there is evidence that chatbots adapted to language learning can also be used for evaluation. Chatbot-assisted language learning that combines artificial intelligence with voice recognition and processing, yielding semi-automatic assessment supervised by the teacher, can become a tool used within language learning processes and/or included in language certification methodology. A further aspect of interest is the design and development of interfaces for future adapted chatbots, which must consider multimodality in human-machine interaction so that communication is effective and can be validated against the knowledge available to the student.

