Authors
- Yarmakhov Boris B., Candidate of Philosophical Sciences, Associate Professor
- Dreytser Sofya I., Candidate of Philosophical Sciences, Associate Professor
Annotation
This study raises problematic issues in measuring the quality of text generation with large language models, using as an example the generation of answer options for multiple-choice questions in the development of an adaptive biology textbook. The paper reviews the most common metrics for assessing the quality of problem solving with large language models and proposes the authors' own metric, better suited to the tasks at hand. Having applied this metric to more than 1,000 generated responses, the study draws conclusions about their quality. Limitations of the measurement are also discussed, in particular the subjectivity of expert evaluation of answer options and the difficulty of automating such measurements.
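The annotation mentions automatic quality metrics without reproducing them. As a purely illustrative sketch, and not the metric proposed in the paper, the Python snippet below shows one way a rule-based pre-screen of generated distractors could be automated, in the spirit of the distractor-analysis literature cited in references 4–6; the function names, rules, and the 0.9 similarity threshold are all hypothetical.

```python
# Illustrative sketch only: a rule-based screen for "non-functional"
# distractors in generated multiple-choice items. This is NOT the metric
# proposed in the paper; all rules and thresholds here are hypothetical.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive surface similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def screen_distractors(key: str, distractors: list[str],
                       near_dup: float = 0.9) -> list[tuple[str, str]]:
    """Label each generated distractor as 'ok' or with a suspected defect."""
    labels: list[tuple[str, str]] = []
    accepted: list[str] = []
    for d in distractors:
        if not d.strip():
            labels.append((d, "empty option"))
        elif similarity(d, key) >= near_dup:
            labels.append((d, "near-duplicate of the correct answer"))
        elif any(similarity(d, prev) >= near_dup for prev in accepted):
            labels.append((d, "near-duplicate of another distractor"))
        else:
            labels.append((d, "ok"))
            accepted.append(d)
    return labels


if __name__ == "__main__":
    key = "Chloroplasts carry out photosynthesis"
    candidates = [
        "Mitochondria carry out photosynthesis",
        "Chloroplasts carry out photosynthesis.",  # flagged: repeats the key
        "Ribosomes synthesize proteins",
    ]
    for option, label in screen_distractors(key, candidates):
        print(f"{label:40s} | {option}")
```

A screen of this kind catches only surface defects such as empty or duplicated options; judging whether a distractor is plausibly wrong still requires the expert evaluation whose subjectivity the study discusses.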
How to cite
Yarmakhov, B. B., & Dreytser, S. I. (2025). Developing assignments for didactic units of a digital adaptive textbook using large language models. Bulletin of the Moscow City Pedagogical University. Series "Pedagogy and Psychology", № 3 (73), 50. https://doi.org/10.24412/2072-9014-2025-373-50-60
References
1. Crocker L. Introduction to classical and modern test theory: textbook / L. Crocker, D. Algina. Moscow: Logos, 2010. 668 p.
2. Karpenko A. P. Test method of quality control of education and criteria of quality of educational tests. Review / A. P. Karpenko, A. S. Domnikov, V. V. Belous // Mechanical engineering and computer technologies. 2011. No. 4. P. 1–28.
3. Kowash M. Evaluating the Quality of Multiple Choice Questions in Paediatric Dentistry Postgraduate Examinations / M. Kowash, I. Hussein, M. Al Halabi // Sultan Qaboos Univ Med J. 2019. No. 19 (2). P. e135–e141.
4. Sajjad M. Nonfunctional distractor analysis: An indicator for quality of Multiple choice questions / M. Sajjad, S. Iltaf, R. A. Khan // Pak J Med Sci. 2020. No. 36 (5). P. 982–986.
5. Vatsal R. Assessing Distractors in Multiple-Choice Tests / R. Vatsal, V. Raina, A. Liusie, M. Gales // Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems. 2023. P. 12–22.
6. Ansari M. Assessment of distractor efficiency of MCQs in item analysis / M. Ansari, R. Sadaf, A. Akbar // Professional Med. Journal. 2022. No. 29 (5). P. 730–734.
7. Qiu Zh. Automatic Distractor Generation for Multiple Choice Questions in Standard Tests / Zh. Qiu, X. Wu, W. Fan // Proceedings of the 28th International Conference on Computational Linguistics. 2020. P. 2096–2106.
8. Pho V.-M. Distractor quality evaluation in Multiple Choice Questions / V.-M. Pho, A.-L. Ligozat, B. Grau // Artificial Intelligence in Education. 2015. Vol. 9112. P. 377–386.
9. Baldwin P. A Natural-Language-Processing-Based Procedure for Generating Distractors for Multiple-Choice Questions / P. Baldwin, J. Mee, V. Yaneva // Evaluation & the Health Professions. 2022. No. 45 (4). P. 327–340.
10. Tran A. Generating Multiple Choice Questions for Computing Courses Using Large Language Models / A. Tran, K. Angelikas, E. Rama // IEEE Frontiers in Education Conference (FIE). 2023. P. 1–8.
11. Sosnin A. V. The relationship of expert categories and automatic metrics used to assess the quality of translation / A. V. Sosnin, Yu. V. Balakina, A. N. Kashchikhin // Bulletin of the Saint Petersburg University. Language and literature. 2022. Vol. 19. No. 1. P. 125–148.
12. Yarmakhov B. B. Generation of questions for an adaptive biology textbook based on artificial intelligence technologies / B. B. Yarmakhov, S. I. Dreytser // Pedagogical innovation and continuing education in the 21st century: proceedings of the II International Scientific and Practical Conference (Kirov, May 20, 2024). Kirov: Vyatka GATU, 2024. P. 161–165.
13. Tikhonova M. I. Methods of evaluating language models in natural language understanding tasks: Cand. Sci. (Computer Science) dissertation / M. I. Tikhonova. Moscow, 2023. 77 p.
14. Snover M. A study of translation edit rate with targeted human annotation / M. Snover, B. Dorr, R. Schwartz // Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers. Cambridge, 2006. P. 223–231.
15. Nuriev V. A. Methods for assessing the quality of machine translation: the current state / V. A. Nuriev, A. Yu. Egorova // Informatics and its Applications. 2021. Vol. 15. No. 2. P. 104–111.
16. Timokhin I. V. Automation of headline generation for news articles / I. V. Timokhin, N. B. Osipenko // Problems of physics, mathematics and engineering. 2020. No. 3 (44). P. 92–94.
17. Mitrenina O. V. How and which translation is (not) evaluated by computers / O. V. Mitrenina, A. G. Mukhambetkalieva // Journal of applied linguistics and lexicography. 2021. Vol. 3. No. 2. P. 77–84.
18. Yarmakhov B. B. Digital adaptive biology textbook: development and testing / B. B. Yarmakhov, S. V. Sumatokhin, O. V. Kukushkina // Biology at school. 2024. No. 2. P. 23–31.
19. Yarmakhov B. B. The transformation of dialogue in the era of large language models / B. B. Yarmakhov // Dialogue of cultures — the culture of dialogue in a multinational urban space: Proceedings of the Fourth International Scientific and Practical Conference (Moscow, February 27, 2024). Moscow: Languages of the Peoples of the World, 2024. P. 295–301.
20. Pasechnik V. V. Biology. 6th grade: textbook / V. V. Pasechnik, S. V. Sumatokhin, Z. G. Gaponyuk. Moscow: Prosveshchenie, 2023. 160 p.

