Yan, W., Liu, H., Wang, Y., Li, Y., Chen, Q., Wang, W., Lin, T., Zhao, W., Zhu, L., Sundaram, H., & Deng, S. (2024). CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 5511–5558). https://doi.org/10.18653/v1/2024.acl-long.301