Collins, K. M., Jiang, A. Q., Frieder, S., Wong, L., Zilka, M., Bhatt, U., Lukasiewicz, T., Wu, Y., Tenenbaum, J. B., Hart, W., Gowers, T., Li, W., Weller, A., & Jamnik, M. (2024). Evaluating Language Models for Mathematics through Interactions. Proceedings of the National Academy of Sciences of the United States of America, 121(24), Article e2318124121. https://doi.org/10.1073/pnas.2318124121