ChatGPT less than 50% accurate in accounting exams, study finds
Accounting students found to be considerably more capable across the board
Accounting students are significantly more capable of answering exam questions correctly than ChatGPT, a global study of educational institutions has found.
Conducted by the American Accounting Association (AAA), the study evaluated ChatGPT’s performance on accounting-specific content by feeding it more than 25,000 assessment questions from 187 institutions around the world and cross-referencing the results with the performance of accounting students.
Across all assessments, including topics such as audit, financial accounting, management accounting and tax, students scored an average of 76.7%, while ChatGPT scored just 47.4%.
“This study provides important insights into the current capabilities of AI compared to human performance in an accounting-specific context,” the study concluded. “It highlights the limitations of an AI chatbot trained on general material.”
Emma Rawson, technical officer at the Association of Accounting Technicians, argues that the results of the study speak to the nuance required in accounting exams, and that while AI-powered chatbots possess the knowledge to perform reasonably well in exams, they lack the practical application skills to excel.
“I think the results are very promising, and speak to the robustness of professional exams. A well-written exam question requires candidates to not just repeat rules or quote legislation and guidance, but apply their knowledge to the specific facts and circumstances and identify potential issues.”
But while humans performed better than ChatGPT on the whole, there were several exceptions. For instance, the chatbot exceeded the student average on 11.3% of the assessments.
ChatGPT also outperformed students on certain topic areas such as audit, in addition to true/false and multiple-choice questions. In contrast, it struggled with short-answer questions and questions requiring working out.
This squares with Rawson’s view that “the human brain outperforms AI” in a number of disciplines.
“From what I have personally seen, ChatGPT may also be very good at coming to a conclusion, but will not necessarily set out their reasoning in the same level of detail that you would expect a strong candidate to,” she says.
According to the study, the range of results also indicates that the ongoing debate pitting humans against chatbots is “multifaceted”, and that the bot can clearly “approximate human average performance in some topic areas”.
Additionally, it noted that the gap in performance between AI and humans “will likely close”, pointing out that the current ChatGPT model has 175 billion parameters, whereas a model with one trillion parameters is likely to become publicly available in 2023.
“However, we can’t be complacent. As these tools learn and evolve they will become more sophisticated,” Rawson adds. “This is an area that the profession, and the professional bodies in particular, need to keep an eye on and adapt accordingly.”
The AAA study also went on to encourage educators to prepare for an AI-powered future, warning of issues such as overreliance on the technology “hampering students’ learning ability”, and “short-circuiting the learning process” through cheating.
The most reliable defence against these risks, it argued, is for educators to prepare for a future that includes “broad AI access”, and to engage in discussions about the impact of AI on their teaching.
But the study also acknowledged the potential power of ChatGPT in the world of accounting, arguing that it could “provide the much-needed stimulus” to “reimagine accounting education practices”.
“These are all important questions that accounting educators should discuss and research. As AI technology continues to improve, educators need to prepare themselves and their students for the future, making AI technology a promising area for future research.”