A study by researchers at the University of Chicago found that OpenAI's GPT-4 outperformed human analysts at predicting the direction of future earnings from financial statements. The result suggests that large language models could augment and streamline the work of financial professionals, with implications for how financial analysis and decision-making are carried out.
GPT-4 achieved a prediction accuracy of 60.35% in determining the direction of future earnings, surpassing the 52.71% accuracy of human analysts. The model also outperformed analysts in terms of F1-score, which balances precision and recall, with GPT-4 scoring 60.90% compared to 54.48% for human analysts.[1][2]
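To make the two metrics concrete, the following is a minimal sketch of how accuracy and F1-score could be computed for binary direction predictions. It assumes that an earnings increase is the positive class (label 1), which may differ from the study's convention, and the labels are toy values invented for illustration, not data from the study.

```python
def direction_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 for binary direction labels.

    Assumption: 1 = earnings increase (positive class), 0 = decrease.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    correct = sum(1 for a, p in pairs if a == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (hypothetical, not from the study).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = direction_metrics(actual, predicted)
print(metrics)
```

Accuracy counts every correct call equally, while F1 ignores true negatives, so the two can diverge when increases and decreases are unevenly distributed.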
The study used anonymized financial data from the Compustat database, spanning 1968 to 2021, and compared GPT-4's performance with human analysts' predictions sourced from the IBES database.[1] The researchers removed company names and dates from the standardized financial statements given to GPT-4, so that the model could not draw on prior knowledge of specific companies or periods and the comparison with human analysts remained fair.[1][3]
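The anonymization step described above might look roughly like the sketch below. The record layout, the field names, and the choice of relative labels such as `t` and `t-1` are assumptions made for illustration; the study's actual preprocessing code is not reproduced here.

```python
def anonymize_statement(record):
    """Drop identifying fields and relabel fiscal years relative to the latest.

    Assumed input layout (hypothetical):
        {"company": str, "years": {year: {line_item: value}}}
    """
    years = sorted(record["years"], reverse=True)
    latest = years[0]
    periods = {}
    for year in years:
        offset = latest - year
        label = "t" if offset == 0 else f"t-{offset}"
        # Copy the line items so the original record is left untouched.
        periods[label] = dict(record["years"][year])
    # The company name is deliberately not carried over.
    return {"periods": periods}


# Hypothetical record for illustration only.
record = {
    "company": "Example Corp",
    "years": {
        2019: {"revenue": 100.0, "net_income": 8.0},
        2020: {"revenue": 120.0, "net_income": 12.0},
    },
}
anon = anonymize_statement(record)
print(anon)
```

Relative labels keep the ordering of periods, which the model needs to see trends, while removing the calendar dates that could identify a specific firm-year.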
GPT-4's performance was on par with state-of-the-art machine learning models, such as artificial neural networks (ANNs), that are designed specifically for earnings prediction. In certain respects GPT-4 even outperformed these narrowly specialized models, which suggests that general-purpose AI can rival or surpass purpose-built models in complex analytical tasks.[1][2][3]
The researchers prompted GPT-4 with a "Chain of Thought" (CoT) approach that mimics the steps a human analyst would take: identifying notable changes in the financial statements, computing key financial ratios, and synthesizing this information to predict the direction of earnings. According to the study, the CoT prompts played a pivotal role, allowing GPT-4 to reach its reported accuracy even when given raw financial data devoid of context.[1][2][3]
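The three analytical steps can be sketched as a prompt template. The wording of the steps, the statement layout, and the function names below are hypothetical and are not the study's actual prompt; the sketch only builds the prompt text and does not call any model API.

```python
# Hypothetical step wording, paraphrasing the three stages described above.
COT_STEPS = [
    "Identify notable changes in the financial statement items between periods.",
    "Compute key financial ratios and state the formula used for each.",
    "Synthesize the changes and ratios, then predict whether earnings will "
    "increase or decrease in the next period.",
]


def format_statement(periods):
    """Render anonymized periods as 'label | line item | value' rows."""
    rows = []
    for label, items in periods.items():
        for name, value in items.items():
            rows.append(f"{label} | {name} | {value:.2f}")
    return "\n".join(rows)


def build_cot_prompt(periods):
    """Assemble a chain-of-thought prompt from anonymized statement data."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(COT_STEPS, start=1))
    return (
        "You are a financial analyst. The statements below are anonymized.\n\n"
        f"{format_statement(periods)}\n\n"
        "Work through these steps in order:\n"
        f"{steps}\n\n"
        "End with one line: 'Prediction: increase' or 'Prediction: decrease'."
    )


# Hypothetical anonymized data for illustration only.
periods = {
    "t": {"revenue": 120.0, "net_income": 12.0},
    "t-1": {"revenue": 100.0, "net_income": 8.0},
}
prompt = build_cot_prompt(periods)
print(prompt)
```

Asking for a fixed final line makes the predicted direction easy to parse from the model's free-text reasoning.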
Despite the promising results, some skepticism remains. Critics question whether the comparisons with human analysts and with specialized ANNs are valid, pointing to potential differences in task complexity and in the models chosen as baselines.[1][2]
The study acknowledges that it is difficult to pinpoint exactly how and why GPT-4 performs well, which reflects the broader challenge of understanding the inner workings of large language models. AI researcher Matt Holden also noted that GPT-4 is unlikely to be able to select stocks that outperform broad indexes such as the S&P 500.[1]