
A Simulation Characterizing Out-of-the-Box Stock Trading Behavior of Various LLMs

Dr. Sal Barbosa, from MTSU’s Department of Computer Science, discusses the behavior of LLMs in the analysis of financial markets.

The intersection of artificial intelligence and finance has attracted a surge of interest as Large Language Models (LLMs) have come on the scene, promising to automate decision-making in many tasks, including stock trading. However, each model, shaped by distinct training data, architectures, and design philosophies, exhibits specific strengths and weaknesses in its decision-making behavior, which in turn influence its performance on the trading task. Understanding the “personality” of individual LLMs is therefore crucial to ensuring that the “right” model is chosen as the basis for stock trading agents and assistants.

This research is a comparative analysis of the trading behavior and performance of five LLMs (llama3.2, mistral, gemma3, phi4, and qwen2.5), and sheds light on the unique characteristics and idiosyncrasies of each model. The experiments, run in a custom simulation framework, analyze stock buy and sell trades/recommendations, measure the frequency of various errors, and assess their impact on performance. The experimental design employs these models as released for use in Ollama, via a common prompt and without fine-tuning or retrieval-augmented generation (RAG), and evaluates their performance through multiple trials over trading periods in both rising and declining market conditions.

The return on investment is compared across models and between market conditions, the “personality” of each model is characterized through a set of metrics, and the number and types of errors made by the LLMs are contrasted. The results of this investigation inform the choice of models as agents for automated stock trading/recommendation and create a path for future research in this rapidly evolving area.
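To illustrate the kind of experiment described, the sketch below shows a minimal backtest loop of the sort such a simulation framework might use: a decision function sees the price history so far and returns a buy/sell/hold action, and ROI is measured over synthetic rising and declining markets. This is an illustrative assumption, not the study's actual framework; the `simulate` and `momentum` names are hypothetical, and in the real experiments the decision function would instead send a common prompt to an Ollama-hosted model and parse its response.

```python
def simulate(prices, decide, cash=10_000.0):
    """Run one trading trial: at each step the agent sees the price
    history so far and returns 'buy', 'sell', or 'hold'.  Returns ROI."""
    shares = 0
    start = cash
    for t in range(1, len(prices)):
        action = decide(prices[:t])   # stand-in for querying an LLM
        price = prices[t]
        if action == "buy" and cash >= price:
            qty = int(cash // price)  # buy as many shares as cash allows
            cash -= qty * price
            shares += qty
        elif action == "sell" and shares:
            cash += shares * price    # liquidate the whole position
            shares = 0
    # mark remaining shares to the final price
    return (cash + shares * prices[-1] - start) / start

def momentum(history):
    """Toy rule-based agent: buy after an up-tick, sell after a down-tick."""
    if len(history) < 2:
        return "hold"
    return "buy" if history[-1] > history[-2] else "sell"

# Synthetic bull and bear markets, echoing the rising/declining conditions
rising  = [100 + 0.5 * t for t in range(60)]
falling = [100 - 0.5 * t for t in range(60)]

roi_up = simulate(rising, momentum)    # positive: rides the uptrend
roi_dn = simulate(falling, momentum)   # zero: never sees an up-tick, never buys
```

Running many such trials per model and market regime, and logging malformed or infeasible actions (e.g., selling with no position) alongside ROI, would yield both the performance comparison and the error-frequency metrics the abstract describes.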

Watch the discussion here.