Valerio Capraro: Using Large Language Models (LLMs) in Economics
- Xiaoyue Sun
- Aug 11
- 4 min read

In May 2024, the Saint Pierre International Security Center (SPCIS) launched the “Global Tech Policy at the Forefront” series, featuring conversations with leading experts on the impact of emerging technologies—such as AI, blockchain, biometrics, and robotics—on global governance and public policy.
On July 2, 2025, we had the pleasure of interviewing Prof. Valerio Capraro on the fascinating topic of LLMs’ implications for economics and LLM-based simulations in economics. Prof. Capraro is an associate professor at the University of Milan-Bicocca. He earned his PhD in mathematics but later turned to the social sciences. As an economist and psychologist, he uses behavioral experiments, mathematical modeling, and numerical simulations to study cooperation, honesty, and other moral behaviors.
Below is a transcript of the interview.
SPCIS: Professor Valerio Capraro, it’s wonderful to connect with you. Your research suggests a shift from outcome-based to language-based utility functions in game theory. How do LLMs formalize this shift, and in what ways do they challenge or complement traditional game theory assumptions (e.g., rational self-interest) when modeling human behavior?
Prof. Valerio Capraro: LLMs don’t actually formalize the shift themselves. The shift from outcome-based to language-based utility functions can be expressed with standard mathematical tools, by adding a component that reflects how language influences decisions. What’s new with LLMs is that they allow us to quantify this language-based component for the first time. In the past, we knew language mattered, but we didn’t have a way to measure it. Now we do. In my upcoming book The Economics of Language: How LLMs can reshape behavioural economics (Cambridge University Press), I show several examples of how LLMs can help capture the emotional and normative weight of words in economic decision-making.
It’s also important to clarify that language-based utility functions do not challenge the rationality assumption of game theory. Agents still maximize their utility. What they do challenge is the consequentialist assumption, the idea that utility depends only on the economic consequences of available actions. This assumption underlies most behavioural economics models, from social preferences to prospect theory. Language-based utility functions suggest a richer framework, where words—not just payoffs—shape human behaviour.
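The distinction Prof. Capraro draws can be sketched with standard notation. The form below is a minimal illustration (the symbols are ours, not taken from the book):

```latex
% Language-based utility: agent i still maximizes U_i,
% but utility now has a linguistic term alongside the material payoff.
U_i(a) = \underbrace{\pi_i(a)}_{\text{economic consequences}}
       \; + \; \lambda_i \, \underbrace{\ell_i\big(w(a)\big)}_{\text{language component}}
```

Here \(\pi_i(a)\) is the material payoff of action \(a\), \(w(a)\) is the wording in which the action is framed, \(\ell_i\) captures the emotional and normative weight of that wording (the quantity LLMs now make measurable), and \(\lambda_i \ge 0\) is its weight. Agents still maximize \(U_i\), so rationality is preserved; only the consequentialist special case \(\lambda_i = 0\), where utility depends on payoffs alone, is relaxed.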
SPCIS: You found LLMs systematically overestimate human altruism. Beyond training data limitations, what unique architectural factors contribute to this bias?
Prof. Valerio Capraro: There is converging evidence from multiple angles that some biases in LLMs don’t come just from the data: they are introduced during the fine-tuning process, especially during “reinforcement learning from human feedback”, where human annotators evaluate the model’s responses and the model learns how to improve its responses. In a recent paper, we found that GPT tends to overemphasize violence against women, but only when that violence is described using words linked to the gender parity debate, like “abuse”, and not with words like “torture,” even if the actual harm is greater. This suggests that fine-tuning can shift a model’s sensitivity based on social or political salience. Similarly, I think the overly optimistic view of human altruism in LLMs may reflect attempts to make the model appear more “politically correct” or morally desirable during fine-tuning.
SPCIS: Could you illustrate how such biased predictions might lead to flawed policy decisions, for example in designing social welfare programs or charitable giving campaigns?
Prof. Valerio Capraro: Sure. Many social programs depend on voluntary participation. Think of NGOs or community-based initiatives where citizens are expected to contribute time, effort, or donations. If policymakers rely on LLMs that overestimate how generous or cooperative people are, they might design programs that assume too much engagement. When that engagement doesn’t materialize, the programs may underperform or even fail.
SPCIS: As economists are increasingly using LLMs to simulate human subjects in experiments, what methodological safeguards would you recommend to prevent harmful consequences from their quantitative inaccuracies? How should research designs be adapted when incorporating LLM-generated data?
Prof. Valerio Capraro: Personally, I’m still skeptical about using LLMs as a substitute for human participants. In our own research, we found that GPT-4’s predictions of human altruism were not only too optimistic, but also uncorrelated with how people actually behave. So it wasn’t just an upward shift in predicted altruism, it was a fundamental mismatch. That suggests LLMs shouldn’t be trusted to simulate human responses without strong empirical validation. I would advise researchers to be very cautious, to triangulate LLM predictions with actual human data, and to avoid over-relying on LLM predictions, especially when the stakes are high.
SPCIS: Based on your research, what key principles should guide the design of human-AI collaboration systems for economic decision-making? What governance structures or technical adjustments (like confidence intervals or bias disclosures) are needed for helping users better evaluate LLM-generated advice in high-stakes economic decisions?
Prof. Valerio Capraro: This isn’t my core area of research, but my sense is that current LLMs still lack the level of personalization needed for high-stakes decisions. They don’t truly understand the user’s goals, values, or moral preferences. And they also don’t understand the preferences of other actors involved. That’s a big limitation. Still, I think they can be helpful in shaping how we think about problems, for example, by highlighting angles we might not have considered. For now, they work well as general-purpose assistants, but they’re not yet suited for specific, context-sensitive decisions where a deep understanding of the actors’ preferences is essential.