Will Scaling Improve Social Simulation with LLMs? A Study of 85 Models
Stanford / Columbia / Tsinghua
An empirical study of 85 transformer models, up to 70B parameters, across three task families: opinion modeling, behavioral simulation, and longitudinal forecasting. Scaling generally helps for well-represented populations but consistently fails to improve calibration to human cognitive biases such as risk aversion; underrepresented demographic groups see substantially slower gains.
Why it matters
A clear empirical finding that scale does not fix bias calibration or minority-group fidelity in social simulation. This marks an important boundary for the growing practice of using LLMs as stand-ins for human survey respondents.
Importance: 2/5
85-model study (Stanford/Columbia/Tsinghua) identifying scaling limits for social simulation and bias calibration