Quantifying Faithful Confidence Expression in Large Reasoning Models

Yale NLP

research official 1 source ~1 min

This Yale NLP paper (arXiv 2606.03969) investigates whether large reasoning models faithfully express their actual uncertainty. The authors compare linguistic confidence signals against three internal uncertainty measures: token probabilities, hidden states, and response sampling consistency. Key findings: (1) reasoning capability does not automatically improve calibration; (2) standard prompting techniques do not transfer to reasoning models; (3) different internal uncertainty measures yield conflicting results, revealing fragility in existing evaluation methodologies.
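
To make the comparison concrete, here is a minimal sketch of how a stated confidence could be checked against one internal signal, response sampling consistency. It is an illustration under stated assumptions, not the paper's metric: the function names `sampling_consistency` and `faithfulness_gap`, the majority-agreement definition of consistency, and the absolute-difference gap are all hypothetical choices made for this example.

```python
from collections import Counter


def sampling_consistency(answers):
    """Fraction of sampled answers that agree with the most common answer.

    Assumption: answers are compared after trimming whitespace and lowercasing.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(a.strip().lower() for a in answers)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(answers)


def faithfulness_gap(stated_confidence, answers):
    """Absolute gap between a stated confidence in [0, 1] and sampling consistency.

    Hypothetical measure for illustration; a larger gap means the verbal
    confidence diverges more from how consistently the model answers.
    """
    if not 0.0 <= stated_confidence <= 1.0:
        raise ValueError("stated_confidence must be in [0, 1]")
    return abs(stated_confidence - sampling_consistency(answers))


# Example: the model says it is 95% sure, but only 3 of 4 samples agree.
samples = ["Paris", "paris", "Lyon", "Paris"]
print(sampling_consistency(samples))      # 0.75
print(round(faithfulness_gap(0.95, samples), 2))
```

A gap computed this way depends on which internal measure is used as the reference. Swapping sampling consistency for token probabilities or a hidden-state probe could give a different verdict for the same response, which is the kind of disagreement the third finding describes.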

Why it matters

As reasoning models are deployed in high-stakes settings, faithful uncertainty communication is safety-critical. The paper establishes that large reasoning models have a distinct, unresolved calibration problem separate from general LLMs.

Importance: 2/5

Verified arXiv paper from Yale NLP; addresses safety-critical calibration failures specific to reasoning models.

reasoning interpretability safety calibration paper

Sources

official Quantifying Faithful Confidence Expression in Large Reasoning Models (arXiv)