Technical Papershelf

A curated list of research papers I found interesting and informative.

Reasoning Models Struggle to Control Chain of Thought

Read Paper

Read Notes

chain-of-thought reasoning-models artificial-intelligence large-language-models

May 2026

Stress-Testing Model Specs Reveals Character Differences among Language Models

Proposes a systematic methodology for stress-testing AI constitutions and model specifications by generating value tradeoff scenarios where competing principles cannot be simultaneously satisfied. Evaluating twelve frontier language models across more than 300,000 scenarios, the authors identify over 70,000 cases of significant behavioral divergence. High disagreement between models strongly predicts contradictions and ambiguities in model specifications, revealing how differences in behavioral guidelines contribute to distinct model "character" and value prioritization patterns.

Read Paper

Read Notes

large-language-models ai-alignment model-specifications constitutional-ai ai-safety evaluation behavioral-analysis value-alignment

Oct 2025
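
To make the idea of "behavioral divergence" in the Stress-Testing Model Specs entry concrete, here is a minimal sketch of how cross-model disagreement on a tradeoff scenario might be scored. It is an illustration under my own assumptions, not the paper's method: the 0 to 6 preference scale, the use of population standard deviation as the disagreement measure, the 1.5 threshold, and the names `disagreement` and `flag_divergent` are all hypothetical.

```python
from statistics import pstdev


def disagreement(scores):
    """Population standard deviation of per-model scores for one scenario.

    Assumption: each model's response has already been mapped to a number
    on a hypothetical 0-6 scale (0 = strongly favors value A,
    6 = strongly favors value B).
    """
    return pstdev(scores)


def flag_divergent(scenario_scores, threshold=1.5):
    """Return the ids of scenarios whose disagreement meets the threshold.

    The threshold is an illustrative choice, not a value from the paper.
    """
    return sorted(
        scenario_id
        for scenario_id, scores in scenario_scores.items()
        if disagreement(scores) >= threshold
    )


# Made-up scores for three scenarios, four models each.
example = {
    "s1": [3, 3, 3, 3],  # all models agree
    "s2": [0, 6, 0, 6],  # models split between the two values
    "s3": [2, 3, 2, 3],  # mild variation
}

if __name__ == "__main__":
    print(flag_divergent(example))
```

In this toy data only `s2` is flagged. Under the paper's framing, scenarios like that are the ones worth checking against the specification text for contradictions or ambiguities.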