Technical Papershelf
A selected list of research papers I found interesting and informative
Reasoning Models Struggle to Control Chain of Thought
chain-of-thought reasoning-models artificial-intelligence large-language-models
May 2026
Stress-Testing Model Specs Reveals Character Differences among Language Models
Proposes a systematic methodology for stress-testing AI constitutions and model specifications by generating value tradeoff scenarios where competing principles cannot be simultaneously satisfied. Evaluating twelve frontier language models across more than 300,000 scenarios, the authors identify over 70,000 cases of significant behavioral divergence. High disagreement between models strongly predicts contradictions and ambiguities in model specifications, revealing how differences in behavioral guidelines contribute to distinct model "character" and value prioritization patterns.
large-language-models ai-alignment model-specifications constitutional-ai ai-safety evaluation behavioral-analysis value-alignment
Oct 2025