Technical Papershelf

A selected list of research papers I found interesting and informative

  • Reasoning Models Struggle to Control Chain of Thought

    chain-of-thought reasoning-models artificial-intelligence large-language-models
    May 2026
  • Stress-Testing Model Specs Reveals Character Differences among Language Models

    Proposes a systematic methodology for stress-testing AI constitutions and model specifications by generating value tradeoff scenarios where competing principles cannot be simultaneously satisfied. Evaluating twelve frontier language models across more than 300,000 scenarios, the authors identify over 70,000 cases of significant behavioral divergence. High disagreement between models strongly predicts contradictions and ambiguities in model specifications, revealing how differences in behavioral guidelines contribute to distinct model "character" and value prioritization patterns.

    large-language-models ai-alignment model-specifications constitutional-ai ai-safety evaluation behavioral-analysis value-alignment
    Oct 2025