What's more, they exhibit a counter-intuitive scaling limit: their reasoning energy raises with issue complexity as many as a point, then declines In spite of acquiring an enough token budget. By evaluating LRMs with their normal LLM counterparts less than equivalent inference compute, we discover three general performance regimes: https://www.youtube.com/watch?v=snr3is5MTiU