LLMsMedium impactFor DevarXiv LLMs · June 12, 2026
AdaSR: Adaptive Streaming Reasoning with Hierarchical Relative Policy Optimization
AdaSR introduces an adaptive streaming reasoning framework for large models that reason during input stream and finalize decisions with a new policy optimization method, improving reasoning accuracy and efficiency.
Signal strength3.4/5·arXiv LLMs
AdaSR introduces an adaptive streaming reasoning framework for large models that reason during input stream and finalize decisions with a new policy optimization method, improving reasoning accuracy and efficiency.
TL;DR
AdaSR introduces an adaptive streaming reasoning framework for large models that reason during input stream and finalize decisions with a new policy optimization method, improving reasoning accuracy and efficiency.
What happened
Researchers proposed AdaSR, which enables large reasoning models to perform reasoning dynamically during streaming inputs and final deliberation once complete, optimized via Hierarchical Relative Policy Optimization (HRPO) to balance accuracy, computation, and latency. Code is publicly available.
Why it matters
This approach addresses limitations of traditional static read-then-think paradigms for dynamic streaming data, allowing more flexible, latency-aware reasoning that better fits real-world continuous input scenarios like audio and video streams.
Generating deep dive...
AI-powered analysis takes a few seconds
The bigger picture
AdaSR embodies a significant pivot from static batch reasoning to adaptive streaming inference, reflecting the growing importance of real-time AI systems in areas beyond text, such as audio, video, and sensor data streams. This signal indicates a maturation in AI model design where responsiveness and computational efficiency are considered as first-class objectives rather than afterthoughts. More broadly, hierarchical policy optimization represents a path to fine-grained control over large model execution, hinting at future architectures that modulate their complexity in situ according to resource constraints and task demands. As applications like live language translation, autonomous monitoring, and interactive agents proliferate, AdaSR’s approach points to how AI can better align model behavior with dynamic operational environments. This development also resonates with the trend toward modular, policy-driven AI systems that learn to manage their own inference processes for optimized utility.
Technical deep dive
AdaSR’s core innovation lies in enabling a large reasoning model to process streaming inputs incrementally, performing intermediate reasoning steps that inform whether to continue consuming data or finalize the output. Architecturally, it integrates a hierarchical policy mechanism that operates on two levels: a coarse-level controller decides when to commit to a decision, while a fine-grained policy optimizes incremental read-and-reason actions. HRPO, the reinforcement learning algorithm underpinning this, optimizes the relative improvement of policies across these levels, focusing on balancing three key metrics: accuracy, computational cost, and latency. Implementing AdaSR requires adapting the model’s input pipeline to function in a streaming fashion, designing appropriate reward signals that reflect trade-offs, and training across episodes of partial input sequences. The hierarchical policy architecture introduces additional complexity but offers modularity critical to scaling this approach in diverse environments. Developers should consider this design when deploying large models in environments with variable input rates or strict latency constraints, where static read-then-think can underperform. Leveraging the publicly released code, teams can extend HRPO to multimodal streams and tailor policies to specific SLAs.
Real-world applications
1
Live video surveillance systems that need to identify security threats in real-time while minimizing false positives and computational overhead.
2
Voice-activated smart assistants processing continuous audio commands in noisy environments requiring low-latency interpretation and response.
3
Real-time captioning and translation services for webinars and live events that must stream accurate transcriptions without buffering delays.
4
Autonomous vehicles integrating sensor and camera feeds continuously to update navigation decisions dynamically under fluctuating conditions.
What to do now
Experiment with the AdaSR codebase on simulated streaming datasets relevant to your application to understand latency-accuracy trade-offs.
Design and implement hierarchical policies tailored to your input streams’ frequency and variability, leveraging HRPO for policy optimization.
Benchmark AdaSR against traditional read-then-reason baselines in your streaming AI workloads to quantify real-time performance gains.
Incorporate modular streaming reasoning capabilities during model architecture design phases for future-proofing AI systems with dynamic input handling.