DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
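Chinese translation of the DeepSeek-V4 series preview paper: a 1.6T-parameter MoE model that supports million-token context and introduces CSA/HCA hybrid attention, mHC hyper-connections, and the Muon optimizer.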
Insights from the XRollout community
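Vincent Sitzmann's in-depth reflections on the future of computer vision: traditional intermediate representations (such as 3D reconstruction and segmentation masks) will become obsolete, and computer vision's future lies in being one part of an end-to-end perception-action loop.
An in-depth look at the CVPR 2025 Best Paper VGGT and its follow-up work SwiftVGGT. VGGT outputs camera parameters, depth maps, point clouds, and point tracks in a single forward pass, with accuracy exceeding traditional optimization-based methods. SwiftVGGT builds on this with single-step SVD and built-in loop closure detection, making large-scale scene reconstruction 3x faster without any training.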
A deep dive into the two major strategic paths in the embodied AI industry: Cerebellum-First vs. Brain-First, and the companies leading each camp.
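LingBot-Map is a Transformer-based feed-forward 3D foundation model that achieves high-accuracy, real-time monocular 3D reconstruction and pose estimation over very long sequences.
Full Chinese translation of Physical Intelligence's latest paper, π0.7: a steerable generalist robot foundation model with 5B parameters that shows compositional generalization, performs complex dexterous tasks out of the box, and achieves zero-shot cross-embodiment transfer.
This article is a complete Chinese translation of Physical Intelligence's latest paper, π0.7. π0.7 is a steerable generalist robot foundation model that can perform highly dexterous long-horizon tasks out of the box, achieve zero-shot cross-embodiment transfer, and learn new tasks through language guidance.
NVIDIA researchers apply scaling laws to humanoid whole-body control: by scaling the model from 1.2M to 42M parameters and the dataset to 100 million frames (700 hours), they obtain a general-purpose humanoid foundation controller that supports multiple input interfaces (VR teleoperation, video, VLA models).
An in-depth reading of "π0.7: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities". Paper information - Authors: Bo Ai, Ali Amin, ..., Sergey Levine, et al. (Physical Intelligence team) - Affiliation: Physical Intel…
This article reviews the arXiv paper 2603.15618, which identifies a key problem in VLA models, namely declining visual sensitivity in deeper layers, and proposes the DeepVision-VLA framework, which outperforms the SOTA by 9.0% on simulated tasks and 7.5% on real-world tasks.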
When we build a robot or train a new model, we face a fundamental question: how do we verify that it actually works safely and reliably across all the conditions it might encounter…
"Autobots, Roll Out!" — Optimus Prime The name "XRollout" carries dual meaning. It honors the iconic rallying cry of Optimus Prime from Transformers—a call to action, transformatio…
This article breaks down the MEM (Multi-scale Embodied Memory) approach from the Physical Intelligence (PI) research project. MEM enables robots to handle long-horizon tasks (up to…
Starting from the human brain. Human conscious thinking primarily consists of five basic functions: understanding, decision-making, recollection, memory, and inhibition. These func…
In open-source robotics, we face a classic chicken-and-egg problem: 1. We need more contributors to collect data, fix bugs, write documentation, and share knowledge 2. But contribu…
XRollout was born from a simple but powerful belief: robotics should be open, accessible, and community-driven. We believe that the future of robot intelligence shouldn't be locked…
Breakdown of Multi-scale Embodied Memory (MEM) - enabling robots to handle long-horizon tasks by remembering what they've done.
Incentivizing contribution with a credit system where you earn by contributing and redeem for platform resources.
How community feedback closes the data loop and makes robots smarter in real-world scenarios.
From prefrontal cortex vs basal ganglia to why both vision and language are indispensable in VLA.
Our complete learning philosophy — four pillars of robot acquisition and the hierarchical data pyramid.
Our mission - why we started XRollout and what we believe. Robotics should be open, accessible, and community-driven.
Simultaneous Localization and Mapping isn't just a robotics algorithm—it's a deep meditation on what memory actually is. When you build SLAM, you're actually building a memory syst…
Physical Intelligence's π project recently introduced MEM (Multi-scale Embodied Memory), bringing memory architectures to the forefront of robot learning. MEM has two key innovations…
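This article empirically compares CNN-SLAM (DINOv2+LoFTR) with traditional ORB-SLAM on the public TUM dataset, analyzing in detail the strengths, weaknesses, and suitable use cases of each approach.
First post, just testing things out, so excited!!!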
test