23 March 2026

Daily Robotics Digest

5 curated items from arXiv, industry news, and the community

Executive Summary

This is a test digest containing 5 recent robotics papers from arXiv. This demonstrates that the Daily Observation system is working correctly.

New Research Papers

19 items
6

6D Robotic OCT Scanning of Curved Tissue Surfaces

Suresh Guttikonda, Maximilian Neidhardt, Vidas Raudonis, Alexander Schlaefer

Robotic optical coherence tomography (OCT) scanning of curved tissue surfaces has been limited by existing translational-only scanning approaches, which cannot handle non-planar geometry. This work introduces a new marker for full six-dimensional hand-eye calibration of robot-mounted OCT probes, achieving highly repeatable transformation estimates and enabling consistent scanning of large curved tissue phantoms. This advance unlocks more flexible robotic OCT imaging for clinical and pre-clinical applications.

7

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models

Zixuan Wang, Yuxin Chen, Yuqi Liu, Jinhui Ye, Pengguang Chen, Changsheng Lu, Shu Liu, Jiaya Jia

Existing vision-language-action (VLA) models combine instruction interpretation, spatial grounding, and control into a single black-box forward pass, leading to poor spatial precision and limited out-of-distribution robustness. This work proposes VP-VLA, a dual-system framework that decouples high-level reasoning from low-level execution via a structured visual prompting interface. A high-level planner generates spatial anchors overlaid on input images, while a low-level controller uses these prompts to generate precise actions, improving performance on challenging robotic manipulation tasks.

8

Sim-to-Real of Humanoid Locomotion Policies via Joint Torque Space Perturbation Injection

Junhyeok Rui Cha, Woohyun Cha, Jaeyong Shin, Donghyeon Kim, Jaeheung Park

Existing sim-to-real methods for humanoid locomotion rely on domain randomization over a fixed, finite set of parameters, which fails to capture complex state-dependent reality gaps such as nonlinear actuator dynamics. This work instead injects state-dependent perturbations into joint torque inputs during simulation, using neural networks to model complex uncertainties that parametric randomization cannot capture. Experiments show the method produces humanoid locomotion policies with superior robustness to unseen reality gaps in both simulation and real-world deployment.
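
To make the idea of a state-dependent torque perturbation concrete, here is a minimal sketch. It is not the authors' implementation: the class name `TorquePerturbation`, the network size, the `max_frac` bound, and the random (untrained) weights are all assumptions chosen for illustration. A small network maps joint positions and velocities to a bounded offset, which is added to the commanded torque before the simulator would apply it.

```python
import math
import random


class TorquePerturbation:
    """Hypothetical state-dependent torque perturbation (illustrative only)."""

    def __init__(self, n_joints, hidden=16, max_frac=0.1, seed=0):
        rng = random.Random(seed)
        n_in = 2 * n_joints  # input is joint positions plus joint velocities
        # Randomly initialised weights stand in for a learned or sampled network.
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(n_joints)]
        self.max_frac = max_frac  # assumed bound, as a fraction of the torque limit

    def delta(self, q, qd):
        """Return one value in [-1, 1] per joint, depending on the state."""
        state = list(q) + list(qd)
        h = [math.tanh(sum(w * s for w, s in zip(row, state))) for row in self.w1]
        return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in self.w2]

    def apply(self, q, qd, tau_cmd, tau_limit):
        """Add the perturbation to the commanded torque and respect the limits."""
        out = []
        for d, tau, lim in zip(self.delta(q, qd), tau_cmd, tau_limit):
            perturbed = tau + self.max_frac * lim * d
            out.append(max(-lim, min(lim, perturbed)))
        return out


perturb = TorquePerturbation(n_joints=3)
tau_cmd = [5.0, -2.0, 1.0]
tau_limit = [20.0, 20.0, 10.0]
tau_sim = perturb.apply([0.1, -0.4, 0.8], [1.0, 0.0, -2.0], tau_cmd, tau_limit)
```

Unlike randomizing a fixed parameter such as friction or mass, the offset here changes with the robot's state at every step, which is the property the summary highlights.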

9

Directional Mollification for Controlled Smooth Path Generation

Alfredo González-Calvin, Juan F. Jiménez, Héctor García de Marina

Smooth path generation from discrete waypoints is a fundamental requirement for stable robot control, but existing mollification methods confine smoothed paths to the convex hull of the original waypoints, preventing exact waypoint interpolation when required. This work introduces directional mollification, a novel extension of mollification that removes the convex hull constraint while retaining the computational efficiency, formal smoothness, and curvature guarantees of existing methods. The approach offers an improved alternative to spline interpolation and optimization-based path smoothing for autonomous and industrial robots.

10

Partial Attention in Deep Reinforcement Learning for Safe Multi-Agent Control

Turki Bin Mohaya, Peter Seiler

Attention mechanisms have shown strong performance in sequential learning tasks, but have not been widely adapted for safe multi-agent autonomous vehicle control. This work applies partial attention to the QMIX multi-agent reinforcement learning framework, allowing each autonomous vehicle to focus only on the most relevant neighboring vehicles during highway merging scenarios. A multi-objective reward function that balances global safety and flow with individual agent interests improves overall performance over baseline deep reinforcement learning methods in SUMO simulations.
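
One plausible reading of "focus only on the most relevant neighboring vehicles" is top-k attention, sketched below. This is an assumption about the mechanism, not the paper's QMIX integration: the function `partial_attention`, the dot-product relevance score, and the toy feature vectors are all hypothetical.

```python
import math


def partial_attention(query, keys, values, k):
    """Attend only to the k highest-scoring neighbours; others get weight 0."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    k = max(1, min(k, len(scores)))
    # Indices of the k most relevant neighbours, best first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = scores[top[0]]  # subtract the maximum for numerical stability
    exp = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exp.values())
    weights = [exp.get(i, 0.0) / total for i in range(len(scores))]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


# Ego vehicle query and four neighbouring vehicles (toy features).
ego_query = [1.0, 0.0]
neighbor_keys = [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
neighbor_values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
context, weights = partial_attention(ego_query, neighbor_keys, neighbor_values, k=2)
```

With `k=2`, only neighbors 0 and 2 receive non-zero weight, so the ego vehicle's context vector ignores the two least relevant vehicles entirely.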

11

Memory-Efficient Boundary Map for Large-Scale Occupancy Grid Mapping

Benxu Tang, Yunfan Ren, Yixi Cai, Fanze Kong, Wenyi Liu, Fangcheng Zhu, Longji Yin, Liuyu Shi, Fu Zhang

Traditional high-resolution large-scale occupancy grid mapping requires storing all voxels in the mapped volume, leading to prohibitive memory usage for many robotic applications. This work introduces a novel memory-efficient representation that only stores boundary voxels (occupied and frontier voxels), with free and unknown voxels automatically represented by regions inside and outside the boundary, respectively. The approach drastically reduces memory requirements for large-scale high-resolution mapping without sacrificing accuracy, enabling deployment on resource-constrained robotic platforms.
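
The storage saving can be illustrated with a small sketch. It assumes the common definition of a frontier voxel as a free voxel with at least one unknown 6-neighbor, which the summary does not spell out; the `boundary_voxels` function and the dictionary-based map are hypothetical and are not the paper's data structure.

```python
from itertools import product

FREE, OCCUPIED = 0, 1
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def boundary_voxels(known):
    """Return occupied voxels plus frontier voxels.

    `known` maps (x, y, z) to FREE or OCCUPIED; any voxel missing from the
    dictionary is treated as unknown.
    """
    boundary = set()
    for (x, y, z), state in known.items():
        if state == OCCUPIED:
            boundary.add((x, y, z))
        elif any((x + dx, y + dy, z + dz) not in known for dx, dy, dz in NEIGHBORS):
            boundary.add((x, y, z))  # free voxel touching unknown space
    return boundary


# Toy map: a 5 x 5 x 5 observed region whose bottom layer (z == 0) is a floor.
known = {
    (x, y, z): OCCUPIED if z == 0 else FREE
    for x, y, z in product(range(5), repeat=3)
}
boundary = boundary_voxels(known)
print(f"stored {len(boundary)} of {len(known)} voxels")
```

In this toy map only the free voxels strictly inside the observed region are dropped. The stored fraction falls as the mapped volume grows, because the boundary scales with surface area while the volume scales cubically.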

12

Can a Robot Walk the Robotic Dog: Triple-Zero Collaborative Navigation for Heterogeneous Multi-Agent Systems

Yaxuan Wang, Yifan Xiang, Ke Li, Xun Zhang, BoWen Ye, Zhuochen Fan, Fei Wei, Tong Yang

Existing collaborative navigation frameworks for heterogeneous multi-robot systems typically require extensive training or simulation before deployment, limiting their real-world adaptability. This work presents Triple Zero Path Planning (TZPP), a zero-training, zero-prior-knowledge, and zero-simulation collaborative navigation framework that uses a coordinator-explorer architecture with multimodal large language model guidance. Implemented on Unitree G1 humanoid and Go2 quadruped robots, TZPP achieves robust, human-comparable efficiency across diverse unseen indoor and outdoor environments, offering a practical path for immediate real-world deployment.

13

BiPreManip: Learning Affordance-Based Bimanual Preparatory Manipulation through Anticipatory Collaboration

Yan Shen, Feng Jiang, Zichen He, Xiaoqi Li, Yuchen Liu, Zhiyu Li, Ruihai Wu, Hao Dong

Many everyday bimanual manipulation tasks require one arm to perform preparatory actions that enable the other arm's final goal-directed grasp or operation, such as pushing an iPad to a table edge before picking it up. Most existing frameworks do not explicitly address this asymmetric collaborative setting. This work introduces BiPreManip, a visual affordance-based framework that first envisions the final goal action, then generates a sequence of preparatory manipulations for one arm to enable the second arm's operation. The approach advances capabilities for sequential coordinated bimanual manipulation of everyday objects.

14

PRM-as-a-Judge: A Dense Evaluation Paradigm for Fine-Grained Robotic Auditing

Yuheng Ji, Yuyang Liu, Huajie Tan, Xuchuan Huang, Fanding Huang, Yijie Xu, Cheng Chi, Yuting Zhao, Huaihai Lyu, Peterson...

Most robotic policy evaluation relies on binary success rates, which collapse the entire execution trajectory into a single outcome and hide critical qualities like progress, efficiency, and stability. This work proposes PRM-as-a-Judge, a dense evaluation paradigm that uses Process Reward Models to audit robotic policy execution directly from trajectory videos by estimating continuous task progress. The accompanying OPD metric system provides fine-grained insight into execution quality, enabling more detailed diagnosis of policy performance than standard success-rate evaluation.

15

CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation

Mohammad Eslami, Dhanvinkumar Ganeshkumar, Saber Kazeminasab, Michael G. Morley, Michael V. Boland, Michael M. Lin, John...

Robotic-assisted cataract surgery requires accurate real-time semantic segmentation of surgical video, but existing models lack the required accuracy, and the annotation tools needed for scalable medical robotic perception development are missing. This work introduces CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2 optimized for real-time segmentation of anterior segment cataract surgery. The work also releases an interactive annotation framework that reduces manual labeling effort for scalable ground-truth creation, and the model demonstrates strong zero-shot generalization to glaucoma trabeculectomy procedures.

16

Auction-Based Task Allocation with Energy-Conscientious Trajectory Optimization for AMR Fleets

Jiachen Li, Soovadeep Bakshi, Jian Chu, Shihao Li, Dongmei Chen

Task allocation and trajectory optimization for multi-AMR (autonomous mobile robot) fleets typically do not explicitly account for energy consumption, leading to unnecessary battery use in industrial settings. This work presents a hierarchical two-stage framework that combines sequential auction-based task allocation with energy-conscious trajectory optimization using a physics-based battery model. Large-scale experiments across hundreds of factory scenarios show the framework delivers an average 11.8% energy savings over standard nearest-task allocation, with rescheduling latency under 10 ms for dynamic fault and priority handling.
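
A sequential auction can be sketched in a few lines. The version below is a generic illustration under stated assumptions, not the paper's framework: the function `sequential_auction` is hypothetical, and the bid is straight-line distance times an assumed `energy_per_meter` constant instead of a physics-based battery model. In each round every robot bids on every unassigned task, the lowest bid wins, and the winner's position moves to the task location.

```python
import math


def sequential_auction(robots, tasks, energy_per_meter=1.0):
    """Assign tasks one per round to the robot with the lowest energy bid.

    `robots` and `tasks` map names to (x, y) positions.
    """
    positions = dict(robots)
    assignment = {name: [] for name in robots}
    remaining = dict(tasks)
    while remaining:
        # Each bid is (estimated energy, robot, task); ties break by name.
        bid, robot, task = min(
            (energy_per_meter * math.dist(positions[r], loc), r, t)
            for r in positions
            for t, loc in remaining.items()
        )
        assignment[robot].append(task)
        positions[robot] = remaining.pop(task)  # robot ends up at the task
    return assignment


robots = {"r1": (0.0, 0.0), "r2": (10.0, 0.0)}
tasks = {"a": (1.0, 0.0), "b": (9.0, 0.0), "c": (3.0, 0.0)}
plan = sequential_auction(robots, tasks)
# plan == {"r1": ["a", "c"], "r2": ["b"]}
```

Because bids are recomputed from each robot's updated position, later rounds account for where earlier wins leave the robot, which a one-shot nearest-task rule does not.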

17

SafePilot: A Framework for Assuring LLM-enabled Cyber-Physical Systems

Weizhe Xu, Mengyu Liu, Fanxin Kong

LLM integration into robotic and other cyber-physical systems brings advanced reasoning capabilities, but LLM hallucinations can lead to unsafe or undesirable actions that are not caught by existing system assurance frameworks. This work proposes SafePilot, a hierarchical neuro-symbolic framework that provides end-to-end safety assurance for LLM-enabled cyber-physical systems against attribute-based and temporal task specifications. The framework addresses the core risk of hallucinations in LLM-guided robotics, enabling safer deployment of LLM-powered autonomous systems.

18

A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical Systems

Seou Choi, Sachin Vaidya, Caio Silva, Shiekh Zia Uddin, Sajib Biswas Shuvo, Shrish Choudhary, Marin Soljačić

While robotic automation has transformed many scientific workflows, high-precision free-space optical system assembly and alignment remains largely manual due to strict spatial and angular tolerances. This work introduces a complete robotics framework for autonomous construction, alignment, and self-recovery of precision optical systems, integrating hierarchical computer vision, optimization routines, and custom end-of-arm tools. The framework demonstrates fully autonomous assembly of a tabletop laser cavity from randomly distributed components, including self-recovery from induced misalignment, opening the door to automated optical experimentation.

19

GaussianSSC: Triplane-Guided Directional Gaussian Fields for 3D Semantic Completion

Ruiqi Xian, Jing Liang, He Yin, Xuewei Qi, Dinesh Manocha

Existing 3D semantic scene completion methods struggle with poor voxel-image alignment and fail to efficiently capture fine-grained geometric details like surface tangency and occlusion asymmetry. This work presents GaussianSSC, a two-stage triplane-guided approach that integrates Gaussian representation benefits into standard voxel grid frameworks without additional memory overhead. On the SemanticKITTI benchmark, GaussianSSC improves occupancy recall by 1.0% and precision by 2.0% over baseline methods, advancing state-of-the-art monocular semantic scene completion for robotic perception.