SLAM Data Pipeline

Process your human-collected egocentric video with modern SLAM. Get 6DOF camera poses and sparse 3D reconstructions ready for robot learning.

Upload Video

  • Supports MP4, MOV, AVI; maximum file size: 500 MB
  • CNN mode is more robust for low-texture indoor scenes
  • A lower setting means faster processing
  • Resize the video to save processing time
  • Upload the camera calibration YAML file produced by the calibration tool
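For illustration only, a camera-intrinsics YAML in the common OpenCV style looks roughly like the sketch below. The key names and values here are assumptions; the authoritative schema is whatever the calibration tool actually emits.

```yaml
# Hypothetical OpenCV-style intrinsics file -- verify key names against
# the calibration tool's real output before relying on them.
image_width: 1280
image_height: 720
camera_matrix:              # 3x3 pinhole intrinsics K (row-major)
  rows: 3
  cols: 3
  data: [600.0,   0.0, 640.0,
           0.0, 600.0, 360.0,
           0.0,   0.0,   1.0]
distortion_coefficients:    # typically k1, k2, p1, p2, k3
  data: [-0.10, 0.05, 0.0, 0.0, 0.0]
```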

Related Tools

Need to calibrate your camera first? Use our camera calibration tool to get accurate intrinsics before running SLAM.

About This Pipeline

This SLAM pipeline processes egocentric video collected from humans (e.g., wearable cameras) and produces a ready-to-use dataset for human-to-robot transfer learning.

What you get:

  • 6DOF camera pose (x, y, z + quaternion) for each frame
  • Sparse 3D point cloud of the environment
  • Output in LeRobot format, compatible with existing robot learning code
  • All original video data preserved with added SLAM metadata
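Since each frame's pose is a translation plus a unit quaternion, downstream code often wants it as a 4x4 homogeneous transform. A minimal, dependency-free sketch follows; the field order `(x, y, z, qx, qy, qz, qw)` is an assumption, so check it against the pipeline's actual output schema.

```python
import math

def pose_to_matrix(x, y, z, qx, qy, qz, qw):
    """Convert a 6DOF pose (translation + unit quaternion) into a 4x4
    homogeneous transform. Field order is illustrative, not guaranteed
    to match the pipeline's output."""
    # Normalise defensively in case the stored quaternion drifted.
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n
    # Standard quaternion-to-rotation-matrix expansion, translation in
    # the last column.
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw),     2 * (qx * qz + qy * qw),     x],
        [2 * (qx * qy + qz * qw),     1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw),     y],
        [2 * (qx * qz - qy * qw),     2 * (qy * qz + qx * qw),     1 - 2 * (qx * qx + qy * qy), z],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

For the identity quaternion `(0, 0, 0, 1)` this returns pure translation, which is a quick sanity check when wiring up a new dataset.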

Available modes:

🗺️ CNN-based (DINOv2 + LoFTR)

  • ✓ Better for low-texture indoor scenes
  • ✓ More robust matching for repetitive patterns
  • ✓ Recommended for human demonstration videos
  • Requires GPU for reasonable speed

📍 ORB-SLAM3 (Traditional)

  • ✓ Mature open-source implementation
  • ✓ Faster on CPU
  • ✓ Good baseline for comparison
  • Can struggle with low-texture areas

The pipeline is open-source: the code is available on GitHub and can be run locally.