Welcome to IndEgo, an industrial egocentric vision dataset and open-source framework, accepted at the NeurIPS 2025 Datasets & Benchmarks Track, designed to support training, real-time guidance, process improvement, and collaboration.
🎥 Industrial Scenarios
📘 About
IndEgo introduces a multimodal egocentric + exocentric video dataset capturing common industrial activities such as assembly/disassembly, inspection, repair, logistics, and woodworking.
It includes 3,460 egocentric videos (~197h) and 1,092 exocentric videos (~97h) with synchronised eye gaze, audio narration, hand pose, motion, and semi-dense point clouds.
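As a small illustration of working with synchronized streams like these, the sketch below maps per-sample gaze timestamps onto video frame indices. The 30 fps rate and the input format are illustrative assumptions, not IndEgo's actual file layout or sampling spec.

```python
# Minimal sketch: align gaze timestamps (in seconds) to video frame
# indices, assuming a fixed frame rate. Rates and formats here are
# assumptions for illustration, not IndEgo's actual specification.

def gaze_to_frame_indices(gaze_timestamps, fps=30.0):
    """Map each gaze timestamp to the nearest video frame index."""
    return [round(t * fps) for t in gaze_timestamps]

if __name__ == "__main__":
    # e.g. gaze sampled at irregular times within the first second of video
    stamps = [0.0, 0.016, 0.50, 0.99]
    print(gaze_to_frame_indices(stamps))
```

Real alignment would use the per-stream device timestamps shipped with the recordings; nearest-frame rounding is just the simplest alignment policy.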
IndEgo enables research on:
- Procedural & collaborative task understanding
- Mistake detection and process deviation recognition
- Reasoning-based Video Question Answering (VQA)
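To make the mistake-detection task concrete, here is a hedged sketch (not IndEgo's benchmark code) that flags process deviations by diffing an observed step sequence against a reference procedure; the step labels are invented examples.

```python
# Illustrative sketch: detect process deviations by aligning an observed
# step sequence against a reference procedure. Step labels are invented
# examples, not IndEgo's actual annotation vocabulary.
from difflib import SequenceMatcher

def find_deviations(reference, observed):
    """Return (op, reference_steps, observed_steps) for each mismatch.

    op is one of 'delete' (skipped step), 'insert' (extra step),
    or 'replace' (wrong step performed).
    """
    sm = SequenceMatcher(a=reference, b=observed)
    return [(op, reference[i1:i2], observed[j1:j2])
            for op, i1, i2, j1, j2 in sm.get_opcodes()
            if op != "equal"]

if __name__ == "__main__":
    ref = ["pick_part", "align_part", "insert_screw", "tighten_screw"]
    obs = ["pick_part", "insert_screw", "tighten_screw"]  # alignment skipped
    print(find_deviations(ref, obs))
```

In practice the observed sequence would come from an action-recognition model over the video, and the alignment could weight steps by criticality rather than treating all deviations equally.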
⚙️ Technology
IndEgo combines:
- Egocentric Computer Vision for context-aware task understanding
- Vision-Language Models (VLMs) for multimodal reasoning
- Smart Glasses Integration for on-site, real-time assistance
🎬 IndEgo Dataset Multimodality
🚀 Try It: No Setup Required
Run IndEgo’s core logic directly in your browser with Google Colab — no installation needed.
🧩 Citation
If you use IndEgo in your research, please cite our paper:
🏆 Acknowledgments & Funding
This work is funded by the German Federal Ministry of Research, Technology and Space (BMFTR) and the German Aerospace Center (DLR) under the KIKERP project (Grant No. 16IS23055C) in the KI4KMU program. We thank the Meta AI team and Reality Labs for the Project Aria initiative, including the research kit, the open-source tools and related services. The data collection for this study was carried out at the IWF research labs and the test field at TU Berlin. Lastly, we sincerely thank the student volunteers and workers who participated in the data collection process.