MimicDroid proposes a few-shot learning benchmark for humanoid manipulation.
Evaluation is structured into three levels: L1, L2, and L3, which progressively increase in difficulty by varying objects and environments, enabling systematic assessment of few-shot learning.
Each level is evaluated on a set of four manipulation tasks.
The videos below show the demonstrations for each task.
L1: Manipulation tasks with objects the robot encountered during training, in the training environments.
This level evaluates the robot's ability to generalize to new object positions.
Close Left Cabinet Door
Pick and Place from Sink to Cabinet
Pick and Place from Sink to Right Counter Plate
Turn On Faucet
L2: Manipulation tasks with novel objects that the robot has not encountered during training.
This level evaluates the robot's ability to adapt to novel objects from a few demonstrations.
Close Left Cabinet Door
Close Right Cabinet Door
Pick and Place from Sink to Cabinet
Pick and Place from Sink to Right Counter Plate
L3: Manipulation tasks with novel objects in new kitchen environments that the robot has not encountered during training.
This level tests the robot's ability to generalize to different backgrounds, furniture layouts, and novel objects.
Close Left Cabinet Door
Pick and Place from Sink to Microwave Top
Pick and Place from Sink to Right Counter Plate
Turn On Faucet
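For readers who want to script against the benchmark, the three levels and their task lists can be written down as a small data structure. This is a minimal sketch, not the authors' code: the `Level` class, its field names, and the `tasks_shared_by_all` helper are hypothetical, and the assignment of the three task lists to L1, L2, and L3 is inferred from the order in which they appear above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One evaluation level (hypothetical container, not from the source)."""
    name: str
    novel_objects: bool        # objects unseen during training
    novel_environments: bool   # kitchen environments unseen during training
    tasks: tuple


BENCHMARK = (
    Level("L1", False, False, (
        "Close Left Cabinet Door",
        "Pick and Place from Sink to Cabinet",
        "Pick and Place from Sink to Right Counter Plate",
        "Turn On Faucet",
    )),
    Level("L2", True, False, (
        "Close Left Cabinet Door",
        "Close Right Cabinet Door",
        "Pick and Place from Sink to Cabinet",
        "Pick and Place from Sink to Right Counter Plate",
    )),
    Level("L3", True, True, (
        "Close Left Cabinet Door",
        "Pick and Place from Sink to Microwave Top",
        "Pick and Place from Sink to Right Counter Plate",
        "Turn On Faucet",
    )),
)


def tasks_shared_by_all(levels):
    """Return the sorted task names that appear in every level."""
    shared = set(levels[0].tasks)
    for level in levels[1:]:
        shared &= set(level.tasks)
    return sorted(shared)


if __name__ == "__main__":
    for level in BENCHMARK:
        print(level.name, len(level.tasks), "tasks")
    print("Shared across levels:", tasks_shared_by_all(BENCHMARK))
```

Listing the tasks this way makes it easy to see that two tasks, Close Left Cabinet Door and Pick and Place from Sink to Right Counter Plate, recur at every level, so performance on them can be compared directly as difficulty increases.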
We collect human play data in simulation by interacting freely in randomized kitchen environments using a SpaceMouse.
Each play session lasts 20 minutes and captures diverse interactions covering a broad range of tasks, object configurations, and manipulation behaviors.
In total, we gather 8 hours of simulated play, amounting to 320k timesteps.
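The figures above imply a session count and an average recording rate that the text does not state. The short calculation below derives both as a rough sanity check. It assumes every session lasts exactly 20 minutes and treats 320k as an exact count, so the results are approximations; the variable names are illustrative.

```python
# Figures stated in the text.
TOTAL_HOURS = 8
SESSION_MINUTES = 20
TOTAL_TIMESTEPS = 320_000  # "320k", treated as exact here (assumption)

# Derived quantities (not stated in the source).
total_seconds = TOTAL_HOURS * 3600
num_sessions = (TOTAL_HOURS * 60) // SESSION_MINUTES
avg_rate_hz = TOTAL_TIMESTEPS / total_seconds
steps_per_session = TOTAL_TIMESTEPS / num_sessions

if __name__ == "__main__":
    print(f"sessions: {num_sessions}")
    print(f"average rate: {avg_rate_hz:.2f} timesteps per second")
    print(f"timesteps per session: {steps_per_session:.0f}")
```

Under these assumptions the dataset corresponds to 24 sessions recorded at roughly 11 timesteps per second on average.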
Below, we visualize random clips of play data from our training dataset.