One-Shot Manipulation Strategy Learning
by Making Contact Analogies

Yuyao Liu¹²*, Jiayuan Mao¹*, Joshua Tenenbaum¹, Tomás Lozano-Pérez¹, Leslie Pack Kaelbling¹

¹Massachusetts Institute of Technology ²Tsinghua University

* Equal contribution. Work done while Yuyao Liu was a visiting student at MIT.

Accepted by ICRA 2025


We propose MAGIC, a one-shot learning method that combines global shape matching by pretrained visual features with local curvature analysis, enabling fast generalization of manipulation strategies to novel objects.

Abstract

We present MAGIC (manipulation analogies for generalizable intelligent contacts), a novel approach for one-shot learning of manipulation strategies with fast and extensive generalization to novel objects. Given a single reference action trajectory, MAGIC identifies similar contact points and sequences of actions on novel objects to replicate the demonstrated strategy, such as using different hooks to retrieve distant objects of different shapes and sizes. Our method is based on a two-stage contact-point matching process that combines global shape matching using pretrained neural features with local curvature analysis to ensure precise and physically plausible contact points. We evaluate on three tasks: scooping, hanging, and hooking objects. MAGIC outperforms existing methods, achieving significant improvements in runtime speed and in generalization to different object categories.

Walkthrough Video

Overview of MAGIC

(a) We first extract contact points from the reference trajectory. (b) Then, we compute global and local contact-point matching scores to select candidate contact points on novel objects. (c) The selected contact points are used for motion retargeting or motion planning, and the final motion is simulated and verified in a physics simulator. Our pipeline combines data-driven (pretrained DINOv2 visual features) and analytic (curvature) approaches, starting from a single demonstration and requiring no additional training or task-specific datasets.
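The two-stage selection in step (b) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the per-point feature arrays stand in for DINOv2 features, the function names (`cosine_similarity`, `match_contact_point`) and the `top_k` parameter are hypothetical, and the scoring rule (cosine similarity for the global stage, absolute curvature difference for the local stage) is just one plausible way to combine the two signals.

```python
import numpy as np

def cosine_similarity(query, candidates):
    """Cosine similarity between one vector (D,) and each row of an (N, D) array."""
    q = query / (np.linalg.norm(query) + 1e-12)
    c = candidates / (np.linalg.norm(candidates, axis=1, keepdims=True) + 1e-12)
    return c @ q

def match_contact_point(ref_feature, ref_curvature, feats, curvatures, top_k=5):
    """Return the index of the best-matching point on the novel object.

    Stage 1 (global): keep the top_k points whose features are most
    similar to the feature at the reference contact point.
    Stage 2 (local): among those candidates, pick the point whose
    curvature is closest to the curvature at the reference contact point.
    """
    global_scores = cosine_similarity(ref_feature, feats)
    k = min(top_k, len(global_scores))
    candidates = np.argsort(-global_scores)[:k]   # most similar first
    local_error = np.abs(curvatures[candidates] - ref_curvature)
    return int(candidates[np.argmin(local_error)])

# Toy data (hypothetical): 4 points with 2-D features and scalar curvatures.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]])
curvatures = np.array([0.9, 0.2, 0.21, 0.5])
best = match_contact_point(np.array([1.0, 0.0]), 0.2, feats, curvatures, top_k=2)
```

In this toy example the global stage shortlists points 0 and 1, and the local stage then prefers point 1 because its curvature matches the reference. Restricting the curvature comparison to a shortlist keeps a point with matching curvature but unrelated appearance (point 2 here) from being selected.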

Functional Alignment Visualization for Various Objects in the Wild

Below, we illustrate the contact points found by DINOv2 and the functional alignment established by curvature analysis on a range of visually different objects.
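As a rough illustration of what a local curvature signal on a point cloud can look like, the sketch below computes the "surface variation" of each point: the smallest eigenvalue of the local covariance matrix divided by the sum of all three eigenvalues. This is a common curvature proxy and is used here only as an assumption; the page does not state which curvature measure MAGIC uses. The function name `surface_variation` and the neighbourhood size `k` are hypothetical.

```python
import numpy as np

def surface_variation(points, k=10):
    """Per-point curvature proxy from local PCA.

    For each point, take its k nearest neighbours (brute force, including
    the point itself), compute the covariance of that patch, and return
    smallest_eigenvalue / sum_of_eigenvalues. The value is 0 for a
    perfectly flat patch and grows as the patch bends.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    k = min(k, n)
    # Pairwise squared distances, shape (N, N).
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    variation = np.zeros(n)
    for i in range(n):
        patch = points[neighbours[i]]
        centred = patch - patch.mean(axis=0)
        cov = centred.T @ centred / k
        eigvals = np.linalg.eigvalsh(cov)  # ascending order
        total = eigvals.sum()
        variation[i] = eigvals[0] / total if total > 0 else 0.0
    return variation

# Flat 5x5 grid in the z = 0 plane versus random points on a unit sphere.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
plane = np.stack([xs.ravel(), ys.ravel(), np.zeros(25)], axis=1)
rng = np.random.default_rng(0)
sphere = rng.normal(size=(200, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)

plane_var = surface_variation(plane, k=9)
sphere_var = surface_variation(sphere, k=15)
```

The flat grid yields values that are zero up to floating-point error, while the sphere yields clearly positive values. The brute-force neighbour search is quadratic in the number of points, so a real pipeline would likely use a spatial index instead.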

Real-World Experiments

Hanging (4x)

Additional Experiment: Hanging Alphabets (3x)

Hooking (4x)

Additional Experiment: Using Alphabets as Hooks (3x)

Simulation Experiments

Scooping

Demonstration

Hanging

Demonstration

Additional Experiment: Hanging Alphabets

Hooking

Demonstration

Additional Experiment: Using Alphabets as Hooks

Failure Case Analysis

The clamp slips out of the gripper.

The most prevalent failure occurs when the grasping point is positioned near the tool's end, particularly with slippery materials like plastics or polished surfaces, as well as non-flat objects. This failure mode constitutes 33% of all failures. While our pipeline effectively verifies trajectory feasibility in simulation, discrepancies in physical properties such as weight distribution and friction coefficients can lead to real-world failures.

The cup collides with the mug tree.

Another significant failure, also accounting for 33% of the total, arises from the partial point cloud captured by our two RGBD cameras, which can result in imperfect mesh reconstruction. Consequently, the motion planner may suggest trajectories that lead to collisions in practice.

The gripper misses grasping the cup.

Despite outlier removal and denoising, the reconstructed point cloud can still be noisy, leading to inaccurate surface normal estimates and "hallucinated" object parts, both of which impede the antipodal grasp sampler. This issue represents approximately 14% of all failures.

MAGIC proposes wrong contact points.

Lastly, global contact matching using DINOv2 can prove suboptimal in the presence of noisy textures. When visual and functional differences between reference and novel objects are pronounced, the pretrained DINOv2 features may struggle to identify suitable global contact points without task-specific guidance, accounting for around 19% of failures.

BibTeX

@inproceedings{liu2025one,
  title = {{One-Shot Manipulation Strategy Learning by Making Contact Analogies}},
  author = {Liu, Yuyao and Mao, Jiayuan and Tenenbaum, Joshua and Lozano-Pérez, Tomás and Kaelbling, Leslie},
  booktitle = {ICRA},
  year = {2025}
}
