Visual-Tactile Peg-in-Hole Assembly Learning from Peg-out-of-Hole Disassembly

Yongqiang Zhao1, Xuyang Zhang1, Zhuo Chen1, Matteo Leonetti2, Emmanouil Spyrakos-Papastavridis1, Shan Luo1
1Department of Engineering, King's College London 2Department of Informatics, King's College London
Visual-tactile robotic assembly and inverse-task skill transfer

Abstract

Peg-in-hole (PiH) assembly is a fundamental yet challenging robotic manipulation task because successful insertion requires both accurate long-range approach and precise contact-rich alignment. This project studies whether the inverse task, peg-out-of-hole (PooH) disassembly, can provide a safer and easier source of experience for learning PiH skills.

The proposed framework formulates both PooH and PiH as partially observable Markov decision processes in a shared environment with kinematic, visual, and tactile observations. A visual-tactile PooH policy is first trained, then its trajectories are temporally reversed and paired with action randomization to generate expert-like PiH demonstrations. Vision supports the approach phase, while tactile feedback compensates for peg-hole misalignment during contact.

Across diverse peg-hole geometries, the visual-tactile policy achieves lower contact force than single-modality variants and substantially improves insertion success over direct reinforcement learning from scratch, while also transferring effectively from simulation to the real robot.

Key Idea

1. Learn the easier inverse task

PooH disassembly is easier than insertion because it mainly needs to overcome friction, not maintain precise peg-hole alignment throughout the motion.

2. Reverse and enrich demonstrations

PooH trajectories are reversed in time, and action randomization is added so the induced PiH data better covers contact patterns needed for robust insertion.

3. Fuse vision and touch

Vision helps the robot approach the hole reliably, while tactile sensing resolves local contact uncertainty and corrects misalignment during insertion.

Main Results

  • The proposed framework reaches average success rates of 87.5% on seen objects and 77.1% on unseen objects.
  • Compared with direct reinforcement learning that trains PiH from scratch, the method improves success rate by 18.1%.
  • Visual-tactile sensing reduces maximum contact force by 6.4% compared with single-modality variants.
  • Real-world deployment across six object pairs achieves an average success rate of 72.1%.
  • The study uses an ABB YuMi dual-arm platform with GelSight-like tactile sensing and an Intel RealSense camera in both simulation and real-world experiments.

Method Summary

The observation space combines robot kinematics, a 96 × 96 visual image, and a 15-dimensional tactile representation derived from marker flow. Actions are Cartesian end-effector displacements, and both the PooH and PiH policies are trained in the same environment with Soft Actor-Critic (SAC).
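The multimodal observation described above can be sketched as a simple container. This is a hypothetical illustration, not the authors' code: the 7-dimensional kinematic vector and the 3-dimensional translation-only action are assumptions; only the 96 × 96 image and 15-dimensional tactile feature sizes come from the text.

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical container for one multimodal observation. Image and tactile
# dimensions follow the text; the 7-D kinematic vector is an assumption.
@dataclass
class Observation:
    kinematics: np.ndarray  # robot state, e.g. end-effector pose, shape (7,)
    image: np.ndarray       # visual input, shape (96, 96, 3)
    tactile: np.ndarray     # marker-flow tactile features, shape (15,)

def make_dummy_observation() -> Observation:
    """Build a zero-filled observation with the stated modality shapes."""
    return Observation(
        kinematics=np.zeros(7, dtype=np.float32),
        image=np.zeros((96, 96, 3), dtype=np.uint8),
        tactile=np.zeros(15, dtype=np.float32),
    )

# Actions are Cartesian end-effector displacements; restricting them to a
# 3-D translation here is an illustrative simplification.
action = np.zeros(3, dtype=np.float32)
```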

The full pipeline has three stages: training a PooH policy, generating PiH demonstrations by reversing PooH rollouts and regenerating contact-rich tactile experience, and finally learning a PiH policy with hybrid replay and behavior cloning support. This design improves sample efficiency while keeping the final policy robust to geometry changes and sim-to-real transfer.
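The demonstration-generation stage can be sketched as follows, under stated assumptions: with displacement actions, reversing a PooH rollout amounts to traversing its transitions backward and negating each displacement, then perturbing the result with Gaussian noise as a stand-in for the action randomization described above. Function names and the noise scale are illustrative, not the paper's exact procedure.

```python
import numpy as np

def reverse_trajectory(states, actions, noise_std=1e-3, rng=None):
    """Turn a PooH rollout into a PiH-style demonstration (illustrative sketch).

    states:  list of observations, length T + 1
    actions: list of Cartesian displacement arrays, length T
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Reverse time: the final (peg-out) state becomes the starting state.
    rev_states = states[::-1]
    # Traverse transitions backward and flip each displacement's sign, so the
    # motion heads into the hole instead of out of it; add Gaussian noise as
    # a simple form of action randomization.
    rev_actions = [
        -a + rng.normal(0.0, noise_std, size=a.shape) for a in actions[::-1]
    ]
    return rev_states, rev_actions
```

The resulting expert-like PiH trajectories would then be mixed into the PiH policy's replay buffer and behavior-cloning dataset, as the pipeline description suggests.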

Paper PDF

BibTeX

@misc{zhao2026visualtactile,
  title={Visual-Tactile Peg-in-Hole Assembly Learning from Peg-out-of-Hole Disassembly},
  author={Yongqiang Zhao and Xuyang Zhang and Zhuo Chen and Matteo Leonetti and Emmanouil Spyrakos-Papastavridis and Shan Luo},
  year={2026},
  note={Project page: https://sites.google.com/view/pooh2pih},
  url={https://rancho-zhao.github.io/PooH2PiH/}
}