Overview

Undergraduate thesis project exploring a reinforcement learning (RL) approach to the Travelling Salesman Problem (TSP), using a hardness-adaptive curriculum to stabilize training.

Approach

  • Designed a curriculum that adjusts graph hardness based on agent performance.
  • Implemented RL baselines and evaluation pipeline to compare sample efficiency and solution quality.
  • Integrated 1.5-approximation heuristics (tours guaranteed to be at most 1.5 times the optimal length) into the reward to guide exploration.
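
The curriculum idea above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the class name HardnessCurriculum, the use of node count as the hardness measure, the gap thresholds, and the window size are all assumptions made for the example. The reference tour length stands in for whatever heuristic tour the agent is compared against.

```python
import math
import random
from collections import deque


class HardnessCurriculum:
    """Hypothetical controller (not from the thesis): moves a hardness level
    in [0, 1] up or down based on the agent's recent gap to a reference tour."""

    def __init__(self, min_nodes=10, max_nodes=50, window=20,
                 promote_gap=0.10, demote_gap=0.30, step=0.1):
        self.min_nodes = min_nodes
        self.max_nodes = max_nodes
        self.promote_gap = promote_gap  # mean gap below this: make instances harder
        self.demote_gap = demote_gap    # mean gap above this: make instances easier
        self.step = step
        self.hardness = 0.0             # 0 = easiest, 1 = hardest
        self.gaps = deque(maxlen=window)

    def record(self, agent_len, reference_len):
        """Store one episode's relative gap and adjust hardness once the window is full."""
        self.gaps.append(agent_len / reference_len - 1.0)
        if len(self.gaps) < self.gaps.maxlen:
            return
        mean_gap = sum(self.gaps) / len(self.gaps)
        if mean_gap < self.promote_gap:
            self.hardness = min(1.0, self.hardness + self.step)
            self.gaps.clear()
        elif mean_gap > self.demote_gap:
            self.hardness = max(0.0, self.hardness - self.step)
            self.gaps.clear()

    def num_nodes(self):
        """Assumption: hardness is mapped linearly to the number of cities."""
        span = self.max_nodes - self.min_nodes
        return round(self.min_nodes + self.hardness * span)

    def sample_instance(self, rng):
        """Draw a random Euclidean instance in the unit square at the current hardness."""
        return [(rng.random(), rng.random()) for _ in range(self.num_nodes())]


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]])
               for i in range(n))
```

In a training loop, each episode would call sample_instance, run the agent, and pass the agent's tour length together with the heuristic's tour length to record. Clearing the window after each change keeps a single run of good or bad episodes from triggering several adjustments in a row.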

Results

  • Improved training stability over greedy roll-out baselines.
  • Demonstrated higher-quality tours on medium-sized TSP instances.

Poster & Certificate
