TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Geonwoo Cho, Jaegyun Im, Jihwan Lee, Hojun Yi, Sejin Kim, Sundong Kim

Gwangju Institute of Science and Technology

Method Overview

TRACED is an unsupervised environment design (UED) framework that constructs adaptive curricula by prioritizing tasks based on both how challenging they are and how much training on them transfers to other tasks.
TRACED refines regret-based curriculum design. Beyond value-based proxies, it introduces a transition-aware regret approximation that accounts for dynamics prediction error, enabling more accurate estimation of task difficulty and faster curriculum ramp-up.
TRACED explicitly models cross-task transfer. A lightweight co-learnability metric measures how training on one task reduces the difficulty of others, allowing the curriculum to favor tasks that accelerate overall learning progress.
TRACED is effective and sample-efficient. Across MiniGrid navigation and BipedalWalker terrains, TRACED consistently outperforms strong UED baselines (ACCEL, PLR⊥, ADD, CENIE), achieving superior zero-shot generalization with roughly half the training budget.
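To make the two scoring signals above concrete, here is a minimal sketch, not the authors' implementation. It assumes that the regret proxy is a value-error term plus a weighted dynamics prediction error, and that co-learnability is the average drop in difficulty on the other tasks after training on one task. The function names, the weights `alpha` and `beta`, and the additive combination are all hypothetical choices made for illustration.

```python
from statistics import mean


def transition_aware_regret(value_errors, transition_errors, alpha=1.0):
    """Hypothetical regret proxy for one task.

    value_errors: per-step value-based errors (only positive parts are kept).
    transition_errors: per-step dynamics prediction errors (non-negative).
    alpha: assumed weight on the transition term.
    """
    value_term = mean(max(e, 0.0) for e in value_errors)
    transition_term = mean(transition_errors)
    return value_term + alpha * transition_term


def co_learnability(difficulty_before, difficulty_after, trained_task):
    """Average drop in difficulty on all *other* tasks after training on one task.

    Both arguments map task id -> difficulty estimate (assumed representation).
    """
    others = [t for t in difficulty_before if t != trained_task]
    if not others:
        return 0.0
    return mean(difficulty_before[t] - difficulty_after[t] for t in others)


def task_priority(regret, colearn, beta=1.0):
    """Assumed additive combination of the two signals."""
    return regret + beta * colearn


# Toy numbers, chosen only to illustrate the arithmetic.
regret_a = transition_aware_regret([0.5, -0.25, 0.25], [0.5, 0.25, 0.75], alpha=0.5)
before = {"A": 1.0, "B": 0.75, "C": 0.5}
after = {"A": 0.5, "B": 0.5, "C": 0.5}
colearn_a = co_learnability(before, after, trained_task="A")
priority_a = task_priority(regret_a, colearn_a, beta=2.0)
```

In this toy example, task A scores higher when its own regret estimate is large and when training on it lowers the difficulty of tasks B and C.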

Experiments

1. MiniGrid Mazes

We train a single PPO agent with unsupervised environment design on procedurally generated MiniGrid mazes, where TRACED adaptively prioritizes training tasks. Generalization is evaluated via zero-shot transfer to selected mazes with varying layouts and difficulty.

Simple Crossing

Small Corridor

Large Corridor

Four Rooms

Sixteen Rooms

Sixteen Rooms (Fewer Doors)

Maze

Maze 2

Maze 3

Labyrinth

Labyrinth 2

Perfect Maze (Medium)

2. BipedalWalker Terrains

We train a single PPO agent using unsupervised environment design on procedurally generated BipedalWalker terrains, where each task is defined by continuous terrain parameters such as gaps, stairs, stumps, and surface roughness. TRACED adaptively prioritizes training terrains. Generalization is evaluated via zero-shot transfer to held-out terrains with distinct obstacle configurations and difficulty levels.

Basic

Hardcore

Pit Gap

Stairs

Stumps

Roughness

3. How do Curricula Evolve Over Time?

To understand how TRACED constructs curricula over time, we visualize the evolution of training levels generated by the teacher during learning. Rather than focusing solely on final performance, this analysis reveals how task difficulty and structure adapt in response to the agent’s learning progress.

Across both MiniGrid and BipedalWalker, TRACED progressively increases environment complexity, transitioning from simple, easily solvable tasks to more challenging and diverse ones. This behavior reflects the combined effect of transition-aware regret, which drives difficulty ramp-up, and co-learnability, which favors tasks that induce transferable learning.
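One way task priorities can shape which levels the agent trains on is rank-based replay sampling, in the style of Prioritized Level Replay. The sketch below is an assumption for illustration only: this page does not state which sampling rule TRACED uses, and the function name and `temperature` parameter are hypothetical.

```python
import random


def rank_prioritized_probs(scores, temperature=1.0):
    """Turn per-level priority scores into a sampling distribution.

    Levels are ranked by score (rank 1 = highest), and each level gets
    weight (1 / rank) ** (1 / temperature), normalized to sum to 1.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    weights = [0.0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = (1.0 / rank) ** (1.0 / temperature)
    total = sum(weights)
    return [w / total for w in weights]


# Three buffered levels with hypothetical priority scores.
scores = [0.2, 0.9, 0.5]
probs = rank_prioritized_probs(scores)

# Draw the next training level index from the buffer.
rng = random.Random(0)
next_level = rng.choices(range(len(scores)), weights=probs, k=1)[0]
```

Because sampling depends on ranks rather than raw scores, the distribution is insensitive to the scale of the priority estimates. A lower `temperature` concentrates replay on the top-ranked levels.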

MiniGrid Curriculum Evolution

As training progresses, mazes evolve from simple layouts to layouts with longer paths and denser obstacles.

BipedalWalker Curriculum Evolution

Terrains gradually introduce larger gaps, higher stairs, and rougher surfaces, matching the agent’s improving locomotion capability.