For each source–target pair \((x, y)\) we decompose the transfer return into source (policy) quality, target difficulty, a dissimilarity term, and a constant offset:
\[J(\pi_x, y) = f(x) + g(y) + h(x,y) + C. \]
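For intuition, in the idealized Mountain case the decomposition collapses as follows (an illustrative sketch only; the L1 distance and the scale \(\lambda > 0\) are assumptions, not part of the model):
\[ f(x) \approx \text{const}, \qquad h(x,y) \approx -\lambda \lVert x - y \rVert_1, \qquad\text{so}\qquad J(\pi_x, y) \approx g(y) - \lambda \lVert x - y \rVert_1 + C', \]
with \(C'\) absorbing the constant policy quality: every trained policy is roughly equally good, and transfer degrades with context distance.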
After training on contexts \(x_{1:k}\) we remove target difficulty by subtracting, for every target \(y\), the empirical mean over the trained policies, \(\overline{J}(\pi_x,y) = J(\pi_x,y) - \mathbb{E}_{x' \in x_{1:k}}[J(\pi_{x'},y)]\); this cancels \(g(y)\) and \(C\) and leaves only relative source quality and relative dissimilarity. We then apply two detection criteria:
\[ \texttt{Mountain} \Longleftrightarrow \begin{cases} \operatorname{std}_{x} \overline{J}(\pi_x,x) < \mathbb{E}_{x}\big[\operatorname{std}_{y} \overline{J}(\pi_x,y)\big], \\ \text{sgn}(\theta_{\text{left}}^d) = \text{sgn}(\theta_{\text{right}}^d) \quad \forall d, \end{cases} \]
where \(\theta_{\text{left}}^d\) and \(\theta_{\text{right}}^d\) are the slopes of linear regressions of \(\overline{J}\) on the signed L1 context difference in dimension \(d\), fitted separately on its negative (left) and positive (right) range. Passing both tests indicates that policy quality is almost constant and that dissimilarity behaves like a distance metric, so the CMDP can be treated as Mountain. Otherwise, performance across training contexts is heterogeneous and we fall back on GP modeling.
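As a concrete illustration, the following is a minimal sketch of both tests, assuming the \(k\) trained policies are each evaluated on all \(k\) training contexts, so that \(J(\pi_{x_i}, x_j)\) is available as a \(k \times k\) matrix; the function name, the ordinary-least-squares fit via \texttt{np.polyfit}, and the pooling of all source–target pairs into each one-sided regression are our own choices rather than details prescribed by the text:
\begin{verbatim}
import numpy as np

def is_mountain(returns, contexts):
    # returns[i, j] = J(pi_{x_i}, x_j); contexts[i] = x_i, shape (k, d).
    # Column-center: subtract the mean over trained policies, which
    # removes target difficulty g(y) and the constant C.
    J_bar = returns - returns.mean(axis=0, keepdims=True)

    # Test 1: self-transfer returns (diagonal) vary less than a policy's
    # returns across all targets do on average.
    test1 = np.std(np.diag(J_bar)) < np.mean(np.std(J_bar, axis=1))

    # Test 2: per dimension, slopes of J_bar against the signed context
    # difference agree in sign on the negative and positive side of zero.
    k, d = contexts.shape
    diffs = contexts[:, None, :] - contexts[None, :, :]  # x_i - x_j
    test2 = True
    for dim in range(d):
        delta, vals = diffs[:, :, dim].ravel(), J_bar.ravel()
        left, right = delta < 0, delta > 0
        if left.sum() < 2 or right.sum() < 2:
            continue  # too few pairs on one side to fit a slope
        theta_left = np.polyfit(delta[left], vals[left], 1)[0]
        theta_right = np.polyfit(delta[right], vals[right], 1)[0]
        test2 = test2 and np.sign(theta_left) == np.sign(theta_right)

    return bool(test1 and test2)
\end{verbatim}
Pooling all pairs into a single one-sided regression per dimension is one reading of the criterion; fitting per-policy slopes and aggregating them would be an equally plausible variant.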