Shi Z., Wang J., Guo Y., Xie X., Hu Q., Cordy M., Chen S., Papadakis M., Le Traon Y., Lyu Y.
IEEE Transactions on Software Engineering, vol. 52, no. 4, pp. 1515-1530, 2026
Distribution shift poses a significant challenge for deep learning (DL) models in source code analysis, where test data often follows a different distribution from the training data, leading to unexpected performance degradation and hindering the practical use of code models. While our previous work, CodeS, introduced the first benchmark for studying distribution shift in source code analysis, it does not cover more fine-grained types of real-world distribution shift and does not study the effectiveness of shift mitigation strategies. In this paper, we present CodeS++, an enhanced benchmark that addresses these limitations through two key contributions: (1) expanded shift types, for which we design more fine-grained distribution shifts, namely shifts introduced by differences in program element complexity (e.g., the number of nodes in control flow graphs), and (2) an investigation of the usefulness of fine-tuning-based shift mitigation techniques, such as Core-Set. Comprehensive experiments on different pre-trained code models demonstrate that code models suffer significantly from distribution shift, that out-of-distribution (OOD) detectors from other domains (e.g., computer vision) do not generalize to source code, and that existing fine-tuning-based shift mitigation techniques offer limited benefits in enhancing the generalization ability of code models. Our findings highlight the need to pay more attention to OOD issues for code models.
