When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair
This program is tentative and subject to change.
While Large Language Models (LLMs) have shown remarkable potential for automated program repair (APR), their effectiveness is often limited by the availability of high-quality training data. Proprietary industrial codebases are valuable datasets that can be used to enhance model performance but are inaccessible due to data privacy concerns, creating a major challenge for collaborative software development.
Although federated learning has emerged as a decentralized, privacy-preserving alternative, the prevailing paradigm in software engineering research remains centralized: sensitive data must be pooled for model training, which is unacceptable in cross-organizational collaboration. Existing studies also overlook the challenges of fine-tuning LLMs for generative code-related tasks such as program repair, particularly under the real-world code heterogeneity of multi-organizational settings: they focus on labeled-data (discriminative) tasks, ignore the feature-skewed generative tasks common in software development, and evaluate only a narrow range of LLM architectures and sizes. Given the value of proprietary codebases, robust privacy protection is essential in collaborative, distributed development, yet federated approaches that harness collective knowledge while preserving privacy remain largely unexplored.
To address this gap, we investigate federated learning as a privacy-preserving approach for fine-tuning LLMs on proprietary, decentralized data to support collaborative software development and maintenance. We conduct a comprehensive empirical study of the effectiveness of federated learning for program repair, providing practical insights for real-world collaborative software development. Our study makes the following main contributions:
- An empirical study of federated fine-tuning of LLMs for program repair, demonstrating the feasibility and effectiveness of fine-tuning LLMs while preserving data privacy in decentralized, collaborative software development.
- Analysis of federated fine-tuning’s impact on the generative code-related task (i.e., program repair), contrasting with prior federated learning work on discriminative tasks and revealing insights for other generative applications.
- Evaluation of a wide range of code LLMs in federated program repair, providing insights into the suitability and practicality of different LLMs in federated learning.
- Investigation of heterogeneous code’s effects on LLM repair capabilities in federated settings, illuminating the robustness and adaptability of LLMs in handling Non-IID data in decentralized environments.
- Assessment of various federated learning algorithms’ impact on LLM-based bug fixing, providing insights into the trade-offs for federated algorithm selection in program repair.
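To make the aggregation step these contributions evaluate concrete, below is a minimal sketch of federated averaging (FedAvg), the canonical federated learning algorithm: each client updates the shared model on its private data, and the server averages the resulting parameters weighted by client dataset size, so raw code never leaves an organization. The toy linear model, function names, and data here are illustrative assumptions for exposition, not the paper's actual setup (which fine-tunes LLMs).

```python
# FedAvg sketch (illustrative): clients train locally, the server
# averages parameters weighted by local dataset size.

def local_update(weights, client_data, lr=0.1):
    # Stand-in for local fine-tuning: one SGD pass over a linear model.
    # In the paper's setting this would be epochs of LLM fine-tuning.
    w = list(weights)
    for x, y in client_data:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def fedavg_round(global_weights, clients):
    # One communication round: weighted average of client models.
    total = sum(len(data) for data in clients)
    new_w = [0.0] * len(global_weights)
    for data in clients:
        w = local_update(global_weights, data)
        for i, wi in enumerate(w):
            new_w[i] += (len(data) / total) * wi
    return new_w

# Two "organizations" whose private data both fit y = 2*x0 + 1*x1.
clients = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0)],
    [((1.0, 1.0), 3.0), ((2.0, 0.0), 4.0)],
]
w = [0.0, 0.0]
for _round in range(50):
    w = fedavg_round(w, clients)
print(w)  # approaches [2.0, 1.0] as rounds increase
```

Only model parameters cross the network in each round; heterogeneous (Non-IID) client data and alternative aggregation rules (e.g., FedProx-style proximal terms) change how quickly, and whether, this averaging converges, which is what the contributions above measure for program repair.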
Mon 17 Nov (displayed time zone: Seoul)
11:00 - 12:30

11:00 | 10m Talk | Defects4C: Benchmarking Large Language Model Repair Capability with C/C++ Bugs | Research Papers
Jian Wang (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Qiang Hu (Tianjin University), Shangqing Liu (Nanjing University), Jiongchi Yu (Singapore Management University), Jiaolong Kong (Singapore Management University), Yi Li (Nanyang Technological University)

11:10 | 10m Talk | MORepair: Teaching LLMs to Repair Code via Multi-Objective Fine-Tuning | Journal-First Track
Boyang Yang (Yanshan University), Haoye Tian (Aalto University), Jiadong Ren (Yanshan University), Hongyu Zhang (Chongqing University), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg), Claire Le Goues (Carnegie Mellon University), Shunfu Jin (Yanshan University)

11:20 | 10m Talk | When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair | Journal-First Track
Wenqiang LUO (City University of Hong Kong), Jacky Keung (City University of Hong Kong), Boyang Yang (Yanshan University), He Ye (University College London (UCL)), Claire Le Goues (Carnegie Mellon University), Tegawendé F. Bissyandé (University of Luxembourg), Haoye Tian (Aalto University), Xuan-Bach D. Le (University of Melbourne)

11:30 | 10m Talk | Test-based Patch Clustering for Automatically-Generated Patches Assessment | Journal-First Track
Matias Martinez (Universitat Politècnica de Catalunya (UPC)), Maria Kechagia (National and Kapodistrian University of Athens), Anjana Perera (Oracle Labs, Australia), Justyna Petke (University College London), Federica Sarro (University College London), Aldeida Aleti (Monash University)

11:40 | 10m Talk | Hierarchical Knowledge Injection for Improving LLM-based Program Repair | Research Papers
Ramtin Ehsani (Drexel University), Esteban Parra Rodriguez (Belmont University), Sonia Haiduc (Florida State University), Preetha Chatterjee (Drexel University, USA)

11:50 | 10m Talk | Characterizing Multi-Hunk Patches: Divergence, Proximity, and LLM Repair Challenges | Research Papers
Noor Nashid (University of British Columbia), Daniel Ding (University of British Columbia), Keheliya Gallaba (Centre for Software Excellence), Ahmed E. Hassan (Queen's University), Ali Mesbah (University of British Columbia)

12:00 | 10m Talk | Reinforcement Learning for Mutation Operator Selection in Automated Program Repair | Journal-First Track
Carol Hanna (University College London), Aymeric Blot (University of Rennes, IRISA / INRIA), Justyna Petke (University College London)

12:20 | 10m Talk | Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Repair | Research Papers
Kai Huang (Technical University of Munich), Jian Zhang (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Chunyang Chen (TU Munich)