Briefing
This paper, “Advances and Open Problems in Federated Learning” (Foundations and Trends® in Machine Learning, 2020), is a broad survey and research roadmap rather than a single empirical study. Its central “research question” is: what are the defining characteristics of federated learning (FL), what progress has been made across the main subareas, and—most importantly—what open problems remain that are likely to matter for both theory and real-world deployments? This matters because FL is now used in production settings (e.g., mobile keyboard and messaging features), yet the setting differs sharply from standard centralized training: data are decentralized, clients are unreliable and intermittently available, communication is constrained, and privacy and security requirements are first-order.
The paper frames FL as a machine learning setting where many clients collaborate under orchestration of a central server while keeping raw data local. A key contribution is the paper’s operational definition emphasizing “focused updates” and early aggregation: clients send only narrowly scoped updates needed for the learning objective, and aggregation happens as early as possible to support data minimization. It distinguishes cross-device FL (massively many, unreliable mobile/IoT clients) from cross-silo FL (a small number of organizations, more reliable, often with different incentives and constraints). It also situates FL among related paradigms such as fully decentralized peer-to-peer learning and split learning.
Methodologically, the paper synthesizes prior work and organizes it into a taxonomy of challenges and solution directions. It uses theoretical convergence results and privacy/security formalisms as anchors, but it does not introduce new experiments or a new dataset. Instead, it provides: (i) a canonical FL training process template (client selection, broadcast, local computation, aggregation, server update), (ii) a taxonomy of non-IID data regimes, (iii) a synthesis of optimization convergence-rate results for IID vs non-IID settings, (iv) a structured overview of privacy technologies (MPC, homomorphic encryption, TEEs, secure aggregation/shuffling, differential privacy variants), (v) an attack/failure-mode landscape (poisoning, backdoors, evasion, Byzantine failures, system failures), (vi) fairness and bias considerations (including fairness without sensitive attributes), and (vii) system-level challenges (deployment, monitoring, bias induced by device availability, tuning, and on-device runtime).
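The canonical training template in (i) can be made concrete with a minimal sketch. This is not code from the paper: models are plain lists of floats, and `client.grad` is a hypothetical per-client stochastic-gradient callback standing in for local data access.

```python
import random

def fedavg_round(global_model, clients, clients_per_round, local_steps, lr):
    """One round of the survey's canonical template:
    selection -> broadcast -> local computation -> aggregation -> server update."""
    # 1. Client selection: sample a cohort from the available clients.
    cohort = random.sample(clients, clients_per_round)
    updates = []
    for client in cohort:
        # 2. Broadcast: each client starts from a copy of the global model.
        local = list(global_model)
        # 3. Local computation: a few steps of SGD on the client's own data.
        for _ in range(local_steps):
            g = client.grad(local)
            local = [w - lr * gi for w, gi in zip(local, g)]
        # Clients report only the focused model delta, never raw data.
        updates.append([l - w for l, w in zip(local, global_model)])
    # 4./5. Aggregation and server update: apply the averaged delta.
    dim = len(global_model)
    mean_delta = [sum(u[k] for u in updates) / len(updates) for k in range(dim)]
    return [w + d for w, d in zip(global_model, mean_delta)]
```

In a real deployment the aggregation step would also drop stragglers and could run under secure aggregation; the sketch only shows the data flow.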
Across these sections, the paper highlights several concrete quantitative findings from the literature it surveys. For example, in the IID convex setting under bounded stochastic gradient variance, it summarizes convergence-rate upper bounds for federated averaging/local SGD and related baselines, emphasizing that federated methods can match the "statistical" term but often have a suboptimal "optimization" term relative to accelerated minibatch SGD. It also provides scaling heuristics: in typical cross-device regimes, the number of clients per round can be on the order of hundreds while the number of rounds can be very large, implying that the number of local steps per round often must be small to preserve convergence guarantees. In the non-IID setting, it reports that achieving the IID-like rate requires stricter constraints on the number of local updates: where the IID analyses permit a comparatively generous local-step budget per round, the non-IID analyses it surveys permit strictly fewer local updates before the error bound degrades (the precise thresholds depend on the assumptions of the cited works, e.g., non-convexity and bounded gradient dissimilarity). It also provides a table of representative non-IID convergence rates for methods such as FedProx, SCAFFOLD, and others, along with the assumptions used (e.g., bounded inter-client gradient variance, bounded optimal objective difference, bounded gradient dissimilarity).
In privacy, the paper is explicit about formal definitions and mechanisms. It introduces (ε, δ)-differential privacy and its defining inequality, then maps FL threat models to "what" is computed (privacy-preserving disclosure) and "how" it is computed (secure computation and verifiability). It distinguishes user-level DP (adjacency by adding/removing all records of a user) from record-level DP, and it explains why FL often needs user-level guarantees. It also discusses privacy accounting challenges in cross-device FL with dynamic eligibility and dropout, and it highlights that distributed DP mechanisms often require discrete noise distributions that complicate standard DP accounting tools like the moments accountant.
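For reference, the standard (ε, δ)-DP guarantee the paper builds on can be written as follows (the adjacency relation is what distinguishes user-level from record-level DP):

```latex
% (\varepsilon, \delta)-differential privacy: for all adjacent datasets
% D, D' and every measurable output set S of mechanism M,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
% User-level DP: D and D' are adjacent when they differ by adding or
% removing ALL records contributed by a single user, not just one record.
```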
In security/robustness, the paper’s quantitative specificity is more limited because it is a survey, but it still cites concrete numeric examples from the literature. For instance, it notes that targeted backdoors can be introduced even with anomaly detectors, and it cites a result that if 10% of devices participating in federated learning are compromised, a backdoor can be introduced by poisoning model updates sent to the service provider (attributed to Bhagoji et al.). It also characterizes adversaries along multiple axes (attack vector, model inspection capability, collusion, participation rate, adaptability), which is crucial for interpreting which defenses apply.
Limitations are inherent to the paper’s survey nature: it does not provide a unified new algorithm, dataset, or experimental evaluation. Instead, it acknowledges that FL research is difficult to validate because many real-world constraints (client availability, communication topology, privacy-preserving evaluation, and non-IID structure) are hard to reproduce. It also emphasizes that simulations must be carefully justified: success on a proxy dataset does not automatically imply success in the real deployment objective. Additionally, because the paper is interdisciplinary, it necessarily leaves some topics less quantified than others (e.g., fairness metrics and system-induced bias quantification are discussed as open problems rather than resolved).
Practical implications are extensive. For ML researchers, the paper provides a checklist for specifying the FL setting precisely (cross-device vs cross-silo; client statefulness; availability patterns; non-IID regime), for reporting where computation happens and what is communicated, and for using privacy and communication efficiency as first-order evaluation criteria even in simulations. For system builders, it highlights that deployment, monitoring, and debugging are major bottlenecks, and that device availability constraints can induce bias. For policy and product stakeholders, it clarifies that FL improves privacy primarily through data minimization but does not automatically guarantee privacy without formal mechanisms (DP, secure aggregation, etc.), and that privacy guarantees depend on threat models.
Overall, the paper’s core message is that federated learning is not a single algorithmic trick but a full stack of interacting constraints—optimization, communication, privacy, security, fairness, and systems—that create new failure modes and new research frontiers. The most important finding, in the sense of the paper’s contribution, is the structured mapping from FL’s defining constraints to the open problems that remain unsolved across these layers, along with concrete pointers to the theoretical and practical tools that are most promising to combine next.
Cornell Notes
This survey paper defines federated learning’s core constraints and organizes recent advances across optimization, privacy, security, fairness, and systems. It emphasizes that FL is an interdisciplinary stack and provides a detailed taxonomy of open problems, including non-IID learning, privacy accounting under dropout, robustness to poisoning/backdoors, and fairness when sensitive attributes are unavailable.
What is the paper’s central aim and “research question” given that it is a survey?
To identify federated learning’s defining characteristics and synthesize advances while enumerating major open problems that matter for both theory and real-world deployments.
How does the paper define federated learning in operational terms?
Multiple clients collaborate under a central server while keeping raw data local; clients send focused updates and aggregation happens early to support data minimization.
What distinguishes cross-device from cross-silo federated learning in the paper?
Cross-device FL has massive client populations, unreliable availability, and communication constraints; cross-silo FL has fewer, more reliable clients (often organizations) and may involve feature-partitioned data and different incentive/privacy constraints.
What is the canonical FL training template the paper uses?
Client selection, broadcast of model weights/training program, local client computation (e.g., SGD), server aggregation of updates (possibly dropping stragglers), and server model update; repeated over many rounds.
What taxonomy of non-IID data does the paper provide?
Non-IIDness arises from differences in client feature marginals P(x), label marginals P(y), conditional distributions P(y|x) or P(x|y) (concept drift/shift), quantity skew, and also from violations of independence due to changing client availability over time.
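Label distribution skew is often simulated with the sort-and-shard scheme used in the original federated averaging experiments. A minimal sketch (not from this survey; the shard counts are illustrative):

```python
from collections import defaultdict

def label_skew_partition(labels, num_clients, shards_per_client=2):
    """Sort-and-shard partition: sort example indices by label, cut them
    into contiguous shards, and deal a few shards to each client, so each
    client ends up seeing only a small subset of the labels."""
    idx = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(idx) // num_shards
    shards = [idx[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    clients = defaultdict(list)
    for s, shard in enumerate(shards):
        # Deal shards round-robin; any shard-to-client assignment works.
        clients[s % num_clients].extend(shard)
    return dict(clients)
```

With `shards_per_client=2`, each simulated client holds examples from at most two labels, a common proxy for extreme label skew.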
What does the paper say about convergence rates for federated averaging under IID data?
In IID convex settings, federated averaging/local SGD can match the statistical term but often has a suboptimal optimization term compared to accelerated minibatch SGD; it summarizes representative upper bounds in a comparative table.
How does non-IIDness affect the allowable number of local steps per round?
The paper reports that non-IID analyses typically permit fewer local steps per round than IID analyses before comparable rates are lost: to achieve the same error bound under non-convex objectives, the local-update budget must be strictly smaller than the corresponding IID threshold.
What are the main privacy “what” and “how” dimensions emphasized by the paper?
“What” concerns privacy-preserving disclosure of computed results (notably differential privacy); “how” concerns secure computation and information flow (MPC, homomorphic encryption, TEEs, secure aggregation/shuffling, and verifiability).
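The "how" side can be illustrated with the core idea behind secure aggregation: pairwise additive masks that cancel in the sum. This is a toy sketch, not the actual SecAgg protocol (which adds key agreement and dropout recovery), and it uses small non-negative integers as stand-ins for quantized model updates:

```python
import random

def mask_updates(updates, modulus=1 << 20):
    """Each ordered pair (i, j), i < j, shares a random mask r; client i
    adds it and client j subtracts it. Any single masked update looks
    random, but all masks cancel in the modular sum."""
    n = len(updates)
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(len(updates[0])):
                r = random.randrange(modulus)
                masked[i][k] = (masked[i][k] + r) % modulus
                masked[j][k] = (masked[j][k] - r) % modulus
    return masked

def aggregate(masked, modulus=1 << 20):
    # The server sums masked updates; the pairwise masks cancel mod the modulus.
    return [sum(col) % modulus for col in zip(*masked)]
```

The server thus learns only the aggregate, which is exactly the "early aggregation" property the paper's definition of FL emphasizes.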
What is the paper’s stance on privacy guarantees in baseline FL without cryptographic/DP mechanisms?
It argues that FL’s data minimization helps but does not provide formal privacy guarantees; leakage can occur if an adversary can infer training examples from model updates.
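A minimal illustration of such leakage (my example, not the paper's): for a linear layer trained on a single example with any loss, the weight gradient is the outer product of the output-error signal and the input, so the raw input is recoverable from the update alone.

```python
import numpy as np

def recover_input_from_gradient(dW, db):
    """For y = W x + b on one example, dW = (dL/dy) x^T and db = dL/dy,
    so any row of dW with a nonzero db entry reveals x exactly."""
    i = int(np.argmax(np.abs(db)))  # pick a row with a nonzero error signal
    return dW[i] / db[i]
```

Deeper models and averaged batches make recovery harder but, as the gradient-inversion literature shows, not impossible, which is why formal mechanisms are still needed.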
What robustness/security threats does the paper highlight as central in FL?
Training-time poisoning via model update poisoning and data poisoning, inference-time evasion/backdoor exploitation, and non-malicious failures like client dropouts and noisy updates; it also stresses the tension between privacy mechanisms and robustness.
Review Questions
Explain how the paper’s definition of FL (focused updates + early aggregation) changes the threat model compared to centralized training.
Using the paper’s non-IID taxonomy, give an example of a real-world FL scenario that corresponds to label distribution skew and another that corresponds to concept drift.
Why do convergence guarantees for local-update methods (like federated averaging) become harder under non-IID data and intermittent client availability?
Describe the difference between user-level DP and record-level DP as used in FL, and explain why privacy accounting is harder with dropout in cross-device FL.
How does secure aggregation (SecAgg) complicate defenses against poisoning, and what does the paper suggest as an integration challenge?
Key Points
1. Federated learning is best understood as a full-stack setting where optimization, communication, privacy, security, fairness, and systems constraints interact.
2. The paper distinguishes cross-device FL (massive, unreliable clients) from cross-silo FL (few, reliable organizations), and shows that many techniques and guarantees depend on this distinction.
3. Non-IIDness in FL is multi-faceted (feature skew, label skew, concept drift/shift, quantity skew) and can also arise from time-varying client availability.
4. For federated averaging/local SGD, IID convergence analyses often preserve the statistical term but can have weaker optimization-rate behavior than accelerated minibatch SGD; non-IID settings typically require smaller local-update budgets.
5. Formal privacy in FL requires more than data minimization: the paper emphasizes differential privacy (user-level) and secure computation tools (MPC/HE/TEEs) plus verifiability.
6. FL introduces new attack surfaces (notably model update poisoning and backdoors) and new failure modes (client dropouts, noisy updates), and defenses must be evaluated under FL-specific threat models.
7. Fairness in FL is complicated by sampling/selection bias and by the frequent absence of sensitive attributes; the paper highlights fairness-without-demographics as an open direction.
8. System-induced bias from device availability and limited on-device runtime are major practical blockers, not just academic concerns.