Briefing
Santo Fortunato’s Physics Reports review, “Community detection in graphs” (2009), addresses a deceptively simple research question: how can one identify “communities” (clusters/modules) in networks using only the graph’s topology, and how should the many proposed algorithms be defined, compared, and validated? The question matters because community structure is widely treated as a fundamental organizing principle of real systems—social groups, scientific disciplines, protein functional modules, web topical clusters, and more. In practice, community detection is used for tasks like summarizing large networks, identifying roles of vertices, improving information retrieval or recommendation systems, and studying dynamical processes (diffusion, synchronization) at mesoscopic scales.
The paper’s contribution is not a single new algorithmic result but a comprehensive synthesis and critical assessment of the field. Fortunato organizes the literature around (i) what “community” means, (ii) how partitions are represented (partitions vs covers for overlapping communities), (iii) algorithmic families (traditional clustering, divisive edge-removal, modularity optimization, spectral methods, dynamic/process-based methods, and statistical inference), and (iv) the central methodological issue of whether detected structure is significant rather than an artifact of optimization or randomness.
Methodologically, the review is structured as a conceptual and comparative survey. It introduces formal elements used across the literature: community definitions (local/global/similarity-based), partitions and covers, computational complexity classes (including NP-hardness), and quality functions. A key technical anchor is Newman–Girvan modularity, presented as a global criterion derived from a null model that preserves the degree sequence via the configuration model. Modularity is written in matrix form as Q = (1/2m) Σ_ij (A_ij − P_ij) δ(C_i, C_j), with the standard null model term P_ij = k_i k_j / 2m, leading to the commonly used expression Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(C_i, C_j). Fortunato also emphasizes that modularity can be re-expressed as a sum over clusters, Q = Σ_c [l_c/m − (d_c/2m)²], and equivalently as a cut-size comparison against the null model. This framing connects community detection to graph partitioning and makes it possible to analyze limitations.
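In code, the configuration-model comparison behind modularity can be computed directly from the cluster-sum form Q = Σ_c [l_c/m − (d_c/2m)²]; the graph, labels, and function names below are illustrative, not from the review:

```python
# Sketch: Newman–Girvan modularity for an undirected graph, computed via the
# cluster-sum form (equivalent to the matrix form with P_ij = k_i k_j / 2m).
from collections import defaultdict

def modularity(edges, labels):
    """edges: list of (u, v) pairs; labels: dict vertex -> community id."""
    m = len(edges)
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Observed fraction of intra-community edges.
    intra = sum(labels[u] == labels[v] for u, v in edges) / m
    # Expected fraction under the degree-preserving null model:
    # sum over communities of (d_c / 2m)^2, where d_c is the total degree.
    dc = defaultdict(int)
    for v, d in deg.items():
        dc[labels[v]] += d
    return intra - sum((d / (2 * m)) ** 2 for d in dc.values())

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, labels), 4))  # -> 0.3571
```

Here 6 of the 7 edges are intra-community and each community holds half the total degree, so Q = 6/7 − 2·(1/2)² ≈ 0.357.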
Across algorithm families, the review repeatedly discusses study design in the sense of evaluation methodology: benchmarks with planted community structure, similarity measures between partitions, and significance testing against null models. The paper highlights that most empirical comparisons rely on synthetic benchmarks (e.g., Girvan–Newman planted ℓ-partition graphs and the more realistic LFR benchmark with power-law degree and community-size distributions) and that the choice of benchmark strongly affects conclusions.
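One widely used partition-similarity measure in such benchmark comparisons is normalized mutual information (NMI); a minimal sketch from the standard definition, with illustrative variable names:

```python
# Sketch: normalized mutual information between two vertex partitions,
# NMI = 2*I(A;B) / (H(A) + H(B)), computed from label co-occurrence counts.
from collections import Counter
from math import log

def nmi(part_a, part_b):
    """part_a, part_b: equal-length lists of community labels per vertex."""
    n = len(part_a)
    ca, cb = Counter(part_a), Counter(part_b)
    joint = Counter(zip(part_a, part_b))
    # Mutual information of the two labelings (natural log).
    mi = sum((nab / n) * log(n * nab / (ca[a] * cb[b]))
             for (a, b), nab in joint.items())
    ha = -sum((c / n) * log(c / n) for c in ca.values())
    hb = -sum((c / n) * log(c / n) for c in cb.values())
    if ha == 0 and hb == 0:  # both partitions trivial
        return 1.0
    return 2 * mi / (ha + hb)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))            # identical up to relabeling -> 1.0
print(round(nmi([0, 0, 1, 1], [0, 1, 0, 1]), 4))  # statistically independent -> 0.0
```

NMI is invariant to label permutations, which is why it is preferred over raw label agreement when scoring algorithms on planted partitions.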
Key findings in the review are primarily “meta-findings” about what the field can and cannot reliably do. The most prominent is the resolution limit of modularity optimization: modularity may fail to detect small but well-defined communities when the graph is large. Fortunato explains this using the Fortunato–Barthélemy argument: if a graph consists of identical cliques connected by single edges, modularity’s maximum may merge multiple cliques into larger clusters when the cliques are smaller than a scale on the order of √m (where m is the total number of edges). The practical implication is that modularity-based methods can systematically under-resolve community structure, especially when communities are heterogeneous in size (common in real networks).
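The clique-ring argument can be checked numerically with the cluster-sum form of modularity, Q = Σ_c [l_c/m − (d_c/2m)²]; the function below is an illustrative sketch, not code from the review:

```python
# Sketch of the Fortunato–Barthelemy resolution-limit argument: a ring of n
# cliques K_k, neighboring cliques joined by single edges. We compare the
# modularity of "one community per clique" vs "merge neighboring clique pairs".
def ring_of_cliques_Q(n, k):
    """Return (Q_single, Q_pairs) for a ring of n cliques of size k (n even)."""
    l_in = k * (k - 1) // 2          # intra-clique edges
    m = n * l_in + n                 # total edges, incl. n bridge edges
    d = 2 * l_in + 2                 # total degree of one clique (2 bridges)
    q_single = n * (l_in / m - (d / (2 * m)) ** 2)
    # Merged pairs: n/2 communities, each with 2*l_in + 1 internal edges.
    q_pairs = (n // 2) * ((2 * l_in + 1) / m - (2 * d / (2 * m)) ** 2)
    return q_single, q_pairs

# Small ring: the natural one-clique-per-community partition scores higher.
qs, qp = ring_of_cliques_Q(10, 5)
print(qs > qp)  # True
# Larger ring, same cliques: merging pairs now scores higher -> under-resolution.
qs, qp = ring_of_cliques_Q(30, 5)
print(qs < qp)  # True
```

The algebra behind the flip: Q_pairs − Q_single = (n/4m²)(2m − d²), so merging wins exactly when a clique’s total degree d drops below √(2m), i.e., when the graph grows while the cliques stay the same size.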
A second major “finding” is that high modularity does not guarantee true community structure. Fortunato discusses that random graphs can yield partitions with large modularity due to fluctuations, so significance must be assessed. He describes a z-score approach: compute the modularity maximum Q_max for the observed graph and compare it to the distribution of modularity maxima under edge-rewiring null models with the same expected degree sequence. The z-score is z = (Q_max − ⟨Q_null⟩)/σ_null, with “strong structure” typically declared at a few standard deviations (around 2–3 in practice); Fortunato notes, however, that modularity-null distributions are not Gaussian, so the test can produce false positives and negatives.
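A minimal version of this significance test can be sketched with brute-force modularity maximization (feasible only for tiny graphs) and degree-preserving double-edge-swap rewiring; all names, sizes, and the sample count are illustrative:

```python
# Sketch: z-score significance of the modularity maximum against a
# degree-preserving rewired null model (double-edge swaps).
import random
from collections import defaultdict

def modularity(edges, labels):
    m = len(edges)
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    intra = sum(labels[u] == labels[v] for u, v in edges) / m
    dc = defaultdict(int)
    for v, d in deg.items():
        dc[labels[v]] += d
    return intra - sum((d / (2 * m)) ** 2 for d in dc.values())

def partitions(items):
    """Yield all set partitions of a list (exponential; tiny graphs only)."""
    if not items:
        yield []
        return
    head, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [head]] + part[i + 1:]
        yield part + [[head]]

def q_max(edges, nodes):
    """Exact modularity maximum by brute force over all partitions."""
    return max(modularity(edges, {v: i for i, blk in enumerate(p) for v in blk})
               for p in partitions(nodes))

def rewired(edges, rng, swaps=50):
    """Degree-preserving randomization via double-edge swaps (stays simple)."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Observed graph: two triangles joined by a bridge (clear 2-community structure).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
nodes = list(range(6))
rng = random.Random(0)
q_obs = q_max(edges, nodes)
null = [q_max(rewired(edges, rng), nodes) for _ in range(50)]
mean = sum(null) / len(null)
sigma = (sum((q - mean) ** 2 for q in null) / len(null)) ** 0.5
z = (q_obs - mean) / sigma
print(z > 0)  # planted structure scores above the rewired null here
```

Real studies replace the brute-force maximizer with the same heuristic optimizer used on the observed graph; the non-Gaussian shape of the null distribution is what makes fixed z thresholds unreliable.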
The review also synthesizes computational complexity results: many exact community detection formulations are NP-hard, and even modularity maximization is NP-hard (its decision version is NP-complete, as cited via Brandes et al.). Therefore, algorithms rely on heuristics and approximations, and the review catalogs tradeoffs between accuracy and scalability (e.g., Girvan–Newman’s edge-betweenness divisive method is slow; fast greedy modularity methods scale to very large graphs; simulated annealing and extremal optimization can be more accurate but are slower).
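The greedy agglomerative idea can be sketched in its simplest, unoptimized form; this naively recomputes Q for each trial merge, whereas production implementations in the Clauset–Newman–Moore style track ΔQ incrementally with heaps:

```python
# Sketch: greedy agglomerative modularity optimization. Start from singleton
# communities and repeatedly apply the merge with the largest positive gain.
from collections import defaultdict

def modularity(edges, labels):
    m = len(edges)
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    intra = sum(labels[u] == labels[v] for u, v in edges) / m
    dc = defaultdict(int)
    for v, d in deg.items():
        dc[labels[v]] += d
    return intra - sum((d / (2 * m)) ** 2 for d in dc.values())

def greedy_modularity(edges, nodes):
    labels = {v: v for v in nodes}
    while True:
        base = modularity(edges, labels)
        best_gain, best_merge = 0.0, None
        # Only merges of communities joined by at least one edge can raise Q.
        candidates = {tuple(sorted((labels[u], labels[v])))
                      for u, v in edges if labels[u] != labels[v]}
        for a, b in candidates:
            trial = {v: (a if c == b else c) for v, c in labels.items()}
            gain = modularity(edges, trial) - base
            if gain > best_gain:
                best_gain, best_merge = gain, (a, b)
        if best_merge is None:
            return labels  # no merge improves Q: stop
        a, b = best_merge
        labels = {v: (a if c == b else c) for v, c in labels.items()}

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = greedy_modularity(edges, range(6))
print(len(set(labels.values())))  # the two triangles are recovered -> 2
```

The slow-vs-fast tradeoff the review catalogs shows up directly here: this loop is cubic-ish in graph size, and the engineering that makes greedy methods scale is precisely the incremental ΔQ bookkeeping this sketch omits.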
Limitations are acknowledged both explicitly and implicitly. The paper stresses conceptual ambiguity: “community” is not uniquely defined, and different definitions lead to different algorithmic targets. It also highlights that many quality functions depend on global null models with unrealistic “infinite horizon” assumptions (each vertex could connect to any other), which contributes to resolution-limit behavior. For directed graphs, modularity extensions may fail to capture flow direction properly (Fortunato discusses directed modularity issues and alternative formulations). For overlapping communities, the number of possible covers grows extremely fast, making overlap detection computationally demanding and definition-dependent.
Practical implications are central to the review. Fortunato argues that practitioners should (i) choose algorithms consistent with the graph type (weighted, directed, bipartite, overlapping), (ii) use appropriate benchmarks and similarity measures, (iii) assess significance/stability rather than trusting raw quality scores, and (iv) consider multiresolution methods when community sizes are unknown. He points to multiresolution modularity/spin-glass approaches (tunable resolution parameter γ) and stability-based methods as ways to mitigate resolution-limit issues.
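Multiresolution modularity in the Reichardt–Bornholdt spirit places a tunable γ in front of the null-model term, Q_γ = Σ_c [l_c/m − γ(d_c/2m)²], with γ = 1 recovering standard modularity. The sketch below, on a hypothetical ring of cliques (the construction used in resolution-limit arguments), shows γ > 1 restoring the clique-level partition:

```python
# Sketch: resolution-parameter modularity Q_gamma = sum_c [l_c/m - gamma*(d_c/2m)^2]
# on a ring of n cliques K_k joined by single bridge edges.
def q_gamma_ring(n, k, gamma):
    """(Q for one community per clique, Q for merged clique pairs) at resolution gamma."""
    l_in = k * (k - 1) // 2          # intra-clique edges
    m = n * l_in + n                 # total edges, incl. n bridges
    d = 2 * l_in + 2                 # total degree of one clique
    q_single = n * (l_in / m - gamma * (d / (2 * m)) ** 2)
    q_pairs = (n // 2) * ((2 * l_in + 1) / m - gamma * (2 * d / (2 * m)) ** 2)
    return q_single, q_pairs

qs, qp = q_gamma_ring(30, 5, 1.0)   # standard modularity: cliques get merged
print(qs < qp)  # True
qs, qp = q_gamma_ring(30, 5, 2.0)   # higher resolution: cliques win again
print(qs > qp)  # True
```

The practical difficulty the review flags remains visible here: the "right" γ is not known in advance, so multiresolution methods must scan γ and use stability of the resulting partitions to pick a scale.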
Who should care? The review is aimed at researchers and practitioners across physics, computer science, and network science—anyone using community structure to interpret data or infer latent organization. It is especially relevant to analysts working with large, noisy, heterogeneous real networks (social, biological, web) where modularity-based methods may under-detect small communities or overfit fluctuations.
Overall, Fortunato’s core message is that community detection is not a solved problem: the field lacks a universally accepted definition of community, lacks fully reliable benchmarks and null models, and must treat algorithm outputs as hypotheses requiring significance and stability checks. The review provides a structured map of methods and the methodological pitfalls that determine whether detected communities are meaningful.
Cornell Notes
Fortunato reviews how to define and detect community structure in graphs, covering algorithm families from modularity optimization and spectral methods to dynamic/process-based and statistical-inference approaches. A central contribution is the critical assessment of modularity’s limitations—especially the resolution limit and the need to test significance against null models—along with guidance on benchmarking and comparing partitions.
What is the paper’s main research question about community detection?
How can one define and reliably detect community structure in graphs using topology alone, and how should different algorithms be evaluated, compared, and tested for significance?
Why does the paper emphasize that “community detection” is not well-defined?
Because there is no universally accepted definition of a community; the concept depends on application and leads to different algorithmic objectives and quality functions.
What is the standard null-model idea behind Newman–Girvan modularity?
A partition is scored by comparing observed within-community edge density to the expected density under a null model that preserves the degree sequence (configuration model), i.e., P_ij = k_i k_j / 2m.
What is modularity’s core formula used throughout the review?
For undirected graphs, Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j), where δ(c_i, c_j) selects pairs in the same community.
What is the resolution limit of modularity optimization?
Modularity may merge small true communities into larger ones when their total degree is below a scale on the order of √(2m), even if the small communities are internally very dense (e.g., cliques).
How does the review suggest testing whether a detected community structure is significant?
Compute the maximum modularity Q_max for the observed graph, then compare it to a null distribution of maxima from edge-rewiring null models; use a z-score z = (Q_max − ⟨Q_null⟩)/σ_null, while noting non-Gaussianity and possible false positives/negatives.
What are the main algorithm families covered?
Traditional clustering (partitioning/hierarchical/spectral), divisive edge-removal (e.g., Girvan–Newman), modularity-based optimization (greedy, annealing, extremal optimization, spectral modularity), spectral algorithms, dynamic/process-based methods (random walks, Potts/spin models, synchronization), and statistical inference (generative models/blockmodels, MDL/information theory).
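As an illustration of the divisive family, a self-contained Girvan–Newman sketch: edge betweenness is accumulated via BFS in the style of Brandes' algorithm, and the highest-betweenness edge is removed until the graph splits. Names and the target community count are illustrative:

```python
# Sketch: Girvan–Newman divisive community detection on an unweighted graph.
from collections import defaultdict, deque

def edge_betweenness(adj, nodes):
    """Shortest-path edge betweenness via BFS + Brandes-style back-accumulation."""
    eb = defaultdict(float)
    for s in nodes:
        dist, order, queue = {s: 0}, [], deque([s])
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):            # back-accumulate dependencies
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    return eb  # each unordered endpoint pair is counted twice; fine for argmax

def components(adj, nodes):
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        seen.add(s)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def girvan_newman(edges, nodes, target=2):
    """Remove highest-betweenness edges until the graph splits into `target` parts."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    while len(components(adj, nodes)) < target:
        eb = edge_betweenness(adj, nodes)
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj, nodes)

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
parts = girvan_newman(edges, list(range(6)))
print(sorted(sorted(p) for p in parts))  # [[0, 1, 2], [3, 4, 5]]
```

The bridge edge carries all cross-community shortest paths, so it has the highest betweenness and is removed first; this repeated global betweenness recomputation is exactly why the method is slow on large graphs.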
What is the key evaluation bottleneck the paper highlights?
Testing is often limited to a small set of benchmarks and quality functions; without reliable benchmarks and null models aligned to the definition of community, algorithm comparisons can be misleading.
Review Questions
Explain modularity as a comparison between observed and expected within-community connectivity. What null model assumption drives modularity’s resolution limit?
Describe why high modularity can occur in random graphs. What does the z-score procedure attempt to correct, and what are its pitfalls?
Compare divisive (Girvan–Newman) and modularity-optimization approaches: what structural signal do they rely on, and how do their computational costs differ?
Why do multiresolution methods (e.g., tuning the resolution parameter γ in Potts/spin-glass formulations) address the resolution-limit problem, and what practical difficulty remains?
Key Points
1. Community detection is fundamentally ambiguous because “community” lacks a single universally accepted definition; algorithm outputs depend on the chosen definition and quality function.
2. Newman–Girvan modularity scores partitions by comparing observed within-community edges to expected edges under a degree-preserving null model P_ij = k_i k_j / 2m.
3. Modularity optimization has a resolution limit: it can merge small true communities when their scale is below about √m, even for very dense subgraphs like cliques.
4. High modularity does not guarantee real community structure; random graphs can yield large modularity due to fluctuations, so significance testing against null models is necessary.
5. Overlapping communities (covers) are harder because the number of possible covers grows extremely fast; overlap detection is definition- and algorithm-dependent.
6. Evaluation and benchmarking are central: Girvan–Newman benchmarks are simpler but less realistic than LFR benchmarks with power-law degree and community-size distributions.
7. Multiresolution and stability-based methods are emphasized as practical ways to mitigate resolution-limit issues and to select meaningful partitions.