Statistical Tests for Annotation Significance¶
RISK provides six statistical methods for testing overrepresentation or underrepresentation of functional annotations within local network neighborhoods. Each method has different strengths depending on your data size, structure, and goals.
Summary of Methods¶
Test | Speed | Best For |
---|---|---|
Permutation | 🐢 Slow | Most robust, no assumptions |
Hypergeometric | ⚖️ Medium | GO/pathway analysis, exact sampling |
Binomial | ⚡ Fast | Binary trials, scalable |
Chi-squared | ⚡ Fast | Contingency tables, large datasets |
Poisson | ⚡ Fast | Rare events, sparse networks |
Z-score | ⚡ Fast | Approximate, fast scanning |
Common Parameters¶
All methods use a shared API and return a neighborhoods
dictionary with per-cluster statistics.
Parameter | Description |
---|---|
network |
NetworkX graph |
annotation |
Annotation dict |
distance_metric |
Method(s) for neighborhood detection (e.g., 'louvain' ) |
louvain_resolution |
Resolution for Louvain clustering |
leiden_resolution |
Resolution for Leiden clustering |
fraction_shortest_edges |
Filter for edge-based subgraphs |
null_distribution |
'network' or 'annotation' |
random_seed |
Random state for reproducibility |
Choose from several distance metrics such as 'louvain'
, 'leiden'
, 'walktrap'
, and more. See the tutorial notebook for full details. For null_distribution
, choose 'network'
(default) or 'annotation'
.
1. Permutation Test¶
Most robust method. Shuffles graph or annotations to build a null.
neighborhoods = risk.load_neighborhoods_permutation(
network=network,
annotation=annotation,
distance_metric="louvain",
louvain_resolution=10.0,
leiden_resolution=1.0,
fraction_shortest_edges=0.275,
score_metric="stdev",
null_distribution="network",
num_permutations=1000,
random_seed=887,
max_workers=1,
)
2. Hypergeometric Test¶
Exact test based on finite sampling without replacement.
neighborhoods = risk.load_neighborhoods_hypergeom(
network=network,
annotation=annotation,
distance_metric="louvain",
louvain_resolution=10.0,
fraction_shortest_edges=0.275,
null_distribution="network",
random_seed=887,
)
3. Binomial Test¶
Models binary outcomes assuming independent trials.
neighborhoods = risk.load_neighborhoods_binom(
network=network,
annotation=annotation,
distance_metric="louvain",
louvain_resolution=10.0,
fraction_shortest_edges=0.275,
null_distribution="network",
random_seed=887,
)
4. Chi-squared Test¶
Tests significance via contingency tables.
neighborhoods = risk.load_neighborhoods_chi2(
network=network,
annotation=annotation,
distance_metric="louvain",
louvain_resolution=10.0,
fraction_shortest_edges=0.275,
null_distribution="network",
random_seed=887,
)
5. Poisson Test¶
Evaluates deviation from expected frequency under Poisson.
neighborhoods = risk.load_neighborhoods_poisson(
network=network,
annotation=annotation,
distance_metric="louvain",
louvain_resolution=10.0,
fraction_shortest_edges=0.275,
null_distribution="network",
random_seed=887,
)
6. Z-score Test¶
Computes standardized overrepresentation scores for each cluster.
neighborhoods = risk.load_neighborhoods_zscore(
network=network,
annotation=annotation,
distance_metric="louvain",
louvain_resolution=10.0,
fraction_shortest_edges=0.275,
null_distribution="network",
random_seed=887,
)
Output¶
All test functions return a neighborhoods
dictionary with:
- Cluster IDs
- Term-wise overrepresentation scores
- Optional p-values or z-scores depending on method
Use this result to create a NetworkGraph
in the next step.
Next Step¶
Proceed to 5. Load Graph to build a cluster-aware network object.