Statistical Tests for Annotation Significance

RISK provides six statistical methods for testing whether functional annotations are overrepresented or underrepresented within local network neighborhoods. The methods differ in their assumptions and speed, so the best choice depends on your data size, structure, and goals.

Summary of Methods

| Test | Speed | Best For |
|---|---|---|
| Permutation | 🐢 Slow | Most robust; no distributional assumptions |
| Hypergeometric | ⚖️ Medium | GO/pathway analysis; exact test for sampling without replacement |
| Binomial | ⚡ Fast | Independent binary trials; scalable |
| Chi-squared | ⚡ Fast | Contingency tables; large datasets |
| Poisson | ⚡ Fast | Rare events; sparse networks |
| Z-score | ⚡ Fast | Approximate, fast scanning |

Common Parameters

All methods share a common API and return a neighborhoods dictionary with per-cluster statistics.

| Parameter | Description |
|---|---|
| `network` | NetworkX graph |
| `annotation` | Annotation dictionary |
| `distance_metric` | Method(s) for neighborhood detection (e.g., `'louvain'`) |
| `louvain_resolution` | Resolution parameter for Louvain clustering |
| `leiden_resolution` | Resolution parameter for Leiden clustering |
| `fraction_shortest_edges` | Fraction of shortest edges retained when building edge-based subgraphs |
| `null_distribution` | Null model: `'network'` or `'annotation'` |
| `random_seed` | Random seed for reproducibility |

Several distance metrics are available, including `'louvain'`, `'leiden'`, and `'walktrap'`; see the tutorial notebook for the full list. For `null_distribution`, choose `'network'` (the default) or `'annotation'`.
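Conceptually, the hypergeometric, binomial, chi-squared, Poisson, and z-score tests each compare the same four counts for a given neighborhood and annotation term. The minimal sketch below computes them, assuming neighborhoods and terms are plain sets of node IDs. `overlap_counts` is a hypothetical helper for illustration, not part of RISK, and RISK's internal data structures may differ.

```python
# Hypothetical helper for illustration only: not part of the RISK API.
def overlap_counts(neighborhood, annotated, all_nodes):
    """Return (k, n, K, N) for one neighborhood and one annotation term.

    k: annotated nodes inside the neighborhood
    n: neighborhood size
    K: annotated nodes in the whole network
    N: total nodes in the network
    """
    all_nodes = set(all_nodes)
    neighborhood = set(neighborhood) & all_nodes  # ignore IDs outside the network
    annotated = set(annotated) & all_nodes
    k = len(neighborhood & annotated)
    return k, len(neighborhood), len(annotated), len(all_nodes)


print(overlap_counts({0, 1, 2, 3, 4}, {0, 1, 2, 10, 11}, range(20)))  # (3, 5, 5, 20)
```

Every parametric test then asks, in its own way, whether `k` is surprisingly large (or small) given `n`, `K`, and `N`.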

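1. Permutation Test

The most robust method. It shuffles either the network or the annotations (chosen via `null_distribution`) to build an empirical null distribution, so it makes no distributional assumptions, but its cost grows with `num_permutations`. Beyond the common parameters, it also accepts `score_metric`, `num_permutations`, and `max_workers`.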

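```python
neighborhoods = risk.load_neighborhoods_permutation(
    network=network,
    annotation=annotation,
    distance_metric="louvain",       # neighborhood detection method
    louvain_resolution=10.0,         # resolution for Louvain clustering
    leiden_resolution=1.0,           # resolution for Leiden clustering
    fraction_shortest_edges=0.275,   # edge-based subgraph filter
    score_metric="stdev",
    null_distribution="network",     # shuffle the network to build the null
    num_permutations=1000,           # number of shuffles in the null distribution
    random_seed=887,                 # for reproducibility
    max_workers=1,                   # number of parallel workers
)
```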
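To illustrate the idea, here is a minimal permutation test in plain Python, assuming neighborhoods and terms are sets of node IDs and that every annotated node appears in `all_nodes`. It shuffles which nodes carry the annotation, which is only one possible null model; RISK's `'network'` and `'annotation'` nulls may be built differently, and its `score_metric` options are not modeled here. `permutation_pvalue` is a hypothetical name.

```python
import random


def permutation_pvalue(neighborhood, annotated, all_nodes, num_permutations=1000, seed=887):
    """Empirical enrichment p-value from shuffling which nodes carry the term."""
    rng = random.Random(seed)          # seeded for reproducibility
    nodes = list(all_nodes)
    hood = set(neighborhood)
    term = set(annotated)              # assumed to be a subset of all_nodes
    observed = len(hood & term)
    hits = 0
    for _ in range(num_permutations):
        # Randomly reassign the term to the same number of nodes
        shuffled = set(rng.sample(nodes, len(term)))
        if len(hood & shuffled) >= observed:
            hits += 1
    # Add-one correction: the p-value can never be exactly zero
    return (hits + 1) / (num_permutations + 1)


nodes = range(100)
hood = set(range(10))
print(permutation_pvalue(hood, set(range(8)), nodes))       # term concentrated in hood
print(permutation_pvalue(hood, set(range(50, 58)), nodes))  # 1.0 (term absent from hood)
```

Because of the add-one correction, the smallest attainable p-value is `1 / (num_permutations + 1)`, so more permutations buy finer resolution at the cost of speed.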

2. Hypergeometric Test

An exact test based on sampling without replacement from a finite population, which makes it the standard choice for GO and pathway enrichment analysis.

```python
neighborhoods = risk.load_neighborhoods_hypergeom(
    network=network,
    annotation=annotation,
    distance_metric="louvain",
    louvain_resolution=10.0,
    fraction_shortest_edges=0.275,
    null_distribution="network",
    random_seed=887,
)
```
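For intuition, the one-sided hypergeometric enrichment p-value is the upper tail P(X ≥ k), where a neighborhood of n nodes is drawn without replacement from N nodes, K of which carry the term, and k of the drawn nodes are annotated. A stdlib-only sketch under that standard formulation; `hypergeom_sf` is a hypothetical name, and RISK's own computation may differ in details.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: n draws without replacement
    from N nodes, K of which carry the annotation term."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum exact probabilities of seeing i annotated nodes, for i = k..upper
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


print(round(hypergeom_sf(k=3, n=5, K=5, N=20), 4))  # 0.0726
```

Using exact integer binomial coefficients keeps the tail sum exact until the final division, which is what makes this an exact test.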

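3. Binomial Test

Models annotation membership as a series of independent binary trials (in effect, sampling with replacement). It is fast and scales well to large networks.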

```python
neighborhoods = risk.load_neighborhoods_binom(
    network=network,
    annotation=annotation,
    distance_metric="louvain",
    louvain_resolution=10.0,
    fraction_shortest_edges=0.275,
    null_distribution="network",
    random_seed=887,
)
```
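A stdlib-only sketch of the binomial upper tail, assuming the per-node success probability is the network-wide annotation rate p = K/N (an assumption for illustration; RISK may parameterize it differently). The example uses a toy network of N = 20 nodes with K = 5 annotated, and a neighborhood of n = 5 nodes containing k = 3 annotated ones. `binom_sf` is a hypothetical name.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: p is the network-wide annotation rate K / N
print(binom_sf(k=3, n=5, p=5 / 20))  # 0.103515625
```

For these counts the exact hypergeometric tail is 1126/15504 ≈ 0.0726, so on a small network the binomial approximation is noticeably more conservative; the two converge as N grows relative to n.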

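4. Chi-squared Test

Tests for association between neighborhood membership and annotation using a contingency table. Because it relies on a large-sample approximation, it is best suited to large datasets.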

```python
neighborhoods = risk.load_neighborhoods_chi2(
    network=network,
    annotation=annotation,
    distance_metric="louvain",
    louvain_resolution=10.0,
    fraction_shortest_edges=0.275,
    null_distribution="network",
    random_seed=887,
)
```
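A minimal sketch of a Pearson chi-squared test on the 2×2 table of neighborhood membership versus annotation, without Yates' continuity correction. The counts are k annotated nodes in a neighborhood of n, with K annotated among N network nodes; the 2×2 layout and the function name `chi2_2x2` are assumptions for illustration, and RISK's exact table construction may differ.

```python
from math import erfc, sqrt


def chi2_2x2(k, n, K, N):
    """Pearson chi-squared test (1 df, no continuity correction) for one term."""
    # 2x2 table: rows = in/out of neighborhood, columns = annotated/not annotated
    a = k                  # in neighborhood, annotated
    b = n - k              # in neighborhood, not annotated
    c = K - k              # outside neighborhood, annotated
    d = N - n - K + k      # outside neighborhood, not annotated
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0    # a margin is empty, so the test is undefined
    stat = N * (a * d - b * c) ** 2 / denom
    # Survival function of chi-squared with 1 df: P(X >= stat) = erfc(sqrt(stat / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


stat, p = chi2_2x2(k=3, n=5, K=5, N=20)
print(round(stat, 4))  # 4.3556
```

The statistic is direction-agnostic, so the sign of `a * d - b * c` tells you whether the term is over- or underrepresented. Here the expected count in the top-left cell is only n·K/N = 1.25, well below the usual rule of thumb of 5, which is why this approximation is better reserved for large datasets.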

5. Poisson Test

Evaluates how far the observed annotation count deviates from its expected frequency under a Poisson model, which suits rare events and sparse networks.

```python
neighborhoods = risk.load_neighborhoods_poisson(
    network=network,
    annotation=annotation,
    distance_metric="louvain",
    louvain_resolution=10.0,
    fraction_shortest_edges=0.275,
    null_distribution="network",
    random_seed=887,
)
```
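A stdlib-only sketch of a Poisson upper-tail test, assuming the expected count for a neighborhood of n nodes is λ = n·K/N, where K of the network's N nodes carry the term. That choice of λ is an assumption for illustration; RISK may estimate the expected frequency differently. Both function names are hypothetical.

```python
from math import exp, factorial


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k - 1)."""
    cdf = sum(exp(-lam) * lam**i / factorial(i) for i in range(k))
    return 1.0 - cdf


def poisson_enrichment(k, n, K, N):
    """Upper-tail p-value with expected count lam = n * K / N (an assumption)."""
    return poisson_sf(k, n * K / N)


print(round(poisson_enrichment(k=3, n=5, K=5, N=20), 4))  # 0.1315
```

Computing `1 - cdf` is fine for a sketch, but it loses precision for very small tail probabilities; a library survival function is more accurate there.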

6. Z-score Test

Computes standardized (z-score) overrepresentation scores for each cluster. It is approximate but fast, which makes it useful for quick scanning.

```python
neighborhoods = risk.load_neighborhoods_zscore(
    network=network,
    annotation=annotation,
    distance_metric="louvain",
    louvain_resolution=10.0,
    fraction_shortest_edges=0.275,
    null_distribution="network",
    random_seed=887,
)
```
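A minimal sketch of a z-score for one neighborhood and term: the observed annotated count k is standardized against a binomial mean n·p and variance n·p·(1 − p), with p = K/N, then converted to a one-sided normal tail. The binomial moments are an assumption for illustration (RISK may use, for example, hypergeometric variance), and `enrichment_zscore` is a hypothetical name.

```python
from math import erfc, sqrt


def enrichment_zscore(k, n, K, N):
    """z = (observed - expected) / sd, using binomial mean and variance (an assumption)."""
    p = K / N
    expected = n * p
    sd = sqrt(n * p * (1 - p))
    if sd == 0:
        raise ValueError("z-score undefined: zero variance")
    z = (k - expected) / sd
    p_upper = 0.5 * erfc(z / sqrt(2))  # one-sided normal tail, P(Z >= z)
    return z, p_upper


z, p = enrichment_zscore(k=3, n=5, K=5, N=20)
print(round(z, 3))  # 1.807
```

The normal approximation is what makes this fast, and also why it is only approximate for small neighborhoods or rare terms.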

Output

All test functions return a neighborhoods dictionary containing:

- Cluster IDs
- Term-wise overrepresentation scores
- Optional p-values or z-scores, depending on the method

Use this result to create a NetworkGraph in the next step.

Next Step

Proceed to 5. Load Graph to build a cluster-aware network object.