RISK Tutorial and Examples¶
Welcome to the RISK tutorial notebook. This guide introduces tutorial.ipynb, a comprehensive walkthrough of RISK (Regional Inference of Significant Kinships) — a modular framework for biological network annotation and visualization. RISK integrates community detection algorithms, statistical overrepresentation analysis, and publication-quality plotting to support large-scale, reproducible network analysis.
Whether you're new to network analysis or aiming to explore advanced workflows, this tutorial covers key functionalities, practical examples, and reproducible pipelines.
Yeast Protein–Protein Interaction (PPI) Network Demonstration¶
In this example, we apply RISK to the yeast PPI network from Michaelis et al., 2023, comprising 3,839 proteins and 30,955 interactions. This network serves as a benchmark for demonstrating how RISK identifies compact functional modules and performs annotation overrepresentation.
In this tutorial, we:
- Cluster the network using scalable community detection algorithms (e.g., Louvain, Leiden).
- Evaluate functional overrepresentation using statistical tests such as the hypergeometric and permutation tests.
- Visualize modules with contour overlays and publication-ready network layouts.
These steps illustrate RISK’s ability to resolve functional modules and reveal the organization of cellular processes such as ribosomal assembly, mitochondrial organization, and RNA polymerase activity.
Tutorial Sections¶
0. Installing RISK¶
To get started with RISK, you'll need to install risk-network using pip. Run the following command in a code cell or terminal to install the latest version of the package:
# !pip install risk-network --upgrade
1. Importing RISK¶
After installing RISK, the next step is to import it into your notebook. You can verify that the installation was successful by checking the version of the package.
import risk as r
# Check the version of the RISK package to ensure it's installed correctly
print(f"RISK version: {r.__version__}")
RISK version: 0.1.0
After verifying that the RISK package is installed and properly loaded, the next step is to import the RISK class. This class provides the core functionalities for performing biological network analysis.
from risk import RISK
Notebook specific: Detect and change to the notebook’s directory to enable relative paths
import os
from pathlib import Path
if "__file__" not in globals():
    os.chdir(Path().resolve())
Notebook specific: Use the %matplotlib inline magic command to display plots within the notebook
%matplotlib inline
2. Initializing RISK¶
This code block initializes a RISK object with specific parameters. Below is a description of each parameter:
Parameters¶
- verbose (bool): Controls whether log messages are printed. If True, log messages are printed to the console. Defaults to True.
# Initialize the RISK object
risk = RISK(verbose=True)
3. Loading Networks into RISK¶
Networks form the foundation of any RISK analysis, representing nodes (e.g., genes, proteins, papers) and the edges, or relationships, connecting them.
RISK provides flexible loading functions compatible with multiple formats—including Cytoscape session files, Cytoscape JSON exports, GPickle files, and in-memory NetworkX graphs. These loaders standardize input into NetworkX graph objects ready for downstream analysis, with options for preprocessing such as spherical projection, surface depth adjustment, and node filtering.
Supported Network Formats¶
| Format | Method | Description |
|---|---|---|
| Cytoscape (.cys) | load_network_cytoscape | Import from Cytoscape session files; supports source/target labels and view selection. |
| Cytoscape JSON (.cyjs) | load_network_cyjs | Import from Cytoscape JSON exports; specify source/target labels for nodes/edges. |
| GPickle (.gpickle) | load_network_gpickle | Reload networks serialized with GPickle; preserves complex structures. |
| NetworkX Graph | load_network_networkx | Convert existing NetworkX objects into RISK-compatible format. |
Cytoscape Files (.cys)¶
Load Cytoscape session files exported from the desktop app.
Parameters¶
- filepath (str): Path to the Cytoscape file.
- source_label (str, optional): Source node label. Defaults to "source".
- target_label (str, optional): Target node label. Defaults to "target".
- view_name (str, optional): Specific view name to load. Defaults to "".
- compute_sphere (bool, optional): Whether to map nodes from a 2D plane onto a 3D spherical surface using a Mercator projection. Defaults to True. This enables visualization of nodes on a sphere for improved spatial representation, particularly in applications where spatial relationships are crucial (e.g., networks with global-scale data or modular clustering).
- surface_depth (float, optional): Adjusts the depth of nodes relative to the spherical surface, enhancing visualization of clustering. Defaults to 0.0. Positive values pull clustered nodes closer to the center of the sphere, creating a clear visual distinction for denser regions. Negative values push nodes outward, emphasizing peripheral or sparse clusters while retaining their relative positions. A value of 0.0 keeps all nodes on the sphere's surface.
- min_edges_per_node (int, optional): Minimum number of edges per node. Defaults to 0.
Returns¶
nx.Graph: The loaded and processed Cytoscape network as a NetworkX graph.
# Load the network from a Cytoscape file for RISK analysis
network = risk.load_network_cytoscape(
filepath="./data/cytoscape/michaelis_2023.cys",
source_label="source",
target_label="target",
view_name="",
compute_sphere=True,
surface_depth=0.1,
min_edges_per_node=1,
)
---------------
Loading network
---------------
Filetype: Cytoscape
Filepath: ./data/cytoscape/michaelis_2023.cys
Minimum edges per node: 1
Projection: Sphere
Surface depth: 0.1
Initial node count: 3839
Final node count: 3839
Initial edge count: 30955
Final edge count: 30955
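To make the compute_sphere and surface_depth options more concrete, the following is a minimal sketch of mapping a 2D layout coordinate onto a unit sphere and then shifting it radially. It is illustrative only: the helper names project_to_sphere and apply_depth, the coordinate normalization, and the inverse-Mercator latitude are assumptions, not RISK's internal implementation. In particular, RISK adjusts depth according to how clustered a node is, whereas this sketch applies one depth value to a single point.

```python
import math


def project_to_sphere(x, y, x_range, y_range):
    """Map a 2D point onto the unit sphere (hypothetical helper, not RISK's code)."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    # Scale x to a longitude in [-pi, pi]
    lon = ((x - x_min) / (x_max - x_min) - 0.5) * 2.0 * math.pi
    # Scale y to a Mercator ordinate in [-pi, pi], then invert it to a latitude
    merc_y = ((y - y_min) / (y_max - y_min) - 0.5) * 2.0 * math.pi
    lat = 2.0 * math.atan(math.exp(merc_y)) - math.pi / 2.0
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def apply_depth(point, depth):
    """Scale a point radially: depth > 0 moves it toward the center, depth < 0 outward."""
    scale = 1.0 - depth
    return tuple(coord * scale for coord in point)


# The center of a 10 x 10 layout lands on the equator at longitude 0
surface_point = project_to_sphere(5.0, 5.0, x_range=(0.0, 10.0), y_range=(0.0, 10.0))
inner_point = apply_depth(surface_point, depth=0.1)
radius = math.sqrt(sum(coord * coord for coord in inner_point))
print(surface_point, inner_point, radius)
```

With depth set to 0.0 the point stays on the surface, which matches the documented behavior of surface_depth=0.0.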
Cytoscape JSON Files (.cyjs)¶
Load JSON exports from Cytoscape Web or JavaScript pipelines.
Parameters¶
- filepath (str): Path to the Cytoscape JSON file.
- source_label (str, optional): Source node label. Defaults to "source".
- target_label (str, optional): Target node label. Defaults to "target".
- compute_sphere (bool, optional): Whether to map nodes from a 2D plane onto a 3D spherical surface using a Mercator projection. Defaults to True. This enables visualization of nodes on a sphere for improved spatial representation, particularly in applications where spatial relationships are crucial (e.g., networks with global-scale data or modular clustering).
- surface_depth (float, optional): Adjusts the depth of nodes relative to the spherical surface, enhancing visualization of clustering. Defaults to 0.0. Positive values pull clustered nodes closer to the center of the sphere, creating a clear visual distinction for denser regions. Negative values push nodes outward, emphasizing peripheral or sparse clusters while retaining their relative positions. A value of 0.0 keeps all nodes on the sphere's surface.
- min_edges_per_node (int, optional): Minimum number of edges per node. Defaults to 0.
Returns¶
nx.Graph: The loaded and processed Cytoscape JSON network as a NetworkX graph.
# Load the network from a Cytoscape JSON file for RISK analysis
network = risk.load_network_cyjs(
filepath="./data/cyjs/michaelis_2023.cyjs",
source_label="source",
target_label="target",
compute_sphere=True,
surface_depth=0.1,
min_edges_per_node=1,
)
---------------
Loading network
---------------
Filetype: Cytoscape JSON
Filepath: ./data/cyjs/michaelis_2023.cyjs
Minimum edges per node: 1
Projection: Sphere
Surface depth: 0.1
EDGE LENGTH WARNING — 30955 out of 30955 edges (100.00%) had invalid or non-positive lengths and were replaced with a minimal fallback value (1e-12).
Initial node count: 3839
Final node count: 3839
Initial edge count: 30955
Final edge count: 30955
GPickle Files (.gpickle)¶
Fast, Python-native serialization of NetworkX graphs preserving all attributes. Recommended for reproducibility and performance.
Parameters¶
- filepath (str): Path to the GPickle file.
- compute_sphere (bool, optional): Whether to map nodes from a 2D plane onto a 3D spherical surface using a Mercator projection. Defaults to True. This enables visualization of nodes on a sphere for improved spatial representation, particularly in applications where spatial relationships are crucial (e.g., networks with global-scale data or modular clustering).
- surface_depth (float, optional): Adjusts the depth of nodes relative to the spherical surface, enhancing visualization of clustering. Defaults to 0.0. Positive values pull clustered nodes closer to the center of the sphere, creating a clear visual distinction for denser regions. Negative values push nodes outward, emphasizing peripheral or sparse clusters while retaining their relative positions. A value of 0.0 keeps all nodes on the sphere's surface.
- min_edges_per_node (int, optional): Minimum number of edges per node. Defaults to 0.
Returns¶
nx.Graph: The loaded and processed GPickle network as a NetworkX graph.
# Load the network from a GPickle file for RISK analysis
network = risk.load_network_gpickle(
filepath="./data/gpickle/michaelis_2023.gpickle",
compute_sphere=True,
surface_depth=0.1,
min_edges_per_node=1,
)
---------------
Loading network
---------------
Filetype: GPickle
Filepath: ./data/gpickle/michaelis_2023.gpickle
Minimum edges per node: 1
Projection: Sphere
Surface depth: 0.1
Initial node count: 3839
Final node count: 3839
Initial edge count: 30955
Final edge count: 30955
NetworkX Graphs¶
Load from an in-memory networkx.Graph. Integrates seamlessly when the graph is already loaded.
Node requirements
- Node position must be provided as either x and y attributes or a single pos = (x, y) tuple.
- Node labels used for annotation and enrichment are taken from the label attribute if present; otherwise, the node identifier is used.
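The node requirements above can be checked before loading. The sketch below applies the same two rules to plain attribute dictionaries of the kind a graph's node view returns. The helper name node_position_and_label and the precedence of x/y over pos are assumptions for illustration; RISK's own validation may differ.

```python
def node_position_and_label(node_id, attrs):
    """Return ((x, y), label) for one node, following the rules listed above."""
    if "x" in attrs and "y" in attrs:
        position = (float(attrs["x"]), float(attrs["y"]))
    elif "pos" in attrs:
        x, y = attrs["pos"]
        position = (float(x), float(y))
    else:
        raise ValueError(f"Node {node_id!r} has no 'x'/'y' or 'pos' attribute")
    # Fall back to the node identifier when no 'label' attribute is present
    label = attrs.get("label", node_id)
    return position, label


# Toy node attributes: one node uses x/y plus a label, the other only pos
nodes = {
    "YAL001C": {"x": 0.2, "y": 1.5, "label": "TFC3"},
    "YAL002W": {"pos": (3.0, 4.0)},
}
resolved = {
    node_id: node_position_and_label(node_id, attrs)
    for node_id, attrs in nodes.items()
}
print(resolved)
```

Running a check like this over every node surfaces missing coordinates early, before the loader is called.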
Parameters¶
- network (nx.Graph): A NetworkX graph object.
- compute_sphere (bool, optional): Whether to map nodes from a 2D plane onto a 3D spherical surface using a Mercator projection. Defaults to True. This enables visualization of nodes on a sphere for improved spatial representation, particularly in applications where spatial relationships are crucial (e.g., networks with global-scale data or modular clustering).
- surface_depth (float, optional): Adjusts the depth of nodes relative to the spherical surface, enhancing visualization of clustering. Defaults to 0.0. Positive values pull clustered nodes closer to the center of the sphere, creating a clear visual distinction for denser regions. Negative values push nodes outward, emphasizing peripheral or sparse clusters while retaining their relative positions. A value of 0.0 keeps all nodes on the sphere's surface.
- min_edges_per_node (int, optional): Minimum number of edges per node. Defaults to 0.
Returns¶
nx.Graph: The loaded and processed NetworkX graph.
# Load the network from a NetworkX graph for RISK analysis
network = risk.load_network_networkx(
network=network,
compute_sphere=True,
surface_depth=0.1,
min_edges_per_node=1,
)
---------------
Loading network
---------------
Filetype: NetworkX
Minimum edges per node: 1
Projection: Sphere
Surface depth: 0.1
Initial node count: 3839
Final node count: 3839
Initial edge count: 30955
Final edge count: 30955
4. Loading Annotations into RISK¶
Annotations provide the biological context for RISK by linking network clusters to functional terms (e.g., cellular components, metabolic pathways, protein complexes). They enable downstream statistical tests to evaluate whether specific functions are overrepresented within detected network clusters.
Annotations must be supplied by the user in a format the RISK loaders can read. RISK does not bundle annotations; obtain curated resources such as Gene Ontology (GO) (Ashburner et al., 2000), CORUM (Tsitsiridis et al., 2022), or KEGG (Kanehisa et al., 2023) and format them as term-to-node (e.g., gene or protein) membership tables. These resources publish standardized annotation files that can be adapted for this purpose.
For example, a simple JSON/dictionary representation could be:
{
"proline biosynthetic process": ["CAR2", "PRO1", "PRO2", "PRO3", "YHR033W"],
"glyoxylate metabolic process": ["DAL7", "GOR1", "ICL1", "IDP1", "IDP2", "IDP3", "LEU2", "MDH3", "MLS1"]
}
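Many annotation resources ship one (gene, term) pair per row rather than one term per entry. The following sketch, using made-up pairs, groups such rows into the term-to-node dictionary shown above and serializes it as JSON. The variable names and the sample rows are illustrative assumptions.

```python
import json
from collections import defaultdict

# Hypothetical (gene, term) pairs, e.g. parsed from an association file
pairs = [
    ("PRO1", "proline biosynthetic process"),
    ("PRO2", "proline biosynthetic process"),
    ("ICL1", "glyoxylate metabolic process"),
    ("PRO1", "proline biosynthetic process"),  # duplicate row, collapsed by the set
]

# Group genes by term; a set removes repeated memberships
term_to_nodes = defaultdict(set)
for gene, term in pairs:
    term_to_nodes[term].add(gene)

# Sort terms and members so the output is stable and diff-friendly
annotation_dict = {term: sorted(genes) for term, genes in sorted(term_to_nodes.items())}
annotation_json = json.dumps(annotation_dict, indent=2)
print(annotation_json)
```

The resulting string can be written to a .json file, or the dictionary can be used directly in memory.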
Supported Annotation Formats¶
| Format | Method | Description |
|---|---|---|
| JSON | load_annotation_json | Import annotations from a JSON term-to-gene mapping file. |
| CSV | load_annotation_csv | Import from a CSV file (supports custom delimiters). |
| TSV | load_annotation_tsv | Import from a tab-separated file. |
| Excel | load_annotation_excel | Import annotations from a specified sheet in an XLSX file. |
| Dictionary | load_annotation_dict | Load annotations already in memory as a Python dictionary. |
JSON Files (.json)¶
Load annotation data from a JSON file and associate terms with network nodes.
Parameters¶
- network (nx.Graph): The NetworkX graph to which the annotation is related.
- filepath (str): Path to the JSON annotation file.
- min_nodes_per_term (int, optional): The minimum number of network nodes required for each annotation term to be included. Defaults to 1.
- max_nodes_per_term (int, optional): The maximum number of network nodes allowed for each annotation term to be included. Defaults to 10_000.
Returns¶
dict: A dictionary containing ordered nodes, ordered annotations, and the annotation matrix.
# Load GO Biological Process (BP) annotations from a JSON file and associate them with the existing network
annotation = risk.load_annotation_json(
network=network,
filepath="./data/json/annotation/go_biological_process.json",
min_nodes_per_term=5,
max_nodes_per_term=75,
)
# Note: You can also load other GO annotations, such as:
# - 'go_cellular_component.json' for GO Cellular Component (CC) annotations
# - 'go_molecular_function.json' for GO Molecular Function (MF) annotations
------------------
Loading annotation
------------------
Filetype: JSON
Filepath: ./data/json/annotation/go_biological_process.json
Minimum number of nodes per annotation term: 5
Maximum number of nodes per annotation term: 75
Number of input annotation terms: 2214
Number of remaining annotation terms: 1293
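The drop from input terms to remaining terms in the log comes from the min_nodes_per_term and max_nodes_per_term bounds. A minimal sketch of that kind of filter is shown below on toy data. The helper name filter_terms is hypothetical, and treating both bounds as inclusive and counting only members present in the network are assumptions rather than confirmed RISK behavior.

```python
def filter_terms(annotation_dict, network_nodes, min_nodes_per_term=1, max_nodes_per_term=10_000):
    """Keep terms whose in-network membership count lies within the given bounds."""
    network_nodes = set(network_nodes)
    kept = {}
    for term, members in annotation_dict.items():
        # Only members that are actually nodes in the network are counted
        in_network = sorted(set(members) & network_nodes)
        if min_nodes_per_term <= len(in_network) <= max_nodes_per_term:
            kept[term] = in_network
    return kept


toy_annotation = {
    "term A": ["g1", "g2", "g3"],
    "term B": ["g1"],              # too small once the minimum is 2
    "term C": ["g1", "g2", "g9"],  # g9 is not in the network
}
toy_nodes = ["g1", "g2", "g3", "g4"]
kept_terms = filter_terms(toy_annotation, toy_nodes, min_nodes_per_term=2, max_nodes_per_term=3)
print(kept_terms)
```

Tight bounds such as 5 to 75 remove very small terms, which carry little statistical power, and very broad terms, which tend to be uninformative.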
CSV Files (.csv)¶
Load annotation data from a CSV file, specifying columns and delimiter for nodes.
Parameters¶
- network (nx.Graph): The NetworkX graph to which the annotation is related.
- filepath (str): Path to the CSV annotation file.
- label_colname (str): Name of the column containing the labels (e.g., GO terms).
- nodes_colname (str): Name of the column containing the nodes associated with each label.
- nodes_delimiter (str, optional): Delimiter used to separate multiple nodes within the nodes column. Defaults to ';'.
- min_nodes_per_term (int, optional): The minimum number of network nodes required for each annotation term to be included. Defaults to 1.
- max_nodes_per_term (int, optional): The maximum number of network nodes allowed for each annotation term to be included. Defaults to 10_000.
Returns¶
dict: A dictionary containing ordered nodes, ordered annotations, and the annotation matrix.
# Load GO Biological Process (BP) annotations from a CSV file and associate them with the existing network
annotation = risk.load_annotation_csv(
network=network,
filepath="./data/csv/annotation/go_biological_process.csv",
label_colname="label",
nodes_colname="nodes",
nodes_delimiter=";",
min_nodes_per_term=5,
max_nodes_per_term=75,
)
# Note: You can also load other GO annotations using similar filenames, such as:
# - 'go_cellular_component.csv' for GO Cellular Component (CC) annotations
# - 'go_molecular_function.csv' for GO Molecular Function (MF) annotations
------------------
Loading annotation
------------------
Filetype: CSV
Filepath: ./data/csv/annotation/go_biological_process.csv
Minimum number of nodes per annotation term: 5
Maximum number of nodes per annotation term: 75
Number of input annotation terms: 2214
Number of remaining annotation terms: 1293
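For readers preparing their own files, the sketch below shows the table layout that the label_colname, nodes_colname, and nodes_delimiter parameters describe: one row per term, with the member nodes packed into a single delimited cell. The helper read_annotation_table and the sample rows are assumptions for illustration and use only the standard library, not RISK's loader.

```python
import csv
import io

# A two-column table: one row per term, members separated by ';'
csv_text = (
    "label,nodes\n"
    "proline biosynthetic process,CAR2;PRO1;PRO2\n"
    "glyoxylate metabolic process,DAL7;GOR1\n"
)


def read_annotation_table(handle, label_colname, nodes_colname, nodes_delimiter=";"):
    """Parse a delimited annotation table into a term-to-nodes dictionary."""
    reader = csv.DictReader(handle)
    return {
        row[label_colname]: [
            node.strip()
            for node in row[nodes_colname].split(nodes_delimiter)
            if node.strip()
        ]
        for row in reader
    }


parsed = read_annotation_table(io.StringIO(csv_text), "label", "nodes")
print(parsed)
```

For a tab-separated file, the same idea applies with csv.DictReader(handle, delimiter="\t").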
TSV Files (.tsv)¶
Load annotation data from a tab-delimited file, similar to CSV but with tabs as separators.
Parameters¶
- network (nx.Graph): The NetworkX graph to which the annotation is related.
- filepath (str): Path to the TSV annotation file.
- label_colname (str): Name of the column containing the labels (e.g., GO terms).
- nodes_colname (str): Name of the column containing the nodes associated with each label.
- nodes_delimiter (str, optional): Delimiter used to separate multiple nodes within the nodes column. Defaults to ';'.
- min_nodes_per_term (int, optional): The minimum number of network nodes required for each annotation term to be included. Defaults to 1.
- max_nodes_per_term (int, optional): The maximum number of network nodes allowed for each annotation term to be included. Defaults to 10_000.
Returns¶
dict: A dictionary containing ordered nodes, ordered annotations, and the annotation matrix.
# Load GO Biological Process (BP) annotations from a TSV file and associate them with the existing network
annotation = risk.load_annotation_tsv(
network=network,
filepath="./data/tsv/annotation/go_biological_process.tsv",
label_colname="label",
nodes_colname="nodes",
nodes_delimiter=";",
min_nodes_per_term=5,
max_nodes_per_term=75,
)
# Note: You can also load other GO annotations using similar filenames, such as:
# - 'go_cellular_component.tsv' for GO Cellular Component (CC) annotations
# - 'go_molecular_function.tsv' for GO Molecular Function (MF) annotations
------------------
Loading annotation
------------------
Filetype: TSV
Filepath: ./data/tsv/annotation/go_biological_process.tsv
Minimum number of nodes per annotation term: 5
Maximum number of nodes per annotation term: 75
Number of input annotation terms: 2214
Number of remaining annotation terms: 1293
Excel Files (.xlsx, .xls)¶
Load annotation data from an Excel workbook, with support for selecting specific sheets.
Parameters¶
- network (nx.Graph): The NetworkX graph to which the annotation is related.
- filepath (str): Path to the Excel annotation file.
- label_colname (str): Name of the column containing the labels (e.g., GO terms).
- nodes_colname (str): Name of the column containing the nodes associated with each label.
- sheet_name (str, optional): The name of the Excel sheet to load. Defaults to 'Sheet1'.
- nodes_delimiter (str, optional): Delimiter used to separate multiple nodes within the nodes column. Defaults to ';'.
- min_nodes_per_term (int, optional): The minimum number of network nodes required for each annotation term to be included. Defaults to 1.
- max_nodes_per_term (int, optional): The maximum number of network nodes allowed for each annotation term to be included. Defaults to 10_000.
Returns¶
dict: A dictionary containing ordered nodes, ordered annotations, and the annotation matrix.
# Load GO Biological Process (BP) annotations from an Excel file and associate them with the existing network
annotation = risk.load_annotation_excel(
network=network,
filepath="./data/excel/annotation/go_biological_process.xlsx",
label_colname="label",
nodes_colname="nodes",
sheet_name="Sheet1",
nodes_delimiter=";",
min_nodes_per_term=5,
max_nodes_per_term=75,
)
# Note: You can also load other GO annotations using similar filenames, such as:
# - 'go_cellular_component.xlsx' for GO Cellular Component (CC) annotations
# - 'go_molecular_function.xlsx' for GO Molecular Function (MF) annotations
------------------
Loading annotation
------------------
Filetype: Excel
Filepath: ./data/excel/annotation/go_biological_process.xlsx
Minimum number of nodes per annotation term: 5
Maximum number of nodes per annotation term: 75
Number of input annotation terms: 2214
Number of remaining annotation terms: 1293
Dictionary Annotation¶
Load annotation data already in memory as a Python dictionary.
Parameters¶
- network (nx.Graph): The NetworkX graph to which the annotation is related.
- content (dict): The annotation dictionary to load.
- min_nodes_per_term (int, optional): The minimum number of network nodes required for each annotation term to be included. Defaults to 1.
- max_nodes_per_term (int, optional): The maximum number of network nodes allowed for each annotation term to be included. Defaults to 10_000.
Returns¶
dict: A dictionary containing ordered nodes, ordered annotations, and the annotation matrix.
# Load the JSON file into a dictionary, then use the dictionary to load annotations
import json
json_file_path = "./data/json/annotation/go_biological_process.json"
with open(json_file_path, "r") as file:
    annotation_dict = json.load(file)
# Use the loaded dictionary with the load_annotation_dict method
annotation = risk.load_annotation_dict(
network=network,
content=annotation_dict,
min_nodes_per_term=5,
max_nodes_per_term=75,
)
------------------
Loading annotation
------------------
Filetype: Dictionary
Filepath: In-memory dictionary
Minimum number of nodes per annotation term: 5
Maximum number of nodes per annotation term: 75
Number of input annotation terms: 2214
Number of remaining annotation terms: 1293
5. Clustering Algorithms¶
Community detection defines the modules that RISK analyzes downstream. Different algorithms emphasize scale, cohesiveness, or stability, so it is often worth trying more than one approach on the same network. Each clustering method accepts a fraction_shortest_edges argument: the fraction, in (0, 1], of edges retained before community detection, chosen by rank from shortest to longest, mirroring the implementation in risk.cluster.
| Algorithm | Speed | Primary use | When/Why (assumptions & notes) |
|---|---|---|---|
| Louvain | Fast | Default, scalable to very large networks | Greedy modularity optimization (Blondel et al., 2008); efficient for >10⁴ nodes; may produce disconnected subclusters. |
| Leiden | Fast | Improved Louvain with better resolution | Guarantees well-connected communities; more stable than Louvain (Traag et al., 2019); slightly higher runtime. |
| Markov Clustering | Medium | Detect smaller, compact complexes | Flow-based algorithm (Van Dongen, 2008); good for protein complexes or tightly connected submodules. |
| Walktrap | Medium | Hierarchical detection in mid-sized graphs | Random-walk based (Pons & Latapy, 2005); effective for local structure; slower on >10⁴ nodes. |
| Greedy Modularity | Fast | Coarse partitioning | Optimizes modularity via agglomeration; very fast but suffers from resolution limit (Newman, 2004). |
| Label Propagation | Fast | Quick heuristic | Unsupervised label spreading; no objective function; non-deterministic and unstable (Raghavan et al., 2007). |
| Spinglass | Slow | Small networks; theoretical interest | Statistical mechanics approach (Reichardt & Bornholdt, 2006); finds communities by simulating spin states; computationally intensive. |
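The rank-based meaning of fraction_shortest_edges can be sketched in a few lines. The example below sorts edges by length and keeps the shortest fraction. The helper keep_shortest_edges, the rounding up of the cut-off, and the handling of ties are assumptions made for illustration; risk.cluster may resolve these details differently.

```python
import math


def keep_shortest_edges(edges, fraction_shortest_edges):
    """Keep the shortest fraction of (u, v, length) edges, ranked by length."""
    if not 0.0 < fraction_shortest_edges <= 1.0:
        raise ValueError("fraction_shortest_edges must be in (0, 1]")
    ranked = sorted(edges, key=lambda edge: edge[2])
    # Round up so that a non-empty edge list always keeps at least one edge
    n_keep = max(1, math.ceil(fraction_shortest_edges * len(ranked)))
    return ranked[:n_keep]


toy_edges = [("A", "B", 0.4), ("B", "C", 0.1), ("C", "D", 0.9), ("A", "D", 0.2)]
kept_quarter = keep_shortest_edges(toy_edges, 0.25)
kept_half = keep_shortest_edges(toy_edges, 0.5)
print(kept_quarter)
print(kept_half)
```

Because the threshold is a rank rather than an absolute length, the same value behaves comparably across layouts with different coordinate scales.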
Louvain Clustering¶
Greedy modularity optimization that scales well to large graphs and serves as the default starting point.
# Louvain clustering example
clusters_louvain = risk.cluster_louvain(
network=network,
fraction_shortest_edges=0.25,
resolution=5.0,
random_seed=887,
)
print(f"Louvain clusters: shape={clusters_louvain.shape}, nnz={clusters_louvain.getnnz()}")
------------------
Computing clusters
------------------
Clustering: 'louvain'
Edge length threshold: 0.25
Resolution: 5.0
Random seed: 887
Louvain clusters: shape=(3839, 3839), nnz=38212
Leiden Clustering¶
Connectivity-refined variant of Louvain that yields better separated clusters while remaining fast.
# Leiden clustering example
clusters_leiden = risk.cluster_leiden(
network=network,
fraction_shortest_edges=0.25,
resolution=1.0,
random_seed=887,
)
print(f"Leiden clusters: shape={clusters_leiden.shape}, nnz={clusters_leiden.getnnz()}")
------------------
Computing clusters
------------------
Clustering: 'leiden'
Edge length threshold: 0.25
Resolution: 1.0
Random seed: 887
Leiden clusters: shape=(3839, 3839), nnz=54372
Greedy Modularity Clustering¶
Agglomerative optimization that quickly produces coarse partitions, useful for exploratory analysis.
# Greedy modularity clustering example
clusters_greedy = risk.cluster_greedy(
network=network,
fraction_shortest_edges=0.25,
)
print(f"Greedy clusters: shape={clusters_greedy.shape}, nnz={clusters_greedy.getnnz()}")
------------------
Computing clusters
------------------
Clustering: 'greedy'
Edge length threshold: 0.25
Greedy clusters: shape=(3839, 3839), nnz=68880
Label Propagation Clustering¶
Heuristic label spreading approach that offers a non-parametric, fast partitioning of the network.
# Label propagation clustering example
clusters_labelprop = risk.cluster_labelprop(
network=network,
fraction_shortest_edges=0.25,
)
print(
f"Label propagation clusters: shape={clusters_labelprop.shape}, nnz={clusters_labelprop.getnnz()}"
)
------------------
Computing clusters
------------------
Clustering: 'labelprop'
Edge length threshold: 0.25
Label propagation clusters: shape=(3839, 3839), nnz=42434
Markov Clustering (MCL)¶
Flow-based clustering that captures compact complexes by simulating random walks with expansion and inflation steps.
# Markov clustering example
clusters_markov = risk.cluster_markov(
network=network,
fraction_shortest_edges=0.25,
)
print(f"Markov clusters: shape={clusters_markov.shape}, nnz={clusters_markov.getnnz()}")
------------------
Computing clusters
------------------
Clustering: 'markov'
Edge length threshold: 0.25
Markov clusters: shape=(3839, 3839), nnz=31798
Walktrap Clustering¶
Hierarchical clustering that groups vertices visited together during short random walks.
# Walktrap clustering example
clusters_walktrap = risk.cluster_walktrap(
network=network,
fraction_shortest_edges=0.25,
)
print(f"Walktrap clusters: shape={clusters_walktrap.shape}, nnz={clusters_walktrap.getnnz()}")
------------------
Computing clusters
------------------
Clustering: 'walktrap'
Edge length threshold: 0.25
Walktrap clusters: shape=(3839, 3839), nnz=42084
Spinglass Clustering¶
Simulated annealing approach inspired by statistical mechanics—slower but useful for smaller graphs.
# Spinglass clustering example
clusters_spinglass = risk.cluster_spinglass(
network=network,
fraction_shortest_edges=0.25,
)
print(f"Spinglass clusters: shape={clusters_spinglass.shape}, nnz={clusters_spinglass.getnnz()}")
------------------
Computing clusters
------------------
Clustering: 'spinglass'
Edge length threshold: 0.25
Spinglass clusters: shape=(3839, 3839), nnz=38262
6. Statistical Methods¶
Once clusters are defined, statistical tests quantify the enrichment or depletion of annotation terms within those modules. The table below summarizes the trade-offs between the supported tests. All statistical APIs return a dictionary with "depletion_pvals" and "enrichment_pvals", matching the outputs documented in risk.stats.api.
| Test | Speed | Primary use | When/Why (assumptions & notes) |
|---|---|---|---|
| Permutation | Slow | Most rigorous; non-parametric | Distribution-free empirical null (permute network or labels); preferred when assumptions are unclear; computationally intensive. |
| Hypergeometric | Medium | Standard for GO/pathway overrepresentation | Exact test for finite populations sampled without replacement; widely used for term–to–gene membership tables. |
| Chi-squared | Fast | Approximate contingency-table testing | Suitable for large samples with expected counts ≥ 5 per cell; fast but approximate; avoid with sparse/low counts. |
| Binomial | Fast | Scalable approximation | Fast approximation assuming independent trials/with-replacement; useful for large populations with small samples. |
Permutation Test¶
Builds an empirical null by permuting either the network structure or annotation labels.
Parameters¶
- annotation (dict): The annotation associated with the network.
- clusters (csr_matrix): Cluster assignment matrix produced by a call such as cluster_louvain or cluster_leiden.
- score_metric (str, optional): Metric used to score clusters. Options include:
  - 'sum': Sums the annotation values within each cluster. (default)
  - 'stdev': Computes the standard deviation of annotation values within each cluster.
- null_distribution (str, optional): Defines the type of null distribution to use for comparison. Options include:
  - 'network': Randomly permuted network structure. (default)
  - 'annotation': Randomly permuted annotations.
- num_permutations (int, optional): Number of permutations for significance testing. Defaults to 1000.
- random_seed (int, optional): Seed for random number generation in permutation test. Defaults to 888.
- max_workers (int, optional): Maximum number of workers for parallel computation. Defaults to 1.
Returns¶
dict: A dictionary containing depletion and enrichment p-values for each cluster/annotation pair.
# Compute annotation significance by running the permutation test on Louvain clusters
stats_permutation = risk.run_permutation(
annotation=annotation,
clusters=clusters_louvain,
score_metric="stdev",
null_distribution="network",
num_permutations=1_000,
random_seed=887,
max_workers=4,
)
------------------------
Running permutation test
------------------------
Cluster scoring metric: 'stdev'
Number of permutations: 1000
Maximum workers: 4
Null distribution: 'network'
Total progress: 100%|██████████| 1000/1000 [00:09<00:00, 109.72it/s]
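The idea of an empirical null can be illustrated with a small label-permutation sketch for a single cluster and a single term. It corresponds loosely to the 'annotation' null with a 'sum' score, and it is not RISK's implementation: the function name permutation_pvals, the singular result keys, and the add-one smoothing of the p-values are all assumptions.

```python
import random


def permutation_pvals(cluster, annotated, all_nodes, num_permutations=1000, random_seed=888):
    """Empirical enrichment/depletion p-values for one cluster and one term."""
    rng = random.Random(random_seed)
    all_nodes = list(all_nodes)
    cluster = set(cluster)
    annotated = set(annotated)
    observed = len(cluster & annotated)

    at_least = 0
    at_most = 0
    for _ in range(num_permutations):
        # Reassign the term to a random node set of the same size
        shuffled = rng.sample(all_nodes, len(annotated))
        score = len(cluster.intersection(shuffled))
        if score >= observed:
            at_least += 1
        if score <= observed:
            at_most += 1

    # Add-one smoothing keeps the empirical p-values strictly positive
    return {
        "enrichment_pval": (at_least + 1) / (num_permutations + 1),
        "depletion_pval": (at_most + 1) / (num_permutations + 1),
    }


# Toy setup: 100 nodes, a 10-node cluster, and a term covering 8 cluster nodes
toy_result = permutation_pvals(
    cluster=range(10), annotated=range(8), all_nodes=range(100), random_seed=887
)
print(toy_result)
```

Fixing random_seed makes the permutations, and therefore the p-values, reproducible across runs.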
Hypergeometric Test¶
Exact test based on finite sampling without replacement.
Parameters¶
- annotation (dict): The annotation associated with the network.
- clusters (csr_matrix): Cluster assignment matrix produced by a community detection method.
- null_distribution (str, optional): Defines the type of null distribution to use for comparison. Options include:
  - 'network': Randomly permuted network structure. (default)
  - 'annotation': Randomly permuted annotations.
Returns¶
dict: A dictionary containing depletion and enrichment p-values for each cluster/annotation pair.
# Compute annotation significance with the hypergeometric test on Louvain clusters
stats_hypergeom = risk.run_hypergeom(
annotation=annotation,
clusters=clusters_louvain,
null_distribution="network",
)
---------------------------
Running hypergeometric test
---------------------------
Null distribution: 'network'
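As a worked illustration of sampling without replacement, the sketch below computes upper-tail (enrichment) and lower-tail (depletion) hypergeometric p-values for one cluster and one term using only the standard library. The function name hypergeom_pvals and the toy counts are assumptions; how RISK defines the background population is not shown here.

```python
from math import comb


def hypergeom_pvals(population, annotated, cluster_size, overlap):
    """Return (enrichment_p, depletion_p) for observing `overlap` annotated nodes."""

    def pmf(k):
        # Probability of exactly k annotated nodes in a cluster drawn without replacement
        return (
            comb(annotated, k)
            * comb(population - annotated, cluster_size - k)
            / comb(population, cluster_size)
        )

    k_min = max(0, cluster_size - (population - annotated))
    k_max = min(annotated, cluster_size)
    enrichment = sum(pmf(k) for k in range(overlap, k_max + 1))  # P(X >= overlap)
    depletion = sum(pmf(k) for k in range(k_min, overlap + 1))   # P(X <= overlap)
    return enrichment, depletion


# Toy counts: 10 nodes, 4 annotated, a cluster of 3 containing 2 annotated nodes
enrichment_p, depletion_p = hypergeom_pvals(population=10, annotated=4, cluster_size=3, overlap=2)
print(enrichment_p, depletion_p)
```

Both tails include the observed overlap, so the two p-values sum to more than one.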
Chi-squared Test¶
Evaluates significance using contingency tables.
Parameters¶
- annotation (dict): The annotation associated with the network.
- clusters (csr_matrix): Cluster assignment matrix produced by a community detection method.
- null_distribution (str, optional): Defines the type of null distribution to use for comparison. Options include:
  - 'network': Randomly permuted network structure. (default)
  - 'annotation': Randomly permuted annotations.
Returns¶
dict: A dictionary containing depletion and enrichment p-values for each cluster/annotation pair.
# Compute annotation significance with the chi-squared test on Louvain clusters
stats_chi2 = risk.run_chi2(
annotation=annotation,
clusters=clusters_louvain,
null_distribution="network",
)
------------------------
Running chi-squared test
------------------------
Null distribution: 'network'
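To show what contingency-table testing involves, here is a minimal sketch of a Pearson chi-squared test on a 2 x 2 table for one cluster and one term. It also reports the smallest expected count, since the approximation is generally considered unreliable when any expected count falls below 5. The function name chi2_2x2 and the toy counts are assumptions, no continuity correction is applied, and the p-value is two-sided; how RISK derives separate enrichment and depletion values is not covered by this sketch.

```python
import math


def chi2_2x2(in_cluster_annotated, in_cluster_total, population_annotated, population_total):
    """Pearson chi-squared test (1 degree of freedom) for cluster vs. term membership."""
    a = in_cluster_annotated                    # in cluster, annotated
    b = in_cluster_total - a                    # in cluster, not annotated
    c = population_annotated - a                # outside cluster, annotated
    d = population_total - in_cluster_total - c # outside cluster, not annotated
    observed = [[a, b], [c, d]]
    row_sums = [a + b, c + d]
    col_sums = [a + c, b + d]
    expected = [
        [row_sums[i] * col_sums[j] / population_total for j in range(2)]
        for i in range(2)
    ]
    statistic = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    min_expected = min(min(row) for row in expected)
    return statistic, p_value, min_expected


# Toy counts: 200 nodes, 50 annotated, a 50-node cluster holding 20 annotated nodes
statistic, p_value, min_expected = chi2_2x2(20, 50, 50, 200)
print(statistic, p_value, min_expected)
```

When min_expected is small, an exact test such as the hypergeometric test is the safer choice.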
Binomial Test¶
Fast approximation to overrepresentation based on independent trials.
Parameters¶
- annotation (dict): The annotation associated with the network.
- clusters (csr_matrix): Cluster assignment matrix produced by a community detection method.
- null_distribution (str, optional): Defines the type of null distribution to use for comparison. Options include:
  - 'network': Randomly permuted network structure. (default)
  - 'annotation': Randomly permuted annotations.
Returns¶
dict: A dictionary containing depletion and enrichment p-values for each cluster/annotation pair.
# Compute annotation significance with the binomial test on Louvain clusters
stats_binom = risk.run_binom(
annotation=annotation,
clusters=clusters_louvain,
null_distribution="network",
)
---------------------
Running binomial test
---------------------
Null distribution: 'network'
7. Building and Analyzing Results¶
The load_graph function builds a NetworkGraph object, which integrates network data, annotations, and overrepresentation results into a unified structure that supports clustering, domain-level significance, and downstream visualization. Its parameters customize significance filtering and the aggregation of clusters with overrepresented annotation terms into domains.
Back to Top¶
Parameters¶
- network (nx.Graph): The network graph containing the nodes and edges to be analyzed.
- annotation (dict): The annotation associated with the network, typically derived from biological or functional data.
- stats_results (dict): Output from a statistical test (e.g., run_permutation) supplying depletion and enrichment p-values.
- tail (str, optional): Specifies the tail of the statistical test to use. Options include:
  - 'right': For enrichment. (default)
  - 'left': For depletion.
  - 'both': For two-tailed analysis.
- pval_cutoff (float, optional): Cutoff value for p-values to determine significance. Defaults to 0.01.
  - Range: Any value between 0 and 1.
- fdr_cutoff (float, optional): Cutoff value for FDR-corrected p-values. Defaults to 0.9999.
  - Range: Any value between 0 and 1.
- impute_depth (int, optional): Depth for imputing missing values. Defaults to 0.
  - Range: Any whole number greater than or equal to 0.
- prune_threshold (float, optional): Threshold for pruning weak edges from the network graph. Defaults to 0.0.
  - Range: Any value between 0 and 1.
- linkage_criterion (str, optional): Criterion for clustering. Defaults to 'distance'. Options:
  - 'distance': Clusters are formed based on distance.
  - 'off': Disable clustering. If selected, individual annotation terms will not be consolidated on the network.
- linkage_method (str, optional): Method used for hierarchical clustering. Defaults to 'average'. Options:
  - 'auto': Automatically determines the optimal method using the silhouette score.
  - Other options: 'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'.
- linkage_metric (str, optional): Distance metric used for clustering. Defaults to 'yule'. Options:
  - 'auto': Automatically determines the optimal metric using the silhouette score.
  - Other options: 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulczynski1', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
- linkage_threshold (str or float, optional): The cutoff distance for forming flat clusters in hierarchical clustering. Accepts either a numeric threshold or 'auto' to enable automatic threshold optimization using the silhouette score. Defaults to 0.2.
  - Range: Any value between 0 and 1.
- min_cluster_size (int, optional): Minimum size of clusters to be formed. Defaults to 5.
- max_cluster_size (int, optional): Maximum size of clusters to be formed. Defaults to 1000.
Returns¶
NetworkGraph: A NetworkGraph object representing the processed network, ready for analysis and visualization.
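The interplay of pval_cutoff and fdr_cutoff can be illustrated with a small Benjamini-Hochberg (BH) sketch. The function name, the example p-values, and the use of `<=` for both comparisons are assumptions made for illustration; the actual filtering happens inside `load_graph`.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotonic
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical enrichment p-values for five cluster/annotation pairs
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(p_values)

# A pair is kept only if it passes both cutoffs (assumed comparison: <=)
pval_cutoff, fdr_cutoff = 0.05, 0.1
significant = [p <= pval_cutoff and q <= fdr_cutoff for p, q in zip(p_values, q_values)]
print(q_values, significant)
```

Setting fdr_cutoff close to 1, as the default of 0.9999 does, effectively leaves the decision to pval_cutoff alone.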
# Build a NetworkGraph using permutation-based significance results
graph = risk.load_graph(
network=network,
annotation=annotation,
stats_results=stats_permutation,
tail="right",
pval_cutoff=0.05,
fdr_cutoff=1.00,
impute_depth=0,
prune_threshold=0.125,
linkage_criterion="distance",
linkage_method="single",
linkage_metric="sokalmichener",
linkage_threshold=0.063,
min_cluster_size=5,
max_cluster_size=1_000,
)
----------------------------
Finding significant clusters
----------------------------
p-value cutoff: 0.05
FDR BH cutoff: 1.0
Significance tail: 'right' (enrichment)
-------------------------------
Processing significant clusters
-------------------------------
Imputation depth: 0
Pruning threshold: 0.125
-----------------------
Finding top annotations
-----------------------
Min cluster size: 5
Max cluster size: 1000
------------------------------
Grouping clusters into domains
------------------------------
Evaluating linkage methods and metrics: 100%|██████████| 1/1 [00:00<00:00]
Linkage criterion: 'distance'
Linkage method: 'single'
Linkage metric: 'sokalmichener'
Linkage threshold: 0.063
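For intuition about the linkage settings used above, the following sketch groups binary profiles with the Sokal-Michener dissimilarity and single linkage cut at a distance threshold. It is a toy illustration under stated assumptions: the profiles, the threshold of 0.25, and both helper functions are hypothetical, and RISK's internal domain grouping may differ in what it clusters and how. With single linkage, cutting at a distance threshold is equivalent to taking connected components of the graph whose edges join items no farther apart than the threshold, which is what the union-find below computes.

```python
def sokal_michener(u, v):
    """Sokal-Michener dissimilarity between two equal-length boolean vectors."""
    mismatches = sum(1 for a, b in zip(u, v) if bool(a) != bool(b))
    matches = len(u) - mismatches
    return 2 * mismatches / (matches + 2 * mismatches)


def single_linkage_groups(profiles, threshold):
    """Assign a group label to each profile using single linkage cut at `threshold`."""
    parent = list(range(len(profiles)))

    def find(i):
        # Find the root of i with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if sokal_michener(profiles[i], profiles[j]) <= threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive integers in order of first appearance
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(profiles))]


# Hypothetical binary profiles (1 = term is significant for that cluster)
profiles = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
labels = single_linkage_groups(profiles, threshold=0.25)
print(labels)
```

A smaller threshold, such as the 0.063 used in the tutorial call, only merges profiles that are nearly identical, which yields more and tighter domains.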
7a. NetworkGraph Methods¶
The NetworkGraph object in RISK provides essential methods for managing network data structures. These methods enable the organization of domains and significance data, making it easier to handle clusters, significance scores, and annotations in the network.
pop: This method removes a specified domain ID and its associated data from all internal mappings within NetworkGraph, and returns the domain's node labels. It effectively "cleans up" that domain from the network while maintaining internal consistency.
# Remove every reference to Domain ID 1 from the NetworkGraph instance and retrieve the associated node labels
# domain_1_labels = graph.pop(1)
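The behavior described for pop can be pictured with a toy stand-in that keeps two of the mappings in sync. The class `ToyDomainMaps`, its data, and the placeholder terms are hypothetical; only the attribute names are borrowed from NetworkGraph, and the real method updates more mappings than shown here.

```python
class ToyDomainMaps:
    """Hypothetical stand-in holding two domain-keyed mappings."""

    def __init__(self, domain_to_labels, domain_to_terms):
        self.domain_id_to_node_labels_map = dict(domain_to_labels)
        self.domain_id_to_domain_terms_map = dict(domain_to_terms)

    def pop(self, domain_id):
        """Remove the domain from every mapping and return its node labels."""
        labels = self.domain_id_to_node_labels_map.pop(domain_id, [])
        self.domain_id_to_domain_terms_map.pop(domain_id, None)
        return labels


toy = ToyDomainMaps(
    domain_to_labels={1: ["LSM1", "LSM2"], 2: ["PAT1"]},
    domain_to_terms={1: "placeholder term A", 2: "placeholder term B"},
)
removed_labels = toy.pop(1)
print(removed_labels, sorted(toy.domain_id_to_node_labels_map))
```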
7b. NetworkGraph Attributes¶
After computing cluster significance for terms, the NetworkGraph object in RISK holds several key attributes that organize the network's nodes, domains, and significance data. These attributes not only support the structure of the network but also enable flexible analysis and visualization, ensuring that significant clusters and annotations are correctly mapped.
- domain_id_to_node_ids_map (dict): Maps each domain (cluster or group of nodes) to the node IDs it contains, helping to identify which nodes belong to each domain in the network.
- domain_id_to_node_labels_map (dict): Connects domain IDs to the node labels within each cluster, useful for grouping nodes by their labels for visualization.
- domain_id_to_enriched_node_labels_map (dict): Maps each domain ID to the node labels that are significantly enriched for that domain. Nodes may appear under multiple domain IDs in this mapping, reflecting functional enrichment rather than primary (layout) domain membership. This attribute supports blended node coloring and interpretation of pleiotropic or multi-domain nodes.
- domain_id_to_domain_terms_map (dict): Links each domain to its significant terms, providing insight into the functional significance of the clusters.
- domain_id_to_domain_info_map (dict): Associates each domain with a detailed description and its significance score, offering a comprehensive view of the domain's attributes.
- node_id_to_node_label_map (dict): A reverse lookup connecting node IDs back to their labels, ensuring clarity during visualization and analysis.
- node_label_to_significance_map (dict): Associates each node label with its significance score, facilitating interpretation of the network’s significant nodes.
- node_label_to_node_id_map (dict): Maps node labels to node IDs, providing easy conversion between labels and internal node identifiers.
- node_significance_sums (numpy.ndarray): Contains significance values for each node in a 1D array, reflecting the strength of significance across the network and highlighting the most significant nodes.

These attributes let multiple statistical tests and visualizations share one network. Whether you run permutation tests, hypergeometric tests, or plot significant subgraphs, the mappings keep every analysis consistent with a single master network and support iterative testing with a clear organization of nodes and domains.
# Fetching key NetworkGraph attributes from the graph object
domain_id_to_node_ids_map = graph.domain_id_to_node_ids_map
domain_id_to_node_labels_map = graph.domain_id_to_node_labels_map
domain_id_to_enriched_node_labels_map = graph.domain_id_to_enriched_node_labels_map
domain_id_to_domain_terms_map = graph.domain_id_to_domain_terms_map
domain_id_to_domain_info_map = graph.domain_id_to_domain_info_map
node_id_to_node_label_map = graph.node_id_to_node_label_map
node_label_to_significance_map = graph.node_label_to_significance_map
node_label_to_node_id_map = graph.node_label_to_node_id_map
node_significance_sums = graph.node_significance_sums
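One practical use of these mappings is finding nodes that are enriched in more than one domain. The sketch below inverts a dictionary shaped like domain_id_to_enriched_node_labels_map; the data is made up for illustration, and on a real graph you would pass the attribute fetched above instead.

```python
from collections import defaultdict

# Hypothetical data with the same shape as graph.domain_id_to_enriched_node_labels_map
domain_id_to_enriched_node_labels_map = {
    1: ["LSM1", "LSM2", "PAT1"],
    2: ["PAT1", "PRP8"],
}

# Invert the mapping: node label -> list of domain IDs it is enriched in
node_label_to_domain_ids = defaultdict(list)
for domain_id, labels in domain_id_to_enriched_node_labels_map.items():
    for label in labels:
        node_label_to_domain_ids[label].append(domain_id)

# Keep only nodes that appear under more than one domain
multi_domain_nodes = {
    label: ids for label, ids in node_label_to_domain_ids.items() if len(ids) > 1
}
print(multi_domain_nodes)
```

These are the nodes for which blended coloring (blend_colors=True in the color helpers) would be relevant.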
7c. NetworkGraph Analysis Summary¶
The summary attribute of the NetworkGraph object processes, stores, and exports analysis results, including enrichment and depletion data. It provides methods to load domain information into a DataFrame and to export the processed data to CSV, JSON, and text files for reporting.
Loading Results¶
The load method loads and processes domain and annotation data into a DataFrame, applying FDR correction to p-values and structuring the data for significance metrics.
Returns¶
pd.DataFrame: A DataFrame containing processed significance scores, p-values, q-values, and annotation membership information.
# Load the analysis summary into a DataFrame
loaded_summary = graph.summary.load()
loaded_summary.head()
------------------------ Loading analysis summary ------------------------
| | Annotation | Domain ID | Matched Members | Matched Count | Enrichment P-value | Enrichment Q-value | Depletion P-value | Depletion Q-value |
|---|---|---|---|---|---|---|---|---|
| 0 | maintenance of protein location in cell | -1 | | 0 | 1.000 | 1.000000 | 1.0 | 1.0 |
| 1 | mRNA splice site recognition | 20 | CDC40;ISY1;LUC7;NAM8;PRP28;PRP39;PRP42;PRP8;PR... | 11 | 0.001 | 0.184714 | 1.0 | 1.0 |
| 2 | transsulfuration | -1 | | 0 | 1.000 | 1.000000 | 1.0 | 1.0 |
| 3 | signal peptide processing | -1 | | 0 | 1.000 | 1.000000 | 1.0 | 1.0 |
| 4 | regulation of phosphatidylinositol dephosphory... | -1 | | 0 | 1.000 | 1.000000 | 1.0 | 1.0 |
# Export analysis summary to a CSV file
graph.summary.to_csv(filepath="./data/csv/summary/michaelis_2023.csv")
------------------------ Loading analysis summary ------------------------ Analysis summary exported to CSV file: ./data/csv/summary/michaelis_2023.csv
# Export analysis summary to a JSON file
graph.summary.to_json(filepath="./data/json/summary/michaelis_2023.json")
------------------------ Loading analysis summary ------------------------ Analysis summary exported to JSON file: ./data/json/summary/michaelis_2023.json
# Export analysis summary to a text file
graph.summary.to_txt(filepath="./data/txt/summary/michaelis_2023.txt")
------------------------ Loading analysis summary ------------------------ Analysis summary exported to text file: ./data/txt/summary/michaelis_2023.txt
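Once exported, the summary can be post-processed with standard tools. The sketch below reads CSV text with the standard library and keeps terms that were assigned to a domain and pass a p-value cutoff. It makes several assumptions: the column names are taken from the DataFrame preview shown earlier (only a subset is included), the exported file is assumed to share that layout, a Domain ID of -1 is assumed to mean "not assigned to any domain", and the truncated member string is copied as displayed rather than completed.

```python
import csv
import io

# Stand-in for the exported file contents (assumed layout, subset of columns)
csv_text = """Annotation,Domain ID,Matched Members,Matched Count,Enrichment P-value,Enrichment Q-value
maintenance of protein location in cell,-1,,0,1.000,1.000000
mRNA splice site recognition,20,CDC40;ISY1;LUC7;NAM8;PRP28;PRP39;PRP42;PRP8;PR...,11,0.001,0.184714
transsulfuration,-1,,0,1.000,1.000000
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Keep terms assigned to a domain whose enrichment p-value passes the cutoff
pval_cutoff = 0.05
assigned = [
    row
    for row in rows
    if int(row["Domain ID"]) != -1 and float(row["Enrichment P-value"]) <= pval_cutoff
]
for row in assigned:
    print(row["Domain ID"], row["Annotation"], row["Enrichment P-value"])
```

To run this against a real export, replace `io.StringIO(csv_text)` with an open file handle for the path passed to `to_csv`.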
8. Visualizing Networks in RISK¶
Visualization highlights relationships within the network, making modular structure and functional associations easier to interpret. RISK provides a flexible and powerful plotting interface via the NetworkPlotter object. This enables generation of publication-ready figures with detailed domain annotations, significance-driven coloring, contours, and rich label customization. The load_plotter function in RISK initializes a NetworkPlotter object for visualizing the network. Parameters are highly customizable, allowing you to adjust various aspects of the plot, such as the figure size and background color.
Parameters¶
- graph (NetworkGraph): The NetworkGraph object containing the network structure to be visualized.
- figsize (tuple, optional): Size of the figure, specified as a tuple (width, height) in inches. Defaults to (10, 10).
- background_color (str, list, tuple, or np.ndarray, optional): Background color of the plot. Provide a single color (e.g., "white", (1.0, 1.0, 1.0) for RGB, or (1.0, 1.0, 1.0, 1.0) for RGBA). Defaults to "white".
- background_alpha (float, None, optional): Transparency level of the background color. If provided, it overrides any existing alpha values found in background_color. Defaults to 1.0.
- pad (float, optional): Padding value to adjust the axis limits around the network plot. Defaults to 0.3.
Returns¶
NetworkPlotter: A NetworkPlotter object configured with the provided parameters, used for further customization and plotting of the network.
# Turn interactive plotting off - this enables the graph to be built across multiple cells
import matplotlib.pyplot as plt
plt.ioff()
<contextlib.ExitStack at 0x3155933e0>
# Initialize the NetworkPlotter with the NetworkGraph object
plotter = risk.load_plotter(
graph=graph,
figsize=(15, 15),
background_color="black",
background_alpha=1.0,
pad=0.3,
)
# Set random seed for reproducibility
random_seed = 887
--------------- Loading plotter ---------------
8a. Plotting the Network Title and Subtitle¶
RISK allows users to add customizable titles and subtitles to network plots. You can adjust various parameters such as font size, font family, color, and position for both the title and subtitle.
Plotting the Title and Subtitle¶
The plot_title function in RISK adds a title and subtitle to your network plot. Both the title and subtitle are optional and can be customized.
Parameters¶
- title (str, optional): Title of the plot. Defaults to None.
- subtitle (str, optional): Subtitle of the plot. Defaults to None.
- title_fontsize (int, optional): Font size for the title. Defaults to 20.
- subtitle_fontsize (int, optional): Font size for the subtitle. Defaults to 14.
- font (str, optional): Font family used for both the title and subtitle. Defaults to "DejaVu Sans".
- title_color (str, list, tuple, or np.ndarray, optional): Color of the title text. Provide a single color (e.g., "black", (0.0, 0.0, 0.0) for RGB, or (0.0, 0.0, 0.0, 1.0) for RGBA). Defaults to "black".
- subtitle_color (str, list, tuple, or np.ndarray, optional): Color of the subtitle text. Provide a single color (e.g., "black", (0.0, 0.0, 0.0) for RGB, or (0.0, 0.0, 0.0, 1.0) for RGBA). Defaults to "black".
- title_x (float, optional): X-axis position of the title. Defaults to 0.5.
- title_y (float, optional): Y-axis position of the title. Defaults to 0.975.
- title_space_offset (float, optional): Fraction of figure height to leave for the space above the plot. Defaults to 0.075.
- subtitle_offset (float, optional): Offset factor to position the subtitle below the title. Defaults to 0.025.
# Plot network title and subtitle
plotter.plot_title(
title="Yeast PPI Network",
subtitle="Michaelis et al., 2023",
title_fontsize=24,
subtitle_fontsize=18,
font="DejaVu Sans",
title_color="white",
subtitle_color="lightblue",
title_x=0.5,
title_y=0.925,
title_space_offset=0.08,
subtitle_offset=0.025,
)
8b. Plotting the Network Perimeter¶
RISK offers two options for plotting the network perimeter: a simple circular outline or a detailed contour based on node density. Both options are customizable in terms of size, style, and transparency.
Plotting the Circle Perimeter¶
The plot_circle_perimeter function in RISK draws a circle around the network to represent its perimeter. You can customize the circle's scale, color, line style, and transparency.
Parameters¶
- scale (float, optional): Scaling factor for the perimeter's diameter. Defaults to 1.0.
- center_offset_x (float, optional): Horizontal offset as a fraction of the diameter. Negative values shift the center left, positive values shift it right. Defaults to 0.0.
- center_offset_y (float, optional): Vertical offset as a fraction of the diameter. Negative values shift the center down, positive values shift it up. Defaults to 0.0.
- linestyle (str, optional): Line style for the circle. Options include "solid", "dashed", "dashdot", "dotted", or any Matplotlib-supported linestyle. Defaults to "dashed".
- linewidth (float, optional): Width of the circle's outline. Defaults to 1.5.
- color (str, list, tuple, or np.ndarray, optional): Color of the circle. Provide a single color (e.g., "black", (0.0, 0.0, 0.0) for RGB, or (0.0, 0.0, 0.0, 1.0) for RGBA). Defaults to "black".
- outline_alpha (float, None, optional): Transparency level for the circle's outline. If provided, it overrides any existing alpha values found in color. Defaults to 1.0.
- fill_alpha (float, None, optional): Transparency level for the circle’s fill. If provided, it overrides any existing alpha values found in color. Defaults to 0.0.
# Plot network perimeter as a circle
plotter.plot_circle_perimeter(
scale=1.02,
center_offset_x=0.0,
center_offset_y=0.0,
linestyle="solid",
linewidth=1.5,
color="white",
outline_alpha=1.0,
fill_alpha=0.0,
)
Plotting the Contour Perimeter¶
The plot_contour_perimeter function in RISK plots a Kernel Density Estimation (KDE)-based contour around the network, representing its perimeter. This method allows for flexible customization of the contour levels, color, and transparency.
Parameters¶
- scale (float, optional): Scaling factor for the perimeter size. Defaults to 1.0.
- levels (int, optional): Number of contour levels. Defaults to 3.
- bandwidth (float, optional): Bandwidth for KDE, controlling the smoothness of the contour. Defaults to 0.8.
- grid_size (int, optional): Resolution of the grid for KDE. Higher values create finer contours. Defaults to 250.
- color (str, list, tuple, or np.ndarray, optional): Color of the contour. Provide a single color (e.g., "black", (0.0, 0.0, 0.0) for RGB, or (0.0, 0.0, 0.0, 1.0) for RGBA). Defaults to "black".
- linestyle (str, optional): Line style for the contour. Options include "solid", "dashed", "dashdot", "dotted", or any Matplotlib-supported linestyle. Defaults to "solid".
- linewidth (float, optional): Width of the contour’s outline. Defaults to 1.5.
- outline_alpha (float, None, optional): Transparency level for the contour's outline. If provided, it overrides any existing alpha values found in color. Defaults to 1.0.
- fill_alpha (float, None, optional): Transparency level for the contour’s fill. If provided, it overrides any existing alpha values found in color. Defaults to 0.0.
# Draw a KDE-based contour around the network perimeter
plotter.plot_contour_perimeter(
scale=1.02,
levels=3,
bandwidth=0.6,
grid_size=250,
color="white",
linestyle="solid",
linewidth=1.5,
outline_alpha=1.0,
fill_alpha=0.0,
)
8c. Plotting Network Nodes and Edges¶
The plot_network function in RISK allows you to visualize the network nodes and edges with various customization options.
Parameters¶
- node_size (int or np.ndarray, optional): Size of the nodes. Can be a single integer or an array of sizes. Defaults to 50.
- node_shape (str, optional): Shape of the nodes. Options include:
  - 'o': Circle. (default)
  - 's': Square.
  - '^': Triangle up.
  - 'v': Triangle down.
  - Other options: 'p', 'P', 'h', 'H', '8', 'd', 'D', '>', '<', '|', '_'.
- node_edgewidth (float, optional): Width of the edges around each node. Defaults to 1.0.
- edge_width (float, optional): Width of the edges in the plot. Defaults to 1.0.
- node_color (str, list, tuple, or np.ndarray, optional): Color of the nodes. Can be a single color (e.g., "white", "red", (0.5, 0.5, 0.5) for RGB, or (0.5, 0.5, 0.5, 0.8) for RGBA) or an array of such colors. Defaults to "white".
- node_edgecolor (str, list, tuple, or np.ndarray, optional): Color of the edges around each node. Can be a single color, a string of colors, or an array of string or RGB/RGBA colors. Defaults to "black".
- edge_color (str, list, tuple, or np.ndarray, optional): Color of the edges connecting the nodes. Can be a single color, a string of colors, or an array of string or RGB/RGBA colors. Defaults to "black".
- node_alpha (float or None, optional): Alpha value (transparency) for the nodes. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any alpha values in node_color. Defaults to 1.0.
- edge_alpha (float or None, optional): Alpha value (transparency) for the edges. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any alpha values in edge_color. Defaults to 1.0.
Annotated Node Size Parameters (for param node_size)¶
These parameters control the size of nodes based on their biological significance status. The get_annotated_node_sizes function is applied to the node_size parameter to determine these sizes.
Parameters¶
- significant_size (int, optional): Size for significant nodes. Defaults to 50.
- nonsignificant_size (int, optional): Size for non-significant nodes. Defaults to 25.

Returns¶
np.ndarray: Array of node sizes, with significant nodes larger than non-significant ones.
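Conceptually, the size helper produces one size per node depending on whether the node is significant. The sketch below shows that idea with plain Python; `annotated_node_sizes`, the node labels, and the significant set are hypothetical, and the real `get_annotated_node_sizes` method returns a NumPy array and takes its significance information from the graph.

```python
def annotated_node_sizes(node_labels, significant_labels, significant_size=50, nonsignificant_size=25):
    """Return one size per node, larger for nodes flagged as significant."""
    return [
        significant_size if label in significant_labels else nonsignificant_size
        for label in node_labels
    ]


# Hypothetical node labels and significant subset
node_labels = ["LSM1", "PAT1", "PRP8"]
significant_labels = {"LSM1", "PAT1"}
sizes = annotated_node_sizes(
    node_labels, significant_labels, significant_size=225, nonsignificant_size=12.5
)
print(sizes)
```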
Annotated Node Color (for param node_color)¶
These parameters allow you to customize node colors, either by colormap or specific colors, based on significance or predefined categories. The get_annotated_node_colors function is applied to the node_color parameter to generate these colors.
Parameters¶
- cmap (str, optional): The colormap to use for node colors. Defaults to "gist_rainbow".
- color (str, list, tuple, np.ndarray, or None, optional): A specific color to use for all nodes. Can be a single color (e.g., "red", (0.5, 0.5, 0.5) for RGB, or (0.5, 0.5, 0.5, 0.8) for RGBA) or an array of such colors. If specified, this will override the colormap (cmap). Defaults to None.
- blend_colors (bool, optional): Whether to blend colors for nodes with multiple domains. Defaults to False.
- blend_gamma (float, optional): Gamma correction factor for perceptual color blending. Defaults to 2.2.
- min_scale (float, optional): Minimum scale for color intensity. Defaults to 0.8.
- max_scale (float, optional): Maximum scale for color intensity. Defaults to 1.0.
- scale_factor (float, optional): Factor for adjusting the color scaling intensity. Defaults to 1.0.
- alpha (float, None, optional): Alpha value for significant nodes. If provided, it overrides any existing alpha values found in color. Defaults to 1.0.
- nonsignificant_color (str, list, tuple, or np.ndarray, optional): Color for non-significant nodes. Can be a single color (e.g., "white", (0.5, 0.5, 0.5) for RGB, or (0.5, 0.5, 0.5, 0.8) for RGBA) or an array of such colors. Defaults to "white".
- nonsignificant_alpha (float, None, optional): Alpha value for non-significant nodes. If provided, it overrides any existing alpha values found in nonsignificant_color. Defaults to 1.0.
- ids_to_colors (dict, None, optional): Mapping of domain IDs to specific colors. Defaults to None.
- random_seed (int, optional): Seed for random number generation. Defaults to 888.
Returns¶
np.ndarray: Array of RGBA colors adjusted for significance status.
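To show what gamma-corrected blending means in practice, the sketch below averages RGB colors in linear-light space and converts the result back. This is a common approach to perceptual blending and is offered as an assumption about what blend_gamma controls; the function `blend_rgb` is hypothetical and RISK's own blending may weight colors differently.

```python
def blend_rgb(colors, gamma=2.2):
    """Blend RGB colors (channels in [0, 1]) by averaging in linear-light space."""
    n = len(colors)
    blended = []
    for channel in zip(*colors):
        # Decode each channel with the gamma exponent, average, then re-encode
        linear_mean = sum(value**gamma for value in channel) / n
        blended.append(linear_mean ** (1.0 / gamma))
    return tuple(blended)


# Blend pure red and pure green, as for a node enriched in two domains
red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
mixed = blend_rgb([red, green], gamma=2.2)
naive = tuple((a + b) / 2 for a, b in zip(red, green))
print(mixed, naive)
```

Compared with the naive per-channel average, the gamma-aware blend is brighter, which avoids the muddy look of colors mixed directly in encoded space.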
# Plot network nodes and edges
plotter.plot_network(
node_size=plotter.get_annotated_node_sizes(
significant_size=225,
nonsignificant_size=12.5,
),
node_shape="o",
node_edgewidth=1.0,
edge_width=0.03,
node_color=plotter.get_annotated_node_colors(
cmap="gist_rainbow",
color=None,
blend_colors=False,
blend_gamma=2.2,
min_scale=1.0,
max_scale=1.0,
scale_factor=0.5,
alpha=1.0,
nonsignificant_color="white",
nonsignificant_alpha=0.75,
ids_to_colors={
10: "#f200ff",
32: "#fcec00",
},
random_seed=random_seed,
),
node_edgecolor="black",
edge_color="white",
node_alpha=1.0,
edge_alpha=1.0,
)
8d. Plotting a Subnetwork¶
The plot_subnetwork function in RISK allows you to focus on and visualize a subset of the network nodes and their connecting edges with customizable attributes.
Parameters¶
- nodes (list, tuple, or np.ndarray): List of node labels to include in the subnetwork. Accepts nested lists.
- node_size (int or np.ndarray, optional): Size of the nodes. Can be a single integer or an array of sizes. Defaults to 50.
- node_shape (str, optional): Shape of the nodes. Options include:
  - 'o': Circle. (default)
  - 's': Square.
  - '^': Triangle up.
  - 'v': Triangle down.
  - Other options: 'p', 'P', 'h', 'H', '8', 'd', 'D', '>', '<', '|', '_'.
- node_edgewidth (float, optional): Width of the node edges. Defaults to 1.0.
- edge_width (float, optional): Width of the edges in the subnetwork plot. Defaults to 1.0.
- node_color (str, list, tuple, or np.ndarray, optional): Color of the nodes. Can be a single color (e.g., "red", (0.5, 0.5, 0.5) for RGB, or (0.5, 0.5, 0.5, 0.8) for RGBA) or an array of such colors. Defaults to "white".
- node_edgecolor (str, list, tuple, or np.ndarray, optional): Color of the node edges. Can be a single color or an array of string or RGB/RGBA colors. Defaults to "black".
- edge_color (str, list, tuple, or np.ndarray, optional): Color of the edges connecting the nodes. Can be a single color or an array of string or RGB/RGBA colors. Defaults to "black".
- node_alpha (float or None, optional): Transparency for the nodes. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any alpha values in node_color. Defaults to None.
- edge_alpha (float or None, optional): Transparency for the edges. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any alpha values in edge_color. Defaults to None.
Raises¶
ValueError: If no valid nodes are found in the network graph.
# Plot a subnetwork with custom node and edge attributes
plotter.plot_subnetwork(
nodes=[
"LSM1",
"LSM2",
"LSM3",
"LSM4",
"LSM5",
"LSM6",
"LSM7",
"PAT1",
],
node_size=225,
node_shape="^",
node_edgewidth=1.0,
edge_width=0.04,
node_color="white",
node_edgecolor="black",
edge_color="white",
node_alpha=1.0,
edge_alpha=1.0,
)
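The handling of nested node lists and the ValueError can be pictured with the sketch below, which flattens the input and looks each label up in a label-to-ID mapping. The helper `resolve_node_ids`, the mapping contents, and the choice to silently skip unknown labels are assumptions for illustration; NumPy arrays are not handled here.

```python
def resolve_node_ids(nodes, node_label_to_node_id_map):
    """Flatten nested label lists and return the IDs of labels present in the map."""

    def flatten(items):
        for item in items:
            if isinstance(item, (list, tuple)):
                yield from flatten(item)
            else:
                yield item

    node_ids = [
        node_label_to_node_id_map[label]
        for label in flatten(nodes)
        if label in node_label_to_node_id_map
    ]
    if not node_ids:
        raise ValueError("No valid nodes found in the network graph.")
    return node_ids


# Hypothetical label-to-ID mapping shaped like graph.node_label_to_node_id_map
label_to_id = {"LSM1": 0, "LSM2": 1, "PAT1": 7}
node_ids = resolve_node_ids([["LSM1", "LSM2"], "PAT1", "UNKNOWN"], label_to_id)
print(node_ids)
```

Checking labels against graph.node_label_to_node_id_map before calling plot_subnetwork is a quick way to catch misspelled gene names.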
8e. Plotting Contours¶
The plot_contours function in RISK allows you to visualize density contours around network nodes. This can help identify regions of high node density or clustering within the network.
Parameters¶
- levels (int, optional): Number of contour levels to plot. Defaults to 5.
- bandwidth (float, optional): Bandwidth for KDE, controlling the smoothness of the contour. Defaults to 0.8.
- grid_size (int, optional): Resolution of the grid for KDE. Higher values create finer contours. Defaults to 250.
- color (str, list, tuple, or np.ndarray, optional): Color of the contours. Can be a string (e.g., "white"), an RGB/RGBA value, or an array of such values. Defaults to "white".
- linestyle (str, optional): Line style for the contours. Options include 'solid', 'dashed', 'dashdot', 'dotted'. Defaults to "solid".
- linewidth (float, optional): Line width for the contours. Defaults to 1.5.
- alpha (float, None, optional): Transparency level of the contour lines. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any existing alpha values found in color. Defaults to 1.0.
- fill_alpha (float, None, optional): Transparency level of the contour fill. If provided, it overrides any existing alpha values found in color. Defaults to None.
Returns¶
None: This function does not return any value. It directly plots the contours on the network graph.
Annotated Contour Color Parameters (for param color)¶
These parameters allow you to define or generate contour colors based on a colormap or specific colors. The get_annotated_contour_colors function is applied to the color parameter to generate these contour colors.
Parameters¶
- cmap (str, optional): The colormap to use for contour colors. Defaults to "gist_rainbow".
- color (str, list, tuple, np.ndarray, or None, optional): A specific color to use for all contours. Can be a string (e.g., "red"), an RGB or RGBA value, or an array of such values (strings, RGB, or RGBA). If specified, this will overwrite the colormap (cmap). Defaults to None.
- blend_colors (bool, optional): Whether to blend colors for nodes with multiple domains. Defaults to False.
- blend_gamma (float, optional): Gamma correction factor for perceptual color blending. Defaults to 2.2.
- min_scale (float, optional): Minimum intensity scale for the colors generated by the colormap. Controls the dimmest colors. Defaults to 0.8.
- max_scale (float, optional): Maximum intensity scale for the colors generated by the colormap. Controls the brightest colors. Defaults to 1.0.
- scale_factor (float, optional): Exponent for adjusting color scaling based on significance scores. A higher value increases contrast by dimming lower scores more. Defaults to 1.0.
- ids_to_colors (dict, None, optional): Mapping of domain IDs to specific colors. Defaults to None.
- random_seed (int, optional): Seed for random number generation to ensure reproducibility. Defaults to 888.
Returns¶
np.ndarray: Array of RGBA colors for contour annotations.
# Plot KDE-based contours around network nodes
plotter.plot_contours(
levels=5,
bandwidth=0.8,
grid_size=250,
color=plotter.get_annotated_contour_colors(
cmap="gist_rainbow",
color=None,
blend_colors=False,
blend_gamma=2.2,
min_scale=1.0,
max_scale=1.0,
scale_factor=0.5,
ids_to_colors={
10: "#f200ff",
32: "#fcec00",
},
random_seed=random_seed,
),
linestyle="solid",
linewidth=2.0,
alpha=1.0,
fill_alpha=0.25,
)
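The roles of bandwidth, grid_size, and levels are easier to see in a small, pure-Python Gaussian KDE evaluated on a grid. This is a sketch under stated assumptions rather than RISK's implementation: the function names, the node coordinates, the fixed plotting window of [-1, 1], the treatment of bandwidth as an absolute standard deviation, and the evenly spaced contour thresholds are all hypothetical choices.

```python
import math


def gaussian_kde_grid(points, bandwidth, grid_size, lower=-1.0, upper=1.0):
    """Evaluate an isotropic Gaussian KDE on a grid_size x grid_size grid."""
    step = (upper - lower) / (grid_size - 1)
    axis = [lower + i * step for i in range(grid_size)]
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth**2)
    density = []
    for y in axis:  # rows follow the y axis, columns follow the x axis
        row = []
        for x in axis:
            total = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth**2))
                for px, py in points
            )
            row.append(norm * total)
        density.append(row)
    return axis, density


def contour_thresholds(density, levels):
    """Pick `levels` evenly spaced density values strictly between min and max."""
    flat = [value for row in density for value in row]
    low, high = min(flat), max(flat)
    return [low + (high - low) * (k + 1) / (levels + 1) for k in range(levels)]


# Hypothetical 2D positions of five nodes from one domain
points = [(0.4, 0.5), (0.5, 0.5), (0.6, 0.5), (0.5, 0.4), (0.5, 0.6)]
axis, density = gaussian_kde_grid(points, bandwidth=0.2, grid_size=21)
thresholds = contour_thresholds(density, levels=5)
print(f"peak density = {max(max(row) for row in density):.3f}")
```

A larger bandwidth spreads each node's contribution over a wider area and produces smoother, rounder contours, while a larger grid_size only refines the resolution at which the same density surface is sampled.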
8f. Plotting a Subcontour¶
The plot_subcontour function in RISK allows you to focus on and visualize contours around a specific subset of nodes using Kernel Density Estimation (KDE). This feature is useful for highlighting particular pathways or regions of interest within the network.
Parameters¶
- nodes (list, tuple, or np.ndarray): List of node labels or list of lists of node labels to plot the contour for.
- levels (int, optional): Number of contour levels to plot. Defaults to 5.
- bandwidth (float, optional): Bandwidth for KDE, controlling the smoothness of the contour. Defaults to 0.8.
- grid_size (int, optional): Resolution of the grid for KDE. Higher values create finer contours. Defaults to 250.
- color (str, list, tuple, or np.ndarray, optional): Color of the contour. Can be a string (e.g., "white"), an RGB or RGBA value, or an array of such values (strings, RGB, or RGBA). Defaults to "white".
- linestyle (str, optional): Line style for the contour. Options include 'solid', 'dashed', 'dashdot', 'dotted'. Defaults to "solid".
- linewidth (float, optional): Line width for the contour. Defaults to 1.5.
- alpha (float, None, optional): Transparency level of the contour lines. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any existing alpha values found in color. Defaults to 1.0.
- fill_alpha (float, None, optional): Transparency level of the contour fill. If provided, it overrides any existing alpha values found in color. Defaults to None.
Raises¶
ValueError: If no valid nodes are found in the network graph.
# Plot custom KDE-based contours around a subset of nodes
plotter.plot_subcontour(
nodes=[
"LSM1",
"LSM2",
"LSM3",
"LSM4",
"LSM5",
"LSM6",
"LSM7",
"PAT1",
],
levels=5,
bandwidth=0.8,
grid_size=250,
color="white",
linestyle="solid",
linewidth=2.0,
alpha=1.0,
fill_alpha=0.25,
)
8g. Plotting Labels¶
The plot_labels method in the NetworkPlotter class is used to annotate the network with labels. This function provides various customization options to adjust the appearance and placement of labels within the network graph.
Parameters¶
- scale (float, optional): Scale factor for positioning labels around the perimeter. Defaults to 1.05.
- offset (float, optional): Offset distance for labels from the perimeter. Defaults to 0.10.
- font (str, optional): Font name for the labels. Defaults to "DejaVu Sans".
- fontcase (str, dict, or None, optional): Defines how to transform the case of words. Can be a string ('upper', 'lower', 'title') or a dictionary mapping cases (e.g., {'lower': 'title', 'upper': 'lower'}). Defaults to None.
- fontsize (int, optional): Font size for the labels. Defaults to 10.
- fontcolor (str, list, tuple, or np.ndarray, optional): Color of the label text. Can be a string (e.g., "black"), an RGB or RGBA value, or an array of such values (strings, RGB, or RGBA). Defaults to "black".
- fontalpha (float, None, optional): Transparency level for the font color. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any existing alpha values found in fontcolor. Defaults to 1.0.
- arrow_linewidth (float, optional): Line width of the arrows pointing to centroids. Defaults to 1.
- arrow_style (str, optional): Style of the arrows pointing to centroids. Defaults to "->".
- arrow_color (str, list, tuple, or np.ndarray, optional): Color of the arrows. Can be a string (e.g., "black"), an RGB or RGBA value, or an array of such values (strings, RGB, or RGBA). Defaults to "black".
- arrow_alpha (float, None, optional): Transparency level for the arrow color. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any existing alpha values found in arrow_color. Defaults to 1.0.
- arrow_base_shrink (float, optional): Distance between the text and the base of the arrow. Defaults to 0.0.
- arrow_tip_shrink (float, optional): Distance between the arrow tip and the centroid. Defaults to 0.0.
- max_labels (int, optional): Maximum number of labels to plot. Defaults to None (no limit).
- min_label_lines (int, optional): Minimum number of lines in a label. Defaults to 1.
- max_label_lines (int, optional): Maximum number of lines in a label. Defaults to None (no limit).
- min_chars_per_line (int, optional): Minimum number of characters in a line to display. Defaults to 1.
- max_chars_per_line (int, optional): Maximum number of characters in a line to display. Defaults to None (no limit).
- words_to_omit (list, optional): List of words to omit from the labels. Defaults to None.
- overlay_ids (bool, optional): Whether to overlay domain IDs in the center of the centroids. Defaults to False.
- ids_to_keep (list, tuple, np.ndarray, or None, optional): IDs of domains that must be labeled. To discover domain IDs, you can set overlay_ids=True. Defaults to None.
- ids_to_labels (dict, optional): A dictionary mapping domain IDs to custom labels (strings). The labels should be space-separated words. If provided, the custom labels will replace the default domain terms. Defaults to None.
Raises¶
ValueError: If the number of provided ids_to_keep exceeds max_labels.
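The text-shaping parameters above (fontcase, words_to_omit, max_chars_per_line, max_label_lines) can be hard to picture from descriptions alone. The following is a minimal sketch of one plausible reading of them in plain Python. The helper names apply_fontcase and shape_label are hypothetical and are not part of RISK; RISK's actual label construction may differ, and the min_* filters are not modeled here.

```python
import textwrap


def apply_fontcase(word, fontcase):
    """Transform a word's case (hypothetical helper, not RISK's code).

    `fontcase` is None, a string ('upper', 'lower', 'title'), or a
    {from_case: to_case} dictionary.
    """
    transforms = {"upper": str.upper, "lower": str.lower, "title": str.title}
    if fontcase is None:
        return word
    if isinstance(fontcase, str):
        return transforms[fontcase](word)
    # Dictionary form (assumed semantics): only words that are currently in
    # `from_case` are converted to `to_case`; other words are left alone.
    detectors = {"upper": str.isupper, "lower": str.islower, "title": str.istitle}
    for from_case, to_case in fontcase.items():
        if detectors[from_case](word):
            return transforms[to_case](word)
    return word


def shape_label(terms, words_to_omit=None, fontcase=None,
                max_chars_per_line=None, max_label_lines=None):
    """Drop omitted words, adjust case, then wrap and truncate the label."""
    omit = {w.lower() for w in (words_to_omit or [])}
    words = [apply_fontcase(w, fontcase) for w in terms.split() if w.lower() not in omit]
    text = " ".join(words)
    if not text:
        return []
    if max_chars_per_line is None:
        lines = [text]
    else:
        # Keep words whole; a word longer than the limit gets its own line.
        lines = textwrap.wrap(text, width=max_chars_per_line, break_long_words=False)
    if max_label_lines is not None:
        lines = lines[:max_label_lines]
    return lines


example_lines = shape_label(
    "Ribosome biogenesis from the rRNA Processing activity",
    words_to_omit=["from", "the", "activity"],
    fontcase={"title": "lower"},
    max_chars_per_line=12,
    max_label_lines=4,
)
print(example_lines)  # ['ribosome', 'biogenesis', 'rRNA', 'processing']
```

Under this reading, fontcase={"title": "lower"} lowercases only title-case words, so mixed-case tokens such as "rRNA" keep their original spelling.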
Annotated Label Color Parameters (for params fontcolor and arrow_color)¶
Customize the appearance of the labels with a colormap or a specific color. The get_annotated_label_colors function is applied to the fontcolor and arrow_color parameters to generate these label colors.
Parameters¶
- cmap (str, optional): The colormap to use for label colors. Defaults to "gist_rainbow".
- color (str, list, tuple, np.ndarray, or None, optional): A specific color to use for all labels. Can be a string (e.g., "red"), an RGB or RGBA value, or an array of such values (strings, RGB, or RGBA). Warning: If specified, this will overwrite the colormap (cmap). Defaults to None.
- blend_colors (bool, optional): Whether to blend colors for nodes with multiple domains. Defaults to False.
- blend_gamma (float, optional): Gamma correction factor for perceptual color blending. Defaults to 2.2.
- min_scale (float, optional): Minimum intensity scale for the colors generated by the colormap. Controls the dimmest colors. Defaults to 0.8.
- max_scale (float, optional): Maximum intensity scale for the colors generated by the colormap. Controls the brightest colors. Defaults to 1.0.
- scale_factor (float, optional): Exponent for adjusting color scaling based on significance scores. A higher value increases contrast by dimming lower scores more. Defaults to 1.0.
- ids_to_colors (dict, None, optional): Mapping of domain IDs to specific colors. Defaults to None.
- random_seed (int, optional): Seed for random number generation to ensure reproducibility. Defaults to 888.
Returns¶
np.ndarray: Array of RGBA colors for label annotations.
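To make min_scale, max_scale, scale_factor, and blend_gamma more concrete, here is a minimal sketch of how such parameters could act on a color. The formulas and the function names scale_intensity and blend_gamma_corrected are assumptions for illustration only; the source does not state how RISK computes these values, so treat this as one plausible interpretation rather than the library's implementation.

```python
def scale_intensity(rgb, score, min_scale=0.8, max_scale=1.0, scale_factor=1.0):
    """Dim an RGB color by a normalized significance score in [0, 1].

    Assumed formula: intensity = min_scale + (max_scale - min_scale) * score ** scale_factor.
    A larger exponent pushes low scores toward min_scale, increasing contrast.
    """
    score = min(max(score, 0.0), 1.0)  # clamp to the expected range
    intensity = min_scale + (max_scale - min_scale) * score ** scale_factor
    return tuple(channel * intensity for channel in rgb)


def blend_gamma_corrected(colors, gamma=2.2):
    """Average RGB colors in (approximately) linear light, then re-encode.

    Each channel is raised to `gamma`, averaged, and raised to 1 / gamma.
    """
    n = len(colors)
    linear_mean = [sum(color[i] ** gamma for color in colors) / n for i in range(3)]
    return tuple(value ** (1.0 / gamma) for value in linear_mean)


orange = (1.0, 0.5, 0.0)
dimmest = scale_intensity(orange, score=0.0)    # intensity equals min_scale
brightest = scale_intensity(orange, score=1.0)  # intensity equals max_scale
purple = blend_gamma_corrected([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
```

Under this assumed formula, setting min_scale and max_scale to the same value (for example both to 1.0) gives every color the same intensity regardless of scale_factor.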
# Plot labels on the network
plotter.plot_labels(
scale=1.1,
offset=0.12,
font="DejaVu Sans",
fontcase={"title": "lower"},
fontsize=15,
fontcolor="white",
fontalpha=1.0,
arrow_linewidth=2.0,
arrow_style="-",
arrow_color=plotter.get_annotated_label_colors(
cmap="gist_rainbow",
color=None,
blend_colors=False,
blend_gamma=2.2,
min_scale=1.0,
max_scale=1.0,
scale_factor=0.5,
ids_to_colors={
10: "#f200ff",
32: "#fcec00",
},
random_seed=random_seed,
),
arrow_alpha=1.0,
arrow_base_shrink=10.0,
arrow_tip_shrink=0.0,
max_labels=28,
min_label_lines=3,
max_label_lines=4,
min_chars_per_line=3,
max_chars_per_line=12,
words_to_omit=["from", "the", "into", "via", "novo", "process", "activity"],
overlay_ids=False,
ids_to_keep=None,
ids_to_labels=None,
)
8h. Plot Sublabel¶
The plot_sublabel method in the NetworkPlotter class is designed to annotate the network graph with a single label for a specified set of nodes. This method provides customization options for the label's appearance, positioning, font transparency, and the arrow pointing to the nodes.
Parameters¶
- nodes (list, tuple, or np.ndarray): List of node labels or list of lists of node labels to be used for calculating the centroid.
- label (str): The label to be annotated on the network.
- radial_position (float, optional): Radial angle for positioning the label around the network's perimeter. Range: 0-360 degrees. Defaults to 0.0.
- scale (float, optional): Scale factor for positioning the label around the perimeter. Defaults to 1.05.
- offset (float, optional): Offset distance for the label from the perimeter. Defaults to 0.10.
- font (str, optional): Font name for the label. Defaults to "DejaVu Sans".
  - Options: Any valid font name (e.g., "DejaVu Sans", "Times New Roman").
- fontsize (int, optional): Font size for the label. Defaults to 10.
  - Options: Any integer value representing font size.
- fontcolor (str, list, tuple, or np.ndarray, optional): Color of the label text. Can be a string (e.g., "black"), an RGB or RGBA value, or an array of such values. Defaults to "black".
- fontalpha (float, None, optional): Transparency level for the label font. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any existing alpha values found in fontcolor. Defaults to 1.0.
- arrow_linewidth (float, optional): Line width of the arrow pointing to the centroid. Defaults to 1.
- arrow_style (str, optional): Style of the arrow pointing to the centroid. Defaults to "->".
- arrow_color (str, list, tuple, or np.ndarray, optional): Color of the arrow. Can be a string, RGB/RGBA value, or an array of such values. Defaults to "black".
- arrow_alpha (float, None, optional): Transparency level for the arrow. Range: 0.0 (fully transparent) to 1.0 (fully opaque). If provided, it overrides any existing alpha values found in arrow_color. Defaults to 1.0.
- arrow_base_shrink (float, optional): Distance between the text and the base of the arrow. Defaults to 0.0.
- arrow_tip_shrink (float, optional): Distance between the arrow tip and the centroid. Defaults to 0.0.
# Plot sublabels on the network
plotter.plot_sublabel(
nodes=[
"LSM1",
"LSM2",
"LSM3",
"LSM4",
"LSM5",
"LSM6",
"LSM7",
"PAT1",
],
label="LSM1-7-PAT1 Complex",
radial_position=73,
scale=1.6,
offset=0.12,
font="DejaVu Sans",
fontsize=15,
fontcolor="white",
fontalpha=1.0,
arrow_linewidth=2.0,
arrow_style="-",
arrow_color="white",
arrow_alpha=1.0,
arrow_base_shrink=10.0,
arrow_tip_shrink=0.0,
)
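The placement parameters radial_position, scale, and offset describe a point outside the network perimeter, and the arrow runs from that point to the centroid of the selected nodes. The sketch below illustrates that geometry under stated assumptions: 0 degrees points along the positive x-axis, angles increase counterclockwise, and the label sits at radius * scale + offset from the center. These conventions and the function names centroid and perimeter_label_position are hypothetical; RISK's own angle convention and distance formula are not given in this section.

```python
import math


def centroid(points):
    """Mean position of a set of (x, y) node coordinates."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def perimeter_label_position(center, radius, radial_position=0.0, scale=1.05, offset=0.10):
    """Place a label outside a circular perimeter (assumed geometry).

    Assumes 0 degrees lies on the +x axis and angles grow counterclockwise.
    """
    theta = math.radians(radial_position)
    distance = radius * scale + offset
    return (
        center[0] + distance * math.cos(theta),
        center[1] + distance * math.sin(theta),
    )


# Made-up coordinates standing in for a handful of node positions
node_xy = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
arrow_target = centroid(node_xy)  # the arrow tip would point here
label_xy = perimeter_label_position(center=(0.0, 0.0), radius=1.0, radial_position=90.0)
```

With these conventions, a larger scale pushes the label farther out in proportion to the network radius, while offset adds a fixed distance on top.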
8i. Plotting Utility Methods in NetworkPlotter¶
The NetworkPlotter class provides utility methods for managing the display and saving of plots. These methods interface directly with Matplotlib functions, facilitating easy integration into your plotting workflows.
Saving the Plot¶
The savefig method in RISK saves the current plot to a file. You can specify the filename, format, and additional options to customize the output.
Parameters¶
- *args: Positional arguments passed to plt.savefig. Commonly used for specifying the filename (e.g., "plot.png").
- pad_inches (float, optional): Padding around the figure when saving. Defaults to 0.5.
- dpi (int, optional): Dots per inch (DPI) for the exported image. Defaults to 100.
- **kwargs: Keyword arguments passed to plt.savefig, such as format (e.g., "png", "pdf") and other options like bbox_inches.
# Save the plot to a file
# plotter.savefig("network_plot.png", pad_inches=0.5, dpi=100)
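Because savefig forwards its arguments to plt.savefig, it can help to see the same pattern in plain Matplotlib. The sketch below is a hypothetical stand-in named save_current_figure, not RISK's code. It uses the defaults listed above and sets bbox_inches="tight" unless the caller overrides it, since in Matplotlib pad_inches only takes effect with a tight bounding box. Whether RISK sets bbox_inches itself is not stated here.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def save_current_figure(*args, pad_inches=0.5, dpi=100, **kwargs):
    """Forward to plt.savefig with the documented defaults (hypothetical helper)."""
    # pad_inches is honored by Matplotlib only when bbox_inches is "tight".
    kwargs.setdefault("bbox_inches", "tight")
    plt.savefig(*args, pad_inches=pad_inches, dpi=dpi, **kwargs)


# Draw a trivial figure and save it to a temporary directory
fig, ax = plt.subplots(figsize=(4, 4))
ax.plot([0, 1], [0, 1])
out_path = os.path.join(tempfile.mkdtemp(), "network_plot.png")
save_current_figure(out_path, format="png")
plt.close(fig)
```

Raising dpi increases the pixel dimensions of raster outputs such as PNG; for vector formats such as PDF it mainly affects any rasterized elements.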
Displaying the Plot¶
The show method in RISK displays the current plot. This method is typically the last step after configuring your plot and is essential for visualizing your results.
Parameters¶
- *args: Positional arguments passed to plt.show. Typically not used, but can be included for consistency.
- **kwargs: Keyword arguments passed to plt.show, such as block to control whether the display blocks the execution of code.
# Display the plot
plotter.show()
8j. [Troubleshoot] Potential Plotting Issue in Jupyter Notebooks¶
When using the NetworkPlotter class in a Jupyter Notebook, you might notice that the plot is automatically displayed when plt.subplots is called during plot initialization. This can result in the plot appearing prematurely in the cell where it's created, even if you intend to display it later using plotter.show().
If you encounter this issue, build and display the plot within a single cell, as in the following cell, so that the figure appears only once with all layers drawn.
# Set random seed for reproducibility
random_seed = 887
# Initialize the NetworkPlotter with the NetworkGraph object
plotter = risk.load_plotter(
graph=graph,
figsize=(15, 15),
background_color="black",
)
# Plot network title and subtitle
plotter.plot_title(
title="Yeast PPI Network",
subtitle="Michaelis et al., 2023",
title_fontsize=24,
subtitle_fontsize=18,
font="DejaVu Sans",
title_color="white",
subtitle_color="lightblue",
title_x=0.5,
title_y=0.925,
title_space_offset=0.08,
subtitle_offset=0.025,
)
# Plot network perimeter as a circle
plotter.plot_circle_perimeter(
scale=1.02,
center_offset_x=0.0,
center_offset_y=0.0,
linestyle="solid",
linewidth=1.5,
color="white",
outline_alpha=1.0,
fill_alpha=0.0,
)
# Plot network nodes and edges
plotter.plot_network(
node_size=plotter.get_annotated_node_sizes(
significant_size=225,
nonsignificant_size=12.5,
),
node_shape="o",
node_edgewidth=1.0,
edge_width=0.03,
node_color=plotter.get_annotated_node_colors(
cmap="gist_rainbow",
color=None,
blend_colors=False,
blend_gamma=2.2,
min_scale=1.0,
max_scale=1.0,
scale_factor=0.5,
alpha=1.0,
nonsignificant_color="white",
nonsignificant_alpha=0.75,
ids_to_colors={
10: "#f200ff",
32: "#fcec00",
},
random_seed=random_seed,
),
node_edgecolor="black",
edge_color="white",
node_alpha=1.0,
edge_alpha=1.0,
)
# Plot a subnetwork with custom node and edge attributes
plotter.plot_subnetwork(
nodes=[
"LSM1",
"LSM2",
"LSM3",
"LSM4",
"LSM5",
"LSM6",
"LSM7",
"PAT1",
],
node_size=225,
node_shape="^",
node_edgewidth=1.0,
edge_width=0.04,
node_color="white",
node_edgecolor="black",
edge_color="white",
node_alpha=1.0,
edge_alpha=1.0,
)
# Plot KDE-based contours around network nodes
plotter.plot_contours(
levels=5,
bandwidth=0.8,
grid_size=250,
color=plotter.get_annotated_contour_colors(
cmap="gist_rainbow",
color=None,
blend_colors=False,
blend_gamma=2.2,
min_scale=1.0,
max_scale=1.0,
scale_factor=0.5,
ids_to_colors={
10: "#f200ff",
32: "#fcec00",
},
random_seed=random_seed,
),
linestyle="solid",
linewidth=2.0,
alpha=1.0,
fill_alpha=0.25,
)
# Plot custom KDE-based contours around a subset of nodes
plotter.plot_subcontour(
nodes=[
"LSM1",
"LSM2",
"LSM3",
"LSM4",
"LSM5",
"LSM6",
"LSM7",
"PAT1",
],
levels=5,
bandwidth=0.8,
grid_size=250,
color="white",
linestyle="solid",
linewidth=2.0,
alpha=1.0,
fill_alpha=0.25,
)
# Plot labels on the network
plotter.plot_labels(
scale=1.1,
offset=0.12,
font="DejaVu Sans",
fontcase={"title": "lower"},
fontsize=15,
fontcolor="white",
fontalpha=1.0,
arrow_linewidth=2.0,
arrow_style="-",
arrow_color=plotter.get_annotated_label_colors(
cmap="gist_rainbow",
color=None,
blend_colors=False,
blend_gamma=2.2,
min_scale=1.0,
max_scale=1.0,
scale_factor=0.5,
ids_to_colors={
10: "#f200ff",
32: "#fcec00",
},
random_seed=random_seed,
),
arrow_alpha=1.0,
arrow_base_shrink=10.0,
arrow_tip_shrink=0.0,
max_labels=28,
min_label_lines=3,
max_label_lines=4,
min_chars_per_line=3,
max_chars_per_line=12,
words_to_omit=["from", "the", "into", "via", "novo", "process", "activity"],
overlay_ids=False,
ids_to_keep=None,
ids_to_labels=None,
)
# Plot sublabels on the network
plotter.plot_sublabel(
nodes=[
"LSM1",
"LSM2",
"LSM3",
"LSM4",
"LSM5",
"LSM6",
"LSM7",
"PAT1",
],
label="LSM1-7-PAT1 Complex",
radial_position=73,
scale=1.6,
offset=0.12,
font="DejaVu Sans",
fontsize=15,
fontcolor="white",
fontalpha=1.0,
arrow_linewidth=2.0,
arrow_style="-",
arrow_color="white",
arrow_alpha=1.0,
arrow_base_shrink=10.0,
arrow_tip_shrink=0.0,
)
# Display the plot
plotter.show()
--------------- Loading plotter ---------------
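The cell above builds and shows the plot in one place. For background, the sketch below shows the underlying behavior in plain Matplotlib: Jupyter's inline backend renders figures that are still open when a cell finishes, and closing a figure removes it from pyplot's list of open figures while the Figure object itself remains usable. This is a generic Matplotlib illustration; whether the same approach suits a NetworkPlotter instance is an assumption that is not confirmed here.

```python
import matplotlib.pyplot as plt

# Creating the figure registers it with pyplot
fig, ax = plt.subplots(figsize=(4, 4))
ax.plot([0, 1], [0, 1])
fig_id = fig.number
open_before_close = list(plt.get_fignums())

# Closing deregisters the figure, so it is no longer among the open figures
# that a notebook would render automatically at the end of the cell
plt.close(fig)
open_after_close = list(plt.get_fignums())

# In a later notebook cell, the figure can still be rendered explicitly:
# from IPython.display import display
# display(fig)
```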
9. Overview of risk.params¶
The risk.params module manages and exports analysis parameters, supporting inspection, saving, and sharing to improve reproducibility. Note: Reloading a RISK instance from a saved parameter file is not currently supported.
Loading Parameters¶
The load method imports parameters from a predefined source, converting any np.ndarray values to lists for easier processing.
Returns¶
dict: A dictionary containing the processed parameters.
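Converting np.ndarray values to lists matters because arrays cannot be serialized by Python's json module directly. The following sketch shows one way such a conversion could be done recursively on a nested dictionary. The function name to_plain_python and the example dictionary are hypothetical and are not taken from RISK's source.

```python
import json

import numpy as np


def to_plain_python(value):
    """Recursively replace NumPy arrays and scalars with built-in Python types."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):  # e.g. np.float64, np.int64
        return value.item()
    if isinstance(value, dict):
        return {key: to_plain_python(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Preserve the container type (list stays list, tuple stays tuple)
        return type(value)(to_plain_python(item) for item in value)
    return value


# Made-up parameters for illustration only
example_params = {
    "figsize": (15, 15),
    "colors": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "nested": {"ids": np.array([10, 32]), "cutoff": np.float64(0.05)},
}
plain_params = to_plain_python(example_params)
serialized = json.dumps(plain_params)  # succeeds once arrays are lists
```

Calling json.dumps on example_params directly would raise a TypeError because of the array values.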
import pandas as pd
from IPython.display import display
# Load the parameters into a dictionary
loaded_params = risk.params.load()
# Display parameters in a tidy table for Jupyter documentation purposes
# This is intended for clarity in notebook examples, not for full inspection of nested fields
pd.set_option("display.max_colwidth", 200)
display(pd.DataFrame(list(loaded_params.items()), columns=["Parameter", "Value"]))
pd.reset_option("display.max_colwidth")
------------------ Loading parameters ------------------
| | Parameter | Value |
|---|---|---|
| 0 | annotation | {'filetype': 'JSON', 'filepath': './data/json/annotation/go_biological_process.json', 'min_nodes_per_term': 5, 'max_nodes_per_term': 75} |
| 1 | datetime | 2025-10-31 17:42:42 |
| 2 | graph | {'tail': 'right', 'pval_cutoff': 0.05, 'fdr_cutoff': 1.0, 'impute_depth': 0, 'prune_threshold': 0.125, 'linkage_criterion': 'distance', 'linkage_method': 'single', 'linkage_metric': 'sokalmichener... |
| 3 | clusters | {'clustering': 'louvain', 'fraction_shortest_edges': 0.25, 'resolution': 5.0, 'random_seed': 887} |
| 4 | network | {'compute_sphere': True, 'surface_depth': 0.1, 'min_edges_per_node': 1, 'filetype': 'Cytoscape', 'filepath': './data/cytoscape/michaelis_2023.cys'} |
| 5 | plotter | {'figsize': (15, 15), 'background_color': 'black', 'background_alpha': 1.0, 'pad': 0.3, 'title': 'Yeast PPI Network', 'subtitle': 'Michaelis et al., 2023', 'title_fontsize': 24, 'subtitle_fontsize... |
# Export parameters to a CSV file
risk.params.to_csv(filepath="./data/csv/params/michaelis_2023.csv")
------------------ Loading parameters ------------------
Parameters exported to CSV file: ./data/csv/params/michaelis_2023.csv
# Export parameters to a JSON file
risk.params.to_json(filepath="./data/json/params/michaelis_2023.json")
------------------ Loading parameters ------------------
Parameters exported to JSON file: ./data/json/params/michaelis_2023.json
# Export parameters to a text file
risk.params.to_txt(filepath="./data/txt/params/michaelis_2023.txt")
------------------ Loading parameters ------------------
Parameters exported to text file: ./data/txt/params/michaelis_2023.txt
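An exported parameter file can be read back for inspection with standard tools, even though it cannot be used to rebuild a RISK instance. The sketch below assumes the JSON export is a nested dictionary keyed by section (as the table above suggests) and flattens it into dotted keys that are easy to scan or compare between runs. The exact file layout and the helper name flatten_params are assumptions; a small stand-in file is written first so the example is self-contained.

```python
import json
import os
import tempfile


def flatten_params(params, prefix=""):
    """Flatten a nested dictionary into {"section.key": value} pairs."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


# Stand-in for an exported file, using a few values shown in the table above
stand_in = {
    "clusters": {"clustering": "louvain", "resolution": 5.0, "random_seed": 887},
    "graph": {"tail": "right", "pval_cutoff": 0.05},
}
params_path = os.path.join(tempfile.mkdtemp(), "params.json")
with open(params_path, "w") as handle:
    json.dump(stand_in, handle)

# Read the file back and list every parameter on its own line
with open(params_path) as handle:
    reloaded = json.load(handle)
flat_params = flatten_params(reloaded)
for name, value in sorted(flat_params.items()):
    print(f"{name} = {value}")
```

Flattened dictionaries from two runs can then be compared key by key to see which settings changed.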