Building and Analyzing Results

The Graph object integrates network data, annotations, and overrepresentation results into a unified structure, supporting clustering, domain-level significance, and downstream visualization.


Create a Graph

Parameters:

  • network (nx.Graph): The network graph containing the nodes and edges to be analyzed.
  • annotation (dict): The annotation associated with the network, typically derived from biological or functional data.
  • stats_results (dict): Statistical output from risk.run_permutation, risk.run_hypergeom, risk.run_chi2, or risk.run_binom. Must contain "depletion_pvals" and "enrichment_pvals".
  • tail (str, optional): Specifies the tail of the statistical test to use. Options include:
    • 'right': For enrichment. (default)
    • 'left': For depletion.
    • 'both': For two-tailed analysis.
  • pval_cutoff (float, optional): Cutoff value for p-values to determine significance. Range: Any value between 0 and 1. Defaults to 0.01.
  • fdr_cutoff (float, optional): Cutoff value for FDR-corrected p-values. FDR (Benjamini–Hochberg) is computed per cluster across terms. Range: Any value between 0 and 1. Defaults to 0.9999.
  • display_prune_threshold (float, optional): Display-only pruning based on spatial layout distance; suppresses spatially diffuse or isolated regions in plots. It runs after enrichment and clustering, on the plotting matrices only. It does not use enrichment strength or statistical significance, and it does not affect clustering, enrichment, or statistical testing. Range: Any value between 0 and 1. Defaults to 0.0.
  • linkage_criterion (str, optional): Criterion for clustering. Defaults to 'distance'. Options include:
    • 'distance': Clusters are formed based on distance.
    • 'off': Disables clustering; terms remain separate.
  • linkage_method (str, optional): Method used for hierarchical clustering. Defaults to 'average'. Options include:
    • 'auto': Automatically determines the optimal method using the silhouette score.
    • Other options: 'single', 'complete', 'average', 'weighted', 'centroid', 'median', 'ward'.
  • linkage_metric (str, optional): Distance metric used for clustering. Defaults to 'yule'. Options include:
    • 'auto': Automatically determines the optimal metric using the silhouette score.
    • Other options: 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulczynski1', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
  • linkage_threshold (str or float, optional): The cutoff distance for forming flat clusters in hierarchical clustering. Accepts either a numeric threshold or 'auto' to enable automatic threshold optimization using the silhouette score. Range depends on metric. Defaults to 0.2.
  • min_cluster_size (int, optional): Minimum size of clusters to be formed. Defaults to 5.
  • max_cluster_size (int, optional): Maximum size of clusters to be formed. Defaults to 1000.
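
To make the interaction between pval_cutoff and fdr_cutoff concrete, the following is a minimal sketch of Benjamini–Hochberg correction applied to the terms of a single cluster, followed by both cutoffs. The helper names (benjamini_hochberg, significant_terms), the toy p-values, and the use of non-strict comparisons (<=) are assumptions for illustration; this is not RISK's internal implementation.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def significant_terms(term_pvals, pval_cutoff=0.01, fdr_cutoff=0.9999):
    """Keep terms whose p-value and q-value both pass their cutoffs (assumed <=)."""
    terms = list(term_pvals)
    pvals = [term_pvals[t] for t in terms]
    qvals = benjamini_hochberg(pvals)
    return {
        t: (p, q)
        for t, p, q in zip(terms, pvals, qvals)
        if p <= pval_cutoff and q <= fdr_cutoff
    }


# Hypothetical p-values for four terms within one cluster
cluster_pvals = {"term_a": 0.001, "term_b": 0.008, "term_c": 0.03, "term_d": 0.5}
kept = significant_terms(cluster_pvals)
```

Under these assumptions, the default fdr_cutoff of 0.9999 leaves the FDR filter almost inactive, so pval_cutoff does most of the filtering; lowering fdr_cutoff (for example to 0.05) makes the corrected values the binding constraint.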

Linkage runs after clustering and enrichment, grouping enriched terms into domains for labeling and presentation. It does not change clustering, enrichment, or statistical inference. Domain grouping is optional. Set linkage_criterion="off" to disable it. Disabling yields raw, ungrouped enriched terms.
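
As a rough illustration of how linkage_method="average", linkage_metric="yule", and linkage_threshold=0.2 fit together, here is a small pure-Python sketch of average-linkage grouping with a distance cutoff. The boolean term vectors are hypothetical, the Yule formula follows the standard definition (as in SciPy's scipy.spatial.distance.yule), and the naive merging loop is a teaching aid rather than the routine RISK actually uses.

```python
from itertools import combinations


def yule_dissimilarity(u, v):
    """Yule dissimilarity between two equal-length boolean vectors."""
    ctt = sum(1 for a, b in zip(u, v) if a and b)
    cff = sum(1 for a, b in zip(u, v) if not a and not b)
    ctf = sum(1 for a, b in zip(u, v) if a and not b)
    cft = sum(1 for a, b in zip(u, v) if not a and b)
    half_r = ctf * cft
    if half_r == 0:
        return 0.0
    return 2.0 * half_r / (ctt * cff + half_r)


def average_linkage(vectors, threshold):
    """Merge groups until the closest pair is farther apart than threshold."""
    groups = [[i] for i in range(len(vectors))]

    def group_distance(g1, g2):
        # Average linkage: mean pairwise distance between members
        total = sum(yule_dissimilarity(vectors[i], vectors[j]) for i in g1 for j in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > 1:
        # Find the closest pair of groups
        a, b = min(
            combinations(range(len(groups)), 2),
            key=lambda pair: group_distance(groups[pair[0]], groups[pair[1]]),
        )
        if group_distance(groups[a], groups[b]) > threshold:
            break
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


# Hypothetical terms described by which of six nodes they mark
term_vectors = [
    [1, 1, 1, 0, 0, 0],  # term_x
    [1, 1, 0, 0, 0, 0],  # term_y, nested inside term_x
    [0, 0, 0, 1, 1, 1],  # term_z, disjoint from both
]
groups = average_linkage(term_vectors, threshold=0.2)
```

In this toy case term_x and term_y fall into one group and term_z stays alone. A larger threshold merges more aggressively; setting linkage_criterion="off" corresponds to skipping the grouping step entirely.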

Returns: Graph: A Graph object representing the processed network, ready for analysis and visualization.

graph = risk.load_graph(
    network=network,
    annotation=annotation,
    stats_results=stats_permutation,
    tail="right",
    pval_cutoff=0.01,
    fdr_cutoff=0.9999,
    display_prune_threshold=0.0,
    linkage_criterion="distance",
    linkage_method="average",
    linkage_metric="yule",
    linkage_threshold=0.2,
    min_cluster_size=5,
    max_cluster_size=1000,
)

Summarize results

Inspect matched members, counts, and significance in a DataFrame.

summary_df = graph.summary.load()
summary_df.head()

Notes

  • Raw columns are linkage-independent. Domain columns depend on linkage.
  • graph.summary.load() caches the computed summary table for faster repeated access; the cache is automatically invalidated when domains are modified via graph.pop(...).

Column definitions

Each column and how it's calculated:

  • Domain ID: Domain assignment from domain-conditioned results. Defaults to -1 when no domain row is retained for that annotation.
  • Annotation: Annotation label from the input annotation list.
  • Matched Members: All network member labels matched to that annotation. Empty when no domain row is retained.
  • Matched Count: Number of labels in Matched Members (0 if empty). Cast to integer in final output.
  • Raw Enrichment P-value: Smallest enrichment p-value for that annotation across the full network.
  • Raw Enrichment Q-value: Smallest enrichment q-value (Benjamini–Hochberg corrected) for that annotation across the full network.
  • Raw Depletion P-value: Smallest depletion p-value for that annotation across the full network.
  • Raw Depletion Q-value: Smallest depletion q-value (Benjamini–Hochberg corrected) for that annotation across the full network.
  • Domain Enrichment P-value: Smallest enrichment p-value for that annotation within the assigned domain.
  • Domain Enrichment Q-value: Smallest enrichment q-value (Benjamini–Hochberg corrected) for that annotation within the assigned domain.
  • Domain Depletion P-value: Smallest depletion p-value for that annotation within the assigned domain.
  • Domain Depletion Q-value: Smallest depletion q-value (Benjamini–Hochberg corrected) for that annotation within the assigned domain.
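
The "smallest value across the full network" wording in the raw columns can be read as a column-wise minimum over a p-value matrix. The sketch below assumes a matrix with one row per tested unit and one column per annotation; that layout and the numbers are illustrative assumptions, not a description of how RISK stores its results.

```python
# Hypothetical enrichment p-value matrix:
# one row per tested unit, one column per annotation.
annotations = ["term_a", "term_b"]
enrichment_pvals = [
    [0.20, 0.004],
    [0.03, 0.600],
    [0.90, 0.050],
]

# Column-wise minimum: the smallest p-value each annotation reaches anywhere
raw_enrichment_pval = {
    name: min(row[col] for row in enrichment_pvals)
    for col, name in enumerate(annotations)
}
```

The domain columns would apply the same reduction, restricted to the rows that belong to the annotation's assigned domain.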

Why do I see Domain ID = -1?

Domain ID = -1 means the annotation had no retained domain assignment after merge. It does not mean raw statistics are missing.

Raw columns are computed independently from full p-value/q-value matrices and can still be informative even when Domain ID = -1.
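
One way to act on this is to pull out the annotations with Domain ID = -1 and rank them by their raw statistics. The sketch below works on plain dictionaries shaped like the output of summary_df.to_dict("records"); the rows themselves are made up for illustration.

```python
# Hypothetical rows, shaped like summary_df.to_dict("records")
rows = [
    {"Domain ID": 1, "Annotation": "term_a", "Raw Enrichment P-value": 0.001},
    {"Domain ID": -1, "Annotation": "term_b", "Raw Enrichment P-value": 0.004},
    {"Domain ID": -1, "Annotation": "term_c", "Raw Enrichment P-value": 0.700},
]

# Annotations without a retained domain assignment
unassigned = [row for row in rows if row["Domain ID"] == -1]

# Rank them by raw enrichment p-value, strongest signal first
unassigned_ranked = sorted(unassigned, key=lambda row: row["Raw Enrichment P-value"])
```

Here term_b has no domain but still carries a small raw p-value, which is the situation the note above describes.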

Export Summary

Export the processed summary table in common formats for downstream use or sharing.

Shared Parameters:

All export methods take the same parameter.

  • filepath (str): The path where the file will be saved.

graph.summary.to_csv("./data/csv/summary/michaelis_2023.csv")
graph.summary.to_json("./data/json/summary/michaelis_2023.json")
graph.summary.to_txt("./data/txt/summary/michaelis_2023.txt")
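
The documentation does not say whether the export methods create missing parent directories, so it may be safer to create them first. The helper below (prepare_export_path is a hypothetical name, not part of RISK) is a small sketch using only the standard library.

```python
from pathlib import Path


def prepare_export_path(filepath):
    """Create the parent directory of filepath if needed and return the path as str."""
    path = Path(filepath)
    # parents=True builds intermediate folders; exist_ok=True tolerates reruns
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)
```

It can then wrap any of the export calls, for example graph.summary.to_csv(prepare_export_path("./data/csv/summary/michaelis_2023.csv")).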

Clean domains

Remove a domain (in-place) and retrieve its node labels:

domain_1_labels = graph.pop(1)
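
The pattern described here and in the caching note (remove a domain, return its labels, and invalidate the cached summary) can be pictured with a toy class. ToyGraph and its load_summary method are hypothetical stand-ins written for this sketch; they are not the RISK Graph class and make no claim about its internals.

```python
class ToyGraph:
    """Hypothetical stand-in that mimics pop-with-cache-invalidation."""

    def __init__(self, domain_id_to_node_labels_map):
        self.domain_id_to_node_labels_map = dict(domain_id_to_node_labels_map)
        self._summary_cache = None

    def load_summary(self):
        # Compute once, then reuse until a domain is removed
        if self._summary_cache is None:
            self._summary_cache = {
                domain_id: len(labels)
                for domain_id, labels in self.domain_id_to_node_labels_map.items()
            }
        return self._summary_cache

    def pop(self, domain_id):
        # Remove the domain in place and hand back its node labels
        labels = self.domain_id_to_node_labels_map.pop(domain_id)
        self._summary_cache = None  # the next load_summary() recomputes
        return labels


toy = ToyGraph({1: ["node_a", "node_b"], 2: ["node_c"]})
before = toy.load_summary()
removed = toy.pop(1)
after = toy.load_summary()
```

After the pop, the recomputed summary no longer contains domain 1, which is why a summary loaded before graph.pop(...) should be reloaded afterwards.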

Key Attributes

The Graph object exposes several mappings for cluster and node information:

Domain-Level

  • domain_id_to_node_ids_map: Maps each domain ID to the list of node IDs belonging to that domain.
  • domain_id_to_node_labels_map: Maps each domain ID to the list of node labels in that domain for readable visualization.
  • domain_id_to_enriched_node_labels_map: Maps each domain ID to node labels with non-zero domain signal (enrichment or depletion) for that domain.
    • Unlike domain_id_to_node_labels_map, nodes may appear in multiple domains in this mapping, reflecting functional association rather than primary (layout) domain assignment. This attribute underlies blended node coloring and pleiotropic interpretation.
  • domain_id_to_domain_terms_map: Maps each domain ID to the list of overrepresented/significant terms associated with that domain.
  • domain_id_to_domain_info_map: Maps each domain ID to a metadata record (e.g., size, p-value, FDR, summary) about the domain.
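
Because nodes may appear under several domains in domain_id_to_enriched_node_labels_map, inverting that mapping is a quick way to find candidates for pleiotropic interpretation. The dictionary below is a hypothetical example of the mapping's shape, not real output.

```python
from collections import defaultdict

# Hypothetical contents of graph.domain_id_to_enriched_node_labels_map
domain_id_to_enriched_node_labels_map = {
    1: ["node_a", "node_b"],
    2: ["node_b", "node_c"],
}

# Invert the mapping: node label -> domain IDs in which it carries signal
label_to_domains = defaultdict(list)
for domain_id, labels in domain_id_to_enriched_node_labels_map.items():
    for label in labels:
        label_to_domains[label].append(domain_id)

# Nodes associated with more than one domain
multi_domain = {
    label: domain_ids
    for label, domain_ids in label_to_domains.items()
    if len(domain_ids) > 1
}
```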

Node-Level

  • node_id_to_node_label_map: Maps each internal node ID to its display label.
  • node_label_to_node_id_map: Maps each display label back to its internal node ID.
  • node_label_to_significance_map: Maps each node label to its significance score from the analysis.
  • node_significance_sums: Array of aggregate significance values per node, used for sizing, coloring, or ranking.

These attributes enable visualization, labeling, and export functionalities.
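
For example, the node-level significance map lends itself to simple ranking. The sketch below uses a hypothetical dictionary in place of graph.node_label_to_significance_map and assumes that larger scores indicate stronger significance, which the text above does not state explicitly.

```python
# Hypothetical contents of graph.node_label_to_significance_map
node_label_to_significance_map = {"node_a": 0.0, "node_b": 3.2, "node_c": 1.5}

# Rank nodes from highest to lowest score (assumes larger means stronger)
ranked = sorted(
    node_label_to_significance_map.items(),
    key=lambda item: item[1],
    reverse=True,
)
top_two = ranked[:2]
```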


Next Step

Visualizing Networks in RISK