Skip to content

Loading and Associating Annotation Data

An annotation maps biological terms to network nodes (e.g., Gene Ontology categories mapping GO terms to genes). RISK supports multiple input formats with dedicated loaders.


Supported Input Formats

Format Function Example File
.json load_annotation_json() go_biological_process.json
.csv load_annotation_csv() go_biological_process.csv
.tsv load_annotation_tsv() go_biological_process.tsv
.xlsx/.xls load_annotation_excel() go_biological_process.xlsx
dict load_annotation_dict() Python-loaded JSON

Each method also accepts min_nodes_per_term and max_nodes_per_term to exclude underpowered or overly broad annotations.


JSON Annotation

annotation = risk.load_annotation_json(
    network=network,
    filepath="./data/json/annotation/go_biological_process.json",
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
  • Load term-to-node mappings from a JSON dictionary
  • Ideal for GO annotations exported from standard tools

CSV Annotation

annotation = risk.load_annotation_csv(
    network=network,
    filepath="./data/csv/annotation/go_biological_process.csv",
    label_colname="label",
    nodes_colname="nodes",
    nodes_delimiter=";",
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
  • Columns: one for labels, one for semicolon-separated nodes
  • Use for flat structured data

TSV Annotation

annotation = risk.load_annotation_tsv(
    network=network,
    filepath="./data/tsv/annotation/go_biological_process.tsv",
    label_colname="label",
    nodes_colname="nodes",
    nodes_delimiter=";",
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
  • Tab-delimited version of the CSV format

Excel Annotation

annotation = risk.load_annotation_excel(
    network=network,
    filepath="./data/excel/annotation/go_biological_process.xlsx",
    label_colname="label",
    nodes_colname="nodes",
    sheet_name="Sheet1",
    nodes_delimiter=";",
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
  • Specify a sheet name to target structured spreadsheets

Dictionary-Based Annotation

If you already have a dictionary loaded from another source:

import json

with open("./data/json/annotation/go_biological_process.json") as file:
    annotation_dict = json.load(file)

annotation = risk.load_annotation_dict(
    network=network,
    content=annotation_dict,
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)

Use this method to work with annotations already in memory.


Next Step

Proceed to 4. Statistics to evaluate term overrepresentation.