Loading and Associating Annotation Data
An annotation maps biological terms to network nodes (for example, a Gene Ontology annotation maps each GO term to the genes assigned to it). RISK provides a dedicated loader for each supported input format.
Supported Input Formats
Format | Function | Example File |
---|---|---|
.json | load_annotation_json() | go_biological_process.json |
.csv | load_annotation_csv() | go_biological_process.csv |
.tsv | load_annotation_tsv() | go_biological_process.tsv |
.xlsx / .xls | load_annotation_excel() | go_biological_process.xlsx |
dict | load_annotation_dict() | Python-loaded JSON |
Each loader also accepts min_nodes_per_term and max_nodes_per_term, which exclude terms that annotate too few nodes (underpowered) or too many nodes (overly broad).
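The effect of the two size thresholds can be illustrated with a small standalone sketch. This is not RISK's implementation: the helper name filter_terms_by_size, the inclusive bounds, and the choice to count only nodes present in the network are assumptions made for illustration.

```python
def filter_terms_by_size(term_to_nodes, network_nodes,
                         min_nodes_per_term=1, max_nodes_per_term=10_000):
    """Hypothetical helper: keep terms whose in-network node count is within bounds.

    Assumes inclusive bounds and that only nodes present in the network count.
    """
    network_nodes = set(network_nodes)
    kept = {}
    for term, nodes in term_to_nodes.items():
        # Only nodes that exist in the network contribute to the term size
        present = sorted(set(nodes) & network_nodes)
        if min_nodes_per_term <= len(present) <= max_nodes_per_term:
            kept[term] = present
    return kept


# Placeholder terms and nodes, for illustration only
example_terms = {
    "term_1": ["node_a", "node_b", "node_c"],
    "term_2": ["node_a"],            # too small when min_nodes_per_term=2
    "term_3": ["node_x", "node_y"],  # none of these nodes are in the network
}
filtered = filter_terms_by_size(
    example_terms,
    network_nodes=["node_a", "node_b", "node_c", "node_d"],
    min_nodes_per_term=2,
    max_nodes_per_term=3,
)
print(filtered)
```

With these placeholder inputs, only term_1 survives: term_2 falls below the minimum and term_3 has no nodes in the network.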
JSON Annotation
annotation = risk.load_annotation_json(
    network=network,
    filepath="./data/json/annotation/go_biological_process.json",
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
- Load term-to-node mappings from a JSON dictionary
- Ideal for GO annotations exported from standard tools
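As a rough illustration of what a term-to-node JSON dictionary might look like, the sketch below writes a small mapping to a temporary file and reads it back. The layout (each term label mapped to a list of node names) is an assumption based on the description above, and the term and node names are placeholders rather than real annotation data.

```python
import json
import os
import tempfile

# Assumed layout: each term label maps to a list of node names (placeholders)
annotation_dict = {
    "term_1": ["node_a", "node_b", "node_c"],
    "term_2": ["node_b", "node_d"],
}

# Write the mapping to a temporary JSON file
path = os.path.join(tempfile.mkdtemp(), "example_annotation.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(annotation_dict, fh, indent=2)

# Read it back to confirm the round trip preserves the mapping
with open(path, encoding="utf-8") as fh:
    reloaded = json.load(fh)

print(reloaded == annotation_dict)
```

A file produced this way could then be passed as filepath to the JSON loader, assuming the loader expects this layout.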
CSV Annotation
annotation = risk.load_annotation_csv(
    network=network,
    filepath="./data/csv/annotation/go_biological_process.csv",
    label_colname="label",
    nodes_colname="nodes",
    nodes_delimiter=";",
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
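- Expects two columns: one holding the term labels (label_colname) and one holding each term's nodes (nodes_colname), separated by nodes_delimiter (a semicolon here)
- Use for flat, tabular data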
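The column layout described above can be sketched with the standard library alone. The helper parse_annotation_table is hypothetical and only shows how a label column and a delimited nodes column map to a term-to-node dictionary; it is not how RISK parses files internally, and the rows are placeholders.

```python
import csv
import io

# Placeholder CSV content with the column names used in the call above
csv_text = (
    "label,nodes\n"
    "term_1,node_a;node_b;node_c\n"
    "term_2,node_b;node_d\n"
)


def parse_annotation_table(text, label_colname="label", nodes_colname="nodes",
                           nodes_delimiter=";", field_delimiter=","):
    """Hypothetical helper: turn a delimited table into {term: [nodes]}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=field_delimiter)
    term_to_nodes = {}
    for row in reader:
        # Split the nodes cell and drop empty entries such as a trailing ";"
        nodes = [n.strip() for n in row[nodes_colname].split(nodes_delimiter) if n.strip()]
        term_to_nodes[row[label_colname]] = nodes
    return term_to_nodes


parsed = parse_annotation_table(csv_text)
print(parsed)
```

Passing field_delimiter="\t" to the same helper handles tab-delimited input.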
TSV Annotation
annotation = risk.load_annotation_tsv(
    network=network,
    filepath="./data/tsv/annotation/go_biological_process.tsv",
    label_colname="label",
    nodes_colname="nodes",
    nodes_delimiter=";",
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
- Tab-delimited version of the CSV format
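If an annotation exists only as a Python dictionary, a tab-delimited table in this layout could be produced along the following lines. The helper annotation_to_tsv is hypothetical, the column names mirror the arguments in the call above, and the data are placeholders.

```python
import csv
import io


def annotation_to_tsv(term_to_nodes, label_colname="label",
                      nodes_colname="nodes", nodes_delimiter=";"):
    """Hypothetical helper: serialize {term: [nodes]} as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([label_colname, nodes_colname])  # header row
    for term, nodes in term_to_nodes.items():
        # Join the nodes of each term into a single delimited cell
        writer.writerow([term, nodes_delimiter.join(nodes)])
    return buffer.getvalue()


tsv_text = annotation_to_tsv({
    "term_1": ["node_a", "node_b"],
    "term_2": ["node_c"],
})
print(tsv_text)
```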
Excel Annotation
annotation = risk.load_annotation_excel(
    network=network,
    filepath="./data/excel/annotation/go_biological_process.xlsx",
    label_colname="label",
    nodes_colname="nodes",
    sheet_name="Sheet1",
    nodes_delimiter=";",
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
- Use sheet_name to select the worksheet that holds the annotation table
Dictionary-Based Annotation
If you already have a dictionary loaded from another source:
import json
# Read the term-to-node mapping into memory
with open("./data/json/annotation/go_biological_process.json") as file:
    annotation_dict = json.load(file)
# Pass the in-memory dictionary to RISK
annotation = risk.load_annotation_dict(
    network=network,
    content=annotation_dict,
    min_nodes_per_term=1,
    max_nodes_per_term=10_000,
)
Use this method to work with annotations already in memory.
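In-memory annotations are sometimes keyed by node rather than by term. The sketch below inverts such a mapping into the term-to-node shape, under the assumption that load_annotation_dict() expects the same layout as the JSON file (term label mapped to a list of nodes). The helper invert_node_to_terms and all names in the data are hypothetical.

```python
from collections import defaultdict


def invert_node_to_terms(node_to_terms):
    """Hypothetical helper: convert {node: [terms]} into {term: [nodes]}."""
    term_to_nodes = defaultdict(list)
    for node, terms in node_to_terms.items():
        for term in terms:
            term_to_nodes[term].append(node)
    # Sort node lists so the output is deterministic
    return {term: sorted(nodes) for term, nodes in term_to_nodes.items()}


# Placeholder node-centric annotation
node_to_terms = {
    "node_a": ["term_1", "term_2"],
    "node_b": ["term_1"],
    "node_c": ["term_2"],
}
annotation_dict = invert_node_to_terms(node_to_terms)
print(annotation_dict)
```

The resulting dictionary could then be supplied as the content argument, provided the assumed layout matches what the loader expects.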
Next Step
Proceed to 4. Statistics to evaluate term overrepresentation.