CRAG: Causal Reasoning for Adversomics Graphs

CRAG (Causal Reasoning for Adversomics Graphs) is a dual-encoder model for extracting adverse drug event (ADE) relationships from clinical narratives.

Model Description

CRAG uses a dual-encoder architecture with:

Two separate BioLinkBERT encoders: One for drug mentions, one for adverse event mentions
Mean pooling: Sequence representation aggregation
Bilinear fusion: Captures pairwise interactions between the drug and adverse drug reaction (ADR) embeddings
Multi-view concatenation: Combines bilinear output, individual embeddings, and element-wise products
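
The components listed above can be sketched in PyTorch as follows. This is a minimal illustration, not the released implementation: the class name `FusionHead`, the layer sizes, and the number of output classes are assumptions, and random tensors stand in for the two BioLinkBERT encoder outputs.

```python
import torch
import torch.nn as nn


def mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors over non-padding positions.

    hidden: (batch, seq_len, dim); mask: (batch, seq_len), 1 for real tokens.
    """
    weights = mask.unsqueeze(-1).to(hidden.dtype)
    summed = (hidden * weights).sum(dim=1)
    counts = weights.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts


class FusionHead(nn.Module):
    """Hypothetical fusion head; sizes are illustrative, not CRAG's."""

    def __init__(self, dim: int = 768, bilinear_dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, bilinear_dim)
        # Four views: bilinear output, drug vector, ADR vector, element-wise product
        self.classifier = nn.Linear(bilinear_dim + 3 * dim, num_classes)

    def forward(self, drug_vec: torch.Tensor, adr_vec: torch.Tensor) -> torch.Tensor:
        fused = self.bilinear(drug_vec, adr_vec)
        views = torch.cat([fused, drug_vec, adr_vec, drug_vec * adr_vec], dim=-1)
        return self.classifier(views)


# Random tensors stand in for the two encoders' last hidden states
torch.manual_seed(0)
hidden_drug = torch.randn(2, 5, 16)
hidden_adr = torch.randn(2, 5, 16)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])

head = FusionHead(dim=16, bilinear_dim=8, num_classes=2)
pooled_drug = mean_pool(hidden_drug, mask)
pooled_adr = mean_pool(hidden_adr, mask)
logits = head(pooled_drug, pooled_adr)
print(logits.shape)  # torch.Size([2, 2])
```

Concatenating the raw embeddings and their product alongside the bilinear output gives the classifier access to both the individual representations and their interaction terms.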

Training Approach

The model is trained in two phases:

Phase 1 - Contrastive Pre-training: InfoNCE loss with hard negative mining to learn discriminative drug-ADR embeddings
Phase 2 - Classification Fine-tuning: Focal loss to handle class imbalance and refine the classifier
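
The two objectives can be sketched as follows. This is an illustration under stated assumptions rather than the training code used for CRAG: the temperature and `gamma` values are placeholders, the InfoNCE version uses only in-batch negatives (hard negative mining is omitted), and the function names are hypothetical.

```python
import torch
import torch.nn.functional as F


def info_nce(drug_emb: torch.Tensor, adr_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """In-batch InfoNCE: row i of drug_emb is the positive for row i of adr_emb."""
    drug_emb = F.normalize(drug_emb, dim=-1)
    adr_emb = F.normalize(adr_emb, dim=-1)
    logits = drug_emb @ adr_emb.T / temperature  # (batch, batch) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Multi-class focal loss: cross-entropy down-weighted by (1 - p_t) ** gamma."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt
    return loss.mean()


# Toy inputs: aligned pairs should score a lower contrastive loss than misaligned ones
emb = torch.eye(4)
aligned = info_nce(emb, emb)
misaligned = info_nce(emb, emb.roll(1, dims=0))

toy_logits = torch.tensor([[2.0, -1.0], [0.5, 0.3], [-1.0, 1.5]])
toy_targets = torch.tensor([0, 1, 1])
fl = focal_loss(toy_logits, toy_targets)
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy; larger values shrink the contribution of examples the classifier already gets right, which is what makes it useful under class imbalance.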

Performance

Test Set Results

Metric	Score
F1 Score	0.9264705882352942
Precision	0.9163636363636364
Recall	0.9368029739776952
AUC-ROC	0.9656523724050712
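
As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet below uses only the values from the table; the variable names are illustrative.

```python
# Values copied from the test set results table
precision = 0.9163636363636364
recall = 0.9368029739776952
reported_f1 = 0.9264705882352942

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(abs(f1 - reported_f1) < 1e-9)  # True
```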

Usage

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Load the BioLinkBERT tokenizer and register the entity marker tokens
tokenizer = AutoTokenizer.from_pretrained("michiyasunaga/BioLinkBERT-base")
tokenizer.add_special_tokens({"additional_special_tokens": ["[DRUG]", "[/DRUG]", "[ADR]", "[/ADR]"]})

# Download the checkpoint from the Hugging Face Hub and load its weights on the CPU
model_path = hf_hub_download(repo_id="chrisvoncsefalvay/CRAG-dual-encoder-mimicause", filename="pytorch_model.pt")
state_dict = torch.load(model_path, map_location="cpu")
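
The marker tokens registered above suggest that drug and ADR mentions are wrapped in `[DRUG]…[/DRUG]` and `[ADR]…[/ADR]` before tokenization. The exact input format CRAG expects is not documented here, so the helper below is a hypothetical sketch of one way to prepare such text from character offsets. The function name `mark_entities`, the spacing around the markers, and the example sentence are all assumptions.

```python
def mark_entities(text: str, drug_span: tuple, adr_span: tuple) -> str:
    """Wrap the drug and ADR mentions in marker tokens.

    Spans are (start, end) character offsets, end exclusive, and must not
    overlap or touch each other.
    """
    inserts = [
        (drug_span[0], "[DRUG] "),
        (drug_span[1], " [/DRUG]"),
        (adr_span[0], "[ADR] "),
        (adr_span[1], " [/ADR]"),
    ]
    # Insert from right to left so the remaining offsets stay valid
    for pos, marker in sorted(inserts, key=lambda item: item[0], reverse=True):
        text = text[:pos] + marker + text[pos:]
    return text


# Made-up example sentence, for illustration only
text = "Patient developed a rash after starting amoxicillin."
drug_start = text.index("amoxicillin")
adr_start = text.index("rash")
marked = mark_entities(
    text,
    (drug_start, drug_start + len("amoxicillin")),
    (adr_start, adr_start + len("rash")),
)
print(marked)
# Patient developed a [ADR] rash [/ADR] after starting [DRUG] amoxicillin [/DRUG].
```

The marked string can then be passed to the tokenizer, which keeps each registered marker as a single token.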