Graph Transformer Attention Heatmap
About This MicroSim
Standard message-passing GNNs (MPNNs) aggregate information only from immediate neighbors in each layer. Graph transformers such as GPS (General, Powerful, Scalable Graph Transformer) pair a local message-passing module with a global attention module, letting every node attend directly to every other node in a single layer.
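The local-plus-global pairing can be sketched in a few lines of plain Python. This is a simplified illustration, not the GPS implementation: mean aggregation stands in for the message-passing module, the attention uses raw node features with no learned projections, and the two outputs are summed with a residual connection while the MLP and normalization steps are left out. The function names (`local_update`, `global_update`, `gps_like_layer`) and the toy graph are assumptions made for this example.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def local_update(x, adj):
    """Mean of each node's neighbor features (a stand-in for the MPNN module)."""
    dim = len(x[0])
    return [
        [sum(x[j][d] for j in nbrs) / max(len(nbrs), 1) for d in range(dim)]
        for nbrs in adj
    ]

def global_update(x):
    """Single-head dot-product self-attention over all nodes (no learned weights)."""
    n, dim = len(x), len(x[0])
    scale = math.sqrt(dim)
    out, weights = [], []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / scale for j in range(n)]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * x[j][d] for j in range(n)) for d in range(dim)])
    return out, weights

def gps_like_layer(x, adj):
    """Sum the local and global outputs with a residual connection."""
    local = local_update(x, adj)
    glob, weights = global_update(x)
    h = [
        [xi + li + gi for xi, li, gi in zip(x_row, l_row, g_row)]
        for x_row, l_row, g_row in zip(x, local, glob)
    ]
    return h, weights

# Toy path graph 0-1-2-3 with 2-dimensional node features (illustrative values).
adj = [[1], [0, 2], [1, 3], [2]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
h, attn = gps_like_layer(x, adj)
```

Here `attn[0][3]` is nonzero even though atoms 0 and 3 are three hops apart, while `local_update` only ever mixes in direct neighbors.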
This MicroSim overlays an attention heatmap on a molecular graph. Click a query node to see which atoms receive the highest attention weights under both MPNN-local and transformer-global attention. As the number of GPS layers increases, the heatmap shows attention weight spreading farther across the molecule.
Learning objective (Bloom's Level 4, Analyze): Compare strictly local MPNN attention with global multi-head self-attention on a molecular graph, and observe how stacking GPS layers lets attention spread from local to long-range.
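To make the local-versus-global contrast concrete, the sketch below computes which atoms can influence a query node after k rounds of message passing, using breadth-first search on an undirected graph. The six-atom chain and the function name `k_hop_receptive_field` are hypothetical and chosen only for illustration; the MicroSim's actual molecule may differ.

```python
from collections import deque

def k_hop_receptive_field(adj, query, k):
    """Nodes within k hops of `query` (including the query itself)."""
    dist = {query: 0}
    queue = deque([query])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand past k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# Hypothetical 6-atom chain: 0-1-2-3-4-5
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

for layers in range(1, 5):
    print(layers, sorted(k_hop_receptive_field(chain, 0, layers)))

# Global attention has no hop limit: every atom is a candidate in one layer.
global_field = set(chain)
```

For the terminal atom 0, each added message-passing layer extends the reachable set by one atom along the chain, whereas `global_field` already contains all six atoms.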
How to Use
- Select a query node: click any atom in the molecular graph.
- Toggle mode: switch between "MPNN (local)" and "Transformer (global)" attention.
- Layer depth slider: increase the number of GPS layers (1–4) to see how global attention coverage expands.
- Heatmap: edge color and width encode the attention weight from the query to every other atom.
- Head selector: view each of the 4 attention heads separately.
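The mode toggle and head selector can be thought of as a masked, multi-head softmax over one row of the attention matrix. The sketch below is an assumption about how such values could be produced, not the MicroSim's source code: random per-head feature vectors stand in for learned query/key projections, and "local" mode simply masks every atom that is not a direct neighbor of the query. The names `attention_row` and `heatmap_rows` are hypothetical.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # assumes at least one unmasked (finite) score
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def attention_row(q, keys, allowed=None):
    """Attention weights from one query vector to every key.

    `allowed` is an optional set of target indices; all others are masked out.
    """
    scale = math.sqrt(len(q))
    scores = []
    for j, k in enumerate(keys):
        if allowed is not None and j not in allowed:
            scores.append(-math.inf)
        else:
            scores.append(sum(a * b for a, b in zip(q, k)) / scale)
    return softmax(scores)

def heatmap_rows(head_feats, adj, query, mode):
    """One attention row per head; `mode` is "local" or "global"."""
    allowed = set(adj[query]) if mode == "local" else None
    return [attention_row(feats[query], feats, allowed) for feats in head_feats]

random.seed(0)
NUM_HEADS, NUM_ATOMS, HEAD_DIM = 4, 5, 3
adj = [[1], [0, 2, 3], [1], [1, 4], [3]]  # hypothetical 5-atom graph
head_feats = [
    [[random.gauss(0, 1) for _ in range(HEAD_DIM)] for _ in range(NUM_ATOMS)]
    for _ in range(NUM_HEADS)
]

local_rows = heatmap_rows(head_feats, adj, query=0, mode="local")
global_rows = heatmap_rows(head_feats, adj, query=0, mode="global")
```

Each row sums to 1, so its entries could be mapped directly to edge color or width. In this sketch the global row also includes the query's weight on itself, which a renderer would likely skip.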
Iframe Embed Code
You can embed this MicroSim in any web page with the following HTML:
<iframe src="https://AnvithPothula.github.io/graph-neural-networks-textbook/sims/ch11-graph-transformer-attention/main.html"
height="522"
width="100%"
scrolling="no"></iframe>
Lesson Plan
Grade Level
Undergraduate / Graduate (College Level)
Duration
20–30 minutes
Prerequisites
GCN and GAT (Chapters 6–7). Self-attention in transformers (basic familiarity). Over-squashing (Chapter 8).
Activities
- On the molecular graph, pick a terminal atom (degree 1). Under MPNN with 2 layers, which atoms can it see? Under global attention, how many can it attend to directly?
- At GPS depth 1, compare the local (MPNN) and global (transformer) attention patterns for the same query node. Which atoms differ most?
- Explain how Graphormer encodes shortest-path distance as a bias in the attention score.
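As a starting point for the Graphormer activity, the sketch below adds a distance-indexed scalar bias to each attention score before the softmax, so the score from atom i to atom j becomes (q_i · k_j) / sqrt(d) + b[spd(i, j)]. It is a minimal illustration under stated assumptions: the bias values in `bias_table` are made up (in Graphormer they are learned), the graph is assumed connected, and Graphormer's other components, such as its centrality and edge encodings, are omitted.

```python
import math
from collections import deque

def shortest_path_lengths(adj, source):
    """Hop distance from `source` to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def biased_attention_row(q, keys, spd, bias_table):
    """Softmax over dot-product scores plus a bias looked up by hop distance."""
    scale = math.sqrt(len(q))
    scores = [
        sum(a * b for a, b in zip(q, k)) / scale + bias_table[spd[j]]
        for j, k in enumerate(keys)
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Path graph 0-1-2-3; identical features so only the distance bias differs.
adj = [[1], [0, 2], [1, 3], [2]]
feats = [[1.0, 0.0] for _ in range(4)]
bias_table = {0: 0.0, 1: -0.5, 2: -1.0, 3: -1.5}  # hypothetical, learned in practice

spd = shortest_path_lengths(adj, 0)
weights = biased_attention_row(feats[0], feats, spd, bias_table)
```

Because every atom has the same features here, the resulting weights fall off purely with hop distance from atom 0. With a learned table, the model is free to make the bias grow, shrink, or vary non-monotonically with distance.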
Assessment Question
Describe the GPS architecture: what are the local and global modules, and how are their outputs combined? Explain why this hybrid can outperform a pure MPNN or a pure transformer on many graph tasks.
References
- Rampášek et al. (2022). Recipe for a General, Powerful, Scalable Graph Transformer. NeurIPS.
- Ying et al. (2021). Do Transformers Really Perform Bad for Graph Representation? (Graphormer). NeurIPS.
Part of Chapter 11: Graph Transformers.