References: Graph Transformers

Transformer (machine learning model) - Wikipedia - Covers the original Transformer architecture including multi-head self-attention, positional encodings, and the encoder-decoder structure. Provides essential background on the base mechanism that Graph Transformers adapt for graph-structured data.
Graph neural network - Wikipedia - Broad overview of the GNN family including message-passing frameworks, expressiveness limitations relative to the Weisfeiler–Leman test, and key application domains. Contextualizes where Graph Transformers fit within the wider GNN landscape.
Spectral graph theory - Wikipedia - Explains the graph Laplacian, its eigendecomposition, and the relationship between spectral properties and graph structure. Directly underpins Laplacian positional encodings (LapPE) used in SAN, GPS, and related Graph Transformer architectures.
Deep Learning on Graphs - Ma, Y. and Tang, J. - Cambridge University Press - A graduate-level textbook covering spectral and spatial GNNs, scalability, and applications. Its treatment of attention-based graph filters provides mathematical grounding for Graph Transformer design choices.
Graph Representation Learning - Hamilton, W. L. - Morgan & Claypool (Synthesis Lectures on AI and ML) - Concise treatment of node embeddings, GNN expressiveness, and the limitations of local aggregation. Provides theoretical foundations (Weisfeiler–Leman hierarchy, spectral convolutions) that motivate the move to global self-attention in Graph Transformers.
Ying, C., et al. "Do Transformers Really Perform Bad for Graph Representation?" (Graphormer) - arXiv - Introduces Graphormer, which incorporates centrality encoding, spatial encoding, and edge encoding into standard Transformer attention. Graphormer won the quantum chemistry (PCQM4M-LSC) track of the OGB Large-Scale Challenge at KDD Cup 2021, and the paper was published at NeurIPS 2021; the result established Graph Transformers as competitive with GNNs on real benchmarks.
Rampášek, L., et al. "Recipe for a General, Powerful, Scalable Graph Transformer" (GPS) - arXiv - Proposes the GPS framework combining local MPNN layers with global self-attention and flexible positional/structural encodings. Systematic ablations across 11 datasets make this a primary reference for understanding when and why the hybrid design outperforms either component alone.
Kreuzer, D., et al. "Rethinking Graph Transformers with Spectral Attention" (SAN) - arXiv - Introduces SAN, which uses the full Laplacian eigenvector basis and separate attention for connected vs. disconnected pairs. Shows that spectral positional encodings improve expressiveness beyond 1-WL, providing theoretical justification for LapPE.
PyTorch Geometric — GraphGPS and Positional Encodings documentation - PyTorch Geometric Docs - Official API reference for the GPS model implementation in PyG, including configuration of MPNN type, attention type, and positional encoding modules (LapPE, RWSE). The practical starting point for running GPS experiments.
Papers With Code — Graph Transformer benchmark leaderboards - Papers With Code - Aggregates published results for Graph Transformer variants across molecular, social, and citation benchmarks with links to code repositories. Useful for tracking state-of-the-art performance and comparing Graphormer, GPS, SAN, GRIT, and Exphormer on standardized splits.