For a graceful introduction to both, we warmly recommend the book by
Hamilton (2020). While it is somewhat dated in the specific GNN models it
studies, it provides a gentle and intuitive overview of computing important
metrics on a graph, as well as many relevant models that preceded graph neural
networks (such as node embedding techniques).
Then, should you wish to dive deeper into the many exciting facets of
graph machine learning, the book by L. Wu et al. (2022) is an excellent and
comprehensive reference that still holds up very well today.
In parallel with diving deeper into GNNs, or even before doing so, we
believe it is worthwhile to gain a strong foundation in graph theory. This will
be of significant value, both for building important intuitions for understanding
GNNs and for finding inspiration on ways to further improve them. For this, we
recommend the foundational text by Bollobás (1998).
Exercises
1. A linear permutation-equivariant function on a set is a function of the form F(X) =
FX satisfying FPX = PFX for every permutation matrix P. Show that any linear
permutation-equivariant function can be written as a linear combination of the
identity and average functions, i.e. FX = (αI + β11⊤)X, for some scalars α, β.
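
A quick numerical sanity check of this form (not a substitute for the proof asked for above) can be written in a few lines of NumPy; the coefficients α, β and the dimensions below are arbitrary illustrative choices:

    import numpy as np

    # Check, for one random instance, that F = alpha*I + beta*1 1^T
    # satisfies F P X = P F X for a permutation matrix P.
    rng = np.random.default_rng(0)
    n, k = 5, 3
    alpha, beta = 0.7, -0.2                 # arbitrary coefficients
    F = alpha * np.eye(n) + beta * np.ones((n, n))

    X = rng.normal(size=(n, k))             # one row of features per set element
    P = np.eye(n)[rng.permutation(n)]       # random permutation matrix

    assert np.allclose(F @ (P @ X), P @ (F @ X))
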
2. Let X be a feature space containing a countable set of possible values. This means
that every possible set of values coming from X can be represented by a bitmask in
B^X (where B = {0, 1} is the set of bits), and any permutation-invariant real function
over such a set must be representable as f : B^X → R. Show that such a function can
always be expressed as a Deep Sets model, i.e., f(X) = ϕ(∑_{i∈V} ψ(x_i)), for some
choice of ψ : X → R and ϕ : R → R, for every set V ∈ B^X.
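
For readers who prefer code to notation, here is a minimal Deep Sets sketch of the required shape; the particular ψ and ϕ below are placeholder choices, since the exercise only asks for their existence:

    import numpy as np

    # Minimal Deep Sets sketch: f(X) = phi( sum_i psi(x_i) ).
    def psi(x):
        # per-element encoding into a single real number (placeholder choice)
        return float(np.tanh(x).sum())

    def phi(s):
        # readout applied to the pooled value (placeholder choice)
        return s ** 2 + 1.0

    def deep_sets(X):
        return phi(sum(psi(x) for x in X))

    X = [np.array([0.3, -1.2]), np.array([2.0, 0.5])]
    # Reordering the set does not change the output (permutation invariance).
    assert np.isclose(deep_sets(X), deep_sets(list(reversed(X))))
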
3. Consider the recipe for building permutation-equivariant graph neural networks
F(X, A) from Equation 5.53. Recall how it leverages a local function ϕ(x_u, X_{N_u}),
applied to the neighbourhood of each node u ∈ V in parallel. Prove that, if ϕ is
permutation invariant, that is, ϕ(x_u, PX_{N_u}) = ϕ(x_u, X_{N_u}), then F will be
permutation equivariant, that is, F(PX, PAP⊤) = PF(X, A), for every permutation
matrix P. (Note that, when N_u ⊂ V, the space of P differs across the two equations.)
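
To make the statement concrete, the sketch below instantiates the recipe with one specific permutation-invariant ϕ (concatenating a node's features with the sum of its neighbours' features) and checks the equivariance property numerically; this is an illustrative choice, not the general construction of Equation 5.53:

    import numpy as np

    # One concrete, permutation-invariant local function:
    # phi(x_u, X_{N_u}) = [x_u || sum of neighbour features].
    def gnn_layer(X, A):
        # A @ X sums each node's neighbour features, which does not depend
        # on the ordering of the neighbourhood multiset.
        return np.concatenate([X, A @ X], axis=-1)

    rng = np.random.default_rng(0)
    n, k = 6, 4
    X = rng.normal(size=(n, k))
    A = (rng.random((n, n)) < 0.4).astype(float)   # random binary adjacency

    P = np.eye(n)[rng.permutation(n)]
    # Permutation equivariance: F(P X, P A P^T) = P F(X, A).
    assert np.allclose(gnn_layer(P @ X, P @ A @ P.T), P @ gnn_layer(X, A))
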
4. A popular way to implement attentional GNNs computes attentional logits as
a(x_u, x_v) = σ(a⊤W[x_u ∥ x_v]), where a ∈ R^l and W ∈ R^{l×2k} are the parameters,
∥ : R^n × R^m → R^{n+m} is vector concatenation, and σ : R → R is a monotonic
activation function. Another popular variant applies the two operations in reverse
order: a(x_u, x_v) = a⊤σ(W[x_u ∥ x_v]). Explain, using a proof, which of the two
variants is more expressive (Hint: consider the ranking between attentional logits
across all the nodes in the neighbourhood). Which of the two forms is more scalable,
and why? (Note that, as the softmax function is monotonic, you may ignore it for
this question.)
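
The two parameterisations are easy to compare side by side in code; the dimensions, the leaky-ReLU slope and the random parameters below are illustrative assumptions rather than prescriptions of the exercise:

    import numpy as np

    rng = np.random.default_rng(0)
    k, l = 4, 8
    a = rng.normal(size=(l,))
    W = rng.normal(size=(l, 2 * k))

    def sigma(z):
        # a monotonic activation (leaky ReLU with an arbitrary slope)
        return np.where(z > 0, z, 0.2 * z)

    def logit_v1(x_u, x_v):
        # variant 1: sigma( a^T W [x_u || x_v] )
        return sigma(a @ (W @ np.concatenate([x_u, x_v])))

    def logit_v2(x_u, x_v):
        # variant 2: a^T sigma( W [x_u || x_v] )
        return a @ sigma(W @ np.concatenate([x_u, x_v]))

    x_u, x_v = rng.normal(size=(k,)), rng.normal(size=(k,))
    print(logit_v1(x_u, x_v), logit_v2(x_u, x_v))
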
5. Recall that subgraph GNNs should be equivariant to G = S_n × S_m, where S_n is the
n-element permutation group, n is the number of nodes in the graph and m is the