7
Group Convolution on Homogeneous Spaces
Convolutional neural networks rely on matching input features with
appropriately-transformed filter parameters; this idea extends well
beyond shift-equivariance on grids.
We can define a rich family of group convolutions which apply a more
general filter transformation, followed by an inner product with the
features over the domain—so long as the input domain is homogeneous.
For shift-equivariant functions on grids, the domain and group have
identical structure—constructing the more general case requires carefully
keeping track of the structure within the group.
This allows us to construct convolutional neural networks over spheres,
DNA sequences, 3D medical scans, and many more domains.
Our discussion of grids highlighted how shifts and convolutions are intimately connected: convolutions are linear shift-equivariant¹ operations, and vice versa, any shift-equivariant linear operator is a convolution. Furthermore, shift operators can be jointly diagonalised by the Fourier transform. As it turns out, this is part of a far larger story: both convolution and the Fourier transform can be defined for any group of symmetries that we can sum or integrate over.
Consider the Euclidean domain Ω = ℝ. We can understand the convolution as a pattern matching operation: we match shifted copies of a filter θ(u) with an input signal x(u). The value of the convolution (x ⋆ θ)(u) at a point u is the inner product of the signal x with the filter shifted by u,

(x ⋆ θ)(u) = ⟨x, S_u θ⟩ = ∫_ℝ x(v) θ(u + v) dv.²

Note that in this case u is both a point on the domain Ω = ℝ and also an element of the translation group, which we can identify with the domain itself, G = ℝ. We will now show how to generalise this construction, by simply replacing the translation group by another group G acting on Ω.
7.1 Domain
As discussed in Chapter 3, the action of the group G on the domain Ω induces a representation ρ of G on the space of signals X(Ω) via (ρ(g)x)(u) = x(g⁻¹u). In the above example, G is the translation group whose elements act by shifting the coordinates, u + v, whereas ρ(g) is the shift operator acting on signals as (S_v x)(u) = x(u − v). Finally, in order to apply a filter to the signal, we invoke our assumption of X(Ω) being a Hilbert space, with an inner product

⟨x, θ⟩ = ∫_Ω x(u) θ(u) du,³

where we assumed, for the sake of simplicity, scalar-valued signals, X(Ω, ℝ).
Having thus defined how to transform signals and match them with filters, we can define the group convolution for signals on Ω,

(x ⋆ θ)(g) = ⟨x, ρ(g)θ⟩ = ∫_Ω x(u) θ(g⁻¹u) du.        (7.96)

Note that x ⋆ θ takes values on the elements g of our group G rather than points on the domain Ω. Hence, the next layer, which takes x ⋆ θ as input, should act on signals defined on the group G, a point we will return to shortly.
Just as the traditional Euclidean convolution is shift-equivariant, the more general group convolution is G-equivariant. The key observation is that matching the signal x with a g-transformed filter ρ(g)θ is the same as matching the inverse-transformed signal ρ(g⁻¹)x with the untransformed filter θ. Mathematically, this can be expressed as ⟨x, ρ(g)θ⟩ = ⟨ρ(g⁻¹)x, θ⟩. With this insight, G-equivariance of the group convolution (Equation 7.96) follows immediately from its definition and the defining property ρ(h⁻¹)ρ(g) = ρ(h⁻¹g) of group representations,

(ρ(h)x ⋆ θ)(g) = ⟨ρ(h)x, ρ(g)θ⟩ = ⟨x, ρ(h⁻¹g)θ⟩ = (x ⋆ θ)(h⁻¹g) = (ρ(h)(x ⋆ θ))(g).
The G-equivariant group convolution can be seen as a “generator” of models: it
can provide a recipe for constructing G-equivariant neural networks given any
suitable G. In order to eventually understand how to ground this into concrete
model equations, we need to look at specific examples.
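Before turning to those examples, a minimal sketch may help make Equation 7.96 and the equivariance argument above concrete for a finite group acting on a finite domain. The representation of group elements as index permutations, and all names and values below, are illustrative assumptions rather than anything prescribed by the text; the cyclic group used here anticipates the grid case studied next.

```python
import numpy as np

def group_convolution(x, theta, actions):
    """Equation 7.96 for a finite group: (x ⋆ θ)(g) = Σ_u x(u) θ(g⁻¹u).

    x, theta : arrays of shape (n,), signals on the domain Ω = {0, ..., n−1}
    actions  : dict mapping each group element g to an index array `perm`
               with perm[u] = g.u (the action of g on domain points)
    """
    out = {}
    for g, perm in actions.items():
        inv = np.argsort(perm)          # inv[u] = g⁻¹.u
        out[g] = x @ theta[inv]         # ⟨x, ρ(g)θ⟩, since (ρ(g)θ)(u) = θ(g⁻¹u)
    return out

# Example: the cyclic group Z_n acting on Ω = Z_n by g.u = (u + g) mod n.
n = 8
rng = np.random.default_rng(0)
x, theta = rng.normal(size=n), rng.normal(size=n)
actions = {g: (np.arange(n) + g) % n for g in range(n)}

conv = group_convolution(x, theta, actions)

# G-equivariance: (ρ(h)x ⋆ θ)(g) = (x ⋆ θ)(h⁻¹g); here h⁻¹g = (g − h) mod n.
h = 3
x_h = x[np.argsort(actions[h])]          # (ρ(h)x)(u) = x(h⁻¹u)
conv_shifted = group_convolution(x_h, theta, actions)
for g in range(n):
    assert np.isclose(conv_shifted[g], conv[(g - h) % n])
```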
7.1.1 Grid convolution
The case of shift equivariance over the (one-dimensional) grid we have studied throughout Chapter 6 is obtained with the choice Ω = ℤ_n = {0, …, n−1} and the cyclic shift group G = ℤ_n. The group elements in this case are cyclic shifts of indices, i.e., an element g ∈ G can be identified with some u ∈ {0, …, n−1} such that g.v = (v − u) mod n, whereas the inverse element is g⁻¹.v = (v + u) mod n.
Figure 7.1
Left: Cosmic microwave background radiation, captured by the Planck space observatory, is a signal on S². Right: The action of the special orthogonal group, SO(3), on the sphere, S². Three types of rotation are possible; SO(3) is a three-dimensional manifold.
Importantly, in this example the elements of the group (shifts) are also elements of the domain (indices). We thus can, with some abuse of notation, identify the two structures (i.e., Ω = G); our expression for the group convolution in this case,

(x ⋆ θ)(g) = Σ_{v=0}^{n−1} x_v θ_{g⁻¹.v},

leads to the familiar convolution⁴

(x ⋆ θ)_u = Σ_{v=0}^{n−1} x_v θ_{(v+u) mod n}.
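As a quick numerical check of this familiar case (and a preview of the spectral implementation discussed in Section 7.2.3), the cyclic group convolution can be evaluated either directly or as an element-wise product in the Fourier domain. The sketch below is illustrative only and assumes real-valued signals.

```python
import numpy as np

n = 16
rng = np.random.default_rng(1)
x, theta = rng.normal(size=n), rng.normal(size=n)

# Direct evaluation of (x ⋆ θ)_u = Σ_v x_v θ_{(v+u) mod n}:
direct = np.array([np.sum(x * np.roll(theta, -u)) for u in range(n)])

# Spectral evaluation: circular cross-correlation is an element-wise product
# of spectra, conj(x̂) ⊙ θ̂, for real-valued signals.
spectral = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(theta)).real

assert np.allclose(direct, spectral)
```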
7.1.2 Spherical convolution
Now consider the two-dimensional sphere Ω = S² with the group of rotations, the special orthogonal group G = SO(3). While chosen for pedagogical
reasons, this example is actually very practical and arises in numerous appli-
cations. In astrophysics, for example, observational data often naturally has
spherical geometry (Figure 7.1). The same can be said of any task involv-
ing weather prediction (Lam et al. 2023). Furthermore, spherical symmetries
are very important in applications in chemistry when modeling molecules and
trying to predict their properties, e.g. for the purpose of virtual drug screening.
Representing a point on the sphere as a three-dimensional unit vector u, ‖u‖ = 1, the action of the group can be represented as a 3 × 3 orthogonal matrix R with det(R) = 1. The spherical convolution can thus be written as the inner product between the signal and the rotated filter,

(x ⋆ θ)(R) = ∫_{S²} x(u) θ(R⁻¹u) du.
The first thing to note is that now the group is not identical to the domain: the group SO(3) is a Lie group that is in fact a three-dimensional manifold, whereas S² is a two-dimensional one. Consequently, in this case, unlike the previous example, the convolution is a function on SO(3) rather than on Ω.
This has important practical consequences: in our Geometric Deep Learning blueprint, we concatenate multiple equivariant maps (“layers” in deep learning jargon) by applying a subsequent operator to the output of the previous one. In the case of translations, we can apply multiple convolutions in sequence, since their outputs are all defined on the same domain Ω. In the general setting, since x ⋆ θ is a function on G rather than on Ω, we cannot use exactly the same operation subsequently: the next operation has to deal with signals on G, i.e. x ∈ X(G). Our definition of group convolution allows this case: we take as domain Ω = G, acted on by G itself via the group action (g, h) ↦ gh defined by the composition operation of G. This yields the representation ρ(g) acting on x ∈ X(G) by (ρ(g)x)(h) = x(g⁻¹h).⁵ Just like before, the inner product is defined by integrating the point-wise product of the signal and the filter over the domain, which now equals Ω = G. In our example of spherical convolution, a second layer of convolution would thus have the form

((x ⋆ θ) ⋆ ϕ)(R) = ∫_{SO(3)} (x ⋆ θ)(Q) ϕ(R⁻¹Q) dQ.
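Since SO(3) is a continuous group, even evaluating the first-layer convolution requires approximating an integral over the sphere. The following is a small numerical sketch of that integral: the Gaussian-bump signal and filter, the choice of sampled rotations, and the Monte Carlo estimator are all illustrative assumptions, not part of the text, and a practical implementation would instead use the spectral approach of Section 7.2.3.

```python
import numpy as np

def bump(centre, width=0.3):
    """A smooth signal on S², concentrated around the unit vector `centre`."""
    return lambda u: np.exp((u @ centre - 1.0) / width)

x = bump(np.array([0.0, 0.0, 1.0]))      # signal peaked at the north pole
theta = bump(np.array([1.0, 0.0, 0.0]))  # filter peaked on the equator

def rotation_y(alpha):
    """Rotation about the y-axis by angle alpha (an element of SO(3))."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Uniform Monte Carlo samples on the sphere (normalised Gaussian vectors).
rng = np.random.default_rng(2)
u = rng.normal(size=(100_000, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)

def spherical_conv(R):
    """Estimate (x ⋆ θ)(R) = ∫_{S²} x(u) θ(R⁻¹u) du; note R⁻¹ = Rᵀ."""
    return 4 * np.pi * np.mean(x(u) * theta(u @ R))

# The output is a function on SO(3), so we can only ever evaluate it at
# sampled rotations; here, three rotations about the y-axis.
for alpha in (-np.pi / 2, 0.0, np.pi / 2):
    print(f"(x ⋆ θ)(R_y({alpha:+.2f})) ≈ {spherical_conv(rotation_y(alpha)):.4f}")
```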
7.1.3 Limitations
Since convolutions involve inner products, which in turn require integrating over the domain Ω, we can only use them on domains that are small (in the discrete case) or low-dimensional (in the continuous case).
For instance, we can use convolutions on the plane ℝ² (two-dimensional) or the special orthogonal group SO(3) (three-dimensional), or on the finite set of nodes of a graph (n-dimensional). It might then be tempting to construct a highly expressive graph neural network by performing this kind of convolution directly on the group of permutations S_n. We can, for example, first transform features from a set of nodes V into S_n:

(x ⋆ θ)(σ) = Σ_{u∈V} x(u) θ(σ⁻¹(u)),        (7.97)

and then continue transforming on S_n as follows:

((x ⋆ θ) ⋆ ϕ)(σ) = Σ_{σ′∈S_n} (x ⋆ θ)(σ′) ϕ(σ⁻¹ ∘ σ′),        (7.98)
Figure 7.2
Leveraging the group convolutional framework to construct a learnable S_3-equivariant transformation over three-node graphs, per Equations 7.97–7.98. The first layer maps node features in X(V) directly to permutation features in X(S_3), via parameters θ : V → ℝ; the second layer maps X(S_3) → X(S_3) via parameters ϕ : S_3 → ℝ. While elegant and spanning a rich class of permutation-equivariant graph models, it quickly becomes unwieldy at larger |V|, and does not easily transfer across graphs of different sizes.
where σ, σ′ ∈ S_n are permutations and ∘ is permutation composition (see Figure 7.2 for an example over three nodes and six permutations).
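To make Equations 7.97–7.98 concrete at the only scale where they remain tractable, here is a sketch for n = 3 that enumerates all six permutations explicitly; the toy features and filters are assumptions made purely for illustration.

```python
import numpy as np
from itertools import permutations

n = 3
perms = list(permutations(range(n)))                 # the 3! = 6 elements of S_3
def compose(s, t):                                   # (s ∘ t)(i) = s(t(i))
    return tuple(s[t[i]] for i in range(n))
def inverse(s):
    return tuple(s.index(i) for i in range(n))

rng = np.random.default_rng(3)
x = rng.normal(size=n)                    # node features on V = {0, 1, 2}
theta = rng.normal(size=n)                # lifting filter θ : V → R
phi = {s: rng.normal() for s in perms}    # group filter ϕ : S_3 → R

# Lifting layer (Eq. 7.97): (x ⋆ θ)(σ) = Σ_u x(u) θ(σ⁻¹(u))
lifted = {s: sum(x[u] * theta[inverse(s)[u]] for u in range(n)) for s in perms}

# Group layer (Eq. 7.98): ((x ⋆ θ) ⋆ ϕ)(σ) = Σ_{σ'} (x ⋆ θ)(σ') ϕ(σ⁻¹ ∘ σ')
convolved = {s: sum(lifted[t] * phi[compose(inverse(s), t)] for t in perms)
             for s in perms}

# Equivariance: relabelling the nodes by π only re-indexes the lifted
# features, (ρ(π)x ⋆ θ)(σ) = (x ⋆ θ)(π⁻¹ ∘ σ).
pi = (1, 2, 0)
x_pi = np.array([x[inverse(pi)[u]] for u in range(n)])   # (ρ(π)x)(u) = x(π⁻¹(u))
lifted_pi = {s: sum(x_pi[u] * theta[inverse(s)[u]] for u in range(n))
             for s in perms}
for s in perms:
    assert np.isclose(lifted_pi[s], lifted[compose(inverse(pi), s)])
```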
However, in practice, such a layer cannot be constructed on any but the smallest of graphs, because the permutation group S_n has n! elements, and the layer above requires storing a feature vector, (x ⋆ θ)(σ), for each of those elements. While such constructions are indeed impractical, it is interesting to ponder that there exists an extremely rich family of permutation-equivariant models acting directly on the permutation group in this way, encompassing layers with potentially significant expressive power.
Similarly, integrating over higher-dimensional groups like the affine group (containing translations, rotations, shearing and scaling, for a total of 6 dimensions) is not feasible in practice. Nevertheless, as we have seen in Chapter 5, we can still build equivariant convolutions for large groups G by working with signals defined on low-dimensional spaces on which G acts. Indeed, it is possible to show that any equivariant linear map f : X(Ω) → X(Ω′) between two domains Ω, Ω′ can be written as a generalised convolution similar to the group convolution discussed here.
Second, we note that the Fourier transform we derived in Chapter 6 from the
shift-equivariance property of the convolution can also be extended to a more
general case by projecting the signal onto the matrix elements of irreducible
representations of G.
Finally, we point to the assumption that has so far underpinned our discussion in this Chapter: whether Ω was a grid, plane, or the sphere, we could transform every point into any other point, intuitively meaning that all the points on the domain “look the same”. A domain with such a property is called a homogeneous space, where for any u, v ∈ Ω there exists g ∈ G such that g.u = v.⁶ In future Chapters, we will try to relax this assumption.
7.2 Model
As discussed thus far, we can generalise the convolution operation from signals
on a Euclidean space to signals on any homogeneous space Ω acted upon by
a group G. By analogy to the Euclidean convolution, where a translated filter
is matched with the signal, the idea of group convolution is to move the filter
around the domain using the group action, e.g. by rotating and translating. By
virtue of the transitivity of the group action, we can move the filter to any position on Ω. In this section, we will discuss several concrete examples of
the general idea of group convolution, including implementation aspects and
architectural choices.
7.2.1 Discrete group convolution
We begin by considering the case where the domain Ω as well as the group G are discrete. As our first example, we consider medical volumetric images represented as signals on 3D grids with discrete translation and rotation symmetries. The domain is the 3D cubical grid Ω = ℤ³ and the images (e.g. MRI or CT 3D scans) are modelled as functions x : ℤ³ → ℝ, i.e. x ∈ X(Ω). Although, in practice, such images have support on a finite cuboid [W] × [H] × [D] ⊂ ℤ³, we instead prefer to view them as functions on ℤ³ with appropriate zero padding. As our symmetry, we consider the group G = ℤ³ ⋊ O_h of distance- and orientation-preserving transformations on ℤ³. This group consists of translations (ℤ³) and the discrete rotations O_h generated by 90-degree rotations about the three axes (see Figure 7.3).
As our second example, we consider DNA⁷ sequences made up of four letters: C, G, A, and T. The sequences can be represented on the 1D grid Ω = ℤ as signals x : ℤ → ℝ⁴, where each letter is one-hot coded in ℝ⁴. Naturally, we have a discrete 1D translation symmetry on the grid, but DNA sequences have an additional interesting symmetry.
Figure 7.3
A 3 × 3 × 3 filter, rotated by all 24 elements of the discrete rotation group O_h, generated by 90-degree rotations about the vertical axis (red arrows) and 120-degree rotations about a diagonal axis (blue arrows).
This symmetry arises from the way DNA is physically embodied as a double helix, and the way it is read by the molecular machinery of the cell. Each strand of the double helix begins with what is called the 5′-end and ends with a 3′-end, with the 5′ on one strand complemented by a 3′ on the other strand. In other words, the two strands have an opposite orientation. Since the DNA molecule is always read off starting at the 5′-end, but we do not know which one, a sequence such as ACCCTGG is equivalent to the reversed sequence with each letter replaced by its complement, CCAGGGT. This is called the reverse-complement symmetry of the letter sequence, and is depicted in Figure 7.4. We thus have the two-element group ℤ₂ = {0, 1} corresponding to the identity 0 and the reverse-complement transformation 1 (and composition 1 + 1 = 0 mod 2). The full group combines translations and reverse-complement transformations.
In this discrete case, the previously defined group convolution (Equation 7.96) is given as the following inner product:

(x ⋆ θ)(g) = Σ_u x_u (ρ(g)θ)_u,        (7.99)

between the (single-channel) input signal x and a filter θ transformed by g ∈ G via (ρ(g)θ)_u = θ_{g⁻¹u}; the output x ⋆ θ is a function on G. Note that, since Ω is discrete, we have replaced the integral from Equation 7.96 by a sum.
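As a concrete instance of Equation 7.99, the following sketch implements the DNA example: the group combines translations with the reverse-complement transformation, and transforming the filter is just an index permutation (anticipating the "transform + convolve" recipe of the next section). The letter ordering, toy sequence and random filter are illustrative assumptions.

```python
import numpy as np

ALPHABET = "ACGT"
COMPLEMENT = np.array([3, 2, 1, 0])      # index permutation A↔T, C↔G

def one_hot(seq):
    x = np.zeros((len(seq), 4))
    x[np.arange(len(seq)), [ALPHABET.index(c) for c in seq]] = 1.0
    return x

def reverse_complement(signal):
    """Action of the non-trivial element of Z_2: reverse the sequence and
    swap each letter channel with its complement."""
    return signal[::-1][:, COMPLEMENT]

def correlate(x, theta):
    """Translational cross-correlation over 'valid' positions, summed over
    the four letter channels."""
    L, m = len(x), len(theta)
    return np.array([np.sum(x[k:k + m] * theta) for k in range(L - m + 1)])

x = one_hot("ACCCTGGAT")                                  # toy input sequence
theta = np.random.default_rng(4).normal(size=(3, 4))      # length-3 filter

# Group convolution = one translational correlation per element of Z_2:
feature_maps = {"identity": correlate(x, theta),
                "revcomp":  correlate(x, reverse_complement(theta))}

# Equivariance: reverse-complementing the input reverses and swaps the two
# orientation channels.
x_rc = reverse_complement(x)
assert np.allclose(correlate(x_rc, theta),
                   correlate(x, reverse_complement(theta))[::-1])
```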
Figure 7.4
A schematic of the DNA's double helix structure, with the two strands coloured in blue and red. Note how the sequences in the helices are complementary and read in reverse (from 5′ to 3′).
7.2.2 Transform + Convolve approach
We will show that the discrete group convolution can be implemented in two
steps: a filter transformation step, and a translational convolution step. The
filter transformation step consists of creating rotated (or reverse-complement
transformed) copies of a basic filter, while the translational convolution is the
same as in standard CNNs and thus efficiently computable on hardware such
as GPUs. To see this, note that, in both of our examples, we can write a general
transformation g ∈ G as a transformation h ∈ H (e.g. a rotation or reverse-complement transformation) followed by a translation k ∈ ℤ^d, i.e. g = kh (with juxtaposition denoting the composition of the group elements k and h). By properties of the group representation, we have ρ(g) = ρ(kh) = ρ(k)ρ(h). Thus,

(x ⋆ θ)(kh) = Σ_u x_u (ρ(k)ρ(h)θ)_u = Σ_u x_u (ρ(h)θ)_{u−k}.        (7.100)
We recognise the last equation as the standard (planar Euclidean) convolu-
tion of the signal x and the transformed filter ρ(h)θ. Thus, to implement
group convolution for these groups, we take the canonical filter θ, create
transformed copies θ_h = ρ(h)θ for each h ∈ H (e.g. each rotation h ∈ O_h or the reverse-complement DNA symmetry h ∈ ℤ₂), and then convolve x with each of these filters: (x ⋆ θ)(kh) = (x ⋆ θ_h)(k). For both of our examples, the symmetries act on filters by simply permuting the filter coefficients, as shown in Figure 7.3 for discrete rotations. Hence, these operations can be implemented efficiently using an indexing operation with pre-computed indices.
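The sketch below illustrates this transform + convolve recipe. For brevity it uses the four-fold rotation group of the plane in place of the 24-element O_h of the 3D example (the 3D case is structurally identical, with the rotated filter copies obtained from pre-computed index permutations); all names and sizes are illustrative assumptions.

```python
import numpy as np

def correlate2d_valid(x, theta):
    """Plain translational cross-correlation, 'valid' positions only."""
    H, W = x.shape
    h, w = theta.shape
    return np.array([[np.sum(x[i:i + h, j:j + w] * theta)
                      for j in range(W - w + 1)]
                     for i in range(H - h + 1)])

rng = np.random.default_rng(5)
x = rng.normal(size=(10, 10))            # input feature map
theta = rng.normal(size=(3, 3))          # canonical filter

# Step 1 (filter transformation): one rotated copy of θ per element of C_4.
rotated_filters = [np.rot90(theta, k) for k in range(4)]

# Step 2 (translational convolution): one orientation channel per copy,
# implementing (x ⋆ θ)(kh) = (x ⋆ θ_h)(k).
channels = np.stack([correlate2d_valid(x, f) for f in rotated_filters])

# Equivariance: rotating the input rotates every orientation channel and
# cyclically permutes the channels among themselves.
channels_of_rotated = np.stack(
    [correlate2d_valid(np.rot90(x), f) for f in rotated_filters])
expected = np.stack([np.rot90(channels[(k - 1) % 4]) for k in range(4)])
assert np.allclose(channels_of_rotated, expected)
```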
While we defined the feature maps produced by the group convolution x ⋆ θ as functions on G, the fact that we can split any g ∈ G into g = kh means that we can also think of them as a stack of Euclidean feature maps (sometimes called orientation channels), with one feature map per filter transformation / orientation h. For instance, in our first example we would associate a feature map to each filter rotation (each node in Figure 7.3), obtained by convolving the input (in the traditional translational sense) with the rotated filter. These feature maps can thus still be stored as a W × H × C array, where the number of channels C equals the number of independent filters times the number of transformations h ∈ H (e.g. rotations).
As previously shown, the group convolution is equivariant: (ρ(g)x) ⋆ θ = ρ(g)(x ⋆ θ). What this means in terms of orientation channels is that, under the action of h, each orientation channel is transformed, and the orientation channels themselves are permuted. For instance, if we associate one orientation channel per transformation in Figure 7.3 and apply a rotation by 90 degrees about the z-axis (corresponding to the red arrows), the feature maps will be permuted as shown by the red arrows.
This description makes it clear that a group convolutional neural network
bears much similarity to a traditional CNN. Hence, many of the network design
patterns discussed in Chapter 6, such as residual networks, can be used with
group convolutions as well.
7.2.3 Spherical CNNs via the Fourier domain
For the continuous symmetry group of the sphere that we saw in Section 7.1.2, it is possible to efficiently implement the convolution in the spectral domain, using the appropriate Fourier transform (we remind the reader that the convolution on S² is a function on SO(3), hence we need to define the Fourier transform on both these domains in order to implement multi-layer spherical CNNs). Spherical harmonics are an orthogonal basis on the 2D sphere, analogous to the classical Fourier basis of complex exponentials. On the special orthogonal group, the Fourier basis is known as the Wigner D-functions. Both of these bases find wide applications in quantum mechanics and chemistry. In both cases, the Fourier transforms (coefficients) are computed as the inner product with the basis functions, and an analogue of the Convolution Theorem holds: one can compute the convolution in the Fourier domain as the element-wise product of the Fourier transforms. Furthermore, FFT-like algorithms exist for the efficient computation of the Fourier transform on S² and SO(3). Using this approach, Cohen et al. (2018) were able to build the first efficient spherical CNN; we defer to their paper for specific details.
7.3 Case Study: TacticAI
As might be expected, group convolutional networks manifest in a wide variety of interesting domains.
Figure 7.5
Illustration of a single layer of TacticAI’s group-equivariant neural network. For a given corner kick situation (here visualised using only six players for clarity), TacticAI first generates four views, corresponding to all four possible corner kick locations (e.g., X_↕ for the vertically reflected corner). Then, a graph neural network, G, processes carefully chosen pairs of views, in a way that follows the equivariance constraint. Lastly, all of the computed view-pair representations are aggregated to produce latent view features (e.g. H_↕ for the vertically-reflected view).
But perhaps the reader might still find it surprising that they have seen notable use in association football⁸ analytics. To round off
the groups Chapter, we provide a targeted overview of the resulting TacticAI
method, developed in collaboration with Liverpool FC (Wang et al. 2024).
Rather than surveying the entire architecture and the tasks it was deployed on,
we particularly focus on its group-equivariant aspects (see also Figure 7.5),
along with meaningful context on why these aspects were called for.
Problem setup  TacticAI is a system capable of predictive and generative modelling over various tactical setups in football. The tactical setup input is represented as a graph of 22 nodes—one for each player—with each node’s features, x_u ∈ ℝ^k, comprising both spatial (current position and velocity) and physical (player height, weight and ball possession) information about the corresponding player. It is assumed that all pairs of players are connected to each other, allowing the model to infer the most important connections automatically (as we discussed in Chapter 5).
Figure 7.6
The Cayley graph of the dihedral group D_2 = {e, ↕, ↔, ↔↕}, organising all of its individual elements and their action on a domain spanning the four corners of a rectangle. Note that the arrows are bidirectional, since each D_2 group action is its own inverse. Further, note that composing both horizontal and vertical reflections is equivalent to a 180° rotation.
The system then needs to either predict the outcome of a future event (e.g. who will make the next contact with the ball, a node classification task, or whether a shot will be taken, a graph classification task) or generate novel setups (that, e.g., modulate the likelihood of a shot being taken).
As it is often tricky to meaningfully influence open-play tactics in football,
TacticAI focusses all of its attention on modelling outcomes in set pieces,
during which the game is effectively frozen. Specifically, corner kick setups
were targeted, as they occur reasonably frequently, start from a rigid posi-
tion, and offer immediate goal-scoring potential. Further, corner kick tactics
are often determined well in advance of individual matches, allowing for a
cleaner window for TacticAI to influence coaching decisions.
An unexpected symmetry The decision to focus on corner kicks also brings
with it an undesirable tradeoff—they are simply not extremely frequent, lead-
ing to relatively modest dataset sizes. Indeed, even though TacticAI was trained
over several seasons’ worth of Premier League data, only 9,693 unique training examples were extractable from this data—a far cry from the large-scale datasets used for many of the case studies previously discussed in this book.
Further coupled with the scarcity of certain target events (such as shots, which
are relatively rare), the quantity of useful signal is significantly reduced.
At this level of data availability, exploiting symmetries arises as a natural
approach for optimising the extent to which the provided data is utilised. That
said, it was not immediately obvious where such symmetries could come from.
Owing to a direct suggestion from the Liverpool FC collaborators,⁹ it was concluded that, if one varies the specific corner in which the set piece is being taken (out of the four possible corners of the pitch), the outcomes would remain approximately equivariant.
The symmetry group which governs changing corner positions in a way that preserves the relative positioning of all other points is the dihedral group,¹⁰ G = D_2. This group comprises only four elements, D_2 = {e, ↕, ↔, ↔↕}, enumerating four possible transformations: identity, vertical flip, horizontal flip, and vertical and horizontal flip (180° rotation). It is fully described by Figure 7.6; note that each operation is its own inverse, i.e., g = g⁻¹ for all g ∈ D_2. Accordingly, TacticAI’s neural network layers are designed to be D_2-equivariant.
Note that, in the context of football, D_2 equivariance is not exact—reflecting a particular corner kick may not result in exactly the same outcomes. Among other factors, many corner-kick takers tend to have a preferred foot, which directly impacts the precision with which they can cross the ball into the penalty box. However, the data efficiency benefits of constraining a model to output D_2-equivariant predictions may outweigh any inaccuracies in those predictions—an assumption that turned out to be true.¹¹
Constructing D_2-equivariant GNNs  Since D_2 is a small, discrete group, the blueprint of group convolutions we described throughout this Chapter perfectly applies. Firstly, let us assume we have at our disposal a graph neural network (GNN) layer,¹² G(X), that can operate over input node features, X (note we omit the adjacency matrix here for brevity, as it will always remain fully connected). Our D_2-equivariant GNN will carefully distribute G across various views of a corner kick tactical setup, in a way that preserves the D_2 symmetry in the outputs (as illustrated in Figure 7.5).
Once we have our input features, X ∈ ℝ^{22×k}, we first “lift” them into the D_2 space by generating all four transformed views: X_g = Xρ(g) for g ∈ D_2.¹³ The corresponding representation matrices ρ(g) ∈ ℝ^{k×k} are simple to construct if our spatial features are zero-centered around the pitch center—we simply need to flip the sign of all columns of X corresponding to the axes being reflected. For example, over vertical flips this amounts to the following diagonal matrix:

ρ_ij =  1    if i = j and i is not a y-axis feature,
       −1    if i = j and i is a y-axis feature,
        0    if i ≠ j.
Now, we can follow Equation 7.96 to define our group-convolutional layer:

H_g = (1/4) Σ_{h∈D_2} G(X_h ∥ X_{g⁻¹h}),

wherein we’ve replaced the (inner) product with our GNN layer G applied over concatenated features. This layer yields latent representations H_g ∈ ℝ^{22×l} for all views g ∈ D_2, and additional such layers may be easily stacked.
Final predictions may be obtained through either frame averaging (H = (H_e + H_↕ + H_↔ + H_↔↕)/4) or retrieving H_e, depending on whether an invariant or equivariant prediction is required.
This approach has successfully delivered on its promise in the low-data
set-piece analytics domain: for receiver and shot prediction, it improved the
baseline GNN’s predictive power by over 5%; a comparable jump was obtained
by leveraging a graph structure in the first place (compared to using a Deep
Sets model). Accordingly, D_2 equivariance became one of the uniquely notable features of TacticAI.
Suggested Further Reading
We have presented the key ingredients that are relevant for constructing arbitrary G-equivariant networks over homogeneous domains (including the all-important G-equivariant networks over G itself!) as well as several relevant examples. However, for reasons of brevity, our book deliberately does not dive into how all of the various components of these architectures fit together more broadly, nor does it survey in detail how they historically emerged.
The earliest references to group CNNs, explicitly extending CNNs to handle 90-degree rotations, were provided concurrently by T. Cohen and Welling (2016) and Dieleman, Fauw, and Kavukcuoglu (2016). Making these models
more efficient by leveraging irreducible representations has yielded a more
generic class of steerable CNNs, for which we recommend the paper by Taco
S. Cohen and Welling (2017b).
In the years following these papers, a wide range of architectures incorporat-
ing group or steerable convolutions have been produced, and these have been
thoroughly synthesised (with a unifying theory of equivariance over homoge-
neous spaces) by Cohen, Geiger, and Weiler (2019). We warmly recommend
this paper, not only for its synthesis efforts, but as an excellent collection of
references to other influential works in this space.
Building on these foundational ideas, subsequent research has focused on
scaling these architectures and extending them to continuous domains. For a
highly practical and definitive realization of general steerable convolutions—
which has become a standard reference for modern implementations—we
point the reader to the work of Weiler and Cesa (2019b). Furthermore, while
our text remains focused on standard domains and avoids the broader topic of
Lie groups, readers interested in extending equivariance to arbitrary continu-
ous groups operating on irregular data, such as point clouds, will find a critical
milestone in the LieConv architecture proposed by Finzi et al. (2020).
Exercises
1. Consider a homogeneous domain Ω, and a symmetry group G acting on it. Given any element u ∈ Ω, we can form the stabiliser subgroup G_u = {g ∈ G | g.u = u}, consisting of those elements of G that leave u invariant.
(i) Show that the stabiliser G_u is indeed a subgroup of G.
(ii) Let G = O(3) be the group of orthogonal 3 × 3 matrices (AAᵀ = I, i.e., the group of continuous 3D rotations and reflections) acting on the sphere Ω = S². What is the stabiliser of an arbitrary point u (e.g., the north pole)?
(iii) Let G = S_n be the group of permutations of n elements, Ω = {1, 2, …, n}. What is the stabiliser of an arbitrary point u ∈ Ω?
(iv) Let u, v ∈ Ω be two elements of the domain. By homogeneity of Ω, we can find a g ∈ G such that g.u = v. Show that, if h ∈ G_v, then g⁻¹hg ∈ G_u.
(v) From the previous part, it follows that there exists a mapping f : G_v → G_u defined by f(h) = g⁻¹hg. Provide an inverse of f, i.e., a map f⁻¹ : G_u → G_v.
(vi) Show that f(hh′) = f(h)f(h′) for all h, h′ ∈ G_v; i.e., that f is a group homomorphism. This, coupled with the existence of f⁻¹, implies that any two points u, v of a homogeneous space have isomorphic stabiliser subgroups.
2. Let G = S_n be the group of permutations acting on a set of n elements Ω = {1, …, n}. Let S_{n−1} be the subgroup of S_n that leaves invariant the element 1, i.e. g.1 = 1 for all g ∈ S_{n−1}. We will study the cosets gS_{n−1} = {gh | h ∈ S_{n−1}} (where g ∈ S_n) and the quotient S_n/S_{n−1} = {gS_{n−1} | g ∈ S_n} (the set of cosets).
(i) What is the size, N, of S_n/S_{n−1}, i.e., how many cosets are there?
(ii) Characterise the cosets gS_{n−1}, i.e., find a property that all elements of a given coset have, and that no element outside of the coset has. In the following parts, we will leverage this to identify S_n/S_{n−1} with the set {1, …, N} in the natural way.
(iii) Consider the map π : S_n → S_n/S_{n−1} defined by π(g) = gS_{n−1}. What are the fibers, F_u = π⁻¹(u), where u ∈ S_n/S_{n−1} is a coset? (Note: in this context, π is a fiber bundle.)
(iv) Consider the right action of S_{n−1} on S_n, defined as g.h = gh, for g ∈ S_n, h ∈ S_{n−1}. Show that this action preserves fibers, i.e., π(g.h) = π(g) for all g ∈ S_n, h ∈ S_{n−1}.
(v) Additionally, show that this right action is transitive on the fibers, i.e., for any g, g′ ∈ π⁻¹(u), there exists h ∈ S_{n−1} such that g.h = g′.
(vi) Additionally, show that this right action is fixed-point free. That is, if g.h = g for some g ∈ S_n and h ∈ S_{n−1}, then h = e (the identity).
(vii) A section of the bundle π : S_n → S_n/S_{n−1} is a map s : S_n/S_{n−1} → S_n that satisfies π ∘ s = id_{S_n/S_{n−1}}, i.e., π(s(u)) = u for all u ∈ S_n/S_{n−1}. Define such a section. (Note: a section of a principal bundle is also known as a gauge.)
(viii) Let f : Ω → ℝ represent a scalar feature assigned to each element (e.g. a node in a graph). A permutation-equivariant linear layer over these features is defined by a kernel K : Ω × Ω → ℝ such that K(g.u, g.v) = K(u, v) for all g ∈ S_n. By considering the action of S_{n−1} and the results in previous parts, prove that K can take at most two distinct values.
(ix) Let x ∈ ℝⁿ be the vector of scalar features for all n elements of Ω. Write down the most general linear transformation y = Wx that is permutation-equivariant, expressing the matrix W in terms of the n × n identity matrix I and the all-ones matrix J. Based on this formulation, show that computing the updated features y ∈ ℝⁿ requires only O(n) operations, avoiding the O(n!) complexity of lifting the features into the full group S_n.
(x) If we allow the node update to be a general non-linear function rather than a linear kernel, we specify a permutation-equivariant map F : ℝⁿ → ℝⁿ. By leveraging a similar argument as in Part (viii), prove that the output for node i must take the form y_i = ϕ(x_i, X_i), where X_i is the unordered multiset of all other node features, and ϕ is some non-linear function that is permutation-invariant in its second argument. Note that this formulation reflects the Deep Sets architecture (Zaheer et al. 2017).
Notes
1. Technically, we need the group to be locally compact, so that there exists a left-invariant Haar measure. Integrating with respect to this measure, we can “shift” the integrand by any group element and obtain the same result, just as how we have

∫_{−∞}^{+∞} x(u) du = ∫_{−∞}^{+∞} x(u − v) du

for functions x : ℝ → ℝ.
2. Note that what we define here is not convolution but cross-correlation, which is tacitly used in deep learning under the name ‘convolution’. We do it for consistency with the following discussion, since in our notation (ρ(g)x)(u) = x(u − v) and (ρ(g⁻¹)x)(u) = x(u + v).
3. The integration is done w.r.t. an invariant measure µ on Ω. In case µ is discrete, this means summing over Ω.
4. Actually here again, this is cross-correlation.
5. Recall that the representation of G acting on functions defined on G itself is called the regular representation of G.
6. The additional properties, e.u = u and g.(h.u) = (gh).u, are tacitly assumed.
7. DNA is a biopolymer molecule made of four repeating units called nucleotides (Cytosine, Guanine, Adenine, and Thymine), arranged into two strands coiled around each other in a double helix, where each nucleotide occurs opposite of the complementary one (base pairs A/T and C/G).
8. Also occasionally referred to as “soccer”; a name that we disagree with.
9. LFC’s research team was, in fact, very well positioned to propose and improve equivariant architectures, as many of them come from a physics background, and they have generally pioneered data-driven approaches to football analytics over the years; see Graham (2024) for a beautiful, football-centric account of these successes.
10. The dihedral group D_2 is also isomorphic to the Klein four-group, K_4, first described by Felix Klein in 1884. Abstractly, K_4 = {e, a, b, c} is a group of four elements where each element is self-inverse (gg = e for all g ∈ K_4) and composing any two non-identities yields the third element (ab = ba = c, ac = ca = b, bc = cb = a). K_4 has diverse applications, from algebra (where it can be used to explain the existence of formulae for solving quartic equations), all the way to music composition (being related to dodecaphony).
11. Generally, about 30% of the players in the Premier League are left-footed or both-footed, which gives a sufficient number of players who can effectively attack a ball in a mirrored capacity. As an example, most teams have a left-footed left-back and a right-footed right-back who take corners, and it would be rare to find a team that doesn’t have both a left-footed and a right-footed corner taker. In terms of attacking the corners, headed shots are very common in corner kick situations, so footedness tends to matter significantly less in these situations. Also, many shots taken during corners are more reactive from close range, where foot preference matters less.
12. Specifically, TacticAI uses the GATv2 (Brody, Alon, and Yahav 2022) as the implementation of G.
13. Note that, unlike many previous examples, here the representation ρ(g) acts via right-multiplication, as it acts on the node features rather than the nodes themselves.