7
Group Convolution on Homogeneous Spaces
Convolutional neural networks rely on matching input features with
appropriately-transformed filter parameters; this idea extends well
beyond shift-equivariance on grids.
We can define a rich family of group convolutions which apply a more
general filter transformation, followed by an inner product with the
features over the domain—so long as the input domain is homogeneous.
For shift-equivariant functions on grids, the domain and group have
identical structure—constructing the more general case requires carefully
keeping track of the structure within the group.
This allows us to construct convolutional neural networks over spheres,
DNA sequences, 3D medical scans, and many more domains.
Our discussion of grids highlighted how shifts and convolutions are intimately connected: convolutions are linear shift-equivariant¹ operations, and vice versa, any shift-equivariant linear operator is a convolution. Furthermore, shift operators can be jointly diagonalised by the Fourier transform. As it turns out, this is part of a far larger story: both convolution and the Fourier transform can be defined for any group of symmetries that we can sum or integrate over.
Consider the Euclidean domain Ω = ℝ. We can understand the convolution as a pattern matching operation: we match shifted copies of a filter θ(u) with an input signal x(u). The value of the convolution (x ⋆ θ)(u) at a point u is the inner product of the signal x with the filter shifted by u,

(x ⋆ θ)(u) = ⟨x, S_u θ⟩ = ∫_ℝ x(v) θ(u + v) dv.²

Note that in this case u is both a point on the domain Ω = ℝ and also an element of the translation group, which we can identify with the domain itself, G = ℝ. We will now show how to generalise this construction, by simply replacing the translation group by another group G acting on Ω.
7.1 Domain
As discussed in Chapter 3, the action of the group G on the domain Ω induces a representation ρ of G on the space of signals X(Ω) via (ρ(g)x)(u) = x(g⁻¹u). In the above example, G is the translation group whose elements act by shifting the coordinates, u + v, whereas ρ(g) is the shift operator acting on signals as (S_v x)(u) = x(u − v). Finally, in order to apply a filter to the signal, we invoke our assumption of X(Ω) being a Hilbert space, with an inner product

⟨x, θ⟩ = ∫_Ω x(u) θ(u) du,³

where we assumed, for the sake of simplicity, scalar-valued signals, X(Ω, ℝ).
Having thus defined how to transform signals and match them with filters, we can define the group convolution for signals on Ω,

(x ⋆ θ)(g) = ⟨x, ρ(g)θ⟩ = ∫_Ω x(u) θ(g⁻¹u) du.        (7.96)

Note that x ⋆ θ takes values on the elements g of our group G rather than points on the domain Ω. Hence, the next layer, which takes x ⋆ θ as input, should act on signals defined on the group G, a point we will return to shortly.
Just as the traditional Euclidean convolution is shift-equivariant, the more general group convolution is G-equivariant. The key observation is that matching the signal x with a g-transformed filter ρ(g)θ is the same as matching the inverse-transformed signal ρ(g⁻¹)x with the untransformed filter θ. Mathematically, this can be expressed as ⟨x, ρ(g)θ⟩ = ⟨ρ(g⁻¹)x, θ⟩. With this insight, G-equivariance of the group convolution (Equation 7.96) follows immediately from its definition and the defining property ρ(h⁻¹)ρ(g) = ρ(h⁻¹g) of group representations,

(ρ(h)x ⋆ θ)(g) = ⟨ρ(h)x, ρ(g)θ⟩ = ⟨x, ρ(h⁻¹g)θ⟩ = (x ⋆ θ)(h⁻¹g) = (ρ(h)(x ⋆ θ))(g).
The G-equivariant group convolution can be seen as a “generator” of models: it
can provide a recipe for constructing G-equivariant neural networks given any
suitable G. In order to eventually understand how to ground this into concrete
model equations, we need to look at specific examples.
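Before turning to those examples, a minimal sketch may help make Equation 7.96 and the equivariance argument above concrete for a finite group acting on a finite domain. The representation of group elements as index permutations, and all names and values below, are illustrative assumptions rather than anything prescribed by the text; the cyclic group used here anticipates the grid case studied next.

```python
import numpy as np

def group_convolution(x, theta, actions):
    """Equation 7.96 for a finite group: (x ⋆ θ)(g) = Σ_u x(u) θ(g⁻¹u).

    x, theta : arrays of shape (n,), signals on the domain Ω = {0, ..., n−1}
    actions  : dict mapping each group element g to an index array `perm`
               with perm[u] = g.u (the action of g on domain points)
    """
    out = {}
    for g, perm in actions.items():
        inv = np.argsort(perm)          # inv[u] = g⁻¹.u
        out[g] = x @ theta[inv]         # ⟨x, ρ(g)θ⟩, since (ρ(g)θ)(u) = θ(g⁻¹u)
    return out

# Example: the cyclic group Z_n acting on Ω = Z_n by g.u = (u + g) mod n.
n = 8
rng = np.random.default_rng(0)
x, theta = rng.normal(size=n), rng.normal(size=n)
actions = {g: (np.arange(n) + g) % n for g in range(n)}

conv = group_convolution(x, theta, actions)

# G-equivariance: (ρ(h)x ⋆ θ)(g) = (x ⋆ θ)(h⁻¹g); here h⁻¹g = (g − h) mod n.
h = 3
x_h = x[np.argsort(actions[h])]          # (ρ(h)x)(u) = x(h⁻¹u)
conv_shifted = group_convolution(x_h, theta, actions)
for g in range(n):
    assert np.isclose(conv_shifted[g], conv[(g - h) % n])
```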
7.1.1 Grid convolution
The case of shift equivariance over the (one-dimensional) grid we have studied throughout Chapter 6 is obtained with the choice Ω = ℤ_n = {0, …, n−1} and the cyclic shift group G = ℤ_n. The group elements in this case are cyclic shifts of indices, i.e., an element g ∈ G can be identified with some u ∈ {0, …, n−1} such that g.v = (v − u) mod n, whereas the inverse element is g⁻¹.v = (v + u) mod n.
Figure 7.1
Left: Cosmic microwave background radiation, captured by the Planck space observatory, is a signal on S². Right: The action of the special orthogonal group, SO(3), on the sphere, S². Three types of rotation are possible; SO(3) is a three-dimensional manifold.
Importantly, in this example the elements of the group (shifts) are also elements of the domain (indices). We thus can, with some abuse of notation, identify the two structures (i.e., Ω = G); our expression for the group convolution in this case,

(x ⋆ θ)(g) = Σ_{v=0}^{n−1} x_v θ_{g⁻¹.v},

leads to the familiar convolution⁴

(x ⋆ θ)_u = Σ_{v=0}^{n−1} x_v θ_{(v+u) mod n}.
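As a quick numerical check of this familiar case (and a preview of the spectral implementation discussed in Section 7.2.3), the cyclic group convolution can be evaluated either directly or as an element-wise product in the Fourier domain. The sketch below is illustrative only and assumes real-valued signals.

```python
import numpy as np

n = 16
rng = np.random.default_rng(1)
x, theta = rng.normal(size=n), rng.normal(size=n)

# Direct evaluation of (x ⋆ θ)_u = Σ_v x_v θ_{(v+u) mod n}:
direct = np.array([np.sum(x * np.roll(theta, -u)) for u in range(n)])

# Spectral evaluation: circular cross-correlation is an element-wise product
# of spectra, conj(x̂) ⊙ θ̂, for real-valued signals.
spectral = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(theta)).real

assert np.allclose(direct, spectral)
```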
7.1.2 Spherical convolution
Now consider the two-dimensional sphere Ω = S² with the group of rotations, the special orthogonal group G = SO(3). While chosen for pedagogical
reasons, this example is actually very practical and arises in numerous appli-
cations. In astrophysics, for example, observational data often naturally has
spherical geometry (Figure 7.1). The same can be said of any task involv-
ing weather prediction (Lam et al. 2023). Furthermore, spherical symmetries
are very important in applications in chemistry when modeling molecules and
trying to predict their properties, e.g. for the purpose of virtual drug screening.
Representing a point on the sphere as a three-dimensional unit vector u, ‖u‖ = 1, the action of the group can be represented as a 3 × 3 orthogonal matrix R with det(R) = 1. The spherical convolution can thus be written as the inner product between the signal and the rotated filter,

(x ⋆ θ)(R) = ∫_{S²} x(u) θ(R⁻¹u) du.
The first thing to note is that now the group is not identical to the domain: the group SO(3) is a Lie group that is in fact a three-dimensional manifold, whereas S² is a two-dimensional one. Consequently, in this case, unlike the previous example, the convolution is a function on SO(3) rather than on Ω.
This has important practical consequences: in our Geometric Deep Learning blueprint, we concatenate multiple equivariant maps (“layers” in deep learning jargon) by applying a subsequent operator to the output of the previous one. In the case of translations, we can apply multiple convolutions in sequence, since their outputs are all defined on the same domain Ω. In the general setting, since x ⋆ θ is a function on G rather than on Ω, we cannot use exactly the same operation subsequently: the next operation has to deal with signals on G, i.e. x ∈ X(G). Our definition of group convolution allows this case: we take as domain Ω = G, acted on by G itself via the group action (g, h) ↦ gh defined by the composition operation of G. This yields the representation ρ(g) acting on x ∈ X(G) by (ρ(g)x)(h) = x(g⁻¹h).⁵ Just like before, the inner product is defined by integrating the point-wise product of the signal and the filter over the domain, which now equals Ω = G. In our example of spherical convolution, a second layer of convolution would thus have the form

((x ⋆ θ) ⋆ ϕ)(R) = ∫_{SO(3)} (x ⋆ θ)(Q) ϕ(R⁻¹Q) dQ.
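Since SO(3) is a continuous group, even evaluating the first-layer convolution requires approximating an integral over the sphere. The following is a small numerical sketch of that integral: the Gaussian-bump signal and filter, the choice of sampled rotations, and the Monte Carlo estimator are all illustrative assumptions, not part of the text, and a practical implementation would instead use the spectral approach of Section 7.2.3.

```python
import numpy as np

def bump(centre, width=0.3):
    """A smooth signal on S², concentrated around the unit vector `centre`."""
    return lambda u: np.exp((u @ centre - 1.0) / width)

x = bump(np.array([0.0, 0.0, 1.0]))      # signal peaked at the north pole
theta = bump(np.array([1.0, 0.0, 0.0]))  # filter peaked on the equator

def rotation_y(alpha):
    """Rotation about the y-axis by angle alpha (an element of SO(3))."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Uniform Monte Carlo samples on the sphere (normalised Gaussian vectors).
rng = np.random.default_rng(2)
u = rng.normal(size=(100_000, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)

def spherical_conv(R):
    """Estimate (x ⋆ θ)(R) = ∫_{S²} x(u) θ(R⁻¹u) du; note R⁻¹ = Rᵀ."""
    return 4 * np.pi * np.mean(x(u) * theta(u @ R))

# The output is a function on SO(3), so we can only ever evaluate it at
# sampled rotations; here, three rotations about the y-axis.
for alpha in (-np.pi / 2, 0.0, np.pi / 2):
    print(f"(x ⋆ θ)(R_y({alpha:+.2f})) ≈ {spherical_conv(rotation_y(alpha)):.4f}")
```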
7.1.3 Limitations
Since convolutions involve inner products, which in turn require integrating over the domain Ω, we can only use them on domains that are small (in the discrete case) or low-dimensional (in the continuous case).
For instance, we can use convolutions on the plane ℝ² (two-dimensional) or the special orthogonal group SO(3) (three-dimensional), or on the finite set of nodes of a graph (n-dimensional). It might then be tempting to construct a highly expressive graph neural network by performing this kind of convolution directly on the group of permutations S_n. We can, for example, first transform features from a set of nodes V into S_n:

(x ⋆ θ)(σ) = Σ_{u∈V} x(u) θ(σ⁻¹(u)),        (7.97)

and then continue transforming on S_n as follows:

((x ⋆ θ) ⋆ ϕ)(σ) = Σ_{σ′∈S_n} (x ⋆ θ)(σ′) ϕ(σ⁻¹ ∘ σ′),        (7.98)
Figure 7.2
Leveraging the group convolutional framework to construct a learnable S_3-equivariant transformation over three-node graphs, per Equations 7.97–7.98. The first layer maps node features in X(V) directly to permutation features in X(S_3), via parameters θ : V → ℝ; the second layer maps X(S_3) → X(S_3) via parameters ϕ : S_3 → ℝ. While elegant and spanning a rich class of permutation-equivariant graph models, it quickly becomes unwieldy at larger |V|, and does not easily transfer across graphs of different sizes.
where σ, σ′ ∈ S_n are permutations and ∘ is permutation composition (see Figure 7.2 for an example over three nodes and six permutations).
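To make Equations 7.97–7.98 concrete at the only scale where they remain tractable, here is a sketch for n = 3 that enumerates all six permutations explicitly; the toy features and filters are assumptions made purely for illustration.

```python
import numpy as np
from itertools import permutations

n = 3
perms = list(permutations(range(n)))                 # the 3! = 6 elements of S_3
def compose(s, t):                                   # (s ∘ t)(i) = s(t(i))
    return tuple(s[t[i]] for i in range(n))
def inverse(s):
    return tuple(s.index(i) for i in range(n))

rng = np.random.default_rng(3)
x = rng.normal(size=n)                    # node features on V = {0, 1, 2}
theta = rng.normal(size=n)                # lifting filter θ : V → R
phi = {s: rng.normal() for s in perms}    # group filter ϕ : S_3 → R

# Lifting layer (Eq. 7.97): (x ⋆ θ)(σ) = Σ_u x(u) θ(σ⁻¹(u))
lifted = {s: sum(x[u] * theta[inverse(s)[u]] for u in range(n)) for s in perms}

# Group layer (Eq. 7.98): ((x ⋆ θ) ⋆ ϕ)(σ) = Σ_{σ'} (x ⋆ θ)(σ') ϕ(σ⁻¹ ∘ σ')
convolved = {s: sum(lifted[t] * phi[compose(inverse(s), t)] for t in perms)
             for s in perms}

# Equivariance: relabelling the nodes by π only re-indexes the lifted
# features, (ρ(π)x ⋆ θ)(σ) = (x ⋆ θ)(π⁻¹ ∘ σ).
pi = (1, 2, 0)
x_pi = np.array([x[inverse(pi)[u]] for u in range(n)])   # (ρ(π)x)(u) = x(π⁻¹(u))
lifted_pi = {s: sum(x_pi[u] * theta[inverse(s)[u]] for u in range(n))
             for s in perms}
for s in perms:
    assert np.isclose(lifted_pi[s], lifted[compose(inverse(pi), s)])
```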
However, in practice, such a layer cannot be constructed on any but the smallest of graphs, because the permutation group S_n has n! elements, and the layer above requires storing a feature vector, (x ⋆ θ)(σ), for each of those elements. While such constructions are indeed impractical, it is interesting to ponder that there exists an extremely rich family of permutation-equivariant models acting directly on the permutation group in this way, encompassing layers with potentially significant expressive power.
Similarly, integrating over higher-dimensional groups like the affine group (containing translations, rotations, shearing and scaling, for a total of 6 dimensions) is not feasible in practice. Nevertheless, as we have seen in Chapter 5, we can still build equivariant convolutions for large groups G by working with signals defined on low-dimensional spaces on which G acts. Indeed, it is possible to show that any equivariant linear map f : X(Ω) → X(Ω′) between two domains Ω, Ω′ can be written as a generalised convolution similar to the group convolution discussed here.
Second, we note that the Fourier transform we derived in Chapter 6 from the
shift-equivariance property of the convolution can also be extended to a more
general case by projecting the signal onto the matrix elements of irreducible
representations of G.
Finally, we point to the assumption that has so far underpinned our discussion in this Chapter: whether Ω was a grid, plane, or the sphere, we could transform every point into any other point, intuitively meaning that all the points on the domain “look the same”. A domain with such a property is called a homogeneous space, where for any u, v ∈ Ω there exists g ∈ G such that g.u = v.⁶ In future Chapters, we will try to relax this assumption.
7.2 Model
As discussed thus far, we can generalise the convolution operation from signals
on a Euclidean space to signals on any homogeneous space Ω acted upon by
a group G. By analogy to the Euclidean convolution, where a translated filter
is matched with the signal, the idea of group convolution is to move the filter
around the domain using the group action, e.g. by rotating and translating. By
virtue of the transitivity of the group action, we can move the filter to any position on Ω. In this section, we will discuss several concrete examples of
the general idea of group convolution, including implementation aspects and
architectural choices.
7.2.1 Discrete group convolution
We begin by considering the case where the domain Ω as well as the group G are discrete. As our first example, we consider medical volumetric images represented as signals on 3D grids with discrete translation and rotation symmetries. The domain is the 3D cubical grid Ω = ℤ³ and the images (e.g. MRI or CT 3D scans) are modelled as functions x : ℤ³ → ℝ, i.e. x ∈ X(Ω). Although, in practice, such images have support on a finite cuboid [W] × [H] × [D] ⊂ ℤ³, we instead prefer to view them as functions on ℤ³ with appropriate zero padding. As our symmetry, we consider the group G = ℤ³ ⋊ O_h of distance- and orientation-preserving transformations on ℤ³. This group consists of translations (ℤ³) and the discrete rotations O_h generated by 90-degree rotations about the three axes (see Figure 7.3).
As our second example, we consider DNA⁷ sequences made up of four letters: C, G, A, and T. The sequences can be represented on the 1D grid Ω = ℤ as signals x : ℤ → ℝ⁴, where each letter is one-hot coded in ℝ⁴. Naturally, we have a discrete 1D translation symmetry on the grid, but DNA sequences have an additional interesting symmetry.
Figure 7.3
A 3 × 3 × 3 filter, rotated by all 24 elements of the discrete rotation group O_h, generated by 90-degree rotations about the vertical axis (red arrows) and 120-degree rotations about a diagonal axis (blue arrows).
This symmetry arises from the way DNA is physically embodied as a double helix, and the way it is read by the molecular machinery of the cell. Each strand of the double helix begins with what is called the 5′-end and ends with a 3′-end, with the 5′ on one strand complemented by a 3′ on the other strand. In other words, the two strands have an opposite orientation. Since the DNA molecule is always read off starting at the 5′-end, but we do not know which one, a sequence such as ACCCTGG is equivalent to the reversed sequence with each letter replaced by its complement, CCAGGGT. This is called the reverse-complement symmetry of the letter sequence, and is depicted in Figure 7.4. We thus have the two-element group ℤ₂ = {0, 1} corresponding to the identity 0 and the reverse-complement transformation 1 (and composition 1 + 1 = 0 mod 2). The full group combines translations and reverse-complement transformations.
In this discrete case, the previously defined group convolution (Equation 7.96) is given as the following inner product:

(x ⋆ θ)(g) = Σ_u x_u (ρ(g)θ)_u,        (7.99)

between the (single-channel) input signal x and a filter θ transformed by g ∈ G via (ρ(g)θ)_u = θ_{g⁻¹u}; the output x ⋆ θ is a function on G. Note that, since Ω is discrete, we have replaced the integral from Equation 7.96 by a sum.
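As a concrete instance of Equation 7.99, the following sketch implements the DNA example: the group combines translations with the reverse-complement transformation, and transforming the filter is just an index permutation (anticipating the "transform + convolve" recipe of the next section). The letter ordering, toy sequence and random filter are illustrative assumptions.

```python
import numpy as np

ALPHABET = "ACGT"
COMPLEMENT = np.array([3, 2, 1, 0])      # index permutation A↔T, C↔G

def one_hot(seq):
    x = np.zeros((len(seq), 4))
    x[np.arange(len(seq)), [ALPHABET.index(c) for c in seq]] = 1.0
    return x

def reverse_complement(signal):
    """Action of the non-trivial element of Z_2: reverse the sequence and
    swap each letter channel with its complement."""
    return signal[::-1][:, COMPLEMENT]

def correlate(x, theta):
    """Translational cross-correlation over 'valid' positions, summed over
    the four letter channels."""
    L, m = len(x), len(theta)
    return np.array([np.sum(x[k:k + m] * theta) for k in range(L - m + 1)])

x = one_hot("ACCCTGGAT")                                  # toy input sequence
theta = np.random.default_rng(4).normal(size=(3, 4))      # length-3 filter

# Group convolution = one translational correlation per element of Z_2:
feature_maps = {"identity": correlate(x, theta),
                "revcomp":  correlate(x, reverse_complement(theta))}

# Equivariance: reverse-complementing the input reverses and swaps the two
# orientation channels.
x_rc = reverse_complement(x)
assert np.allclose(correlate(x_rc, theta),
                   correlate(x, reverse_complement(theta))[::-1])
```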
Figure 7.4
A schematic of the DNA's double helix structure, with the two strands coloured in blue and red. Note how the sequences in the helices are complementary and read in reverse (from 5′ to 3′).
7.2.2 Transform + Convolve approach
We will show that the discrete group convolution can be implemented in two
steps: a filter transformation step, and a translational convolution step. The
filter transformation step consists of creating rotated (or reverse-complement
transformed) copies of a basic filter, while the translational convolution is the
same as in standard CNNs and thus efficiently computable on hardware such
as GPUs. To see this, note that, in both of our examples, we can write a general
transformation g ∈ G as a transformation h ∈ H (e.g. a rotation or reverse-complement transformation) followed by a translation k ∈ ℤ^d, i.e. g = kh (with juxtaposition denoting the composition of the group elements k and h). By properties of the group representation, we have ρ(g) = ρ(kh) = ρ(k)ρ(h). Thus,

(x ⋆ θ)(kh) = Σ_u x_u (ρ(k)ρ(h)θ)_u = Σ_u x_u (ρ(h)θ)_{u−k}.        (7.100)
We recognise the last equation as the standard (planar Euclidean) convolu-
tion of the signal x and the transformed filter ρ(h)θ. Thus, to implement
group convolution for these groups, we take the canonical filter θ, create
transformed copies θ_h = ρ(h)θ for each h ∈ H (e.g. each rotation h ∈ O_h or the reverse-complement DNA symmetry h ∈ ℤ₂), and then convolve x with each of these filters: (x ⋆ θ)(kh) = (x ⋆ θ_h)(k). For both of our examples, the symmetries act on filters by simply permuting the filter coefficients, as shown in Figure 7.3 for discrete rotations. Hence, these operations can be implemented efficiently using an indexing operation with pre-computed indices.
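The sketch below illustrates this transform + convolve recipe. For brevity it uses the four-fold rotation group of the plane in place of the 24-element O_h of the 3D example (the 3D case is structurally identical, with the rotated filter copies obtained from pre-computed index permutations); all names and sizes are illustrative assumptions.

```python
import numpy as np

def correlate2d_valid(x, theta):
    """Plain translational cross-correlation, 'valid' positions only."""
    H, W = x.shape
    h, w = theta.shape
    return np.array([[np.sum(x[i:i + h, j:j + w] * theta)
                      for j in range(W - w + 1)]
                     for i in range(H - h + 1)])

rng = np.random.default_rng(5)
x = rng.normal(size=(10, 10))            # input feature map
theta = rng.normal(size=(3, 3))          # canonical filter

# Step 1 (filter transformation): one rotated copy of θ per element of C_4.
rotated_filters = [np.rot90(theta, k) for k in range(4)]

# Step 2 (translational convolution): one orientation channel per copy,
# implementing (x ⋆ θ)(kh) = (x ⋆ θ_h)(k).
channels = np.stack([correlate2d_valid(x, f) for f in rotated_filters])

# Equivariance: rotating the input rotates every orientation channel and
# cyclically permutes the channels among themselves.
channels_of_rotated = np.stack(
    [correlate2d_valid(np.rot90(x), f) for f in rotated_filters])
expected = np.stack([np.rot90(channels[(k - 1) % 4]) for k in range(4)])
assert np.allclose(channels_of_rotated, expected)
```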
While we defined the feature maps produced by the group convolution x ⋆ θ as functions on G, the fact that we can split any g ∈ G into g = kh means that we can also think of them as a stack of Euclidean feature maps (sometimes called orientation channels), with one feature map per filter transformation / orientation h. For instance, in our first example we would associate a feature map to each filter rotation (each node in Figure 7.3), obtained by convolving the input (in the traditional translational sense) with the rotated filter. These feature maps can thus still be stored as a W × H × C array, where the number of channels C equals the number of independent filters times the number of transformations h ∈ H (e.g. rotations).
As previously shown, the group convolution is equivariant: (ρ(g)x) ⋆ θ = ρ(g)(x ⋆ θ). What this means in terms of orientation channels is that, under the action of h, each orientation channel is transformed, and the orientation channels themselves are permuted. For instance, if we associate one orientation channel per transformation in Figure 7.3 and apply a rotation by 90 degrees about the z-axis (corresponding to the red arrows), the feature maps will be permuted as shown by the red arrows.
This description makes it clear that a group convolutional neural network
bears much similarity to a traditional CNN. Hence, many of the network design
patterns discussed in Chapter 6, such as residual networks, can be used with
group convolutions as well.
7.2.3 Spherical CNNs via the Fourier domain
For the continuous symmetry group of the sphere that we saw in Section 7.1.2, it is possible to efficiently implement the convolution in the spectral domain, using the appropriate Fourier transform (we remind the reader that the convolution on S² is a function on SO(3), hence we need to define the Fourier transform on both these domains in order to implement multi-layer spherical CNNs). Spherical harmonics are an orthogonal basis on the 2D sphere, analogous to the classical Fourier basis of complex exponentials. On the special orthogonal group, the Fourier basis is known as the Wigner D-functions. Both of these bases find wide applications in quantum mechanics and chemistry. In both cases, the Fourier transforms (coefficients) are computed as the inner product with the basis functions, and an analogue of the Convolution Theorem holds: one can compute the convolution in the Fourier domain as the element-wise product of the Fourier transforms. Furthermore, FFT-like algorithms exist for the efficient computation of the Fourier transform on S² and SO(3). Using this approach, Cohen et al. (2018) were able to build the first efficient spherical CNN; we defer to their paper for specific details.
7.3 Case Study: TacticAI
As might be expected, group convolutional networks manifest in a wide variety of interesting domains.
Figure 7.5
Illustration of a single layer of TacticAI’s group-equivariant neural network. For a given corner kick situation (here visualised using only six players for clarity), TacticAI first generates four views, corresponding to all four possible corner kick locations (e.g., X_↕ for the vertically reflected corner). Then, a graph neural network, G, processes carefully chosen pairs of views, in a way that follows the equivariance constraint. Lastly, all of the computed view-pair representations are aggregated to produce latent view features (e.g. H_↕ for the vertically-reflected view).
But perhaps the reader might still find it surprising that they have seen notable use in association football⁸ analytics. To round off
the groups Chapter, we provide a targeted overview of the resulting TacticAI
method, developed in collaboration with Liverpool FC (Wang et al. 2024).
Rather than surveying the entire architecture and the tasks it was deployed on,
we particularly focus on its group-equivariant aspects (see also Figure 7.5),
along with meaningful context on why these aspects were called for.
Problem setup  TacticAI is a system capable of predictive and generative modelling over various tactical setups in football. The tactical setup input is represented as a graph of 22 nodes—one for each player—with each node’s features, x_u ∈ ℝ^k, comprising both spatial (current position and velocity) and physical (player height, weight and ball possession) information about the corresponding player. It is assumed that all pairs of players are connected to each other, allowing the model to infer the most important connections automatically (as we discussed in Chapter 5).
Figure 7.6
The Cayley graph of the dihedral group D_2 = {e, ↕, ↔, ↔↕}, organising all of its individual elements and their action on a domain spanning the four corners of a rectangle. Note that the arrows are bidirectional, since each D_2 group action is its own inverse. Further, note that composing both horizontal and vertical reflections is equivalent to a 180° rotation.
The system then needs to either predict the outcome of a future event (e.g. who will make the next contact with the ball, a node classification task, or whether a shot will be taken, a graph classification task) or generate novel setups (that, e.g., modulate the likelihood of a shot being taken).
As it is often tricky to meaningfully influence open-play tactics in football,
TacticAI focusses all of its attention on modelling outcomes in set pieces,
during which the game is effectively frozen. Specifically, corner kick setups
were targeted, as they occur reasonably frequently, start from a rigid posi-
tion, and offer immediate goal-scoring potential. Further, corner kick tactics
are often determined well in advance of individual matches, allowing for a
cleaner window for TacticAI to influence coaching decisions.
An unexpected symmetry The decision to focus on corner kicks also brings
with it an undesirable tradeoff—they are simply not extremely frequent, lead-
ing to relatively modest dataset sizes. Indeed, even though TacticAI was trained
over several seasons’ worth of Premier League data, only 9,693 unique training examples were extractable from this data—a far cry from the large-scale datasets used for many of the case studies previously discussed in this book.
Further coupled with the scarcity of certain target events (such as shots, which
are relatively rare), the quantity of useful signal is significantly reduced.
At this level of data availability, exploiting symmetries arises as a natural
approach for optimising the extent to which the provided data is utilised. That
said, it was not immediately obvious where such symmetries could come from.
Owing to a direct suggestion from the Liverpool FC collaborators,⁹ it was concluded that, if one varies the specific corner in which the set piece is being taken (out of the four possible corners of the pitch), the outcomes would remain approximately equivariant.
The symmetry group which governs changing corner positions in a way that preserves the relative positioning of all other points is the dihedral group,¹⁰ G = D_2. This group comprises only four elements, D_2 = {e, ↕, ↔, ↔↕}, enumerating four possible transformations: identity, vertical flip, horizontal flip, and vertical and horizontal flip (180° rotation). It is fully described by Figure 7.6; note that each operation is its own inverse, i.e., g = g⁻¹ for all g ∈ D_2. Accordingly, TacticAI’s neural network layers are designed to be D_2-equivariant.
Note that, in the context of football, D_2 equivariance is not exact—reflecting a particular corner kick may not result in exactly the same outcomes. Among other factors, many corner-kick takers tend to have a preferred foot, which directly impacts the precision with which they can cross the ball into the penalty box. However, the data efficiency benefits of constraining a model to output D_2-equivariant predictions may outweigh any inaccuracies in those predictions—an assumption that turned out to be true.¹¹
Constructing D_2-equivariant GNNs  Since D_2 is a small, discrete group, the blueprint of group convolutions we described throughout this Chapter perfectly applies. Firstly, let us assume we have at our disposal a graph neural network (GNN) layer,¹² G(X), that can operate over input node features, X (note we omit the adjacency matrix here for brevity, as it will always remain fully connected). Our D_2-equivariant GNN will carefully distribute G across various views of a corner kick tactical setup, in a way that preserves the D_2 symmetry in the outputs (as illustrated in Figure 7.5).
Once we have our input features, X ∈ ℝ^{22×k}, we first “lift” them into the D_2 space by generating all four transformed views: X_g = Xρ(g) for g ∈ D_2.¹³ The corresponding representation matrices ρ(g) ∈ ℝ^{k×k} are simple to construct if our spatial features are zero-centered around the pitch center—we simply need to flip the sign of all columns of X corresponding to the axes being reflected. For example, over vertical flips this amounts to the following diagonal matrix:

ρ_ij =  1    if i = j and i is not a y-axis feature,
       −1    if i = j and i is a y-axis feature,
        0    if i ≠ j.
Now, we can follow Equation 7.96 to define our group-convolutional layer:

H_g = (1/4) Σ_{h∈D_2} G(X_h ∥ X_{g⁻¹h}),

wherein we’ve replaced the (inner) product with our GNN layer G applied over concatenated features. This layer yields latent representations H_g ∈ ℝ^{22×l} for all views g ∈ D_2, and additional such layers may be easily stacked.
Final predictions may be obtained through either frame averaging (H = (H_e + H_↕ + H_↔ + H_↔↕)/4) or retrieving H_e, depending on whether an invariant or equivariant prediction is required.
This approach has successfully delivered on its promise in the low-data
set-piece analytics domain: for receiver and shot prediction, it improved the
baseline GNN’s predictive power by over 5%; a comparable jump was obtained
by leveraging a graph structure in the first place (compared to using a Deep
Sets model). Accordingly, D_2 equivariance became one of the uniquely notable features of TacticAI.
Suggested Further Reading
We have presented the key ingredients that are relevant for constructing arbitrary G-equivariant networks over homogeneous domains (including the all-important G-equivariant networks over G itself!) as well as several relevant examples. However, for reasons of brevity, our book deliberately does not dive into how all of the various components of these architectures fit together more broadly, nor does it survey in detail how they historically emerged.
The earliest references to group CNNs, explicitly extending CNNs to handle 90-degree rotations, were provided concurrently by T. Cohen and Welling (2016) and Dieleman, Fauw, and Kavukcuoglu (2016). Making these models
more efficient by leveraging irreducible representations has yielded a more
generic class of steerable CNNs, for which we recommend the paper by Taco
S. Cohen and Welling (2017b).
In the years following these papers, a wide range of architectures incorporat-
ing group or steerable convolutions have been produced, and these have been
thoroughly synthesised (with a unifying theory of equivariance over homoge-
neous spaces) by Cohen, Geiger, and Weiler (2019). We warmly recommend
this paper, not only for its synthesis efforts, but as an excellent collection of
references to other influential works in this space.
Building on these foundational ideas, subsequent research has focused on
scaling these architectures and extending them to continuous domains. For a
highly practical and definitive realization of general steerable convolutions—
which has become a standard reference for modern implementations—we
point the reader to the work of Weiler and Cesa (2019b). Furthermore, while
our text remains focused on standard domains and avoids the broader topic of
Lie groups, readers interested in extending equivariance to arbitrary continu-
ous groups operating on irregular data, such as point clouds, will find a critical
milestone in the LieConv architecture proposed by Finzi et al. (2020).
Exercises
1. Consider a homogeneous domain Ω, and a symmetry group G acting on it. Given any element u ∈ Ω, we can form the stabiliser subgroup G_u = {g ∈ G | g.u = u}, consisting of those elements of G that leave u invariant.
(i) Show that the stabiliser G_u is indeed a subgroup of G.
(ii) Let G = O(3) be the group of orthogonal 3 × 3 matrices (AAᵀ = I, i.e., the group of continuous 3D rotations and reflections) acting on the sphere Ω = S². What is the stabiliser of an arbitrary point u (e.g., the north pole)?
(iii) Let G = S_n be the group of permutations of n elements, Ω = {1, 2, …, n}. What is the stabiliser of an arbitrary point u ∈ Ω?
(iv) Let u, v ∈ Ω be two elements of the domain. By homogeneity of Ω, we can find a g ∈ G such that g.u = v. Show that, if h ∈ G_v, then g⁻¹hg ∈ G_u.
(v) From the previous part, it follows that there exists a mapping f : G_v → G_u defined by f(h) = g⁻¹hg. Provide an inverse of f, i.e., a map f⁻¹ : G_u → G_v.
(vi) Show that f(hh′) = f(h)f(h′) for all h, h′ ∈ G_v; i.e., that f is a group homomorphism. This, coupled with the existence of f⁻¹, implies that any two points u, v of a homogeneous space have isomorphic stabiliser subgroups.
2. Let G = S_n be the group of permutations acting on a set of n elements Ω = {1, …, n}. Let S_{n−1} be the subgroup of S_n that leaves invariant the element 1, i.e. g.1 = 1 for all g ∈ S_{n−1}. We will study the cosets gS_{n−1} = {gh | h ∈ S_{n−1}} (where g ∈ S_n) and the quotient S_n/S_{n−1} = {gS_{n−1} | g ∈ S_n} (the set of cosets).
(i) What is the size, N, of S_n/S_{n−1}, i.e., how many cosets are there?
(ii) Characterise the cosets gS_{n−1}, i.e., find a property that all elements of a given coset have, and that no element outside of the coset has. In the following parts, we will leverage this to identify S_n/S_{n−1} with the set {1, …, N} in the natural way.
(iii) Consider the map π : S_n → S_n/S_{n−1} defined by π(g) = gS_{n−1}. What are the fibers, F_u = π⁻¹(u), where u ∈ S_n/S_{n−1} is a coset? (Note: in this context, π is a fiber bundle.)
(iv) Consider the right action of S_{n−1} on S_n, defined as g.h = gh, for g ∈ S_n, h ∈ S_{n−1}. Show that this action preserves fibers, i.e., π(g.h) = π(g) for all g ∈ S_n, h ∈ S_{n−1}.
(v) Additionally, show that this right action is transitive on the fibers, i.e., for any g, g′ ∈ π⁻¹(u), there exists h ∈ S_{n−1} such that g.h = g′.
(vi) Additionally, show that this right action is fixed-point free. That is, if g.h = g for some g ∈ S_n and h ∈ S_{n−1}, then h = e (the identity).
(vii) A section of the bundle π : S_n → S_n/S_{n−1} is a map s : S_n/S_{n−1} → S_n that satisfies π ∘ s = id_{S_n/S_{n−1}}, i.e., π(s(u)) = u for all u ∈ S_n/S_{n−1}. Define such a section. (Note: a section of a principal bundle is also known as a gauge.)
(viii) Let f : Ω → ℝ represent a scalar feature assigned to each element (e.g. a node in a graph). A permutation-equivariant linear layer over these features is defined by a kernel K : Ω × Ω → ℝ such that K(g.u, g.v) = K(u, v) for all g ∈ S_n. By considering the action of S_{n−1} and the results in previous parts, prove that K can take at most two distinct values.
(ix) Let x ∈ ℝⁿ be the vector of scalar features for all n elements of Ω. Write down the most general linear transformation y = Wx that is permutation-equivariant, expressing the matrix W in terms of the n × n identity matrix I and the all-ones matrix J. Based on this formulation, show that computing the updated features y ∈ ℝⁿ requires only O(n) operations, avoiding the O(n!) complexity of lifting the features into the full group S_n.
(x) If we allow the node update to be a general non-linear function rather than a linear kernel, we specify a permutation-equivariant map F : ℝⁿ → ℝⁿ. By leveraging a similar argument as in Part (viii), prove that the output for node i must take the form y_i = ϕ(x_i, X_i), where X_i is the unordered multiset of all other node features, and ϕ is some non-linear function that is permutation-invariant in its second argument. Note that this formulation reflects the Deep Sets architecture (Zaheer et al. 2017).
Notes
1. Technically, we need the group to be locally compact, so that there exists a left-invariant Haar measure. Integrating with respect to this measure, we can “shift” the integrand by any group element and obtain the same result, just as how we have

∫_{−∞}^{+∞} x(u) du = ∫_{−∞}^{+∞} x(u − v) du

for functions x : ℝ → ℝ.
2. Note that what we define here is not convolution but cross-correlation, which is tacitly used in deep learning under the name ‘convolution’. We do it for consistency with the following discussion, since in our notation (ρ(g)x)(u) = x(u − v) and (ρ(g⁻¹)x)(u) = x(u + v).
3. The integration is done w.r.t. an invariant measure µ on Ω. In case µ is discrete, this means summing over Ω.
4. Actually here again, this is cross-correlation.
5. Recall that the representation of G acting on functions defined on G itself is called the regular representation of G.
6. The additional properties, e.u = u and g.(h.u) = (gh).u, are tacitly assumed.
7. DNA is a biopolymer molecule made of four repeating units called nucleotides (Cytosine, Guanine, Adenine, and Thymine), arranged into two strands coiled around each other in a double helix, where each nucleotide occurs opposite of the complementary one (base pairs A/T and C/G).
8. Also occasionally referred to as “soccer”; a name that we disagree with.
9. LFC’s research team was, in fact, very well positioned to propose and improve equivariant architectures, as many of them come from a physics background, and they have generally pioneered data-driven approaches to football analytics over the years; see Graham (2024) for a beautiful, football-centric account of these successes.
10. The dihedral group D_2 is also isomorphic to the Klein four-group, K_4, first described by Felix Klein in 1884. Abstractly, K_4 = {e, a, b, c} is a group of four elements where each element is self-inverse (gg = e for all g ∈ K_4) and composing any two non-identities yields the third element (ab = ba = c, ac = ca = b, bc = cb = a). K_4 has diverse applications, from algebra (where it can be used to explain the existence of formulae for solving quartic equations), all the way to music composition (being related to dodecaphony).
11. Generally, about 30% of the players in the Premier League are left-footed or both-footed, which gives a sufficient number of players who can effectively attack a ball in a mirrored capacity. As an example, most teams have a left-footed left-back and a right-footed right-back who take corners, and it would be rare to find a team that doesn’t have both a left-footed and a right-footed corner taker. In terms of attacking the corners, headed shots are very common in corner kick situations, so footedness tends to matter significantly less in these situations. Also, many shots taken during corners are more reactive from close range, where foot preference matters less.
12. Specifically, TacticAI uses the GATv2 (Brody, Alon, and Yahav 2022) as the implementation of G.
13. Note that, unlike many previous examples, here the representation ρ(g) acts via right-multiplication, as it acts on the node features rather than the nodes themselves.