Preface

La connoissance de certains principes supplée facilement à la connoissance de certains faits.

—Claude Adrien Helvétius, De l’esprit (1759)

In October 1872, the philosophy faculty of a small university in the Bavarian city of Erlangen appointed a new young professor. As was customary, he was requested to deliver an inaugural research programme, which he published under the somewhat long and boring title ‘Vergleichende Betrachtungen über neuere geometrische Forschungen’ (‘A comparative review of recent researches in geometry’). The professor was Felix Klein, only twenty-three years of age at the time, and his inaugural work has entered the annals of mathematics as the “Erlangen Programme.”

The nineteenth century had been remarkably fruitful for geometry. For the first time in nearly two thousand years after Euclid’s Elements, the construction of projective geometry by Poncelet, hyperbolic geometry by Gauss, Bolyai, and Lobachevsky, and elliptic geometry by Riemann showed that an entire zoo of diverse geometries was possible. However, these constructions had quickly diverged into independent and unrelated fields, with many mathematicians of that period questioning how the different geometries are related to each other and what actually defines a geometry.

The breakthrough insight of Klein was to approach the definition of geometry as the study of invariants, or in other words, structures that are preserved under a certain type of transformations (symmetries). Klein used the formalism of group theory to define such transformations and used the hierarchy of groups and their subgroups to classify the different geometries arising from them.

Thus, the group of rigid motions leads to the traditional Euclidean geometry, while affine or projective transformations produce, respectively, the affine and projective geometries.
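Klein’s idea can be made concrete with a small computation. The following sketch (our own illustration, with all function names hypothetical) checks that Euclidean distance is preserved by a rigid motion of the plane, but not by a generic affine map, which preserves only weaker structure such as parallelism and ratios of lengths along a line:

```python
import math

def rigid(p, theta=0.7, tx=3.0, ty=-1.0):
    """Rotate point p by theta, then translate by (tx, ty): a rigid motion."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)

def shear(p, k=1.5):
    """A generic affine map that is not a rigid motion."""
    x, y = p
    return (x + k * y, y)

def dist(p, q):
    """Euclidean distance: the invariant defining Euclidean geometry."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

p, q = (0.0, 0.0), (1.0, 2.0)

# Distance is invariant under the rigid motion...
assert math.isclose(dist(p, q), dist(rigid(p), rigid(q)))
# ...but not under the shear, which belongs to the larger affine group.
assert not math.isclose(dist(p, q), dist(shear(p), shear(q)))
```

In Klein’s language, the smaller group (rigid motions) preserves more structure than the larger affine group containing it, and each group carves out its own geometry.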

The impact of the Erlangen Programme on geometry was profound. Furthermore, it spilled over into other fields, especially physics, where symmetry principles made it possible to derive conservation laws from first principles (an astonishing result known as Noether’s Theorem), and even enabled the classification of elementary particles as irreducible representations of the symmetry group.

At the time of writing, the state of the field of deep learning is somewhat reminiscent of the field of geometry in the nineteenth century. There is a veritable zoo of neural network architectures for various kinds of data, but few unifying principles. As in times past, this makes it difficult to understand the relations between various methods, inevitably resulting in the reinvention and re-branding of the same concepts in different application domains. For a novice entering the field, absorbing the sheer volume of redundant and unconnected ideas is a major challenge.

In this book, we make a modest attempt to apply the Erlangen Programme mindset to the domain of deep learning, with the ultimate goal of obtaining a systematisation of this field and ‘connecting the dots’. We call this geometrisation attempt ‘Geometric Deep Learning’, and true to the spirit of Felix Klein, propose to derive different inductive biases and network architectures implementing them from first principles of symmetry and invariance. In particular, we focus on a large class of neural networks designed for analysing unstructured sets, grids, graphs, and manifolds, and show that they can be understood in a unified manner as methods that respect the structure and symmetries of these domains.
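To give one concrete instance of such symmetry-respecting design (a minimal sketch of our own, with the names phi and set_function purely illustrative): a model for unordered sets should be invariant to permutations of its input, which is guaranteed by combining a shared per-element map with a symmetric aggregation such as summation.

```python
def phi(x):
    # Shared per-element feature map (a stand-in for a learned network).
    return (x * x, x + 1.0)

def set_function(xs):
    # Summing the per-element features is permutation-invariant,
    # so the output cannot depend on the ordering of the set.
    feats = [phi(x) for x in xs]
    return tuple(sum(f[i] for f in feats) for i in range(2))

# Reordering the input leaves the output unchanged.
assert set_function([1.0, 2.0, 3.0]) == set_function([3.0, 1.0, 2.0])
```

The same recipe, with the symmetry group swapped out (translations for grids, permutations of nodes for graphs, isometries for manifolds), underlies the architectures studied in this book.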

We believe this book will appeal to a broad audience of deep learning researchers, practitioners, and enthusiasts. A novice may use it as an overview and introduction to Geometric Deep Learning. A seasoned deep learning expert may discover new ways of deriving familiar architectures from basic principles, and perhaps some surprising connections. Practitioners may get new insights into how to solve problems in their respective fields. We believe the book can also serve as a textbook in an advanced (graduate) machine learning course, or as the basis of a foundational ML course for a mathematically oriented audience.

With such a fast-paced field as modern machine learning, the risk of writing a book like this is that it becomes obsolete and irrelevant before it sees the light of day. Having focused on foundations, our hope is that the key concepts we discuss will transcend their specific realisations — or, as Helvétius (1759) put it, “the knowledge of certain principles easily compensates the lack of knowledge of certain facts”.


What this book is about

Our book is designed to introduce existing deep learning architectures through the prism of geometry and to categorise them based on the fundamental symmetries of the data they work on. We take care not to express predilections for any specific architecture (though we might have our own preferences, nihil humanum nobis alienum), as we believe there is no “one true architecture,” much like there was no “one true geometry” in mathematics.

What this book is not about

True to Helvétius’s maxim, we focus on principles and the architectures that instantiate them, and avoid detailed discussion of specific machine learning pipelines (such as self-supervised learning, generative modelling, or reinforcement learning) and their training and regularisation procedures (such as the many gradient-descent variants or batch normalisation). As machine learning practitioners ourselves, we are, however, aware that “gray are all theories, and green alone Life’s golden tree”. It is not uncommon to see benchmarks and leaderboards dominated by architectures that do not necessarily have rigorous mathematical underpinnings, which we will typically refrain from using as examples. There is a plethora of reasons why this could happen in practice. One often-cited reason is a bias in the data that does not reflect the symmetries of the problem we actually care about. Other reasons are the ‘hardware lottery’ (Hooker 2021) and ‘hype lottery’ phenomena, where substantial resources can incentivise large-scale hyperparameter tuning, leading to “winning” tricks that have nothing to do with the choice of architecture (Trockman and Kolter 2022). Finally, in many applications, a carefully tuned domain-specific architecture may outperform a generic mathematically-principled one on particular problems (Liu et al. 2022).

How to use this book

We anticipate that the best way to use this book is to master the geometric approach of categorising and reasoning about deep learning architectures: it can serve as a useful study companion while learning about existing architectures, or as a source of inspiration while devising or describing novel ones. These principles are embodied by the courses we have delivered, first at the African Master’s in Machine Intelligence (AMMI) in 2021 and subsequently at Cambridge and Oxford in 2022–2024. Accordingly, we expect that our text can serve as a valuable foundation for undergraduate or graduate-level courses in machine learning, and we provide our lecture slides as an accompaniment to the book at geometricdeeplearning.com.


We have attempted to cast a reasonably wide net in terms of the architectures we discuss here, in order to illustrate the power of our geometric blueprint. Hence, our book could be interpreted as a survey of machine learning architectures (circa 2022)—yet we find this to be a suboptimal way to utilise it. Indeed, our work does not attempt to accurately summarise the entire existing wealth of research on Geometric Deep Learning. Rather, we study several well-known architectures in depth in order to demonstrate the key principles and ground them in existing research, with the hope that we have left sufficient references for the reader to meaningfully apply these principles to any future geometric deep architecture they encounter or devise.

Suggested Prerequisites

While trying to make the book self-contained, we assume the reader to have a good grasp of basic mathematical concepts and to be familiar with machine learning. For those who need to fill any lacunae, the classical book of Bruckner, Bruckner, and Thomson (2008) provides a full overview of real analysis, including calculus (the notions of derivatives and integration), linear algebra (vector spaces and matrices), functional analysis (metric, Banach, Hilbert, and Lᵖ spaces), and harmonic analysis (Fourier series and transforms). Since group theory plays a central role in our exposition, we introduce the main concepts in the following chapters. For a lightweight introduction to this subject, we recommend the visual approach of Carter (2021). A deeper study of the subject, including the notions of Fourier transforms on groups and irreducible representations, is presented in the book of Folland (1989) on abstract harmonic analysis. Our discussion of manifolds would benefit from a basic background in differential geometry, for which we suggest the classical text of Do Carmo (2016).

We also find it useful to have an understanding of the foundations of signal processing—for which Mallat (1999) is an excellent text. Further, many of the constructs and data domains we use will have an underlying graph structure, so we believe that a foundation in graph theory may be of benefit to the reader as well; we recommend Chung and Graham (1997) for spectral graph theory, and Shuman et al. (2013) and Sandryhaila and Moura (2013) for graph signal processing.

Lastly, as one might expect, a prior understanding of the foundations of machine learning may amplify the reader’s appreciation of the significance of the various architectures we discuss, as well as help them spot connections to their implementations before we elaborate on them. There are many suitable introductory texts, of which we recommend Murphy (2022) and the most recent book by Bishop and Bishop (2024).