Preface

La connoissance de certains principes supplée facilement à la connoissance de certains faits.

—Claude Adrien Helvétius, De l’esprit (1759)

In October 1872, the philosophy faculty of a small university in the Bavarian city of Erlangen appointed a new young professor. As was customary, he was requested to deliver an inaugural research programme, which he published under the somewhat long and boring title ‘Vergleichende Betrachtungen über neuere geometrische Forschungen’ (‘A comparative review of recent researches in geometry’). The professor was Felix Klein, only twenty-three years of age at the time, and his inaugural work has entered the annals of mathematics as the “Erlangen Programme.”

The nineteenth century had been remarkably fruitful for geometry. For the first time in nearly two thousand years after Euclid’s Elements, the construction of projective geometry by Poncelet, hyperbolic geometry by Gauss, Bolyai, and Lobachevsky, and elliptic geometry by Riemann showed that an entire zoo of diverse geometries was possible. However, these constructions had quickly diverged into independent and unrelated fields, with many mathematicians of that period questioning how the different geometries are related to each other and what actually defines a geometry.

The breakthrough insight of Klein was to approach the definition of geometry as the study of invariants, or in other words, structures that are preserved under a certain type of transformations (symmetries). Klein used the formalism of group theory to define such transformations and used the hierarchy of groups and their subgroups to classify the different geometries arising from them.

Thus, the group of rigid motions leads to the traditional Euclidean geometry, while affine or projective transformations produce, respectively, the affine and projective geometries.
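Klein’s idea can be made concrete with a small computation. The following sketch (our own illustration, with all function names hypothetical) checks that Euclidean distance is preserved by a rigid motion of the plane, but not by a generic affine map, which preserves only weaker structure such as parallelism and ratios of lengths along a line:

```python
import math

def rigid(p, theta=0.7, tx=3.0, ty=-1.0):
    """Rotate point p by theta, then translate by (tx, ty): a rigid motion."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)

def shear(p, k=1.5):
    """A generic affine map that is not a rigid motion."""
    x, y = p
    return (x + k * y, y)

def dist(p, q):
    """Euclidean distance: the invariant defining Euclidean geometry."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

p, q = (0.0, 0.0), (1.0, 2.0)

# Distance is invariant under the rigid motion...
assert math.isclose(dist(p, q), dist(rigid(p), rigid(q)))
# ...but not under the shear, which belongs to the larger affine group.
assert not math.isclose(dist(p, q), dist(shear(p), shear(q)))
```

In Klein’s language, the smaller group (rigid motions) preserves more structure than the larger affine group containing it, and each group carves out its own geometry.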

The impact of the Erlangen Programme on geometry was profound. Furthermore, it spilled over into other fields, especially physics, where symmetry principles made it possible to derive conservation laws from first principles (an astonishing result known as Noether’s Theorem), and even enabled the classification of elementary particles as irreducible representations of the symmetry group.

At the time of writing, the state of the field of deep learning is somewhat reminiscent of the field of geometry in the nineteenth century. There is a veritable zoo of neural network architectures for various kinds of data, but few unifying principles. As in times past, this makes it difficult to understand the relations between various methods, inevitably resulting in the reinvention and re-branding of the same concepts in different application domains. For a novice entering the field, absorbing the sheer volume of redundant and unconnected ideas is a major challenge.

In this book, we make a modest attempt to apply the Erlangen Programme mindset to the domain of deep learning, with the ultimate goal of obtaining a systematisation of this field and ‘connecting the dots’. We call this geometrisation attempt ‘Geometric Deep Learning’, and true to the spirit of Felix Klein, propose to derive different inductive biases and network architectures implementing them from first principles of symmetry and invariance. In particular, we focus on a large class of neural networks designed for analysing unstructured sets, grids, graphs, and manifolds, and show that they can be understood in a unified manner as methods that respect the structure and symmetries of these domains.
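To give one concrete instance of such symmetry-respecting design (a minimal sketch of our own, with the names phi and set_function purely illustrative): a model for unordered sets should be invariant to permutations of its input, which is guaranteed by combining a shared per-element map with a symmetric aggregation such as summation.

```python
def phi(x):
    # Shared per-element feature map (a stand-in for a learned network).
    return (x * x, x + 1.0)

def set_function(xs):
    # Summing the per-element features is permutation-invariant,
    # so the output cannot depend on the ordering of the set.
    feats = [phi(x) for x in xs]
    return tuple(sum(f[i] for f in feats) for i in range(2))

# Reordering the input leaves the output unchanged.
assert set_function([1.0, 2.0, 3.0]) == set_function([3.0, 1.0, 2.0])
```

The same recipe, with the symmetry group swapped out (translations for grids, permutations of nodes for graphs, isometries for manifolds), underlies the architectures studied in this book.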

We believe this book will appeal to a broad audience of deep learning researchers, practitioners, and enthusiasts. A novice may use it as an overview and introduction to Geometric Deep Learning. A seasoned deep learning expert may discover new ways of deriving familiar architectures from basic principles, and perhaps some surprising connections. Practitioners may get new insights into how to solve problems in their respective fields. We believe the book can also serve as a textbook in an advanced (graduate) machine learning course, or as the basis of a foundational ML course for a mathematically oriented audience.

With such a fast-paced field as modern machine learning, the risk of writing a book like this is that it becomes obsolete and irrelevant before it sees the light of day. Having focused on foundations, our hope is that the key concepts we discuss will transcend their specific realisations — or, as Helvétius (1759) put it, “the knowledge of certain principles easily compensates the lack of knowledge of certain facts”.


What this book is about

Our book is designed to introduce existing deep learning architectures through the prism of geometry and to categorise them based on the fundamental symmetries of the data they work on. We take care not to express predilections for any specific architecture (though we might have our own preferences, nihil humanum nobis alienum), as we believe there is no “one true architecture,” much like there was no “one true geometry” in mathematics.

What this book is not about

True to Helvétius’s maxim, we focus on principles and the architectures that instantiate them, and avoid detailed discussion of specific machine learning pipelines (such as self-supervised learning, generative modelling, or reinforcement learning) and their training and regularisation procedures (such as the many gradient-descent variants or batch normalisation). As machine learning practitioners ourselves, we are, however, aware that “gray are all theories, and green alone Life’s golden tree”. It is not uncommon to see benchmarks and leaderboards dominated by architectures that do not necessarily have rigorous mathematical underpinnings, which we will typically refrain from using as examples. There is a plethora of reasons why this could happen in practice. One often-cited reason is a bias in the data that does not reflect the symmetries of the problem we actually care about. Other reasons are the ‘hardware lottery’ (Hooker 2021) and ‘hype lottery’ phenomena, where substantial resources can incentivise large-scale hyperparameter tuning, leading to “winning” tricks that have nothing to do with the choice of architecture (Trockman and Kolter 2022). Finally, in many applications, a carefully tuned domain-specific architecture may outperform a generic mathematically-principled one on particular problems (Liu et al. 2022).

How to use this book

We anticipate that the best way to use this book is to master the geometric approach of categorising and reasoning about deep learning architectures: it can serve as a useful study companion while learning about existing architectures, or as a source of inspiration while devising or describing novel ones. These principles are embodied by the courses we have delivered, first at the African Master’s in Machine Intelligence (AMMI) in 2021 and subsequently at Cambridge and Oxford in 2022–2024. Accordingly, we expect that our text can serve as a valuable foundation for undergraduate or graduate-level courses in machine learning, and we provide our lecture slides as an accompaniment to the book at geometricdeeplearning.com.


We have attempted to cast a reasonably wide net in terms of the architectures we discuss here, in order to illustrate the power of our geometric blueprint. Hence, our book could be interpreted as a survey of machine learning architectures (circa 2022)—yet we find this to be a suboptimal way to utilise it. Indeed, our work does not attempt to accurately summarise the entire existing wealth of research on Geometric Deep Learning. Rather, we study several well-known architectures in depth in order to demonstrate the key principles and ground them in existing research, with the hope that we have left sufficient references for the reader to meaningfully apply these principles to any future geometric deep architecture they encounter or devise.

Suggested Prerequisites

While trying to make the book self-contained, we assume the reader to have a good grasp of basic mathematical concepts and to be familiar with machine learning. For those who need to fill any lacunae, the classical book of Bruckner, Bruckner, and Thomson (2008) provides a full overview of real analysis, including calculus (the notions of derivatives and integration), linear algebra (vector spaces and matrices), functional analysis (metric, Banach, Hilbert, and Lᵖ spaces), and harmonic analysis (Fourier series and transforms). Since group theory plays a central role in our exposition, we introduce the main concepts in the following chapters. For a lightweight introduction to this subject, we recommend the visual approach of Carter (2021). A deeper study of the subject, including the notions of Fourier transforms on groups and irreducible representations, is presented in the book of Folland (1989) on abstract harmonic analysis. Our discussion of manifolds would benefit from a basic background in differential geometry, for which we suggest the classical text of Do Carmo (2016).

We also find it useful to have an understanding of the foundations of signal processing—for which Mallat (1999) is an excellent text. Further, many of the constructs and data domains we use will have an underlying graph structure, so we believe that a foundation in graph theory may be of benefit to the reader as well; we recommend Chung and Graham (1997) for spectral graph theory, and Shuman et al. (2013) and Sandryhaila and Moura (2013) for graph signal processing.

Lastly, as one might expect, a prior understanding of the foundations of machine learning may amplify the reader’s appreciation of the significance of the various architectures we discuss, as well as help them spot connections to their implementations before we elaborate on them. There are many suitable introductory texts, of which we recommend Murphy (2022) and the most recent book by Bishop and Bishop (2024).