While in general the domain can evolve in time together with the signals
on it, it is typically assumed that the domain is kept fixed across all $t$, i.e.
$\Omega^{(t)} = \Omega$. Here, we will exclusively focus on this case, but note that exceptions
are common. Social networks are an example where one often has to account
for the domain changing through time, as new links are regularly created as
well as erased. The domain in this setting is often referred to as a dynamic
graph (D. Xu et al. 2020; Rossi, Chamberlain, et al. 2020).
Often, the individual inputs $X^{(t)}$ will exhibit useful symmetries and hence
may be nontrivially treated by any of our previously discussed architectures.
Some common examples include: videos (Ω is a fixed grid, and signals are a
sequence of frames); fMRI scans (Ω is a fixed mesh representing the geometry
of the brain cortex, where different regions are activated at different times as a
response to presented stimuli); and traffic flow networks (Ω is a fixed graph rep-
resenting the road network, on which e.g. the average traffic speed is recorded
at various nodes).
Let us assume an encoder function $f(X^{(t)})$ providing latent representations
at the level of granularity appropriate for the problem and respectful of the
symmetries of the input domain. As an example, consider processing video
frames: that is, at each timestep, we are given a grid-structured input rep-
resented as an $n \times d$ matrix $X^{(t)}$, where $n$ is the number of pixels (fixed in
time) and $d$ is the number of input channels (e.g. $d = 3$ for RGB frames). Fur-
ther, we are interested in analysis at the level of entire frames, in which case
it is appropriate to implement $f$ as a translation invariant CNN, outputting a
$k$-dimensional representation $z^{(t)} = f(X^{(t)})$ of the frame at time-step $t$.
We are now left with the task of appropriately summarising a sequence of
vectors $z^{(t)}$ across all the steps. A canonical way to dynamically aggregate
this information in a way that respects the temporal progression of inputs and
also easily allows for online arrival of novel data-points is using a Recurrent
Neural Network (RNN). What we will show here is that RNNs are an inter-
esting geometric architecture to study in their own right, since they implement
a rather unusual type of symmetry over the inputs $z^{(t)}$.
SimpleRNNs At each step, the recurrent neural network computes an $m$-
dimensional summary vector $h^{(t)}$ of all the input steps up to and including $t$.
This (partial) summary is computed conditional on the current step’s features
and the previous step’s summary, through a shared update function, $R : \mathbb{R}^k \times \mathbb{R}^m \to \mathbb{R}^m$,
as follows (see Figure 6.5 for a summary):

$$h^{(t)} = R\left(z^{(t)}, h^{(t-1)}\right) \qquad (6.80)$$

and, as both $z^{(t)}$ and $h^{(t-1)}$ are flat vector representations, $R$ may be most easily
expressed as a single fully-connected neural network layer (often known as