Adding Custom Functions¶
As mentioned in the first section of this tutorial, dolfin-adjoint works by overloading parts of FEniCS so that it can build up an annotation by recording each step of the forward model. The list of overloaded functions and objects is found in the API reference. The part of dolfin-adjoint that takes care of the fundamental annotation is pyadjoint, which is independent of FEniCS; dolfin-adjoint tells pyadjoint how to handle FEniCS types and functions. If the forward model uses custom functions rather than the standard FEniCS functions, pyadjoint will not know how to record these steps, so we have to tell it how by overloading the functions ourselves.
A Simple Example¶
Suppose we have a module that we want to use with our FEniCS model. In this example the module is named normalise and consists of only one function, normalise(func). The module looks like this:
from fenics import *
from fenics_adjoint import *
def normalise(func):
    vec = func.vector()
    normalised_vec = vec / vec.norm('l2')
    return Function(func.function_space(), normalised_vec)
The function normalise(func) normalises the vector form of a FEniCS function and returns the FEniCS function form of that normalised vector. A simple FEniCS program that uses this function might look like this:
from fenics import *
from fenics_adjoint import *
from normalise import normalise
mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, 'CG', 1)
f = project(Expression('x[0]*x[1]', degree=1), V)
g = normalise(f)
J = assemble(g*dx)
Here we define a function on a space, normalise it with our function and integrate it over the space. Now we want to know the gradient of \(J\) with respect to the initial condition \(f\). We could try simply adding
from fenics_adjoint import *
and
dJdf = compute_gradient(J, Control(f))
but that won’t work, because pyadjoint does not know that it should record
the normalisation and it does not know what the derivative of the normalisation is.
We should create a new module that overloads normalise(func), telling pyadjoint how to deal with it.
Overloading a function¶
Let us now create a module that overloads the normalise(func) function. We start by importing the FEniCS and dolfin-adjoint modules, along with some specific functions needed for overloading, and of course the function we want to overload.
from fenics import *
from fenics_adjoint import *
from pyadjoint import Block
from normalise import normalise
Since we are overloading normalise(func), we need to rename the original so that we keep access to it:
backend_normalise = normalise
The Block class¶
The pyadjoint tape consists of instances of Block subclasses. These subclasses implement methods that can compute partial derivatives of their respective functions. Thus, to properly overload normalise we must implement a Block subclass, which we call NormaliseBlock.
In addition to writing a constructor, we have to override the methods evaluate_adj_component() and recompute_component(); we will also add a __str__() method. In our example the constructor may look like this:
class NormaliseBlock(Block):
    def __init__(self, func, **kwargs):
        super(NormaliseBlock, self).__init__()
        self.kwargs = kwargs
        self.add_dependency(func.block_variable)
We call the superclass constructor and save the keyword arguments. Then we tell pyadjoint that the operation this block represents depends on the function func. As func should be an overloaded object, it has a block_variable attribute.
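If the overloaded operation took more than one FEniCS input, each input would get its own dependency, and the order in which the dependencies are added is the order later referred to by the idx argument of evaluate_adj_component(). A purely hypothetical two-input block might start like this:
from pyadjoint import Block

# Hypothetical block for an operation z = f(x, y) with two FEniCS inputs.
class CombineBlock(Block):
    def __init__(self, func1, func2, **kwargs):
        super(CombineBlock, self).__init__()
        self.kwargs = kwargs
        self.add_dependency(func1.block_variable)  # dependency with idx == 0
        self.add_dependency(func2.block_variable)  # dependency with idx == 1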
Next we can define a __str__() method. This gives a name to the block; its output is, for example, how the block is represented in graphs made with visualise, as explained in the section on debugging.
def __str__(self):
    return "NormaliseBlock"
We need a recompute_component() method that can recompute the function with new inputs.
def recompute_component(self, inputs, block_variable, idx, prepared):
    return backend_normalise(inputs[0])
We get a list of new inputs, which has length 1 because we only have one input variable, or more precisely, because we only added one dependency in the constructor.
The adjoint¶
The method evaluate_adj_component() should evaluate one component of the vector-Jacobian product. In the mathematical background we discussed the tangent linear model and the adjoint on the level of the whole model. Here we consider more concretely how dolfin-adjoint treats each block. pyadjoint treats a forward model as a series of equation solves. Some of these equations are complicated PDEs that are solved by the FEniCS function solve, but others are of the straightforward form

\[ y = f(x_1, \ldots, x_n), \]

where \(y\) is the only unknown. Our normalise function may be represented by this kind of equation.
When differentiating a functional, pyadjoint works by considering each block as a link in a chain formed by the chain rule. If a functional is the result of a series of straightforward transformations on an initial condition:

\[ J = J(u_n), \qquad u_i = f_i(u_{i-1}), \quad i = 1, \ldots, n, \]

then by the chain rule

\[ \frac{\mathrm{d}J}{\mathrm{d}u_0} = \frac{\partial J}{\partial u_n}\,\frac{\partial f_n}{\partial u_{n-1}} \cdots \frac{\partial f_1}{\partial u_0}. \]

If we consider instead the adjoint model we will find the transpose of \(\frac{\mathrm{d}J}{\mathrm{d}u_0}\):

\[ \left(\frac{\mathrm{d}J}{\mathrm{d}u_0}\right)^* = \left(\frac{\partial f_1}{\partial u_0}\right)^* \cdots \left(\frac{\partial f_n}{\partial u_{n-1}}\right)^* \left(\frac{\partial J}{\partial u_n}\right)^*. \]

Calculating from the right, we find that for each link

\[ y_i = \nabla f_i^*\, y_{i+1}, \]

where

\[ y_{n+1} = \left(\frac{\partial J}{\partial u_n}\right)^* \]

and

\[ \nabla f_i = \frac{\partial f_i}{\partial u_{i-1}}. \]

Each block only needs to find the transpose of its own gradient!
This is implemented in evaluate_adj().
Back to our example¶
Mathematically our normalisation block may be represented in index notation as

\[ f_i(x) = \frac{x_i}{\lVert x \rVert}. \]

The Jacobian matrix consists of the entries

\[ \nabla f_{ij} = \frac{\partial f_i}{\partial x_j} = \frac{\delta_{ij}}{\lVert x \rVert} - \frac{x_i x_j}{\lVert x \rVert^{3}}. \]

evaluate_adj() takes a vector as input and returns the transpose of the Jacobian matrix multiplied with that vector:

\[ \left(\nabla f^*\, y\right)_i = \sum_j \frac{\partial f_j}{\partial x_i}\, y_j = \frac{y_i}{\lVert x \rVert} - \frac{x_i}{\lVert x \rVert^{3}} \sum_j x_j y_j. \]
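As a quick sanity check of this expression, independent of FEniCS, we can compare it against the explicit Jacobian for a small random vector (the names below are purely illustrative):
import numpy as np

x = np.random.rand(5)
y = np.random.rand(5)
nrm = np.linalg.norm(x)

# Explicit Jacobian of f(x) = x / ||x||
jac = np.eye(5) / nrm - np.outer(x, x) / nrm ** 3

# Adjoint action as in the formula above
adj = y / nrm - x.dot(y) * x / nrm ** 3

# The transposed Jacobian applied to y matches the formula
assert np.allclose(jac.T.dot(y), adj)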
evaluate_adj_component() works as evaluate_adj(), but computes only the component that corresponds to a single dependency (input). In other words, given an index \(i\), evaluate_adj_component() computes the component \(\left(\nabla f^* \cdot y\right)_i\). By default, evaluate_adj() calls evaluate_adj_component() for each of the relevant components.
Now let us look at the implementation:
def evaluate_adj_component(self, inputs, adj_inputs, block_variable, idx, prepared=None):
    adj_input = adj_inputs[0]
    x = inputs[idx].vector()
    inv_xnorm = 1.0 / x.norm('l2')
    return inv_xnorm * adj_input - inv_xnorm ** 3 * x.inner(adj_input) * x
evaluate_adj_component() takes five arguments:

- inputs is a list of the inputs at which we compute the derivative, i.e. \(x\) in the derivations above. This list has the same length as the list of dependencies.
- adj_inputs is a list of the adjoint inputs, i.e. \(y_{i+1}\) above, with this method representing the computation of \(y_i\). This list has the same length as the list of outputs.
- block_variable is the block variable of the dependency (input) that we differentiate with respect to.
- idx is the index, in the list of dependencies, of the dependency that we differentiate with respect to. Given a function output \(z = f(x, y)\), where the dependency list is [x, y], we compute \((\partial z/\partial x)^*\) for idx == 0 and \((\partial z/\partial y)^*\) for idx == 1.
- prepared can be anything. It is the return value of prepare_evaluate_adj(), which is run before evaluate_adj_component() is called for each relevant dependency; the default return value is None. If your implementation would benefit from doing some computations that are independent of the relevant dependencies, you should consider implementing prepare_evaluate_adj(); see the sketch after this list. For example, for solve() the adjoint equation is solved in prepare_evaluate_adj(), and the adjoint solution is provided in the prepared parameter.
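For NormaliseBlock the gain would be small, since there is only one dependency, but a sketch of how the shared quantities could be moved into prepare_evaluate_adj() might look as follows (this assumes the prepare_evaluate_adj(inputs, adj_inputs, relevant_dependencies) signature; treat it as an illustration rather than the documented implementation):
def prepare_evaluate_adj(self, inputs, adj_inputs, relevant_dependencies):
    # Computed once per adjoint evaluation and shared by all components.
    x = inputs[0].vector()
    inv_xnorm = 1.0 / x.norm('l2')
    return x, inv_xnorm

def evaluate_adj_component(self, inputs, adj_inputs, block_variable, idx, prepared=None):
    x, inv_xnorm = prepared
    adj_input = adj_inputs[0]
    return inv_xnorm * adj_input - inv_xnorm ** 3 * x.inner(adj_input) * x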
For more in-depth documentation on Blocks in pyadjoint, see the pyadjoint API reference.
The overloading function¶
Now we are ready to define our overloaded function. For simple functions, where the function return value is the output, pyadjoint offers a convenience function for overloading. For this example, we utilize this convenience function:
from pyadjoint.overloaded_function import overload_function
normalise = overload_function(normalise, NormaliseBlock)
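If a function does not fit this pattern, the overloading can also be written out by hand in the same module, next to backend_normalise and NormaliseBlock. Roughly, and assuming pyadjoint's annotate_tape, stop_annotating and create_overloaded_object helpers, the wrapper produced by overload_function does something like this:
from pyadjoint.tape import get_working_tape, stop_annotating, annotate_tape
from pyadjoint.overloaded_type import create_overloaded_object

def normalise(func, **kwargs):
    # Decide whether this call should be recorded on the tape.
    annotate = annotate_tape(kwargs)
    if annotate:
        tape = get_working_tape()
        block = NormaliseBlock(func)
        tape.add_block(block)

    # Run the original function without recording anything extra.
    with stop_annotating():
        output = backend_normalise(func)
    output = create_overloaded_object(output)

    if annotate:
        # Register the result as the output of this block.
        block.add_output(output.create_block_variable())

    return output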
download the overloaded module
That’s it! Now we are ready to use our function normalise with dolfin-adjoint. Let us perform a Taylor test to see if it works:
from fenics import *
from fenics_adjoint import *
from normalise_overloaded import normalise
from numpy.random import rand
mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, 'CG', 1)
f = project(Expression('x[0]*x[1]', degree=1), V)
g = normalise(f)
J = assemble(g*dx)
h = Function(V)
h.vector()[:] = rand(h.vector().local_size())
taylor_test(ReducedFunctional(J, Control(f)), f, h)
This gives the output:
Running Taylor test
Computed residuals: [5.719808547999933e-06, 1.4356712128880207e-06, 3.596346874345e-07, 8.999840626988876e-08]
Computed convergence rates: [1.99424146695553, 1.9971213080328687, 1.9985608192605893]
It works: the convergence rates approach 2, as expected for a correct gradient.
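With the overloaded normalise in place, the gradient computation we originally wanted should now run as expected:
dJdf = compute_gradient(J, Control(f))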