The heat equation is a partial differential equation describing the propagation of heat through a medium. Simulating the heat equation is a well-known toy problem in numerics.
For simplicity, we will implement the heat equation in 1D with periodic boundary conditions. You can imagine this as simulating heat propagation through a strand of cooked spaghetti whose ends are joined.
The heat equation in 1D is given by:
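$$\frac{\partial U}{\partial t} = \alpha \frac{\partial^2 U}{\partial x^2}$$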
Where $U$ is the temperature, $\alpha$ is the thermal diffusivity constant, $t$ is the time coordinate, and $x$ is the spatial coordinate.
To implement this numerically, we need to discretize the equation in space and time. The first step is to write the time derivative as a finite difference. In other words: we want to know, given $U^t_x$, how to compute $U^{t + \Delta t}_x$.
To do this, we will employ a Taylor series expansion:
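$$U^{t + \Delta t}_x = U^t_x + \Delta t \, \partial_t U^t_x + \frac{(\Delta t)^2}{2!} \, \partial_t^2 U^t_x + \frac{(\Delta t)^3}{3!} \, \partial_t^3 U^t_x + \cdots$$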
If we assume that $\Delta t$ is infinitesimally small, we can neglect terms of $(\Delta t)^n$ where $n \ge 2$:
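$$U^{t + \Delta t}_x \approx U^t_x + \Delta t \, \partial_t U^t_x$$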
The above is an equation to compute the next time step's value in terms of the current one. It is known as Euler's method.
If we take the same Taylor expansion in $x$ we get:
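$$U^t_{x + \Delta x} = U^t_x + \Delta x \, \partial_x U^t_x + \frac{(\Delta x)^2}{2!} \, \partial_x^2 U^t_x + \frac{(\Delta x)^3}{3!} \, \partial_x^3 U^t_x + \cdots$$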
We can likewise take the Taylor series in the $-x$ direction:
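$$U^t_{x - \Delta x} = U^t_x + (-\Delta x) \, \partial_x U^t_x + \frac{(-\Delta x)^2}{2!} \, \partial_x^2 U^t_x + \frac{(-\Delta x)^3}{3!} \, \partial_x^3 U^t_x + \cdots$$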
Which we can also write as:
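$$U^t_{x - \Delta x} = U^t_x - \Delta x \, \partial_x U^t_x + \frac{(\Delta x)^2}{2!} \, \partial_x^2 U^t_x - \frac{(\Delta x)^3}{3!} \, \partial_x^3 U^t_x + \cdots$$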
Adding both series together gives us:
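$$U^t_{x + \Delta x} + U^t_{x - \Delta x} = 2 \, U^t_x + (\Delta x)^2 \, \partial_x^2 U^t_x + \frac{2 \, (\Delta x)^4}{4!} \, \partial_x^4 U^t_x + \cdots$$
Notice that all the odd-order terms cancel.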
Treating $\Delta x$ as infinitesimally small (neglecting terms of order higher than $(\Delta x)^2$), and solving for $\partial_x^2 U^t_x$, gives us:
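$$\partial_x^2 U^t_x \approx \frac{U^t_{x + \Delta x} - 2 \, U^t_x + U^t_{x - \Delta x}}{(\Delta x)^2}$$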
We now have the tools we need to simulate the heat equation on a computer.
We will solve our equation on a grid of $N$ grid points with grid spacing $\Delta x$:
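$$x_i = i \cdot \Delta x, \qquad i = 0, 1, \ldots, N - 1$$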
This step is called discretizing in space. Put simply, we're just saying that $x_{i}$ is the $i$th grid point, corresponding to the spatial coordinate $i \cdot \Delta x$.
To discretize in time, we use both Euler's method and our second order spatial discretization derived above to get:
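$$U^{t + \Delta t}_{x_i} = U^t_{x_i} + \frac{\alpha \, \Delta t}{(\Delta x)^2} \left( U^t_{x_{i+1}} - 2 \, U^t_{x_i} + U^t_{x_{i-1}} \right)$$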
For simplicity, we will factor $\Delta t$, $\Delta x$, and $\alpha$ into one constant: $C$. This gives us:
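$$U^{t + \Delta t}_{x_i} = U^t_{x_i} + C \left( U^t_{x_{i+1}} - 2 \, U^t_{x_i} + U^t_{x_{i-1}} \right)$$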
For the code to run quickly but without spurious numerical errors, we want $C = \alpha (\Delta t) / (\Delta x)^2$ to be as large as possible while staying below the scheme's stability limit of $0.5$; we will use $C = 0.49$. This formula will also help you compute $\Delta t$ after choosing a $\Delta x$ and an $\alpha$.
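As a purely illustrative example (these values of $\alpha$ and $\Delta x$ are arbitrary): with $\alpha = 1$ and $\Delta x = 0.1$, we would choose
$$\Delta t = \frac{C \, (\Delta x)^2}{\alpha} = \frac{0.49 \cdot 0.01}{1} = 0.0049$$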
The update formula above is a basic kernel, or stencil. To compute the value of any particular grid point at any particular time step, we need to read the values of that grid point and its neighbors at the previous time step.
Let's say we have an array of values representing a spaghetti noodle at some time step composed of five grid points:
prevStep[0] = 1;
prevStep[1] = 2;
prevStep[2] = 3;
prevStep[3] = 4;
prevStep[4] = 5;
To evolve this noodle one time step forward, we need to calculate one grid point at a time according to the kernel, and then store the result in a new array. Imagine a sliding window three grid points wide, traveling across the array. The center of the window is the grid point we're currently evolving, with the left and right neighbors being the other inputs we need.
For example, to calculate nextStep[2], we need to read prevStep[1], prevStep[2], and prevStep[3], and then
calculate the value of nextStep[2] according to the kernel:
nextStep[2] = prevStep[2] + C * (prevStep[1] - 2 * prevStep[2] + prevStep[3]);
We can generalize this easily for the rest of the interior:
nextStep[1] = prevStep[1] + C * (prevStep[0] - 2 * prevStep[1] + prevStep[2]);
nextStep[2] = prevStep[2] + C * (prevStep[1] - 2 * prevStep[2] + prevStep[3]);
nextStep[3] = prevStep[3] + C * (prevStep[2] - 2 * prevStep[3] + prevStep[4]);
What about the boundaries, prevStep[0] and prevStep[4]? Their left and right neighbors, respectively, seem to be out
of bounds. Actually, since we're using periodic boundary conditions, we need to treat this array as circular:
nextStep[0] = prevStep[0] + C * (prevStep[4] - 2 * prevStep[0] + prevStep[1]);
nextStep[4] = prevStep[4] + C * (prevStep[3] - 2 * prevStep[4] + prevStep[0]);
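Outside of Guardian, the same time step could be written as a plain loop. Here is a minimal serial sketch in C; the function name stepOnce and the use of modular indexing for the wraparound are just one illustrative choice, not part of the Guardian program we will build below:

// One full time step with periodic boundaries (serial sketch).
// prevStep and nextStep each hold N grid points; C is the constant from above.
void stepOnce(const double *prevStep, double *nextStep, int N, double C) {
    for (int i = 0; i < N; i++) {
        int left  = (i + N - 1) % N;   // left neighbor, wrapping to N-1 when i == 0
        int right = (i + 1) % N;       // right neighbor, wrapping to 0 when i == N-1
        nextStep[i] = prevStep[i]
                    + C * (prevStep[left] - 2.0 * prevStep[i] + prevStep[right]);
    }
}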
The standard way to parallelize this would be to split the array into chunks, pass each chunk to a different thread, then allow each thread to evolve its own chunk in parallel. However, the fact that we always need to read from a point's neighbors introduces a wrinkle. If we're evolving the leftmost grid point in a chunk, then another thread owns the left neighbor, which means we can't access it without some kind of synchronization.
Here's an example where we're splitting an array of 10 values between two threads:
|      Thread 1     |      Thread 2     |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
                      ^
If we want to evolve point 5, how do we read point 4?
The solution is to use ghost zones. These are extra slots in the array that serve to hold copies of values which need to cross the boundary between threads. Between time steps, we need to update the ghost zones so that each thread has a current view of the values straddling its boundaries.
For example:
|         Thread 1          |         Thread 2          |
|*9 | 0 | 1 | 2 | 3 | 4 |*5 |*4 | 5 | 6 | 7 | 8 | 9 |*0 |
                                  ^
If we want to evolve point 5, we can read the value of point 4 from the ghost zone *4.
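To make the bookkeeping concrete, here is a small C sketch of what one chunk's update could look like with a ghost cell on each side. The names (exchangeGhosts, stepChunk, nLocal) are hypothetical and only illustrate the idea; this is not how the Guardian version below is written:

// Each chunk stores its nLocal owned points in slots 1..nLocal, with ghost
// copies of the neighboring chunks' edge values in slots 0 and nLocal+1.
void exchangeGhosts(double *chunk, const double *leftChunk,
                    const double *rightChunk, int nLocal) {
    chunk[0]          = leftChunk[nLocal];  // left neighbor's rightmost owned point
    chunk[nLocal + 1] = rightChunk[1];      // right neighbor's leftmost owned point
}

// Evolve only the owned points; the ghost slots are read-only inputs here.
void stepChunk(const double *prevChunk, double *nextChunk, int nLocal, double C) {
    for (int i = 1; i <= nLocal; i++) {
        nextChunk[i] = prevChunk[i]
                     + C * (prevChunk[i - 1] - 2.0 * prevChunk[i] + prevChunk[i + 1]);
    }
}

Between time steps, each thread would refresh its ghost slots from its neighbors' previous-step chunks before evolving its own points; with only two threads on a ring, each thread's left and right neighbor is simply the other thread's chunk.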
Exchanging ghost copies like this would be our only option if we were using distributed memory. With shared memory, however, we can eliminate ghost zones entirely if we synchronize carefully, and Guardian lets us do exactly that. Still, the most straightforward formulation of the heat equation in Guardian uses ghost zones to implement the periodic boundary conditions at the extremities of the array, and elides them everywhere else. When we use the term "ghost zone" in Guardian, we usually mean this scheme. Here's how that looks:
|       Thread 1        |       Thread 2        |
|*9 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |*0 |
                          ^
If we want to evolve point 5, we can read the value of point 4, but we need some kind of synchronization with the other thread.
Let's implement the heat equation in Guardian with PartList!
First, let's define a function which takes one time step. It should take two PartLists, one containing the previous
time step, and one used as a destination to write the values of the next time step:
namespace heat;

value class Heat {
    static C: real = 0.49;

    async static fn step(src: PartList<real>, dst: PartList<real>) -> Fut<void> {
        /* Implement me! */
    }
}
First, let's handle the periodic boundary conditions:
PartList::runParted(1, src.readWrite()) |all| {
    all[0] = all[all.end()-2];
    all[all.end()-1] = all[1];
}
This is similar to the way we used runParted in the previous section---running a single task over the entire list. We're just setting the first element of the list to the value of the second-to-last element, and the last element to the value of the second element. This matches the diagram above, where the first and last elements are the ghost zones.
Next, we implement the stencil function to write the next timestep into dst. Since this needs to happen after the
boundary conditions are handled, we use .then() to chain the operations:
// Handle periodic boundary conditions
PartList::runParted(1, src.readWrite()) |all| {
    all[0] = all[all.end()-2];
    all[all.end()-1] = all[1];
}.then():
// Handle the actual kernel with 2 partitions and 1 ghost zone
PartList::runParted(2, 1, src.read(), dst.write()) |src, dst| {
    dst[..] = src[..] +
        Heat::C * (src[1..] + src[-1..] - 2*src[..]);
}
This matches the kernel function we described earlier. Note the offsets in the expression
src[1..] + src[-1..] - 2*src[..]. This is how we access the neighbors of the current point. Because we specified
one ghost zone, we can offset the indices by one in either direction.
Finally, let's return from the function:
async static fn step(src: PartList<real>, dst: PartList<real>) -> Fut<void> {
    PartList::runParted(1, src.readWrite()) |all| {
        all[0] = all[all.end()-2];
        all[all.end()-1] = all[1];
    }.then():
    PartList::runParted(2, 1, src.read(), dst.write()) |src, dst| {
        dst[..] = src[..] +
            Heat::C * (src[1..] + src[-1..] - 2*src[..]);
    }.then():
    async return;
}
Now, we'll write some driver code in main(). The basic idea is to cycle between two arrays of values, each taking
turns being the source and destination, until we've evolved all the time steps.
async entry fn main() -> Fut<void> {
    let n = 10;
    let steps = 5000;
    let u1 = PartList::of(n) |i| { yield i + 1.0; };
    let u2 = PartList::of(n) |i| { yield i + 1.0; };
    printPartList(u1).then():
    Loop::marchingFor(0, steps) |timeStep| {
        if timeStep % 2 == 0 {
            yield step(u1, u2);
        } else {
            yield step(u2, u1);
        }
    }.then():
    printPartList(u1).then():
    verify(u1, n).then() |verified|:
    if verified {
        Out::println("Success!");
    } else {
        Out::println("Failure!");
    }
    async return;
}
All that's left is to define the helper functions printPartList and verify. Putting everything together, here's the entire program:
namespace heat;

import gu4::Out;
import gu4::Math;

value class Heat {
    public static C: real = 0.49;

    async static fn step(src: PartList<real>, dst: PartList<real>) -> Fut<void> {
        // Handle periodic boundary conditions
        PartList::runParted(1, src.readWrite()) |all| {
            all[0] = all[all.end()-2];
            all[all.end()-1] = all[1];
        }.then():
        // Handle the actual kernel with 2 partitions and 1 ghost zone
        PartList::runParted(2, 1, src.read(), dst.write()) |src, dst| {
            dst[..] = src[..] +
                Heat::C * (src[1..] + src[-1..] - 2*src[..]);
        }.then():
        async return;
    }

    async static fn verify(p: PartList<real>, n: int) -> Fut<bool> {
        // The interior points start with the values 2 .. n-1; diffusion drives them
        // toward their mean, (n+1)/2, which is what this expression computes.
        let expectedVal = ((n - 2) * (n - 1) / 2 + (n - 2)) / (n - 2.0);
        let tightTol = 0.000001;
        let looseTol = 0.01;
        PartList::runParted(1, p.read()) |p|:
        let mutable sum = 0.0;
        for val in p {
            sum = sum + val;
            if Math::abs(val - expectedVal) > looseTol {
                async return false;
            }
        }
        async return Math::abs(sum/n - expectedVal) <= tightTol;
    }

    static fn printPartList(partList: PartList<real>) -> Fut<void> {
        return PartList::runParted(1, partList.read()) |view|:
        if view.readableSize() == 0 {
            Out::println("[]");
            yield;
        }
        let mutable str = "[" + view[0];
        for e in view[1..] {
            str = str + ", " + e;
        }
        Out::println(str + "]");
    }

    async entry fn main() -> Fut<void> {
        let n = 10;
        let steps = 5000;
        let u1 = PartList::of(n) |i| { yield i + 1.0; };
        let u2 = PartList::of(n) |i| { yield i + 1.0; };
        printPartList(u1).then():
        // Ping-pong between the two lists: even steps read u1 and write u2,
        // odd steps read u2 and write u1.
        Loop::marchingFor(0, steps) |timeStep| {
            if timeStep % 2 == 0 {
                yield step(u1, u2);
            } else {
                yield step(u2, u1);
            }
        }.then():
        printPartList(u1).then():
        verify(u1, n).then() |verified|:
        if verified {
            Out::println("Success!");
        } else {
            Out::println("Failure!");
        }
        async return;
    }
}
And the output we expect:
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
[5.500000000000011, 5.499999999999989, 5.500000000000011, 5.499999999999989, 5.500000000000011, 5.499999999999989, 5.500000000000011, 5.499999999999989, 5.500000000000011, 5.499999999999989]
Success!
It works! Feel free to adjust n and steps to experiment with different values.