Matrix addition is a simple, embarrassingly parallel element-wise operation. With our previous knowledge, we could
parallelize matrix addition by enclosing each element of the matrix in a GuardVar, but this would in all likelihood
be over-synchronized and would perform poorly.
In this section, we will explore the PartList type, which allows us to express
array-based parallel computations efficiently and concisely.
Matrix addition is defined as:
Put simply, we add each element of the first matrix to the corresponding element of the second matrix.
Guardian currently does not have a built-in matrix or n-dimensional array type. However, we can define a simple
matrix type using PartList to store a flattened array of elements. That is, for a
$M \times N$ matrix, we can store the elements in a PartList of length $M \cdot N$, where the first $N$ elements
comprise the first row, the next $N$ elements comprise the second row, and so on.
Formally:
Let's define this as a class in Guardian. Our class should have three fields:
m: The number of rows in the matrix.n: The number of columns in the matrix.data: The flattened PartList of elements.The fields m and n are immutable value types (integers), because the matrix never changes dimensions. The data
field is a PartList, which is a vault type, meaning that it has interior, thread-safe mutability. This means that
our class must be a vault type as well.
Here's our class:
vault class Matrix {
m: int; // Number of rows
n: int; // Number of columns
data: PartList<real>; // Flattened array of elements
}
Let's add a constructor to our class that initializes the matrix. To initialize a PartList, we need to call the
PartList::of function, which takes a size and an initialization function of type (int) -> T, where the int
argument is an index and T is the type of the elements in the PartList.
Since our matrix type is supposed to be two-dimensional, let's say that its constructor must take an initialization
function of type (int, int) -> real, where the int arguments are the row and column indices.
Note: Due to a language limitation which may be lifted in the future, we cannot attach a suffix lambda to a constructor
call using the new keyword. To work around this, we create a static of method on the class that takes the same
arguments as the constructor, and call the constructor from inside that method. This will create an interface similar
to PartList::of.
Here's our Matrix class with the constructor added:
vault class Matrix {
m: int;
n: int;
data: PartList<real>;
// Constructor
private new(m: int, n: int, initFn: (int, int) -> real) {
this.m = m;
this.n = n;
this.data = PartList::of(m * n) |i| {
yield initFn(i / n, i % n);
};
}
// HACK: Currently can't pass suffix lambda to a constructor directly.
static fn of(m: int, n: int, initFn: (int, int) -> real) -> Matrix {
return new Matrix(m, n, initFn);
}
}
First, let's define a method to print a Matrix object to the console.
To access the elements of a PartList, we must use the PartList::runParted method. This method transparently
partitions the elements, then runs a task on those partitions in parallel. It takes four arguments:
int specifying the number of evenly-sized chunks to create, or
a ValueList<(int, int)> specifying the exact intervals comprising each partition.PartIntent, specifying which PartLists are involved in the computation, and whether they are being
read to or written from.PartViews reflecting the number of intents. Each
view corresponds to a slice of a PartList, and has reading or writing capabilities according to the intent. We can
subscript and assign to these views with square brackets like a normal array type.This is a lot of information to take in all at once, but it will hopefully become clearer when we look at some usages
of runParted.
Here's our print method. Note that it has to return a future because runParted is asynchronous:
fn print() -> Fut<void> {
return PartList::runParted(1, this.data.read()) |data|: // Line A
for i in 0..this.m {
for j in 0..this.n {
Out::print(data[i * this.n + j] + " ");
}
Out::println("");
}
}
On line A, we call runParted. First, we pass in a chunk specifier of 1, which means that we want to
create one partition (i.e., we are not parallelizing this operation at all). We also omit the ghost zone argument,
since we are not performing a stencil computation.
Next, we pass in a PartIntent that specifies that we want to read from the data field.
Then, we attach a suffix lambda specifying what to do with each partition. In this case, we simply iterate over the elements of the partition and print them to the console.
Finally, we return the future returned by runParted to satisfy the must-consume rule.
Let's make sure everything is working by adding a main class with an entry function and executing the program. Here's our code so far:
namespace program;
import gu4::Out;
vault class Matrix {
m: int; // num rows
n: int; // num cols
data: PartList<real>;
private new(m: int, n: int, initFn: (int, int) -> real) {
this.m = m;
this.n = n;
this.data = PartList::of(m * n) |i| {
yield initFn(i / n, i % n);
};
}
// HACK: Currently can't pass suffix lambda to a constructor directly.
static fn of(m: int, n: int, initFn: (int, int) -> real) -> Matrix {
return new Matrix(m, n, initFn);
}
fn print() -> Fut<void> {
return PartList::runParted(1, this.data.read()) |data|:
for i in 0..this.m {
for j in 0..this.n {
Out::print(data[i * this.n + j] + " ");
}
Out::println("");
}
}
}
value class Program {
async entry fn main() -> Fut<void> {
let a = Matrix::of(3, 3) |i, j| { yield 0.0 + i + j; }; // Add 0.0 to each element to coerce from int to real.
Out::println("Matrix A:");
a.print().then():
async return;
}
}
If we execute this, we should see the following output:
Matrix A:
0.0 1.0 2.0
1.0 2.0 3.0
2.0 3.0 4.0
Now that we have a working matrix type, let's write a method to add two matrices together. In this case, rather than modifying the first matrix in-place, we will return a new matrix containing the sum.
Here's the signature for our method:
async fn add(other: Matrix) -> Fut<Matrix> {
/* Implement me! */
}
First, we need to check that both matrices have the same dimensions. We can do this by checking that the m and n
fields of both matrices are equal:
if this.m != other.m || this.n != other.n {
Error::die("Mismatched matrix dimensions: " + this.m + "x" + this.n + " + " + other.m + "x" + other.n);
}
The Error::die function will terminate the program with an error message. We need to import Error; to use it.
Next, we need to create a new matrix to hold the result. We can do this by calling the Matrix::of method to obtain a
new Matrix, passing in the dimensions of the result and a function that initializes each element of the result
to zero:
let result = Matrix::of(this.m, this.n) |i, j| {
yield 0.0;
};
Let's also decide how many partitions we want to create. This is essentially the number of threads we want to use
to compute the result. For simplicity, we will take the larger of m and n:
let threads = Math::max(this.m, this.n);
We need to import Math; to use Math::max.
Now, we can use runParted to compute the result in parallel. We need to read from the two addends and write to the
result. We can do this by passing in three PartIntent objects:
PartList::runParted(threads, this.data.read(), other.data.read(), result.data.write()) |self, other, result| {
result[..] = self[..] + other[..];
}
The [..] syntax is part of a feature in Guardian called auto-vectorization. It's used to concisely express loops over
elements of subscriptable types. In this case, the .. with no indices means "all elements." The compiler will generate
a loop over all elements of the views, like this:
PartList::runParted(threads, this.data.read(), other.data.read(), result.data.write()) |self, other, result| {
for i in 0..result.writableSize() {
result[i] = self[i] + other[i];
}
}
Recall that runParted returns a future, and we will need to consume it. Since this future represents the completion
of all tasks initiated by runParted, its completion means our result is ready. So, we will simply return the result:
PartList::runParted(threads, this.data.read(), other.data.read(), result.data.write()) |self, other, result| {
result[..] = self[..] + other[..];
}.then():
async return result;
Here's the full program with the add function and an expanded main method:
namespace program;
import gu4::Out;
import gu4::Error;
import gu4::Math;
vault class Matrix {
m: int; // num rows
n: int; // num cols
data: PartList<real>;
private new(m: int, n: int, initFn: (int, int) -> real) {
this.m = m;
this.n = n;
this.data = PartList::of(m * n) |i| {
yield initFn(i / n, i % n);
};
}
// HACK: Currently can't pass suffix lambda to a constructor directly.
static fn of(m: int, n: int, initFn: (int, int) -> real) -> Matrix {
return new Matrix(m, n, initFn);
}
async fn add(other: Matrix) -> Fut<Matrix> {
if this.m != other.m || this.n != other.n {
Error::die("Mismatched matrix dimensions: " + this.m + "x" + this.n + " + " + other.m + "x" + other.n);
}
let result = Matrix::of(this.m, this.n) |i, j| {
yield 0.0;
};
let threads = Math::max(this.m, this.n);
PartList::runParted(threads, this.data.read(), other.data.read(), result.data.write()) |self, other, result| {
result[..] = self[..] + other[..];
}.then():
async return result;
}
fn print() -> Fut<void> {
return PartList::runParted(1, this.data.read()) |data|:
for i in 0..this.m {
for j in 0..this.n {
Out::print(data[i * this.n + j] + " ");
}
Out::println("");
}
}
}
value class Program {
async entry fn main() -> Fut<void> {
let a = Matrix::of(3, 3) |i, j| { yield 0.0 + i + j; };
let b = Matrix::of(3, 3) |i, j| { yield 0.0 + i - j; };
a.add(b).then() |c|:
Out::println("Matrix A:");
a.print().then():
Out::println("Matrix B:");
b.print().then():
Out::println("Matrix A + B:");
c.print().then():
async return;
}
}
Here's what we expect to see when we run the program:
Matrix A:
0.0 1.0 2.0
1.0 2.0 3.0
2.0 3.0 4.0
Matrix B:
0.0 -1.0 -2.0
1.0 0.0 -1.0
2.0 1.0 0.0
Matrix A + B:
0.0 0.0 0.0
2.0 2.0 2.0
4.0 4.0 4.0
All done!
In the next section, we'll look at how to use PartList to implement a more
complicated parallel computation.