Tour: Parallel Fibonacci

Generating the Fibonacci sequence is a classic toy problem in parallel programming. For the uninitiated, the (n)th number in the Fibonacci sequence fib(n) is defined as:

fib(n) = fib(n-1) + fib(n-2)
fib(0) = 0
fib(1) = 1

This generates the following sequence: 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 ...

The most efficient way to generate this sequence (aside from the analytical solution) is an iterative algorithm, but we will use the recursive formulation instead specifically for its inefficiency.

Fib in Serial

Let's start with a naïve serial implementation in Guardian.

namespace program;

import gu4::Out;

value class Fib {
    static fn fib(n: int) -> int {
        if n < 2 {
            return n;
        }

        return fib(n - 1) + fib(n - 2);
    }

    entry fn main() -> void {
        let first = 0;
        let last  = 20;

        for i in first..=last {    // Line A
            let f = fib(i);
            Out::println("fib(" + i  + ") = " + f);
        }
    }
}

This code should be easy to understand. Note that on line A, the equals sign in the range first..=last means that the range is inclusive of both the lower and upper bounds. In other words, we are iterating over the closed interval [first, last]. Without the equals sign, we would be iterating over the half-open interval [first, last).

Let's run it:

fib(0) = 0
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765

Fib in Parallel

So, how do we parallelize this code? An easy target is the final line of the fib function: return fib(n - 1) + fib(n - 2);. At this point, the execution tree bifurcates:

                  fib(8)
              /            \
          fib(7)          fib(6)
         /      \        /      \
    fib(6)     fib(5)  fib(5)  fib(4)
   /     \     /    \   /    \  /    \
fib(5) fib(4) ...  ... ... ... ... ...

Each call to fib(n) spawns two more calls, leading to an exponential number of function calls. This makes it a good candidate for parallelization since each subtree can potentially be computed independently. In serial, for any given node, the program must first walk the left subtree, then the right subtree, and then combine the results. In parallel, we can submit the right subtree to be computed in another thread while the current thread visits the left subtree. Of course, since each subtree (except those for which n < 2) implies an asynchronous computation, any call to fib(n) needs to return a Fut.

Let's start by adding a new function, poolFib. This function will call fib in another thread by use of Pool::run:

static fn poolFib(n: int) -> Fut<int> {
   let result = new Fut<int>();

    Pool::run() {
        fib(n).then() |f|:    // fib(n) doesn't return a Fut yet, but let's pretend it does...
        result.complete(f);
    }

    return result;
}

This code exhibits a common pattern: we create a new Fut, submit a task to the pool, and then return the Fut. The task we submit to the pool will asynchronously compute the result of fib(n) and complete the Fut with the result. The overall result is a function that returns a Fut<int> immediately, then asynchronously completes the Fut with the result at a later time.

Because this pattern is so common, Guardian has a mechanism to reduce the boilerplate associated with constructing, completing, and returning Futs: the async keyword. If we declare a function as async, then that function will implicitly construct and return a Fut of the appropriate type. This Fut is known as the implicit future of the async function. To complete the implicit future, we use async return instead of return. This will work anywhere in the body of the function, even inside a lambda.

Here's poolFib rewritten as an async function:

async static fn poolFib(n: int) -> Fut<int> {
    Pool::run():
    fib(n).then() |f|:
    async return f;
}

We no longer need to write or think about the ceremony of returning our result. We can instead focus entirely on expressing the computation itself.

Of course, now we need to modify the fib function to return a Fut<int> instead of an int. Let's begin by making it async and replacing return with async return:

async static fn fib(n: int) -> Fut<int> {
    if n < 2 {
        async return n;
    }

    async return fib(n - 1) + fib(n - 2);
}

Here's our current program:

namespace program;

import gu4::Out;

value class Fib {
    async static fn poolFib(n: int) -> Fut<int> {
        Pool::run():
        fib(n).then() |f|:
        async return f;
    }

    async static fn fib(n: int) -> Fut<int> {
        if n < 2 {
            async return n;
        }

        async return fib(n - 1) + fib(n - 2);
    }

    entry fn main() -> void {
        let first = 0;
        let last  = 20;

        for i in first..=last {
            let f = fib(i);
            Out::println("fib(" + i  + ") = " + f);
        }
    }
}

Let's try to compile it:

[Type Inference Pass] ERROR IN scratch/Program.gu4: Cannot perform operation '+' on types value class _::Fut<int> and value class _::Fut<int>.
See near [17:21]

async return fib(n - 1) + fib(n - 2);
             ^^^^^^^^^^^^^^^^^^^^^^^

We got a type error! Now that fib returns a future, we can no longer directly add the two results. We know that we can use .then() to extract the result of a future, so let's try that:

async static fn fib(n: int) -> Fut<int> {
    if n < 2 {
        async return n;
    }
    
    // Important that we get both futures first! If we call fib(n - 2) after a .then(), then that computation
    // will wait until fib(n - 1) completes, which is unnecessary.
    let f1 = fib(n - 1);
    let f2 = fib(n - 2);

    f1.then() |f1|:
    f2.then() |f2|:
        
    async return f1 + f2;
}

Let's try to compile the program with this change:

[Type Inference Pass] ERROR IN scratch/Program.gu4: Future returned by the call to '_::Fut::then()' is discarded.
Hint: You may want to pass this future to a function, call .then(), or assign with `let`.
See near [20:8]

f1.then() |f1|:
^^^^^^^^^^^^^^^

We have another error. The words "future is discarded" let us know that we have run afoul of the must-consume rule. Even though the second continuation f2.then() |f2|: contains an async return, the first continuation f1.then() |f1|: does not. While this limitation may be relaxed in a future version of Guardian, there's a better way to express this computation anyway: Fut::all. This function takes an array of futures and returns a future that completes when every provided future completes, giving us access to the results of all the futures. Let's try it:

async static fn fib(n: int) -> Fut<int> {
    if n < 2 {
        async return n;
    }

    let f1 = fib(n - 1);
    let f2 = fib(n - 2);

    Fut::all([f1, f2]).then() |fs|:

    async return fs[0] + fs[1];
}

This now compiles! Finally, we need to replace one of the calls to fib with poolFib so that we actually execute in parallel:

async static fn fib(n: int) -> Fut<int> {
    if n < 2 {
        async return n;
    }

    let f1 = poolFib(n - 1);
    let f2 = fib(n - 2);

    Fut::all([f1, f2]).then() |fs|:

    async return fs[0] + fs[1];
}

Here's the whole program (we've also replaced fib with poolFib in the main method):

namespace program;

import gu4::Out;

value class Fib {
    async static fn poolFib(n: int) -> Fut<int> {
        Pool::run():
        fib(n).then() |f|:
        async return f;
    }

    async static fn fib(n: int) -> Fut<int> {
        if n < 2 {
            async return n;
        }

        let f1 = poolFib(n - 1);
        let f2 = fib(n - 2);

        Fut::all([f1, f2]).then() |fs|:

        async return fs[0] + fs[1];
    }

    entry fn main() -> void {
        let first = 0;
        let last  = 20;

        for i in first..=last {
            let f = poolFib(i);
            Out::println("fib(" + i  + ") = " + f);
        }
    }
}

Can you spot the bug? Let's run it:

fib(0) = java.util.concurrent.CompletableFuture@75b84c92[Completed normally]
fib(1) = java.util.concurrent.CompletableFuture@232204a1[Completed normally]
fib(2) = java.util.concurrent.CompletableFuture@5a07e868[Completed normally]
fib(3) = java.util.concurrent.CompletableFuture@5acf9800[Not completed]
fib(4) = java.util.concurrent.CompletableFuture@5ca881b5[Not completed]
fib(5) = java.util.concurrent.CompletableFuture@372f7a8d[Not completed]
fib(6) = java.util.concurrent.CompletableFuture@2f92e0f4[Not completed]
fib(7) = java.util.concurrent.CompletableFuture@28a418fc[Not completed]
fib(8) = java.util.concurrent.CompletableFuture@5305068a[Not completed]
fib(9) = java.util.concurrent.CompletableFuture@1f32e575[Not completed]
fib(10) = java.util.concurrent.CompletableFuture@279f2327[Not completed]
fib(11) = java.util.concurrent.CompletableFuture@2ff4acd0[Not completed]
fib(12) = java.util.concurrent.CompletableFuture@54bedef2[Not completed]
fib(13) = java.util.concurrent.CompletableFuture@5caf905d[Not completed]
fib(14) = java.util.concurrent.CompletableFuture@27716f4[Not completed]
fib(15) = java.util.concurrent.CompletableFuture@8efb846[Not completed]
fib(16) = java.util.concurrent.CompletableFuture@2a84aee7[Not completed]
fib(17) = java.util.concurrent.CompletableFuture@a09ee92[Not completed]
fib(18) = java.util.concurrent.CompletableFuture@30f39991[Not completed]
fib(19) = java.util.concurrent.CompletableFuture@452b3a41[Not completed]
fib(20) = java.util.concurrent.CompletableFuture@4a574795[Not completed]

Well, that doesn't look right. In the main method, we forgot to extract the value from the Fut returned by calling poolFib; the must-consume rule doesn't catch us because, technically, we've passed it to a function (Out::println).

Let's extract the result with then:

entry fn main() -> void {
    let first = 0;
    let last  = 20;

    for i in first..=last {
        poolFib(i).then() |f|:
        Out::println("fib(" + i  + ") = " + f);
    }
}

Can you spot where the compiler will complain? Let's compile it:

[Type Inference Pass] ERROR IN scratch/Program.gu4: Future returned by the call to '_::Fut::then()' is discarded.
Hint: You may want to pass this future to a function, call .then(), or assign with `let`.
See near [30:12]

poolFib(i).then() |f|:
^^^^^^^^^^^^^^^^^^

When we continue a future with then(), that also generates a future. In order for the program to have a sound termination condition, we need to consume this future as well. With the tools we've learned about so far, there isn't an obvious way to do that.

Let's zoom out a bit. What should be the termination condition for the program? We want to see the result of each successive call to poolFib, we want to see them in order, and we want the program to terminate as soon as the ultimate result is available.

We could use a latch to count the number of results we've seen, but there's actually a better way. If you look at the way main is structured, you'll notice that the for loop fires off a sequence of tasks. We want the program to finish not just when the for loop finishes, but when the last task enqueued by that loop finishes. For this situation, Guardian provides two special functions: Loop::marchingFor and Loop::parFor. These functions work similarly to a for loop, but the loop body is given by a suffix lambda which yields a Fut for each iteration. Once all these Futs have completed, the Fut returned by the call to Loop::marchingFor or Loop::parFor will complete.

The difference between Loop::marchingFor and Loop::parFor is that each iteration of marchingFor waits for the previous iteration to complete before starting the next iteration. In parFor, all iterations are allowed to run concurrently. In our case, since we want to see the results in order, we can use marchingFor.

Let's start by just performing a 1:1 swap, replacing the for loop with a call to marchingFor:

entry fn main() -> void {
    let first = 0;
    let last  = 20;

    Loop::marchingFor(first, last + 1) |i| {
        poolFib(i).then() |f|:
        Out::println("fib(" + i  + ") = " + f);
    }
}

The first two arguments to marchingFor are the lower and upper bounds of the half-open interval to iterate over. Since we are coming from a closed range constructed from the ..= expression, we need to add 1 to the upper bound. The suffix lambda takes the iteration number as an argument and yields a Fut for that iteration. We haven't handled that last part yet, so let's yield the Fut returned by the poolFib(i).then() call:

entry fn main() -> void {
    let first = 0;
    let last  = 20;

    Loop::marchingFor(first, last + 1) |i| {
        yield poolFib(i).then() |f|:
        Out::println("fib(" + i  + ") = " + f);
    }
}

Let's compile:

[Type Inference Pass] ERROR IN scratch/Program.gu4: Future returned by the call to '_::Loop::marchingFor()' is discarded.
Hint: You may want to pass this future to a function, call .then(), or assign with `let`.
See near [29:8]

Loop::marchingFor(first, last + 1) |i| {
^

Almost there. Remember that marchingFor returns a Fut, and so we need to consume it. In this case, since the completion of this future is the termination condition for our program, we simply return it from main:

entry fn main() -> Fut<void> {
    let first = 0;
    let last  = 20;

    return Loop::marchingFor(first, last + 1) |i| {
        yield poolFib(i).then() |f|:
        Out::println("fib(" + i  + ") = " + f);
    };
}

Here's the whole program:

namespace program;

import gu4::Out;

value class Fib {
    async static fn poolFib(n: int) -> Fut<int> {
        Pool::run():
        fib(n).then() |f|:
        async return f;
    }

    async static fn fib(n: int) -> Fut<int> {
        if n < 2 {
            async return n;
        }

        let f1 = poolFib(n - 1);
        let f2 = fib(n - 2);

        Fut::all([f1, f2]).then() |fs|:

        async return fs[0] + fs[1];
    }

    entry fn main() -> Fut<void> {
        let first = 0;
        let last  = 20;

        return Loop::marchingFor(first, last + 1) |i| {
            yield poolFib(i).then() |f|:
            Out::println("fib(" + i  + ") = " + f);
        };
    }
}

Let's run it:

fib(0) = 0
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765

Success! Feel free to adjust the first and last values to see how the program behaves.

In the next section, we'll look at how to use PartList to implement parallel matrix addition.

Tour: Parallel Fibonacci

Fib in Serial

Fib in Parallel

Next