Benchmarking Using a 2D Finite Difference Code

Presented here is a 2D finite difference code that is pretty handy for testing and benchmarking clusters. There are some reasons why it is handy:
Overview

This code is a parallel (using MPI) implementation of 2D heat conduction, finite difference over a rectangular domain, using the following methods:
It should be noted that this is one of many types of communication schemes. This implementation is not meant to be better than any other for testing a cluster environment, but it is useful nonetheless.

Platforms

The code has been run on various platforms. As far as I know, it is standard C with minimal standard library dependencies. In other words, if there is a C compiler and an MPI library, then it should compile and run.

Getting and Compiling 2dheat

% svn co https://svn.loni.org/repos/2dheat/trunk 2dheat   # LONI login required

If that doesn't work, download it here, though this copy will most likely not be current.

Compiling is simple. It requires an MPI library and a C compiler. The example below uses the MPI compiler wrapper common to most Linux clusters (using MPICH, etc.). The math library is listed after the source file so that linkers which resolve libraries in order can find it:

% mpicc 2dheat.c -o 2dheat.x -lm

The file "./compile.sh" is set up to compile using "mpicc" or "mpcc_r" (AIX), whichever is found first.

Usage

These options have been designed with the thought of incorporating the executable into a script that iterates over various grid dimensions and numbers of processors to get an accurate picture of what is happening on the cluster being tested.

-h [1-9][\0-9]* : height of grid (in number of nodes); default is 50
-w [1-9][\0-9]* : width of grid (in number of nodes); default is 50
-m [123] : method; 1 = Jacobi, 2 = Gauss-Seidel, 3 = SOR
-e [\.1-9][\0-9]* : convergence criterion; default is 0.1
-t : output time to convergence only (in seconds); overridden by -v
-v : full verbose output: task assignment, L2 norm for each iteration, etc.

Example:

% mpirun -np 64 ./2dheat.x -h 1000 -w 1000 -t

This says run 2dheat on a 1000x1000 node grid and report the time in seconds at the end.

Implementation Details and Publications
Benchmarking

This code has been adapted to be used as a benchmarking tool. Command line arguments allow one to control the size of the domain (width and height), the solver used, the convergence criterion, as well as various levels of verbosity.

There are several tunable parameters that control certain aspects of the execution. This is important because without the ability to affect the width of the grid, this code becomes communication bound as the number of rows per process goes to 1. This means that at some point the amount of communication will dominate the time to solution. By controlling the width of the grid, one may increase the number of computations per row. The following highlights some parameters of control. To control:
#!/bin/sh
METHOD=3
EPSILON=3.0
HEIGHT="164 256 512 1024" # 1024 2048 4096"
WIDTH="1 2 4 8 16 32 64 128 256 512" # 1024 2048 4096"
# PROCS must start with 1 so that a serial baseline exists for the speed up
PROCS="1 2 4 8 16 32 64 96" # 64 128 256 512 1024"
echo "$METHOD $EPSILON"
echo "Widths $WIDTH"
echo "Heights $HEIGHT"
echo "Procs $PROCS"
echo
echo "    W     H   P        H/P        W/H         Time(s)          Spd Up            %Eff"
for w in ${WIDTH}; do
  for h in ${HEIGHT}; do
    serial=0
    for p in ${PROCS}; do
      # repeat each configuration five times to expose run-to-run variation
      for iteration in 1 2 3 4 5; do
        out=$(mpirun -np ${p} ../bin/2dheat.x -t -w ${w} -h ${h} -m ${METHOD} -e ${EPSILON})
        # capture serial time; the most recent 1-process run is the baseline
        if [ 1 -eq ${p} ]; then
          serial=$out
        fi
        # calculate speed up
        s=$(perl -e "print ($serial / $out)")
        # calculate efficiency (a fraction of 1, despite the %Eff label)
        e=$(perl -e "print ($s / $p)")
        # rows per cpu
        r=$(perl -e "print ($h / $p)")
        # NOTE: the header labels this column W/H, but the value is w / p
        c=$(perl -e "print ($w / $p)")
        printf "%5d %5d %3d %10.5f %10.5f %15.9f %15.9f %15.9f\n" $w $h $p $r $c $out $s $e
      done
    done
  done
done
The script prints one row per run, five rows per configuration, in a format that is easy to read and analyze:
Widths 1 2 4 8 16 32 64 128 256 512
Heights 164 256 512 1024
Procs 1 2 4 8 16 32 64 96
    W     H   P        H/P        W/H         Time(s)          Spd Up            %Eff
    1   164   1  164.00000    1.00000     0.000067000     1.000000000     1.000000000
    1   164   1  164.00000    1.00000     0.000084000     1.000000000     1.000000000
    1   164   1  164.00000    1.00000     0.000076000     1.000000000     1.000000000
    1   164   1  164.00000    1.00000     0.000067000     1.000000000     1.000000000
    1   164   1  164.00000    1.00000     0.000067000     1.000000000     1.000000000
    1   164   2   82.00000    0.50000     0.000057000     1.175438596     0.587719298
    1   164   2   82.00000    0.50000     0.000055000     1.218181818     0.609090909
    1   164   2   82.00000    0.50000     0.000054000     1.240740741     0.620370370
    1   164   2   82.00000    0.50000     0.000052000     1.288461538     0.644230769
    1   164   2   82.00000    0.50000     0.000052000     1.288461538     0.644230769
    1   164   4   41.00000    0.25000     0.000077000     0.870129870     0.217532468
    1   164   4   41.00000    0.25000     0.000082000     0.817073171     0.204268293
    1   164   4   41.00000    0.25000     0.000072000     0.930555556     0.232638889
    1   164   4   41.00000    0.25000     0.000079000     0.848101266     0.212025316
    1   164   4   41.00000    0.25000     0.000079000     0.848101266     0.212025316
    1   164   8   20.50000    0.12500     0.000240000     0.279166667     0.034895833
    1   164   8   20.50000    0.12500     0.000225000     0.297777778     0.037222222
    1   164   8   20.50000    0.12500     0.000234000     0.286324786     0.035790598
    1   164   8   20.50000    0.12500     0.000219000     0.305936073     0.038242009
    1   164   8   20.50000    0.12500     0.000247000     0.271255061     0.033906883
    1   164  16   10.25000    0.06250     0.000321000     0.208722741     0.013045171
    1   164  16   10.25000    0.06250     0.000335000     0.200000000     0.012500000
    1   164  16   10.25000    0.06250     0.000314000     0.213375796     0.013335987
    1   164  16   10.25000    0.06250     0.000321000     0.208722741     0.013045171
    1   164  16   10.25000    0.06250     0.000331000     0.202416918     0.012651057
    1   164  32    5.12500    0.03125     0.001415000     0.047349823     0.001479682
    1   164  32    5.12500    0.03125     0.001426000     0.046984572     0.001468268
    1   164  32    5.12500    0.03125     0.001405000     0.047686833     0.001490214
    1   164  32    5.12500    0.03125     0.001428000     0.046918768     0.001466211
    1   164  32    5.12500    0.03125     0.001413000     0.047416844     0.001481776
    1   164  64    2.56250    0.01562     0.001055000     0.063507109     0.000992299
...
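Since every configuration is run five times, the raw output carries five rows per (W, H, P) combination. A minimal sketch of how those repeats might be collapsed into one line per configuration follows. It is not part of 2dheat: the function name summarize_bench is hypothetical, and it assumes the eight-column row format shown above.

```sh
#!/bin/sh
# summarize_bench: hypothetical helper, not part of 2dheat.
# Reads bench.sh output on stdin and prints, per (W, H, P) configuration:
#   W H P runs mean_time min_time
summarize_bench() {
    awk '
        # data rows have exactly 8 fields and start with an integer width;
        # this skips the Widths/Heights/Procs lines and the column header
        NF == 8 && $1 ~ /^[0-9]+$/ {
            key = $1 " " $2 " " $3
            t = $6 + 0
            if (!(key in runs)) {
                order[++count] = key   # remember first-seen order
                best[key] = t
            }
            runs[key]++
            total[key] += t
            if (t < best[key]) best[key] = t
        }
        END {
            for (i = 1; i <= count; i++) {
                k = order[i]
                printf "%s %d %.9f %.9f\n", k, runs[k], total[k] / runs[k], best[k]
            }
        }
    '
}
```

With the output saved to a file, "summarize_bench < bench.out" prints the run count, mean time, and minimum time for each configuration. For the five 2-process rows shown above, that line is "1 164 2 5 0.000054000 0.000052000". Reporting the minimum alongside the mean is a design choice: on a shared cluster the minimum is often less sensitive to interference from other jobs.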
There is a script in scripts/bench.sh. To use it, adjust the number of procs, the heights, and the widths. When running, one may use tee to view the output and record it simultaneously.
% ./bench.sh | tee bench.out

Platform Results

The following is a list of platforms and the results. The script used to run the results is included. In order for results submitted by others to be included, one must provide the result file and the script used to run the benchmark.

Future

Regarding this code, what combinations of number of processors, heights, and widths will yield a good measure or view of a cluster's capabilities remains an open question. Furthermore, how can this information be visualized to provide a picture of what is going on? One thought is to whip up a simple gnuplot script (or set of them) to generate a series of plots.

Also, there is nothing saying that the methods employed in this code have to solve a 2D heat conduction equation; i.e., arbitrary methods may be employed to create more computationally intense kernels. It would also be helpful to provide performance characteristics of the code as a way to compare various clusters and architectures.
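As a starting point for the gnuplot idea, here is a rough sketch that turns saved benchmark output into one speed up curve per grid size. Nothing in it ships with 2dheat: the function names speedup_series and plot_speedup and the output file names are hypothetical, the input is assumed to be in the eight-column format printed by the benchmark script, and gnuplot is assumed to be installed with PNG support.

```sh
#!/bin/sh
# Hypothetical helpers, not part of 2dheat.

# speedup_series W H: read bench.sh output on stdin and print
# "P mean_speedup" for every process count run at that width and height.
speedup_series() {
    awk -v w="$1" -v h="$2" '
        # keep only data rows (8 fields, integer width) for this grid size
        NF == 8 && $1 ~ /^[0-9]+$/ && $1 == w && $2 == h {
            if (!($3 in runs)) order[++count] = $3
            runs[$3]++
            total[$3] += $7     # column 7 is the speed up
        }
        END {
            for (i = 1; i <= count; i++) {
                p = order[i]
                printf "%d %.6f\n", p, total[p] / runs[p]
            }
        }
    '
}

# plot_speedup W H FILE: write a data file and render it with gnuplot.
plot_speedup() {
    data="speedup_w${1}_h${2}.dat"
    speedup_series "$1" "$2" < "$3" > "$data"
    gnuplot <<EOF
set terminal png
set output "speedup_w${1}_h${2}.png"
set xlabel "processes"
set ylabel "mean speed up"
set logscale x 2
plot "$data" using 1:2 with linespoints title "W=${1} H=${2}"
EOF
}
```

For example, "plot_speedup 1 164 bench.out" would write speedup_w1_h164.dat and speedup_w1_h164.png. Looping over the widths and heights used in the benchmark would give the series of plots mentioned above.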
Copyright © 2004-2009. All Rights Reserved. The statements and opinions included in these pages are those of me only. Any statements and opinions included in these pages are not those of Louisiana State University or the LSU Board of Supervisors.