Topics:
Fixed Grid Performance, Revisited

How is the fixed grid code scaling? We can base our assessment on a few metrics. For starters, if we look at this image from a FLASH paper, we see that their PARAMESH implementation gets them nearly linear fixed-grid scaling up through 64 processors, with degraded efficiency after that. (They also include the AMR scaling in the figure; I especially like how they stop one plot point before the place where the AMR and fixed-grid lines probably would cross.) We have examined the following to address scaling:
(The processor counts were chosen to be perfect squares because the domain was square.) If we then recreate the "Fig. 9" above with our data, we get the following. We can see a couple of things:
To illustrate the first point, we can scale each problem size's runtimes by the single-processor runtime. This gives us the following (the colors are the same):
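This scaling is just a per-problem-size normalization by the single-processor runtime. A minimal sketch of that bookkeeping, with made-up runtimes standing in for our measured values:

```python
import numpy as np

procs = np.array([1, 4, 16, 64])  # square processor counts for a square domain

# Placeholder wall-clock times (seconds); the real numbers come from our runs.
runtimes = {
    "256^2": np.array([120.0, 35.0, 11.0, 4.5]),
    "2048^2": np.array([9000.0, 2400.0, 650.0, 190.0]),
}

# Scale each problem size's runtimes by its single-processor runtime,
# so every curve starts at 1.0 and the curves are directly comparable.
scaled = {ps: t / t[0] for ps, t in runtimes.items()}

# Parallel efficiency: actual speedup over ideal speedup = t1 / (np * t_np).
efficiency = {ps: t[0] / (procs * t) for ps, t in runtimes.items()}
```

With numbers like these, the larger problem keeps a noticeably higher efficiency at 64 processors, which is the "larger problem per node" effect discussed below.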
This reinforces the idea (similar to Andy's comment about the LLNL code being "efficient above 32³") that the larger the problem per node, the better. So how is the code scaling? This isn't a straightforward question to answer, due to the low number of data points. If we go ahead and fit the data with a power law, the time t depends on the number of processors np through an exponent k that itself depends on the problem size ps:

t ∝ np^(−1/k)

For linear scaling, k_linear = 1; poor scaling is k > 1. NOTE: I've included the single-processor data, because otherwise the trends for ≥256² don't make sense: they give k < 1, which would be "better than perfect," which we know we aren't. The fits result in an average value of k = 1.17. This should be taken with a large grain of salt, due to the questionable trustworthiness of the fits, illustrated by the figure below.
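The fit itself is just a straight line in log-log space. A sketch under the assumed form t ∝ np^(−1/k), using placeholder runtimes rather than our actual data:

```python
import numpy as np

# Placeholder strong-scaling runtimes (seconds) for one problem size;
# the real values come from our benchmark runs.
procs = np.array([1.0, 4.0, 16.0, 64.0])
times = np.array([9000.0, 2600.0, 800.0, 260.0])

# Fit t = A * np^(-1/k) by least squares in log-log space:
#   log t = log A - (1/k) * log np
slope, log_A = np.polyfit(np.log(procs), np.log(times), 1)
k = -1.0 / slope  # k = 1 is linear scaling; k > 1 is worse than linear

# How far above the ideal t1/np curve are we at 64 processors?
t_ideal = times[0] / procs
factor_above_linear = times[-1] / t_ideal[-1]
```

These placeholder numbers happen to give k ≈ 1.17, comparable to the average exponent quoted above; a proper treatment would also propagate the (large) fit uncertainties.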
The very large uncertainties mean that we really can't say anything more specific at present than "we're doing worse than linear." Finally, for comparison's sake, we can look again at the FLASH figure above and overplot our data. So, we're not doing too badly in fixed grid: at 64 processors, for the 2048² problem we're about a factor of 2 above linear, and a factor of ~1.4 above FLASH. This problem was hydro with cooling.
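As a sanity check on the "factor of 2 above linear" figure: under the assumed fit form t ∝ np^(−1/k), the slowdown relative to ideal linear scaling at np processors is np^(1 − 1/k):

```python
# Slowdown vs. ideal linear scaling implied by an exponent of k = 1.17
# (the average fitted value quoted above) at 64 processors.
k = 1.17
n_procs = 64
factor_above_linear = n_procs ** (1 - 1 / k)  # roughly 1.8, i.e. "about a factor of 2"
```

This is consistent with reading the 2048² curve directly off the overplotted figure.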