Topics:
Fixed Grid Performance, Revisited

How is the fixed grid code scaling? We can base our assessment on a few metrics. For starters, if we look at this image from a FLASH paper, we see that their PARAMESH implementation gets them nearly linear fixed-grid scaling up through 64 processors, with degraded efficiency after that. (They also include the AMR scaling in the figure; I especially like how they stop one plot point before the place where the AMR and fixed-grid lines probably would cross.) We have examined the following to address scaling:
(The processor counts were chosen to be perfect squares because the domain was square.) If we then recreate the "Fig. 9" above with our data, we get the following. We can see a couple of things:
To illustrate the first point, we can scale each problem size's runtimes by the single-processor runtime. This gives us the following (the colors are the same):
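This scaling is just a per-problem-size normalization by the single-processor runtime. A minimal sketch of that bookkeeping, with made-up runtimes standing in for our measured values:

```python
import numpy as np

procs = np.array([1, 4, 16, 64])  # square processor counts for a square domain

# Placeholder wall-clock times (seconds); the real numbers come from our runs.
runtimes = {
    "256^2": np.array([120.0, 35.0, 11.0, 4.5]),
    "2048^2": np.array([9000.0, 2400.0, 650.0, 190.0]),
}

# Scale each problem size's runtimes by its single-processor runtime,
# so every curve starts at 1.0 and the curves are directly comparable.
scaled = {ps: t / t[0] for ps, t in runtimes.items()}

# Parallel efficiency: actual speedup over ideal speedup = t1 / (np * t_np).
efficiency = {ps: t[0] / (procs * t) for ps, t in runtimes.items()}
```

With numbers like these, the larger problem keeps a noticeably higher efficiency at 64 processors, which is the "larger problem per node" effect discussed below.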
This reinforces the idea (similar to Andy's comment about the LLNL code being "efficient above 32³") that the larger the problem per node, the better. So how is the code scaling? This isn't a straightforward question to answer, due to the low number of data points. If we go ahead and fit the data with a power law, the time t depends on the number of processors np through an exponent k that itself depends on the problem size ps:

t ∝ np^(−1/k)

For linear scaling, k_linear = 1; poor scaling is k > 1. NOTE: I've included the single-processor data, because otherwise the trends for ≥256² don't make sense: they give k < 1, which would be "better than perfect," which we know we aren't. The fits result in an average value of k = 1.17. This should be taken with a large grain of salt, due to the questionable trustworthiness of the fits, illustrated by the figure below.
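The fit itself is just a straight line in log-log space. A sketch under the assumed form t ∝ np^(−1/k), using placeholder runtimes rather than our actual data:

```python
import numpy as np

# Placeholder strong-scaling runtimes (seconds) for one problem size;
# the real values come from our benchmark runs.
procs = np.array([1.0, 4.0, 16.0, 64.0])
times = np.array([9000.0, 2600.0, 800.0, 260.0])

# Fit t = A * np^(-1/k) by least squares in log-log space:
#   log t = log A - (1/k) * log np
slope, log_A = np.polyfit(np.log(procs), np.log(times), 1)
k = -1.0 / slope  # k = 1 is linear scaling; k > 1 is worse than linear

# How far above the ideal t1/np curve are we at 64 processors?
t_ideal = times[0] / procs
factor_above_linear = times[-1] / t_ideal[-1]
```

These placeholder numbers happen to give k ≈ 1.17, comparable to the average exponent quoted above; a proper treatment would also propagate the (large) fit uncertainties.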
The very large uncertainties mean that we really can't say anything more specific at present than "we're doing worse than linear." Finally, for comparison's sake, we can look again at the FLASH figure above and overplot our data. So, we're not doing too badly in fixed grid: at 64 processors, for the 2048² problem we're about a factor of 2 above linear, and a factor of ~1.4 above FLASH. This problem was hydro with cooling.
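As a sanity check on the "factor of 2 above linear" figure: under the assumed fit form t ∝ np^(−1/k), the slowdown relative to ideal linear scaling at np processors is np^(1 − 1/k):

```python
# Slowdown vs. ideal linear scaling implied by an exponent of k = 1.17
# (the average fitted value quoted above) at 64 processors.
k = 1.17
n_procs = 64
factor_above_linear = n_procs ** (1 - 1 / k)  # roughly 1.8, i.e. "about a factor of 2"
```

This is consistent with reading the 2048² curve directly off the overplotted figure.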