At
Interactive Supercomputing, our mission is to improve productivity in
high performance technical computing. Of course, “productivity” means
very different things to different people, depending on the nature of
the project, the model or simulation, available hardware and
programming resources, available time, capital, etc. What I’d like to
do here is outline some common scenarios and definitions we have seen
from our customer base, and solicit your feedback on how you define
productivity.
Although each situation is unique, as we see more and
more customer and prospect scenarios, the goals they seek seem to
fall naturally into one of roughly five categories:
1. “Breakthrough
acceleration at minimal effort.” These applications run too slowly on a
desktop, and faster computation – say, 10X faster – would enable a
breakthrough. In one application we’ve seen, it took 45 minutes to
analyze an MRI scan of a brain. Cutting this down to just 5 minutes can
ease some of the patient’s anxiety while waiting for a diagnosis, and is
also fast enough to allow a repeat scan, if needed, while the patient
is still at the MRI exam. And in a very different application –
financial portfolio optimization – we saw that accelerating the
rebalancing of a portfolio enabled many more portfolios to be optimized
in time for the next day’s trading.
2. “Working with bigger data sets.”
Similar to the “breakthrough acceleration” scenario above, the goal
here is to run a larger model, one that may not fit on a
desktop. Even with 64-bit memory addressing, the required memory
footprint may exceed what a top-of-the-line workstation offers (today
perhaps 8-16 GB of RAM), and the distributed memory of a
cluster may be a practical alternative.
3. “Time-to-solution” on an
important project. The project may be on a critical path, so
compressing the calendar time is key. Because application development
often makes up the vast majority of such projects (sometimes more than 75% of
the calendar time), compressing the programming workflow has enormous
leverage on the project goals.
4. Time and effort required for
“algorithmic exploration.” This is an interesting one. We see cases
where a customer is unsure of the final algorithm, but wants to
prototype something quickly, run the model at full scale, and then
experiment with the algorithms and data sets. Roughly speaking, when
computations take too long, there is no time to interact with the
problem; at high performance speeds, however, interactivity becomes
possible not just with the data itself, but even with the approaches to
modeling the data.
5. “X Flops at Y hardware efficiency.” Some codes
are built to be run nearly continuously for years (even decades) – for
many different iterations, new input parameters, etc. We’ve heard of
codes at national labs that would literally take over 10,000 years to
run on a serial computer. They must therefore be run in parallel, on
large machines, with high efficiency.
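Scenario 2 can be made concrete with a back-of-envelope sizing. The sketch below asks how large a dense double-precision matrix fits in a given amount of memory. It assumes 8 bytes per element, no overhead for temporaries, and treats "GB" as GiB; the 32-node, 16 GB-per-node cluster is purely hypothetical.

```python
import math

BYTES_PER_DOUBLE = 8      # 64-bit floating point
GIB = 1024 ** 3           # treat "GB" as GiB for this sketch


def dense_matrix_bytes(n: int) -> int:
    """Bytes needed to store an n-by-n matrix of 64-bit floats (no overhead)."""
    return BYTES_PER_DOUBLE * n * n


def largest_square_matrix(total_bytes: int) -> int:
    """Largest n such that an n-by-n matrix of doubles fits in total_bytes."""
    return math.isqrt(total_bytes // BYTES_PER_DOUBLE)


workstation_bytes = 16 * GIB        # upper end of the 8-16 GB range in the text
cluster_bytes = 32 * 16 * GIB       # hypothetical cluster: 32 nodes x 16 GB each

for label, mem in [("workstation", workstation_bytes), ("cluster", cluster_bytes)]:
    n = largest_square_matrix(mem)
    used = dense_matrix_bytes(n) / GIB
    print(f"{label}: largest n = {n:,} ({used:.1f} GiB)")
```

Because storage grows with the square of n, 32 times the memory buys only about 5.7 times larger n in this sketch (46,340 versus 262,144), and real codes need room for temporaries, so treat these figures as upper bounds.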
So, depending on the nature of
the project, available resources and constraints, scientists and
engineers are turning to high performance computers with different
goals in mind, and thus different expectations and definitions of
productivity.
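The arithmetic behind scenario 5 can be sketched as follows. This is an illustrative model, not a description of any particular lab code: it uses parallel efficiency E = speedup / P (a stand-in for the "hardware efficiency" above), so a run that would take T_serial on one processor takes about T_serial / (P × E) on P processors. The 100,000-core machine size is hypothetical.

```python
DAYS_PER_YEAR = 365.25


def parallel_wall_days(serial_years: float, cores: int, efficiency: float) -> float:
    """Estimated wall-clock days on `cores` processors at the given parallel efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    speedup = cores * efficiency          # E = speedup / P  =>  speedup = P * E
    return serial_years * DAYS_PER_YEAR / speedup


serial_years = 10_000      # the "over 10,000 years" serial figure from the text
cores = 100_000            # hypothetical machine size

for eff in (0.9, 0.5, 0.3):
    days = parallel_wall_days(serial_years, cores, eff)
    print(f"efficiency {eff:.0%}: about {days:.1f} days per run")
```

Under these assumptions, dropping from 90% to 30% efficiency turns roughly 41 days per run into roughly 122 days, which is why codes that run for years justify heavy tuning effort.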
Defining Productivity
Conceptually, one way to define productivity is as follows:
Productivity = (application performance) / (application programming effort)
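As a rough illustration, the ratio can be applied to the speedup-versus-coding-time options posed in the questions at the end of this post. This sketch assumes performance is measured as speedup over the desktop, effort is counted in working days, and a week is 5 working days; other units would change the numbers.

```python
WORKING_DAYS_PER_WEEK = 5   # assumption: effort counted in working days


def productivity(speedup: float, effort_days: float) -> float:
    """Speedup over the desktop gained per working day of programming effort."""
    if effort_days <= 0:
        raise ValueError("effort must be positive")
    return speedup / effort_days


options = {
    "5X after 1 day": (5, 1),
    "25X after 1 week": (25, 1 * WORKING_DAYS_PER_WEEK),
    "125X after 3 weeks": (125, 3 * WORKING_DAYS_PER_WEEK),
}

for label, (speedup, days) in options.items():
    print(f"{label}: {productivity(speedup, days):.2f}X per day of effort")
```

With these assumptions the first two options tie at 5X per day, which suggests the ratio alone does not capture the value of having a usable result after one day.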
Qualitatively,
for a given programming approach, the more effort is expended, the more
performance can be achieved. The exact shape of this relationship, of course,
depends on the approach:
1. MPI: MPI programming typically involves
long development periods, with discrete jumps in application performance
as new revisions are completed. It can take months or even years
before the first revision of a complex MPI application is complete, and
weeks or months more to tune its performance. That said,
MPI codes typically achieve good performance, arguably at a high
development cost in both time and effort.
2. Desktop
Tools: On the other hand, very high level languages (such as Python
or MATLAB® from The MathWorks) are ideally suited for rapid
application development and interactive algorithm refinement, and
many performance-improvement techniques (memory pre-allocation,
vectorization, etc.) can be applied relatively quickly, often within hours
or days. But on the desktop these tools increasingly run out of steam,
driving scientists and analysts toward parallel servers and
clusters.
3. Star-P: This brings us to the new parallel programming
model enabled by software such as ours. The idea is to use highly
expressive tools – such as MATLAB®, Python and R – and, with a handful
of language extensions, execute the simulations on parallel servers and
clusters. This rapidly delivers a good fraction of a parallel
computer’s capability, in terms of both speedup and the ability to
work with larger data sets. But it comes at a price: the first
iteration, although ready months ahead of a corresponding MPI
implementation, may not perform as efficiently. In some cases that may
not be critical, if users can quickly get the advantages of the
parallel processors and large memory of an HPC system. Where efficiency
does matter, serial and parallel performance can be optimized with additional
effort, approaching MPI efficiencies. In fact, if it’s
sufficiently important, efficient MPI codes can be plugged in via the
SDK.
So, in each case, there is some trade-off between effort and
achievable performance, although the three approaches may have
qualitatively different curves.

Questions for YOU...
So with all that said, here are some questions we would love to hear your thoughts on:
• How do you define and measure performance? Speed of calculation?
Ability to run a model you previously could not? Something else?
• How do you define productivity in high performance technical computing?
• What might be a reasonable trade-off between performance and human
effort? For example, which would you prefer, and for what kinds of
projects:
1. The ability to run 5X faster than the desktop after 1 day of coding
2. The ability to run 25X faster than the desktop after 1 week of coding
3. The ability to run 125X faster than the desktop after 3 weeks of coding

• Similarly, if you approach this from the perspective of peak
hardware efficiency, what might be a reasonable trade-off between
performance and human effort? Again, which would you prefer, and for
what kinds of projects:
1. The ability to run at 90% hardware efficiency after 3 months of coding
2. The ability to run at 50% hardware efficiency after 3 weeks of coding
3. The ability to run at 30% hardware efficiency after 3 days of coding

• What are the key determinants in placing a project or target goal along the curve?
- Parallel programming experience of scientist/engineer/team?
- Size of programming team?
- Importance of accelerating time-to-solution?
- Phase of the project (early algorithm exploration versus later production runs)?
- Importance of interactive algorithm development?
- Memory required for the computation?
- Available hardware resources?
- Hardware efficiency / utilization?
- Something else?
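One way to think about the hardware-efficiency options above is total time-to-solution: coding time plus run time. The sketch below is a simple model under stated assumptions, not a measured result: a month is 30 calendar days, a week is 7, and run time equals a hypothetical `ideal_compute_days` (the wall-clock time the job would need at 100% efficiency) divided by the efficiency.

```python
OPTIONS = {
    "90% after 3 months": (0.90, 3 * 30),   # (efficiency, coding days)
    "50% after 3 weeks": (0.50, 3 * 7),
    "30% after 3 days": (0.30, 3),
}


def time_to_solution(efficiency: float, coding_days: float, ideal_compute_days: float) -> float:
    """Calendar days from the start of coding until the computation finishes."""
    return coding_days + ideal_compute_days / efficiency


def best_option(ideal_compute_days: float) -> str:
    """Label of the option with the shortest total time-to-solution."""
    return min(
        OPTIONS,
        key=lambda label: time_to_solution(*OPTIONS[label], ideal_compute_days),
    )


for work in (1, 30, 365):
    totals = {label: time_to_solution(eff, code, work) for label, (eff, code) in OPTIONS.items()}
    summary = ", ".join(f"{label}: {days:.0f} d" for label, days in totals.items())
    print(f"ideal compute {work} d -> {summary}; best: {best_option(work)}")
```

Under these assumptions the 3-day option wins for short jobs, the 3-week option takes over once the ideal compute time passes 13.5 days, and the 3-month option wins only beyond about 78 days of ideal compute, which is one way to see how the size and lifetime of a computation place a project along the curve.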