Star-P Blog
    

Parallel Lounge: Parallel Computing Blog for Engineers, Scientists, Analysts


What is “Productivity” in High Performance Computing?


At Interactive Supercomputing, our mission is to improve productivity in high performance technical computing. Of course, “productivity” means very different things to different people, depending on the nature of the project, the model or simulation, available hardware and programming resources, available time, capital, etc. What I’d like to do here is outline some common scenarios and definitions we have seen from our customer base, and solicit your feedback on how you define productivity.

Although each situation is unique, as we see more customer and prospect scenarios, the goals they pursue seem to fall naturally into roughly five categories:

1. “Breakthrough acceleration at minimal effort.” These applications run too slowly on a desktop, and faster computation (say, 10X faster) would enable a breakthrough. In one application we’ve seen, it took 45 minutes to analyze an MRI scan of a brain. Cutting this down to just 5 minutes not only can alleviate some patient anxiety in waiting for a diagnosis, but also makes it possible to run another scan, if needed, while the patient is still in the MRI exam. And in a very different application, financial portfolio optimization, we saw that accelerating the rebalancing of a portfolio enabled many more portfolios to be optimized in time for the next day’s trading.

2. “Working with bigger data sets”: Similar to the “breakthrough acceleration” scenario above, the goal here is the ability to run a larger model, one that may not fit on a desktop. Even with 64-bit memory addressing, the required memory footprint may be too much for a top-of-the-line workstation (which today might top out at 8-16 GB of RAM), and the distributed memory of a cluster may be a practical alternative.

3. “Time-to-solution” on an important project. This can be on a critical path for a project, and compressing the calendar time is key. Because application development often makes up the vast majority of such projects (sometimes >75% of the calendar time), compressing the programming workflow has enormous leverage on the project goals.

4. Time and effort required for “algorithmic exploration.” This is an interesting one. We see cases where a customer is unsure of the final algorithm, but wants to be able to prototype something quickly, run the model at full scale, and then play with the algorithms and data sets. Roughly speaking, when computations take too long, there is no time to interact with the problem; at high performance speeds, interactivity becomes possible not just with the data itself, but even with the approaches to modeling the data.

5. “X Flops at Y hardware efficiency.” Some codes are built to be run nearly continuously for years (even decades), over many different iterations, new input parameters, etc. We’ve heard of codes at national labs that would literally take over 10,000 years to run on a serial computer. They must therefore be run in parallel, on large machines, with high efficiency.
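
The arithmetic behind scenarios 1, 2 and 5 can be sketched in a few lines of Python. This is only a back-of-envelope illustration: the function names, the dense double-precision matrix standing in for "a larger model", and the processor count and efficiency in the last example are assumptions made here, not figures from any customer application.

```python
def speedup(old_time, new_time):
    """Ratio of the old run time to the new one (same units for both)."""
    return old_time / new_time


def dense_matrix_gib(n, bytes_per_element=8):
    """Memory for an n x n dense matrix in GiB (8 bytes = one double)."""
    return n * n * bytes_per_element / 2**30


def parallel_years(serial_years, processors, efficiency):
    """Run time in years, assuming perfect load balance scaled by a
    parallel efficiency between 0 and 1."""
    return serial_years / (processors * efficiency)


# Scenario 1: the MRI analysis going from 45 minutes to 5 minutes.
mri_speedup = speedup(45, 5)                   # 9.0, close to the "10X" above

# Scenario 2: a hypothetical 50,000 x 50,000 matrix of doubles.
matrix_gib = dense_matrix_gib(50_000)          # about 18.6 GiB, above a 16 GB workstation

# Scenario 5: a 10,000-year serial code on an assumed 100,000 processors
# running at an assumed 50% efficiency.
years = parallel_years(10_000, 100_000, 0.5)   # 0.2 years, roughly ten weeks
```

The last line also shows why efficiency matters at that scale: halving the efficiency doubles the run time, so the same job would take about twenty weeks instead of ten.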

So, depending on the nature of the project, available resources and constraints, scientists and engineers are turning to high performance computers with different goals in mind, and thus different expectations and definitions of productivity.

Defining Productivity
Conceptually, one way to define productivity is as follows:
          Productivity = (application performance) / (application programming effort)
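
As a rough illustration of this ratio, the sketch below applies it to the three speedup options listed in the questions at the end of this post. The choice of units (speedup over the desktop as "performance", working days of coding as "effort") and the 5-day working week are assumptions made for the example, not a definition this post commits to.

```python
WORKING_DAYS_PER_WEEK = 5  # assumption: effort is counted in working days


def productivity(performance, effort):
    """Productivity = (application performance) / (application programming effort)."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return performance / effort


# (speedup over the desktop, coding effort in working days)
options = {
    "5X after 1 day": (5, 1),
    "25X after 1 week": (25, 1 * WORKING_DAYS_PER_WEEK),
    "125X after 3 weeks": (125, 3 * WORKING_DAYS_PER_WEEK),
}

scores = {name: productivity(perf, days) for name, (perf, days) in options.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}X per coding day")
```

Under these assumptions the first two options score the same (5X per coding day) and the third scores about 8.3X per coding day. A single ratio like this ignores whether the project can afford to wait three weeks, which is part of why the definition depends on the scenario.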

Qualitatively, for a given programming approach, the more effort is expended, the more performance can be achieved. The exact nature of this, of course, depends on the approach:

1. MPI: MPI programming typically involves long development periods, and discrete jumps in application performance as new revisions are completed. It can take months and even years before the first rev of a complex MPI application is completed. And it may take weeks or months to tune the performance further. That said, MPI codes typically achieve good performance, arguably at a high development cost (in terms of both time and effort).

2. Desktop Tools: On the other hand, very high level languages (such as Python, or MATLAB® from The MathWorks) are ideally suited for rapid application development and interactive algorithm refinement, and many performance improvement techniques (memory pre-allocation, vectorization, etc.) can be applied relatively quickly, often within hours or days. But on the desktop these tools increasingly run out of steam, driving scientists and analysts towards parallel servers and clusters.

3. Star-P: Which brings us to the new parallel programming model enabled by software such as ours. The idea is to use highly expressive tools (such as MATLAB®, Python and R) and, with a handful of language extensions, execute the simulations on parallel servers and clusters. This rapidly delivers a good fraction of a parallel computer’s capability, in terms of both speedup and the ability to work with larger data sets. But it comes at a price: the first iteration, although ready months ahead of a corresponding MPI implementation, may not perform as efficiently. In some cases that may not be critical, if users can quickly get the advantages of the parallel processors and big memory of an HPC system. And if it is important, serial and parallel performance can be optimized with additional effort, and can approach MPI efficiencies. In fact, if it’s sufficiently important, efficient MPI codes can be plugged in via the SDK.
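
The desktop-level techniques mentioned under Desktop Tools (memory pre-allocation and vectorization) look roughly like this in Python with NumPy. This is a generic sketch, not Star-P code: the function names and the toy "scale and shift" computation are made up for illustration.

```python
import numpy as np


def scale_and_shift_loop(values, scale, shift):
    """Baseline: grow a Python list one element at a time."""
    out = []
    for v in values:
        out.append(scale * v + shift)
    return np.array(out)


def scale_and_shift_preallocated(values, scale, shift):
    """Pre-allocation: reserve the output array once, then fill it in place."""
    out = np.empty(len(values), dtype=float)
    for i, v in enumerate(values):
        out[i] = scale * v + shift
    return out


def scale_and_shift_vectorized(values, scale, shift):
    """Vectorization: one whole-array expression instead of an explicit loop."""
    return scale * np.asarray(values, dtype=float) + shift


data = np.linspace(0.0, 1.0, 1_000)
a = scale_and_shift_loop(data, 2.0, 1.0)
b = scale_and_shift_preallocated(data, 2.0, 1.0)
c = scale_and_shift_vectorized(data, 2.0, 1.0)
```

All three functions return the same values. In NumPy the vectorized form is typically much faster than either loop. How much pre-allocation alone helps depends on the language and the problem size, so no timings are given here.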

So, in each case, there is some trade-off between effort and achievable performance, although the three approaches may have qualitatively different effort-versus-performance curves.



Questions for YOU...
So with all that said, here are some questions we would love to hear from you on:

• How do you define and measure performance? Speed of calculation? Ability to run a model you previously could not? Something else?

• How do you define productivity in high performance technical computing?

• What might be a reasonable trade-off between performance and human effort? For example, which would you prefer, and for what kinds of projects:
    1. The ability to run 5X faster than the desktop after 1 day of coding
    2. The ability to run 25X faster than the desktop after 1 week of coding
    3. The ability to run 125X faster than the desktop after 3 weeks of coding



• In a similar manner, if you come at this from a perspective of peak hardware efficiency, what might be a reasonable trade-off between performance and human effort? Again - which would you prefer, and for what kinds of projects:
    1. The ability to run at 90% hardware efficiency after 3 months of coding
    2. The ability to run at 50% hardware efficiency after 3 weeks of coding
    3. The ability to run at 30% hardware efficiency after 3 days of coding


• What are the key determinants in placing a project or target goal along the curve?
    - Parallel programming experience of scientist/engineer/team?
    - Size of programming team?
    - Importance of accelerating time-to-solution?
    - Phase of the project (early algorithm exploration versus later production runs)?
    - Importance of interactive algorithm development?
    - Memory required for the computation?
    - Available hardware resources?
    - Hardware efficiency / utilization?
    - Something else?
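
One way to compare the options in the two trade-off questions above is to look at what each additional day of coding buys when moving from one option to the next. The sketch below does that. The conversion to working days (5 per week, 20 per month) and the function name are assumptions made for illustration.

```python
def marginal_gains(options):
    """Extra performance per extra day between consecutive options.

    `options` is a list of (effort_days, performance) pairs sorted by effort.
    """
    gains = []
    for (d0, p0), (d1, p1) in zip(options, options[1:]):
        gains.append((p1 - p0) / (d1 - d0))
    return gains


# Speedup over the desktop: 5X after 1 day, 25X after 1 week, 125X after 3 weeks.
speedup_options = [(1, 5), (5, 25), (15, 125)]

# Hardware efficiency in percent: 30% after 3 days, 50% after 3 weeks,
# 90% after 3 months (assumed to be 60 working days).
efficiency_options = [(3, 30), (15, 50), (60, 90)]

speedup_gains = marginal_gains(speedup_options)        # [5.0, 10.0] X per extra day
efficiency_gains = marginal_gains(efficiency_options)  # about [1.67, 0.89] points per extra day
```

With these assumed day counts, each extra day buys more speedup as you move down the first list, but fewer efficiency points as you move through the second. That asymmetry is one reason the two questions may get different answers for the same project.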


Posted by Ilya Mirman
