This round's TOP500 has an NVIDIA-based machine, a first. Here's a Gigaom posting on the topic. But is this news? I'm sure the folks at NVIDIA are happy about it. I'm writing this posting after the opening evening of Supercomputing 2008. As usual, the TOP500 list has a way of grabbing attention and headlines, but I think the real news is the growing number of users who are actively considering production use of accelerators. We are watching this carefully. Star-P is an excellent way for an organization to get started with accelerators, as it provides a level of abstraction for the programmer. As one customer pointed out to me, 'the fastest accelerator crown is going to change often and may vary by application, we're going to want an easy way to port codes as the hardware changes.' What better way than to write the code in a very high-level language like MATLAB or Python and then place the accelerator-specific code in function libraries?
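To make that last point concrete, here is a minimal sketch of the idea in Python. It is not Star-P's API: the registry, the `register` decorator, and the backend names are hypothetical, and the "accelerator" backend is a stand-in that simply reuses the CPU path so the example runs anywhere. The point is only that the application calls `dot(...)` and never sees which library does the work.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical registry: backend name -> implementation of the same operation.
_BACKENDS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {}


def register(name: str):
    """Decorator that files an implementation under a backend name."""
    def decorator(fn):
        _BACKENDS[name] = fn
        return fn
    return decorator


@register("cpu")
def _dot_cpu(x, y):
    # Plain-Python reference implementation.
    return math.fsum(a * b for a, b in zip(x, y))


@register("accelerator")
def _dot_accelerator(x, y):
    # Placeholder (assumption): a real function library would call
    # vendor-specific accelerator code here. This stand-in reuses the CPU path.
    return _dot_cpu(x, y)


def dot(x, y, backend: str = "cpu") -> float:
    """Application-facing call; the caller never touches backend code."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return _BACKENDS[backend](x, y)


if __name__ == "__main__":
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], backend="accelerator"))
```

When the accelerator crown changes hands, only the function registered under `"accelerator"` would need to be replaced; the high-level code that calls `dot` stays as it is.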
Star-P On Demand was mentioned yesterday in an HPCWire article: Increasing Clouds by Michael Feldman. Certainly we are seeing a lot of interest in clouds, but there is an additional reason why we, as an ISV, have some interest in their success. It turns out that many of our customers are new to HPC and have little experience buying, installing, or administering clusters. Our support engineers often get involved in debugging cluster administration problems. For some of these users, a pre-configured compute resource managed by somebody else would save quite a bit of time and effort. Over time, clusters will become easier to install and manage, and cloud usage models will become better understood and supported. As usual, there is an interesting race to watch.
Ed Sperling notes that many applications, especially commercial, don't have enough parallelism to keep the current and planned multicore processors busy. Depending on the context, this is both true and not true.
- Applications can have parallelism at many different levels (e.g., for a payroll, calculating FICA and 401(k) deductions in parallel for a given employee, or calculating paychecks for different employees in parallel). As Sperling notes, there are many apps that have absurd levels of parallelism in them (e.g., search). If there is truly no parallelism, then multicore won't help. But many apps do have significant parallelism when viewed at the right level.
- Applications depend to varying degrees on parallel infrastructure. For instance, if the application is rendering a frame on a screen, the output is typically completely independent, and the potentially hard parts that Sperling notes of splitting the task up and putting it back together are not very hard. By contrast, handling numerous ATM transactions in parallel depends on having an infrastructure, often a database, that accepts simultaneous requests for data that is truly independent, and appropriately mediates access to data for requests that are not completely independent. For these latter apps, an organization will be highly dependent on the infrastructure vendor to enable enough internal parallelism to run the application at scale; this is often difficult for the vendor.
- Applications whose compute time scales with the amount of data received will often be good candidates for parallelism. For instance, the resolution of numerous sensors (e.g., computed tomography scanners, mass spectrometers, and video cameras) is growing very quickly, and often the algorithms to process the data are highly parallel. These applications will readily consume huge numbers of cores.
- In my experience, organizations don't tend to rewrite existing applications for new architectures. Rather, they write new applications with new architectures in mind. The impact in the current world is that, if you're writing an application that you want to be useful for 10 years, by the end of that period you will probably be able to buy 1,000 times more cores than you can today for the same price. (A separate question is whether your application needs 1,000 times more compute power, or whether it needs (say) 100 times more compute power at 1/10th today's price.) So you need to structure the program consistently with those needs. If there's not enough parallelism, don't expect performance improvements over time. If there is enough parallelism in the abstract problem, you need tools that let you expose the parallelism fully. (Insert shameless plug for our own Star-P product here.)
Sperling is exactly right, of course, that most of today's existing applications have not been designed with parallelism in mind, and will need substantial rework to perform well in a many-core world. And while tools can help that rework, don't expect it to be a push-button transformation of today's existing code base. Instead, expect tools that help programmers express parallelism at an appropriate level, but probably in different languages/approaches than programmers are used to. In my view, this is unavoidable. The change to parallel-everywhere is fundamental and cannot be papered over.
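As a small illustration of "viewing parallelism at the right level," here is a sketch of the payroll example from the first bullet. Everything in it is an assumption made for illustration: the deduction rates, the function names, and the choice of a thread pool. It shows the coarse-grained level (one task per employee), which is usually the more profitable one; the two deductions for a single employee are also independent, but each is too little work to be worth farming out.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative rates only (assumptions); real payroll rules are far more involved.
FICA_RATE = 0.0765
RETIREMENT_RATE = 0.05


def fica(gross: float) -> float:
    return round(gross * FICA_RATE, 2)


def retirement(gross: float) -> float:
    return round(gross * RETIREMENT_RATE, 2)


def paycheck(employee):
    """Compute net pay for one (name, gross) pair."""
    name, gross = employee
    # Fine-grained level: these two deductions do not depend on each other.
    net = round(gross - fica(gross) - retirement(gross), 2)
    return name, net


def run_payroll(employees, workers: int = 4):
    """Coarse-grained level: every employee's paycheck is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(paycheck, employees))


if __name__ == "__main__":
    staff = [("emp-1", 1000.0), ("emp-2", 2000.0), ("emp-3", 3500.0)]
    for name, net in run_payroll(staff).items():
        print(name, net)
```

One caveat on the design choice: in CPython, threads will not speed up compute-bound work like this because of the global interpreter lock, so a process pool (or a parallel runtime underneath the language) would be the realistic choice. The structure of the program, with independent per-employee tasks, is what carries over.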
Greg Wilson, a software productivity researcher and long-time acquaintance now at UToronto, forwarded me the attached invitation, which I thought you may want to accept as well.
A group of us are running a survey to find out how scientists actually use computers in their day-to-day work. The blurb we're sending out is included below, and I'd be happy to provide more information. We've promised first crack at the results to "American Scientist", but will be making the data generally available. We'd be very grateful if you could spread the word through mailing lists, your blog, your web site, or whatever --- we'd like to get as many people to respond as possible. Thanks, Greg
-----------------------------------------------------------------
Computers are as important to modern scientists as test tubes, but we know surprisingly little about how scientists develop and use software in their research. To find out, the University of Toronto, Simula Research Laboratory, and the National Research Council of Canada have launched an online survey in conjunction with "American Scientist" magazine. If you have 20 minutes to take part, please go to: http://softwareresearch.ca/seg/SCS/scientific-computing-survey.html
Thanks in advance for your help!
Jo Hannay (Simula Research Laboratory)
Hans Petter Langtangen (Simula Research Laboratory)
Dietmar Pfahl (Simula Research Laboratory)
Janice Singer (National Research Council of Canada)
Greg Wilson (University of Toronto)
We spotted this very optimistic piece of news yesterday: "New solar cell material achieves almost 100% efficiency, could solve world-wide energy problems." It reminded me of something we've been noticing recently about a class of Star-P users. Very often it seems like our users are inventing brand new materials, processes, or other technology. Of course they want to use computer simulation to do their work, but existing tools are not sufficient for their needs. And that's what brings them to Star-P: a fast way to implement new simulations with good speed and efficiency.
Just a quick note: Viral and I did a webinar for the SGI developer's program. It's the first time I've ever used a movie poster in a presentation. Of course, it being a webinar, it's hard to know if the movie poster was appreciated! You can see a replay of the webinar on SGI's website. It has a basic overview of Star-P and then a discussion of how Star-P has been used recently for knowledge discovery.
Intel recently revealed more information about its upcoming Larrabee chip, apparently intended to have GPU functionality but also be suitable for general-purpose numerical calculations. How is one (esp. one keenly interested in numerical computing) to make sense of Larrabee and have some insight into its likely impact on the market? From my point of view, Larrabee is a predictable innovation given the larger forces at work in the market. The first of these forces is the meteoric growth in the number of gates possible per die; this leads to a constant ability to put on a next-generation general-purpose chip something that previously was too big, complex, or different to go on-die. (Precedents: floating-point co-processors, I/O interfaces, multiple cores, and memory controllers.) This allows AMD and Intel to contemplate extending their x86 cores with GPU features, with the prospect of much higher performance. The second force is the ability of a general-purpose software environment to appeal to a bigger market, which often means that the x86 (really, x86-64) instruction set architecture (ISA) has a major advantage over non-x86 ISAs, because it has more software available for it. (Precedent: even at the very high end, Cray has shifted its OS focus from the special-purpose Catamount to the general-purpose Linux, specifically for this reason.) The market rule is that, in the case of an approximate tie, the more-general-purpose component usually wins.
In this context, the GPUs have grown up as distinct chips, with performance unattainable from x86 chips, but with significant ease-of-use shortcomings relative to x86 chips as well. The GPU vendors want to improve their ease of use to reach a broader market; note Rob Farber's description of NVIDIA's CUDA evolution. Intel and AMD want to extend their already-general-purpose chips with GPU-like performance. I think of this simplistically via the diagram in Figure 1.
Figure 1. Performance / Programmability trade-offs of x86, GPU, and Larrabee chips
So, how could an intelligent observer tell how this is likely to play out? I believe two key metrics will tell us almost all we need to know.
- Sustainable memory bandwidth: Much of the superior performance that today's GPUs deliver compared to x86 sockets is due to their superior bandwidth to memory (100+GB/s versus ~10 in the latest Xeon sockets). When Larrabee comes out in 2009-2010, does it deliver 80-90% of the memory bandwidth of the then-available GPUs, or 20-30%? (I.e., where does Larrabee fall on the vertical axis?)
- Vector/threaded performance from nearly-vanilla C++/C/Fortran code: Larrabee's being based on the x86 instruction set will be small comfort to application developers if they have to restructure their code to the extent necessary with the GPU programming interfaces to get good performance. Intel's challenge is to get GPU-like performance with only slight changes to the original source code. (I.e., where does Larrabee wind up on the horizontal axis?) Note that some people might argue for no source changes, but I don't believe that's practical, given the typical state of applications vis-à-vis exposing vector/threaded constructs, and the state of commercial compilers.
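To see why the memory-bandwidth metric matters so much, here is a back-of-the-envelope sketch in the style of a roofline bound: for a kernel that streams through memory, attainable performance is roughly bandwidth times the flops performed per byte moved. The 100 GB/s and 10 GB/s figures are the ones quoted above; the DAXPY-style kernel, the function name, and the hypothetical Larrabee fractions are my own assumptions for illustration, not measurements.

```python
def attainable_gflops(bandwidth_gb_s: float,
                      flops_per_byte: float,
                      peak_gflops: float = float("inf")) -> float:
    """Upper bound on sustained GFLOP/s: min(compute peak, bandwidth * intensity)."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Assumed kernel: y[i] = a * x[i] + y[i] in double precision.
FLOPS_PER_ELEMENT = 2        # one multiply, one add
BYTES_PER_ELEMENT = 3 * 8    # read x, read y, write y, 8 bytes each
INTENSITY = FLOPS_PER_ELEMENT / BYTES_PER_ELEMENT  # flops per byte

if __name__ == "__main__":
    for label, bandwidth in (("GPU, 100 GB/s", 100.0),
                             ("x86 socket, ~10 GB/s", 10.0)):
        bound = attainable_gflops(bandwidth, INTENSITY)
        print(f"{label}: at most {bound:.2f} GFLOP/s on this kernel")

    # Hypothetical Larrabee scenarios: a fraction of the GPU's bandwidth.
    for fraction in (0.85, 0.25):
        bound = attainable_gflops(100.0 * fraction, INTENSITY)
        print(f"{fraction:.0%} of GPU bandwidth: at most {bound:.2f} GFLOP/s")
```

For a bandwidth-bound kernel like this one, the bound scales linearly with memory bandwidth and the chip's peak flops never enter the picture, which is why the 80-90% versus 20-30% question is so telling.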
And one further issue that appears to be overlooked(**) in the excitement to get many-core GPUs running fast: What's a model for parallel execution on these chips that will be stable over the next several years? For the x86 CPUs and, according to IBM, for its Cell also, OpenMP is the right on-chip parallel programming interface. But will this persist as a chip-wide parallel model? I claim not, as its effective implementations to date depend on cache coherence. Today the multicore x86 sockets all support chip-wide cache coherence. As the number of cores per chip grows exponentially, this won't be practical. (Witness what happened at the system level, where even SGI's heroic efforts to support system-wide cache-coherence have had to retreat in the face of astronomic core counts.) I believe what's likely to happen is that some balance of technical cost and difficulty and economic value will result in a cache-coherent building block of (say) 32 cores. Then those building blocks will be replicated on a die, with some non-globally-coherent interconnect. (Chip architects from major vendors privately confirm this approach.) At that point, parallel models will need to cope with distributed memory at least somewhat differently, and today's OpenMP will not be sufficient. IBM's support of OpenMP on Cell charts a new path, supporting OpenMP without hardware cache-coherence; the success of that path is still undetermined. Developers of GPU- or Larrabee-based programs, at least developers who don't want to rewrite their codes in 3 years, should be asking hard questions about the parallel models being exposed for these chips.
I believe that a Larrabee delivering 80-90% of the performance of its contemporary GPUs, with much higher programmability, will be extremely difficult technically to achieve, but I think it's possible, and if Intel pulls it off, it would be a major change to the technical computing landscape.
Your thoughts?
(**) Though see an excellent discussion of related issues in the last 2 pages of a conversation with Kurt Akeley and Pat Hanrahan.
The other week, I was interviewed on "Parallel Programming Talk Radio" by Aaron Tersteeg and Clay Breshears -- the show is 15 minutes and there are a few minutes of introduction before and a few after. The resulting ~10 minutes or so felt like an awfully short time to talk about Star-P! If you're new to Star-P, then this is perhaps a fast way to get an introduction, but if you already know what we're about, then there may not be enough new to make it worthwhile. On the other hand, since all of you are obviously parallel processors, why not play the interview, do your email and simultaneously run simulations in parallel on Star-P?
You can listen to the interview right here (see below) or if you'd like to click through BlogTalkRadio (and see what else goes on there), click here.
I attended IDF this past week, and while there was quite a bit of noise about Nehalem and Intel's new 3GL parallel development tools, you can read about that stuff anywhere. I suppose you can also read somewhere about Jeffrey Katzenberg of Dreamworks giving a talk during the software keynote (about this). Wow, I can't wait until we have that kind of marketing budget. There were polarized glasses under all our chairs, and they slid out a movie-theater-sized screen to show 3D scenes from Kung Fu Panda. Kind of a different feel from Werner's Intel Cluster Ready talk.
On entering the exhibition area, as usual, I went directly to the little booths at the back. The nice thing about the little booths is that you probably haven't seen what they are showing before, and you usually get to talk to the people who are working directly with the technology. The exhibit I found most interesting was one put on by an Intel "lablet" located somewhere in or near CMU. The overall project is called Diamond, and it is directed at interactive search of non-indexed data. Richard Gass and Mei Chen were demonstrating an application built on top of Diamond called Interactive Search Assisted Decision Support (ISADS), which allows very large numbers of images to be searched to find ones similar to an image of interest. At the show, they had a special camera which they used to capture images of moles (yes, a charming photo of a mole on my forearm was taken...). The system searches a large database of other images to find a similar match. The search infrastructure lets the main application send "searchlets" to a distributed set of nodes that each search locally for matches. The fancy part is some interactivity on the balance between accuracy and number of matches.
You don't get high performance without a good balance between CPU, memory, and I/O. Often, the CPU side gets too much attention. It's always good to see people working on the full problem.
Hopefully readers of this blog already know that Star-P supports parallel file I/O!
More info on the Diamond project can also be found here.
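For readers who want a feel for the "searchlet" pattern described above, here is a toy sketch. To be clear, this is not Diamond's API, which I have not seen: the function names, the feature-vector representation of an image, and the Euclidean distance threshold are all assumptions of mine. It only illustrates the shape of the idea: ship a small filter to where the data lives, let each node scan its own items, and send back only the matches.

```python
from concurrent.futures import ThreadPoolExecutor


def make_searchlet(target, max_distance):
    """Build the filter that gets shipped to every node (hypothetical)."""
    def searchlet(item):
        name, features = item
        distance = sum((a - b) ** 2 for a, b in zip(features, target)) ** 0.5
        return (name, distance) if distance <= max_distance else None
    return searchlet


def search_node(local_items, searchlet):
    # Runs next to the data: only the matches leave the node.
    return [hit for hit in map(searchlet, local_items) if hit is not None]


def interactive_search(nodes, target, max_distance):
    """Scatter the searchlet to all nodes, gather hits, rank by distance."""
    searchlet = make_searchlet(target, max_distance)
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(lambda items: search_node(items, searchlet), nodes))
    hits = [hit for part in partials for hit in part]
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Each inner list stands in for the images stored on one node.
    nodes = [
        [("img-a", (0.0, 0.0)), ("img-b", (3.0, 4.0))],
        [("img-c", (1.0, 0.0)), ("img-d", (10.0, 10.0))],
    ]
    for threshold in (1.5, 6.0):
        names = [name for name, _ in interactive_search(nodes, (0.0, 0.0), threshold)]
        print(threshold, names)
```

The `max_distance` knob plays the role of the interactivity mentioned above: tighten it and you get fewer, closer matches; loosen it and you get more candidates to look through. Threads stand in for remote nodes here so the sketch runs on one machine.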
Our partner in Israel, Emet Computing, sent us this link (thanks, Yoel!): High-Performance Computing Considered Harmful. In this interview, Greg Wilson talks about issues related to making HPC accessible to scientists, engineers, and people who want to spend time solving problems with correct answers instead of pushing for peta-exa-zetaflops. I'll have to find an excuse to meet Greg at some point...