Even restricting ourselves to scientific, engineering, and mathematical users, as Star-P does today, there are many types of applications exhibiting many types of parallelism. The last 30+ years have seen many academic and industrial projects that created programming constructs for parallel processing. (Prominent examples being MPI, OpenMP, Unified Parallel C (UPC), Co-Array Fortran (CAF), and DCT.) Star-P cannot hope to incorporate all of these any time soon, but we clearly have to support common types of parallelism or Star-P won’t be very useful. This series of posts will describe the types of parallelism we support today (and some uses of those that may not be obvious), and important types of parallelism we don’t support now that we want to support soon.
Task parallelism (sometimes known as control parallelism) is probably the most general type of parallelism, in that all of the cooperating processes can be doing anything, with no expectation of coordination or homogeneity; i.e., one thread can control your toaster while another thread calculates your bank balance. We’ll come back to task parallelism in future postings.
Data parallelism is a special case of task parallelism, where the threads are not only executing the same operation at an instant, they’re also doing it cooperatively on portions of the same array(s). A simple example of data parallelism is scaling the values in an array by a scalar. For an array a and a scalar b, this could be expressed in Star-P’s MATLAB client as
c = a * b;
Many problems that depend on linear algebra for their solution wind up being readily expressed in this style, as do many FFT problems. Note that with this definition, the operation doesn’t have to be just element-wise; it could involve communication among the processors.
A common example is the sum operator, which adds together the elements in one dimension of an array, and whose result is one dimension smaller than the input. In the case of a sum in a distributed dimension of an array, that involves communication among the processors that own any elements of the array. All of the Star-P versions of data-parallel operators do the necessary communication implicitly, relieving the user of the burden of thinking of that level of detail.
These concepts allow lots of common MATLAB code constructs to run unchanged, once the initial matrices or arrays are defined as distributed. You might find it useful if your problem size has outgrown your desktop, say if you’re doing finite-element analysis and you wind up solving a matrix that’s grown very big. An example we frequently use is finding the eigenvector of a matrix, in this case a random matrix.
n = 10000*p; % explicitly parallel with *p
A = rand(n, n); % implicitly parallel from here
x = rand(n, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
y = x;
x = A*x;
x = x / norm(x);
end;
In this example, the dimension n is multiplied by the symbolic variable p (hence “*p” or “Star-P”), denoting distribution across processors of the parallel server. The following lines, creating input arrays and doing linear algebra with them, could operate on either local or distributed data. A and x wind up being distributed because they’re based on n, which has the distributed attribute. y winds up distributed because the size function on the distributed array x returns the size with the distributed attribute (inherited from n). As A, x, and y are distributed, the rand, zeros, /, and * functions operate on the distributed data and create distributed objects (new versions of x and y) as output.
norm is a little different, as it produces a scalar result from any dimension input, so it can be used in conditional tests the way it is here. You can see how the original distributed attribute of n propagates to arrays that are created later. From a user point of view, just one identification of distribution or parallelism has enabled the entire code segment to run in parallel. Obviously most real programs don’t run on random data :), so initial data read in via I/O can also be done so that it’s distributed, with the same propagation effect as above.
Sparse matrices useful for less structured calculations
It may be easy for you to see how data parallel works for these common operations on dense matrices. Star-P also supports data parallel operations on sparse matrices, which can be very useful. For example, replacing the initial constructors of A and x above by sprand (the sparse random constructor) would result in all the arrays in the loop being distributed sparse. Sparse matrices are very useful for representing the stiffness matrices of finite element calculations and graphs for combinatorial calculations. We’ll revisit the usefulness of sparse matrices in more depth in a future posting.
- Are sparse matrices important for your applications?
- What do you use them for?
- What operations do you need to work well on them?
Indexing results in communication on a parallel system
So far, the data parallel operations we’ve talked about have fixed patterns of using array elements, so they may seem somewhat formulaic. Indexing into arrays is a very general way to choose which elements will be used for an operation. And note that, in a parallel system, indexing can cause communication among the processors, which might be handy in certain situations. Consider an example from image processing where a q by r rectangular object resides in the image, and its coordinates are known. Another array with just the object can be created just by indexing into the larger distributed array, i.e.
object = image(i:i+q-1, j:j+r-1);
Similarly, indexing enables you to select elements that meet some criteria; for instance, we may want to select pressure values that correspond to temperatures above some threshold.
ndx = find(temperature, temperature>threshold);
selected_pressure = pressure(ndx);
Indexing has an interesting dual character; we can think of it just at the abstract level, where it does the intuitive thing we’d expect from the serial version of MATLAB, but we can also think a level deeper about how it moves data within the parallel computer. The point for now is that each distributed array is spread across all the processors in the parallel server, so the portion of image indicated by the indices might reside on just a few processors. Creating object redistributes the data to be spread across all the processors again, such that subsequent operations can have all processors participate since they’ll all have data, and similarly for ndx and selected_pressure. Indexing will be very useful for some future advanced topics.
- Does your application readily map onto data-parallel constructs like these?
- Are there special constraints or features you’d need to make them work for you?
Summary: In this posting we defined task- and data-parallelism and gave examples of data-parallelism from today’s Star-P constructs. Using these, many common MATLAB code sequences will work with no changes, once the input arrays are designated as distributed.