Parallel Iterative Algorithms: From Sequential to Grid Computing (Chapman and Hall/CRC Numerical Analysis and Scientific Computing Series)
Jacques Mohcine Bahi, Sylvain Contassot-Vivier, Raphael Couturier
This book addresses a kind of computing that has become common, in terms of physical resources, but that has been difficult to exploit properly. It's not cluster computing, where processors tend to be homogeneous and communications have low latency. It's not the SETI@home model, with extreme heterogeneity and long latencies. Instead, it's the Grid model: long latencies between heterogeneous subnets, but cluster-like speeds within each subnet. This model creates unique demands, and the techniques for meeting them also apply to a number of other two-level systems beyond those the authors discuss.
Synchronous algorithms work well within clusters, where communication latency lies below the computation time of one step on each node, and where every node can be expected to run at roughly the same speed as the others. Such algorithms have a sizable literature of their own, and are addressed only in the preparatory chapters. Grid communication is different. Processor cycle times lie in the nanosecond range these days. Intra-cluster communications take up to many microseconds, and inter-cluster latencies range from milliseconds to seconds. The network of networks is a different beast, and this book addresses that strange creature.
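The latency argument above can be made concrete with a rough cost model. This is a minimal sketch, assuming a purely synchronous scheme in which every step waits for the slowest node and then for one message exchange; the function names and all the timing figures are hypothetical and only illustrate orders of magnitude, they are not measurements from the book.

```python
from statistics import mean


def sync_step_time(compute_times, latency):
    """Wall-clock time of one synchronous step: every node waits for the
    slowest node to finish, then for the data exchange to complete."""
    return max(compute_times) + latency


def sync_efficiency(compute_times, latency):
    """Average fraction of a synchronous step that nodes spend computing
    (1.0 would mean no waiting at all)."""
    return mean(compute_times) / sync_step_time(compute_times, latency)


# Hypothetical figures in seconds: near-identical nodes with microsecond links
# (cluster) versus uneven nodes joined by tens-of-milliseconds links (grid).
cluster = sync_efficiency([1.0e-3, 1.1e-3, 0.9e-3, 1.0e-3], latency=20e-6)
grid = sync_efficiency([1.0e-3, 3.0e-3, 0.5e-3, 2.0e-3], latency=50e-3)
```

With these assumed numbers the cluster case keeps nodes busy about 89% of the time, while the grid case drops to roughly 3%: once latency dwarfs a step's computation, waiting for every neighbor at every step wastes almost all of the hardware.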
These authors address algorithms that may iterate repeatedly, an unpredictable number of times, between communications with neighboring sub-problems. In particular, the authors address the iterative solution of sparse linear systems - perhaps a loss of generality, but not a loss of practical value. They present their approaches methodically and rigorously - expect to go through this book slowly, and maybe even to go back to Strang once in a while. They also address the fact that distributed detection of convergence is at least as demanding as reaching distributed agreement of many other kinds. As an interesting bonus, the asynchronous algorithms also provide some degree of fault tolerance in the presence of intermittent communication failures, such as packet loss due to network congestion.
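The asynchronous style of iteration described above can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' code: it uses plain block Jacobi (the name `async_jacobi` and its parameters are hypothetical), models network delay by letting each block read a randomly outdated snapshot of the other blocks, and tests convergence centrally, which sidesteps exactly the distributed detection problem the book treats. The test system is strictly diagonally dominant, a case in which asynchronous Jacobi with bounded delays is known to converge.

```python
import random


def async_jacobi(rows, diag, b, n_blocks=4, max_delay=3, tol=1e-10,
                 max_steps=100_000, seed=0):
    """Simulate asynchronous block Jacobi for A x = b in one process.

    rows[i] lists the off-diagonal entries (j, a_ij) of row i; diag[i] is a_ii.
    Each step, one randomly chosen block recomputes its unknowns using values
    of the other blocks that may be up to `max_delay` steps out of date.
    Returns (x, steps_taken).
    """
    rng = random.Random(seed)
    n = len(b)
    blocks = [range(k * n // n_blocks, (k + 1) * n // n_blocks)
              for k in range(n_blocks)]
    x = [0.0] * n
    history = [x[:]]  # recent global states, used to model stale messages

    def residual(v):
        return max(abs(b[i] - diag[i] * v[i] - sum(a * v[j] for j, a in rows[i]))
                   for i in range(n))

    for step in range(1, max_steps + 1):
        k = rng.randrange(n_blocks)              # unpredictable update order
        delay = rng.randint(0, len(history) - 1)  # at most max_delay
        stale = history[-1 - delay]              # what block k last "received"
        mine = blocks[k]
        new_x = x[:]
        for i in mine:
            # Local unknowns are current; remote ones may be outdated.
            s = b[i] - sum(a * (x[j] if j in mine else stale[j])
                           for j, a in rows[i])
            new_x[i] = s / diag[i]
        x = new_x
        history.append(x[:])
        if len(history) > max_delay + 1:
            history.pop(0)
        # Centralized convergence test: trivial here, hard on a real grid.
        if residual(x) < tol:
            return x, step
    return x, max_steps


# Usage: a 16-unknown tridiagonal system (4 on the diagonal, -1 beside it).
n = 16
rows = [[(j, -1.0) for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
diag = [4.0] * n
b = [4.0 - len(r) for r in rows]  # chosen so the exact solution is all ones
x, steps = async_jacobi(rows, diag, b)
```

Raising `max_delay` mimics slower, less predictable links; the iteration still converges on this system, just in more steps, which is the property that makes asynchronous schemes attractive across long-latency Grid connections.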
This book's strengths, for me at least, lie in two areas. The first is its emphasis on pseudocode for critical algorithms - not cut-and-paste material, but clearly illustrative. The second is its progression from cluster-scale synchronous algorithms to Grid-scale asynchronous ones. The same progression also describes hardware-accelerated nodes within a cluster: fast communication with the accelerator, but orders of magnitude slower and less predictable communication between accelerated nodes. The absolute time scales and distances differ, but the ratios among local communication time, non-local communication time, and computation time hold up well.
Only the most dedicated readers will invest the time and effort needed to extract this book's value. Those readers, however, will be richly rewarded.
-- wiredweird