Prof. Robert van de Geijn of the University of Texas at Austin presents a series of four lectures on the following topics.
Derivation of Dense Linear Algebra Algorithms
We discuss how a broad class of dense linear algebra algorithms can be systematically derived using goal-oriented programming techniques already advocated by Dijkstra and others in the early 1970s. The approach, which we call the FLAME methodology, is highly practical: from the specification of an operation, it yields families of high-performance algorithms via an eight-step process that fills out a “worksheet”. The algorithm that maps best to a specific architecture can then be chosen. More than a thousand routines in the libflame library (a modern alternative to LAPACK) and the Elemental library (a modern alternative to ScaLAPACK) were written by first deriving the algorithms in this fashion.
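The kind of algorithm such a derivation produces can be illustrated with a small sketch. Below is an unblocked, right-looking Cholesky factorization written in the partitioned style the worksheet yields, with the per-iteration updates as comments; this is our own illustrative Python, not code from libflame.

```python
import math

def chol_unb(A):
    """Unblocked right-looking Cholesky factorization, written in the
    partitioned style a FLAME-like derivation produces (illustrative
    sketch only).  A is a square list-of-lists, symmetric positive
    definite; its lower triangle is overwritten with L such that
    A = L * L^T."""
    n = len(A)
    for j in range(n):
        # alpha11 := sqrt(alpha11)
        A[j][j] = math.sqrt(A[j][j])
        # a21 := a21 / alpha11
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]
        # A22 := A22 - a21 * a21^T  (lower triangle only)
        for i in range(j + 1, n):
            for k in range(j + 1, i + 1):
                A[i][k] -= A[i][j] * A[k][j]
    return A
```

Each iteration exposes the current diagonal element `alpha11`, the column below it `a21`, and the trailing submatrix `A22`, exactly the repartitioning the worksheet tracks.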
From Algorithms to Code
The FLAME methodology derives correct algorithms. The problem is that “bugs” are easily introduced when an algorithm is translated into code. As part of the FLAME project, we have created Application Programming Interfaces (APIs) that allow the code to closely resemble the algorithm. A correct algorithm can then be translated, in a matter of minutes, into a correct, high-performance implementation. We demonstrate this by implementing a few common dense linear algebra algorithms in MATLAB and C. These APIs are used in the implementation of both libflame (a modern alternative to LAPACK) and Elemental (a modern alternative to ScaLAPACK). (A participant who missed Lecture 1 will have no trouble understanding Lecture 2.)
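To give a flavor of how such an API lets code mirror the derived algorithm, here is a hypothetical Python analogue of the partitioning helpers (the names `part_2x1`, `repart_2x1_to_3x1`, and `cont_with_3x1_to_2x1` are ours, modeled loosely on the style of the FLAME APIs, not the actual libflame interface):

```python
def part_2x1(x, mb):
    """Partition a vector (a Python list) into a top and bottom part."""
    return x[:mb], x[mb:]

def repart_2x1_to_3x1(xT, xB):
    """Expose the first element of the bottom part as the 'current' element."""
    return xT, xB[:1], xB[1:]

def cont_with_3x1_to_2x1(x0, chi1, x2):
    """Move the current element into the top part."""
    return x0 + chi1, x2

def dot_flame(x, y):
    """Dot product written against the hypothetical partitioning API,
    so the loop body reads like the derived algorithm (illustrative)."""
    alpha = 0.0
    xT, xB = part_2x1(x, 0)
    yT, yB = part_2x1(y, 0)
    while len(xB) > 0:
        x0, chi1, x2 = repart_2x1_to_3x1(xT, xB)
        y0, psi1, y2 = repart_2x1_to_3x1(yT, yB)
        # loop invariant: alpha = xT . yT
        alpha += chi1[0] * psi1[0]
        xT, xB = cont_with_3x1_to_2x1(x0, chi1, x2)
        yT, yB = cont_with_3x1_to_2x1(y0, psi1, y2)
    return alpha
```

Because the indexing is hidden in the partitioning helpers, the loop body is a near-verbatim transcription of the worksheet, which is what makes the algorithm-to-code step so quick and so hard to get wrong.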
Collective Communication
Distributed memory dense linear algebra libraries (and many other applications) cast most, if not all, communication in terms of collective communications (broadcast, allgather, reduce, etc.). For this reason, it is important for a practitioner to understand efficient algorithms for collective communication and their costs. Yet, more than 30 years after the introduction of distributed memory architectures, many still believe that the Minimum Spanning Tree broadcast is the most efficient way of implementing the broadcast operation. In this lecture, we teach the fundamentals of which, we believe, all practitioners should be aware.
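The point about the Minimum Spanning Tree broadcast can be made concrete with the standard latency-bandwidth (alpha-beta) cost model: alpha is the per-message startup cost, beta the per-item transfer cost, p the number of processes, and n the message length. The sketch below compares the MST broadcast against a scatter-followed-by-allgather ("long vector") broadcast; the function names and parameter choices are ours.

```python
from math import ceil, log2

def mst_bcast(p, n, alpha, beta):
    # Minimum Spanning Tree broadcast: ceil(log2 p) rounds,
    # the full message is forwarded in every round.
    return ceil(log2(p)) * (alpha + n * beta)

def scatter_allgather_bcast(p, n, alpha, beta):
    # "Long vector" broadcast: MST scatter of the pieces,
    # then a recursive-doubling allgather to reassemble them.
    scatter = ceil(log2(p)) * alpha + (p - 1) / p * n * beta
    allgather = ceil(log2(p)) * alpha + (p - 1) / p * n * beta
    return scatter + allgather
```

For long messages the bandwidth term dominates: the MST broadcast moves the whole message log2(p) times, while scatter/allgather moves it roughly twice, so the latter wins whenever log2(p) > 2. For short messages the extra latency terms make the MST broadcast the better choice, which is why practical libraries switch algorithms based on message size.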
Distributed Memory Dense Linear Algebra Principles
In this lecture we show how dense linear algebra algorithms and collective communication combine to yield high-performance scalable dense linear algebra algorithms.
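A minimal instance of this combination is a distributed matrix-vector multiply with a 1D row distribution: each process owns a block of rows of A and the matching block of x, an allgather assembles the full x everywhere, and each process then performs a local matvec. The sketch below simulates the processes as list entries; it is our illustration of the principle, not library code.

```python
def parallel_matvec(A_parts, x_parts):
    """Simulated distributed y := A x with a 1D row distribution.
    A_parts[i] holds process i's block of rows of A; x_parts[i] holds
    its block of x.  Returns the distributed blocks of y (sketch)."""
    # Allgather: every process assembles the full vector x.
    x = [xi for part in x_parts for xi in part]
    # Local matvec on each process's block of rows.
    y_parts = []
    for A_loc in A_parts:
        y_parts.append([sum(a * b for a, b in zip(row, x)) for row in A_loc])
    return y_parts
```

The communication is exactly one allgather of x, so the cost analysis of the previous lecture's collectives plugs directly into the analysis of the parallel algorithm; that interplay is what the scalability arguments in this lecture build on.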
Video of the lecture (TU only)