The Future of Application Development and Exascale Computing

Thomas Sterling


The advent of the Chinese Tianhe-2 supercomputer marks the halfway point between Petascale and Exascale computing, achieving greater than 32 Petaflops on the Linpack benchmark. Yet this apparent proximity to the next tri-decade performance milestone fails to address the major challenges that application developers will face in the exascale era of the next decade. It is expected that extensions to conventional practices will be employed by 2020 on the first generation of exascale systems, which themselves will be elaborations of the heterogeneous architectures used by many of the fastest machines today. In spite of this near-term reliance on older, albeit proven, methods, future exascale systems are likely to reflect aggressive innovation in architecture, programming methods, and system software to address key challenges to effective use. The ParalleX execution model, which provides governing principles for dynamic adaptive computing systems, will be described as an exemplar of the class of advances anticipated to enable broad generality and ease of use in the future of high performance computing. This presentation will identify the dominant challenges and describe the limitations they impose. It will then present the emerging class of dynamic adaptive methods being pursued and describe the changes to computer architecture and programming models that will result even as Moore’s Law comes to an end. The presentation will conclude with a brief glimpse of truly alien strategies for supercomputing only now being imagined for the far future.

Thomas Sterling is Professor of Informatics and Computing at Indiana University. He serves as the Executive Associate Director of CREST and as its Chief Scientist. Since receiving his Ph.D. from MIT as a Hertz Fellow in 1984, Dr. Sterling has conducted research in parallel computing systems in industry, academia, and government centers. He is most widely known for his pioneering work in commodity cluster computing as leader of the Beowulf Project, for which he and his colleagues were awarded the Gordon Bell Prize. Professor Sterling currently leads a team of researchers at IU to derive the advanced ParalleX execution model and to develop a proof-of-concept reference implementation to enable a new generation of extreme scale computing systems and applications. He is the co-author of six books and holds six patents.


NWChem: quantum chemistry across spatial, energy, and time (to solution) scales

Karol Kowalski


In this presentation we will discuss the development of scalable and unique computational chemistry capabilities for modeling and simulation in NWChem, and we will demonstrate its performance on large scale computing platforms. NWChem is DOE’s premier quantum chemistry software, developed at the Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory and available to the scientific community through the open-source Educational Community License. We will discuss several new parallel algorithms for many-body methodologies that are capable of taking advantage of existing petascale architectures. Special emphasis will be given to novel algorithms that can utilize the aggregate power of heterogeneous computer architectures.

Dr. Kowalski’s current research focuses on developing accurate many-body methodologies to describe electron correlation effects in molecular systems and materials. One of the most accurate ways of characterizing these effects is provided by the coupled cluster (CC) formalism, which offers a compact way of encapsulating many-body collective phenomena. Dr. Kowalski has contributed to the development of renormalized CC methods, equation-of-motion CC approaches, linear response CC formulations, and multi-reference CC methods. Since coming to PNNL, Dr. Kowalski has been implementing CC methods in the NWChem program package, a computational chemistry package for parallel computers developed at PNNL. His most recent work has focused on taking coupled cluster capabilities to extreme scale parallel computing platforms and heterogeneous computer architectures. In addition to code development, Dr. Kowalski has been applying first-principles methods to a variety of projects, including the description of excited states in light harvesting systems and materials, the characterization of non-linear optical properties in delocalized molecular systems, and studies of quasi-degenerate electronic states.


OpenMP for Exascale Computing

Michael Wong


As the number of threads found in supercomputers continues to increase, there is growing pressure on applications to exploit more of the available parallelism in their codes, including coarse-, medium-, and fine-grain parallelism. OpenMP has been one of the dominant shared-memory programming models but is evolving beyond that, with a new mission statement making it well suited for exploiting medium- and fine-grain parallelism. OpenMP research has focused on application tuning, compiler optimizations, programming-model extensions, and porting to distributed memory platforms. OpenMP 4.0 incorporates many of these features to support the next step toward exascale computing. I will discuss a few of these key features, including affinity, accelerator, and tools support, as well as other low-overhead, scalable algorithms for creating parallel regions. I will also give an indication of the future direction of OpenMP 5.0.

Michael Wong is the CEO of the OpenMP Corporation, a consortium of 26 member companies that holds the de facto standard parallel programming specification for C/C++ and Fortran. He is the IBM and Canadian head of delegation to the C++ Standard committee, and Chair of the WG21 Transactional Memory group. He is the co-author of a number of C++/OpenMP/TM features and patents. He is the past C++ team lead for IBM's XL C++ and C compilers and has been designing C++ compilers for twenty years. Currently, he is leading the C++11 deployment as a senior technical lead at IBM. His current research interests are in parallel programming, C++ benchmark performance, object models, generic programming, and template metaprogramming. He is a frequent speaker at various technical conferences and serves on the program committees of Boost and IWOMP. He holds a B.Sc. from the University of Toronto and a Master's in Mathematics from the University of Waterloo.


Exploiting Asynchrony for Materials in Extreme Environments

Timothy Germann


Within the Exascale Co-design Center for Materials in Extreme Environments (ExMatEx), we have initiated an early and deep collaboration between domain (computational materials) scientists, applied mathematicians, computer scientists, and hardware architects in order to establish the relationships between algorithms, software stacks, and architectures needed to enable exascale-ready materials science application codes within the next decade. We anticipate being able to exploit hierarchical, heterogeneous architectures to achieve more realistic large-scale simulations with adaptive physics refinement, and we are using tractable scale-bridging proxy application testbeds to assess new approaches to resilience, OS/runtime and execution models, and power management. The current scale-bridging strategies accumulate (or recompute) a distributed response database from fine-scale calculations (tasks) in a top-down rather than bottom-up multiscale approach. I will demonstrate this approach and our initial assessments, using simplified proxies that encapsulate the expected scale-bridging workload and workflow.

Timothy C. Germann is Director of the DOE/ASCR “Exascale Co-Design Center for Materials in Extreme Environments,” and Chair of the American Physical Society (APS) Division of Computational Physics.  Since joining LANL in 1997, he has used large-scale classical MD simulations to investigate shock, friction, detonation, and other materials dynamics issues, and led the high strain-rate team in the DOE/BES “Center for Materials in Mechanical and Irradiation Extremes,” an Energy Frontier Research Center. Tim earned Bachelor of Science degrees in Computer Science and in Chemistry from the University of Illinois at Urbana-Champaign in 1991, and a Ph.D. in Chemical Physics from Harvard University in 1995, where he was a DOE Computational Science Graduate Fellow. He has received the Gordon Bell Prize (1998; also a finalist in 2005 and 2008), three LANL Distinguished Performance Awards (2005, 2007, and 2009), two NNSA Defense Programs Awards of Excellence (2006 and 2007), the LANL Fellows' Prize for Research (2006), the LANL Distinguished Copyright Award (2007), and an R&D 100 Award (2013); and is a Fellow of the American Physical Society (2011).


Toward Predictive Modeling of Nuclear Reactor Performance:

Application Development Experiences, Challenges, and Plans in CASL

Douglas Kothe


The Consortium for Advanced Simulation of Light Water Reactors (CASL) is the first U.S. Department of Energy (DOE) Energy Innovation Hub, established in July 2010 for the modeling and simulation (M&S) of nuclear reactors. CASL applies existing M&S capabilities and develops advanced capabilities to create a usable environment for the high fidelity predictive simulation of light water reactors (LWRs). This environment, designated the Virtual Environment for Reactor Applications (VERA), integrates components based on science-based models, state-of-the-art numerical methods, modern computational science and engineering practices, and rigorous verification and validation against data from operating pressurized water reactors (PWRs), single-effect experiments, and integral tests. The CASL M&S technology is being designed for efficient execution on today’s leadership-class computers, advanced architecture platforms now under development, and design engineering workstation clusters. CASL’s vision is to predict, with confidence, the performance of nuclear reactors through comprehensive, science-based modeling and simulation technology that is deployed and applied broadly throughout the nuclear energy industry to enhance safety, reliability, and economics. To achieve this vision, CASL’s mission is to provide coupled, high fidelity, usable capabilities needed to address the light water reactor operational and safety performance-defining phenomena associated with nuclear fuel and the reactor vessel and internals. CASL is focused on a set of specific Challenge Problems (CPs) that encompass the key phenomena currently limiting the performance of PWRs, with the recognition that much of the capability developed will be broadly applicable to other types of reactors. CASL defines a Challenge Problem as one whose solution is (1) important to the nuclear industry and (2) amenable to or enabled by M&S.

After giving a brief overview of CASL’s goals and strategies, I will dive into the current computer and computational science technologies and methodologies embodied within CASL’s Virtual Environment for Reactor Applications (VERA). VERA is not a single simulation tool but rather a collection of capabilities for scalable simulation of nuclear reactor core behavior: a flexible toolkit of components that can be exercised in various combinations for different Challenge Problems and for varying fidelity requirements or computational resources. Scaling challenges for the VERA toolkit, some overcome and some not, will also be highlighted with illustrative examples.

Douglas B. Kothe (Doug) graduated summa cum laude in 1983 with a Bachelor of Science in Chemical Engineering from the University of Missouri - Columbia, followed by a Master of Science and a Ph.D. in Nuclear Engineering from Purdue University in 1986 and 1987, respectively. He conducted his Ph.D. research at Los Alamos National Laboratory (LANL) from 1985 to 1987 as a Graduate Research Assistant, where he developed the models and algorithms for a particle-in-cell application designed to simulate the hydrodynamically unstable implosion of inertial confinement fusion targets. In May 2010, Doug led a multi-institutional, multi-disciplinary team known as “CASL” (Consortium for Advanced Simulation of Light Water Reactors; www.casl.gov) in winning a $122M, five-year award from the DOE for its first Energy Innovation Hub. As a result of this award, Doug is now Director of CASL and the CASL Division Director at ORNL.

Doug’s research interests and expertise are focused on the development of physical models and numerical algorithms for the simulation of a wide variety of physical processes in the presence of incompressible and compressible multiphase fluid flow.


Managing Application Resilience: A Programming Language Approach

Pedro Diniz


System resilience is an important challenge that must be addressed in the era of extreme scale computing. High-performance computing systems will be architected from millions of processor cores and memory modules. As process technology scales, the reliability of such systems will be challenged by the inherent unreliability of individual components due to extremely small transistor geometries, variability in silicon manufacturing processes, device aging, and other effects. Errors and failures in extreme scale systems will therefore increasingly be the norm rather than the exception. Not all detected errors warrant catastrophic system failure, but there are presently no mechanisms for programmers to communicate their knowledge of algorithmic fault tolerance to the system.

In this talk we present a programming model approach to system resilience that allows programmers to explicitly express their fault tolerance knowledge. We propose novel resilience-oriented programming model extensions and programming directives and illustrate their effectiveness. An inference engine leverages this information and combines it with context gathered at runtime to increase the dependability of HPC systems. The preliminary experimental results presented here, for a limited set of kernel codes from both scientific and graph-based computing domains, reveal that, with very modest programming effort, the described approach incurs fairly low execution time overhead while allowing computations to survive a large number of faults that would otherwise always result in the termination of the computation.

As transient faults become the norm rather than the exception, it will become increasingly important to provide users with high-level programming mechanisms with which they can convey important application acceptability criteria. For best performance (whether in time, power, or energy), the underlying system needs to leverage this information to better navigate very complex system-level trade-offs while still delivering a reliable and productive computing environment. The work presented here is a simple first step toward this vision.

Pedro C. Diniz is a Research Associate in the Computational Sciences Division at the University of Southern California's Information Sciences Institute. Dr. Diniz has 20 years of experience in the areas of computer architecture, high-performance computing and compilation, and program analysis and optimization. He has been a principal participant in major research programs funded by DARPA and DOE’s Office of Science. He has collaborated with universities, national laboratories, and industry as prime contractor and sub-contractor. Dr. Diniz received a B.S. in Computer and Electrical Engineering and an M.S. in Electrical Engineering from the Technical University of Lisbon in 1988 and 1992, respectively, and a Ph.D. from the University of California, Santa Barbara in 1997. His current research focuses on program analysis for software resiliency and on high-performance and reconfigurable computing.


HPX -- The Futurization of Computing

Thomas Heller


The advent of increasingly heterogeneous architectures, with new features such as multi- and many-core processors, presents computer programmers with significant challenges in the world of parallel programming. This talk will establish HPX as a solution that helps application developers seamlessly and efficiently exploit the hardware available to them. HPX is a general purpose parallel runtime system that exposes a uniform programming model for applications of any scale. This talk will include a short introduction to our C++11 standard-compliant API and will provide an overview of how HPX extends the C++ standard to unify local and remote operations. In addition, I will illustrate the algorithmic methods we have developed to work with our new paradigm of parallel computing.


Thomas joined the STE||AR Group in October 2011 to pursue his master’s thesis and shortly thereafter graduated from the Friedrich-Alexander-University Erlangen-Nürnberg (FAU) with an MSc in Computer Science. He is now working toward his PhD at the Chair of Computer Science 3 - Computer Architecture at FAU. He remains actively engaged in the STE||AR Group and is one of the main contributors to HPX and the surrounding developer community. His current interests are programming languages, programming paradigms, and high performance computing.


MPI for Exascale Systems

Pavan Balaji

MPI has long been considered the de facto standard for parallel programming. One of MPI's primary strengths is its continuously evolving nature, which allows it to absorb and incorporate the best practices in parallel computing in a standard and portable form. The MPI Forum has recently announced the MPI-3 standard and is working on the MPI-4 standard to extend traditional message passing with more dynamic, one-sided, and fault-tolerant communication capabilities. Nevertheless, given the disruptive architectural trends for Exascale computing, there is room for more. In this talk, I will first describe some of the capabilities that have been added in the recent MPI-3 standard and those being considered for the upcoming MPI-4 standard. Next, I will describe some research efforts to extend MPI to work in massively multithreaded and heterogeneous environments for highly dynamic and irregular applications.

Dr. Pavan Balaji holds appointments as a Computer Scientist at the Argonne National Laboratory, as an Institute Fellow of the Northwestern-Argonne Institute of Science and Engineering at Northwestern University, and as a Research Fellow of the Computation Institute at the University of Chicago.  He leads the Programming Models and Runtime Systems group at Argonne.  His research interests include parallel programming models and runtime systems for communication and I/O, modern system architecture (multi-core, accelerators, complex memory subsystems, high-speed networks), and cloud computing systems.  He has more than 100 publications in these areas and has delivered nearly 120 talks and tutorials at various conferences and research institutes.

Dr. Balaji is a recipient of several awards, including the U.S. Department of Energy Early Career award in 2012, the TEDxMidwest Emerging Leader award in 2013, the Crain's Chicago 40 under 40 award in 2012, the Los Alamos National Laboratory Director's Technical Achievement award in 2005, the Ohio State University Outstanding Researcher award in 2005, six best paper awards, and various others. He has served as a chair or editor for nearly 50 journals, conferences, and workshops, and as a technical program committee member in numerous conferences and workshops. He is a senior member of the IEEE and a professional member of the ACM. More details about Dr. Balaji are available at http://www.mcs.anl.gov/~balaji.


ALPS – the Applications and Libraries for Physics Simulations: Current state and future plans

Emanuel Gull


ALPS, the Applications and Libraries for Physics Simulations, is a general purpose package for simulating quantum systems. We introduce the ALPS libraries and the ALPS collaboration and present some of the recent physics results obtained using ALPS. We then focus on challenges and motivate future plans for the ALPS libraries, especially with regard to future exascale applications.

Emanuel Gull received his Ph.D. working at the Institute for Theoretical Physics at ETH Zurich, Switzerland. Since then he has held positions at Columbia University and the Max Planck Institute in Dresden, Germany. He currently works as an assistant professor at the University of Michigan in Ann Arbor. Dr. Gull is the recipient of several distinctions, including the SCES early career (Mott) prize in 2013 and the Department of Energy Early Career Award in 2013, and was named a Sloan Research Fellow in 2014.