TABLE OF CONTENTS

 PROJECT SUMMARY

  I. Introduction

  II. Resources and Background

      A. The UCLA AGCM

      B. The GFDL Modular Ocean Model (MOM)

      C. The UCLA Coupled GCM

      D. The UCLA Advanced CATM

          1. Atmospheric Photochemistry Model (APM)

          2. Polar Stratospheric Cloud Model (PSCM)

          3. Ames Tracer Model (ATM)

          4. UCLA Chemical/Aerosol Tracer Model (CATM)

      E. The LLNL Atmospheric Chemistry Model (LACM)

      F. The UCLA Coupled GCM in a Distributed Computer Environment

      G. Sequoia 2000

      H. Recipe Management for Scientific Programming

  III. Proposed Research

      A. Computation Challenges

          1. Revision and improvement of the parameterization of sub-grid scale processes

          2. Parallelization of the calculations in the different components of the ESM

          3. Algorithm parallelization and load balancing

          4. The ESM and Data Base Management System (DBMS)

          5. Development of a new recipe management architecture

      B. Earth Science Challenges

          1. Seasonal cycle and interannual variability of the atmosphere-ocean system

          2. Global Distribution of Greenhouse Gases

          3. Ozone Perturbations

  IV. Industrial Support to this Project

  V. Project Personnel

 BIBLIOGRAPHY

 BIOGRAPHICAL SKETCH

 BUDGET

 CURRENT AND PENDING SUPPORT

 APPENDIX A – F

 SUBCONTRACTS/COLLABORATIONS


Project Summary

    We will develop a coupled model of atmospheric and oceanic circulations, and chemical tracers. The coupled model, formulated for parallel computer environments, will be applied to study problems in climate, climate change, and climate/chemistry interactions, including the general circulation of the coupled atmosphere/ocean system, global distributions of greenhouse gases, and global ozone perturbations.
    The proposed model development utilizes the following major components: 1) The UCLA general circulation model of the atmosphere (AGCM), 2) the GFDL/Princeton University general circulation model of the ocean (OGCM), 3) the NASA Ames/UCLA chemical/aerosol tracer model (CATM), and 4) the Lawrence Livermore National Laboratory (LLNL) atmospheric chemistry model (LACM). Versions of these component models are currently operational, and preliminary coupling of the components has been carried out.
    We propose to assemble a fully coupled model with a highly modular structure, in which components can be interchanged or added with a minimum of computational difficulty. The code will be highly optimized, including local parallelization within each module as well as between modular components. Finally, the coupled model will be configured to work as well in a distributed (heterogeneous) computer environment through high-speed networks. Concerning the computational challenges, we will focus on issues related to distributed computation and algorithm optimization. The AGCM, OGCM, CATM and LACM are all grid-point models, which is an advantage in parallel architectures. Other numerical problems arise, however. For example, the atmospheric flow in polar regions is difficult to simulate; we are developing specific solutions to this and related problems of resolution and numerical stability. The issue of load balancing among processors may become more critical as the parameterizations of physical and chemical processes grow in sophistication and the computational load between model regions becomes more variable.
    We also propose to apply the coupled model to study the climate system, climate change, and climate/chemistry interactions. We will divide this work into studies of the fundamental coupled dynamics of the global atmosphere-ocean system, and the global chemistry of the atmosphere. We will simulate the interannual variability of the coupled atmosphere and oceans, including the seasonal cycle and transients up to decadal time scales. We plan to couple the AGCM and CATM to study global ozone depletion, including the effects of atmospheric aerosols on the chemistry of the upper atmosphere. These latter simulations will be based on work already completed in studies of the ozone hole chemistry, which provides a degree of validation of the coupled model. The operational atmospheric chemistry model at LLNL (LACM) will be used to develop and test photochemistry and heterogeneous chemistry algorithms to be used in the coupled AGCM/CATM. The LACM will also be employed for tropospheric chemistry simulations using dynamical fields from the coupled AGCM/OGCM.
    The model development and applications proposed here will lead to important algorithm development for future large-scale Earth System Science simulation, and will provide unique scientific insights into coupled dynamical/chemical/microphysical processes that contribute to global climate change and ozone depletion.

I. Introduction

    This proposal responds to the NASA Research Announcement in connection with the High Performance Computing and Communications (HPCC) Program call for Grand Challenge science applications involving massively parallel computations. Computer simulations of the global climate (using general circulation models, GCMs) and of global chemistry, including ozone depletion, address fundamental issues that affect the environment and epitomize the challenge of Earth sciences to computer technology.
    GCMs that describe atmospheric and oceanic dynamics (AGCMs and OGCMs, respectively) play a key role in the study of climate. These models explicitly solve the equations governing fluid motion on a rotating sphere, including parameterizations of physical processes at sub-grid scales (e.g., cloud convection, turbulent diffusion), and thus can be used to study nonlinear interactions and feedbacks between different components of atmosphere and ocean circulations. Moreover, when atmosphere and ocean general circulation models are coupled, one of the most important interconnections in the climate system can be studied. Examples of outstanding problems that may be addressed with coupled GCMs are El Niño-Southern Oscillation (ENSO) events and the role of the oceans in moderating the greenhouse warming effect of carbon dioxide and other gases.
    Chemical tracer models describe the detailed composition of the atmosphere, couple chemistry and climate processes, and simulate the behavior of the ozone layer. An advanced tracer model couples three-dimensional dynamics, multi-species photochemistry, and multi-component particulate microphysics in a general way. Such a chemical/aerosol tracer model (CATM) can be driven by an AGCM to investigate a wide range of problems such as the formation of the Antarctic "ozone hole" and the impact of volcanic eruptions on the ozone layer and on climate. A CATM connected to a coupled AGCM-OGCM can be used to analyze the chemistry of the marine atmosphere, including the marine sulfur cycle, long-range transport and transformation of nitrogen and other species, and the impacts of marine chemistry and microphysics on the climate system. The development of efficient and accurate algorithms for chemical and microphysical processes in multi-dimensional models requires considerable preliminary analysis with models of lower dimension.
    The goal of the research proposed here is to develop a state-of-the-art coupled Earth Systems Model (ESM) that can be applied to problems of global climate change and atmospheric chemistry. The ESM will be constructed using operational versions of the UCLA AGCM, the GFDL OGCM, and an advanced version of a CATM. The Earth Systems Model will simulate coupled global atmosphere-ocean processes, including the chemistry of tracers in the system. Development of the ESM will involve the formulation and application of new algorithms for use on massively parallel computers.
    The ESM will be constructed using three principal component models: 1) the UCLA AGCM; 2) the GFDL/Princeton University OGCM; and 3) the NASA Ames/UCLA CATM. Existing research versions of these models have been developed over a number of years under funding by NASA, NSF and other federal agencies. In addition, the LLNL atmospheric chemistry model will be used for algorithm design and testing, and for validating simulations.
    By combining the model components listed above, we will produce a model with the following desirable characteristics:
        a) A highly modular structure, so that modules can be interchanged or added with a minimum of computational restructuring. For example, a unified radiative transfer scheme coupling the atmosphere and oceans, accounting for predicted clouds and trace gases and aerosol distributions, will be added.
        b) Optimized for high-speed computation, including parallelization within each module as well as between modular components. The component codes (UCLA AGCM, GFDL OGCM, and Ames/UCLA CATM) are all based on finite difference methods, which are highly adaptable for distribution in massively parallel computer systems.
        c) Tested capability for displaying and analyzing output in real-time.
        d) Collateral implementation on the Thinking Machines Corporation (TMC) CM-5 in a fashion that allows the models to be ported to other MIMD machines, such as the CRAY shared-memory multiprocessors and emerging massively parallel distributed memory systems.
        e) Designed for distributed computations in heterogeneous computer environments.
The proposed model development will include revision of physical parameterizations and computational techniques, and consideration of issues related to distributed computation and algorithm optimization. For example, grid-point models require careful treatment of the atmospheric flow in polar regions of the Earth. Current solutions to this problem, which involve Fourier decomposition and truncation, are highly non-local. Hence, new algorithms are being designed to solve this problem for applications in a distributed computer environment. There is also the issue of load balancing among processors, which will become more critical as the parameterizations of physical and chemical processes become more sophisticated and, therefore, the computational burden is distributed more heterogeneously between model domains.
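    The non-locality of the Fourier approach can be illustrated with a minimal sketch. It assumes a single latitude circle stored as a plain list and a hypothetical cutoff wavenumber; it is not the filter used in the UCLA AGCM, whose details are not given here. Every filtered value depends on every point of the circle, which is what makes the method costly when the circle is spread over many processors.

```python
import cmath
import math


def dft(values):
    """Discrete Fourier transform of one latitude circle (O(n^2), for clarity)."""
    n = len(values)
    return [
        sum(values[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse transform; each output point sums over all wavenumbers."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
        for m in range(n)
    ]


def polar_filter(values, max_wavenumber):
    """Remove zonal wavenumbers above max_wavenumber from one latitude circle."""
    n = len(values)
    coeffs = dft(values)
    for k in range(n):
        wavenumber = min(k, n - k)  # fold negative frequencies onto positive ones
        if wavenumber > max_wavenumber:
            coeffs[k] = 0.0
    return [c.real for c in inverse_dft(coeffs)]


# Hypothetical test field: a long wave (wavenumber 1) plus a short wave (wavenumber 8).
n_lon = 32
long_wave = [math.cos(2.0 * math.pi * m / n_lon) for m in range(n_lon)]
short_wave = [math.cos(2.0 * math.pi * 8 * m / n_lon) for m in range(n_lon)]
field = [a + b for a, b in zip(long_wave, short_wave)]
filtered = polar_filter(field, max_wavenumber=4)
```

    In this toy case the truncation removes the short wave and leaves the long wave, at the price of a global sum over the latitude circle for each point.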
    The ESM resulting from this work will be applied to study current problems in climate, climate change, and climate/chemistry interactions, including the general circulation of the coupled atmosphere/ocean system, and the global distributions of greenhouse gases. We also propose to build into the ESM the capability for simulating global atmospheric chemistry, including tropospheric and stratospheric photochemistry and heterogeneous chemical processes. Such a model could be applied to a diverse set of problems, including the depletion of stratospheric ozone, particularly at high latitudes in both hemispheres associated with particulates, and the coupling of ozone to atmospheric dynamics and climate forcing.
    In the proposed work, the well-established atmospheric chemistry model of the Lawrence Livermore National Laboratory (the LACM) will be employed as a test-bed for numerical algorithms, and as a validating code for three-dimensional simulations. The LACM has a complete photochemistry and transport scheme in two dimensions, and its use complements the applications of the CATM. For example, initial simulations of tropospheric chemical cycles can be carried out using the LACM with appropriately averaged dynamical fields.

II. Resources and Background

    The proposed research is built on a foundation of existing sophisticated numerical model components and extensive research and applications in the area of large-scale computing, including parallel processing elements. In this section, we discuss the computational tools and research projects of the investigators that will support and enhance the proposed work. Our resources include well-established models of the atmosphere, oceans and chemical/aerosol tracers, experience with parallel architectures and algorithms, model applications on massively parallel machines, and broad expertise in the necessary scientific disciplines. The principal model components and other computational assets consolidated under this proposal are described below.
    The essential ingredients for this project comprise the long-term research tools developed by the participants under funding from several sources. These assets and activities include:
      a) Versions of a Coupled GCM suitable for vector computer architectures are being used to study the dynamics and predictability of the coupled atmosphere-ocean system under funding from DOE, NASA, NOAA, NSF, and ONR.
      b) An effort to revise and parallelize the UCLA AGCM code for massively parallel computer environments is being carried out with support from DOE (CHAMMP).
      c) A study on distribution of the Coupled GCM in both homogeneous and heterogeneous computer environments is being performed under funding from NSF and DARPA through the Corporation for National Research Initiatives Gigabit Network Project/CASA Testbed.
      d) The physics and chemistry algorithms in an advanced CATM for chemical tracer transport and transformation are under continuing development under funding from DOE, EPA, NASA, and NSF.
      e) The model codes to be used in this project are running, or will soon be running, on CRAY Y-MP and Intel Touchstone Delta computers.
      f) Members of the Research Team are participating in the University of California/Digital Equipment Corporation Sequoia 2000 Project, which addresses major issues in storage, management and visualization for Global Change research.
    The key project elements are discussed in greater detail below.

A. The UCLA AGCM

    The atmospheric component of the ESM is the UCLA AGCM. This model has been developed under the direction of A. Arakawa (Arakawa and Lamb, 1977). The current version of the AGCM has been used since the early 1980s at UCLA and Colorado State University (Randall et al., 1985). The distinctive feature of this latter version is the treatment of the planetary boundary layer (PBL). This is considered as well-mixed and represented by the model's bottom layer, whose variable depth is predicted (Suarez et al., 1983). A more detailed description of the UCLA AGCM is given in Appendix A.
    A major effort is being carried out at UCLA – in close collaboration with computational physicists and computer scientists at the Lawrence Livermore National Laboratory (LLNL) – to develop, document, and optimize a new version of the UCLA AGCM designed for execution in massively parallel computing environments, and to reconfigure it for efficient coupling with other major components comprising an ESM. The AGCM code under development incorporates fully three-dimensional data structures. Additionally, the code is highly modular, with identifiable portions of the finite difference algorithm broken up into their own subroutines. Recent LLNL work in parallel geophysical fluid dynamics modeling suggested the feasibility of multi-dimensional horizontal domain decomposition of such a code. To run the AGCM in parallel using the two-dimensional domain decomposition paradigm, a message-passing shell code was developed at LLNL. For the purposes of parallelization, the domain is partitioned in latitude-longitude using a rectangular decomposition, with each sub-domain corresponding to a process. Variables associated with a given sub-domain are local to that process, and the various processes communicate information by sending messages. Variables are allocated dynamically, and provision is made to dynamically vary the decomposition to achieve load balance among the processors. The AGCM presently runs under this message-passing driver on the 126-processor BBN TC2000 system at LLNL. Because only distributed-memory constructs are explicitly used in this implementation, and because message-passing on the BBN is based on a portable communications library, porting to other message-passing parallel processors is expected to be relatively straightforward.
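    The rectangular latitude-longitude decomposition described above can be sketched as follows. This is a minimal illustration, not the LLNL shell code: the function names, the rank numbering, and the grid and process counts are all assumptions chosen for the example.

```python
def split_extent(n_points, n_parts):
    """Split n_points grid indices into n_parts contiguous (start, stop) ranges."""
    base, extra = divmod(n_points, n_parts)
    ranges = []
    start = 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose(n_lat, n_lon, p_lat, p_lon):
    """Map each process rank to its rectangular latitude-longitude sub-domain."""
    lat_ranges = split_extent(n_lat, p_lat)
    lon_ranges = split_extent(n_lon, p_lon)
    subdomains = {}
    for i, lat_range in enumerate(lat_ranges):
        for j, lon_range in enumerate(lon_ranges):
            rank = i * p_lon + j
            subdomains[rank] = {"lat": lat_range, "lon": lon_range}
    return subdomains


def neighbors(rank, p_lat, p_lon):
    """Ranks with which a sub-domain exchanges boundary data.

    Longitude is periodic, so east/west wrap around; there is no
    neighbor beyond the poles.
    """
    i, j = divmod(rank, p_lon)
    result = {
        "west": i * p_lon + (j - 1) % p_lon,
        "east": i * p_lon + (j + 1) % p_lon,
    }
    if i > 0:
        result["south"] = (i - 1) * p_lon + j
    if i < p_lat - 1:
        result["north"] = (i + 1) * p_lon + j
    return result


# Hypothetical 46 x 72 point grid distributed over 4 x 6 processes.
domains = decompose(n_lat=46, n_lon=72, p_lat=4, p_lon=6)
```

    Varying `p_lat`, `p_lon`, or the split sizes between runs is one simple way to picture how a decomposition could be adjusted for load balance; a real driver would also exchange boundary values between the neighbors listed here.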

B. The GFDL Modular Ocean Model (MOM)

    The oceanic component of the ESM is the GFDL Modular Ocean Model (MOM), developed at the Geophysical Fluid Dynamics Laboratory (GFDL), Princeton University, by R. Pacanowski, K. Dixon, and A. Rosati. It is the successor to the code written by M. D. Cox (Cox, 1984) based on the work by K. Bryan (Bryan, 1969). An outline of the Bryan-Cox model is given in Appendix B.
    Except for the use of namelists, MOM is written in standard Fortran 77. C-preprocessor directives are used to enable/disable different options for parameterizations of physical processes (such as turbulent mixing in the horizontal and the vertical), boundary conditions, numerical schemes used, and optimizations. The MOM code, therefore, is highly portable and easy to configure. Its modular design and the use of C-preprocessor directives make it easy to accommodate alternative physical parameterizations and improvements as new modules or options.
A parallel version of the Bryan-Cox code has been developed at Los Alamos National Laboratory (LANL) for the massively parallel CM-2 Connection Machine. This version incorporates the efforts by Chervin and Semtner (1988) to utilize multiple processors on CRAY computers. Further improvements made at LANL include solving for surface pressure instead of the barotropic streamfunction, for efficiency on parallel machine architectures, and a different data structure that is more suitable for a parallel environment.

C. The UCLA Coupled GCM

    The UCLA AGCM and the GFDL OGCM are the components of the UCLA Coupled GCM. The AGCM provides the wind stress, heat and fresh water fluxes to the OGCM, and the OGCM returns sea surface temperature (SST) to the AGCM (Mechoso et al., 1991a). At the initialization stage both models pass their grid systems to coupling routines. Throughout the integration, the models exchange updated boundary conditions through the coupling routines, which perform the interpolations required by the difference in model grids. Currently, the OGCM has a Global version, and a Tropical Pacific version with enhanced resolution in the equatorial region (Philander and Pacanowski, 1980). The Tropical Pacific version of the Coupled GCM produces a realistic simulation of the seasonal cycle (Mechoso et al., 1991a, 1991b). Figure 1 shows the simulated time series of equatorial SST. There is no evidence of significant climate drift, which is a major concern in modeling of the coupled atmosphere-ocean system without flux corrections (Neelin et al., 1991).
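    The exchange through the coupling routines can be pictured with a minimal sketch. It assumes one-dimensional latitude grids, linear interpolation, and placeholder "models" with invented relaxation constants; none of these are taken from the UCLA Coupled GCM, whose coupling routines are not specified here.

```python
from bisect import bisect_right


def interpolate(src_x, src_values, dst_x):
    """Linearly interpolate values from one 1-D grid to another.

    Points outside the source grid take the nearest end value.
    """
    result = []
    for x in dst_x:
        if x <= src_x[0]:
            result.append(src_values[0])
        elif x >= src_x[-1]:
            result.append(src_values[-1])
        else:
            k = bisect_right(src_x, x) - 1
            weight = (x - src_x[k]) / (src_x[k + 1] - src_x[k])
            result.append((1.0 - weight) * src_values[k] + weight * src_values[k + 1])
    return result


def toy_agcm_step(sst_on_atm_grid):
    """Placeholder atmosphere: the heat flux relaxes SST toward 300 K."""
    return [0.1 * (300.0 - sst) for sst in sst_on_atm_grid]


def toy_ogcm_step(sst, heat_flux):
    """Placeholder ocean: SST responds directly to the heat flux."""
    return [t + q for t, q in zip(sst, heat_flux)]


atm_lat = [-10.0, -5.0, 0.0, 5.0, 10.0]        # coarse atmosphere grid (degrees)
ocn_lat = [-10.0 + 2.5 * k for k in range(9)]  # finer ocean grid (degrees)
sst = [295.0 for _ in ocn_lat]                 # initial SST on the ocean grid (K)

for step in range(50):
    sst_for_atm = interpolate(ocn_lat, sst, atm_lat)           # OGCM -> AGCM
    flux_on_atm = toy_agcm_step(sst_for_atm)
    flux_for_ocn = interpolate(atm_lat, flux_on_atm, ocn_lat)  # AGCM -> OGCM
    sst = toy_ogcm_step(sst, flux_for_ocn)
```

    The point of the sketch is the data flow: each model sees only fields on its own grid, and all grid differences are absorbed by the interpolation step in between.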

D. The UCLA Advanced CATM

An advanced version of a chemical/aerosol tracer model (CATM), originally developed at the NASA Ames Research Center in collaboration with UCLA scientists, is currently being tested and validated at UCLA. The model consists of several components that can be linked together in various combinations to study the transport and transformations of chemically active gas and aerosol tracers in the atmosphere or oceans. The key components of the CATM are described below.

1. Atmospheric Photochemistry Model (APM)

    An accurate and efficient photochemical model has been developed at UCLA for applications in multidimensional models such as the CATM (Elliott et al., 1991a, 1992). The solution scheme utilizes a well-developed "family" technique (Turco and Whitten, 1974, 1977, 1978) with a new implicit projection scheme for species concentrations. The technique proves to be quite accurate with a one hour time step when compared to highly-accurate fine-time-step solutions. The APM has been tested under the wide range of conditions that would be encountered over a three-dimensional global grid (e.g., Cicerone et al., 1991).
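    The idea behind a "family" solution with an implicit step can be sketched on a toy system. The example assumes two hypothetical species, A and B, that interconvert rapidly while B is lost slowly; the rate constants are invented, and plain backward Euler stands in for the implicit projection scheme of the APM, which is not described here.

```python
import math


def family_step(family, k_ab, k_ba, k_loss, dt):
    """Advance the family F = A + B by one backward Euler step.

    A and B interconvert quickly (rates k_ab and k_ba, in 1/s), so their
    ratio is taken from steady state; only the slow loss of B (k_loss)
    changes the family total.
    """
    fraction_b = k_ab / (k_ab + k_ba)  # steady-state share of B in the family
    new_family = family / (1.0 + dt * k_loss * fraction_b)
    b = fraction_b * new_family
    a = new_family - b
    return new_family, a, b


k_ab, k_ba, k_loss = 1.0e-2, 1.0e-3, 1.0e-6  # hypothetical rates (1/s)
dt = 3600.0                                  # one hour time step (s)
family = 1.0                                 # normalized initial family abundance
for hour in range(24):
    family, a, b = family_step(family, k_ab, k_ba, k_loss, dt)

# Exact solution of the family equation over the same 24 hours.
exact = math.exp(-k_loss * k_ab / (k_ab + k_ba) * 24 * dt)
```

    Because the fast interconversion never enters the time stepping, the one hour step stays stable even though it is far longer than the interconversion time scale of roughly 100 s in this toy case.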
    The APM is specifically designed for multi-dimensional applications, and incorporates a new efficient algorithm to calculate photodissociation rates (X. Zhao and Turco, 1991, 1992), using simple empirical functions of the solar-path-integrated column concentrations of O2 and O3 and a wavelength-dependent albedo factor (e.g., Luther and Gelinas, 1976). For tropospheric photochemical calculations, the details of surface reflection, Rayleigh scattering, and aerosol and cloud scattering and absorption are explicitly treated using an efficient two-stream code (Toon et al., 1989b). The radiative code is already incorporated into the Ames Tracer Model (ATM) described later, which provides the framework for integrating transport, chemistry and microphysics algorithms in the CATM. The APM will serve as one of the photochemical modules for the CATM, and thus can utilize the powerful radiative treatment in the Ames tracer model.

2. Polar Stratospheric Cloud Model (PSCM)

    The UCLA model for polar stratospheric clouds (PSCs) is based on a series of earlier models designed to study aerosols in planetary atmospheres (e.g., Turco et al., 1979a,b; Toon et al., 1979a,b). The PSC physics treated in the model is described in a number of papers (Hamill et al., 1988; Toon et al., 1989a; Drdla and Turco, 1991). The resulting PSCM treats a multicomponent aerosol consisting of sulfate aerosols, nitric acid trihydrate crystals (type-I PSCs; Toon et al., 1986), and ice particles (type-II PSCs). The type-I PSCs are nucleated on sulfate aerosols, and type-II PSCs on type-I particles (Hamill et al., 1990). The subsequent microphysics is dominated by condensation/evaporation processes and particle sedimentation (Turco et al., 1989; Toon et al., 1990). The PSCM predicts the detailed behavior and properties of the clouds, including subtle aspects of simultaneous denitrification/dehydration processes, which in turn influence the formation of the ozone hole. Model predictions have been validated against observational data (Drdla and Turco, 1991; Drdla et al., 1992a).
Heterogeneous chemistry is included in the PSCM in the form of measured sticking coefficients (Drdla et al., 1991, 1992a). The PSCM provides a detailed representation of the particle surface areas available for chemical reaction, and hence the heterogeneous chemical reaction rates. The PSCM heterogeneous chemistry model has recently been coupled to the APM. Trajectory calculations have been carried out with coupled microphysical, heterogeneous chemical and photochemical processes. The trajectories were determined by back-tracing from an aircraft sampling track during the Airborne Arctic Stratosphere Expedition II (AASE-II) mission in the winter of 1991/92. The simulated chemical transformations corresponded closely to those measured by the aircraft (Drdla et al., 1991, 1992b).

3. Ames Tracer Model (ATM)

    An advanced tracer advection model has been developed at NASA Ames through an ongoing collaboration with UCLA and San Jose State University (Hamill et al., 1977; Turco et al., 1979a,b; Toon et al., 1979a,b). The Ames tracer model (ATM) simulates the distributions of atmospheric gas and aerosol trace constituents under the influence of fluid motions in one, two or three dimensions (Turco et al., 1989; Toon et al., 1988). The principal features of the ATM are: transport algorithms that are non-diffusive and mass conservative; a complete treatment of aerosol microphysics; an automated package for photochemical calculations; and an accurate and fast radiative transfer scheme for photodissociation and atmospheric heating rate calculations. Details of model physics and numerical techniques are discussed by Toon et al. (1988, 1989a,b) and Turco et al. (1979a, 1989). The ATM acts as the framework for assembling the CATM from the chemistry and microphysics modules discussed earlier.
The ATM tracer transport scheme has been successfully coupled to atmospheric general circulation models and mesoscale models (Malone et al., 1986; Toon et al., 1988; Westphal et al., 1988; Kao et al., 1990; Westphal and Toon, 1991; Young et al., 1992). The algorithms for microphysical and chemical processes have been extensively tested in one-dimensional models to verify their accuracy and efficiency. The tracer model is unique in that it can be used to couple three-dimensional dynamics, multi-species photochemistry, and multi-component particulate microphysics in a general way. The tracer code is highly vectorized and optimized for coupled chemical/microphysical calculations. The model is also modular and can accommodate a variety of photochemical and microphysical packages that are designed to study polar stratospheric clouds, air pollution, the marine sulfur cycle, volcanic eruption clouds, noctilucent clouds, and cirrus and stratus clouds.

4. UCLA Chemical/Aerosol Tracer Model (CATM)

In the version of the CATM that has been assembled at UCLA (based on the ATM), the following processes are treated: gas-phase photochemistry, homogeneous and heterogeneous binary vapor nucleation, multi-component aerosol coagulation, aerosol growth and evaporation by vapor transfer, thermodynamic chemical equilibrium of multiple components in aqueous solutions, including vapors over solid condensates, chemical transformations of aqueous components in solution, aerosol and gas dry and wet deposition, atmospheric dynamics including advection by winds and diffusion by turbulence, boundary layer dynamics, convection, humidity and energy balance, horizontal and vertical transport of all chemical tracers and aerosols, and solar and infrared radiation transfer, including spectral intensities, heating rates, photorates, and visibility.
Gas-phase inorganic and organic chemistry is treated using a new formulation of a matrix inversion-based family technique (Jacobson et al., 1991, 1992) that derives from earlier work on the APM. The chemistry module is generalized to accept any photochemical mechanism. The aerosol algorithms treat any number of pure aerosols and two-component aerosols, and one generalized mixed aerosol with any number of components. The size distribution of each aerosol type can be divided into an arbitrary number of discrete size-bins, covering any specific size range. Homogeneous homomolecular and heteromolecular nucleation processes are explicitly treated using the standard classical theory (Hamill et al., 1982; J.-X. Zhao and Turco, 1992). Heterogeneous homomolecular and heteromolecular nucleation are also included (Hamill et al., 1982; J.-X. Zhao and Turco, 1992). Condensational growth and evaporation rates are calculated using a numerical scheme that suppresses artificial "numerical diffusion" across size bins (analogous to numerical diffusion in advection calculations) (Toon et al., 1988; Turco et al., 1979a,b). Coagulation is solved using the unique semi-implicit numerical technique of Turco et al. (1979a,b) (see also Toon et al., 1988). Self-coagulation rates of similar aerosol types and coagulation rates between different aerosol types are calculated. Sedimentation velocities are calculated every time step at each model grid point from the local environmental conditions and aerosol properties (e.g., mean density). Aerosol dry deposition velocities are calculated using an algorithm derived from the work of Giorgi et al. (1986). All of the numerical algorithms conserve aerosol number and mass exactly, and are always numerically stable.
In addition, the equilibrium thermodynamics of multicomponent aerosol solutions is determined in the manner of Pilinis and Seinfeld (1987) using an algorithm due to Villars (1959) and gas/liquid/ion/solid equilibrium relations determined by the ZSR method (Robinson and Stokes, 1965) or the MK method (Kusik and Meissner, 1978), including simultaneous specification of the multicomponent activity coefficients (Bromley, 1973) through binary activity coefficients (e.g., Pitzer and Mayorga, 1973). Aqueous chemical reactions are determined through a set of coupled rate equations for the reactant and product species in solution in each aerosol size-bin.
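The discrete size-bin representation and the per-bin sedimentation velocity mentioned above can be illustrated with a short sketch. It assumes geometrically spaced bins with a fixed volume ratio and the textbook Stokes fall speed without slip correction; the bin spacing, particle density, and air properties are hypothetical values, and the formulas actually used in the CATM are not given here.

```python
def make_size_bins(r_min, n_bins, volume_ratio=2.0):
    """Return bin-center radii (m); each bin holds volume_ratio times
    the particle volume of the previous one."""
    radius_ratio = volume_ratio ** (1.0 / 3.0)
    return [r_min * radius_ratio ** k for k in range(n_bins)]


def stokes_velocity(radius, particle_density, air_viscosity=1.8e-5,
                    air_density=1.2, gravity=9.81):
    """Stokes terminal fall speed (m/s) of a small sphere.

    Neglects the slip correction that matters for sub-micron particles
    at low pressure; a production model would include it.
    """
    return (2.0 * radius ** 2 * gravity * (particle_density - air_density)
            / (9.0 * air_viscosity))


# Hypothetical grid: 30 bins starting at 0.01 micron, particle density 1500 kg/m^3.
radii = make_size_bins(r_min=1.0e-8, n_bins=30)
fall_speeds = [stokes_velocity(r, particle_density=1500.0) for r in radii]
```

In a three-dimensional model the air viscosity and density arguments would be taken from the local grid-point conditions at each time step rather than from the fixed defaults used here.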

E. The LLNL Atmospheric Chemistry Model (LACM)

The atmospheric chemistry models developed at LLNL have been applied in a wide range of studies of tropospheric and stratospheric processes, and the impacts of human activities. Many of these studies have been related to global ozone perturbations and the effects of chemistry on climate. The ozone work has included assessments of chlorofluorocarbon release (e.g., Hammitt et al., 1987; Kinnison et al., 1988; Wuebbles, 1990), solar flux variations (e.g., Wuebbles et al., 1991), and aircraft emissions (e.g., Johnston et al., 1989; Wuebbles and Kinnison, 1990). Other studies have focused on the recent trends in ozone concentrations (e.g., Reinsel et al., 1987, 1988; DeLuisi et al., 1989; Wuebbles et al., 1991). Livermore scientists introduced the concept of Ozone Depletion Potentials (ODPs) (Wuebbles, 1988; Connell and Wuebbles, 1989), which is now used throughout the world in ozone assessments. The Livermore group has also carried out extensive simulation of the impacts of chemistry on climate (Wang et al., 1986; Wuebbles and Edmonds, 1988, 1991; Penner et al., 1990; Wuebbles et al., 1989; Penner, 1990; Lacis et al., 1990). The chemistry modeling has been extended to three dimensions, including studies of the tropospheric nitrogen cycle (Atherton and Penner, 1988, 1990; Penner et al., 1991), and the role of aerosols in climate change (Walton et al., 1988; Ghan et al., 1988; Kreidenweis et al., 1990; Ghan et al., 1990; Erickson et al., 1990; Penner and Mulholland, 1990; Ghan and Penner, 1990; Penner, 1990).
The LLNL zonally averaged two-dimensional chemical-radiative-transport model currently calculates the atmospheric distributions of 54 chemically-active species in the troposphere and stratosphere. The model domain extends from pole to pole, and from the surface to 60 km altitude. The vertical resolution is 1.5 km in the troposphere and 3 km in the stratosphere. The photochemical package includes all of the relevant processes for the oxygen, nitrogen, hydrogen, chlorine and bromine systems, as well as methane and products. For photodissociation calculations, a two-stream radiative transfer code with 126 spectral intervals is used to compute local irradiances. The LLNL atmospheric chemistry model has been ported to the MIMD NCUBE and iPSC/860 machines. Hence, parallelization of the LACM algorithms has been carried out. Moreover, Livermore personnel are working to port a three-dimensional chemical tracer model to massively parallel computers under the DOE CHAMMP project.
During the first year of the project, the two-dimensional LACM will be used as a test bed for various numerical schemes and algorithms that are being designed at LLNL and UCLA to handle stiff photochemical rate equations accurately and efficiently. Additional testing of radiative transfer and aerosol microphysics modules will be conducted. In subsequent years, the applications will expand to include the generation of initialization data for three-dimensional simulations, initial sensitivity studies of the ozone depletion and greenhouse gas distribution problems, and validation runs for the global chemistry model.

F. The UCLA Coupled GCM in a Distributed Computer Environment

Several characteristics of the UCLA Coupled GCM make it both an ideal and difficult application for a distributed computer environment. The model is computation intensive and generates large amounts of data that have to be stored for analyses. There are both vector and scalar codes in different parts of the model. Ideally, one would like to have massively parallel processors with vector capabilities, and large volume, high-speed data archiving systems and visualization hardware working seamlessly together.
A pilot program that explores the distribution of the Coupled GCM across high-speed (gigabit per second) networks is underway at UCLA. This study is an integral part of the Corporation for National Research Initiatives (CNRI) Gigabit Testbed Initiative, which is funded by NSF and DARPA (see Appendix C). In running the GCM in a distributed fashion, we have the following objectives:
i) Explore the possibility of superlinear speedup of concurrent computation through the use of heterogeneous computer architecture.
ii) Enhance graphics capabilities by allowing for remote real-time animation of model results.
iii) Increase available resources by allowing for the utilization of geographically separated computers, and guarantee continuous availability of resources even when a site is temporarily unavailable.
iv) Facilitate closer collaboration among researchers specializing in different parts of the model, by providing a system in which modules under development at different institutions can be easily exchanged for the purpose of performance evaluation.
The principal issues being addressed in this research include hiding network latency with computation, and the mechanisms for exchanging data among processes.
Depending on the level of parallelism one wishes to achieve, the coupled GCM can be decomposed at different levels. The first (coarsest) level of decomposition is based on the difference in tasks each component of the model carries out (task decomposition). The AGCM and OGCM are two well-separated entities interconnected by coupling routines. Within the AGCM itself, there are also two relatively well-defined components. One is the AGCM/Physics, which computes the effect of subgrid-scale processes on grid-scale motions. The output of the AGCM/Physics is supplied to the other component – AGCM/Dynamics – as forcing terms in the primitive equations.
Based on these considerations, we have decomposed the Coupled GCM into three tasks: AGCM/Physics, AGCM/Dynamics, and OGCM. This decomposition requires a Master Control Program (MCP) to provide the user interface and supervise communications between different processes. The large datasets produced by the Coupled GCM require a Dataset Manager to collect model output from different locations and either send the data to Mass Storage Subsystems (MSS) or process them for real-time visualization. The resulting distributed Coupled GCM application is shown in Fig. 2. So far, interprocess communication is done by message passing, using Berkeley sockets or similar utilities provided by EXPRESS or PVM. With the Coupled GCM decomposed in this manner, the AGCM/Dynamics and the OGCM can be run concurrently, because all the boundary conditions required by the OGCM are available after AGCM/Physics is completed. In this way, a substantial reduction in wall-clock time can be achieved when running the Coupled GCM.
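The wall-clock saving from running AGCM/Dynamics and the OGCM concurrently can be illustrated with a simple timing model. The following Python sketch is only an illustration: the function names and the per-step costs are hypothetical, communication costs are ignored, and the numbers are not measurements of the Coupled GCM.

```python
def sequential_step_time(t_physics, t_dynamics, t_ocean):
    """Wall-clock time of one coupled step when the three tasks run in sequence."""
    return t_physics + t_dynamics + t_ocean


def concurrent_step_time(t_physics, t_dynamics, t_ocean):
    """Wall-clock time when AGCM/Dynamics and OGCM run concurrently
    after AGCM/Physics has finished (communication cost ignored)."""
    return t_physics + max(t_dynamics, t_ocean)


# Hypothetical task costs in arbitrary time units (assumed, not measured).
t_phys, t_dyn, t_ocn = 4.0, 3.0, 2.0
seq = sequential_step_time(t_phys, t_dyn, t_ocn)
con = concurrent_step_time(t_phys, t_dyn, t_ocn)
print(f"sequential: {seq}  concurrent: {con}  speedup: {seq / con:.2f}")
```

In this model the slower of the two concurrent tasks sets the step time, so the benefit is largest when AGCM/Dynamics and the OGCM have comparable costs.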
The decomposition in Fig. 2 allows us to explore the possibility of superlinear speedup by running different modules on computers with the architectures that are most efficient for each of them. The nature of the dynamics and physics codes in the AGCM is very different. The dynamics code is highly vectorizable, while the physics code can be easily distributed because most calculations are made in atmospheric columns. In particular, we expect that the wall-clock time required to run the AGCM/Physics will be greatly reduced by using a massively parallel architecture.
A higher level of decomposition, which enables the overlapping of communication with computation in a parallel environment consisting of either a single or multiple computers, is discussed in the Proposed Research section of this proposal.
G. Sequoia 2000
Achieving the goals of this project will depend not only on improved codes, but also on improved data systems for manipulation of large-scale model output. The synergistic interactions between observations and model-based simulations require massive amounts of diverse information to be stored, organized, accessed, distributed, visualized, and analyzed. Refinements in computing - specifically involving storage, networking, file systems, extensible data base management, and visualization - are needed. The University of California/Digital Equipment Corporation Sequoia 2000 Project seeks to develop large capacity object servers and visualization techniques for Global Change research (see Appendix D). The University of California Berkeley (UCB) and UCLA are principal participants in Sequoia 2000.
There are considerable shortcomings in current information systems. The Sequoia 2000 research project is executing a coordinated attack on these issues:
i) Current storage management system technology is inadequate to store and access the massive amounts of data required for Earth System Science research.
ii) Current I/O and networking technologies do not support the data transfer rates required for browsing and visualization of satellite data or output from models of Earth processes.
iii) Current visualization software is too primitive to allow for useful interactive viewing of data on scientific workstations.
iv) Current database systems are inadequate to store the diverse types of information required, such as point-data for specific geographic points, vector, raster, and text data.
v) It is extremely difficult to share the objects noted above with other interested researchers.
Sequoia 2000 plans to extend the next-generation data base management system (DBMS) POSTGRES (Stonebraker et al., 1990) to manage Global Change data effectively. The project is also addressing visualization issues, and plans to produce a seamless interface between POSTGRES and a variety of visualization packages.
H. Recipe Management for Scientific Programming
Scientific programming has traditionally been performed by coding directly in Fortran 77. Recently, several scientific visualization systems have been developed to try to move scientific programming to a higher, more productive level. Examples of such programming environments are AVS, built by Stardent and licensed to many vendors; Explorer, marketed by Silicon Graphics; and Khoros, from the University of New Mexico. It is expected that GCMs and other remote sensing applications will be moved to such programming systems to take advantage of their module sharability and reusability and their built-in output visualization software. We call such scientific visualization systems recipe managers, because they describe a recipe by which a collection of inputs read from files can be cooked to produce a desired visualization output.
There are, however, several serious disadvantages of current recipe managers. First, recipe managers are file-oriented. The only way to get data into a recipe is to read it from a file. There is no integration between the recipe manager and a modern DBMS. Second, current recipe managers are main-memory oriented. The output of each recipe step is passed to the input of its successors through shared main memory. In extensions to distributed recipe managers, information flows between recipe steps through the interprocess communication (IPC) system supported by the operating system of the vendor involved. For recipes that require a very large amount of data to be passed between steps, this use of shared memory or IPC will prove inefficient. Third, there is only a limited notion of time in current recipe managers. Because they support rendering only a single data set, a user cannot "flicker" between two data sets. Put differently, there is only a limited notion of animation in any of the packages. Fourth, there is no notion of version control. In current systems, when a recipe is modified, the older version is discarded. There is no notion of a time sequence of versions or the ability to return to the recipe of a previous time. Named alternatives, popular in source code control systems such as SCCS and RCS (Tichy, 1982), are similarly missing.
III. Proposed Research
The proposed research can be logically separated into two specific categories. The first category refers to computational challenges of the proposed research in the context of the HPCC. These challenges in turn may be subdivided into tasks focusing on revision and improvement of the parameterizations of sub-grid scale processes, both from the physical and computational points of view, the parallelization of calculations within different components of the coupled Earth Systems Model, and communications between the components in a parallel computing environment.
The second category of research refers to the scientific challenges to be addressed with the coupled Earth Systems Model. The three exemplary Earth science issues to be studied under this proposal are: the seasonal cycle and interannual variability of the coupled atmosphere/ocean system; the distribution of greenhouse gases in the atmosphere; and depletion of the stratospheric ozone layer.
A. Computation Challenges
1. Revision and improvement of the parameterization of sub-grid scale processes
We propose to revise and further develop the AGCM, OGCM, and CTM codes both from the physical and computational points of view. Our development efforts will include a version of the Arakawa-Schubert cumulus parameterization that makes use of a prognostic cumulus kinetic energy. This version permits more realistic coupling between convective and stratiform clouds, drastically simplifies the computational algorithm, and significantly improves the computational speed of the model. Moreover, the parameterization is highly amenable to parallelization, and is considerably more portable than the original version. Our revisions will also include an improved planetary boundary layer (PBL) parameterization, and an improved parameterization of land-surface processes, including a simple but explicit model of photosynthetic carbon exchange. This work is being performed under EOS sponsorship (P. Sellers P.I., D. Randall, Co-I.).
In addition, we plan to couple the ESM to spatially distributed watershed models over rugged terrain to simulate the spatial distribution of snowmelt and snow chemistry processes within the snowpack. These models are driven by large-scale analyses of the Earth radiation budget calculated from satellite data (Dozier, 1989; Dozier and Frew, 1990; Shi et al., 1991), and by precipitation, surface wind stress and sensible heat flux produced by the ESM. The satellite data sets used by the watershed models are large: for example, a single Landsat frame (185 x 185 km) is 266 megabytes, and an AVIRIS (Airborne Visible and Infrared Imaging Spectrometer) image is 140 megabytes. They also have high dimensionality: Landsat images have 7 spectral bands; AVIRIS images have 224; AIRSAR (synthetic aperture radars on aircraft) images have 3 frequencies with 4 polarizations each. Work with these datasets will motivate development of fast inverse methods for estimating geophysical properties on parallel machines, distributed management of large data bases, integration of inverse methods into a next-generation data base system (POSTGRES), and coupling of visualization tools (IDL, AVS, and Khoros) to a data base management system.
We also propose to implement the coupling of the chemistry and physics of trace species to the UCLA Coupled GCM. This coupling will provide a new dimension to global modeling capabilities, which is needed to study in sufficient detail the coupled atmosphere, ocean and chemical tracer interactions controlling the Earth's climate system and the chemical processes of the ozone layer. The CATM can be run in two configurations with the dynamical models: 1) off line, driven by dynamical history tapes; 2) coupled and in parallel with the dynamics model. For the latter configuration, the tracer model grid will be adapted to the AGCM and OGCM grids and boundary conditions.
The atmospheric tracer fields will be initialized using data that are available for specific tracers, or model predictions that have been produced for lower dimensionality (e.g., 2-D model simulations of atmospheric composition). The chemical tracer species can be divided into a number of categories that determine their mode of initialization: long-lived source gases with mainly vertical variations; gases that vary significantly in space and time and that have both chemical and dynamical influences, such as ozone; and photochemical-equilibrium species that can be derived by simple photochemical analysis from the first two types. Using rough initial conditions, model simulations will be bootstrapped to test for stability and accuracy and to obtain a library of initial states for later simulations. The velocity and temperature fields, and any other parameters, required to drive the tracer model will be obtained from the GCMs.
The dynamical and tracer models will be fully coupled and run in parallel. In this configuration, the tracer model will provide detailed gas and particle distributions and radiation fields to the dynamics models, from which heating (and cooling) rates can be calculated. For example, the CATM will predict variation in radiatively-active gases such as ozone and methane. The CATM aerosols can likewise be coupled to the AGCM radiation algorithm, and the microphysical properties of the clouds predicted by the AGCM inferred from the aerosol properties. In turn, the cloud fields predicted by the AGCM can be used to determine the convective pumping and heterogeneous chemical processing of the tracers (Turco et al., 1989).
2. Parallelization of the calculations in the different components of the ESM
We propose to investigate the parallelization of the Coupled GCM. Our goal is to reduce the wall-clock time required to run the Coupled GCM to that required to run the AGCM/Physics only. Such a parallelization involves communications between model components. Overlapping of communication with computation, therefore, becomes an important issue.
A higher level of decomposition than that discussed in Section II.F of this proposal is based on physical domains (spatial or domain decomposition). In this method, the region of model simulation is divided into sectors (domains), and calculations for the sectors are carried out concurrently. This decomposition is straightforward for our codes since both the UCLA AGCM and GFDL OGCM are grid-point models.
The exchange of data between different components of the Coupled GCM can also be carried out in I/O subdomains smaller than the entire globe. Namely, instead of sending all the boundary data after one component (AGCM/Physics, AGCM/Dynamics, or OGCM) completes the calculation for the entire model domain, the data for each I/O subdomain are sent to the other components as soon as the calculation in that subdomain is completed. In this way the transmission of these data can be masked by the computation for the next I/O subdomain in the corresponding component or task.
The I/O decomposition can be used to run the three components of the Coupled GCM in parallel. A possible scheme is shown schematically in Fig. 3, which illustrates the case of four I/O subdomains. In such a scheme, the data produced by the AGCM/Physics for an I/O subdomain are transferred to the AGCM/Dynamics and OGCM. These components advance the models' prognostic variables one time step while AGCM/Physics is computing the next I/O subdomain for the previous time step. The AGCM/Physics advances one time step when AGCM/Dynamics and OGCM return updated data. Spatial decomposition is used for calculation inside each I/O subdomain. Further considerations on this decomposition are given in Appendix E.
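The benefit of sending boundary data one I/O subdomain at a time can be estimated with a two-stage pipeline model. The Python sketch below is a simplified illustration, assuming that every subdomain has the same (hypothetical) compute cost and transfer cost and that one transfer can proceed while the next subdomain is being computed; it is not a model of the scheme in Fig. 3 in detail.

```python
def time_without_overlap(n_sub, t_compute, t_send):
    """All subdomains are computed first, then all boundary data are sent."""
    return n_sub * t_compute + n_sub * t_send


def time_with_overlap(n_sub, t_compute, t_send):
    """The data of each subdomain are sent while the next one is computed.

    Two-stage pipeline: the first computation and the last transfer cannot
    be hidden; in between, the slower stage sets the pace.
    """
    return t_compute + (n_sub - 1) * max(t_compute, t_send) + t_send


# Four I/O subdomains with hypothetical costs in arbitrary time units.
plain = time_without_overlap(4, 3.0, 1.0)
piped = time_with_overlap(4, 3.0, 1.0)
print(f"no overlap: {plain}  with overlap: {piped}")
```

When the transfer time per subdomain is shorter than the compute time, all but the last transfer is hidden; when it is longer, the network becomes the limiting stage.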
We also propose to investigate the parallelization of the CATM. This parallelization will involve a reconfiguration of the model and re-ordering of the solution steps. The code will be considered in terms of its distinct photochemical and aerosol microphysics components. Initialization of the gaseous species and reaction processes in the photochemical component demands only a very small fraction of the total run time, and will not be affected. The solution of the individual species continuity equations requires a number of specific computational steps, which are time-split in the model: i) horizontal advection; ii) vertical transport; iii) radiative transfer/photodissociation coefficients; iv) photochemical kinetics. The horizontal advection and vertical transport, including any diffusion, will be parallelized in the same manner as for the solution of the continuity equations in the dynamical model. Indeed, the solutions of the continuity equations for air, and the tracers in air, require information only for the local grid points.
The photochemical kinetics calculation involves a number of steps that must be carried out at each grid point individually, and thus can be done in parallel at all grid points simultaneously. The steps within this calculation include: retrieval of meteorological data from the dynamics simulation, such as air temperature, density and humidity; calculation of chemical reaction rate coefficients (which are functions of the above parameters); calculation of the individual chemical reaction rates that are needed (i.e., a rate coefficient multiplied by the appropriate species concentrations); determination of the photodissociation coefficients from a calculation of the radiation field or using information on the distribution of absorbers along optical ray paths; assembly of all the photochemical processes into total production and loss rates for the individual species or "family" rate equations; application of an efficient and stable numerical solver to the chemical rate equations; determination of the final species concentrations by partitioning of the families. In the existing CATM, these steps have been extensively optimized for vector operations. Accordingly, the use of vector processors at the nodes within a parallel architecture is ideal for the CATM. In addition, porting of the CATM to a vector processor will require minimal algorithm redesign.
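To illustrate why a stable solver is needed once production and loss rates have been assembled, consider a single species obeying dn/dt = P - L n at each grid point. The Python sketch below compares a backward Euler (implicit) step with a forward Euler (explicit) step. This is a generic textbook scheme used here as an assumption; it is not the solver of the CATM or the LACM, and all names and numbers are hypothetical.

```python
def implicit_step(n, production, loss_freq, dt):
    """Backward Euler step for dn/dt = P - L*n; remains positive for any dt."""
    return (n + dt * production) / (1.0 + dt * loss_freq)


def explicit_step(n, production, loss_freq, dt):
    """Forward Euler step for the same equation; unstable when dt*L is large."""
    return n + dt * (production - loss_freq * n)


# Three independent grid points with very different loss frequencies
# (hypothetical values in arbitrary units). Each point is updated on its own,
# so the loop could be distributed over processors or vectorized.
concentration = [1.0, 0.5, 2.0]
production = [2.0, 2.0, 2.0]
loss_freq = [1000.0, 10.0, 0.1]
dt = 0.1

implicit = [implicit_step(n, p, l, dt)
            for n, p, l in zip(concentration, production, loss_freq)]
explicit = [explicit_step(n, p, l, dt)
            for n, p, l in zip(concentration, production, loss_freq)]
print("implicit:", implicit)
print("explicit:", explicit)
```

At the first grid point the product of time step and loss frequency is 100, and the explicit step produces a negative concentration, whereas the implicit step relaxes toward the equilibrium value P/L.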
For the simulation of aerosol microphysics, the problem –– again for the non-transport terms –– is a local calculation. Particle sedimentation is included in the vertical transport (diffusion and convection) algorithm. The aerosol microphysical processes of primary interest are (see the section on the advanced version of the CATM): nucleation, condensation, evaporation, coagulation, thermochemical equilibrium, and nonequilibrium chemical transformation. The aerosols are also incorporated into the radiative transfer calculations. To represent these aerosol processes accurately, the size distribution of the particles is divided into a set of discrete size bins (the ratio of the particle volumes in adjacent bins is fixed, with the ratio defined as , with ). Inclusion of different materials in the aerosols, and the possibility of several distinct types of particles, increases the number of individual aerosol tracers that must be treated in the model.
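A size grid with a fixed volume ratio between adjacent bins is a geometric progression in particle volume. The Python sketch below constructs such a grid; the function names, the smallest volume, the number of bins and the volume ratio of 2 are all arbitrary illustrative assumptions and are not values taken from the CATM.

```python
import math


def bin_volumes(v_min, volume_ratio, n_bins):
    """Particle volumes of the bins: each is volume_ratio times the previous."""
    return [v_min * volume_ratio ** i for i in range(n_bins)]


def volume_to_radius(volume):
    """Radius of a sphere with the given volume."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


# Assumed values, for illustration only.
volumes = bin_volumes(v_min=1.0, volume_ratio=2.0, n_bins=5)
radii = [volume_to_radius(v) for v in volumes]
print(volumes)
print(radii)
```

With a fixed volume ratio, the radius grows by the cube root of that ratio from one bin to the next, so a modest number of bins spans a wide range of particle sizes.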
The microphysical processes typically cause the aerosols to move across the size grid from smaller to larger sizes (with the exception of evaporation, which reverses this direction). Hence, these processes are analogous to advection across a spatial grid. In the case of the CATM, special techniques have been developed to suppress numerical diffusion for particle growth, as must be done with spatial advection. The mathematical structure of the aerosol physics allows a unique linearized semi-implicit solution to be formulated (Turco et al., 1979a,b). This solution, which combines the nucleation, growth and coagulation processes, is accurate and stable for all conditions of interest. Moreover, the structure of the numerical algorithms allows efficient solutions through a vectorized tridiagonal solver. Thus, again, the individual grid-point calculations can be optimized for aerosol physics by employing vectorizable processors.
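For reference, a tridiagonal system of the kind mentioned above can be solved by forward elimination and back substitution (the Thomas algorithm). The Python sketch below is a scalar illustration for a single system and assumes a diagonally dominant matrix, since no pivoting is done; it is not the vectorized solver used in the CATM, where the same recurrence would be applied to many grid points at once.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused);
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Small test system whose exact solution is [1, 2, 3].
solution = solve_tridiagonal(
    lower=[0.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, 0.0],
    rhs=[0.0, 0.0, 4.0],
)
print(solution)
```

The cost is linear in the number of unknowns, which is why the per-grid-point aerosol calculation remains inexpensive relative to a general matrix solve.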
3. Algorithm parallelization and load balancing
We propose to investigate issues related to algorithm parallelization and load balancing. These issues are important because realistic ESM simulations will only be possible if we can map the enormous amounts of parallelism available in the model components to the parallelism becoming available on massively parallel computers. The experience gathered by thoroughly understanding this important application can then be used to design general purpose computing tools for use in other applications.
Code development in this project will be performed in a hierarchy of computer environments. Basic development and code debugging will be carried out in a network of scientific workstations, which will represent Digital Equipment Corporation's direct contribution to this effort. Development and test runs of a massively parallel MIMD version of the code will be carried out on the CM-5 at UC Berkeley. This 64-node CM-5 will have a maximum performance of 8 Gflops and 2 Gbytes of main memory; the developed code will be executable on a 1024-node machine with 128 Gflops peak performance and 32 Gbytes of main memory.
In addition to having one of the first three CM-5 installations in the country, members of the Computer Science Division at UCB are engaged in close cooperation with TMC to develop programming and mathematical software tools for the CM-5. There are also several related parallel software development activities at UCB that will contribute to the research in this proposal, and we have had experience in parallelizing several large applications. We have also done a preliminary study of the UCLA AGCM Physics code. We will discuss this background work briefly, and then discuss implications for UCLA AGCM-Physics.
LAPACK, a project headed by Demmel, Kahan and others, recently released the most complete, portable and optimized linear algebra library available for shared memory vector and parallel machines (Anderson et al., 1992). We are currently extending this work to distributed memory machines such as the CM-5, under funding from NSF and DARPA. The first version of LAPACK only targeted dense and band matrices, whereas the new project is targeting sparse matrices as well. We expect the LAPACK experience to be quite useful since much computing time in the ESM is spent in solving tridiagonal linear systems, as well as block structured linear systems within the stiff ODE solver. The experience with parallelizing the ESM will likely influence the future LAPACK design and the next TMC math software library CMSSL as well.
Another project at Berkeley, led by Graham and Yelick with DARPA funding, aims to enable realistic, computationally intensive programs to be run on massively parallel computers. The focus, therefore, is on making it easy to express the parallelism inherent in such applications, and on compiling the applications to deliver high sustained performance without requiring excessive work on the part of the programmer. The methodology of the group is to parallelize real, large-scale scientific codes in close collaboration with researchers in other fields.
Applications are sped up through a combination of automatic techniques provided by compilers and tools; manual restructuring of the code; and changes in the numerics identified as sources of large-scale improvements and implemented by the collaborators. In each study, the collaborating scientists have benefited by the development of greatly sped-up versions of their code; the Berkeley computer scientists have developed many new ideas in parallel languages, compilation, and run-time techniques.
In the area of language design, these projects led to the development of a "coordination language" called Delirium. In this language, the synchronization and communication patterns of an application can be described (Lucco and Sharp, 1990). The computation itself is expressed in a conventional language such as FORTRAN, making it easy to convert existing sequential programs to run in parallel. In the area of run-time support for scientific computations, the work led to the design of scheduling algorithms, a memory-object layer, and a new parallelization technique.
The scheduling algorithms are applied when the load balance of the iteration space of a program is not known at compile-time: the iteration space is sampled and work is redistributed dynamically to ensure even load. This allows effective parallelization of a much broader class of applications than methods that rely on static or random scheduling (Lucco, 1992). The memory object layer was developed to address the problem of managing memory objects required by dynamic parallel applications. Tarmac is a mobile memory-object layer that allows memory objects to be moved and addressed in a location-independent manner. Because all communication is expressed in terms of these memory objects, the scheduler can redistribute them at will without changing any of the communication code. Tarmac has been implemented on the CM-5, for which we developed the fastest bulk-memory transfer protocol available on the CM-5 (Bacon and Lucco 1992, Lucco and Anderson, 1990). Finally, a run-time technique called "optimistic parallelization" was developed for those cases when future states of a computation can be guessed with high probability; in this case the future computation can be performed without waiting for the current computation to complete, thereby allowing parallelization even when data- and control-dependencies exist (Bacon and Strom, 1991).
Preliminary work on the UCLA AGCM-Physics code has shown that our run-time scheduling techniques will offer substantially better performance than other scheduling methods. This is because the Physics portion of the computation has widely varying computational load: a grid point with cumulus clouds in the tropics requires a great deal more computation than a grid point with a clear sky. As a result, an optimal implementation cannot simply assign the same number of grid points to each processor; the assignment must be varying and dynamic. Our Tarmac run-time system for the CM-5 supports exactly this type of time-varying decomposition; Tarmac is also being ported to run on networks of workstations such as the DECstations.
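The effect of uneven column costs on load balance can be illustrated with a small example. The Python sketch below compares a static assignment of equal-sized blocks of grid columns with a simple dynamic policy in which the next column always goes to the processor that becomes free first. This greedy work-queue policy is a generic stand-in chosen for illustration; it is not the sampling-based scheduler or the Tarmac system described above, and the costs are hypothetical.

```python
import heapq


def static_block_makespan(costs, n_procs):
    """Finish time when each processor gets a contiguous block of equal size."""
    block = -(-len(costs) // n_procs)  # ceiling division
    loads = [sum(costs[p * block:(p + 1) * block]) for p in range(n_procs)]
    return max(loads)


def dynamic_makespan(costs, n_procs):
    """Finish time when each column goes to the currently least-loaded processor."""
    loads = [0.0] * n_procs
    heapq.heapify(loads)
    for cost in costs:
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + cost)
    return max(loads)


# Hypothetical costs: four expensive "cumulus" columns, eight cheap clear-sky ones.
column_costs = [5, 5, 5, 5, 1, 1, 1, 1, 1, 1, 1, 1]
static_time = static_block_makespan(column_costs, n_procs=3)
dynamic_time = dynamic_makespan(column_costs, n_procs=3)
print(f"static: {static_time}  dynamic: {dynamic_time}")
```

In this example the static assignment places all the expensive columns on one processor, while the dynamic policy spreads them out and roughly halves the finish time.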
4. The ESM and Data Base Management System (DBMS)
We propose to explore the advantages of a close coupling between the ESM and a Data Base Management System (DBMS). The purpose of this coupling is twofold:
i) Storing model output in a DBMS will allow users to query output from previously run models looking for trends of interest. Complex queries to model output are anticipated, as users will browse through model output, and compare model output with observational datasets (e.g., satellite imagery).
ii) Storing model output in a DBMS as it is being generated will allow users to examine model output as the model is running. As such, if the model is not producing desired effects, the user can end the simulation run and save valuable computer resources.
To satisfy these two needs, we propose to explore three different research thrusts. First, each ESM output variable can be considered as a four-dimensional array indexed by longitude, latitude, elevation, and time. Since GCMs have vectors of model outputs, they generate a collection of 4-D arrays or a single 5-D array, where the last dimension is an index for the model variable. Providing sophisticated query capabilities amounts to storing this 5-D array in a way that ad-hoc queries can be run against it with good response time.
We plan to examine the advantages of organizing model output into tiles. Each tile would store a range of values in each of the five dimensions contiguously. For example, 10 longitude, 10 latitude, 3 elevation, and 5 time values could be put in a tile for all model variables. With multiple tiles, response to a mix of queries can be tuned by adjusting both the total number of tiles and the number of values from each dimension placed into a tile. We propose to investigate maintaining statistics on the mix of queries being run and then dynamically adjusting the tiling parameters to achieve best possible average performance. Furthermore, it is possible to implement two different tiling systems simultaneously if redundant secondary storage is allowable. With multiple tilings, further optimization of response time can be performed.
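The trade-off in choosing tiling parameters can be seen by counting how many tiles a query must read. The Python sketch below maps grid indices (longitude, latitude, elevation, time) to tile coordinates, with all model variables assumed to be stored together in each tile, and counts the tiles intersected by two kinds of query. The function names, the second tile shape and the query boxes are hypothetical.

```python
from itertools import product


def tile_of(index, tile_shape):
    """Tile coordinates of one grid index (lon, lat, elev, time)."""
    return tuple(i // t for i, t in zip(index, tile_shape))


def tiles_touched(lo, hi, tile_shape):
    """All tiles intersected by the inclusive index box lo..hi."""
    ranges = [range(l // t, h // t + 1) for l, h, t in zip(lo, hi, tile_shape)]
    return list(product(*ranges))


# Tile shape from the example in the text: 10 lon x 10 lat x 3 elev x 5 time.
balanced = (10, 10, 3, 5)
# A hypothetical alternative that favors long time series at a point.
time_major = (5, 5, 1, 60)

# Query A: a 20-step time series at one grid point.
series_lo, series_hi = (12, 7, 0, 0), (12, 7, 0, 19)
# Query B: a 40 x 20 horizontal map at one level and one time.
map_lo, map_hi = (0, 0, 0, 0), (39, 19, 0, 0)

for name, shape in (("balanced", balanced), ("time_major", time_major)):
    n_series = len(tiles_touched(series_lo, series_hi, shape))
    n_map = len(tiles_touched(map_lo, map_hi, shape))
    print(f"{name}: time series reads {n_series} tiles, map reads {n_map} tiles")
```

Neither tile shape is best for both queries, which is the motivation for collecting statistics on the query mix and, where storage allows, keeping more than one tiling.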
A second thrust stems from the realization that model output will typically reside on tertiary memory. Initial thoughts on tertiary memory optimization, expanded bookkeeping, and optimization of expensive functions are presented in Stonebraker (1991). We propose to continue with this work and to expand the resulting optimizer with knowledge of our tiling approach to array storage discussed above. We expect to develop and implement these techniques within the context of the POSTGRES next-generation DBMS (Stonebraker, 1990; Stonebraker et al., 1991; Mosher et al., 1991), under construction at the University of California, Berkeley.
A third thrust is to support the high rate of data insertion associated with collecting data while an ESM model is running. As a result, we propose to explore high throughput insertion schemes that could be added to a DBMS to accommodate the rate of output generation associated with model execution on next generation massively parallel systems. Specifically, we propose to explore lightweight protocols that could support data entry at very low CPU cost. Moreover, we propose to explore solutions to the synchronization and consistency problems that arise with parallel data entry into a DBMS.
5. Development of a new recipe management architecture
We propose to explore a different architecture for recipe management that couples it closely to data base systems. This alternate architecture is discussed below.
Many of the objects visualized by the scientific community are the values of regular arrays of cells. Such objects abound in GCMs as well as in remote sensing applications. Such large objects are best supported in DBMSs. If put in a DBMS, then standard data base services are automatically available such as the query language, automatic query optimization, alternate views of data, a sophisticated rules system, etc. Such capabilities are valuable in building recipe management systems, and as a result, we believe that a recipe management architecture should be DBMS-centric.
Considerable effort has been spent by the DBMS research community in constructing next generation DBMSs that are extendible, i.e. that support user-defined types, functions and access methods. Example data managers in this class are POSTGRES (Stonebraker et al., 1990), IRIS (Wilkinson et al., 1990), Starburst (Haas et al., 1990), and Orion (Kim et al., 1990). The second cornerstone of our proposal is that such type extension facilities can be used to advantage to define a very sophisticated recipe management system.
Based on these observations, we propose to explore the following methodology:
i) Store all data on which recipes operate in a next-generation DBMS.
ii) Register all functions with the DBMS which implement the recipe steps in the visualization system.
iii) As a result of ii), any user recipe can be compiled into one or more query language commands. This collection of commands is then optimized by the DBMS query optimizer and run to produce the desired result.
We also propose to build a prototype of this architecture using the next generation DBMS, POSTGRES. Our specific research goals are the following.
First, since a recipe is a data base object and represents one or more queries to a DBMS, it has points in common with the traditional notions of views (Stonebraker, 1975) and stored query plans. Hence, we propose to explore both the similarities and differences between recipes, views and query plans.
A common operation in recipe management is to run a recipe, browsing the output as desired, and then change a run-time parameter somewhere in the recipe. Continued browsing of the altered recipe is then expected. A reasonable optimization is to place recipe execution into a state where all data that flows along each arc is captured by the DBMS and retained as temporary data. Then, if the user alters the recipe, the whole recipe does not need to be rerun. Instead, the saved data that is input to the first recipe step which has been changed is re-inputted to that recipe step. Any previous recipe steps do not need to be repeated.
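The optimization described above can be sketched as a pipeline that keeps the output of every step and, after a parameter change, restarts from the first changed step. In the Python sketch below an in-memory list stands in for the temporary data that the DBMS would retain; the class, the step functions and the data are hypothetical.

```python
class CachedRecipe:
    """A linear recipe whose intermediate outputs are retained between runs."""

    def __init__(self, source, steps):
        self.source = source
        self.steps = list(steps)             # list of (function, parameter) pairs
        self.cache = [None] * len(self.steps)
        self.executions = 0                  # number of step executions so far

    def run(self, start=0):
        """Run the recipe from step `start`, reusing the saved input to that step."""
        data = self.source if start == 0 else self.cache[start - 1]
        for k in range(start, len(self.steps)):
            func, param = self.steps[k]
            data = func(data, param)
            self.cache[k] = data
            self.executions += 1
        return data

    def change_parameter(self, k, new_param):
        """Alter step k and rerun only step k and its successors."""
        func, _ = self.steps[k]
        self.steps[k] = (func, new_param)
        return self.run(start=k)


def scale(values, factor):
    return [v * factor for v in values]


def offset(values, amount):
    return [v + amount for v in values]


def threshold(values, minimum):
    return [v for v in values if v >= minimum]


recipe = CachedRecipe([1, 2, 3, 4], [(scale, 2), (offset, 1), (threshold, 5)])
first = recipe.run()                     # executes all three steps
second = recipe.change_parameter(2, 8)   # executes only the last step
print(first, second, recipe.executions)
```

Changing the parameter of the last step costs one step execution instead of three, at the price of storing every intermediate result.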
This caching of intermediate results has been advocated by Sellis (1986); however, that work is concerned with optimizing multiple queries in a query stream, in the hope that a previous result can be reused as part of a subsequent query. In our environment, when a recipe is changed, we can avoid rerunning the whole recipe by using this caching technique. We propose to explore the utilization of this technique in a recipe context.
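The caching idea can be illustrated with a minimal Python sketch. It assumes a purely linear recipe (no branching arcs) and keeps the cached outputs in memory rather than in a DBMS; the class CachedRecipe and the step functions scale, threshold and total are hypothetical names introduced only for this illustration.

```python
class CachedRecipe:
    """A linear recipe whose per-step outputs are retained for reuse.

    Hypothetical illustration: the proposal stores such data in the DBMS,
    whereas this sketch simply keeps it in a Python list.
    """

    def __init__(self, source, steps):
        self.source = source
        # Each step is [function, keyword parameters]; parameters may change later.
        self.steps = [[func, dict(params)] for func, params in steps]
        self.cache = []     # cache[i] holds the output of step i
        self.executed = []  # names of the steps actually run, for inspection

    def run(self, start=0):
        # Never start past the last step whose input is cached.
        start = min(start, len(self.cache))
        data = self.source if start == 0 else self.cache[start - 1]
        del self.cache[start:]  # outputs from 'start' onward are now stale
        for func, params in self.steps[start:]:
            data = func(data, **params)
            self.cache.append(data)
            self.executed.append(func.__name__)
        return data

    def change_parameter(self, index, **params):
        # Alter one step, then rerun only that step and the ones after it.
        self.steps[index][1].update(params)
        return self.run(start=index)


def scale(values, factor):
    return [v * factor for v in values]


def threshold(values, minimum):
    return [v for v in values if v >= minimum]


def total(values):
    return sum(values)


recipe = CachedRecipe(
    [1, 2, 3, 4],
    [(scale, {"factor": 2}), (threshold, {"minimum": 4}), (total, {})],
)
first = recipe.run()
recipe.executed.clear()
second = recipe.change_parameter(1, minimum=6)
rerun_steps = list(recipe.executed)  # only 'threshold' and 'total' are rerun
```

In this sketch the scale step is not repeated after the threshold parameter changes, because its saved output is fed directly to the first altered step.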
Furthermore, optimization of the query or queries that result from the compilation process described above is an issue to be studied. It is expected that such queries will have a large number of cascaded functions performing recipe steps. One of the key operations in optimizing such queries will be moving data base selections through such functions to restrict the amount of data on which they operate. We propose to explore these and other optimization tactics that a recipe compiler can use. Our basic approach will be to extend the pioneering work of Selinger (1979) in this direction.
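As a hedged illustration of moving a selection through a function, the Python sketch below assumes the simplest favorable case: the recipe step works record by record and does not modify the field on which the selection is made, so the two operations commute. The names to_celsius, in_tropics, run_naive and run_pushed_down are hypothetical, and no DBMS optimizer is involved.

```python
def to_celsius(record):
    # Recipe step: adds a derived field and leaves 'lat' untouched.
    return {**record, "temp_c": record["temp_k"] - 273.15}


def in_tropics(record):
    # Selection: depends only on 'lat', which the step does not change.
    return -30 <= record["lat"] <= 30


def run_naive(records, func, predicate):
    """Apply the function to every record, then select."""
    calls = 0
    out = []
    for record in records:
        result = func(record)
        calls += 1
        if predicate(result):
            out.append(result)
    return out, calls


def run_pushed_down(records, func, predicate):
    """Select first, so the function sees only the records that survive."""
    calls = 0
    out = []
    for record in records:
        if predicate(record):
            out.append(func(record))
            calls += 1
    return out, calls


records = [
    {"lat": -60, "temp_k": 260.0},
    {"lat": 0, "temp_k": 300.0},
    {"lat": 20, "temp_k": 295.0},
    {"lat": 75, "temp_k": 250.0},
]
naive_out, naive_calls = run_naive(records, to_celsius, in_tropics)
pushed_out, pushed_calls = run_pushed_down(records, to_celsius, in_tropics)
```

Both plans return the same records, but the second one invokes the recipe function only on the selected subset. A real recipe compiler would need to establish that the selection and the function commute before applying this rewrite.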
One capability required for recipes has been termed data lineage. Users wish to focus on a point or a region in a visualized object. Generally, the indicated data is obviously incorrect. As a result, the user wishes to trace backward through the recipe looking for defective input data or classification errors. In effect, the user wishes a "debugger" that can move backward through recipe execution. This capability has been termed data lineage by the scientific community.
There are several ways to construct the required lineage. First, if a recipe step is invertible, and the person who registered the function involved in the step also provided the inverse function, then, the recipe manager can simply pass the defective data to the inverse function, thereby generating the defective input data to the function. Iteratively performing this step would allow a user to trace any given data back to the beginning of the recipe.
Unfortunately, recipe steps are rarely invertible. As a result, we propose to explore other capabilities. The first is a forward marking system which we call dye. The user would be allowed to mark any subset of the data in a recipe step with dye of a color of his choosing. When recipe execution is resumed, the recipe manager must propagate the dye to output data elements that are computed from dyed input data elements. A user can thereby guess offending input values, dye them, watch the dye appear in the output and prove or disprove his hypothesis about the lineage of offending data. We propose to explore efficient and general techniques for supporting this capability.
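The dye mechanism can be sketched in Python as follows. The sketch assumes that each recipe step can report which input elements feed each output element (here through a hypothetical dependencies function); how such dependency information would be obtained in general is exactly the open question raised above. The names run_step_with_dye, moving_average and moving_average_deps are illustrative only.

```python
def run_step_with_dye(values, dyes, func, dependencies):
    """Run one recipe step and propagate dye from inputs to outputs.

    values       -- input data elements
    dyes         -- one set of dye colors per input element
    func         -- the recipe step, mapping a list of inputs to a list of outputs
    dependencies -- dependencies(i, n) yields the input indices used by output i
    """
    outputs = func(values)
    out_dyes = []
    for i in range(len(outputs)):
        colors = set()
        for j in dependencies(i, len(values)):
            colors |= dyes[j]  # an output inherits every color of its inputs
        out_dyes.append(colors)
    return outputs, out_dyes


def moving_average(values):
    # Three-point running mean, truncated at the ends of the series.
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def moving_average_deps(i, n):
    # Output i is computed from inputs i-1, i and i+1 (where they exist).
    return range(max(0, i - 1), min(n, i + 2))


values = [1.0, 2.0, 50.0, 3.0, 4.0, 5.0]
dyes = [set() for _ in values]
dyes[2].add("red")  # the user suspects the third input value

outputs, out_dyes = run_step_with_dye(
    values, dyes, moving_average, moving_average_deps
)
dyed_outputs = [i for i, colors in enumerate(out_dyes) if "red" in colors]
```

Here the red dye placed on one suspicious input appears on the three smoothed outputs that depend on it, which is the evidence a user would need to confirm or reject a lineage hypothesis.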
The other capability we plan to explore for supporting lineage is a lineage function. When a user registers a function, he can optionally provide a lineage function. In this case, the user does not need to back up to the input to a recipe step and guess appropriate input to dye to check his lineage hypotheses. Instead, he dyes the defective data directly. The recipe manager then passes the dyed data to the lineage function and visualizes the output of this function. Of course, the dye is propagated to the output as described above. If the lineage function provides an approximation to the inverse of the recipe function, then it will provide significant information about the actual data lineage. Hence, our second approach to data lineage utilizes a lineage function that a user can write to provide as much information as possible about the real lineage situation.
Lastly, we propose to build a prototype visualization system using these ideas. We expect to utilize the user interface code from Khoros, interfacing it to POSTGRES (Mosher, 1991) as the basic building block to explore provision of the above capabilities.
B. Earth Science Challenges
The Earth Systems Model described in the previous sections will be applied to investigate important problems related to coupled climate dynamics and chemistry. Three specific problems that will be addressed are outlined below. The applications phase of this proposal serves at least two purposes. First, it provides a means of validating the model accuracy and performance in relation to realistic environmental problems. Second, it offers a unique analysis of these critical problems using a new powerful coupled predictive model.
1. Seasonal cycle and interannual variability of the atmosphere-ocean system
We propose to analyze the simulated seasonal cycle and interannual variability of the coupled atmosphere-ocean system. The development of an ESM that can produce a realistic simulation of the seasonal cycle of the coupled atmosphere-ocean system is one of the major goals of this project. Achieving this goal implies an interdisciplinary effort that brings together observational and theoretical investigations of a broad spectrum of processes, in the atmosphere, the oceans, on land, and in polar regions.
Improved predictions of El Niño/Southern Oscillation (ENSO) – the major feature in the interannual variability of the Tropical Pacific – depend upon better understanding of the seasonal cycle, and on the ability to simulate it with GCMs. The current models have considerable predictive skill if forecasts start in June or July but do poorly if the forecasts start in February. It appears, therefore, that the seasonal cycle modulates the interannual variability. At this time we have a poor understanding of the relation between the annual and interannual variations, and are unable to explain many aspects of the seasonal cycle. Why, for example, is an annual cycle dominant in some equatorial zones (the eastern Pacific) whereas a semi-annual cycle is dominant in others (the central Indian Ocean)? How do land and oceanic conditions compete for the location of major convective zones in the Indian, Atlantic and Pacific sectors?
The seasonal cycle is a very large and accessible periodic global climate change. It is therefore troublesome that the models that predict future climate changes (in response to higher CO2 levels) have difficulty in coping with the seasonal cycle. (At present the models have to resort to "flux" and other "corrections" to simulate the seasonal cycle). The development of a climate model capable of an accurate simulation of the seasonal cycle is of critical importance.
A study of the seasonal cycle is important both for "short-term" purposes (predicting El Niño and, more generally, interannual variability) and for "long-term" purposes (predicting climate changes during the next century). To assess the simulated seasonal cycle of the coupled atmosphere-ocean system requires multi-year simulations with the ESM; the seasonal cycle in any one year is always anomalous, so that analyses of the simulated seasonal cycle will automatically provide data about simulated interannual variations. We will explore the hypothesis that models able to simulate the seasonal cycle will automatically be capable of simulating interannual variability. The two phenomena, the seasonal cycle and interannual variations, are inextricably linked even though they are distinct: the one is a forced response, the other involves natural variability of the seasonal climate system.
Interannual and interdecadal oscillations in the climate system are important both in their own right and because they can mask detection of anthropogenic climate change. Coupled modeling of these phenomena may eventually lead to the possibility of predicting short-term climate fluctuations, as is now experimentally being done for ENSO. We emphasize the modeling of interannual and interdecadal variability for two reasons. First, there are important insights into the climate system that can be obtained from such a study. Second, there are a number of processes in current climate models that are poorly known, and which limit the confidence which can be placed, for instance, in greenhouse warming simulations. In attempting to simulate coupled interdecadal variability, we confront the models with phenomena at longer time scales than those for which they have previously been used. This provides an additional level of testing for the models, which minimizes the possibility of accidental tuning of the model during its design. At some future stage, a more thoroughly tested version of the model might then be considered a more reliable vehicle for greenhouse warming studies.
2. Global Distribution of Greenhouse Gases
We propose to study the global distributions of greenhouse gases, including the chlorofluorocarbons, methane and ozone, together with the sources and sinks of these gases, their photochemical transformations, and their removal processes, using the CATM as a means of integrating these various processes. The simulations will provide a calibration of the dynamical predictions by comparing model predictions against observational data on global distributions. This project will result in a generalized atmospheric dynamics/tracer model that can be applied to a number of problems involving transport, chemistry and radiation. The basic algorithms for treating chemical processes (Turco and Whitten, 1974, 1977, 1978), microphysical processes (Turco et al., 1979a,b; Toon et al., 1988, 1989a), and radiative processes (Toon et al., 1989b) have been developed by the project participants over two decades of research. Moreover, these algorithms have been extensively used to study the causes of global ozone depletion and to develop models of the underlying physical/chemical mechanisms (e.g., Hamill et al., 1977, 1982, 1988, 1990; Toon et al., 1986, 1987, 1990; Turco and Hamill, 1992; Turco et al., 1982, 1989).
The construction of a three-dimensional chemical tracer model, and its coupling with a global climate model such as that described earlier, represents a major advancement toward a practical, accurate prognostic climate simulation. With such a model, forecasts can be used to forestall environmental degradation and to design effective approaches for preserving the global environment. The development of three-dimensional atmospheric tracer models has been slow. The most recent attempts incorporate limited sets of chemical species or employ low spatial resolution (e.g., Kaye et al., 1989; Cariolle et al., 1990; Kao et al., 1990; Rood et al., 1990). No three-dimensional models yet include aerosols and their effects (Charlson et al., 1987), involving radiative and chemical processes (Drdla et al., 1992a). It is recognized, for instance, that the generation of tropospheric sulfate aerosols from anthropogenic sulfur emissions is the likely reason for the pause in the northern hemisphere warming trend during the 1950's and 60's (Charlson et al., 1990). Impediments to the development of practical 3-D tracer chemistry/microphysics models include the lack of efficient numerical algorithms for complex mechanisms, the enormous computational burden imposed by 3-D chemical/aerosol tracer simulations, and the limited motivation to construct highly sophisticated codes for narrow disciplinary studies. However, recognition of the Earth's climate system as a fully coupled atmosphere/ocean/land/biosphere system that is under stress has created an urgent need for coupled predictive simulations of the climate system.
The present research team includes specialists at UCLA and LLNL who have produced efficient and accurate numerical treatments for atmospheric chemical and microphysical processes. The team also has access to the latest generation of supercomputers, including previously experimental parallel architecture machines that will soon provide sufficient computational speed for the problem. Moreover, the initial steps toward achieving a fully coupled tracer-dynamics model have been successfully taken by the research team (e.g., Erickson et al., 1990; Jacobson et al., 1992a; Lu and Turco, 1991; Penner et al., 1991; Zhao et al., 1992a). These studies, although preliminary to the work proposed here, represent essential experiments that validate the approach adopted for modeling complex dynamical/ chemical/microphysical systems.
The most straightforward application of the coupled dynamical/tracer Earth Systems Model is the simulation of the distributions of global greenhouse gases, particularly the chlorofluorocarbons (CFCs). These gases have well-defined sources, and their concentrations have been monitored worldwide for over a decade. Accordingly, detailed comparisons between model predictions and observations provide an excellent validation scheme for the simulated atmospheric transport. To date, models used to calculate the rates of global tracer transport have used simplified atmosphere and ocean dynamical models. The coupled AGCM and OGCM proposed here represent one of the most advanced coupled dynamical simulations available. At the same time, the details of the coupled simulations will offer a rigorous test of the coupled model dynamics, and will provide insight into the feasibility and fidelity of coupled models.
In the case of methane and ozone, photochemical processes are a primary influence. The atmospheric lifetimes of methane and other greenhouse compounds are determined by the concentrations of hydroxyl radicals, which in turn are controlled by a complex sequence of chemical reactions. The proposed coupled Earth systems model will be capable of predicting OH concentrations as a function of environmental conditions, and therefore will provide a flexible analytical tool for evaluating the impact of methane emissions.
The Lawrence Livermore National Laboratory atmospheric chemistry model (LACM) includes most of the chemical reactions of interest to this problem. Accordingly, the LACM will be a test bed for the inclusion of complex photochemistry in the CATM. The distribution of trace gases can be simulated with the LACM prior to a full treatment in the CATM. This will allow an initial evaluation of the key processes and expected sensitivities to physical and chemical parameters.
3. Ozone Perturbations
We propose to investigate the global depletion of the stratospheric ozone layer, particularly the high latitude ozone depletions associated with aerosols in both hemispheres, and the coupling of ozone to atmospheric dynamics and climate change. The global depletion of the ozone layer is well established through extensive satellite and in situ observations (Stolarski et al., 1991; Anderson et al., 1991). The massive ozone losses associated with the "ozone hole" over Antarctica (Farman et al., 1985) are attributable to chemical processes that are catalyzed by the presence of ice particles in the stratosphere (Solomon et al., 1986; Crutzen and Arnold, 1986; McElroy et al., 1986). These ice particles - polar stratospheric clouds (PSCs) - are composed of nitric acid and water ices (Toon et al., 1986). However, the stratosphere also holds a ubiquitous layer of sulfuric acid aerosols (e.g., Turco et al., 1982) that may also cause ozone destruction (Hofmann and Solomon, 1989; Turco and Hamill, 1992).
The issue of ozone depletion is significant for several reasons. First, it is a paradigm for global environmental problems, involving many aspects of atmospheric dynamics, chemistry and physics that interact to produce the final effect. Second, the ozone problem requires an accurate predictive capability, to project future potential ozone depletions before they occur so that effective policy actions and control measures can be designed. Third, the magnitude of the effect is highly nonlinear in the basic parameters. For example, the large ozone decreases caused by heterogeneous (ice) chemistry do not occur unless the temperature of the atmosphere falls below a certain threshold value; once below this value, rapid ozone depletion can set in. Hence, temperature changes of only a few degrees in the stratosphere may have important consequences for ozone not predicted by uncoupled dynamical/chemical models.
In most current models, the homogeneous photochemistry of the atmosphere is calculated independently of heterogeneous processes. This is done for practical reasons: adequate methods to treat heterogeneous chemistry have not been devised and thus are not generally available, and the computer resources required for a full chemical treatment remain prohibitive. We have studied polar ozone depletions with a coupled polar stratospheric cloud microphysics, heterogeneous chemistry, and photochemistry simulation driven by the UCLA version of the U. K. Met Office Stratosphere/Mesosphere Model (Drdla et al., 1991, 1992a). In one case (Drdla et al., 1992a,b), back trajectories for air masses sampled during the Airborne Arctic Stratosphere Expedition-II (AASE-II) were obtained, and the complex microphysical evolution of the PSCs was simulated along these trajectories. The corresponding heterogeneous chemical processing rates were calculated, and inserted into the homogeneous chemical mechanism of the coupled APM. The resulting species concentrations predicted at the aircraft track were compared against ER-2 aircraft data. The good agreement demonstrates the feasibility and usefulness of coupling microphysical and chemical simulations (also see Jones et al., 1990). Another example is offered by recent simulations of volcanic eruption clouds (Turco et al., 1991; J.-X. Zhao et al., 1992a). In this case, sulfur dioxide photochemistry and sulfate aerosol microphysics have been coupled as subroutines in the CATM (Toon et al., 1987), and the dynamical fields were obtained from a stratospheric general circulation model, which uses a version of the CATM to advect passive tracers (Young et al., 1992).
IV. Industrial Support to this Project
In addition to the indirect support to this project provided by Digital Equipment Corporation through Sequoia 2000, DEC has committed direct support (see Appendix F). DEC is interested in the use of networked workstations as a platform for developing and testing models such as those motivating this proposal. The Sequoia workstation network will provide a prototype configuration for testing a distributed ESM, and will provide valuable evaluation, debugging, and optimization of the algorithms that are distributed.
V. Project Personnel
The Principal Investigators on this project form a multidisciplinary team consisting of atmospheric and oceanic dynamicists and atmospheric chemists, as well as computer scientists. The development of the ESM –– which combines the UCLA AGCM, GFDL OGCM, and Ames/UCLA CATM –– involves the participation of Earth scientists, while the implementation of a high-performance model in an MIMD computer environment requires the participation of computer scientists. The Earth scientists provide technical information regarding the numerical algorithms, data structures, validation tests, and benchmark datasets. Computer scientists provide parallelization strategies and methods for parallel performance assessment, as well as computational resources in the form of parallel developmental platforms and software, and access to vector machines. Communication between the project team members will be greatly enhanced by a Picturetel Videoteleconferencing system that will link the University of California campuses participating in the Sequoia 2000 Project (UCLA, UCB, UCSD, and UCSB).
The leaders of the proposed research effort will be Professors Carlos R. Mechoso and Richard P. Turco. Professor Mechoso has been working with the UCLA coupled atmosphere/ocean GCM for a number of years; he will direct the effort to develop the parallelized version of the coupled GCM, as well as its application to the Earth sciences challenges. Professor Turco has participated in the design and application of the CATM; he will head the effort to develop a parallelized chemistry/microphysics model. Mechoso and Turco will both work toward the development of a fully coupled dynamics/tracer model. The other team members and their research specialties are:
Akio Arakawa - Atmospheric numerical modeling, dynamics, and parameterization;
James W. Demmel - Numerical analysis and parallel scientific computing;
Jeff Dozier - Hydrologic models and satellite and aircraft data analysis;
David Halpern - Physical oceanography and satellite data analysis;
George S. H. Philander - Ocean-atmosphere dynamical systems and modeling;
Michael E. Stonebraker - Data base management systems and communications;
Donald J. Wuebbles - Atmospheric chemistry modeling and analysis.
The Co-Investigators on the project are:
William P. Dannevik - Computational hydrodynamics and turbulence;
Susan L. Graham - Software tools to aid high performance parallel computing;
Joyce L. Penner - Atmospheric chemistry modeling and analysis;
David A. Randall - Atmospheric physics and dynamics modeling;
Douglas Rotman - Atmospheric chemistry modeling and analysis.
The institutions involved in this project are:
Colorado State University - Department of Atmospheric Sciences (Randall);
Jet Propulsion Laboratory - Earth and Space Sciences Group (Halpern);
Lawrence Livermore National Laboratory - Atmospheric and Geophysical Sciences Division (Wuebbles, Dannevik);
Princeton University - Program in Atmospheric and Oceanic Sciences (Philander);
University of California Berkeley - Computer Sciences (Demmel, Graham, Stonebraker);
University of California Los Angeles - Department of Atmospheric Sciences (Arakawa, Mechoso, Neelin, Turco);
University of California Santa Barbara - Center for Remote Sensing and Environmental Optics (Dozier).
The budget includes support for two postdoctoral researchers, two students and 50% of a programmer at UCLA. Travel funds are also allocated to support one trip per year to UCLA by each non-UCLA PI and one trip per year to a scientific conference for each PI. At UCLA, one postdoc will be assigned to perform parallelization of and simulations with the dynamical models (AGCM and OGCM) under the direction of Mechoso, Arakawa and Neelin. A second postdoc will work between UCLA and LLNL on the atmospheric photochemistry algorithms, parallelization of the chemistry codes, and atmospheric chemistry simulations, under guidance from Turco and Wuebbles. One UCLA student will focus on the dynamical modeling of the seasonal cycle, as formulated by Mechoso, Arakawa and Neelin. The second student will work on the problem of aerosol microphysical and chemical simulation using the CATM, with Turco as an advisor. The third student will work on the hydrological models with Dozier as an advisor.
Subcontracts (attached) provide detailed budgetary information concerning the collaborative research between UCLA and UC Berkeley, Princeton University, Colorado State University and Lawrence Livermore Laboratory. The UC Berkeley subcontract supports the computer science elements of the proposal, while the Princeton and CSU subcontracts support both the computational and science elements, as described in the body of the proposal.
APPENDIX A
The UCLA Atmospheric GCM (AGCM)

The UCLA AGCM predicts the values of horizontal velocity, potential temperature, water vapor and ozone mixing ratios, surface pressure, and ground temperature. In an approach unique to the UCLA model, the planetary boundary layer (PBL) is treated as well-mixed and represented by the variable-depth bottom layer of the model. The depth of this layer is also predicted by the model.
The AGCM includes parameterizations of PBL processes using bulk assumptions for the description of turbulence (Suarez et al., 1983). Surface fluxes of sensible heat, moisture and momentum are modeled using the bulk parameterization proposed by Deardorff (1972). The model also includes parameterizations of cumulus convection and its interaction with the PBL (Arakawa and Schubert, 1974), stratus clouds, and solar and infrared radiative heating (Katayama, 1972 and Harshvardhan et al., 1987, respectively). The cloudiness used in the radiation calculation is predicted. A parameterization of orographic gravity wave drag similar to that developed by Palmer et al. (1986) is included in the model. Efforts are under way to develop improved cloud formation parameterizations, especially for convective and cirrus clouds, based in part on the use of explicit ice and liquid water variables.
In the vertical, the model is based on a coordinate system for which the lower boundary, the PBL top, and isobaric surfaces above a prescribed pressure level (100 mb) are coordinate surfaces (Arakawa and Suarez, 1983). The top of the model atmosphere is assumed to be a material surface. The vertical finite-differencing used above the PBL guarantees conservation of the global mass integrals of potential temperature and total potential plus kinetic energy under frictionless adiabatic processes.
The equations are horizontally discretized using a staggered latitude-longitude “C” grid (Arakawa and Lamb, 1977). The scheme for the horizontal advection terms in the momentum equation conserves potential enstrophy and gives fourth-order accuracy for the advection of potential vorticity. The horizontal advection scheme used for the potential temperature is also fourth-order and conserves the global mass integral of its square. The scheme for the horizontal advection of water vapor and ozone does not allow the occurrence of negative values. In all other terms, including the continuity equation, the pressure gradient force and the definition of absolute vorticity, the differencing is of second-order accuracy.
The geographical distributions of surface albedo and ground wetness are interpolated from prescribed monthly means based on the observed climatology.
At present, the UCLA AGCM has a tropospheric version with the top at 50 mb and a tropospheric-stratospheric version with the top at 1 mb. These versions can be configured to run at low and high vertical resolutions: 9- and 17-layer for the tropospheric version, 15- and 29-layer for the tropospheric-stratospheric version. For each vertical resolution, there are two horizontal resolutions. The coarse (standard) horizontal resolution has a grid of 5° longitude by 4° latitude, and the fine horizontal resolution version has a 2.5° by 2° grid.
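To give a rough sense of the problem sizes implied by these resolutions, the Python sketch below counts grid columns and grid cells. It assumes uniform cells covering the whole globe and ignores any special treatment of the poles or of the staggered grid, so the figures are order-of-magnitude estimates rather than the model's actual array dimensions; the function names are hypothetical.

```python
def grid_columns(dlon_deg, dlat_deg):
    """Number of horizontal cells for a uniform global grid (poles ignored)."""
    n_lon = round(360 / dlon_deg)
    n_lat = round(180 / dlat_deg)
    return n_lon * n_lat


def grid_cells(dlon_deg, dlat_deg, layers):
    """Total number of cells: horizontal columns times vertical layers."""
    return grid_columns(dlon_deg, dlat_deg) * layers


coarse_columns = grid_columns(5.0, 4.0)   # 5 deg longitude by 4 deg latitude
fine_columns = grid_columns(2.5, 2.0)     # 2.5 deg by 2 deg

# Smallest and largest configurations quoted in the text.
smallest = grid_cells(5.0, 4.0, 9)        # coarse grid, 9-layer tropospheric
largest = grid_cells(2.5, 2.0, 29)        # fine grid, 29-layer trop.-strat.
```

Under these assumptions the fine grid has four times as many columns as the coarse grid, and the largest configuration has roughly thirteen times as many cells as the smallest.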
The GCM has been evaluated in a variety of studies including long-term simulations of monthly mean fields (Suarez et al., 1983; Randall et al., 1985), experimental medium-range (10-day) predictions (Mechoso et al., 1985; Mechoso et al., 1986), and assessments of the impact of SST anomalies on the atmospheric circulation (Mechoso et al., 1990).
APPENDIX B
The GFDL Ocean GCM (OGCM)

The OGCM is based on that developed at the NOAA Geophysical Fluid Dynamics Laboratory (GFDL)/Princeton University by K. Bryan and M. D. Cox (Bryan, 1969; Cox, 1984). The OGCM predicts the horizontal velocity, temperature, and salinity. Density is determined from the temperature and salinity using Knudsen's equation of state. The model uses depth as the vertical coordinate. The top of the model is assumed to be a rigid lid. In the horizontal, the equations are discretized using a staggered longitude-latitude “B” grid (Arakawa and Lamb, 1977).
At present, we are using two versions of the OGCM: a) the Global-OGCM (G-OGCM), which covers the ocean in the latitude belt from 60°S to 60°N; and b) the Tropical Pacific-OGCM (TP-OGCM), which covers the Pacific Ocean in the latitude belt from 28°S to 50°N. The northernmost and southernmost parts of the domains are relaxed towards the observed climatology in both salinity and temperature fields. Incorporation of a sea-ice module is also planned. When this is complete, the southern boundary of the G-OGCM will be extended to the periphery of the Antarctic continent. Also, we are incorporating into the G-OGCM a module that simulates the microphysics and photochemistry of constituents under the influence of fluid motions in a three-dimensional field (Turco et al., 1989; Toon et al., 1989).
The G-OGCM incorporates realistic bottom topography. The horizontal resolution is 1° longitude by 1° latitude, and there are 15 unevenly spaced levels in the vertical. The TP-OGCM has 27 levels in the vertical, with 10 levels equally spaced over the upper 100 m. The ocean depth is assumed to be constant at approximately 4,150 m. In longitude, the resolution is 1°; in latitude, the mesh size is 1/3° between 10°S and 10°N and increases gradually toward the poles. Table 1 gives a summary of OGCM timings on a CRAY Y-MP using one processor.
A crucial part of the OGCM is the parameterization of vertical transports by sub-grid turbulence. These transports play a major role in distributing heat and momentum from the surface to the deep ocean. In the original configuration of the TP-OGCM, representation of turbulence terms in the governing equations is based on first-order turbulence closure (K-theory), in which the vertical mixing coefficients are taken to be a function of the local Richardson number (Pacanowski and Philander, 1981). We have implemented alternative formulations of the mixing processes, including the Mellor-Yamada level (2–1/2) second-order turbulence closure scheme (Mellor and Yamada, 1974, 1982). This scheme adds to the model two additional prognostic equations for turbulence-related quantities.
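A minimal Python sketch of a Richardson-number-dependent first-order closure is given below. The functional form is the one commonly associated with Pacanowski and Philander (1981); the default constants (nu0, alpha, n and the background values) are illustrative values often quoted for that scheme and are assumptions here, not parameters taken from the TP-OGCM configuration. Clipping negative Richardson numbers to zero is a further simplification of this sketch.

```python
def richardson_mixing(ri, nu0=50.0, alpha=5.0, n=2, nu_b=1.0, kappa_b=0.1):
    """Vertical eddy viscosity and diffusivity (cm^2/s) from the local Ri.

    Assumed form (illustrative constants):
        nu    = nu0 / (1 + alpha * Ri)**n + nu_b
        kappa = nu / (1 + alpha * Ri) + kappa_b
    """
    ri = max(ri, 0.0)  # simplification: treat unstable columns as Ri = 0
    damping = 1.0 + alpha * ri
    nu = nu0 / damping**n + nu_b
    kappa = nu / damping + kappa_b
    return nu, kappa


# Mixing is strongest for weak stratification (small Ri) and decays toward
# the background values as the stratification increases.
nu_weak, kappa_weak = richardson_mixing(0.0)
nu_mid, kappa_mid = richardson_mixing(0.2)
nu_strong, kappa_strong = richardson_mixing(10.0)
```

The sketch shows the key property of K-theory closures of this type: the mixing coefficients are diagnosed locally from the resolved shear and stratification, with no additional prognostic equations, in contrast to the second-order scheme.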
We have performed a series of multi-year simulations with the uncoupled TP-OGCM using both the first- and second-order turbulence closure schemes described above (Ma et al., 1991). These simulations generally produce realistic structure for the ocean currents and temperature field. The second-order scheme produces in general deeper mixed layers and sharper thermoclines than the first-order scheme, particularly in the eastern equatorial Pacific.
APPENDIX C
The CNRI Gigabit Testbed Initiative
The CNRI Gigabit Testbed Initiative is a three-year project for research on very high speed (gigabit per second) communication networks. The major goals of this research are to develop architectural alternatives for consideration in determining the possible structure of a wide-area gigabit network serving the research and education communities, and to understand the utility of gigabit networks to the end user.
CNRI's role is to lead a testbed-based research effort consisting of collaborators from universities, national laboratories, supercomputer centers, and major industrial organizations. The major activities revolve around a set of five testbeds: AURORA, BLANCA, CASA, NECTAR, and VISTANET.
The principal research organizations in the CASA wide-area testbed are the Los Alamos National Laboratory (LANL) in Los Alamos, New Mexico; the California Institute of Technology (Caltech) and the Jet Propulsion Laboratory (JPL) in Pasadena, California; and the San Diego Supercomputer Center (SDSC) in conjunction with UCLA. The carriers collaborating in the CASA testbed are MCI, Pacific Bell, and U S West. The testbed will connect JPL, Caltech, SDSC and LANL.
The CASA testbed investigates whether distributed supercomputing over wide-area high-speed networks can provide new levels of computational resources for leading-edge scientific problems. In distributing the GCM code we explore the methodology and performance issues for decomposing scientific simulations to run concurrently on computers of different architectures.
APPENDIX E
Network Bandwidth Requirements
The organization of computation and communication shown in Fig. 3 is valid only when the time steps of all three modules are equal. The time steps of the AGCM/Physics and OGCM, however, are usually much larger than that of AGCM/Dynamics. Let us focus on the AGCM. For this model, the AGCM/Dynamics goes through time steps before the next time step of AGCM/Physics starts. The finite-differencing in the AGCM/Dynamics implies that to advance a domain time steps requires information from neighboring domains for the same time steps. The scheme in Fig. 3, however, does not advance neighboring domains more than one time step.
To account for different time steps of the AGCM components, we divide the AGCM/Dynamics into ( - 1) I/O domains, and latitude bands for each I/O domain. The resulting organization of the calculation is shown schematically in Fig. E1, in which we have taken = 4. Arabic numerals in Fig. E1 indicate the time steps in AGCM/Dynamics being executed for the corresponding latitude band. Here, we have also assumed that the additional data required by the finite-differencing in a domain of the AGCM/Dynamics is one extra latitude band, and that there are no communication delays. If is also the ratio between the time required to compute one AGCM/Physics time step and one AGCM/Dynamics time step, then there will be no idle time for the processors assigned to those model components. A similar procedure can be applied to the OGCM. This method, therefore, allows for the parallelization of the three components of the Coupled GCM, so that the only wall clock time required to run the model is that corresponding to the AGCM/Physics. The method can be extended to a geographically distributed computer environment. In this case, the number of I/O subdomains has to be increased to account for network delays.
The efficiency of an application distributed across a network is affected by network delays. If the network covers a wide geographical area, the time required for communication between remote locations, in particular the time required for a message to traverse long distances, can become a major issue. When designing algorithms intended for applications using a wide-area network, it is imperative to overlap data transmission with computation so that the cost of communication can be minimized.
Because large amounts of data need to be exchanged over the network within a limited time, there is a minimum requirement on the bandwidth of the network. To study the bandwidth requirement for the distributed GCM, we have to estimate the flow of data, the network latencies and the time within which data transmission has to be carried out. Here we obtain an estimate by considering a simple model of the network and a scenario based on two assumptions:
1. The AGCM/Physics is considered as the slowest component of all three modules and
2. the time to communicate boundary data for one I/O subdomain is masked by the AGCM/Physics computation for the next I/O subdomain.
We will also assume that the data produced by AGCM/Dynamics and OGCM for a particular subdomain are received by AGCM/Physics before the computation of the next time step for that subdomain begins. This can usually be satisfied by making , the number of I/O subdomains, large enough. Therefore, this assumption implies the existence of a limit on the minimal that can be used.
Assumptions (1) and (2) can be expressed as an inequality
(E1)
where is the time required to integrate AGCM/Physics one time step for the whole globe. The other terms in Eq. (E1) are defined by:
(transmittal time)
(round-trip latency)
(error correction)
(contention delay)
(CPU overhead)
where is the total size of boundary data of the climate model, and is the bandwidth. The network related parameters and their estimated values for the projected gigabit network are listed in Table E.1 (Moore 1991). Equation (E1) determines the minimum bandwidth required for a particular model resolution and network configuration.
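The masking condition in assumption (2) can be illustrated with a minimal Python sketch. It is a simplified stand-in for Eq. (E1), not the equation itself: the transmittal time is taken as the per-subdomain data volume divided by the bandwidth, and round-trip latency, contention delay, and CPU overhead are lumped into a single fixed delay, with error correction neglected. The function name, the parameter names, and the input values are hypothetical and chosen only for illustration.

```python
def min_bandwidth_bits_per_s(total_bytes, t_physics, n_subdomains, t_fixed):
    """Smallest bandwidth (bits/s) that hides communication behind computation.

    Assumed simplified condition (a stand-in for Eq. (E1)):
        8 * (total_bytes / n_subdomains) / B + t_fixed <= t_physics / n_subdomains
    where t_fixed lumps latency, contention, and CPU overhead (seconds).
    Returns None when the fixed delays alone exceed the available window.
    """
    window = t_physics / n_subdomains   # compute time for one I/O subdomain
    slack = window - t_fixed            # time left for the actual transfer
    if slack <= 0:
        return None                     # no finite bandwidth can satisfy the condition
    bits_per_subdomain = 8.0 * total_bytes / n_subdomains
    return bits_per_subdomain / slack


# Hypothetical inputs: 1 MB of boundary data, 10 s per physics step,
# 10 I/O subdomains, 0.5 s of fixed delay per exchange.
example = min_bandwidth_bits_per_s(1_000_000, 10.0, 10, 0.5)
print(example)
```

Under these assumptions the requirement tightens as the fixed delays approach the per-subdomain compute window, and no bandwidth suffices once they exceed it.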
Table E.1. Parameters for the gigabit/second network.
variable   definition                     units           value
d          distance                       m
           speed in fiber
           packet size
           packet error rate              errors/packet
           routing delays
           contention delays
           fractional protocol overhead
           system overhead
Solving Eq. (E1) for , one obtains the minimum bandwidth required

(E2)
if
(E3)
And the bandwidth efficiency in this case is given by
(E4)
For the current coarse-resolution AGCM coupled to the global OGCM, . Using one processor on the CRAY Y-MP, is approximately . If we take the number of I/O subdomains to be 10, and neglect the error rate, the estimated minimum bandwidth is 7 Mbits/s. This is less than the bandwidth of a T3 network (about 45 Mbits/s).
As the computing power increases or as the model code is further parallelized, the minimum bandwidth will increase. To consider the dependence of required bandwidth on the computing power, we introduce the relationship
(E5)
where is the number of floating point operations per AGCM/Physics step, and is the execution rate. Examination of Eq. (E4) reveals that the bandwidth efficiency for a given network configuration ( fixed) is approximately constant when remains constant. Therefore we consider the case for which the model resolution will be increased in such a way that will always remain constant. With Eq. (E5) this implies that increases linearly with . The size of data transmitted has a fixed relationship with since both are functions of model resolution. The number of operations does not have a simple linear dependence on model resolution because vectorization tends to make the increase less than linear, whereas some operations (matrix inversion, subgrid-scale physics) tend to increase more than linearly with model resolution. We have assumed their relationship to be
(E6)
We also assume that the CPU overhead decreases like
(E7)
since the system overhead is expected to decrease with faster computers. Using Eqs. (E5), (E6), (E7) and assuming to be constant, we can obtain and as functions of execution rate (Fig. E2). In the figure we also indicated the estimate for running the high resolution (2.5° longitude by 2° latitude) GCM. It is clear that with this configuration not only is it necessary to have a gigabit network but also that the climate model can use it at an efficiency of 90%.
We have conducted several experiments to analyze the performance of the distributed climate model using existing networks. In particular we attempted to quantify the following types of latencies:
• round-trip delay,
• I/O access delays – time spent waiting for I/O requests to be processed through the CRAY I/O Subsystem,
• CPU access delays – time spent waiting for the UNICOS CPU scheduler to connect a process already in memory to a CPU, and
• memory access delays – time spent waiting for the UNICOS memory scheduler to swap the distributed task into memory from disk.
A series of experiments was made to quantify the latencies associated with each level of access. The task-decomposed GCM was first run on the SDSC CRAY Y-MP during dedicated time. The same run was then duplicated during production use of the SDSC Y-MP, illustrating the impact of memory access delays. Finally, the distributed application was run using both the SDSC and NCAR CRAY Y-MPs across a T1 link. The dedicated run inside the CRAY used a total CPU time of 210.22 s for a one-day simulation, whereas the total wall clock time was only 167.80 s. Therefore 42.42 s was saved for the one-day run by having the AGCM/Dynamics and OGCM running in parallel.
Interpretation of the results required additional tests. The seemingly low speed of internal interprocess communication on the CRAY Y-MP was analyzed by comparing the achievable bandwidth as a function of message size, kernel buffer size for the TCP/IP protocol, and the number of messages in flight allowed between acknowledgements (window scaling factor). Table E.2 gives a series of analyses for both dedicated use of the CRAY Y-MP and for production use on a fully utilized system.
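The saving quoted for the dedicated run can be checked with a short Python calculation. The sketch treats the total CPU time as the time a fully serial run would have taken, which is an assumption made here for illustration. The variable names are hypothetical.

```python
# Timings reported for the dedicated one-day simulation (seconds).
cpu_time = 210.22    # total CPU time summed over the concurrent tasks
wall_time = 167.80   # elapsed wall clock time

# Assumption: a serial run would need roughly the total CPU time.
saved = cpu_time - wall_time
speedup = cpu_time / wall_time
fraction_saved = saved / cpu_time

print(f"time saved:      {saved:.2f} s")
print(f"speedup:         {speedup:.2f}")
print(f"fraction saved:  {fraction_saved:.1%}")
```

Under that assumption, running the AGCM/Dynamics and OGCM concurrently shortens the run by about one fifth.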
Table E.2a
Bandwidths as a function of message size and buffer size on a dedicated system (MBytes/s).

Message size                     buffer size
(kB)            16 kB    32 kB    64 kB    128 kB   256 kB
64              24.8     42.5     53.8     69.3     69.4
32              24.6     42.4     47.3     46.6     54.8
16              24.5     37.8     43.0
8               17.9     30.0     31.0

Table E.2b
Bandwidths as a function of message size and buffer size on a heavily loaded system (MBytes/s).

Message size                     buffer size
(kB)            16 kB    32 kB    64 kB    128 kB   256 kB
64              6.3      12.5     16.0     14.6     17.9
32              6.3      14.5     13.3     12.6     14.6
16              7.1      8.8      8.0
8               4.4      6.3      6.6

Table E.2c
Bandwidths as a function of window size and buffer size for a 64 kB message on a dedicated system (MBytes/s).

window scaling                   buffer size
factor          64 kB    128 kB   256 kB   512 kB
4               48.9     67.4     83.4     76.5
2               48.9     67.1     82.1     80.3
1               48.9     66.9     66.8     66.4
0               53.8     69.3     69.4

Table E.2d
Bandwidths as a function of window size and buffer size for a 64 kB message on a heavily loaded system (MBytes/s).

window scaling                   buffer size
factor          64 kB    128 kB   256 kB   512 kB
4               10.6     15.4     15.8     15.0
2               11.1     15.9     15.5     17.3
1               13.1     16.9     13.4     15.6
0               16.0     14.6     17.9
The bandwidth achieved in the GCM experiments also showed dependencies on the message size. An average bandwidth of 13.2 MBytes/sec was obtained for a message of size 5 kB, and 40 MBytes/sec for a message of 270 kB for the dedicated run. These are consistent with findings in the simple experiments for the default buffer size of 32 kB.
For the production runs, the communication time has increased by a factor of 1725, which is much larger than the expected increase of a factor of 3-5. The magnitude of the deterioration in performance depends on the size of the program being executed. On a heavily loaded system, UNICOS will swap jobs out of memory that are waiting for I/O to complete. The length of time needed to do the swap is proportional to job size. The time the job remains on disk also depends on the availability of space in memory for swapping the jobs back in. The test program was much smaller in size than the GCM. Thus the GCM job waited on disk longer for the I/O to complete, producing a drastically lowered effective bandwidth. Of interest is the fact that window size scaling provided no effective enhancement in bandwidth on a heavily loaded system. At SDSC the I/O load to disk averages 17 MBytes/s. This contention for I/O resources completely dominates any time saved by keeping multiple messages in flight between acknowledgements.
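The text does not say where the expected factor of 3-5 comes from. One plausible reading is that it reflects the slowdown seen in the simple test program between the dedicated and heavily loaded systems. The Python sketch below checks that reading using the 64 kB message rows of Tables E.2a and E.2b. The variable names are hypothetical.

```python
# Bandwidths for a 64 kB message (MBytes/s), copied from Tables E.2a and E.2b.
buffer_kb = [16, 32, 64, 128, 256]
dedicated = [24.8, 42.5, 53.8, 69.3, 69.4]   # Table E.2a, dedicated system
loaded = [6.3, 12.5, 16.0, 14.6, 17.9]       # Table E.2b, heavily loaded system

# Slowdown factor per buffer size: dedicated bandwidth over loaded bandwidth.
ratios = [d / l for d, l in zip(dedicated, loaded)]

for size, ratio in zip(buffer_kb, ratios):
    print(f"buffer {size:3d} kB: dedicated/loaded = {ratio:.2f}")
```

For these rows every ratio falls between 3 and 5, so the simple tests are at least consistent with that expectation.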
The time spent swapped out of memory is greatly increased during the production run. This additional time represents the contention for access to memory when time sharing is done.
The wall clock time of the distributed run is dominated by the time required to transmit the data over a heavily loaded T1 link. Transmission rates of 10-15 kBytes/s are observed even for very large files sent across the link. Simple experiments showed that store-and-forward delays account for a large percentage of this delay.
In order to achieve high effective bandwidth utilization across a gigabit/second link with the present CRAY supercomputer technology, the distributed application will have to be run on dedicated systems. The development of sophisticated job scheduling algorithms will be needed to avoid memory access delays and I/O resource contention delays when such applications are run in competition with production job mixes.
Several questions can be raised about the ultimate viability of our distributed application in a realistic environment of geographically distributed computers. The distributed application will have to compete with other processes on each individual machine across which it is distributed. The initialization of processes and the synchronization between them require sufficiently sophisticated interprocess communication tools, as well as special consideration from system administrators. These problems will have to be addressed if distributed computing across wide-area networks is to become an established method of scientific research.