Coupling Parallel Models

Scientific model developers often need to combine or couple existing models. In this feature story, Linux Magazine's "Extreme Linux" columnist Forrest Hoffman shows how to couple parallel models without a toolkit.


One of the challenges that scientific model developers frequently encounter is the need to combine two or more existing simulation codes to produce a higher-order systems model. Earth scientists often find themselves in this predicament because models for the many natural, interacting processes that operate on many different scales have been developed by a wide variety of researchers. Moreover, there is a strong desire to scale everything up to understand how the entire Earth system works and to be able to make predictions of weather, climate, and natural hazards.

Coupling parallel models can be a significant challenge for even the most seasoned programmer because the parallel decomposition in each of the submodels adds a level of complexity that’s difficult to overcome. Of course, science issues must also be considered: Is it valid to couple the models in question? Do they operate on temporal and/or spatial scales that are compatible? Is coupling really necessary, or are the feedbacks slow enough that they could be just as effectively run separately in off-line mode?

As model coupling has become more common, toolkits and modeling frameworks are being developed to simplify the process. While still a young technology, these software infrastructures provide services for data exchange among submodels, spatial and temporal averaging and integration, and utilities for managing domain decomposition.

One package that is starting to be adopted into software projects is the Model Coupling Toolkit, or MCT. Developed by some fearless programmers at Argonne National Laboratory, MCT supports generalized spatial grids, provides averaging and accumulation buffers, and has communications schedulers for parallel MxN submodel data transfer. Although MCT is still being developed, it already has enough functionality to support many model coupling needs.

Other developing technologies to keep an eye on are the Common Component Architecture (CCA), a set of standards for high-performance component frameworks, and the Earth System Modeling Framework (ESMF), a NASA-funded collaboration with the goal of building a high-performance, flexible software infrastructure to support Earth science applications. ESMF is just starting to get off the ground, and is being formulated through scientific community input.

Assuming all of the science issues have been dealt with, what does it take to glue two parallel models together without using a toolkit?

Let’s look at an example with two fake submodels — one representing Earth’s atmosphere and the other the land surface — coupled together through the exchange of a flux value. In this example, let’s assume that the two submodels operate with the same time step and share the same grid at the surface of the Earth. However, each submodel has a different parallel domain decomposition of this grid. A single comment line stands in place of any scientific code; let’s just focus on the glue. The similarity of this example to any real code is purely coincidental.

Style Counts!

You’ve gotta have style — coding style, that is. Good programming style makes code more readable and far easier to maintain. Functionality should be isolated into separate code pieces to obtain good modularity, to protect data structures from being manipulated promiscuously in other sections of the model, to promote subroutine reuse, and to avoid code replication. Modules are used in Fortran 90 to achieve these goals. Here are some other elements of style used in the sample code: individual model components or submodels have interface subroutines to initialize and finalize them; parameters passed to subroutines and functions are declared using the intent(in), intent(out), or intent(inout) declarations; and message passing routines are wrapped to make it easier to port the code to another message passing architecture.

Building Up Coupling Code

With these principles in mind, let’s build up some code for our land/atmosphere coupling example. (Only parts of the source code for the coupling example are included here. Space constraints leave little room for lengthy listings and no room for the kind of error checking warranted. However, all of the sources, including a Makefile and the land mask data file, are available on the Linux Magazine website at http://linux-mag.com/download/2003-06/extreme.)

Listing One shows grid.f90, which contains the source code for a module called grid. This module defines the size of the longitude/latitude grid used by the land and atmosphere submodels. Additionally, it creates the land mask used by the submodels to determine whether each grid point contains land. The land mask is read from a disk file by the first parallel process and is subsequently broadcast to all processes.

Listing One: The grid module, grid.f90

module grid

use comm_interface, only: iam, comm_bcast_2d_int
implicit none
integer, parameter, public :: nlon = 128 ! longitude points
integer, parameter, public :: nlat = 64 ! latitude points
integer, dimension(nlon, nlat), public :: land_mask
public land_mask_init
save
private

contains

subroutine land_mask_init()
implicit none
integer :: i, j

! Initialize the land mask
if (iam == 0) then
do j = 1, nlat
do i = 1, nlon
read(11,*) land_mask(i,j)
end do
end do
end if

call comm_bcast_2d_int(land_mask, (nlon*nlat), 0)

end subroutine land_mask_init

end module grid

The grid module uses and depends upon one other module, comm_interface, which is the wrapper for MPI (hence the use statement at the top). Although not necessary for compiling the code, good coding practice dictates that we explicitly list the data structures, subroutines, and functions this module requires from the used module in an only: clause in the use statement. This practice makes it much easier to track inter-module dependencies, and it forces the programmer to consider the overall modularity and structure of the model when adding calls to functions and subroutines.

Within grid, iam is the process rank and comm_bcast_2d_int() is a subroutine that broadcasts a two-dimensional integer array to all processes (by calling MPI_Bcast() when using MPI). The process rank and the wrapper subroutine are used in the land_mask_init() subroutine. The implicit none statement prevents variables from having implicit types.

Next come the number of longitude (nlon) and latitude (nlat) points the shared grid possesses. These two integers are parameters (meaning their values are set at compile time), and are declared public so that other modules can access them. The land mask (stored in land_mask) is a two-dimensional integer array that’s also declared public. The only subroutine in this module, land_mask_init(), is also declared public so that it can be called from an external module or program.

After the SAVE statement, which gives these variables a static storage class, comes the private statement, which declares that any other objects without an explicitly declared scope will be accessible only within the present module. After the CONTAINS statement comes the land_mask_init() subroutine. This subroutine simply reads the land mask data from a file in the first process and calls the MPI wrapper routine, comm_bcast_2d_int(), to send the land_mask array of size (nlon*nlat) from process 0 to all processes. This subroutine is likely to be called only once near the beginning of a model run.

Submodel Modules

Listing Two contains the atm module. It uses iam, nproc, and comm_barrier() from our communications wrapper module. nproc is the total number of processes involved in the computation, and comm_barrier() provides the normal barrier functionality by calling MPI_Barrier() when using MPI. The atm module uses nlon, nlat, and land_mask from the grid module described above, and it has ten public subroutines and functions.

Listing Two: The atm module, atm.f90

module atm

use comm_interface, only: iam, nproc, comm_barrier
use grid, only: nlon, nlat, land_mask
implicit none
public atm_domain_init ! initialize atmosphere domains
public atm_flux_fill ! fill surface flux array
public atm_domain_print ! print surface flux array
public get_atm_owner_coord ! fill domain/owner info
public get_atm_local_domain_id ! return local domain id
public get_atm_flux_shape ! return shape of array
public get_atm_surface_flux ! return atm surface flux
public update_atm_flux ! update atm surface flux
public step_atm ! do an atm time step
public atm_domain_finalize ! finalize atm domains
save
private

integer, parameter :: dom_cols = 128 ! columns per domain
integer, parameter :: ncols = (nlon * nlat) ! total columns
integer, dimension(:), allocatable :: dom_cnt ! domains per process
type atm_domain
! actual number of columns in domain
integer :: ncols
! vector of longitude coordinates
integer :: lon(dom_cols)
! vector of latitude coordinates
integer :: lat(dom_cols)
! Process which owns this domain
integer :: proc
! local domain number
integer :: local_domain
end type atm_domain

! globally defined domains
integer :: natm_domains
type(atm_domain), dimension(:), allocatable :: atm_domains

type coord_domain
integer :: domain_id
integer :: column
end type coord_domain
type(coord_domain), dimension(:,:), allocatable :: coord_domains

real, dimension(:,:), allocatable :: surface_flux

contains

subroutine atm_domain_init()
implicit none
integer :: i, j, d, p

natm_domains = ncols / dom_cols
if (natm_domains * dom_cols < ncols) &
natm_domains = natm_domains + 1
write (*,*) iam, 'creating ', natm_domains, ' atm domains'
allocate(atm_domains(1:natm_domains))
allocate(coord_domains(1:nlon, 1:nlat))
atm_domains(:)%ncols = 0

! Perform cyclic atmospheric column assignment to domains
d = 1
do j = 1, nlat
do i = 1, nlon
atm_domains(d)%ncols = atm_domains(d)%ncols + 1
atm_domains(d)%lon(atm_domains(d)%ncols) = i
atm_domains(d)%lat(atm_domains(d)%ncols) = j
coord_domains(i,j)%domain_id = d
coord_domains(i,j)%column = atm_domains(d)%ncols
d = d + 1
if (d == natm_domains + 1) d = 1
end do
end do

! Assign domains to processes in cyclic fashion
allocate(dom_cnt(0:nproc-1))
dom_cnt(:) = 0
p = 0
do d = 1, natm_domains
atm_domains(d)%proc = p
dom_cnt(p) = dom_cnt(p) + 1
atm_domains(d)%local_domain = dom_cnt(p)
p = p + 1
if (p == nproc) p = 0
end do

! Allocate local surface flux array
allocate(surface_flux(dom_cnt(iam), dom_cols))
surface_flux(:,:) = 0.0

end subroutine atm_domain_init
subroutine atm_flux_fill()
implicit none
integer :: d, c
do d = 1, natm_domains
if (atm_domains(d)%proc == iam) then
do c = 1, atm_domains(d)%ncols
surface_flux(atm_domains(d)%local_domain, c) = &
atm_domains(d)%lon(c) * 1000 + &
atm_domains(d)%lat(c)
end do
end if
end do

end subroutine atm_flux_fill
subroutine atm_domain_print()
! (body omitted; writes coordinates and fluxes to atm.out)
end subroutine atm_domain_print
subroutine get_atm_owner_coord(ncells, lons, lats, &
atm_ids, columns, owners)
implicit none
integer, intent(in) :: ncells, lons(ncells), lats(ncells)
integer, intent(out) :: atm_ids(ncells), &
columns(ncells), owners(ncells)
integer :: i

do i = 1, ncells
atm_ids(i) = coord_domains(lons(i),lats(i))%domain_id
columns(i) = coord_domains(lons(i),lats(i))%column
owners(i) = atm_domains(coord_domains(lons(i), &
lats(i))%domain_id)%proc
end do

end subroutine get_atm_owner_coord
integer function get_atm_local_domain_id(did)
implicit none
integer, intent(in) :: did

get_atm_local_domain_id = atm_domains(did)%local_domain

end function get_atm_local_domain_id
subroutine get_atm_flux_shape(ndomains, ncolumns)
implicit none
integer, intent(out) :: ndomains, ncolumns

ndomains = dom_cnt(iam)
ncolumns = dom_cols

end subroutine get_atm_flux_shape
subroutine get_atm_surface_flux(atm_flux)
implicit none
real, dimension(:,:), intent(out) :: atm_flux

atm_flux(:,:) = surface_flux(:,:)

end subroutine get_atm_surface_flux
subroutine update_atm_flux(atm_flux)
implicit none
real, dimension(:,:), intent(in) :: atm_flux

surface_flux(:,:) = atm_flux(:,:)

end subroutine update_atm_flux
subroutine step_atm()
implicit none

! This is where all the science stuff goes

end subroutine step_atm
subroutine atm_domain_finalize()
implicit none

deallocate(atm_domains, coord_domains, dom_cnt, surface_flux)

end subroutine atm_domain_finalize

end module atm

Following the private statement are some variables and data structures that are only accessible from within the atm module. The nlon by nlat grid shared by the atmosphere and the land will be decomposed into separate domains, and these domains will be “farmed out” to processes in a cyclic (or round robin) fashion. The first step is to define these domains, and decide how large they should be. The dom_cols parameter, which is set to 128, defines how many of these grid points (called columns in the atmosphere submodel) will be assigned to each domain.
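The cyclic assignment that atm_domain_init() performs is easier to see in a compact sketch. The following Python fragment is illustrative only; the names mirror the Fortran but are not taken from the article’s code:

```python
# Sketch (Python, illustrative only) of the cyclic column-to-domain
# assignment performed by atm_domain_init().
def assign_columns(nlon, nlat, dom_cols):
    ncols = nlon * nlat
    ndomains = ncols // dom_cols
    if ndomains * dom_cols < ncols:       # round up if a partial domain remains
        ndomains += 1
    domains = [[] for _ in range(ndomains)]   # each entry: list of (lon, lat)
    coord_domain = {}                     # reverse map: (lon, lat) -> (domain, column)
    d = 0
    for j in range(1, nlat + 1):
        for i in range(1, nlon + 1):
            domains[d].append((i, j))
            coord_domain[(i, j)] = (d, len(domains[d]))
            d = (d + 1) % ndomains        # deal columns out round-robin
    return domains, coord_domain
```

With the article’s 128x64 grid and 128 columns per domain, this yields 64 domains, each holding every 64th column in row-major order.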

The ncols parameter uses nlon and nlat from the grid module to declare the total number of columns (the total grid size) for the atmosphere. The dom_cnt array will later contain the number of domains assigned to each process involved in the computation. Next, the atm_domain derived data type is defined. It contains a count of the columns in the domains (ncols); arrays of longitude (lon) and latitude (lat) indices (one per column); the process that owns the domain (proc); and the local domain number (local_domain). The natm_domains variable stores the total number of domains that will be defined. Then the atm_domain definition is used to declare an array of these domains called atm_domains.

Because it’s necessary to quickly and efficiently recover the domain and column number for any column given a longitude and latitude index, a reverse mapping data structure called coord_domains is created. Finally, a local array containing surface flux is declared. These data structures are completely private to the atm module, i.e., no other parts of the code know about them. To access information stored in these structures, external modules must call routines within the atm module. This methodology offers a clean and consistent mechanism for handling data in complex simulation models; a module owns its own data structures and is responsible for mediating access to the information contained therein.

Following the CONTAINS statement are the subroutines and functions that make up the atm module. As preached above, a submodel initialization routine (atm_domain_init()) and a submodel finalization routine (atm_domain_finalize()) are provided for the atmosphere. The initialization subroutine figures out how many domains to build, allocates memory for those domains, assigns individual columns to the domains in a cyclic fashion, assigns the now constructed and filled domains to parallel processes in a cyclic fashion (counting as it goes), and allocates memory for the surface flux array with a process-specific size. The atm_domain_finalize() routine frees the memory allocated in atm_domain_init().
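The second half of the initialization, handing whole domains to processes, follows the same round-robin pattern. A Python sketch of that bookkeeping (again illustrative, not the article’s code):

```python
# Sketch (Python, illustrative only) of the round-robin assignment of
# domains to processes in atm_domain_init(); dom_cnt counts how many
# domains each process owns so far, which doubles as the local domain
# number of the domain just assigned.
def assign_domains(ndomains, nproc):
    dom_cnt = [0] * nproc
    owner, local_id = [], []
    p = 0
    for d in range(ndomains):
        owner.append(p)
        dom_cnt[p] += 1
        local_id.append(dom_cnt[p])       # 1-based, matching the Fortran
        p = (p + 1) % nproc
    return owner, local_id, dom_cnt
```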

The atm_flux_fill() subroutine fills the declared, allocated, and initialized surface_flux array with bogus values: the longitude coordinate times 1000 plus the latitude coordinate. Stuffing these values into the surface_flux array (to be sent to the land submodel) makes it easy to verify that the fluxes are ending up in the right place on the land side.
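The encoding is trivial but worth writing down, since the land side must decode it to verify delivery. A quick sketch, assuming latitude indices stay below 1000:

```python
# The bogus-flux encoding used for verification: each value names its
# own coordinates, assuming latitude indices stay below 1000.
def encode_flux(lon, lat):
    return lon * 1000 + lat

def decode_flux(value):
    return value // 1000, value % 1000
```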

The atm_domain_print() routine, not shown in the listing, writes out coordinates and surface flux values to a file called atm.out so you can verify that values are being correctly exchanged between submodels. The get_atm_owner_coord(), get_atm_local_domain_id(), get_atm_flux_shape(), and get_atm_surface_flux() routines provide interfaces to information stored in the private data structures declared near the top of the module. The update_atm_flux() subroutine updates the private surface_flux array. Subroutines contained in other modules must access the information owned by the atm module by way of these interfaces.

Finally, the step_atm() subroutine does nothing; it’s a stand-in for all of the physics involved in moving the atmosphere forward one time step. Since we are just interested in the coupling infrastructure, no science routines are shown.

A listing for the lnd module, which represents the land surface submodel, is not printed here, but it is available on the web site. It’s very similar to the atm module: it has initialize and finalize interfaces (lnd_domain_init() and lnd_domain_finalize()); an empty subroutine representing the science routines (step_lnd()); a routine for stuffing bogus values into its surface_flux array (lnd_flux_fill()); a routine for saving off surface_flux values (lnd_domain_print()); and a number of subroutines and functions for accessing the private data structures contained in the module.

The land submodel considers grid points to be land cells, and it’s set up to have 64 such cells in each of its domains instead of the 128 columns used in each atmosphere domain. Only grid points containing land cells are included in the land domain structures. On Earth, the land occupies only about one third of the total grid points at the surface. In the reverse mapping structure (coord_domains) used by the land submodel, non-land points have domain and cell index values of zero. Although the land submodel uses the same processes as the atmosphere submodel, there may be little overlap between atmosphere columns and land cells on the same process because the domain sizes are different and the land has far fewer cells than the atmosphere has columns.
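The land-side construction can be sketched the same way. The fragment below is a simplified illustration, not the lnd module’s exact assignment (it groups land cells in consecutive blocks for brevity): only masked land points receive domain and cell numbers, while ocean points keep zeros in the reverse map.

```python
# Simplified sketch (Python) of land-side domain construction: only
# masked land points get domain/cell numbers; ocean points keep zeros
# in the reverse map. Blocked grouping is an assumption made here for
# brevity, not necessarily the lnd module's scheme.
def build_land_domains(land_mask, dom_cells):
    cells = [pt for pt, m in sorted(land_mask.items()) if m == 1]
    coord_map = {pt: (0, 0) for pt in land_mask}   # zeros mean "no land cell"
    domains = [cells[k:k + dom_cells] for k in range(0, len(cells), dom_cells)]
    for d, dom in enumerate(domains, start=1):
        for c, pt in enumerate(dom, start=1):
            coord_map[pt] = (d, c)
    return domains, coord_map
```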

The Coupling Module

With the data structures for the atmosphere and land submodels complete and with interfaces in place, you can now create a coupling module to provide the mapping between the columns in the atmosphere and the (two thirds fewer) cells on the land surface. This module will also contain subroutines that perform the actual data exchange between the atmosphere and the land since these mappings are private. Listing Three contains the coupling module that performs these functions.

Listing Three: The coupling module, coupling.f90

module coupling

use comm_interface, only: iam, nproc, comm_real_exchange
use atm, only: get_atm_owner_coord, get_atm_local_domain_id, &
get_atm_flux_shape, get_atm_surface_flux, update_atm_flux
use lnd, only: get_nlnd_domains, get_lnd_ncells_id, &
get_lnd_owner_id, get_lnd_coord_id, get_lnd_local_domain_id, &
get_lnd_flux_shape, get_lnd_surface_flux, update_lnd_flux
implicit none
public coupling_init ! Initialize submodel coupling
public coupling_finalize ! Destroy submodel coupling
public send_atm_to_lnd ! Send data from atm to lnd
public send_lnd_to_atm ! Send data from lnd to atm
save
private

type atm2lnd
integer :: lnd_local_id
integer :: cell
end type atm2lnd
type (atm2lnd), dimension(:,:), allocatable :: atm2lnds

type lnd2atm
integer :: atm_local_id
integer :: column
end type lnd2atm
type (lnd2atm), dimension(:,:), allocatable :: lnd2atms

! lnd->atm and atm->lnd send and receive buffers
real, dimension(:), target, allocatable :: la_sendbuf, la_recvbuf
real, dimension(:), target, allocatable :: al_sendbuf, al_recvbuf
! lnd->atm send/receive counts and displacements
integer, dimension(:), allocatable :: la_blkcnts
integer, dimension(:), allocatable :: la_sndcnts, la_rcvcnts
integer, dimension(:), allocatable :: la_sdispls, la_rdispls
! atm->lnd send/receive counts and displacements
integer, dimension(:), allocatable :: al_blkcnts
integer, dimension(:), allocatable :: al_sndcnts, al_rcvcnts
integer, dimension(:), allocatable :: al_sdispls, al_rdispls

real, dimension(:,:), allocatable :: atm_flux, lnd_flux

contains

subroutine coupling_init()
implicit none
integer :: d, c, p, nlnd_domains, ncells, max_ncells = 0, &
lnd_owner, atm_dom_cnt, atm_dom_cols, lnd_dom_cnt, lnd_dom_cells
integer, dimension(:), allocatable :: lons, lats, &
atm_ids, columns, atm_owners

allocate(la_blkcnts(0:nproc-1), la_sndcnts(0:nproc-1), &
la_rcvcnts(0:nproc-1), al_blkcnts(0:nproc-1), &
al_sndcnts(0:nproc-1), al_rcvcnts(0:nproc-1))
la_blkcnts(:) = 0
la_sndcnts(:) = 0
la_rcvcnts(:) = 0
al_blkcnts(:) = 0
al_sndcnts(:) = 0
al_rcvcnts(:) = 0

nlnd_domains = get_nlnd_domains()
do d = 1, nlnd_domains
ncells = get_lnd_ncells_id(d)
if (ncells > max_ncells) max_ncells = ncells
end do
allocate(lons(max_ncells), lats(max_ncells), &
atm_ids(max_ncells), columns(max_ncells), &
atm_owners(max_ncells))

! Count up cells and columns on each process
do d = 1, nlnd_domains
lnd_owner = get_lnd_owner_id(d)
ncells = get_lnd_ncells_id(d)
call get_lnd_coord_id(d, ncells, lons, lats)
call get_atm_owner_coord(ncells, lons, lats, &
atm_ids, columns, atm_owners)
do c = 1, ncells
if (lnd_owner == iam) then
p = atm_owners(c)
la_blkcnts(p) = la_blkcnts(p) + 1
end if
if (atm_owners(c) == iam) then
p = lnd_owner
al_blkcnts(p) = al_blkcnts(p) + 1
end if
end do
end do

! Allocate the coupling structures
allocate(atm2lnds(0:nproc-1, 1:maxval(la_blkcnts)), &
lnd2atms(0:nproc-1, 1:maxval(al_blkcnts)))
atm2lnds(:,:)%lnd_local_id = 0
atm2lnds(:,:)%cell = 0
lnd2atms(:,:)%atm_local_id = 0
lnd2atms(:,:)%column = 0

! Actually fill the coupling structures
la_blkcnts(:) = 0
al_blkcnts(:) = 0
do d = 1, nlnd_domains
lnd_owner = get_lnd_owner_id(d)
ncells = get_lnd_ncells_id(d)
call get_lnd_coord_id(d, ncells, lons, lats)
call get_atm_owner_coord(ncells, lons, lats, &
atm_ids, columns, atm_owners)
do c = 1, ncells
if (lnd_owner == iam) then
p = atm_owners(c)
la_blkcnts(p) = la_blkcnts(p) + 1
atm2lnds(p,la_blkcnts(p))%lnd_local_id = &
get_lnd_local_domain_id(d)
atm2lnds(p,la_blkcnts(p))%cell = c
end if
if (atm_owners(c) == iam) then
p = lnd_owner
al_blkcnts(p) = al_blkcnts(p) + 1
lnd2atms(p,al_blkcnts(p))%atm_local_id = &
get_atm_local_domain_id(atm_ids(c))
lnd2atms(p,al_blkcnts(p))%column = columns(c)
end if
end do
end do

deallocate(lons, lats, atm_ids, columns, atm_owners)

al_sndcnts(:) = al_blkcnts(:)
la_rcvcnts(:) = al_blkcnts(:)
la_sndcnts(:) = la_blkcnts(:)
al_rcvcnts(:) = la_blkcnts(:)

allocate(al_sendbuf(0:sum(al_sndcnts)-1), &
al_recvbuf(0:sum(al_rcvcnts)-1))
allocate(la_sendbuf(0:sum(la_sndcnts)-1), &
la_recvbuf(0:sum(la_rcvcnts)-1))
allocate(la_sdispls(0:nproc-1), la_rdispls(0:nproc-1), &
al_sdispls(0:nproc-1), al_rdispls(0:nproc-1))

! Build displacements for data exchange
la_sdispls(0) = 0
la_rdispls(0) = 0
al_sdispls(0) = 0
al_rdispls(0) = 0
do p = 1,nproc-1
la_sdispls(p) = la_sdispls(p-1) + la_sndcnts(p-1)
la_rdispls(p) = la_rdispls(p-1) + la_rcvcnts(p-1)
al_sdispls(p) = al_sdispls(p-1) + al_sndcnts(p-1)
al_rdispls(p) = al_rdispls(p-1) + al_rcvcnts(p-1)
end do

! Create surface fluxes
call get_atm_flux_shape(atm_dom_cnt, atm_dom_cols)
call get_lnd_flux_shape(lnd_dom_cnt, lnd_dom_cells)
allocate(atm_flux(atm_dom_cnt, atm_dom_cols))
allocate(lnd_flux(lnd_dom_cnt, lnd_dom_cells))

end subroutine coupling_init
subroutine coupling_finalize()
implicit none

deallocate(la_blkcnts, la_sndcnts, la_rcvcnts, &
al_blkcnts, al_sndcnts, al_rcvcnts)
deallocate(atm2lnds, lnd2atms)
deallocate(al_sendbuf, la_recvbuf, &
la_sendbuf, al_recvbuf)
deallocate(la_sdispls, la_rdispls, &
al_sdispls, al_rdispls)
deallocate(atm_flux, lnd_flux)

end subroutine coupling_finalize
subroutine send_atm_to_lnd()
implicit none
integer :: p, n, atm_id, column, lnd_id, cell, &
lnd_dom_cnt, lnd_dom_cells

call get_atm_surface_flux(atm_flux)

do p = 0, nproc-1
do n = 1, al_blkcnts(p)
atm_id = lnd2atms(p,n)%atm_local_id
column = lnd2atms(p,n)%column
al_sendbuf(al_sdispls(p)+(n-1)) = atm_flux(atm_id, column)
end do
end do

call comm_real_exchange(al_sendbuf, al_sndcnts, &
al_sdispls, al_recvbuf, al_rcvcnts, al_rdispls)

do p = 0, nproc-1
do n = 1, la_blkcnts(p)
lnd_id = atm2lnds(p,n)%lnd_local_id
cell = atm2lnds(p,n)%cell
lnd_flux(lnd_id, cell) = al_recvbuf(al_rdispls(p)+(n-1))
end do
end do

call update_lnd_flux(lnd_flux)

end subroutine send_atm_to_lnd
subroutine send_lnd_to_atm()
! (body omitted; mirrors send_atm_to_lnd() in the opposite direction)
end subroutine send_lnd_to_atm
end module coupling

This module uses the process rank and process count from the communications wrapper module, as well as a subroutine called comm_real_exchange() that performs an MPI_Alltoallv() to exchange data among processes. In addition, coupling uses all of the data structure access interfaces from atm and lnd to build its internal mapping. Initialization and finalization routines are provided along with two subroutines for exchanging fluxes between the atmosphere and the land: send_atm_to_lnd() and send_lnd_to_atm().

Two private, derived types are defined to provide the mapping: atm2lnd and lnd2atm. Two-dimensional arrays of these data types are defined and allocated in the initialization routine. Defined below these are allocatable buffers for data exchange, block counts, send/receive counts, and displacements into the buffers. These counts and displacements are needed by the call to MPI_Alltoallv(), so that it knows how many items in the buffer to send to each process and how far into the buffer to reach to get those items. Local arrays for atmosphere and land fluxes are also created since the coupling routine should not be allowed direct access to the surface_flux arrays in either submodel.

The coupling_init() subroutine allocates memory for all the block sizes and item counts, and initializes them to zero. Then it calls get_nlnd_domains() to obtain the number of land domains from the lnd module. These domains are then stepped through to discover the maximum cell count for any one domain. This requires multiple calls to get_lnd_ncells_id(), which checks the specified domain to find out how many cells it contains. The maximum cell count is used to allocate memory for the local variables lons, lats, atm_ids, columns, and atm_owners, which are used in the next loop.

The next loop steps through all the land domains again, this time counting up the number of cells and columns assigned to each process. The get_lnd_owner_id() routine returns the id of the process owning the domain in question. The number of cells for this domain is obtained by calling get_lnd_ncells_id(), as in the previous loop. Then get_lnd_coord_id() is called to retrieve (from the lnd module) the longitude and latitude indices (stored into lons and lats respectively) given the domain number (d) and cell number (c).

This batch of coordinates is then passed to get_atm_owner_coord() along with ncells to obtain the corresponding atmosphere domain numbers (atm_ids), the column numbers in the domains (columns), and the processes owning the domains (atm_owners) from the atm module. Next, every cell in the batch is examined. If the land cell is owned by the running process (lnd_owner == iam), then the block count for items to be sent from the land to the atmosphere for the atmosphere process owning the corresponding column is incremented. Likewise, if the column is owned by the running process (atm_owners(c) == iam), then the block count for items to be sent from the atmosphere to the land for the land process owning the corresponding cell is incremented.

Now that the number of blocks to be exchanged in both directions is known, the two mapping data structures are allocated to the correct size and initialized to zero. The next loop, again over all land domains, looks the same as the previous one, except that this time the atm2lnds and lnd2atms data structures are filled. It’s somewhat unfortunate that the same loop must be executed twice, but Fortran 90 offers no way to resize an allocated array in place; you must count up the desired sizes of the data structures, allocate them, and then fill them in a second pass.
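The count-then-fill pattern is general enough to sketch on its own. In Python, which could grow lists dynamically, the sketch below deliberately mimics the Fortran constraint (illustrative names only):

```python
# Sketch of the two-pass count-then-fill pattern: pass 1 only counts
# items per destination, the exact-size structures are then "allocated",
# and pass 2 repeats the identical traversal to fill them.
def count_then_fill(items, nproc, owner_of):
    counts = [0] * nproc
    for it in items:                        # pass 1: count
        counts[owner_of(it)] += 1
    buckets = [[None] * c for c in counts]  # allocate to exact size
    cursor = [0] * nproc
    for it in items:                        # pass 2: same order, now fill
        p = owner_of(it)
        buckets[p][cursor[p]] = it
        cursor[p] += 1
    return buckets
```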

Notice that these loops were over every cell in every land domain. That’s because only the grid points containing land are relevant to the coupling. If the loops had been over every column in every atmosphere domain, two thirds of them would have been ignored when establishing the mapping.

Next, the local batch variables are deallocated. Since only one flux value will be sent in either direction for any grid point, the four send and receive counts are set equal to the four block counts. If, on the other hand, five fluxes were sent for each grid point, the send/receive counts would be five times the block counts. Next, the send and receive buffers are allocated along with the send and receive displacement vectors. These displacements are then constructed by counting up the number of items sent to/received from each process.
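The displacement construction is a simple exclusive prefix sum over the counts, which is exactly the layout MPI_Alltoallv() expects. A sketch:

```python
# Each displacement is the running sum of the counts for lower-ranked
# processes (an exclusive prefix sum), as MPI_Alltoallv() expects.
def displacements(counts):
    displs = [0] * len(counts)
    for p in range(1, len(counts)):
        displs[p] = displs[p - 1] + counts[p - 1]
    return displs
```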

Finally, the local flux arrays must be allocated. The sizes of the atmosphere and land surface flux arrays are obtained by calling get_atm_flux_shape() and get_lnd_flux_shape(), respectively. These sizes are subsequently used to allocate atm_flux and lnd_flux before coupling_init returns.

The coupling_finalize() routine deallocates all the memory set aside by coupling_init(). The send_atm_to_lnd() subroutine first calls get_atm_surface_flux() to obtain a local copy of the flux from the atmosphere, and then uses the lnd2atms data structure to disassemble this flux array into a send buffer. Next, comm_real_exchange() is called to send data from all processes to all processes. Then the receive buffer is reassembled into lnd_flux, the local copy of the surface flux for the land, using the atm2lnds data structure. The new land flux is then passed to the lnd module by calling update_lnd_flux(). The send_lnd_to_atm() subroutine, which is not shown in Listing Three, does the same thing as send_atm_to_lnd() in reverse.
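The overall pack/exchange/unpack round trip can be sketched without MPI at all. The toy alltoallv() below regroups per-destination slices in a single process, which is the same reshuffling MPI_Alltoallv() performs across ranks (illustrative only, not the article’s code):

```python
# Toy stand-in for MPI_Alltoallv(): sendbufs[src][dst] is what rank src
# sends to rank dst; the result's [dst][src] entry is what rank dst
# receives from rank src. One process plays all ranks here, but the
# reshuffle is the same one MPI performs across processes.
def alltoallv(sendbufs):
    nproc = len(sendbufs)
    return [[sendbufs[src][dst] for src in range(nproc)]
            for dst in range(nproc)]
```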

Notice that the counts and displacements used in the calls to comm_real_exchange() were already established in coupling_init(). Once initialized, these counts and displacements don’t need to be recomputed every time a data exchange occurs. This speeds up the processing involved in exchanging data between the atmosphere and the land.

The comm_interface Module

Listing Four contains the code for the comm_interface module. Except for the initialization and finalization routines, you’ve already seen where these subroutines and the two variables (iam and nproc) are used. Since all the real message passing calls are isolated in this relatively small module, this is the only file which must be changed when transitioning to another message passing API.

Listing Four: The comm_interface module, comm_interface.f90

module comm_interface

implicit none
include "mpif.h"
public comm_init ! Initialize MPI
public comm_barrier ! MPI barrier
public comm_bcast_2d_int ! MPI broadcast 2d integers
public comm_real_exchange ! MPI_Alltoallv()
public comm_finalize ! Terminate MPI
integer, public :: iam
integer, public :: nproc
integer :: mpicom

contains

subroutine comm_init()
implicit none
integer :: ier

call mpi_init(ier)
mpicom = mpi_comm_world
call mpi_comm_rank(mpicom, iam, ier)
call mpi_comm_size(mpicom, nproc, ier)

end subroutine comm_init
subroutine comm_barrier()
implicit none
integer :: ier

call mpi_barrier(mpicom, ier)

end subroutine comm_barrier
subroutine comm_bcast_2d_int(buffer, count, root)
implicit none
integer, intent(inout) :: buffer(:,:)
integer, intent(in) :: count, root
integer :: ier

call mpi_bcast(buffer, count, MPI_INTEGER, root, mpicom, ier)

end subroutine comm_bcast_2d_int
subroutine comm_real_exchange(sendbuf, sndcnts, sdispls, &
recvbuf, rcvcnts, rdispls)
implicit none
real, intent(in) :: sendbuf(:)
real, intent(out) :: recvbuf(:)
integer, intent(in) :: sndcnts(:), sdispls(:), rcvcnts(:), rdispls(:)
integer :: ier

call mpi_alltoallv(sendbuf, sndcnts, sdispls, MPI_REAL, &
recvbuf, rcvcnts, rdispls, MPI_REAL, mpicom, ier)

end subroutine comm_real_exchange
subroutine comm_finalize()
implicit none
integer :: ier

call mpi_finalize(ier)

end subroutine comm_finalize

end module comm_interface

The Main Program Unit

Now that all the necessary modules have been constructed, a high level program can simply use these routines as needed for a coupled model. Listing Five contains a program that uses these modules to test the data exchange between submodels.

Listing Five: The main program, model-test.f90

program model_test

use comm_interface, only: comm_init, comm_finalize
use grid, only: land_mask_init
use atm, only: atm_domain_init, atm_flux_fill, &
atm_domain_print, step_atm, atm_domain_finalize
use lnd, only: lnd_domain_init, lnd_flux_fill, &
lnd_domain_print, step_lnd, lnd_domain_finalize
use coupling, only: coupling_init, send_atm_to_lnd, &
send_lnd_to_atm, coupling_finalize
implicit none
integer :: time_step

call comm_init()
call land_mask_init()
call atm_domain_init()
call lnd_domain_init()
call coupling_init()

! Test sending from atmosphere to land
call atm_flux_fill()
call atm_domain_print()
call send_atm_to_lnd()
call lnd_domain_print()

! Test sending from land to atmosphere
!call lnd_flux_fill()
!call lnd_domain_print()
!call send_lnd_to_atm()
!call atm_domain_print()

call coupling_finalize()
call lnd_domain_finalize()
call atm_domain_finalize()
call comm_finalize()

end program model-test

First, communications are initialized with comm_init(), the wrapper for initializing MPI. Next, the land mask is loaded by calling land_mask_init(), and the atmosphere and land domain decompositions are performed through calls to atm_domain_init() and lnd_domain_init(). After the submodel domain decompositions have been performed, the coupling interface can be initialized by calling coupling_init().

To test the coupling, we fill up the atmosphere’s surface_flux array by calling atm_flux_fill(), and then save its coordinates and fluxes to a file by calling atm_domain_print(). Then send_atm_to_lnd() is called to send the relevant flux values to the land submodel, and lnd_domain_print() is called to save the coordinates and fluxes from the land submodel to a file. The next four calls are commented out, but may be uncommented to test sending from the land surface to the atmosphere. Only one test should be performed at a time. Finally, the coupling, the land submodel, the atmosphere submodel, and the communications are all finalized.

This program unit is quite short because clean interfaces were created to all components in the model. These coding practices greatly simplify model maintenance and coupling.

When compiled and run, this code creates two output files, the first parts of which are shown in Figure One. As you can see, the coupled data exchange worked correctly because the bogus flux values contained in lnd.out match the longitude and latitude coordinates of the land cells. Passing flux values from the land to the atmosphere can be tested similarly by commenting out the first test, uncommenting the second test in model-test.f90, recompiling, and running the code again.

Figure One: Some output from the coupling program

[forrest@solar model_coupling]$ head atm.out
0 domain_id= 1 column= 1 lon= 1 lat= 1 flux= 1001.
1 domain_id= 2 column= 1 lon= 2 lat= 1 flux= 2001.
2 domain_id= 3 column= 1 lon= 3 lat= 1 flux= 3001.
3 domain_id= 4 column= 1 lon= 4 lat= 1 flux= 4001.
0 domain_id= 5 column= 1 lon= 5 lat= 1 flux= 5001.
1 domain_id= 6 column= 1 lon= 6 lat= 1 flux= 6001.
2 domain_id= 7 column= 1 lon= 7 lat= 1 flux= 7001.
3 domain_id= 8 column= 1 lon= 8 lat= 1 flux= 8001.
0 domain_id= 9 column= 1 lon= 9 lat= 1 flux= 9001.
1 domain_id= 10 column= 1 lon= 10 lat= 1 flux= 10001.
[forrest@solar model_coupling]$ head lnd.out
0 domain_id= 1 cell= 1 lon= 17 lat= 1 flux= 17001.
1 domain_id= 2 cell= 1 lon= 18 lat= 1 flux= 18001.
2 domain_id= 3 cell= 1 lon= 19 lat= 1 flux= 19001.
3 domain_id= 4 cell= 1 lon= 20 lat= 1 flux= 20001.
0 domain_id= 5 cell= 1 lon= 21 lat= 1 flux= 21001.
1 domain_id= 6 cell= 1 lon= 22 lat= 1 flux= 22001.
2 domain_id= 7 cell= 1 lon= 23 lat= 1 flux= 23001.
3 domain_id= 8 cell= 1 lon= 24 lat= 1 flux= 24001.
0 domain_id= 9 cell= 1 lon= 25 lat= 1 flux= 25001.
1 domain_id= 10 cell= 1 lon= 26 lat= 1 flux= 26001.
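Judging from Figure One, the bogus flux values appear to follow the pattern flux = 1000 * lon + lat, which makes the correctness check easy to script. The following Python sketch is illustrative: the pattern is an assumption read off the figure, and check_line() is a hypothetical helper, not part of the model source.

```python
# Sketch of the output check described above, assuming the fill pattern
# flux = 1000 * lon + lat suggested by Figure One.  check_line() is a
# hypothetical helper for one line of atm.out or lnd.out.

def check_line(line):
    """Return True if a line's flux value matches its lon/lat coordinates."""
    toks = line.split()
    lon = int(toks[toks.index("lon=") + 1])
    lat = int(toks[toks.index("lat=") + 1])
    flux = float(toks[toks.index("flux=") + 1])
    return flux == 1000 * lon + lat

line = "0 domain_id= 1 cell= 1 lon= 17 lat= 1 flux= 17001."
print(check_line(line))  # True
```

Running every line of both output files through a check like this turns the eyeball comparison into an automated regression test for the coupler.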

A more realistic program unit would have a time stepping loop which would step the atmosphere physics one or more times, send fluxes from the atmosphere to the land, step the land physics, and send the fluxes from the land back to the atmosphere. The two test blocks of code in Listing Five could be replaced (keeping the initialization and finalization blocks) with the following lines:

do time_step = 0, 1000
   call step_atm()
   call send_atm_to_lnd()
   call step_lnd()
   call send_lnd_to_atm()
end do

Each iteration of the loop progresses the atmosphere submodel one time step (step_atm()), sends fluxes from the atmosphere to the land (send_atm_to_lnd()), progresses the land submodel one time step (step_lnd()), and sends fluxes from the land to the atmosphere (send_lnd_to_atm()). This program is called model.f90, and is available with all the other source code on the web site.
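The data dependency in that loop can be mocked in a few lines of Python. Everything here is illustrative: MockSubmodel, its relaxation "physics," and the step count are invented for the sketch and stand in for the real step_atm()/step_lnd() routines and MPI exchanges.

```python
# Illustrative mock of the coupled time-stepping loop.  The "physics"
# (relaxing a flux toward the partner's last value) is made up.

class MockSubmodel:
    def __init__(self, name, flux):
        self.name = name
        self.flux = flux      # state exchanged with the partner model
        self.forcing = 0.0    # last flux received from the partner

    def step(self):
        # Stand-in for real physics: relax halfway toward the forcing.
        self.flux += 0.5 * (self.forcing - self.flux)

def run(atm, lnd, nsteps):
    for _ in range(nsteps):
        atm.step()              # step_atm()
        lnd.forcing = atm.flux  # send_atm_to_lnd()
        lnd.step()              # step_lnd()
        atm.forcing = lnd.flux  # send_lnd_to_atm()

atm = MockSubmodel("atm", flux=10.0)
lnd = MockSubmodel("lnd", flux=0.0)
run(atm, lnd, nsteps=50)
print(atm.flux, lnd.flux)  # the two states converge toward a common value
```

Even this toy version shows why the exchange order matters: the land sees the atmosphere's state from the current step, while the atmosphere sees the land's response one half-cycle later, which is precisely the sort of lag a coupler's averaging and accumulation buffers exist to manage.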

Obviously, coupling parallel models is a complex business. That’s why coupling toolkits and frameworks are such a good idea. Nevertheless, good coding practices and consistent programming style go a long way toward easing the burden of coupling and porting large and complex simulation codes.

Forrest Hoffman is a computer modeling and simulation researcher at Oak Ridge National Laboratory. He can be reached at forrest@climate.ornl.gov
