As clusters grow and computational applications scale to more and more processors, input and output (I/O) frequently becomes a performance-limiting bottleneck. While hardware manufacturers continue to improve I/O bandwidth between memory and disk, the distributed memory environment of Linux clusters poses unique challenges for obtaining high performance I/O.
Parallel and cluster file systems attempt to provide the needed scalability by distributing file data on disks around the cluster. In addition, parallel I/O is accessible to applications through MPI-IO, the I/O layer specified in the MPI-2 standard. However, most model developers want a high-level interface that hides the details of collective I/O and distributed file systems, whatever the underlying system may be.
Parallel file systems are finally coming of age, and MPI-IO is available in both MPICH and LAM/MPI, the two most popular MPI implementations for Linux clusters. [The Parallel Virtual File System (PVFS) and rudimentary MPI-IO use were first discussed in this column in July and August of 2002, available online at http://www.linuxmagazine.com/2002-07/extreme_01.html and http://www.linuxmagazine.com/2002-08/extreme_01.html, respectively.] Now PVFS2 is available for testing, and the ROMIO implementation of MPI-IO is available in MPICH2 (see sidebar). Other file system solutions are also available, including Lustre, the Global File System (GFS) from Red Hat, and CXFS from Silicon Graphics (SGI).
Parallel programmers will not, in general, want to program directly to any one of these file systems, since doing so ties their application to a specific file system for I/O. Instead, MPI-IO should be used, because its implementations already provide native access to these file systems. Using MPI-IO makes applications portable to any kind of parallel environment, while providing at least rudimentary parallel I/O.
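As a rough illustration of the idea, the sketch below has each process write its own block of integers at a rank-based offset through MPI-IO's collective interface. The file name data.out and the block size are invented for the example, and error checking is omitted for brevity:

#include <mpi.h>

#define BLOCK 1024   /* integers written per process (assumed) */

int main(int argc, char *argv[])
{
    MPI_File fh;
    MPI_Offset offset;
    int rank, i, buf[BLOCK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < BLOCK; i++)
        buf[i] = rank * BLOCK + i;    /* stand-in for real results */

    /* All processes open the file together, then each writes its
       block at an offset determined by its rank. */
    MPI_File_open(MPI_COMM_WORLD, "data.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    offset = (MPI_Offset)rank * BLOCK * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_INT,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

Because MPI_File_write_at_all() is collective, the MPI-IO layer is free to merge the processes' requests into larger, more efficient operations on the underlying file system.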
In many cases, the model developer may prefer to perform I/O using an interface developed specifically for the file format in common use. These domain-specific file programming interfaces are typically written for sequential I/O, often for use only in serial applications. What’s needed is a parallel implementation of such file interfaces, preferably layered on top of MPI-IO so that it can take advantage of MPI-IO’s collective operations and its interfaces to parallel, cluster, and global file systems.
Enter parallel netCDF and parallel HDF5, versions of the network Common Data Form and the Hierarchical Data Format, respectively. NetCDF was developed by the University Corporation for Atmospheric Research (UCAR) and is maintained by its Unidata branch (http://www.unidata.ucar.edu/packages/netcdf). NetCDF is used by atmospheric scientists and climate modelers. HDF, developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign (http://hdf.ncsa.uiuc.edu), is commonly used by the satellite and remote sensing communities.
Both HDF and netCDF consist of application programming interfaces (APIs) and self-describing file formats containing metadata and data all in a single file. HDF files can contain various objects or files within them, making HDF a more complex and potentially more powerful method for storing data. NetCDF, on the other hand, is simpler. It contains a header where metadata is stored and a subsequent data section containing the array-based data frequently used by earth scientists and modelers. Both formats are binary yet portable to any computer platform. Only parallel netCDF will be described here.
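As a small sketch of what self-description means in practice, the following hypothetical fragment opens a netCDF file (the name sample.nc is invented) and queries the counts of dimensions, variables, and global attributes recorded in its header:

#include <stdio.h>
#include <netcdf.h>

int main(void)
{
    int ncid, ndims, nvars, ngatts, unlimdimid;

    /* Open an existing dataset read-only. */
    if (nc_open("sample.nc", NC_NOWRITE, &ncid) != NC_NOERR)
        return 1;

    /* The header portion of the file records how many dimensions,
       variables, and global attributes the dataset contains. */
    nc_inq(ncid, &ndims, &nvars, &ngatts, &unlimdimid);
    printf("%d dims, %d vars, %d global atts\n", ndims, nvars, ngatts);

    nc_close(ncid);
    return 0;
}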
The Standard netCDF Interface
The netCDF interface and file format have been around for many years. NetCDF data is self-describing, since information about the data is contained in the same file. Many conventions have developed describing the standard tags and attributes that should be contained in netCDF headers. NetCDF data is architecture-independent because data is represented in a standard binary format called eXternal Data Representation, or XDR.
Data can be accessed directly and efficiently through the netCDF API, and data can be appended to a netCDF dataset at a later time. Normally, one process can write to a netCDF file while multiple processes may simultaneously read the same file. However, for most parallel applications this is insufficient.
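For example, the following sketch appends a single record along a file's unlimited dimension; the file name sample.nc and the one-dimensional record variable t are hypothetical, and error checking is again omitted:

#include <netcdf.h>

int main(void)
{
    int ncid, varid, unlimdimid;
    size_t nrecs, start[1], count[1];
    float value = 42.0f;   /* hypothetical new record */

    /* Open for writing and find the current length of the
       unlimited (record) dimension. */
    nc_open("sample.nc", NC_WRITE, &ncid);
    nc_inq_varid(ncid, "t", &varid);
    nc_inq_unlimdim(ncid, &unlimdimid);
    nc_inq_dimlen(ncid, unlimdimid, &nrecs);

    /* Append one new record just past the existing data. */
    start[0] = nrecs;
    count[0] = 1;
    nc_put_vara_float(ncid, varid, start, count, &value);

    nc_close(ncid);
    return 0;
}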
The standard API is available for AIX, HP-UX, IRIX, Linux, Mac OS, OSF1, SunOS, UNICOS, and various versions of Windows. It can be used with C, C++, FORTRAN, and Fortran 90, as well as other programming languages.
To use the standard netCDF API in a parallel application, one must usually ship data to a single process that performs all I/O operations. This is inefficient and cumbersome, particularly if the machine on which the process runs has insufficient memory to hold the otherwise-distributed data. In such a case, data must be brought in a block at a time and written to disk, thus slowing computational processing.
Listing One contains an example program called sequential.c that demonstrates how data generated in parallel is typically written to a netCDF file using the standard interface. (As usual for examples, very little error checking is done for the sake of brevity.) This program assumes it will be run on an even number of MPI processes so that data is evenly divided among them.
In sequential.c, MPI is initialized using MPI_Init() just inside main(). The process rank is collected using MPI_Comm_rank(); the number of processes involved in the computation is found using MPI_Comm_size(); and the processor name is obtained using MPI_Get_processor_name(). Then the first MPI process creates the netCDF output file using nc_create() and defines all the dimensions, variables, and attributes to be contained in the file. The call to nc_enddef() moves the file from define mode to data mode.
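A minimal sketch of this sequence of calls might look like the following; the output file name, dimension, variable, and attribute are hypothetical stand-ins, and the gather-and-write step is elided:

#include <stdio.h>
#include <mpi.h>
#include <netcdf.h>

#define NX 100   /* hypothetical grid size */

int main(int argc, char *argv[])
{
    int rank, nprocs, namelen;
    char procname[MPI_MAX_PROCESSOR_NAME];
    int ncid, dimid, varid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Get_processor_name(procname, &namelen);
    printf("process %d of %d on %s\n", rank, nprocs, procname);

    if (rank == 0) {
        /* Only the first process touches the file. */
        nc_create("output.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "x", NX, &dimid);
        nc_def_var(ncid, "data", NC_FLOAT, 1, &dimid, &varid);
        nc_put_att_text(ncid, varid, "units", 7, "unknown");
        nc_enddef(ncid);   /* leave define mode, enter data mode */

        /* ... data from all processes would be gathered to this
           process and written here ... */

        nc_close(ncid);
    }

    MPI_Finalize();
    return 0;
}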