As clusters grow and computational applications scale to more and more processors, input and output (I/O) frequently becomes the bottleneck that limits performance. While hardware manufacturers continue to improve I/O bandwidth between memory and disk, the distributed-memory environment of Linux clusters poses unique challenges for obtaining high-performance I/O.
Parallel and cluster file systems attempt to provide the needed scalability by distributing file data across disks around the cluster. In addition, parallel I/O is accessible to applications through MPI-IO, the I/O layer specified in the MPI-2 standard. However, most model developers want a high-level interface that handles the details of collective I/O and distributed file systems, whichever file system happens to be underneath.
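To make the idea of distributing file data across disks concrete, here is a minimal sketch of round-robin striping, one common way a parallel file system can spread a single logical file over several I/O servers. The function names `locate` and `split_request`, the 64 KB stripe size, and the four-server count are all hypothetical choices for illustration; this is not the actual layout code of PVFS or any other file system named in this column.

```python
def locate(offset, stripe_size=65536, num_servers=4):
    """Map a byte offset in the logical file to (server, local_offset).

    Assumes simple round-robin striping: stripe 0 goes to server 0,
    stripe 1 to server 1, and so on, wrapping around.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    stripe = offset // stripe_size            # which stripe holds this byte
    server = stripe % num_servers             # round-robin server choice
    local_stripe = stripe // num_servers      # stripes already on that server
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


def split_request(offset, length, stripe_size=65536, num_servers=4):
    """Break one contiguous request into (server, local_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local_offset = locate(offset, stripe_size, num_servers)
        # Stop at the stripe boundary or the end of the request, whichever is first.
        nbytes = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((server, local_offset, nbytes))
        offset += nbytes
    return pieces
```

Under these assumptions, a request that crosses a stripe boundary is served by more than one server, which is where the aggregate bandwidth comes from: `split_request(60000, 10000)` yields one piece on server 0 and one on server 1.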
Parallel file systems are finally coming of age, and MPI-IO is available in both MPICH and LAM/MPI, the two most popular MPI implementations for Linux clusters. [The Parallel Virtual File System (PVFS) and rudimentary MPI-IO use were first discussed in this column in July and August of 2002, available online at http://www.linuxmagazine.com/2002-07/extreme_01.html and http://www.linuxmagazine.com/2002-08/extreme_01.html, respectively.] Now PVFS2 is available for testing, and the ROMIO implementation of MPI-IO is available in MPICH2 (see sidebar). Other file system solutions are also available, including Lustre, the Global File System (GFS) from Red Hat, and CXFS from Silicon Graphics (SGI).
Parallel programmers will not, in general, want to use a programming interface to any one of these file systems since it ties their application to a specific file system for…