Forget the glossy data sheets and single-number benchmarks. Get the right information to make the right decisions.
Can I Virtualize I/O?
In a traditional HPC cluster, two networks are used to avoid congestion. One network is used for storage and the other is used for MPI (Message Passing Interface) traffic. This method, while functional, fails to take advantage of the true I/O capability of a high performance network. In addition, it adds a layer of cost and complexity (more places for things to fail) to the cluster. One solution to this problem is to place both storage and compute traffic on the same interconnect.
An example of storage/compute convergence is the QLogic VIC (Virtualized I/O Controller) technology. By implementing a unique multi-protocol VIC controller in its InfiniBand switches, QLogic provides transparent access to either Fibre Channel or Ethernet networks. Figures Two and Three illustrate the VIC technology for a database cluster. HPC clusters can enjoy similar advantages.
Figure Two: Database Clustering and I/O Deployment using Traditional FC and Dual GigE Network
Figure Three: QLogic DB Cluster and Consolidated I/O With VIC, multi-protocol controller
Is there Robust Software and Tool Support?
Support for high performance interconnects has always been a challenge. Producing drivers, diagnostics, and middleware is a task that is prone to fragmentation and eventual user frustration. To address this issue, the OpenFabrics Alliance (OFA) was created. Its mission is to develop, distribute, and promote a unified, transport-independent, open-source software stack for RDMA-capable fabrics and networks, including InfiniBand and Ethernet. The open software stack was developed for many hardware architectures and operating systems, including both Linux and Windows. Upper-level protocols in the stack support IP, NAS, SAN, sockets, clustered file systems, and database application environments. Support for MPI is also included. The Alliance also manages interoperability testing of the software stack across many vendor platforms and product types.
The OFA stack is the most robust and full-featured software suite for any high performance interconnect. Ensuring that your vendor supports the OFA stack will provide more up-time and less configure/debug-time. In addition to providing protocol and MPI support, OFA also provides diagnostic, debugging, and subnet management tools all in one place. Figure Four illustrates the various components of the OFA stack plus enhancements like those from QLogic. (Note: the software is referred to as the OpenFabrics Enterprise Distribution, or OFED.) All items in red are sourced from the OFED release. The optional value-added components from QLogic are in aqua and the community and third-party components are in blue.
Figure Four: OFED+ Components
As the above figure illustrates, vendor enhancements are not uncommon in the OFED stack. However, it is important that the base OFED functionality is available as well. The OFED stack can be broken down into a number of categories, outlined below. It is quite unlikely that any single installation will use all the features of the OFED stack, but the robustness of the suite makes it applicable across a broad range of market segments.
- OpenFabrics core and ULPs:
  - IB HCA drivers (mthca, mlx4, ipath, ehca)
  - iWARP RNIC driver (cxgb3, nes)
  - IB core modules
  - Upper Layer Protocols: IPoIB, SDP, SRP initiator and target, iSER initiator and target, RDS, uDAPL, qlgc_vnic, and NFS-RDMA
- OpenFabrics utilities:
  - OpenSM (OSM): InfiniBand Subnet Manager
  - Diagnostic tools
  - Performance tests
- MPI:
  - OSU MPI stack supporting the InfiniBand and iWARP interface
  - Open MPI stack supporting the InfiniBand and iWARP interface
  - OSU MVAPICH2 stack supporting the InfiniBand and iWARP interface
  - MPI benchmark tests (OSU benchmarks, Intel MPI benchmarks, Presta)
- Extra packages:
  - open-iscsi: open-iscsi initiator with iSER support
  - ib-bonding: Bonding driver for IPoIB interface
- Sources of all software modules (under conditions mentioned in the modules' LICENSE files)
- RPM packages
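As a quick way to see which of the drivers named in the list above are present on a given Linux node, the following minimal Python sketch scans /proc/modules-style text. The mapping from driver names to kernel module names (for example, ipath to ib_ipath) is an assumption made for illustration and may differ by kernel version or distribution; the function names are hypothetical and not part of OFED.

```python
# Assumed mapping from the driver names in the list above to the kernel
# module names they are commonly loaded under. This is an illustration,
# not an OFED-defined table; verify against your own distribution.
DRIVER_MODULES = {
    "mthca": ["ib_mthca"],
    "mlx4": ["mlx4_core", "mlx4_ib"],
    "ipath": ["ib_ipath"],
    "ehca": ["ib_ehca"],
    "cxgb3": ["iw_cxgb3"],
    "nes": ["iw_nes"],
}


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def loaded_ofed_drivers(proc_modules_text):
    """Return the sorted driver names whose kernel modules appear loaded."""
    present = loaded_modules(proc_modules_text)
    return sorted(
        driver
        for driver, modules in DRIVER_MODULES.items()
        if any(module in present for module in modules)
    )


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("OFED drivers loaded:", loaded_ofed_drivers(text) or "none found")
```

Run on a compute node, the script prints the subset of listed drivers that appear to be loaded. An empty result does not prove that OFED is absent, only that none of the assumed module names were found.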
It should be noted that vendors like QLogic offer a single Linux driver that works with all QLogic adapters based on the TrueScale architecture. QLogic’s driver is also distributed as part of OFED and is included in the Linux kernel distributions. Beyond OFED, QLogic products provide the following optional capabilities:
- A QLogic accelerated MPI stack: This enhancement provides high performance MPI support for QLogic MPI, Open MPI, HP-MPI, Scali MPI, MVAPICH, and MPICH2.
- Support for QLogic’s InfiniBand Fabric Suite (IFS): This set of tools, which includes FastFabric™, QLogic’s Subnet Manager, and Fabric Viewer, simplifies installation and provides extensive troubleshooting capabilities from a centralized location.
- QLogic enhanced versions of the SCSI RDMA Protocol (SRP) and VNIC are also available, along with a variety of host-based utilities.
It is your turn to ask the questions. Selecting an InfiniBand-based network is more than just clicking on web sites or reading product data sheets. The real test is your application(s). The questions posed above should help guide you in your quest for InfiniBand performance and beyond.