Scalable, Reliable and Open Supercomputer

The HPC cluster universe has shifted decidedly toward blade-based clusters, especially as blade technology has matured.

In the HPC markets that Appro addresses, the innovation process is rapidly changing. Formerly, innovation was driven mainly by individuals working in single disciplines. Today, the biggest advances often come from multidisciplinary collaborations. Studying disease pathways through the body requires knowledge of physics, chemistry, biology, and in some cases, nanotechnology. Designing and manufacturing physical products increasingly requires the ability to look concurrently at interdependent factors, including functional efficiency, manufacturability, serviceability, and cost of production. In response to these changing requirements, HPC on Linux clusters can shrink time to insight, time to market, and time to competitive advantage.

A quick look at an Open Supercomputer Architecture – Appro Xtreme-X™

Computational Density, Power and Cooling Efficiency

Appro’s approach is an air-cooled refinement that avoids the cost of more R&D-intensive approaches found in other systems, such as board component-localized liquid cooling or chilled-water cooling through the rack doors. While Appro’s blades can be placed in a standard rack, Appro’s directed air-cooling configuration uses a custom rack designed to save 30% in floor space compared with standard rack placement while cooling the system more efficiently. The key design elements of Appro’s directed air-cooling approach include custom racks paired in rows and married through a sealed, back-side enclosure; front-side, easy-access blades for maintenance; and an under-floor, back-to-front, directed chilled-air supply.

Flexible, High Interconnect Bandwidth, Large Memory Configurations

Appro’s Xtreme-X Supercomputer is based on quad-core Intel Xeon processors. The dual-socket blades allow up to 16 units per subrack and four subracks per rack, for 512 cores per 42U rack configuration. On the interconnect side, Appro gives buyers the option of interconnecting its Xtreme-X blade solution subracks in a fat-tree or 3D torus to match their particular workloads. This is accomplished with pre-allocated space in each subrack for two standard 24-port InfiniBand switches. The blade’s front access panel includes two blade-integrated 4x DDR InfiniBand ports (the second port is optional on the Intel blade), dual GigE, and USB ports. There is also a PCI Express 8x slot that accommodates any PCI Express-based adapter. While the blades local to each subrack assembly are fully switched within it in a fat-tree, the subrack switches are used as edge switches to further interconnect the subracks themselves into a fat-tree or a 3D torus system. The default fat-tree interconnect provides redundant, dual-rail bandwidth within each subrack and single-rail bandwidth between subracks. A fully dual-rail option is made possible by reducing the number of blades per subrack to 12; the torus configurations are likewise limited to 12 blades per subrack.
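A quick sanity check on the density figures above: with quad-core, dual-socket blades at 16 per subrack and four subracks per rack, a 42U rack holds 512 cores, dropping to 384 when a configuration is limited to 12 blades per subrack. A minimal sketch of that arithmetic follows; the counts come from the article, but the helper function is illustrative and not part of any Appro tooling.

# Back-of-the-envelope core counts for the Xtreme-X rack
# configurations described above.

CORES_PER_SOCKET = 4      # quad-core Intel Xeon
SOCKETS_PER_BLADE = 2     # dual-socket blades
SUBRACKS_PER_RACK = 4

def cores_per_rack(blades_per_subrack: int) -> int:
    """Total cores in one 42U rack for a given subrack population."""
    return (CORES_PER_SOCKET * SOCKETS_PER_BLADE
            * blades_per_subrack * SUBRACKS_PER_RACK)

print(cores_per_rack(16))  # default fat-tree: 512 cores per rack
print(cores_per_rack(12))  # fully dual-rail or 3D torus: 384 cores per rack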

Reliability, Availability, and Serviceability

Through its directly cooled custom racks, redundant power factor-corrected power supplies, and redundant blade-resident fans, Appro has created a hardware layer to support RAS operation. In addition, Appro’s Xtreme-X blade solution organizes these reliable hardware components into a redundant, layered system of servers (management, storage, boot, and compute), networks, and storage peripherals called the Appro Cluster Management Architecture (ACMA). This fail-resistant organization is animated and managed by the Appro Cluster Engine (ACE) software. Both offerings are described in more detail in the following section.

ACE – Appro Cluster Engine Management System, Designed Specifically for the Appro Xtreme-X Supercomputer

Appro’s cluster management system consists of the Appro Cluster Management Architecture and the Appro Cluster Engine software. Each component deserves a closer look to properly present the reliability and manageability features of Appro’s new Xtreme-X Supercomputer Series.

Appro Cluster Management Architecture

The Xtreme-X Supercomputer Series is built up in a carefully layered architecture designed to improve overall system manageability and reliability. A base layer of compute blades is allocated in groups underneath a second-tier layer of redundantly configured I/O blades responsible for file serving and for rapidly booting each compute blade group. The compute and I/O blades are interconnected with both a redundant, dual-rail Ethernet network and a dual-rail InfiniBand network. All of the nodes are connected to a NAS storage system. While the compute blades have local disks, those disks do not hold the root file system; they are used for local, temporary storage only.

Having the NAS storage subsystem support root file systems that are then cached on the I/O nodes provides scalability and allows for rapid, standard Linux installs on large diskless systems (whether virtually or actually diskless). This architecture allows the Appro Xtreme-X Supercomputer to boot 100 or 1,000 blades in about the same time. In a layer above the I/O blades, there is a third tier of management blades from which the I/O blades are booted. The management blades are paired for automatic failover, and each keeps a replicated copy of all system information using a Distributed Replicated Block Device (DRBD) architecture. A failure in the master management blade triggers an automatic transfer of control to its partner.
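The claim that boot time stays roughly flat from 100 to 1,000 blades follows from the group structure: each I/O blade pair serves only its own fixed-size group of compute blades, and all groups boot in parallel. Here is a toy model of that behavior; the group size and timings are made-up assumptions for the sketch, not Appro figures.

# Illustrative model (not Appro code) of why the tiered boot design
# scales: each I/O blade pair caches the root image from NAS and serves
# a fixed-size group of compute blades, with all groups booting
# concurrently, so wall-clock time tracks the per-group work rather
# than the total blade count.

GROUP_SIZE = 32          # hypothetical compute blades per I/O blade pair
NAS_FETCH_S = 20.0       # hypothetical one-time fetch of the root image
PER_BLADE_BOOT_S = 1.5   # hypothetical network boot from the I/O cache

def estimated_boot_time(total_blades: int) -> float:
    """All I/O groups boot in parallel; only one group's work serializes."""
    blades_in_group = min(total_blades, GROUP_SIZE)
    return NAS_FETCH_S + blades_in_group * PER_BLADE_BOOT_S

for n in (100, 1000):
    print(f"{n:>5} blades -> ~{estimated_boot_time(n):.0f} s")
# Both lines print the same estimate: groups beyond the first add no
# wall-clock time in this model, matching the claim above.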

Appro Cluster Engine Software

The layered system architecture described earlier is controlled by a collection of service daemons from the Appro Cluster Engine’s graphical user interface. The daemon components include a server and network manager, a queuing system and scheduler, and an overall cluster manager daemon. In addition to typical job scheduling and workload management capabilities, ACE allows system administrators to define any number of logical or virtual clusters (differing in size, underlying blade configuration, operating system version, and so forth) through a revision control-like system, and then allows the administrator to allocate resources from the physical cluster to animate any one of them. ACE provides added flexibility by allowing a single physical resource to be divided at any time into multiple production and test systems that may be changed according to need or time of day. Appro’s management software is designed for “lights out” operation. The GUI is Web based, making it possible to control each Appro Xtreme-X system from any location.
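To make the logical-cluster idea concrete, here is a minimal, hypothetical sketch of how versioned cluster definitions might be stored and then "animated" against a pool of physical blades. All class, field, and method names are invented for illustration and do not reflect ACE’s actual interfaces.

from dataclasses import dataclass, field

@dataclass
class LogicalCluster:
    name: str
    blades: int          # size of the logical cluster
    os_image: str        # operating system version it boots
    revisions: list = field(default_factory=list)

    def commit(self, note: str):
        """Record a new revision of this definition (revision control-like)."""
        self.revisions.append((len(self.revisions) + 1, note))

class PhysicalPool:
    def __init__(self, total_blades: int):
        self.free = total_blades

    def animate(self, cluster: LogicalCluster) -> bool:
        """Allocate physical blades to bring a logical cluster into production."""
        if cluster.blades > self.free:
            return False
        self.free -= cluster.blades
        return True

# One physical resource split into a production and a test system,
# which could be reshuffled by need or time of day.
pool = PhysicalPool(total_blades=64)
prod = LogicalCluster("prod", blades=48, os_image="linux-prod")
test = LogicalCluster("test", blades=16, os_image="linux-next")
prod.commit("initial production definition")
print(pool.animate(prod), pool.animate(test), pool.free)  # True True 0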

Xtreme-X Supercomputer: Ideal Environment

The Xtreme-X Supercomputer is aimed at demanding HPC applications that require higher reliability and better system management. It delivers a balanced, highly available supercomputer cluster building block for a wide range of HPC workloads, including computational fluid dynamics, aerospace and automotive engineering simulations, petroleum exploration and production, scientific visualization for oil discovery and recovery, research in seismic, weather, and environmental sciences, and defense and classified research production.

Conclusion

In summary, designing and manufacturing physical products increasingly requires the ability to look concurrently at interdependent factors, including functional efficiency, manufacturability, serviceability, and cost of production. In response to these changing requirements, the Appro Xtreme-X Supercomputer, based on an open cluster architecture, can shrink time to insight, time to market, and time to competitive advantage.
