Share and Share Alike: The State of Cluster Resource Management and Scheduling
It’s time you got your resources in order!
In a perfect world, everyone would have their own supercomputer. Sadly, we don’t live in a perfect world, so clusters and other high-performance computing systems tend to be shared resources. Unfortunately, once the number of simultaneous users on a system reaches double digits, scheduling methods that involve yelling down the hall “Is anybody using the machine now?” become impractical. One solution is to impose batch processing on the user community. This requires all users to submit their work to a central point of control that schedules access to the system. Batch processing allows equitable access to the computing resource (making everyone more or less equally unhappy), and it also lets the system administrators schedule the resource based on the goals and policies of the organization.
Resource Managers
First, some basic terminology. A resource is a finite quantity of something that can be used to do work: processors, memory, disk space, even software licenses. A job is a self-contained quantum of work that requires some resources for a period of time, while a job array is a set of related tasks that can be treated as a unit. A resource manager (also known as a batch system) is software that manages the allocation of resources to jobs. It usually consists of at least two components: a queue server that accepts, classifies, and dispatches jobs; and one or more (usually many) execution servers that spawn jobs and monitor their execution. Most modern resource managers…
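To make the terminology above concrete, here is a minimal Python sketch of how the pieces relate. It is a toy model, not how any real resource manager is built: the class names (Job, ExecutionServer, QueueServer), the job_array helper, the host names, and the strict first-come, first-served dispatch rule are all assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """A self-contained unit of work needing some resources for a period of time."""
    name: str
    needs: dict        # resource name -> amount, e.g. {"cpus": 4}
    walltime_s: int    # requested run time in seconds


@dataclass
class ExecutionServer:
    """Hypothetical execution server: owns finite resources and spawns jobs."""
    host: str
    free: dict         # resource name -> amount currently unallocated
    running: list = field(default_factory=list)

    def can_run(self, job):
        # A job fits only if every resource it asks for is available.
        return all(self.free.get(r, 0) >= n for r, n in job.needs.items())

    def spawn(self, job):
        # Allocate the resources and record the job as running.
        for r, n in job.needs.items():
            self.free[r] -= n
        self.running.append(job)


class QueueServer:
    """Hypothetical queue server: accepts jobs and dispatches them in order."""

    def __init__(self, servers):
        self.servers = servers
        self.pending = deque()

    def submit(self, job):
        self.pending.append(job)

    def dispatch(self):
        """Start queued jobs in order; stop when the head job fits nowhere."""
        started = []
        while self.pending:
            job = self.pending[0]
            server = next((s for s in self.servers if s.can_run(job)), None)
            if server is None:
                break  # strict FIFO (an assumption): do not skip ahead
            server.spawn(job)
            started.append((self.pending.popleft().name, server.host))
        return started


def job_array(name, count, needs, walltime_s):
    """A job array: related tasks created together and treated as a unit."""
    return [Job(f"{name}[{i}]", dict(needs), walltime_s) for i in range(count)]


def demo():
    servers = [ExecutionServer("node01", {"cpus": 4}),
               ExecutionServer("node02", {"cpus": 4})]
    qs = QueueServer(servers)
    for task in job_array("sweep", 3, {"cpus": 2}, 600):
        qs.submit(task)
    qs.submit(Job("big", {"cpus": 4}, 3600))
    return qs.dispatch(), [j.name for j in qs.pending]
```

Calling demo() places the three array tasks on the two nodes and leaves the four-processor job waiting, since neither node has four free processors left. Deciding what to do in that situation (wait, reorder, or reserve resources) is a policy question that this sketch deliberately leaves out.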