Eliminate Micromanagement of Project Job Queues

Job run tokens are often used in shared grids to ensure a controlled balance of grid resources for each project. As each job is dispatched to the worker node(s), a token is consumed. In a heavily used grid, this ensures that all projects have guaranteed access to their allocated resources. However, it also requires project teams to micromanage their submission rate: they don't want to consume all their tokens early in the day and then have no rights to grid capacity later on. Micromanaging job submission, dispatch and cancellation is time-consuming and requires human intervention.

Because SmartSuspend is able to immediately suspend a running job, the need for human micromanagement of job queues is removed. Grid utilization is significantly increased while project-level SLAs remain guaranteed, and expensive human effort is no longer spent on the trivia of managing individual jobs. The key is to sponge spare capacity from the grid while immediately making room for any project job that has a scheduling token.

SmartSuspend accomplishes this efficiency through seamless integration with the queuing system. A lower-priority "sponge" queue is created; any job executed from that queue does not consume a project token. A normal-priority queue is used for jobs that consume a project token when they are dispatched. Whenever a job runs from the sponge queue, it is making use of otherwise idle resources. When a normal job is ready to run and no slot is free, a sponge job is suspended to make room for it.
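
The two-queue policy described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not SmartSuspend's or any queuing system's actual interface: the names Job, Grid, slots and tokens are hypothetical, the grid is reduced to a fixed number of identical slots, and suspension is modelled as a simple state change.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    queue: str               # "normal" (consumes a token) or "sponge" (does not)
    state: str = "pending"   # pending, running, suspended or done


@dataclass
class Grid:
    slots: int               # hypothetical: number of identical execution slots
    tokens: int              # hypothetical: project tokens still available
    jobs: list = field(default_factory=list)

    def running(self):
        return [j for j in self.jobs if j.state == "running"]

    def submit(self, job):
        self.jobs.append(job)
        self.schedule()

    def finish(self, job):
        job.state = "done"
        self.schedule()

    def schedule(self):
        # Step 1: dispatch pending normal jobs while tokens remain,
        # suspending a running sponge job if every slot is occupied.
        for job in self.jobs:
            if job.queue != "normal" or job.state != "pending" or self.tokens == 0:
                continue
            if len(self.running()) >= self.slots:
                victim = next((j for j in self.running() if j.queue == "sponge"), None)
                if victim is None:
                    break  # all slots hold normal jobs; this one must wait
                victim.state = "suspended"
            job.state = "running"
            self.tokens -= 1  # a token is consumed at dispatch time

        # Step 2: soak up idle slots with sponge jobs at no token cost,
        # resuming suspended jobs before starting new ones.
        for wanted in ("suspended", "pending"):
            for job in self.jobs:
                if len(self.running()) >= self.slots:
                    return
                if job.queue == "sponge" and job.state == wanted:
                    job.state = "running"


if __name__ == "__main__":
    grid = Grid(slots=2, tokens=1)
    s1, s2, n1 = Job("s1", "sponge"), Job("s2", "sponge"), Job("n1", "normal")
    for j in (s1, s2, n1):
        grid.submit(j)
    print([(j.name, j.state) for j in grid.jobs], "tokens left:", grid.tokens)
```

In this model the sponge jobs fill both slots for free, the normal job displaces one of them as soon as it arrives, and the displaced job resumes once a slot is idle again. A real deployment would rely on the queuing system's own priority and preemption settings rather than on code like this.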

The project team can choose which queue to use at submission time: a job either is guaranteed to run using a project token, or runs only when there are spare resources. Project tokens are therefore spent on the project's most important work, with other work being accomplished on a best-effort basis. The project team can also move jobs between the queues as required, for example promoting jobs from the sponge queue to the guaranteed queue towards the end of the day to consume any remaining tokens.
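
The end-of-day promotion mentioned above amounts to choosing as many sponge jobs as there are unused tokens. The following is a minimal sketch of that selection, assuming each sponge job carries a numeric priority; the function name, the (name, priority) tuple layout and the "higher number means more important" convention are all hypothetical and do not come from any real queuing system.

```python
def pick_jobs_to_promote(sponge_jobs, tokens_left):
    """Split sponge jobs into (promote, keep) given the unused token count.

    sponge_jobs: list of (name, priority) tuples; higher priority is more
    important (an assumption made for this illustration).
    """
    budget = max(tokens_left, 0)
    # sorted() is stable, so jobs with equal priority keep submission order.
    ranked = sorted(sponge_jobs, key=lambda job: job[1], reverse=True)
    return ranked[:budget], ranked[budget:]


if __name__ == "__main__":
    waiting = [("regress-a", 1), ("signoff", 5), ("regress-b", 1), ("lint", 3)]
    promote, keep = pick_jobs_to_promote(waiting, tokens_left=2)
    print("promote:", promote)
    print("keep in sponge queue:", keep)
```

The jobs returned in the first list would then be moved to the guaranteed queue with whatever job-modification command the queuing system provides; the rest stay in the sponge queue and continue to run on a best-effort basis.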