Job accounting
Summary
To ensure that every research group gets its fair share of the cluster, and to account for differences in the hardware being used, we rely on Slurm's built-in job accounting and fairshare system. Because the Condo cluster now operates under the Free Tier model, every group is assigned the same Share. A group's Fairshare score is calculated from its Share compared with the amount of the cluster the group has actually used. This score is then used to set the priority of the group's jobs relative to other jobs on the cluster. This keeps individual groups from monopolizing resources, which would be unfair to groups that have not used their fair share for some time.
Usage Reports
Slurm's sacct command provides accounting data for all jobs and job steps. Refer to the command's man page for more information (man sacct).
The slurm-usage.py command generates CPU usage reports for a specified time frame. Run "slurm-usage.py -h" to see the available options and an example.
Monthly Cluster Usage Reports are placed in /work/<group_working_directory>/ClusterUsage.
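As a rough illustration of what can be done with sacct accounting data, the Python sketch below sums CPU-hours per account from pipe-delimited sacct output. It assumes the output was produced with something like "sacct -X -n -P --format=Account,CPUTimeRAW -S <start> -E <end>"; the function name and the sample records are made up for this example and are not output from Condo or from slurm-usage.py.

```python
from collections import defaultdict


def cpu_hours_by_account(sacct_text):
    """Sum CPU-hours per account from pipe-delimited sacct output.

    Each non-empty line is expected to look like "Account|CPUTimeRAW",
    where CPUTimeRAW is the job's CPU time in seconds.
    """
    totals = defaultdict(float)
    for line in sacct_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        account, cpu_seconds = line.split("|")
        totals[account] += int(cpu_seconds) / 3600.0
    return dict(totals)


# Hypothetical sample records (not real cluster data).
sample = """\
groupA|7200
groupB|3600
groupA|1800
"""

report = cpu_hours_by_account(sample)
for account, hours in sorted(report.items()):
    print(f"{account}: {hours:.2f} CPU-hours")
```

In practice the text would come from running sacct yourself, for example through subprocess, rather than from a hard-coded string.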
Multi-Factor Job Priority Plugin
On Condo we use Slurm's Multi-factor Job Priority plugin. The plugin calculates a job's priority from several factors, including the job's age, size, and partition, as well as the FairShare factor. The following are the priority settings and factor weights:
PriorityType=priority/multifactor
PriorityDecayHalfLife=90-0
PriorityWeightFairshare=1000
PriorityWeightAge=100
PriorityWeightPartition=1000
PriorityWeightJobSize=10
PriorityMaxAge=14-0
PriorityWeightQOS=1
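To show how these weights combine, here is a minimal Python sketch of the weighted sum used by the multifactor plugin, where each factor is a value between 0.0 and 1.0. It is a simplification: it leaves out terms such as the site factor, TRES weights, and the nice adjustment, and the factor values in the example job are invented for illustration.

```python
# Weights copied from the configuration listed above.
WEIGHTS = {
    "fairshare": 1000,  # PriorityWeightFairshare
    "age": 100,         # PriorityWeightAge
    "partition": 1000,  # PriorityWeightPartition
    "job_size": 10,     # PriorityWeightJobSize
    "qos": 1,           # PriorityWeightQOS
}
PRIORITY_MAX_AGE_DAYS = 14  # PriorityMaxAge=14-0


def age_factor(days_pending):
    """Age factor grows linearly and saturates at 1.0 after PriorityMaxAge."""
    return min(days_pending / PRIORITY_MAX_AGE_DAYS, 1.0)


def job_priority(factors):
    """Weighted sum of the factors; a missing factor counts as 0.0."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)


# Hypothetical job: factor values are made up for illustration.
example_job = {
    "fairshare": 0.5,
    "age": age_factor(7),  # pending for 7 of the 14 days
    "partition": 1.0,
    "job_size": 0.1,
    "qos": 0.0,
}

print(f"priority = {job_priority(example_job):.0f}")
```

With these weights the FairShare and partition factors dominate, while job size and QOS act mostly as tie-breakers.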
Slurm's FairShare factor is based mainly on the ratio of the computing resources a user's jobs have already consumed to the shares of computing resources that the user or group has been granted. The higher the value, the fewer shares have been used relative to what was granted, and the higher the job's placement in the queue.
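The idea can be sketched in Python as follows. This is a simplified stand-in, not Slurm's exact algorithm: it decays past usage using the 90-day half-life from PriorityDecayHalfLife and then compares each group's share to its normalized usage. The group names and usage figures are hypothetical.

```python
HALF_LIFE_DAYS = 90  # PriorityDecayHalfLife=90-0


def decayed_usage(cpu_hours, days_ago):
    """Usage counts half as much for every half-life that has passed."""
    return cpu_hours * 0.5 ** (days_ago / HALF_LIFE_DAYS)


def share_to_usage_ratio(norm_shares, norm_usage):
    """Higher ratio means less of the granted share has been consumed."""
    if norm_usage == 0:
        return float("inf")  # no recorded usage at all
    return norm_shares / norm_usage


# Two hypothetical groups with equal shares, as under the Free Tier model.
shares = {"groupA": 0.5, "groupB": 0.5}
usage = {
    "groupA": decayed_usage(1000, days_ago=90),  # old usage, counts as 500
    "groupB": decayed_usage(1000, days_ago=0),   # recent usage, counts as 1000
}

total_usage = sum(usage.values())
ratios = {
    group: share_to_usage_ratio(shares[group], usage[group] / total_usage)
    for group in shares
}

# Groups with a higher ratio are placed ahead in the queue.
for group in sorted(ratios, key=ratios.get, reverse=True):
    print(group, round(ratios[group], 2))
```

Both groups consumed the same raw amount, but groupA's usage is older and has partly decayed, so groupA ends up with the better ratio.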
Job priority can be checked with the sprio command. The sshare command lists groups' shares.
The following slide deck provides more details about how job priority is calculated: Slurm Priority, Fairshare and Fair Tree.