Nova
The Nova cluster was launched in 2018 with compute nodes that have Intel Skylake Xeon processors, 1.5TB or 11TB of fast NVMe local storage, and 192GB, 384GB or 3TB of memory. Five of those nodes also have one or two NVIDIA Tesla V100-32GB GPU cards. In 2021 the cluster was expanded with AMD nodes, each with two 32-core AMD EPYC 7502 processors, 1.5TB of fast NVMe local storage and 512GB of memory. The GPU nodes added in that expansion also have four NVIDIA A100 80GB GPU cards each. The 2022 expansion consists of 54 regular compute nodes (each with two 32-core Intel 8358 processors, 1.6TB of local storage and 512GB of memory) and 5 GPU nodes (each with two 24-core AMD EPYC 7413 processors, eight A100 GPU cards, 960GB of local storage and 512GB of memory).
The three service nodes are a login node, a data transfer node and a management node.
Large shared storage consists of six file servers and twelve JBODs configured to provide either 338TB of backed-up storage or 457TB of non-backed-up storage per server. Seven more file servers were added later. The total storage capacity on Nova is about 5PB.
All nodes and storage are connected through a Mellanox EDR (100Gbps) switch.
Some of the above equipment is used for education in a special instruction partition: 8 regular nodes and 3 GPU nodes are dedicated to classes, with additional compute nodes shared between the instruction and scavenger partitions.
Detailed Hardware Specification
Number of Nodes | Processors per Node | Cores per Node | Memory per Node | Interconnect | Local $TMPDIR Disk | Accelerator Card | Job Constraint Flags |
---|---|---|---|---|---|---|---|
72 | Two 18-Core Intel Skylake 6140 | 36 | 192 GB | 100G IB | 1.5 TB | N/A | nova18, intel, skylake, avx512 |
40 | Two 18-Core Intel Skylake 6140 | 36 | 384 GB | 100G IB | 1.5 TB | N/A | nova18, intel, skylake, avx512 |
28 | Two 24-Core Intel Skylake 8260 | 48 | 384 GB | 100G IB | 1.5 TB | N/A | nova18, intel, skylake, avx512 |
2 | Two 18-Core Intel Skylake 6140 | 36 | 192 GB | 100G IB | 1.5 TB | 2x NVIDIA Tesla V100-32GB | nova18, intel, skylake, avx512 |
1 | Two 18-Core Intel Skylake 6140 | 36 | 192 GB | 100G IB | 1.5 TB | 1x NVIDIA Tesla V100-32GB | nova18, intel, skylake, avx512 |
2 | Two 18-Core Intel Skylake 6140 | 36 | 384 GB | 100G IB | 1.5 TB | 2x NVIDIA Tesla V100-32GB | nova18, intel, skylake, avx512 |
1 | Four 16-Core Intel 6130 | 64 | 3 TB | 100G IB | 11 TB | N/A | nova18, intel, skylake, avx512 |
2 | Four 24-Core Intel 8260 | 96 | 3 TB | 100G IB | 1.5 TB | N/A | nova18, intel, skylake, avx512 |
40 | Two 32-Core AMD EPYC 7502 | 64 | 512 GB | 100G IB | 1.5 TB | N/A | nova21, amd, epyc-7502 |
15 | Two 32-Core AMD EPYC 7502 | 64 | 512 GB | 100G IB | 1.5 TB | 4x NVIDIA A100 80GB | nova21, amd, epyc-7502 |
56 | Two 32-Core Intel Icelake 8358 | 64 | 512 GB | 100G IB | 1.6 TB | N/A | nova22, intel, icelake, avx512 |
5 | Two 24-Core AMD EPYC 7413 | 48 | 512 GB | 100G IB | 960 GB | 8x NVIDIA A100 80GB | nova22, amd |
16 | Two 32-Core Intel Icelake 8358 | 64 | 512 GB | 100G IB | 1.5 TB | N/A | nova23, intel, icelake, avx512 |
14 | Two 96-Core AMD EPYC 9654 | 192 | 768 GB | 100G IB | 1.7 TB | N/A | nova24, amd, epyc-9654, avx512 |
3 | Two 96-Core AMD EPYC 9684x | 192 | 768 GB | 100G IB | 1.7 TB | N/A | nova24, amd, epyc-9684x, avx512 |
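The job constraint flags in the last column can be passed to Slurm with the `--constraint` option of `sbatch`. The following is a minimal sketch of a batch script, not an official Nova template: the job name, task count and time limit are placeholder values chosen for illustration, and no partition or account is specified because those depend on your allocation.

```shell
#!/bin/bash
#SBATCH --job-name=constraint-demo   # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4          # placeholder task count
#SBATCH --time=00:10:00              # placeholder time limit
#SBATCH --constraint=icelake         # flag taken from the table above

# Slurm sets TMPDIR to the node-local disk on compute nodes;
# fall back to /tmp so the script also runs outside of a job.
scratch="${TMPDIR:-/tmp}"

echo "Running on node: $(uname -n)"
echo "Local scratch directory: ${scratch}"
```

Saved under a name of your choice (for example `job.sh`), the script would be submitted with `sbatch job.sh`. Slurm also accepts combined constraints, such as `--constraint="nova22&intel"` to require both flags.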
The HPC group schedules regular maintenance between academic terms to update system software and to perform other tasks that require downtime.
The date of the next maintenance is listed in the message of the day displayed at login (when ssh-ing to the cluster).
Note: Queued jobs will not start if their requested wall time would run past the start of the maintenance. In the output of the squeue command, the reason shown for those jobs will be (ReqNodeNotAvail, Reserved for maintenance). The jobs will start after the scheduled outage ends.
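To check whether any of your pending jobs are waiting for this reason, you can filter the `squeue` output for the reason string quoted above. This is a rough sketch: `held_for_maintenance` is a hypothetical helper name, and it assumes the default `squeue` output format, in which the reason appears in the last column.

```shell
# Hypothetical helper: keep only the lines whose reason is ReqNodeNotAvail.
# Reads squeue output on stdin; prints nothing if no job matches.
held_for_maintenance() {
    grep -F "ReqNodeNotAvail" || true
}

# Only query the scheduler when squeue is available (i.e. on the cluster).
if command -v squeue >/dev/null 2>&1; then
    squeue --user="${USER:-$(id -un)}" --states=PENDING | held_for_maintenance
fi
```

If a job is listed and you would like it to run before the outage, resubmitting it with a shorter `--time` limit that ends before the maintenance window may allow it to start.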