GRI proposal funded
Dec 2, 2018
Monitoring Health Status of High Performance Computing Systems
Monitoring data centers is challenging due to their size, complexity, and dynamic nature. This project proposes a visual approach for situational awareness and health monitoring of high-performance computing systems. The visualization requirements span three dimensions: 1) the spatial layout of the high-performance computing system, 2) the temporal domain (historical vs. real-time tracking), and 3) system health services such as temperature, CPU load, memory usage, fan speed, and power consumption. We demonstrate the developed prototype on a medium-scale data center of 10 racks and 467 hosts.
The work was developed in collaboration with both industrial and academic domain experts:
- Dr. Yong Chen, Department of Computer Science, Texas Tech University.
- Dr. Alan Sill, Managing Director of HPCC; Co-Director, NSF CAC.
- Jon Hass, SW Architect at Dell Inc.; Chairman of the board, DMTF.
- Ngan Nguyen, PhD student, Department of Computer Science, Texas Tech University.
- Ghazanfar Ali, PhD student, Department of Computer Science, Texas Tech University.