HPC Platform Engineer

  • Full Time
  • Malabe

Millennium IT

Job Description
HPC Platform Engineer
The firm is developing a cutting-edge high-performance computing (HPC) platform to support our portfolio managers, developers, quantitative analysts, and data scientists, enabling them to scale compute capacity seamlessly both on-premises and in the cloud. We seek a senior, hands-on engineer who is customer-focused and an advocate for customer-driven solutions. The ideal candidate will have a strong understanding of physical and cloud-based infrastructure, experience automating infrastructure, and proficiency in service and infrastructure lifecycle management. They will engage with teams to understand their requirements, drive development of our HPC platforms, and collaborate with other teams on integration. The candidate should also have expertise in Linux systems administration, container orchestration, networking, security, and infrastructure as code. Experience integrating HPC with storage and data platforms, and testing and optimizing those integrations, is also essential.

Principal Responsibilities

Collaborate within a customer-focused team to design, develop, test, and deploy HPC infrastructure in alignment with business needs.
Foster strong relationships with quantitative, software engineering, and data science teams to ensure the HPC platforms effectively meet their requirements.
Engage with business units to promote understanding and drive adoption of our HPC offerings.

Qualifications/Desired Skills

Deep understanding of Linux operating systems, with substantial practical experience in performance tuning for HPC workloads.
Experience consulting with business units on the execution of HPC workloads.
Experience with HPC cluster schedulers, such as Slurm, Grid Engine, Moab, and PBS.
Experience with dynamic scaling, partitioning, and resource management within HPC environments.
Strong understanding of, and hands-on experience with, containers, container runtimes, and container orchestration platforms such as Kubernetes.
Experience contributing to a shared code base, including infrastructure as code.
Experience with configuration management and automation tools, such as Chef, Ansible, Salt, and Packer.
Experience building monitoring and alerting based on logs and metrics.
Excellent written and verbal communication skills.
Excellent troubleshooting and analytical skills.
Self-starter able to work independently, meet deadlines, and perform under pressure.

To apply for this job, email your details to cv@ezjobs.online
