A*STAR

HPC AI Engineer, Frontier, NSCC

National Supercomputing Centre

Location

Singapore

Department

National Supercomputing Centre

Posted

2 months ago

Full Job Description

About the role

As our HPC AI Engineer, you will be a key expert supporting researchers in leveraging our new supercomputer system for large-scale artificial intelligence. You will support and optimise massive AI application workloads, working with performance engineers to profile AI applications and establish best practices. Your work will directly enable national-scale projects in multimodal AI, healthcare, and AI for Science.


RESPONSIBILITIES

  • Provide HPC and scientific domain advice to users of NSCC systems.
  • Engage and collaborate with new researchers, communities, and disciplines with computationally intensive requirements.
  • Support and optimise large-scale AI application workloads.
  • Work with HPC performance engineers to profile and build performance models of the AI applications and workflows.
  • Design, develop and implement HPC software best practices for AI applications and workflows.
  • Assist in the planning and design of future HPC systems, including benchmarking AI workloads on various platforms and recommending the most suitable architecture for the research community.
  • Analyse system and user job data for efficient resource allocation and management.
  • Develop HPC utilities, dashboards and automated testing tools for NSCC HPC systems.
  • Develop HPC user and best practice guides for NSCC HPC systems.
  • Stay up to date with developments in scientific domain research, HPC systems, and software technology.

QUALIFICATIONS

  • Bachelor's degree in computer science, computer engineering, or a related field.
  • Proven working knowledge of models and algorithms in at least one of the following areas: generative models, computer vision, graph neural networks, or AI for Science applications.
  • Ideally, 3 years of experience developing code for AI training and inference.
  • Experience setting up AI software stacks and familiarity with a diverse range of them.
  • Good knowledge of AI application performance optimisation and troubleshooting.
  • Strong programming skills in Python; familiarity with C/C++ programming is a plus.
  • Familiarity with how AI frameworks (e.g. PyTorch, TensorFlow, JAX) work and with using them for research.
  • Familiarity with GPU architectures and programming is highly desired.
  • Familiar with the Linux environment, scripting languages, and profiling and debugging tools.
  • Familiar with HPC job schedulers and container technologies.
  • Familiar with object storage (S3); familiarity with HPC storage (Lustre) is a plus.
  • Demonstrated team player with strong problem-solving skills.
  • Demonstrated effective communication skills including the ability to articulate technical concepts to a diverse range of audiences.
  • Demonstrated ability and willingness to contribute novel ideas and approaches in support of the research community.
  • Demonstrated passion for continuous learning and exploring new technologies or domains.