Report to Head of Information and Communication Technology (ICT).
Design and build best-in-class technical infrastructure for delivering AI supercomputing systems according to the required technical architecture, performance, resilience, security, energy efficiency and service levels.
Configure, operate and maintain the storage and data backup infrastructure of the AI supercomputing systems.
Develop and maintain storage and data backup infrastructure policy, standards and procedures of AI supercomputing systems to align with industry standards, frameworks and good practices.
Collaborate with storage and data backup infrastructure suppliers to optimise AI frameworks and their operational environment and libraries to maximise performance and effectiveness for adopting the AI supercomputing systems.
Manage technical infrastructure outsourcing services and their contractual agreements, implement effective measures to improve the effectiveness, compliance and quality of service delivery, and manage the relationship with outsourcing partners.
Define storage and data backup infrastructure technical requirements and interconnect capabilities to optimise for the operations and performance of AI supercomputing systems.
Conduct performance analysis, benchmarking and modelling to identify performance bottlenecks, optimise system parameters and guide technical infrastructure enhancements.
Evaluate emerging AI supercomputing technologies, including GPU processors, network fabrics, storage and interconnects, for the continuous advancement of AI supercomputing systems based on technical requirements, performance characteristics and cost considerations.
Collaborate with subject domain experts to understand the specific requirements of scientific research and data-intensive workloads for using the AI supercomputing systems and propose appropriate technical infrastructure enhancements to manage the workloads.
Provide necessary support to conduct risk assessment, evaluate infrastructure control effectiveness and mitigate associated risks.
Monitor and analyse technical infrastructure events and alerts; identify and respond to infrastructure-related risks, incidents and breaches.
Stay up-to-date with the latest advancements in AI supercomputing hardware, software and industry trends to guide future infrastructure design and technology adoption.
Provide appropriate technical guidance, training and support for effective use of the AI supercomputing systems.
Prepare management information, key metrics and reports for continuous improvement.
10+ years of proven experience as a storage and data backup infrastructure specialist, preferably in an AI supercomputing environment.
Experience in AI supercomputing storage and data backup infrastructure design and implementation.
Experience in designing and optimising AI supercomputing infrastructure and systems for business, scientific, research, or data-intensive applications.
Experience in adopting hardware acceleration technologies such as GPU and NPU.
Understanding of performance analysis and optimisation techniques for parallel computing, including profiling, tracing and performance counters.
Familiarity with industry-standard storage interconnect and fabric design and their impact on the performance of AI supercomputing systems.
Knowledge of parallel programming models and frameworks and their application to AI supercomputing workloads.
Experience with AI supercomputing software stack components, such as compilers, runtime systems, job schedulers, and development libraries.
Experience in using deep learning frameworks such as TensorFlow, PyTorch and Caffe.
Understanding of programming languages such as Python, Java, C++, R and CUDA for building and implementing AI systems.
Good problem-solving abilities and the ability to analyse and address complex performance and scalability challenges.
Ability to adapt to a fast-paced and rapidly evolving technological landscape.
Strong communication and collaboration skills to work effectively with cross-functional teams and subject domain experts.
Proficiency in written and spoken English and Chinese.
Passion for AI technical architecture, infrastructure and systems.
We offer a competitive package to the right candidate. Interested parties, please click "Apply Now" to apply in confidence on or before 13 April 2025 with a full resume, stating your present and expected salary and available date, and quoting the reference.
Applicants who do not hear from us by 30 April 2025 may assume that their applications are unsuccessful.
Further information about Cyberport is available at https://cyberport.hk/
Personal data collected will be treated in the strictest confidence and will be used only for recruitment-related purposes.




