Job Description
20 days ago
Position Overview
We are dedicated to building scalable, resilient, and secure infrastructure for our growing suite of applications. As a Senior DevOps Engineer, you will architect and maintain the infrastructure that powers our AI solutions while optimizing and automating deployment processes. Your role will span the development lifecycle—from automation and deployment to monitoring and incident management—collaborating closely with software engineers to ensure our environments are efficient, secure, and resilient.
Key Responsibilities
• Architect, deploy, and maintain cloud infrastructure across AWS, DigitalOcean, and GPU-optimized hosting providers.
• Manage Infrastructure as Code (IaC) with Terraform for efficient and reliable infrastructure scaling.
• Build and optimize CI/CD pipelines using GitHub Actions for seamless, automated deployments.
• Containerize applications with Docker and manage orchestration with Kubernetes and ArgoCD, supporting large-scale, high-availability deployments (50+ container environments).
• Drive observability initiatives, including the implementation of monitoring, logging, and tracing solutions, using Kibana, Sentry, and other relevant tools to provide real-time insights into system performance and reliability.
• Collaborate with the team to troubleshoot and resolve production issues, ensuring uptime and responsiveness.
• Strengthen security across our infrastructure through best practices in access control, vulnerability management, and threat response.
• Oversee network architecture to support low-latency, secure connectivity across services and troubleshoot networking issues as needed.
Qualifications
• 3+ years of experience in a DevOps or Site Reliability Engineering role, with a proven record of working on large-scale deployments (50+ container environments).
• Expertise in observability tools for monitoring, logging, and tracing, particularly Kibana and Sentry.
• Proficiency in IaC tools, especially Terraform.
• Hands-on experience with Docker, Kubernetes, and ArgoCD for containerization and orchestration.
• Strong experience with CI/CD pipelines and version control, ideally GitHub Actions.
• Proficiency in working with major cloud providers (AWS, GCP, Azure), along with DigitalOcean and GPU-optimized hosting services.
• Strong understanding of networking principles, cybersecurity best practices, and scalability in cloud environments.
• Knowledge of JavaScript, Golang, and Python for infrastructure scripting.
• Experience with PostgreSQL and Redis is a plus.
• Good command of English and Chinese (written and spoken).
Good to Have
• Previous experience in an AI or machine learning-focused company, especially with infrastructure designed for high-performance GPU computing.
• Ability to manage multi-cloud environments and build cloud-agnostic solutions.
How To Apply
• Interested candidates should press APPLY or send their resume to [via CTgoodjobs Apply Now]
All personal data is collected for recruitment purposes only.
All applications submitted through our system will be delivered directly to the advertiser, and the privacy of applicants' personal data will be protected.