Job Description
16 days ago
We are seeking an experienced engineer with a strong background in low-level systems programming and optimization to join our growing ML team at Jane Street.
Our team focuses on the performance optimization of machine learning models, encompassing both training and inference. We are interested in efficient large-scale training, low-latency inference in real-time systems, and high-throughput inference in research.
To achieve this, you will take a whole-systems approach spanning storage systems, networking, and host- and GPU-level considerations. This means improving CUDA performance and also digging deeper into the system to ensure optimal throughput and goodput.
Our ideal candidate has a deep understanding of modern ML techniques and toolsets, as well as the experience and systems knowledge required to debug complex training runs end to end.
Key skills include low-level GPU knowledge of PTX, SASS, warps, cooperative groups, Tensor Cores, and the memory hierarchy. Experience with debugging and optimization using tools like CUDA-GDB, Nsight Systems, and Nsight Compute is also essential.
We use a variety of libraries, including Triton, CUTLASS, CUB, Thrust, cuDNN, and cuBLAS. Familiarity with collective algorithms for distributed GPU training in NCCL or MPI is also desired.
Our team is passionate about using innovative approaches and questioning the effectiveness of our methods. You should be fluent in English and comfortable working in a dynamic environment.
Please note, this job description is intended for Jane Street's internal use only.