Job description
16 days ago
We are seeking an experienced engineer with a strong background in low-level systems programming and optimization to join our growing ML team at Jane Street.
Our team focuses on the performance optimization of machine learning models, encompassing both training and inference. We are interested in efficient large-scale training, low-latency inference in real-time systems, and high-throughput inference in research.
To achieve this, you will take a whole-systems approach that spans storage systems, networking, and host- and GPU-level considerations. This involves not only improving CUDA performance but also digging deeper into the system to maximize throughput and goodput.
Our ideal candidate has a deep understanding of modern ML techniques and toolsets, as well as the experience and systems knowledge required to debug complex training runs end to end.
Key skills include low-level GPU knowledge of PTX, SASS, warps, cooperative groups, Tensor Cores, and the memory hierarchy. Experience with debugging and optimization using tools like CUDA-GDB, Nsight Systems, and Nsight Compute is also essential.
We use a variety of libraries, including Triton, CUTLASS, CUB, Thrust, cuDNN, and cuBLAS. Familiarity with collective algorithms for distributed GPU training in NCCL or MPI is also desired.
Our team is passionate about using innovative approaches and questioning the effectiveness of our methods. You should be fluent in English and comfortable working in a dynamic environment.
Please note, this job description is intended for Jane Street's internal use only.