Job description
16 days ago
We are seeking an experienced engineer with a strong background in low-level systems programming and optimization to join our growing ML team at Jane Street.
Our team focuses on the performance optimization of machine learning models, encompassing both training and inference. We are interested in efficient large-scale training, low-latency inference in real-time systems, and high-throughput inference in research.
To achieve this, you will take a whole-systems approach, covering storage systems, networking, and host- and GPU-level considerations. This involves not only improving CUDA performance but also delving deeper into the system to ensure optimal throughput and goodput.
Our ideal candidate has a deep understanding of modern ML techniques and toolsets, as well as the experience and systems knowledge required to debug complex training runs end to end.
Key skills include low-level GPU knowledge of PTX, SASS, warps, cooperative groups, Tensor Cores, and the memory hierarchy. Experience with debugging and optimization using tools like CUDA-GDB, Nsight Systems, and Nsight Compute is also essential.
We use a variety of libraries, including Triton, CUTLASS, CUB, Thrust, cuDNN, and cuBLAS. Familiarity with collective algorithms for distributed GPU training in NCCL or MPI is also desired.
Our team is passionate about using innovative approaches and questioning the effectiveness of our methods. You should be fluent in English and comfortable working in a dynamic environment.
Please note, this job description is intended for Jane Street's internal use only.