Senior ML Performance Engineer
Company: Lemurian Labs
Location: Santa Clara
Posted on: February 18, 2026
Job Description:
At Lemurian Labs, we're on a mission to bring the power of AI to everyone, without leaving a massive environmental footprint. We care deeply about the impact AI has on our society and planet, and we're building a rock-solid foundation for its future, ensuring AI grows sustainably and responsibly. Because let's face it, what good is innovation if it doesn't help the world?

We are building a high-performance, portable compiler that lets developers "build once, deploy anywhere." Yes, anywhere. We're talking about seamless cross-platform compatibility, so you can train your models in the cloud, deploy them to the edge, and everything in between, all while optimizing for resource efficiency and scalability. If the idea of sustainably scaling AI motivates you and you're excited about making AI development both powerful and accessible, then we'd love to have you. Join us at Lemurian Labs, where you can have fun building the future without leaving a mess behind.

The Role

We're looking for a Senior ML Performance Engineer to architect and lead our Performance Testing Platform from the ground up. You'll be the technical authority on how we measure, validate, and optimize the performance of large language models (Llama 3.2 70B, DeepSeek, and others) before and after compiler optimization on modern GPU architectures. This is a high-impact role where you'll directly influence our product quality and our customers' success. You'll work at the intersection of ML systems, GPU architecture, and performance engineering, building the infrastructure that proves our compiler delivers real value.

Here is what you will do:
- Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
- Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
- Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
- Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
- Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
- Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
- Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
- Document best practices for performance testing and optimization of ML workloads on GPU hardware

Essential Skills and Experience:
- BS degree in computer science, computer engineering, electrical engineering, or equivalent practical experience
- 7 years of experience in performance engineering, benchmarking, or systems engineering roles
- Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
- Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
- Strong programming skills in Python and C/C++
- Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
- Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
- Proficiency with profiling and debugging tools for GPU workloads
- Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
- Experience with CI/CD systems and test automation frameworks

Preferred Skills and Experience:
- Master's or PhD degree in computer science, computer engineering, electrical engineering, or equivalent practical experience
- Experience with AMD GPUs (MI200/MI300 series) and the ROCm ecosystem
- Knowledge of compiler optimization techniques and their impact on performance
- Experience with distributed inference and multi-GPU workloads
- Familiarity with ML model quantization, pruning, and other optimization techniques
- Background in high-performance computing or systems-level optimization
- Experience with containerization, orchestration, and infrastructure-as-code tools (Docker, Kubernetes, Terraform)
- Contributions to open-source ML or systems projects

Personal Attributes:
- Obsessive about details: you notice the 2% regression that others miss
- Self-driven: you take ownership and don't wait for permission to solve problems
- Collaborative mindset: you work well across teams and help others succeed
- Passionate about sustainability: you care about making AI more efficient and environmentally responsible
- Clear communicator: you can explain complex technical concepts to both engineers and stakeholders

Salary depends on experience and geographical location. This salary range may span several career levels and will be narrowed during the interview process based on a number of factors, such as the candidate's experience, knowledge, skills, and abilities, as well as internal equity among our team. Additional benefits for this role may include equity; company bonus opportunities; medical, dental, and vision benefits; a retirement savings plan; and supplemental wellness benefits.

Lemurian Labs ensures equal employment opportunity without
discrimination or harassment based on race, color, religion, sex
(including pregnancy, childbirth, or related medical conditions),
sexual orientation, gender identity or expression, age, disability,
national origin, marital or domestic/civil partnership status,
genetic information, citizenship status, veteran status, or any
other characteristic protected by law. EOE