Engineering Manager, Model Serving
Company: Together AI
Location: San Francisco
Posted on: April 2, 2026
Job Description:
Together AI is building the AI Inference & Model Shaping
Platform that brings the most advanced generative AI models to the
world. Our platform powers multi-tenant serverless workloads and
dedicated endpoints, enabling developers, enterprises, and
researchers to harness the latest LLMs and multimodal, image,
audio, video, and reasoning models at scale.

We are looking for an exceptional Engineering Lead to partner
closely with our cross-functional engineering, infrastructure,
research, and sales teams to ensure the excellence of our ML API
offerings. Your primary focus will be on delivering world-class
inference and fine-tuning in our public APIs and customer
deployments by building automation and operations processes. This
role is ideal for a highly motivated and technically adept
individual who excels in fast-paced, dynamic environments.

You will be in charge of designing and scaling our ML processes &
tooling at production scale – optimizing operations to ensure
availability and reliability for our services, across differing
tenants and user loads, and in a multi-cluster deployment. You will
serve as a passionate advocate for internal and external customers,
providing feedback to the wider engineering and infrastructure
teams to improve our systems and core business metrics.

If you thrive in a collaborative, problem-solving
environment and are driven to deliver operational excellence, we
encourage you to apply for this exciting opportunity.

Key Responsibilities
- Own availability and performance SLAs for production inference
  and fine-tuning services across serverless and dedicated
  deployments
- Own & improve testing, deployment, configuration management, and
  monitoring practices for multi-cluster ML infrastructure –
  partnering closely with Infra SREs
- Build self-serve tooling and automation to reduce operational
  toil and enable self-serve offerings
- Define and enforce configuration best practices for inference
  engines (SGLang, TRT-LLM, vLLM, etc.) to prevent runtime issues
- Lead incident response, conduct postmortems, and drive
  reliability improvements
- Mentor team members and potentially grow into hiring/team
  building as the organization scales
- Partner with infrastructure and ML engineering teams to improve
  system reliability and cost efficiency

Required Qualifications
- 5 years operating production ML inference or training systems at
  scale
- 2 years in senior IC or tech lead roles, with demonstrated
  mentorship and technical leadership experience. Having built or
  scaled teams is a plus.
- Deep expertise with Kubernetes, multi-cluster orchestration, and
  ML serving frameworks
- Experience with multi-tenant SaaS platforms
- Proven track record of SLA ownership with specific metrics (99.9%
  uptime, p99 latency targets)
- Customer escalation and incident communication experience
- Experience with LLM inference serving systems (SGLang, vLLM,
  TRT-LLM, or similar)
- Ability to influence cross-functional teams and make
  deployment/architecture decisions

Nice to Have
- Experience building internal developer platforms or self-serve
  tooling
- Background in cost optimization for GPU infrastructure
- Contributions to open-source ML infrastructure projects

About Together AI
Together AI is a research-driven artificial
intelligence company. We believe open and transparent AI systems
will drive innovation and create the best outcomes for society, and
together we are on a mission to significantly lower the cost of
modern AI systems by co-designing software, hardware, algorithms,
and models. We have contributed to leading open-source research,
models, and datasets to advance the frontier of AI, and our team
has been behind technological advancements such as FlashAttention,
Hyena, FlexGen, and RedPajama. We invite you to join a passionate
group of researchers on our journey to build the next-generation
AI infrastructure.

Compensation
We offer competitive compensation, startup equity, health
insurance, and other competitive benefits. The US base salary range
for this full-time position is $250,000 - $300,000, plus equity and
benefits. Our salary ranges are determined by location, level, and
role. Individual compensation will be determined by experience,
skills, and job-related knowledge.

Equal Opportunity
Together AI is an Equal Opportunity Employer and is
proud to offer equal employment opportunity to everyone regardless
of race, color, ancestry, religion, sex, national origin, sexual
orientation, age, citizenship, marital status, disability, gender
identity, veteran status, and more. Please see our privacy policy
at https://www.together.ai/privacy