Sr Director of Engineering - Infinia

DDN

Date: 2 weeks ago

City: Remote

Contract type: Full time

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC

“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

Job Description

Sr Director of Engineering - Infinia Distributed Platform

We are looking for an experienced and technically driven Senior Director of Engineering to lead the Infinia Distributed Platform organization, the foundational team powering DDN’s flagship AI-native distributed data platform.

In this role, you will oversee engineering teams responsible for the core systems that enable Infinia’s performance, scalability, and reliability at global scale. This includes mission-critical components such as task scheduling, distributed tracing, memory management, SPDK data access, profiling, networking, reliability, distributed locking, internal key-value stores, and filesystem clients — all orchestrated within a multi-tenant, high-throughput environment.

You will define the strategy, scale execution, and mentor engineering leaders to deliver production-grade systems that meet the demands of AI/ML, high-performance computing, and enterprise analytics.

This is a hands-on technical leadership role at the heart of Infinia’s distributed architecture — where decisions today shape how data moves tomorrow.

Key Responsibilities

Core Systems Leadership

Lead and scale multiple engineering teams focused on critical path components of the Infinia platform:

Task scheduling and orchestration
Tracing and observability infrastructure
Memory management and performance tuning
SPDK-based I/O data path
Reliability and fault-tolerance systems
Networking stack optimization and event-driven IO
TDS (Tenant Data Services) and multi-tenant isolation
DLM (Distributed Lock Manager) and concurrency control
Internal KVStore for system metadata and state
FS client for scalable POSIX-like access

Technical Strategy & Execution

Own the end-to-end architecture, roadmap, and execution for all core components.
Guide technical design reviews, enforce performance standards, and align cross-team priorities to platform milestones.
Collaborate with architecture and infrastructure teams to evolve platform interfaces, service contracts, and internal APIs.

Organizational Growth & Team Development

Hire, mentor, and develop engineering managers and senior ICs to build a culture of accountability, innovation, and technical rigor.
Drive a results-oriented mindset focused on high-velocity, high-reliability software delivery.
Set clear goals and foster professional growth through coaching, feedback, and performance management.

Cross-Functional Collaboration

Partner with product management, field engineering, and customer teams to shape feature priorities and ensure core platform needs are anticipated early.
Interface with support and site reliability teams to define SLAs, improve telemetry, and reduce MTTR for platform incidents.
Contribute to platform-wide initiatives in multi-tenancy, fault isolation, observability, and performance benchmarking.

Platform Reliability & Performance

Champion operational excellence across core services — including incident response, regression testing, and release stability.
Optimize memory usage, lock contention, thread scheduling, and task pipelines to deliver microsecond-level performance where required.
Establish strong internal metrics and observability standards to measure system health, responsiveness, and uptime.

Required Qualifications

12+ years of engineering experience in distributed systems, operating systems, or storage platform engineering.
5+ years of experience leading multi-team organizations delivering core systems software in production environments.
Strong expertise in systems programming (C, C++, Rust) and deep knowledge of concurrency, memory models, and network programming.
Proven track record of designing and scaling services related to task scheduling, locking, memory, and I/O performance.
Experience managing components at the intersection of infrastructure and application performance, especially in multi-tenant platforms.
Excellent communication, roadmap planning, and cross-functional leadership skills.

Preferred Qualifications

Experience with SPDK, RDMA, DPDK, or high-performance storage stacks.
Knowledge of distributed coordination protocols, key-value stores, or scalable metadata architectures.
Background in AI/ML, HPC, or cloud-native infrastructure (Kubernetes, microservices, etc.).
Familiarity with observability tools (e.g., tracing frameworks, profilers, Prometheus, OpenTelemetry).

Success Metrics – First 30 Days

Strategic Alignment

Ramp up on all core components, existing technical challenges, and roadmap priorities.
Meet with team leads and cross-functional partners to assess execution readiness and architectural cohesion.

Early Impact

Identify 2–3 areas for performance optimization, team structure refinement, or architectural alignment.
Deliver a 90-day strategy plan outlining key initiatives across reliability, latency, and scalability.

Team Integration

Build trust and alignment with engineering managers and ICs.
Assess hiring needs and begin shaping the next phase of team growth.

Success Metrics – Beyond 30 Days

Timely, high-quality delivery of core platform milestones aligned to product roadmap.
Improvements in performance, fault-tolerance, and memory/network efficiency across key subsystems.
Clear reduction in escalations, latency spikes, and cross-component coordination complexity.
Team health, engagement, and velocity aligned with long-term technical and business goals.

DDN

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

Coding assessment: Often in a language of your choice.
Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
Meet and greet with the wider team.

Our goal is to finish the main process in 2-3 weeks at most.

DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.

Join us to lead the engineering teams responsible for the very heartbeat of a world-class, AI-native data platform — where every task, trace, and lock matters at scale.

Apply now to shape the foundation of tomorrow’s data intelligence with DDN Infinia.
