Staff Software Engineer - AI In-Market Engineering
DDN
Date: 1 day ago
City: Remote
Contract type: Full time

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.
"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC
“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA
DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.
Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.
Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.
Job Description
As a Staff Software Engineer - AI In-Market Engineering, you’ll be the final escalation point for the most complex and critical issues affecting enterprise and hyperscale environments. This hands-on role is ideal for a deep technical expert who thrives under pressure and is passionate about solving distributed-systems challenges at scale.
You’ll collaborate with Engineering, Product Management, and Field teams to drive issues to root-cause resolution, define architectural best practices, and continuously improve product resiliency. Using AI tools and automation, you’ll reduce time-to-resolution, streamline diagnostics, and elevate the support experience for strategic customers.
Key Responsibilities
Technical Expertise & Escalation Leadership
- Own critical customer case escalations end-to-end, including deep root cause analysis and mitigation strategies.
- Act as the highest technical escalation point for Infinia support incidents, especially in production-impacting scenarios.
- Lead war rooms, live incident bridges, and cross-functional response efforts with Engineering, QA, and Field teams.
- Utilize AI-powered debugging, log analysis, and system pattern recognition tools to accelerate resolution.
- Become a subject-matter expert on Infinia internals: metadata handling, storage fabric interfaces, performance tuning, AI integration, etc.
- Reproduce complex customer issues and propose product improvements or workarounds.
- Author and maintain detailed runbooks, performance tuning guides, and RCA documentation.
- Feed real-world support insights back into the development cycle to improve reliability and diagnostics.
- Partner with Field CTOs, Solutions Architects, and Sales Engineers to ensure customer success.
- Translate technical issues into executive-ready summaries and business impact statements.
- Participate in post-mortems and executive briefings for strategic accounts.
- Drive adoption of observability, automation, and self-healing support mechanisms using AI/ML tools.
Qualifications
- 8+ years in enterprise storage, distributed systems, or cloud infrastructure support/engineering.
- Deep understanding of file and object storage interfaces (POSIX, NFS, S3), storage performance, and Linux kernel internals.
- Proven debugging skills at the system, protocol, and application levels (e.g., strace, tcpdump, perf).
- Hands-on experience with AI/ML data pipelines, container orchestration (Kubernetes), and GPU-based architectures.
- Exposure to RDMA, NVMe-oF, or high-performance networking stacks.
- Exceptional communication and executive reporting skills.
- Experience using AI tools (e.g., log pattern analysis, LLM-based summarization, automated RCA tooling) to accelerate diagnostics and reduce MTTR.
- Experience with DDN, VAST, Weka, or similar scale-out file systems.
- Strong scripting/coding ability in Python, Bash, or Go.
- Familiarity with observability platforms: Prometheus, Grafana, ELK, OpenTelemetry.
- Knowledge of replication, consistency models, and data integrity mechanisms.
- Exposure to sovereign AI, LLM training environments, or autonomous-system data architectures.
Success Metrics – First 30 Days
- Technical Ramp-Up
  - Complete Infinia training, labs, and architecture deep dives.
  - Stand up a fully functioning Infinia test system.
  - Shadow at least 5 complex escalations and participate in 2 customer calls.
- Operational Integration
  - Lead one live incident response and deliver a full RCA within 48 hours.
  - Propose 3+ enhancements to internal tools, AI/automation usage, or documentation.
  - Establish key partnerships with Engineering and Field teams.
- Strategic Insight
  - Deliver a written 30-day reflection identifying gaps and offering high-impact recommendations.
  - Begin identifying patterns where AI or automation can reduce MTTR or improve proactive detection.
Success Metrics – Ongoing
- MTTR on high-severity cases consistently below internal SLAs.
- Volume and quality of resolved L4 escalations.
- Strategic tooling or automation contributions adopted across the support org.
- Executive-ready RCAs that inform product improvement.
- High-impact engagements with strategic accounts (prevention, performance tuning, etc.).
Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Our flat organizational structure encourages hands-on involvement and direct contributions to our mission, giving you the opportunity to work across many areas of the company. Leadership is earned by those who take initiative and consistently deliver outstanding results, in both work ethic and deliverables, so strong prioritization skills are essential. We also value strong communication skills in all our engineers and researchers, as they are crucial to the success of our teams and the company as a whole.
Interview Process: After you submit your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview in which a member of our team will ask some basic questions. If you clear that interview, you will enter the main process, which can consist of up to four interviews in total:
- Coding assessment: Typically in a language of your choice.
- Systems design (depending on role): Translate high-level requirements into a scalable, fault-tolerant service.
- Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
- Meet and greet with the wider team.
Our goal is to complete the main process within 2-3 weeks.