Senior Site Reliability Engineer

ICEO - Venture Builder


Date: 9 hours ago
City: Warwick
Contract type: Contractor
Remote

Shape the reliability of an always-on crypto platform that delivers seamless service across North America and Europe.

As an SRE, you'll lead the design of active-active failover, cross-region routing, and distributed services that are resilient to cloud outages and geopolitical quirks. This is real-world chaos engineering at a global scale.
You'll also build a region-aware CI/CD pipeline with canary deployments, automated rollbacks, and feature flags tailored to each continent.

Join us remotely: you can be based anywhere in Europe within the CET/CEST time zones, as our work is 100% remote. This is a full-time position.

About us:

ZND is the simple gateway to digital finance, already trusted by thousands of users who have moved more than €30 million through the platform. Fueled by a token raise, we’re rolling out an AI chat assistant for every action and an instant credit line backed by your digital assets.

Our bold vision is to be the place where anyone can trade, earn, ask, and borrow in seconds - crypto made effortless.

What you will be doing:
  • Set and drive SRE strategy – translate business goals into quarterly reliability targets, track progress, and adjust course as needed.
  • Own GCP / GKE architecture – design, implement, and maintain secure, low-latency, highly available clusters across regions.
  • Automate reliability – build self-healing, auto-scaling, and automated incident-response workflows that minimise manual toil.
  • Embed high availability – partner with engineering and product teams to ship fault-tolerant Node.js/JVM services and predictable releases.
  • Manage SLIs and error budgets – define, monitor, report, and continuously improve service reliability metrics.
  • Execute chaos engineering – plan and run automated fault-injection (e.g., Chaos Mesh) to validate resilience before customers are affected.
  • Lead incidents – coordinate response, run blameless post-mortems, and ensure corrective actions are prioritised and implemented.
  • Capacity and cost planning – forecast growth, right-size resources, and optimise spend without sacrificing performance.
  • Document and share knowledge – create clear architecture diagrams, runbooks, and playbooks to keep the organisation unblocked.
  • Mentor and influence – champion SRE and DevOps best practices.
  • Engage in team rituals – contribute to daily stand-ups, sprint planning, and roadmap reviews to keep reliability work aligned with product goals.
What you need:
  • 6+ years in DevOps/SRE with full platform ownership and risk-based decision-making
  • Kubernetes and Helm in daily use, Docker containerisation, CI/CD pipelines and version control
  • Linux administration on Debian/Ubuntu; strong networking skills covering HTTP(S), DNS, TCP/IP, SSH, firewalls, proxies, load balancers
  • Observability stack: Prometheus, Grafana
  • Production experience with Kafka, Redis, Nginx
  • Hands-on cloud work in GCP, AWS or Azure, including HA/DR design with HPA, KEDA and affinity/anti-affinity rules
  • Proficient in at least one programming language: Python, Go, C++, or Java; operational depth with JVM and Node.js services
  • English proficiency B2+ (written and spoken)
  • Personal traits: high ownership, open-minded, naturally curious, strong communicator
What we offer:
  • Remote-first company - we enable you to work from anywhere in the world.
  • Flexible working hours - we have core working hours (11 am–3 pm CET), with flexible scheduling outside those hours.
  • Paid leave - 38 days of paid vacation per year, plus 14 days of paid sick leave.
  • Join a forward-thinking team where you have the autonomy to make your own choices and explore new ideas.

Our tech stack & methodologies:

  • Automation & IaC: Bash, Python, Go, Terraform
  • Observability: Elasticsearch, Kibana, Fluentd; Prometheus, Grafana; Jaeger, Grafana Tempo
  • CI/CD: Bitbucket Pipelines, ArgoCD
  • Containerization & Orchestration: Docker, Kubernetes, Helm
  • Security: SOPS, Okta, tfsec, Trivy, Istio
  • Stateful Services: PostgreSQL, TimescaleDB, Redis Sentinel, Kafka, NATS
  • Networking: Nginx, Ingress-Nginx
  • Collaboration: Slack, Google Meet, Jira, Confluence, Bitbucket

Salary (B2B): 75,000–90,000 EUR per year

