Senior Site Reliability Engineer

ICEO - Venture Builder


Date: 6 hours ago
City: Warwick
Contract type: Contractor
Remote

Shape the reliability experience of an always-on crypto platform delivering seamless service across North America and Europe.

As an SRE, you'll be leading the charge in designing active-active failover, cross-region routing, and distributed services that are resilient to cloud outages and geopolitical quirks. This is real-world chaos engineering at a global scale.
You’ll lead the creation of a region-aware CI/CD pipeline with canary deployments, automated rollbacks, and feature flags tailored per continent.

Join us remotely: our work is 100% remote, and you can be located anywhere in Europe within the CET/CEST time zones. This is a full-time position.

About us:

ZND is the simple gateway to digital finance, already trusted by thousands of users who have moved over €30 million through the platform. Fueled by a token raise, we’re rolling out an AI chat assistant for every action and an instant credit line on your digital assets.

Our bold vision is to be the place where anyone can trade, earn, ask, and borrow in seconds - crypto made effortless.

What you will be doing:
  • Set and drive SRE strategy – translate business goals into quarterly reliability targets, track progress, and adjust course as needed.
  • Own GCP / GKE architecture – design, implement, and maintain secure, low-latency, highly available clusters across regions.
  • Automate reliability – build self-healing, auto-scaling, and automated incident-response workflows that minimise manual toil.
  • Embed high availability – partner with engineers and product to ship fault-tolerant Node.js/JVM services and predictable releases.
  • Manage SLIs and error budgets – define, monitor, report, and continuously improve service reliability metrics.
  • Execute chaos engineering – plan and run automated fault-injection (e.g., Chaos Mesh) to validate resilience before customers are affected.
  • Lead incidents – coordinate response, run blameless post-mortems, and ensure corrective actions are prioritised and implemented.
  • Capacity and cost planning – forecast growth, right-size resources, and optimise spend without sacrificing performance.
  • Document and share knowledge – create clear architecture diagrams, runbooks, and playbooks to keep the organisation unblocked.
  • Mentor and influence – champion SRE and DevOps best practices.
  • Engage in team rituals – contribute to daily stand-ups, sprint planning, and roadmap reviews to keep reliability work aligned with product goals.
What you need:
  • 6+ years in DevOps/SRE with full platform ownership and risk-based decision making
  • Kubernetes and Helm in daily use, Docker containerisation, CI/CD pipelines and version control
  • Linux administration on Debian/Ubuntu; strong networking skills covering HTTP(S), DNS, TCP/IP, SSH, firewalls, proxies, load balancers
  • Observability stack: Prometheus, Grafana
  • Production experience with Kafka, Redis, Nginx
  • Hands-on cloud work in GCP, AWS or Azure, including HA/DR design with HPA, KEDA and affinity/anti-affinity rules
  • Proficient in at least one programming language: Python, Go, C++, or Java; operational depth with JVM and Node.js services
  • English proficiency B2+ (written and spoken)
  • Personal traits: high ownership, open-minded, naturally curious, strong communicator
What we offer:
  • Remote-first company - we enable you to work from anywhere in the world.
  • Flexible working hours - we have core working hours (11 am–3 pm CET), allowing flexible scheduling outside those hours.
  • 38 days of paid vacation leave per year, plus 14 days of paid sick leave.
  • Join a forward-thinking team where you have the autonomy to make your own choices and explore new ideas.

Our tech stack & methodologies:

  • Automation & IaC: Bash, Python, GoLang, Terraform
  • Observability: Elasticsearch, Kibana, FluentD; Prometheus, Grafana; Jaeger, Grafana Tempo
  • CI/CD: Bitbucket Pipelines, ArgoCD
  • Containerization & Orchestration: Docker, Kubernetes, Helm
  • Security: SOPS, Okta, TFsec, Trivy, Istio
  • Stateful Services: PostgreSQL, TimescaleDB, Redis Sentinel, Kafka, NATS
  • Networking: Nginx, Ingress-Nginx
  • Collaboration: Slack, Google Meet, Jira, Confluence, Bitbucket

Salary: B2B, 75,000 - 90,000 EUR per year
