Senior Site Reliability Engineer

ICEO - Venture Builder

Date: 2 weeks ago

City: Warwick

Contract type: Contractor

Remote

Shape the reliability experience of an always-on crypto platform delivering seamless service across North America and Europe.

As an SRE, you'll be leading the charge in designing active-active failover, cross-region routing, and distributed services that are resilient to cloud outages and geopolitical quirks. This is real-world chaos engineering at a global scale.
You’ll lead the creation of a region-aware CI/CD pipeline with canary deployments, automated rollbacks, and feature flags tailored per continent.

Join us remotely: our work is 100% remote, and you can be located anywhere in Europe within the CET/CEST time zones. This is a full-time position.

About us:

ZND is the simple gateway to digital finance, already trusted by thousands of users who have moved over €30 million through the platform. Fueled by a token raise, we’re rolling out an AI chat assistant for every action and an instant credit line on your digital assets.

Our bold vision is to be the place where anyone can trade, earn, ask, and borrow in seconds - crypto made effortless.

What you will be doing:

Set and drive SRE strategy – translate business goals into quarterly reliability targets, track progress, and adjust course as needed.
Own GCP / GKE architecture – design, implement, and maintain secure, low-latency, highly available clusters across regions.
Automate reliability – build self-healing, auto-scaling, and automated incident-response workflows that minimise manual toil.
Embed high availability – partner with engineers and product to ship fault-tolerant Node.js/JVM services and predictable releases.
Manage SLIs and error budgets – define, monitor, report, and continuously improve service reliability metrics.
Execute chaos engineering – plan and run automated fault-injection (e.g., Chaos Mesh) to validate resilience before customers are affected.
Lead incidents – coordinate response, run blameless post-mortems, and ensure corrective actions are prioritised and implemented.
Capacity and cost planning – forecast growth, right-size resources, and optimise spend without sacrificing performance.
Document and share knowledge – create clear architecture diagrams, runbooks, and playbooks to keep the organisation unblocked.
Mentor and influence – champion SRE and DevOps best practices.
Engage in team rituals – contribute to daily stand-ups, sprint planning, and roadmap reviews to keep reliability work aligned with product goals.

What you need:

6+ years in DevOps/SRE with full platform ownership and risk-based decision making
Kubernetes and Helm in daily use, Docker containerisation, CI/CD pipelines and version control
Linux administration on Debian/Ubuntu; strong networking skills covering HTTP(S), DNS, TCP/IP, SSH, firewalls, proxies, load balancers
Observability stack: Prometheus, Grafana
Production experience with Kafka, Redis, Nginx
Hands-on cloud work in GCP, AWS or Azure, including HA/DR design with HPA, KEDA and affinity/anti-affinity rules
Proficient in at least one programming language: Python, Go, C++, or Java; operational depth with JVM and Node.js services
English proficiency B2+ (written and spoken)
Personal traits: high ownership, open-minded, naturally curious, strong communicator

What we offer:

Remote-first company - we enable you to work from anywhere in the world.
Flexible working hours - we have core working hours (11 am–3 pm CET), and you can schedule the rest of your day flexibly.
Paid leave - 38 days of paid vacation per year, plus 14 days of paid sick leave.
Join a forward-thinking team where you have the autonomy to make your own choices and explore new ideas.

Our tech stack & methodologies:

Automation & IaC: Bash, Python, Go, Terraform
Observability: Elasticsearch, Kibana, Fluentd; Prometheus, Grafana; Jaeger, Grafana Tempo
CI/CD: Bitbucket Pipelines, Argo CD
Containerization & Orchestration: Docker, Kubernetes, Helm
Security: SOPS, Okta, tfsec, Trivy, Istio
Stateful Services: PostgreSQL, TimescaleDB, Redis Sentinel, Kafka, NATS
Networking: Nginx, Ingress-Nginx
Collaboration: Slack, Google Meet, Jira, Confluence, Bitbucket

Salary: B2B 75,000 - 90,000 EUR per year
