Senior Site Reliability Engineer
ICEO - Venture Builder

Shape the reliability experience of an always-on crypto platform delivering seamless service across North America and Europe.
As an SRE, you'll be leading the charge in designing active-active failover, cross-region routing, and distributed services that are resilient to cloud outages and geopolitical quirks. This is real-world chaos engineering at a global scale.
You’ll lead the creation of a region-aware CI/CD pipeline with canary deployments, automated rollbacks, and feature flags tailored per continent.
Join us remotely: our work is 100% remote, and you can be located anywhere in Europe within the CET/CEST time zones. This is a full-time position.
About us:
ZND is the simple gateway to digital finance, already trusted by thousands of users who have moved over €30 million through the platform. Fueled by a token raise, we’re rolling out an AI chat assistant for every action and an instant credit line on your digital assets.
Our bold vision is to be the place where anyone can trade, earn, ask, and borrow in seconds - crypto made effortless.
What you will be doing:
- Set and drive SRE strategy – translate business goals into quarterly reliability targets, track progress, and adjust course as needed.
- Own GCP / GKE architecture – design, implement, and maintain secure, low-latency, highly available clusters across regions.
- Automate reliability – build self-healing, auto-scaling, and automated incident-response workflows that minimise manual toil.
- Embed high availability – partner with engineers and product to ship fault-tolerant Node.js/JVM services and predictable releases.
- Manage SLIs and error budgets – define, monitor, report, and continuously improve service reliability metrics.
- Execute chaos engineering – plan and run automated fault-injection (e.g., Chaos Mesh) to validate resilience before customers are affected.
- Lead incidents – coordinate response, run blameless post-mortems, and ensure corrective actions are prioritised and implemented.
- Capacity and cost planning – forecast growth, right-size resources, and optimise spend without sacrificing performance.
- Document and share knowledge – create clear architecture diagrams, runbooks, and playbooks to keep the organisation unblocked.
- Mentor and influence – champion SRE and DevOps best practices.
- Engage in team rituals – contribute to daily stand-ups, sprint planning, and roadmap reviews to keep reliability work aligned with product goals.
What we expect from you:
- 6+ years in DevOps/SRE with full platform ownership and risk-based decision making
- Kubernetes and Helm in daily use, Docker containerisation, CI/CD pipelines and version control
- Linux administration on Debian/Ubuntu; strong networking skills covering HTTP(S), DNS, TCP/IP, SSH, firewalls, proxies, load balancers
- Observability stack: Prometheus, Grafana
- Production experience with Kafka, Redis, Nginx
- Hands-on cloud work in GCP, AWS or Azure, including HA/DR design with HPA, KEDA and affinity/anti-affinity rules
- Proficient in at least one programming language: Python, Go, C++, or Java; operational depth with JVM and Node.js services
- English proficiency B2+ (written and spoken)
- Personal traits: high ownership, open-minded, naturally curious, strong communicator
What we offer:
- Remote-first company - we enable you to work from anywhere in the world.
- Flexible working hours - we have core working hours (11 am–3 pm CET), with flexible scheduling outside those hours.
- 38 days of paid vacation leave per year, plus 14 days of paid sick leave.
- Join a forward-thinking team where you have the autonomy to make your own choices and explore new ideas.
Our tech stack & methodologies:
- Automation & IaC: Bash, Python, Go, Terraform
- Observability: Elasticsearch, Kibana, Fluentd; Prometheus, Grafana; Jaeger, Grafana Tempo
- CI/CD: Bitbucket Pipelines, ArgoCD
- Containerization & Orchestration: Docker, Kubernetes, Helm
- Security: SOPS, Okta, tfsec, Trivy, Istio
- Stateful Services: PostgreSQL, TimescaleDB, Redis Sentinel, Kafka, NATS
- Networking: Nginx, Ingress-Nginx
- Collaboration: Slack, Google Meet, Jira, Confluence, Bitbucket
Salary: B2B, 75,000–90,000 EUR per year