At ZILO™, we're redefining what’s possible in technology. ZILO™ is a UK-based FinTech company specialising in global asset and wealth management software, designed to scale and transform businesses of all types using our own in-house AI technology. Our mission is to digitalise the future of the global asset management industry.
We are a team of experts, with decades of combined experience at leading firms worldwide, who thrive in fast-paced environments and want to shape the future of technology. Every individual plays a key role in driving progress and making a real impact. We continuously strive to innovate and improve.
Why work with us? At ZILO™, you'll be part of a dynamic and inclusive environment where creativity thrives. We offer the opportunity to work on cutting-edge technology, collaborate with talented individuals, and contribute to projects that have a real-world impact. We value continuous learning, personal growth, and providing our team with the resources they need to succeed.
Ready to shape the future? Let’s talk.
Requirements
We’re looking for a seasoned Site Reliability Engineer (SRE) with a front-end focus and deep expertise in React applications to join our SRE team. In this role, you’ll ensure the reliability, performance, and operability of our React-based user interfaces running on AWS and Kubernetes. You’ll lead incident response for client-side issues, diagnose failures end to end across the stack, and build tooling to automate detection and self-healing.
Key Responsibilities
- Incident Response & Troubleshooting
  - Act as the primary on-call engineer for React application incidents: crashes, memory leaks, performance regressions, and deployment failures.
  - Analyze browser logs, application metrics (e.g. Real User Monitoring), and backend traces to isolate root causes across the React, Node.js, AWS, and Kubernetes layers.
  - Orchestrate post-incident reviews: document findings, define mitigation plans, and drive tickets to resolution.
- Reliability Engineering & Automation
  - Develop and maintain robust observability for front-end components using Datadog.
  - Define SLIs/SLOs for page load time, Time to Interactive, and error rates; build alerting that balances sensitivity against noise.
  - Automate deployments via CI/CD pipelines (GitHub Actions), including end-to-end tests, canary releases, and rollbacks for React apps.
- Infrastructure & Scaling
  - Design and operate Kubernetes (EKS) clusters hosting Node.js microservices and SSR/Next.js rendering tiers.
  - Implement auto-scaling policies and use blue/green or rolling updates to minimize user disruption.
  - Manage AWS infrastructure (EC2, ALB, CloudFront, S3) to optimize the delivery and reliability of front-end assets.
- Performance Optimization
  - Profile and tune React applications: code-splitting, lazy-loading components, reducing bundle sizes, and minimizing hydration time.
  - Apply caching strategies (CDN invalidation, HTTP caching headers) to reduce latency and origin load.
  - Collaborate with UX teams to balance feature richness against performance targets.
- Collaboration & Knowledge Sharing
  - Serve as the React/SRE subject-matter expert, mentoring engineers on best practices for building resilient front-ends.
  - Produce and maintain runbooks, debugging guides, and incident playbooks specific to client-side failures.
  - Partner closely with the wider backend SRE, DevOps, and product teams to ensure end-to-end reliability.
Benefits
- Enhanced leave: 38 days, inclusive of 8 UK public holidays
- Private Health Care including family cover
- Life Assurance – 5x salary
- Flexible working: work from home and/or in our London office
- Employee Assistance Program
- Company Pension (Salary Sacrifice options available)
- Access to training and development
- Buy and Sell holiday scheme
- The opportunity for “work from anywhere/global mobility”