TEKsystems Global Services are working with a tier one investment bank to hire a Reliability and Observability Engineer. This individual will apply innovation and optimization to how our client runs production systems, ensuring they remain reliable and fast, recover quickly from incidents, and require minimal toil.

Job Duties:
Increase the adoption of best-practice reliability standards, including:
• Develop and standardize on existing and new observability tools.
• Perform feature/functionality/usability trials of new observability tools.
• Design and build automation around the chosen tools to make onboarding new services easy for developers (dashboards, alerts, traces, etc.).
• Demonstrate great communication skills in working with technical and non-technical audiences.
• Work with DevOps teams, release managers and project managers to design and implement performance and chaos tests, and integrate solutions with Continuous Integration and Continuous Delivery.
• Create application and infrastructure performance plans/models for highly scalable, highly available and high-throughput systems.
• Evaluate, develop and execute load test tools to stress the limits of our services.
• Understand and performance test the APIs and integration patterns to solve challenging distributed system problems.
• Define the performance strategy and report the performance baselines required to certify go-lives.
• Actively contribute to capacity planning, disaster recovery and environment consistency assurance.
• Identify anti-patterns and create fall-back experiences for critical scenarios.
• Build workflows and software to automate chaos experiments that safely probe the failure modes of our services via fault injection, and measure their resilience levels.

Candidate Requirements:
• Expertise in deploying and using open-source observability tools in large-scale environments, including Prometheus, Grafana, ELK (Elasticsearch + Logstash + Kibana), Jaeger, Kiali, and/or Loki.
• Familiarity with open standards like OpenTelemetry, OpenTracing, and OpenMetrics.
• Experience leading integrations with commercial observability tools like Splunk, Datadog, New Relic, Honeycomb, Sumo Logic or others.
• Familiarity with Kubernetes and Istio as the architecture on which the observability platform runs, and how they integrate and scale.
• Hands-on experience working with AWS, GCP and microservice architecture.
• 4+ years of experience with a high-level scripting language (such as Go, Shell or Python).
• Hands-on expertise with cloud providers like AWS, Azure and/or GCP.
• Experience supporting and enabling application infrastructure that supports high availability/resiliency.
• Knowledge of tools such as Chaos Monkey, Simian Army and chaos toolkits, and an understanding of APM solutions like AppDynamics, Dynatrace, etc.
Job Title: Reliability And Observability Engineer
Location: London, UK
Job Type: Contract
Trading as TEKsystems. Allegis Group Limited, Maxis 2, Western Road, Bracknell, RG12 1RT, United Kingdom. No. . Allegis Group Limited operates as an Employment Business and Employment Agency as set out in the Conduct of Employment Agencies and Employment Businesses Regulations 2003. TEKsystems is a company within the Allegis Group network of companies (collectively referred to as “Allegis Group”). Aerotek, Aston Carter, EASi, Talentis Solutions, TEKsystems, Stamford Consultants and The Stamford Group are Allegis Group brands. If you apply, your personal data will be processed as described in the Allegis Group Online Privacy Notice available at
To access our Online Privacy Notice, which explains what information we may collect, use, share, and store about you, and describes your rights and choices about this, please go to
We are part of a global network of companies and as a result, the personal data you provide will be shared within Allegis Group and transferred and processed outside the UK, Switzerland and European Economic Area subject to the protections described in the Allegis Group Online Privacy Notice. We store personal data in the UK, EEA, Switzerland and the USA. If you would like to exercise your privacy rights, please visit the “Contacting Us” section of our Online Privacy Notice at for details on how to contact us. To protect your privacy and security, we may take steps to verify your identity, such as a password and user ID if there is an account associated with your request, or identifying information such as your address or date of birth, before proceeding with your request. If you are resident in the UK, EEA or Switzerland, we will process any access request you make in accordance with our commitments under the UK Data Protection Act, EU-U.S. Privacy Shield or the Swiss-U.S. Privacy Shield.