Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals, and position yourself in the top echelon of site reliability.
As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Payments Technology team, you will work with stakeholders to define non-functional requirements (NFRs) and availability targets for the services in your application and product lines. You will ensure that those NFRs are accounted for in your products’ design and test phases, that your service level indicators effectively measure customer experience, and that service level objectives are defined with stakeholders and implemented in production. You will solve complex problems in code with a quality-driven, product-centric approach.
Job responsibilities
Creates high-quality designs, roadmaps, plans, standards, and program charters that are delivered by you, the engineers under your guidance, or the wider Payments engineering community.
Writes code hands-on and reviews the team's code to ensure it is of high quality.
Demonstrates site reliability culture, principles, and practices every day and champions the adoption of site reliability throughout the team.
Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt.
Collaborates in the design, creation, and advocacy of SRE products that can be used to scale the implementation of SRE best practices within Payments.
Evolves and debugs critical components of applications and platforms.
Makes significant contributions to JPMorgan Chase’s site reliability community via internal forums, communities of practice, guilds, and conferences.
Builds and designs highly distributed systems and SRE products, solving complex problems in code.
Maintains and promotes best practices in software engineering, leading by example and mentoring others.
Required qualifications, capabilities, and skills
Demonstrable, advanced applied experience with SRE concepts, strategies, and culture.
Advanced knowledge of and experience in observability, such as white-box and black-box monitoring, service level objectives, alerting, and telemetry collection, using tools such as OpenTelemetry (OTel), Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc.
Currently hands-on, with deep expertise in at least one of the following programming languages: Java, Go, Python, TypeScript, JavaScript.
Familiarity with software design patterns.
Expertise in the software delivery life cycle and tooling, with a deep understanding of branching and testing strategies.
Advanced experience developing containerized, serverless, and event-driven systems.
An agile practitioner.
Recognized as an active contributor to the engineering community.
Ability to anticipate, identify, and troubleshoot defects found during testing.
Strong communication skills, with the ability to mentor and educate others on site reliability and software engineering principles and practices.