Job Summary
The AWS Database team is looking for an experienced Site Reliability Engineer to join the team that supports all AWS databases currently available at JP Morgan. We currently support many AWS-native databases (RDS, Aurora, Neptune) as well as CockroachDB. As a member of the SRE team, you will support existing products and further improve their observability and telemetry. The SRE team also works closely with each product Engineering team to improve the existing implementations of those products.
Job Responsibilities:
As a Lead Site Reliability Engineer (SRE), you play an important role in the operations, design and development of our modern public cloud offering. You will bring your past experience to a talented group to further evolve the technical capabilities of our organization, drive improvements by implementing key SRE practices, drive automation and unlock new capabilities for our clients. All roles in Cloud Foundation Services are expected to collaborate continuously across all cloud product offerings.
We look first and foremost for people who are passionate about solving business problems through innovation and engineering practices. You'll be required to apply your depth of knowledge and expertise to all aspects of the product development lifecycle, and to partner with your stakeholders daily to stay focused on common goals. We embrace a culture of experimentation and constantly strive for improvement and learning. You’ll work in a collaborative, trusting, thought-provoking environment, one that encourages diversity of thought and creative solutions that are in the best interests of our customers globally.
Design, code, test and deliver software to improve our existing systems by adopting DevOps culture.
Troubleshoot and manage incidents, communicate with stakeholders at all levels, facilitate blameless post-mortems, and identify follow-up corrective and preventative actions to ensure permanent closure of incidents.
Actively participate in the development life cycle, ensuring reliability, scalability and operational stability.
Define, create and track application analytics in support of better service level objectives.
Ensure adherence to change management release processes and accelerate automation of these processes.
Run resiliency management planning, scheduling and execution of disaster recovery tests, and seek to automate these activities where possible.
Cover the on-call schedule when production support is required outside of working hours.
Participate in enhancing product observability and telemetry, and support modernization.
Brainstorm ideas to simplify and streamline infrastructure by working closely with infrastructure and SRE teams.
Required qualifications, capabilities and skills:
Knowledge of Python, Unix shell scripting and SQL.
Good understanding of development tools: source code control software, automated build, automated testing and JIRA.
Understanding of infrastructure-as-code concepts is desirable.
Experience with build automation, test-driven development, and continuous integration and delivery.
Experience with relational and non-relational databases.
Previous SRE experience, including knowledge of SLOs, SLAs, SLIs and error budgets, is advantageous.
Experience working with, or familiarity with, a public cloud (AWS, Google or Azure).
Preferred qualifications, capabilities and skills:
Experience in application and system configuration management of a large fleet (1000s of nodes under management) is advantageous.
Experience in the use of declarative frameworks (Puppet or Terraform).
Experience in at least one programming language (preferably Python) is advantageous.
Expertise in leveraging APIs, security, authentication and data structures is advantageous.
Expertise in software design using Domain Driven Design, SOLID or GRASP is advantageous.
Exposure to microservice architecture and REST API design and development is advantageous.
Exposure to Docker and Kubernetes is advantageous.
Knowledge of source code CI/CD integration is advantageous.
What’s in it for you?
Besides taking pride in delivering value to our largest-footprint database and delighting our customers, you will be part of a great team, helping our customers unlock the potential of a 'No DBA' model. As a team, we care about each other and about how we too can embrace modernization. We focus on:
Continued career advancement opportunities, including industry-recognized certifications such as AWS and CKAD
Exposure to strong mentorship and leadership examples
Professional and technical development programs
Opportunities to be a valuable member of a close-knit, collaborative, diverse team that encourages networking