Overview
Site Reliability Engineering (SRE) at Equifax is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to Equifax engineering principles. SRE is an engineering approach to building and running production systems – we engineer solutions to operational problems. Our SREs are responsible for overall system operation, and we use a breadth of tools and approaches to solve a broad set of problems. Our practices include limiting time spent on operational work, running blameless postmortems, and proactively identifying and preventing potential outages. Our culture emphasizes diversity, curiosity, problem solving, openness, collaboration, and mentorship.
What You’ll Do
- Manage system uptime across cloud-native (AWS, GCP) and hybrid architectures.
- Build infrastructure as code (IaC) patterns that meet security and engineering standards using Terraform, scripting with cloud CLIs, and programming with cloud SDKs.
- Build CI / CD pipelines for build, test and deployment of applications and cloud architecture patterns, using platform (Jenkins) and cloud-native toolchains.
- Develop automated tooling to deploy service changes to production and create comprehensive runbooks to detect, remediate and restore services.
- Troubleshoot and triage problems in complex distributed systems; serve on call for high-severity incidents and update runbooks to improve MTTR.
- Lead blameless postmortems and own follow-up actions to prevent recurrences.
What Experience You Need
- BS degree in Computer Science or related technical field involving coding, or equivalent experience.
- 5-7 years of experience in software engineering, systems administration, database administration, and networking.
- 2+ years of experience developing and / or administering software in a public cloud.
- Experience monitoring infrastructure and applications to ensure uptime and performance objectives.
- Experience with Python, Bash, Java, Go, JavaScript and / or Node.js.
- Cross-functional knowledge with systems, storage, networking, security and databases.
- System administration skills including automation and orchestration of Linux / Windows using Terraform, Chef, Ansible and / or containers (Docker, Kubernetes).
- Proficiency with CI / CD tooling and practices.
- Experience with GCP, AWS and Azure.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering and Information Technology