Disaster Recovery Engineer

Finning Canada • Santiago, Región Metropolitana de Santiago, Chile

Posted 2 days ago

Job description

Company : Finning Chile S.A.
Number of Openings : 1
Worker Type : Permanent
Position Overview : Responsible for coordinating specific Information Technology (IT) activities within the company, including the definition, development, maintenance, and operation of systems, as well as keeping the area's technology up to date and coordinating the implementation of new procedures and techniques.

We invite you to dream big! Finning, CATERPILLAR’s strategic partner and a leader in equipment distribution and services, is looking for top talent to take on the role of Global Disaster Recovery & Resilience Engineer. The Global Disaster Recovery & Resilience Engineer is responsible for building and operating the enterprise-wide Disaster Recovery (DR) and IT Service Continuity capability across Public Cloud (Azure / AWS), Hybrid Cloud (VMware in UK & South America), and Private Managed Services (SAP on HANA and Lawson M3 at a private hosting provider).

This role will design the DR strategy, runbooks / playbooks, and execution model, and will coordinate recovery with Network, Identity & Access Management (IAM), Storage, Application, and Security (CSIRP) teams to ensure company readiness against technology disruptions.

Major Job Functions :

Lead the execution of enterprise-wide DR processes across Public Cloud (Azure / AWS), Hybrid Cloud (VMware), and Private Managed Services (SAP and Lawson M3).
Coordinate cross-domain recovery activities with Network, IAM, Storage, Application, and Security teams during DR drills and real incidents.
Optimize DR tooling, as well as the Managed Services DR processes and documentation.
Ensure recoverability of Entra ID and Microsoft 365, including break-glass accounts, Conditional Access rollback, and email continuity.
Act as DR Incident Commander during outages, aligning with the Cybersecurity Incident Response Plan (CSIRP) for communication and escalation.
Validate vendor DR capabilities and coordinate joint DR tests with Private Cloud providers.

Service Improvements :

Identify and suggest process enhancements through automation of DR workflows, using tools such as PowerShell, Python, or Terraform, to reduce recovery time and human error.

Identify and drive remediation of single points of failure and resilience gaps across infrastructure and SaaS platforms.

Implement continuous improvement initiatives based on lessons learned from DR tests and incidents.

Work closely with the cloud financial management team (FinOps) to optimize costs associated with DR services (replication, storage, testing), ensuring financial efficiency without compromising availability.

Documentation :

Develop and maintain DR runbooks / playbooks for all critical systems, including cloud, hybrid, identity, and SaaS services.

Keep dependency maps and CMDB entries accurate for recovery sequencing.

Ensure all DR documentation is audit-ready, version-controlled, and reviewed quarterly.

Reporting :

Produce monthly DR readiness reports covering RTO / RPO compliance, backup / restore success rates, and DR test results.

Maintain audit evidence for DR exercises and backup verification.

Provide executive dashboards summarizing DR posture, risks, and improvement actions.

Requirements :

Education : Bachelor’s degree in Computer Science, Information Systems, Engineering, or equivalent experience.

5+ years in infrastructure / cloud operations with 3+ years focused on Disaster Recovery / IT Service Continuity.

Proven record leading multi-platform DR tests (cloud, VMware, and SaaS / Identity) and recovering complex applications in production or drills.

Hands-on experience with Azure Site Recovery or equivalent, AWS Elastic Disaster Recovery, VMware SRM / Zerto or equivalent, and enterprise-grade backup and data protection solutions (e.g., Cohesity or similar).

Certifications (Preferred) : Azure Solutions Architect, Azure Administrator, VMware Site Recovery Manager (SRM), ITIL Foundation; BC / DR : CBCP, ISO 22301 BCMS.

Demonstrated experience leading DR drills and real failovers across multi-cloud (Azure, AWS) and VMware environments.

Deep understanding of replication, failover, and failback strategies.

Identity and SaaS recovery (Entra ID, Microsoft 365).

Networking for DR (DNS failover, ExpressRoute / Direct Connect, VPN).

Advanced English proficiency (spoken and written).

Proficiency in tools such as PowerShell, Python, or similar to automate DR workflows and validation steps.

Skilled in orchestrating recovery steps with Network, IAM, and Application teams.

Ability to act as Incident Commander during outages, ensuring structured communication and escalation.

Finning is committed to collaborating with and providing reasonable accommodations / adjustments to individuals with disabilities. If you require an adjustment / accommodation at any point during the recruitment process, please inform your recruiter.

Finning is an equal opportunity employer, and we actively encourage all individuals to express themselves and achieve their full potential.
