Cloud DevOps Engineer
Synergia for Workforce Management is seeking a skilled Cloud DevOps Engineer to join an innovative, US-based, AI-driven SaaS startup that is revolutionizing collaborative research. The platform uses advanced AI workflows to accelerate evidence-based research by simplifying the search, extraction, and synthesis of study findings, giving researchers efficient, data-driven insights.
Position Type: Remote
Synergia invites applications for a DevOps/Site Reliability Engineer (SRE) to maintain and enhance the operational performance, reliability, and scalability of critical production systems. In this role, you will focus on developing and automating the build, test, and deployment pipeline, managing and optimizing infrastructure costs, strengthening system resilience, and proactively managing incidents to ensure seamless, high-availability service.
Ideal candidates bring a strong foundation in cloud infrastructure and DevOps best practices, along with a proactive approach to system efficiency and reliability.
Key Responsibilities
- Automation and Deployment
  - Design and manage automation for software build, test, and deployment, optimizing speed and resource use.
  - Oversee production deployment for web, mobile, and API applications; ensure release consistency, testing, and QA integration.
- Monitoring and Incident Management
  - Develop, implement, and manage monitoring tools to ensure application availability and performance.
  - Configure alert triggers, track performance metrics, and troubleshoot infrastructure issues to identify bottlenecks or scaling challenges.
  - Work with third-party services to enhance observability and monitoring, ensuring integration with error-reporting tools.
- Performance Optimization
  - Regularly review infrastructure resources, implement cost-saving measures, and respond to cloud vendor recommendations for resource upgrades and security.
  - Refine auto-scaling rules based on real-time metrics and optimize cloud service costs through resource management.
- Disaster Recovery and Resilience
  - Lead the development and maintenance of disaster recovery plans and ensure readiness through regular drills and testing.
Required Skills and Experience
- Core Technical Skills
  - Deployment Management: Experienced in deploying and managing web, mobile, and API applications in cloud environments such as AWS, Azure, or GCP.
  - Monitoring and Incident Management: Proficient in using monitoring and observability tools such as New Relic, Datadog, or Prometheus/Grafana.
  - System Software and Configuration: Skilled with CI/CD tools (Azure Pipelines, Jenkins, CircleCI) and containerization tools (Docker, Kubernetes, Helm).
  - Problem Solving and Response: Experienced in managing cloud infrastructure, troubleshooting, and incident response.
- Foundational Skills
  - Collaboration: Proven ability to work cross-functionally with product, QA, and development teams.
  - Scripting and Automation: Proficient in scripting languages such as Bash for automation tasks.
  - Cost Management: Track record of optimizing cloud service costs.
- Professional Qualifications
  - Bachelor's degree in computer science or a related field.
  - Minimum of 3 years' experience in a DevOps/SRE role.
Why Join Us?
We offer a competitive salary, remote flexibility, and the opportunity to work within an innovative, forward-thinking environment. Join us to make a meaningful impact in accelerating evidence-based research through cutting-edge technology.
Candidates who meet the required experience criteria are invited to submit their CV, experience records, and relevant documents to [email protected] by 20 November 2024.