#IkoKaziKE

Back to jobs

Lead Devops Engineer, Foundry Rnd

Mastercard

full time Nairobi Posted 2 days ago

Required Skills

  • Education & Background: Bachelor's degree in Computer Science, Engineering, or related field. 8–12+ years of proven experience architecting and operating production-grade infrastructure, especially those supporting AI/ML workloads.

  • Infrastructure as Code: Expert in Terraform and IaC orchestration tools like Terragrunt. Strong experience with configuration management and GitOps practices.

  • Programming & Scripting: Advanced Bash and Python skills and strong software engineering fundamentals (version control, CI, code reviews). Familiarity with Go or other systems programming languages is a plus.

  • CI/CD & Automation: Hands-on experience with Jenkins, GitHub Actions, GitLab CI, or similar tools. Strong understanding of pipeline design, artifact management, and deployment strategies.

  • Monitoring & Observability: Experience with monitoring stacks such as Prometheus, Grafana, Splunk, and ELK. Skilled in building dashboards, alerts, and tuning observability for ML-specific use cases.

  • Cloud Infrastructure: Experience deploying systems on AWS/Azure/GCP. Familiar with cloud-native services, serverless computing, and managed Kubernetes offerings (EKS, AKS, GKE). Comfortable with Linux internals and shell scripting.

  • Security & Networking: Knowledge of security best practices for MLOps, including data privacy, compliance, access controls, and encryption. Understanding of modern networking protocols (mTLS) and secure service communication.

  • Collaboration & Agile Delivery: Strong communication skills and experience working with cross-functional teams. Ability to document designs clearly and deliver iteratively using agile practices.

  • Drive Platform Infrastructure: Own DevOps and infrastructure for MLOps and agentic AI systems, establishing reusable patterns for CI/CD, scalable inference, orchestration, observability, and cost control. Design secure, scalable, repeatable systems using Infrastructure as Code (IaC) to support R&D workloads.

  • Build secure CI/CD & automation systems: Enable secure tool access, workload isolation, and infrastructure for LLM-backed APIs and MCP servers, while partnering with security and compliance on access control, infrastructure governance and auditability.

  • Ensure Reliability & Observability: Implement monitoring, logging, and alerting. Tune observability for ML-specific workloads to ensure performance, reliability, and operational insight.

  • Provide Technical Leadership: Offer hands-on leadership across DevOps and platform initiatives. Review code, enforce best practices, improve tooling, and promote clean, well-tested infrastructure.

  • Cross-Functional Collaboration: Partner with ML, software, and platform engineers to design deployment strategies, scope work, manage agile deliverables, and meet milestones.