$2,769.00 Fixed
BrightMart Solutions
Contract · Flexible hours
About the role
BrightMart Solutions is seeking a Senior Site Reliability Engineer to design, implement, and maintain highly available services for a fast‑growing e‑commerce platform. The project focuses on improving reliability, automating operational tasks, and scaling infrastructure to handle peak traffic.
Key responsibilities
- Design and implement monitoring, alerting, and incident response processes.
- Automate provisioning and configuration of cloud resources using IaC tools.
- Optimize performance and reliability of containerized workloads on Kubernetes.
- Collaborate with development teams to embed reliability best practices into CI/CD pipelines.
- Conduct root‑cause analysis and post‑mortems for production incidents.
- Maintain documentation of architecture, runbooks, and SOPs.
Must-have skills
- Extensive experience with AWS services and networking.
- Strong proficiency in Docker and Kubernetes orchestration.
- Deep knowledge of Linux system administration and troubleshooting.
- Expertise in Infrastructure as Code (e.g., Terraform, CloudFormation).
- Proven track record in implementing monitoring and alerting solutions (Prometheus, Grafana, CloudWatch).
Nice to have
- Experience with chaos engineering tools.
- Familiarity with serverless architectures.
Proposals: 0
Duration: Less than 3 months
Byron Taylor, member since Oct 29, 2025