Purpose:
System Reliability Engineers (also known as Site Reliability Engineers) are responsible for keeping all user-facing services (most notably HungerStation.com) and many other HungerStation production systems running smoothly 24/7/365. SREs are a blend of operations gear-heads and software crafters who apply sound engineering principles, operational discipline, and mature automation. They specialize in systems, whether that means networking, the Linux kernel, or a specific interest in scaling, algorithms, or distributed systems.
Responsibilities:
- Keep abreast of the latest hardware development methodologies in order to provide best-in-class hardware solutions
- Assist in managing technical hardware support activities for internal customers in order to establish optimum customer service levels and incident resolution
- Provide hardware support to the Technology team in order to support continuous operations for the business
- Provide data to execute root cause analysis (RCA)
- Keep a record of all issues/incidents and provide analytical reports (RCA/Impact Analysis) to support resolution implementation and keep senior management informed
- Report on and resolve all hardware malfunctions, anomalies, and issues in a timely and accurate manner, and provide guidance for resolutions to increase the operational capability of the Department and the Business
- Improve continuous integration/continuous delivery (CI/CD) practices within the Organization to reach optimum operational levels
Requirements:
- 9-12 years of relevant experience
- Bachelor’s degree in a relevant field is required
- Master’s degree in a relevant field is preferred
- You may be a fit for this role if you:
- Think about systems - edge cases, failure modes, behaviors, specific implementations.
- Know your way around Linux and the Unix Shell.
- Know what config management systems like Terraform, Ansible, Chef, etc. are used for.
- Have strong programming skills - Ruby and/or Go.
- Have an urge to collaborate and communicate asynchronously.
- Have an urge to document all the things so you don't need to learn the same thing twice.
- Have a proactive, go-for-it attitude. When you see something broken, you can't help but fix it.
- Have an urge to deliver quickly and iterate fast.
- Share our values, and work in accordance with those values.
- Have experience with Docker, Kubernetes, Prometheus and other cloud-native tools.