Senior Site Reliability Engineer
UJET
Description
About Us
UJET leads the way in AI-powered contact center innovation, delivering a future-proof cloud platform that redefines the customer experience with cutting-edge AI, true multimodality, and a mobile-first approach. We infuse AI across every aspect of your customer journey and contact center operations to drive automation and efficiency. UJET's AI solutions empower agents, optimize customer journeys, and transform contact center operations for elevated experiences and actionable insights. Built on a cloud-native architecture with a unique CRM-first approach, UJET ensures unmatched security, scalability, and prioritized data insights (without storing PII). Designed for effortless use, UJET partners with businesses to deliver exceptional interactions, smarter decision-making, and accelerated growth in the AI-driven world.
Learn more at www.ujet.cx.
Position Overview
We're looking for a Senior Site Reliability Engineer to help build and scale a high-impact SRE function. You'll be a technical leader on a team responsible for improving system reliability, reducing operational toil, and establishing best practices across engineering. In this position, you'll design how reliability works in UJET, influence engineering decisions, and build the tooling and processes that make production safer and more predictable.
Responsibilities
- Lead efforts to improve system reliability, scalability, and performance across critical services
- Define and implement SLIs/SLOs and error budgets, and use them to guide engineering priorities
- Design and develop observability systems (metrics, logging, tracing, alerting) that produce actionable alerts and data with minimal noise
- Lead complex incident response, acting as incident commander when needed
- Conduct postmortems focused on systemic causes rather than individual fault, and ensure corrective actions from those reviews are completed
- Identify and eliminate toil through automation, tooling, and improved workflows
- Partner with product and platform teams on architecture decisions, production readiness, and de
About UJET
UJET is a cloud-based customer support platform that provides tools for businesses to manage customer interactions. It offers features such as ticketing, chat, and voice support.
Interview Prep Guide
Preparation Strategy
To prepare for this role, focus on reviewing the fundamentals of site reliability engineering, including SLIs, SLOs, and error budgets. Practice explaining technical concepts and architectures to non-technical stakeholders. Review your past experiences and be prepared to discuss specific examples of your leadership, collaboration, and problem-solving skills. Additionally, brush up on your knowledge of system design, scalability, and performance optimization.
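As a refresher on the error-budget arithmetic mentioned above, here is a minimal Python sketch. It assumes a request-based availability SLI (good requests divided by total requests); the function name `error_budget`, the 99.9% target, and the request counts are hypothetical illustrations, not figures from UJET or from this posting.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute the availability SLI and error-budget usage for one SLO window.

    All inputs are illustrative assumptions: slo_target is a fraction such as
    0.999, and the request counts cover a single measurement window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: the fraction of requests that succeeded in the window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (a value above 1.0 means the SLO is breached).
    consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "budget_remaining": allowed_failures - failed_requests,
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% availability SLO.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
for name, value in report.items():
    print(f"{name}: {value}")
```

In this hypothetical window the SLO tolerates about 1,000 failed requests, so 250 failures spend roughly a quarter of the budget. In an interview, the useful part is explaining how the remaining budget would guide the choice between shipping features and doing reliability work.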
Likely Interview Rounds
- 1. Screening call (~30 min)
What to prep: Review the fundamentals of site reliability engineering, including SLIs, SLOs, and error budgets. Be prepared to discuss your experience with incident response and postmortem analysis.
- What do you know about SRE principles and how have you applied them in your previous roles?
- Can you describe your experience with observability systems and how you've used them to improve system reliability?
- 2. Technical (~60 min)
What to prep: Brush up on your knowledge of system design, scalability, and performance optimization. Practice explaining technical concepts and architectures to non-technical stakeholders.
- How would you design a system to monitor and alert on critical service performance metrics?
- Can you explain your approach to automating workflows and eliminating toil in a production environment?
- 3. System design (~60 min)
What to prep: Review system design principles and practice designing systems for scalability, reliability, and performance. Be prepared to discuss trade-offs and design decisions.
- Design a system for monitoring and logging in a cloud-native architecture.
- How would you approach building a highly available and scalable system for a critical service?
- 4. Behavioral (~60 min)
What to prep: Review your past experiences and be prepared to discuss specific examples of your leadership, collaboration, and problem-solving skills.
- Tell me about a time when you had to lead a complex incident response effort. What was your approach and what did you learn from the experience?
- Can you describe your experience working with cross-functional teams, such as product and platform teams, to drive technical decisions?
Most Likely Questions
- What do you know about SRE principles and how have you applied them in your previous roles?
- Can you describe your experience with observability systems and how you've used them to improve system reliability?
- How would you design a system to monitor and alert on critical service performance metrics?
- Can you explain your approach to automating workflows and eliminating toil in a production environment?
Common Pitfalls
- Lack of experience with cloud-native architectures and scalability
- Inadequate understanding of SRE principles and practices
- Insufficient experience with incident response and postmortem analysis
Free Prep Resources
- LeetCode
- System Design Primer (GitHub: donnemartin)
- Google's SRE Book