SLO vs SLI Interview Questions at Google SRE: A Comparison of What Interviewers Ask

Google SRE interviewers separate SLO from SLI by demanding a business‑outcome framing and a concrete error‑budget calculation, not a textbook definition. A candidate who treats the SLO as a static number fails, while one who ties it to user impact and iteration cycles succeeds. Prepare with the SLO‑SLI‑Error‑Budget Triangle framework, rehearse a three‑sentence script, and expect three phone screens plus a four‑day on‑site loop, with the whole process lasting about four weeks.

You are a mid‑level to senior site reliability engineer (SRE) with 3–7 years of production experience, currently earning $150k–$190k base, and you are targeting a Google SRE role that promises $180k–$210k base plus 0.05%–0.08% equity. You have already passed a resume screen and are gearing up for the technical interview loop, but you are unsure how Google interviewers expect you to distinguish SLOs from SLIs. This guide is for you.

What do Google SRE interviewers expect when I talk about SLOs versus SLIs?

Interviewers expect you to present SLOs as business‑level service commitments and SLIs as the measurable signals that feed into those commitments, not the other way around. In a Q2 on‑site debrief, the hiring manager pushed back because the candidate described an SLI as “the service‑level objective,” which signaled a misunderstanding of the decision‑making hierarchy. The first counter‑intuitive truth is that the problem isn’t the definition you recite — it’s the judgment signal you emit about ownership. Google uses the SLO‑SLI‑Error‑Budget Triangle: SLO (outcome), SLI (signal), Error Budget (tolerance). When you articulate this triangle, you demonstrate that you view reliability as a trade‑off, not a static checklist. The interviewers will probe: “If you had to pick one metric to monitor, which would you choose and why?” Your answer must map the metric (SLI) to the user‑impact goal (SLO) and then explain the acceptable breach rate (error budget).

How do interviewers differentiate between a good SLO answer and a generic one?

A good SLO answer ties the objective to a concrete user‑experience outcome and quantifies the acceptable failure rate, whereas a generic answer recites “99.9% availability” without context. In a recent hiring committee, the senior SRE on the panel asked the candidate to justify a 99.9% SLO for a latency‑sensitive API. The candidate replied, “We need 99.9% because it’s industry standard,” and the committee marked the response as a red flag. The judgment is not about the number you quote; it is about the reasoning you attach. Do not pick a round number. Explain why that number aligns with user expectations and business goals. Google looks for a “Signal‑to‑Outcome Mapping” where you state the SLI (e.g., the proportion of requests served within 100 ms) and then translate that into the SLO (e.g., 99.9% of requests meet the latency target). This mapping should also include a measurable error budget, such as “allow 0.1% of requests to exceed the latency target per month.”
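The mapping from signal to outcome can be sketched in a few lines. This is a minimal illustration, assuming the SLI is expressed as the fraction of requests served within 100 ms so that it can be compared directly with a 99.9% SLO; the function names and the latency samples are hypothetical.

```python
def latency_sli(latencies_ms, threshold_ms=100.0):
    """Return the fraction of requests at or under the latency threshold."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    good = sum(1 for value in latencies_ms if value <= threshold_ms)
    return good / len(latencies_ms)


def meets_slo(sli, slo_target=0.999):
    """Compare the measured SLI against the SLO target."""
    return sli >= slo_target


# Hypothetical samples; a real SLI would be computed over a full window.
samples = [42.0, 87.5, 99.0, 130.2, 64.3]
sli = latency_sli(samples)
print(f"SLI: {sli:.1%}, meets 99.9% SLO: {meets_slo(sli)}")
```

The error budget is simply the gap between the SLO and 100%: with a 99.9% target, 0.1% of requests may miss the threshold before the objective is breached.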

Why does Google probe error‑budget calculations during the SLI discussion?

Google probes error‑budget calculations to verify that you can balance reliability with feature velocity, a core principle of Site Reliability Engineering. In a recent hiring committee (HC) debate, the hiring manager cited a candidate who correctly identified the SLI but faltered when asked to compute the monthly error budget, exposing a gap in operational decision‑making. It is not enough to state the error budget; you need to demonstrate how you would allocate it across rollout, incident response, and remediation. The interviewers will often present a scenario: “You have an SLO of 99.9% uptime; how many minutes of downtime per month does that permit?” A correct answer is 43.2 minutes (0.1% of the 43,200 minutes in a 30‑day month). Then they ask, “If a new feature risks consuming 20 minutes of that budget, what do you do?” Your answer must show that you can prioritize, perhaps by throttling the feature launch or increasing redundancy. This reveals whether you view the error budget as a strategic lever rather than a static safety net.
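The arithmetic behind that scenario is worth being able to reproduce on the spot. The sketch below assumes a 30‑day month and a time‑based availability SLO; the function name is illustrative.

```python
def downtime_budget_minutes(slo_target, window_days=30):
    """Minutes of downtime permitted by an availability SLO over the window."""
    window_minutes = window_days * 24 * 60  # 43,200 minutes in 30 days
    return (1.0 - slo_target) * window_minutes


budget = downtime_budget_minutes(0.999)
feature_risk_minutes = 20  # the risk figure from the interview scenario
remaining = budget - feature_risk_minutes

print(f"Monthly budget: {budget:.1f} min")
print(f"Remaining after the feature: {remaining:.1f} min")
print(f"Share of budget consumed: {feature_risk_minutes / budget:.0%}")
```

Seeing that a single launch would consume nearly half of the monthly budget is what motivates a staged or throttled rollout in the answer.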

What script should I use when the interviewer asks me to design an SLO for a new service?

When asked to design an SLO, use a three‑sentence script: (1) identify the user‑impact goal, (2) choose the SLI that best reflects that goal, and (3) state the SLO and error‑budget tolerance. In a 2023 on‑site loop, the candidate was prompted with, “Design an SLO for a video‑streaming service’s start‑up latency.” The candidate answered, “Our users care about the time to first frame, so the user‑impact goal is sub‑2‑second start‑up. We will measure the 95th‑percentile start‑up latency as the SLI. We set an SLO of 99.5% of sessions meeting the sub‑2‑second target, giving us a monthly error budget of 0.5% of sessions.” (Expressed as time, 0.5% of a 30‑day month is about 216 minutes.) This script earned a “strong” rating because it covered outcome, signal, and tolerance succinctly. The lesson: do not just list a metric; anchor it to a user‑experience story. Practice this script until it feels natural; interviewers reward clarity and purposeful framing over exhaustive detail.
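The three parts of the script can be captured in a small record, which also makes the tolerance concrete. This sketch is illustrative only: the `SloDraft` class, its field names, and the figure of one million monthly sessions are assumptions, not anything taken from an actual interview or from Google tooling. It treats the error budget as a count of sessions allowed to miss the target.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SloDraft:
    """Hypothetical record holding the three parts of the script."""
    user_goal: str   # sentence 1: the user-impact goal
    sli: str         # sentence 2: the signal that reflects the goal
    target: float    # sentence 3: the SLO, as a fraction of good events

    def allowed_bad_events(self, total_events: int) -> int:
        """Number of events that may miss the target within the budget."""
        raw = total_events * (1.0 - self.target)
        # Round first to absorb floating-point noise, then floor.
        return math.floor(round(raw, 6))


draft = SloDraft(
    user_goal="video starts playing in under 2 seconds",
    sli="start-up latency (time to first frame) per session",
    target=0.995,
)

monthly_sessions = 1_000_000  # assumed traffic volume, for illustration
print(f"Sessions allowed to miss the target: "
      f"{draft.allowed_bad_events(monthly_sessions)}")
```

Swapping in a different goal, signal, and target is a quick way to rehearse the script against several services.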

How many interview rounds will test SLO/SLI knowledge, and what is the timeline?

Three technical phone screens and a four‑day on‑site loop will each contain at least one SLO/SLI question, giving four distinct evaluation points over a typical four‑week timeline. In one recent hiring cycle, a candidate progressed through two 45‑minute phone screens: the first focused on system design, and the second on reliability fundamentals, including an SLO scenario. The on‑site loop consisted of four 60‑minute interviews (a coding screen, a design deep‑dive, a troubleshooting exercise, and a culture‑fit discussion), each probing the SLO‑SLI‑Error‑Budget Triangle from a different angle. You cannot treat SLO/SLI as a peripheral topic; Google integrates it throughout the interview loop. Rather than preparing a single answer, be ready to apply the same framework through multiple lenses. Expect the process to span 28 days from initial phone screen to final decision, with offers typically arriving within five business days after the on‑site.

Building Your Interview Toolkit

  • Review the SLO‑SLI‑Error‑Budget Triangle and be able to draw it on a whiteboard in under two minutes.
  • Memorize the conversion of a 99.9% uptime SLO to minutes of allowed downtime per month (43.2 minutes).
  • Practice the three‑sentence SLO design script with at least three different service examples.
  • Simulate a debrief scenario where a hiring manager pushes back on your SLI choice; rehearse a concise rebuttal that references user‑impact.
  • Work through a structured preparation system (the PM Interview Playbook covers the SLO/SLI framework with real debrief examples, so you can see how interviewers evaluate the signal‑to‑outcome mapping).
  • Schedule a mock interview with a senior SRE who can critique your error‑budget calculations.
  • Prepare a one‑pager that lists three SLIs you have owned, the associated SLOs, and the observed error‑budget consumption over the past six months.

Blind Spots That Sink Candidacies

  • BAD: “I would set the SLO to 99.9% because that’s the industry standard.” GOOD: Tie the 99.9% figure to a concrete user‑impact scenario, e.g., “Our users expect < 100 ms latency 99.9% of the time, which translates to a 43.2‑minute monthly error budget.”
  • BAD: “Our SLI is CPU utilization, so we’ll monitor that.” GOOD: Choose an SLI that directly reflects the user‑experience goal, such as “95th‑percentile request latency,” and explain why CPU alone does not capture the user‑impact.
  • BAD: “The error budget is a safety net we never touch.” GOOD: Demonstrate how you would allocate the error budget across feature rollouts, incident response, and capacity planning, showing strategic trade‑offs.

FAQ

What is the most common SLO mistake candidates make at Google?

Candidates often recite a generic availability percentage without linking it to a user‑impact outcome. A strong answer articulates the business goal behind the number.

How should I respond if the interviewer asks for an SLI I have never used?

Admit the gap, then pivot to a comparable metric you have owned, explaining the similarity in how it reflects the same user‑experience dimension.

Do I need to know exact Google internal SLO thresholds to succeed?

No. Google evaluates your reasoning process, not the memorization of internal thresholds; focus on demonstrating the SLO‑SLI‑Error‑Budget framework and your ability to apply it to new services.

