Spotify Data Scientist Interview SQL Questions: What You’ll Actually Face
TL;DR
Spotify data scientist interviews prioritize SQL questions that test real-world query design under ambiguity, not syntax memorization. The evaluation hinges on clarity of assumptions, scalability of logic, and business alignment — not just correctness. Candidates who fail do so because they rush to write code without framing the problem, not because they lack technical skill.
Who This Is For
This is for data scientists with 1–5 years of experience applying to mid-level roles at Spotify, particularly those transitioning from non-music or non-platform companies. If you’ve practiced LeetCode-style SQL problems but haven’t worked through cohort analysis, funnel drop-offs, or A/B test data modeling, you’re unprepared. The bar isn’t raw coding speed — it’s structured thinking under incomplete requirements.
How Hard Are Spotify Data Scientist SQL Questions?
Spotify’s SQL bar matches Level 5 at Amazon or L4 at Meta: difficult, but not academic. The challenge isn’t nested CTEs or obscure window functions — it’s the open-ended nature. In a Q3 2023 debrief, a candidate solved a retention query perfectly but was rejected because they assumed a 7-day retention window without asking. The hiring committee noted: “The answer wasn’t wrong. The judgment was.”
Spotify leans on behavioral data from streaming logs — think play events, skips, session lengths, follow actions. You’ll likely face questions around user engagement, churn prediction, or campaign impact. One interviewer described a live case: “We gave a schema with plays, podcasts, and user profiles. The candidate had to define ‘inactive user’ themselves. That’s the test.”
Not syntax recall, but scoping: Spotify doesn’t care if you mix up ROW_NUMBER() and RANK() — they care if you know when to use either.
Not query optimization, but clarity: Writing a slow query with clean logic beats a fast one with magic numbers and no aliasing.
Not perfection, but iteration: Interviewers expect you to refine your approach. Silence is worse than backtracking.
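To make the ROW_NUMBER() versus RANK() point concrete, here is a minimal sketch run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The track_plays table and its values are invented for illustration. The thing to notice is how each function treats the tie between t1 and t2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track_plays (track_id TEXT, play_count INTEGER);  -- hypothetical table
INSERT INTO track_plays VALUES ('t1', 100), ('t2', 100), ('t3', 80);
""")

ranking_rows = conn.execute("""
SELECT
    track_id,
    play_count,
    -- ROW_NUMBER always yields 1, 2, 3...; the tie is broken by track_id here.
    ROW_NUMBER() OVER (ORDER BY play_count DESC, track_id) AS row_num,
    -- RANK gives tied rows the same rank and then skips ahead.
    RANK() OVER (ORDER BY play_count DESC) AS rnk
FROM track_plays
ORDER BY row_num;
""").fetchall()

for row in ranking_rows:
    print(row)
```

Roughly: reach for ROW_NUMBER() when you need exactly one row per group (for example, deduplicating), and RANK() when ties should share a position (for example, a top-N chart).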
In 12 real debriefs reviewed, zero candidates were dinged for missing a JOIN. Three were rejected for failing to define metrics before coding.
What Types of SQL Problems Does Spotify Ask?
Spotify focuses on four problem types: funnel analysis, cohort retention, A/B test evaluation, and behavioral segmentation. Each maps to actual work done by data scientists on teams like Listener Insights, Creator Analytics, or Growth.
Funnel questions dominate. Example: “Calculate the conversion rate from account creation to first play.” This seems trivial — until you realize “first play” could mean any track, premium content, or non-podcast audio. In a debrief last November, the hiring manager pushed back because the candidate used MAX(play_timestamp) instead of MIN. “They aggregated wrong,” he said. “But worse — they didn’t state their assumption.”
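A minimal sketch of the signup-to-first-play funnel, using Python's sqlite3 so it runs anywhere. The table layout, column names, and sample rows are assumptions made for illustration, not Spotify's schema. The stated assumption is that "first play" means the earliest play event of any content type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables for illustration only.
CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_ts TEXT);
CREATE TABLE plays (play_id INTEGER PRIMARY KEY, user_id INTEGER, play_timestamp TEXT);
INSERT INTO users VALUES (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03'), (4, '2024-01-04');
INSERT INTO plays VALUES
    (10, 1, '2024-01-01 09:00:00'),
    (11, 1, '2024-01-05 10:00:00'),
    (12, 2, '2024-01-03 08:30:00');
""")

FUNNEL_SQL = """
WITH first_play AS (
    -- Assumption: "first play" = earliest play event, hence MIN, not MAX.
    SELECT user_id, MIN(play_timestamp) AS first_play_ts
    FROM plays
    GROUP BY user_id
)
SELECT
    COUNT(u.user_id)  AS signups,
    COUNT(fp.user_id) AS converted,   -- NULLs from the LEFT JOIN are not counted
    ROUND(1.0 * COUNT(fp.user_id) / COUNT(u.user_id), 2) AS conversion_rate
FROM users AS u
LEFT JOIN first_play AS fp ON fp.user_id = u.user_id;
"""

signups, converted, conversion_rate = conn.execute(FUNNEL_SQL).fetchone()
print(signups, converted, conversion_rate)  # 4 2 0.5
```

The LEFT JOIN keeps users who never played in the denominator. An inner join would silently report 100% conversion.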
Cohort retention is second most common. You might get: “Measure 30-day retention for users who signed up in January.” The trap? Defining “retained.” Is it one play? Five minutes of streaming? Spotify internal docs define it as “at least one session >60 seconds,” but you won’t be told that. The interviewer wants you to ask.
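Here is one way the January cohort question could be sketched once the definition is pinned down. The sessions table, its columns, and the sample data are hypothetical. The assumptions, which you would confirm with the interviewer, are that "retained" means at least one session longer than 60 seconds, and that the window runs from day 1 through day 30 after signup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables for illustration only.
CREATE TABLE users (user_id INTEGER PRIMARY KEY, signup_date TEXT);
CREATE TABLE sessions (user_id INTEGER, session_start TEXT, duration_sec INTEGER);
INSERT INTO users VALUES (1, '2024-01-05'), (2, '2024-01-10'), (3, '2024-01-20'), (4, '2024-02-01');
INSERT INTO sessions VALUES
    (1, '2024-01-20', 300),  -- day 15, long enough: retained
    (2, '2024-01-15', 30),   -- inside the window but too short
    (3, '2024-03-15', 600),  -- long enough but outside the window
    (4, '2024-02-02', 900);  -- not a January signup
""")

RETENTION_SQL = """
WITH cohort AS (
    SELECT user_id, signup_date
    FROM users
    WHERE signup_date >= '2024-01-01' AND signup_date < '2024-02-01'
),
retained AS (
    SELECT DISTINCT c.user_id
    FROM cohort AS c
    JOIN sessions AS s ON s.user_id = c.user_id
    WHERE s.duration_sec > 60
      AND julianday(s.session_start) - julianday(c.signup_date) BETWEEN 1 AND 30
)
SELECT
    (SELECT COUNT(*) FROM cohort)   AS cohort_size,
    (SELECT COUNT(*) FROM retained) AS retained_users;
"""

cohort_size, retained_users = conn.execute(RETENTION_SQL).fetchone()
print(cohort_size, retained_users)  # 3 1
```

Changing either threshold (60 seconds, 30 days) changes the answer, which is why stating them before typing matters.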
A/B test questions test causal logic. Example: “Users in group A saw a new homepage. Group B didn’t. Did engagement increase?” The real test isn’t writing the query — it’s identifying unit of analysis (user vs. session), handling multiple observations, and checking for balance in pre-period behavior.
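The sketch below shows one way to handle the unit-of-analysis point: collapse play events to one row per user first, then compare groups, with a pre-period column as a rough balance check. The experiments and plays layouts, the exposed_at column, and all values are assumptions for illustration. It reports group averages only, with no significance test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables for illustration only.
CREATE TABLE experiments (user_id INTEGER PRIMARY KEY, variant TEXT, exposed_at TEXT);
CREATE TABLE plays (user_id INTEGER, play_timestamp TEXT);
INSERT INTO experiments VALUES
    (1, 'A', '2024-03-10'), (2, 'A', '2024-03-10'),
    (3, 'B', '2024-03-10'), (4, 'B', '2024-03-10');
INSERT INTO plays VALUES
    (1, '2024-03-08'), (1, '2024-03-09'),
    (1, '2024-03-11'), (1, '2024-03-11'), (1, '2024-03-12'), (1, '2024-03-13'),
    (3, '2024-03-08'), (3, '2024-03-09'),
    (3, '2024-03-11'), (3, '2024-03-12');
""")

AB_SQL = """
WITH per_user AS (
    -- One row per user, so heavy listeners do not dominate the group average.
    SELECT
        e.user_id,
        e.variant,
        SUM(CASE WHEN p.play_timestamp <  e.exposed_at THEN 1 ELSE 0 END) AS pre_plays,
        SUM(CASE WHEN p.play_timestamp >= e.exposed_at THEN 1 ELSE 0 END) AS post_plays
    FROM experiments AS e
    LEFT JOIN plays AS p ON p.user_id = e.user_id  -- keep users with zero plays
    GROUP BY e.user_id, e.variant
)
SELECT variant,
       COUNT(*)        AS users,
       AVG(pre_plays)  AS avg_pre_plays,   -- balance check before exposure
       AVG(post_plays) AS avg_post_plays
FROM per_user
GROUP BY variant
ORDER BY variant;
"""

ab_rows = conn.execute(AB_SQL).fetchall()
for row in ab_rows:
    print(row)
```

In this toy data both groups average one play per user before exposure, so the post-period gap is at least not explained by a pre-existing difference.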
Segmentation problems look like: “Find power users who listen to >10 hours/week and follow 50+ artists.” These test JOIN efficiency and filtering order. Do you filter plays before aggregating playtime? Or after? One candidate lost points for aggregating before filtering, and their join fanned out into a Cartesian product on 10M rows.
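A sketch of the power-user query that filters first and aggregates each table on its own before joining, so the plays-to-follows join cannot fan out. The tables, the seconds_played column, and the data are hypothetical. "Per week" is assumed to mean one specific calendar week passed in as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables for illustration only.
CREATE TABLE plays (user_id INTEGER, play_timestamp TEXT, seconds_played INTEGER);
CREATE TABLE follows (user_id INTEGER, artist_id INTEGER);
""")
conn.executemany("INSERT INTO plays VALUES (?, ?, ?)", [
    (1, "2024-04-02", 11 * 3600),  # 11 hours in the target week
    (2, "2024-04-03", 11 * 3600),  # enough listening, too few follows
    (3, "2024-04-04", 2 * 3600),   # enough follows, too little listening
    (1, "2024-03-01", 40 * 3600),  # outside the target week, must be ignored
])
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [(1, a) for a in range(50)] + [(2, a) for a in range(3)] + [(3, a) for a in range(60)],
)

POWER_USERS_SQL = """
WITH weekly_listening AS (
    SELECT user_id, SUM(seconds_played) / 3600.0 AS hours
    FROM plays
    WHERE play_timestamp >= :week_start AND play_timestamp < :week_end  -- filter first
    GROUP BY user_id
    HAVING SUM(seconds_played) > 10 * 3600
),
follow_counts AS (
    SELECT user_id, COUNT(DISTINCT artist_id) AS artists_followed
    FROM follows
    GROUP BY user_id
    HAVING COUNT(DISTINCT artist_id) >= 50
)
-- Both CTEs hold at most one row per user, so this join is one-to-one.
SELECT w.user_id, w.hours, f.artists_followed
FROM weekly_listening AS w
JOIN follow_counts AS f ON f.user_id = w.user_id
ORDER BY w.user_id;
"""

power_users = conn.execute(
    POWER_USERS_SQL, {"week_start": "2024-04-01", "week_end": "2024-04-08"}
).fetchall()
print(power_users)  # [(1, 11.0, 50)]
```

Joining raw plays to raw follows and aggregating afterwards would multiply every play by every follow, which inflates the listening total.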
Not abstract logic, but product context: Every query ties back to a decision.
Not isolated queries, but narrative flow: You’re expected to link metric definition to business outcome.
Not standalone answers, but trade-off discussion: Indexing costs vs. query speed, precision vs. latency.
How Is the SQL Round Structured at Spotify?
The SQL interview is typically 45 minutes, part of a 3-round onsite (or virtual onsite). It follows a behavioral round and precedes a case study. You’ll get one main question, sometimes with a follow-up. Code is written in a shared editor like CoderPad — no syntax highlighting.
You’re given a simplified schema with tables such as users, plays, follows, and experiments. Columns include play_id, user_id, track_id, timestamp, session_id, event_type, is_premium, and country. The schema is incomplete by design. In a February interview, a candidate asked if “play” meant playback started or reached 30 seconds. That question alone raised their rating from “lean no” to “yes.”
Interviewers use a rubric with four categories:
- Assumption articulation (25%)
- Query correctness (30%)
- Code readability (20%)
- Business alignment (25%)
In HC discussions, readability often swings decisions. One candidate wrote a correct query using subqueries instead of CTEs. The interviewer noted: “Hard to follow. Would block code review.” That comment tanked their score.
You won’t be asked to optimize for production. But you will be asked: “How would this scale at 500M users?” The expected answer isn’t “add indexes” — it’s “pre-aggregate daily metrics in a summary table” or “use approximate COUNT(DISTINCT) with HyperLogLog.”
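As a rough illustration of the "pre-aggregate daily metrics" answer, the sketch below builds a small summary table once and then answers a dashboard-style question from it instead of rescanning raw events. Table and column names are invented, and a plain table stands in for whatever scheduled job or materialized view a real warehouse would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical raw event table for illustration only.
CREATE TABLE plays (user_id INTEGER, play_timestamp TEXT, country TEXT);
INSERT INTO plays VALUES
    (1, '2024-05-01 08:00:00', 'SE'),
    (1, '2024-05-01 09:00:00', 'SE'),
    (2, '2024-05-01 10:00:00', 'SE'),
    (3, '2024-05-01 11:00:00', 'US'),
    (1, '2024-05-02 08:00:00', 'SE');

-- Built once per day in practice; later queries read this instead of plays.
CREATE TABLE daily_country_activity AS
SELECT
    DATE(play_timestamp)    AS activity_date,
    country,
    COUNT(DISTINCT user_id) AS active_users,
    COUNT(*)                AS plays
FROM plays
GROUP BY DATE(play_timestamp), country;
""")

summary_rows = conn.execute(
    "SELECT * FROM daily_country_activity ORDER BY activity_date, country"
).fetchall()

# Play counts are additive, so they roll up safely from the summary table.
daily_plays = conn.execute("""
SELECT activity_date, SUM(plays) AS total_plays
FROM daily_country_activity
GROUP BY activity_date
ORDER BY activity_date;
""").fetchall()
print(daily_plays)  # [('2024-05-01', 4), ('2024-05-02', 1)]
```

One caveat worth saying out loud: distinct-user counts are not additive across days, so weekly actives cannot be recovered by summing active_users. That gap is where approximate distinct counting such as HyperLogLog sketches comes in.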
Not a test of memory, but of communication: Silence kills.
Not a race, but a dialogue: Interviewers adjust hints based on your pacing.
Not code-only, but logic-first: You can talk through the solution before typing.
How Should You Explain Your SQL Thinking Out Loud?
Silence is the fastest path to rejection. In a debrief, an HC member said: “We can’t score what we can’t see. If they’re coding silently for 3 minutes, we assume they’re stuck — even if they’re not.” Spotify uses a “think-aloud” protocol similar to Google’s.
Start with metric definition. Say: “I’ll define ‘engagement’ as sessions per user per week. Is that aligned with your intent?” This does two things: it shows structure, and it invites correction. In one interview, the candidate proposed using “time played” instead of “number of plays.” The interviewer confirmed that was better — and that exchange became a positive data point.
Break the problem into steps:
- Clarify the goal
- Define key terms
- Sketch the output table
- List required tables and joins
- Write the query incrementally
One candidate said: “First, I’ll get all users who signed up in the last 30 days. Then, I’ll join with play events. Then, I’ll count unique sessions. Finally, I’ll group by country.” That verbal framework earned top marks — even though their final query missed a WHERE clause.
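The four spoken steps in that framework map almost one-to-one onto CTEs. The sketch below is one possible rendering, with hypothetical tables and sample rows. It assumes session_id values are globally unique and uses a fixed as_of date, rather than the current date, so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables for illustration only.
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT, signup_date TEXT);
CREATE TABLE plays (user_id INTEGER, session_id TEXT);
INSERT INTO users VALUES
    (1, 'SE', '2024-06-20'),
    (2, 'US', '2024-06-25'),
    (3, 'SE', '2024-04-01');  -- signed up too long ago
INSERT INTO plays VALUES (1, 's1'), (1, 's1'), (1, 's2'), (2, 's3'), (3, 's4');
""")

SESSIONS_BY_COUNTRY_SQL = """
WITH recent_signups AS (
    -- Step 1: users who signed up in the 30 days up to :as_of.
    SELECT user_id, country
    FROM users
    WHERE signup_date >= DATE(:as_of, '-30 days') AND signup_date <= :as_of
),
signup_plays AS (
    -- Step 2: join those users to their play events.
    SELECT r.country, p.session_id
    FROM recent_signups AS r
    JOIN plays AS p ON p.user_id = r.user_id
)
-- Steps 3 and 4: count unique sessions, grouped by country.
SELECT country, COUNT(DISTINCT session_id) AS unique_sessions
FROM signup_plays
GROUP BY country
ORDER BY country;
"""

sessions_by_country = conn.execute(
    SESSIONS_BY_COUNTRY_SQL, {"as_of": "2024-06-30"}
).fetchall()
print(sessions_by_country)  # [('SE', 2), ('US', 1)]
```

Naming each CTE after the step you just said aloud makes it easy for the interviewer to follow along, and easy for you to spot a missing filter.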
Avoid: “So… I’ll start with SELECT * FROM plays…”
Use: “I need play events filtered to the last 7 days, so I’ll start with a WHERE clause on timestamp.”
Not performance under pressure, but transparency in process: Spotify hires for collaboration.
Not brilliance in isolation, but clarity in expression: Code is secondary to communication.
Not speed, but precision: A slow, clear candidate beats a fast, quiet one.
Preparation Checklist
- Practice defining metrics before writing code: For any problem, write a one-sentence definition of the KPI first.
- Build familiarity with music streaming data models: Understand how plays, skips, follows, and sessions relate.
- Simulate ambiguity: Have a peer give you incomplete prompts and force you to ask clarifying questions.
- Use real Spotify schemas from public case studies: Reverse-engineer their data flow from product updates.
- Work through a structured preparation system (the PM Interview Playbook covers behavioral analytics and funnel SQL with real debrief examples).
- Time yourself — but prioritize completeness over speed: Aim for 30 minutes per problem, then review for clarity.
- Record yourself thinking aloud: Play it back to check whether your logic is audible, not just internal.
Mistakes to Avoid
- BAD: Writing a full query without defining the metric. One candidate calculated “average plays per session” when asked for “percentage of users who play daily.” The code was flawless — the intent was wrong. Result: rejection.
- GOOD: Starting with: “To calculate ‘daily active users,’ I’ll define a user as active if they have at least one play event in a 24-hour period. Should I include podcasts?”
- BAD: Using SELECT * in joins. In a test case, a candidate JOINed users and plays with SELECT *, creating a 200-column output. The interviewer commented: “This wouldn’t pass linter rules.”
- GOOD: Explicitly naming needed columns: SELECT u.user_id, p.play_timestamp, p.track_id.
- BAD: Ignoring scalability. A candidate used a correlated subquery to find each user’s first play. It worked for 10K rows — but the interviewer asked: “How would this run on 2 billion plays?” They had no answer.
- GOOD: Saying: “This subquery works for small data. For scale, I’d precompute first_play_date in a materialized view.”
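To illustrate the scalability item above, this sketch puts a correlated-subquery version of "each user's first play" next to a single-pass GROUP BY version and checks that they agree on a toy table. Names and data are hypothetical. SQLite has no materialized views, so an ordinary precomputed table stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table for illustration only.
CREATE TABLE plays (user_id INTEGER, play_timestamp TEXT);
INSERT INTO plays VALUES
    (1, '2024-01-03 10:00:00'),
    (1, '2024-01-01 09:00:00'),
    (2, '2024-01-02 12:00:00'),
    (2, '2024-01-05 08:00:00');
""")

# Correlated subquery: the inner query is logically re-evaluated per outer row.
CORRELATED_SQL = """
SELECT p.user_id, p.play_timestamp AS first_play
FROM plays AS p
WHERE p.play_timestamp = (
    SELECT MIN(p2.play_timestamp) FROM plays AS p2 WHERE p2.user_id = p.user_id
)
ORDER BY p.user_id;
"""

# Single pass: one aggregation over the table.
SINGLE_PASS_SQL = """
SELECT user_id, MIN(play_timestamp) AS first_play
FROM plays
GROUP BY user_id
ORDER BY user_id;
"""

correlated_rows = conn.execute(CORRELATED_SQL).fetchall()
single_pass_rows = conn.execute(SINGLE_PASS_SQL).fetchall()

# Stand-in for a materialized view: compute once, then read cheaply.
conn.execute(
    "CREATE TABLE user_first_play AS "
    "SELECT user_id, MIN(play_timestamp) AS first_play_date FROM plays GROUP BY user_id"
)
precomputed_rows = conn.execute(
    "SELECT user_id, first_play_date FROM user_first_play ORDER BY user_id"
).fetchall()
print(single_pass_rows)
```

The two queries only match when no user has two plays sharing the same earliest timestamp. With ties, the correlated version returns duplicate rows, which is another reason to prefer the aggregate.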
FAQ
What level of SQL is expected for a Spotify data scientist?
Expect intermediate-to-advanced applied SQL, not theoretical knowledge. You must handle JOINs, window functions, filtering, aggregation, and CTEs — but the real test is structuring solutions under ambiguity, not syntax mastery. Fluency matters less than clarity of approach.
Do Spotify data scientist interviews use live SQL coding?
Yes. You’ll write SQL in a shared editor like CoderPad or Google Docs during a 45-minute session. The schema is provided, but intentionally incomplete. Interviewers assess how you handle gaps, not just final output. Expect to talk while coding — silence is interpreted as lack of communication.
Are Spotify SQL questions based on real business cases?
Yes. Questions mirror actual work: measuring retention, analyzing A/B tests, tracking user growth. One 2023 question asked candidates to calculate the impact of a new playlist recommendation feature, identical to a Q2 project on the Discovery team. Use Spotify’s public blog posts to anticipate scenarios.