Uber data scientist SQL and coding interview 2026

TL;DR

Uber’s data scientist role demands mastery of real-time SQL, complex joins, and Python coding under time pressure. The technical bar is high, especially in the coding rounds, where candidates often fail on window functions and edge case handling. Compensation ranges from $131,000 to $252,000 base, but only those who demonstrate production-grade logic survive the final review.

Who This Is For

This guide is for mid-level data scientists with 2–5 years of experience applying to Uber’s Core Platform, Marketplace, or Risk teams. You’ve written SQL daily and used Python for analysis, but you haven’t yet passed Uber’s coding screen. You need to know not just what questions are asked, but how decisions are made in the hiring committee.

What does Uber’s data scientist coding interview actually test?

Uber doesn’t test academic SQL. It tests production logic. In a Q3 2024 debrief, a candidate wrote a syntactically valid query but was rejected because they filtered with WHERE where HAVING was needed to filter post-aggregation results. WHERE is applied to individual rows before grouping, so the query ran but answered a different question. The hiring manager said, “They know syntax, but not data integrity in live systems.”
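To make the WHERE versus HAVING distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The trips table, its columns, and the "drivers with more than 50 in total fares" question are hypothetical, invented purely for illustration. They are not taken from any Uber interview.

```python
import sqlite3

# Hypothetical toy data: (driver_id, fare) for four trips.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (driver_id INTEGER, fare INTEGER);
    INSERT INTO trips VALUES (1, 30), (1, 30), (2, 60), (3, 10);
""")

# WHERE runs before grouping: only single trips with fare > 50 survive,
# so driver 1 (two trips of 30) is silently dropped.
where_result = conn.execute("""
    SELECT driver_id, SUM(fare) AS total_fare
    FROM trips
    WHERE fare > 50
    GROUP BY driver_id
    ORDER BY driver_id
""").fetchall()

# HAVING runs after grouping: it keeps drivers whose summed fares exceed 50.
having_result = conn.execute("""
    SELECT driver_id, SUM(fare) AS total_fare
    FROM trips
    GROUP BY driver_id
    HAVING SUM(fare) > 50
    ORDER BY driver_id
""").fetchall()

print(where_result)   # [(2, 60)]
print(having_result)  # [(1, 60), (2, 60)]
```

Both queries run without error, which is the point: the mistake shows up in the numbers, not in the syntax.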

The real test is judgment under constraints. Uber runs real-time pricing, fraud detection, and dispatch systems. Your code must reflect awareness of scale, latency, and edge cases. Not “can you write a join,” but “do you know when a join will blow up the executor?”

Most candidates prepare with LeetCode easy-medium problems. That’s not the issue. The issue is they treat SQL like a puzzle, not an operational tool. At Uber, a malformed CTE or missing index consideration can cascade into service degradation. The interview simulates that pressure.

Judgment signals matter more than correctness. In one instance, a candidate admitted they weren’t sure about the partition order in a window function and asked for clarification. That transparency scored higher than a silent, incorrect solution. Uber values engineers who know their limits.

Not puzzle-solving, but systems thinking.

Not syntax recall, but optimization instinct.

Not clean data assumptions, but null-handling rigor.

How is the SQL round structured at Uber in 2026?

The SQL interview is 45 minutes, conducted live via CoderPad or similar. You’re given one main problem with 2–3 follow-ups that increase in complexity. The first part is usually a join or aggregation. The second adds time-series logic. The third forces optimization or edge-case resolution.

In a recent interview for the Uber Eats team, the prompt was: “Find the top 3 restaurants by completed orders last week, then show the change from the prior week.” Simple on surface. But the schema included soft deletes, timezone mismatches, and promo orders that needed filtering.
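One possible shape for an answer is sketched below with sqlite3. Everything about the schema is an assumption made for illustration: the orders table, the is_deleted and is_promo flags, timestamps already stored in UTC, and "last week" meaning the seven days starting on a Monday passed in as a parameter. In a real interview you would confirm each of these before writing the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        restaurant_id TEXT,
        status TEXT,
        is_deleted INTEGER,      -- hypothetical soft-delete flag
        is_promo INTEGER,        -- hypothetical promo flag
        completed_at_utc TEXT    -- assumed to be stored in UTC
    );
    INSERT INTO orders
        (restaurant_id, status, is_deleted, is_promo, completed_at_utc)
    VALUES
        ('A', 'delivered', 0, 0, '2025-06-10 12:00:00'),
        ('A', 'delivered', 0, 0, '2025-06-11 12:00:00'),
        ('A', 'delivered', 0, 0, '2025-06-03 12:00:00'),
        ('B', 'delivered', 0, 0, '2025-06-12 09:00:00'),
        ('B', 'delivered', 1, 0, '2025-06-12 10:00:00'),  -- soft-deleted
        ('B', 'delivered', 0, 1, '2025-06-13 10:00:00'),  -- promo order
        ('B', 'delivered', 0, 0, '2025-06-04 09:00:00'),
        ('B', 'delivered', 0, 0, '2025-06-05 09:00:00'),
        ('C', 'cancelled', 0, 0, '2025-06-12 09:00:00');
""")

QUERY = """
WITH clean AS (              -- apply the data-quality filters once
    SELECT restaurant_id, completed_at_utc
    FROM orders
    WHERE status = 'delivered' AND is_deleted = 0 AND is_promo = 0
),
weekly AS (                  -- conditional counts for the two weeks
    SELECT
        restaurant_id,
        SUM(completed_at_utc >= :week_start
            AND completed_at_utc < :week_end) AS last_week,
        SUM(completed_at_utc >= :prior_start
            AND completed_at_utc < :week_start) AS prior_week
    FROM clean
    GROUP BY restaurant_id
)
SELECT restaurant_id, last_week, prior_week,
       last_week - prior_week AS change
FROM weekly
WHERE last_week > 0
ORDER BY last_week DESC, restaurant_id
LIMIT 3
"""

params = {
    "prior_start": "2025-06-02",
    "week_start": "2025-06-09",
    "week_end": "2025-06-16",
}
top_restaurants = conn.execute(QUERY, params).fetchall()
print(top_restaurants)  # [('A', 2, 1, 1), ('B', 1, 2, -1)]
```

Restaurant B has four delivered rows in the raw table but only three count once the soft-deleted and promo orders are filtered out, which is exactly the kind of gap the follow-up questions probe.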

Candidates typically spend 15 minutes on the base query. Strong performers reserve 10 minutes for assumptions and validation. Weak ones dive in without clarifying whether “completed” means status = 'delivered' or also includes 'picked_up'.

One HC member noted: “The difference between L4 and L5 isn’t query speed—it’s whether they check for duplicates in the order_events table before joining.” That’s the hidden filter. Uber’s data is messy. Your code must assume it.
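The duplicate check that quote describes could look something like the sketch below. The order_events name comes from the quote, but the columns and the sample rows are assumptions for illustration, as is the idea that a 'delivered' event can be emitted twice for one order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, restaurant_id TEXT);
    CREATE TABLE order_events (order_id INTEGER, event_type TEXT, event_ts TEXT);
    INSERT INTO orders VALUES (1, 'A'), (2, 'A'), (3, 'B');
    INSERT INTO order_events VALUES
        (1, 'delivered', '2025-06-10 12:00:00'),
        (1, 'delivered', '2025-06-10 12:00:05'),  -- same event emitted twice
        (2, 'delivered', '2025-06-10 13:00:00'),
        (3, 'delivered', '2025-06-10 14:00:00');
""")

# Step 1: test the uniqueness assumption before relying on it.
duplicate_keys = conn.execute("""
    SELECT order_id, event_type, COUNT(*) AS n
    FROM order_events
    GROUP BY order_id, event_type
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2a: the naive join counts order 1 twice.
naive_counts = conn.execute("""
    SELECT o.restaurant_id, COUNT(*) AS delivered_orders
    FROM orders AS o
    JOIN order_events AS e ON e.order_id = o.order_id
    WHERE e.event_type = 'delivered'
    GROUP BY o.restaurant_id
    ORDER BY o.restaurant_id
""").fetchall()

# Step 2b: collapse events to one row per order first, then join.
deduped_counts = conn.execute("""
    WITH delivered AS (
        SELECT DISTINCT order_id
        FROM order_events
        WHERE event_type = 'delivered'
    )
    SELECT o.restaurant_id, COUNT(*) AS delivered_orders
    FROM orders AS o
    JOIN delivered AS d ON d.order_id = o.order_id
    GROUP BY o.restaurant_id
    ORDER BY o.restaurant_id
""").fetchall()

print(duplicate_keys)  # [(1, 'delivered', 2)]
print(naive_counts)    # [('A', 3), ('B', 1)]
print(deduped_counts)  # [('A', 2), ('B', 1)]
```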

You’re expected to:

Clarify definitions (What is “last week”? UTC or local?)
Handle nulls and duplicates
Use CTEs or subqueries appropriately
Optimize for readability and execution

Syntax errors are forgivable. Logical gaps are not.

Not writing comments is acceptable. Not questioning data quality is fatal.

Not knowing a function is fine. Not knowing when to use it is disqualifying.

What kind of Python coding questions does Uber ask?

Python rounds focus on data manipulation and algorithmic efficiency. You’ll use Pandas and built-in libraries—not scikit-learn or ML frameworks. Uber wants to see how you handle real data, not theoretical models.

A typical question: “Given a list of driver GPS pings, calculate average speed between consecutive points and flag any above 120 km/h.” You must parse timestamps, compute delta time, handle invalid coordinates, and manage floating-point precision.

Candidates fail here by writing nested loops. Uber runs this at scale. They want vectorized operations. In a debrief, a hiring manager said, “If they use iterrows(), we stop the clock. That’s a production red flag.”

Another common trap: not defining functions. One candidate solved the problem in a single block of code. When asked to modify it for different thresholds, they had to rewrite everything. The bar is modular, reusable logic.
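As a rough illustration of the GPS question, the sketch below combines the two points above: vectorized operations instead of row loops, and a function with the threshold as a parameter. The column names (driver_id, ts, lat, lon), the haversine distance formula, and the rule of dropping invalid rows are all assumptions. An interviewer may well want something different, so treat this as one reasonable approach rather than the expected answer.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def flag_speeding(pings: pd.DataFrame, threshold_kmh: float = 120.0) -> pd.DataFrame:
    """Return valid pings with speed_kmh and is_speeding columns added."""
    df = pings.copy()  # avoid mutating the caller's frame
    df["ts"] = pd.to_datetime(df["ts"], utc=True, errors="coerce")

    # Drop unparseable timestamps and out-of-range (or missing) coordinates.
    valid = (
        df["ts"].notna()
        & df["lat"].between(-90, 90)
        & df["lon"].between(-180, 180)
    )
    df = df[valid].sort_values(["driver_id", "ts"]).reset_index(drop=True)

    # Previous ping per driver, computed column-wise (no Python-level loop).
    grouped = df.groupby("driver_id")
    lat1 = np.radians(grouped["lat"].shift())
    lon1 = np.radians(grouped["lon"].shift())
    lat2 = np.radians(df["lat"])
    lon2 = np.radians(df["lon"])

    # Haversine great-circle distance; clip guards against rounding
    # pushing the intermediate value just outside [0, 1].
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a.clip(0, 1)))

    hours = (df["ts"] - grouped["ts"].shift()).dt.total_seconds() / 3600.0
    # Zero elapsed time would divide by zero, so treat that speed as unknown.
    df["speed_kmh"] = dist_km / hours.where(hours > 0)
    df["is_speeding"] = df["speed_kmh"] > threshold_kmh
    return df


pings = pd.DataFrame({
    "driver_id": [1, 1, 1, 1],
    "ts": ["2025-06-10T00:00:00Z", "2025-06-10T01:00:00Z",
           "2025-06-10T02:00:00Z", "not a time"],
    "lat": [0.0, 0.0, 0.0, 0.0],
    "lon": [0.0, 1.0, 3.0, 4.0],
})
flagged = flag_speeding(pings)
stricter = flag_speeding(pings, threshold_kmh=100)
```

Because the threshold is a parameter, the follow-up "now flag anything above 100 km/h" becomes a one-argument change instead of a rewrite.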

You’re not expected to memorize syntax. You can ask for method names. But you must know:

Vectorization vs iteration
Memory implications of .copy()
How to handle mixed data types in Series
When to use groupby().apply() vs transform()
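For the last item on that list, a small sketch may help. The trips frame and its columns are made up for illustration. The general idea is that transform() returns one value per input row, aligned to the original index, while apply() with a reducing function returns one value per group.

```python
import pandas as pd

# Hypothetical per-trip fares for two drivers.
trips = pd.DataFrame({
    "driver_id": [1, 1, 2, 2, 2],
    "fare": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# transform: result has the same length and index as the input,
# so it can be assigned straight back as a new column.
trips["driver_mean_fare"] = trips.groupby("driver_id")["fare"].transform("mean")
trips["above_driver_mean"] = trips["fare"] > trips["driver_mean_fare"]

# apply with a reducing function: one value per group, indexed by driver_id.
# It is flexible, but it calls the Python function once per group, which
# tends to be slower than built-in aggregations on large data.
fare_range = trips.groupby("driver_id")["fare"].apply(lambda s: s.max() - s.min())

print(trips["above_driver_mean"].tolist())  # [False, True, False, False, True]
print(fare_range.to_dict())                 # {1: 20.0, 2: 35.0}
```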

Not coding speed, but code quality.

Not library depth, but performance awareness.

Not output correctness, but scalability anticipation.

How do Uber’s coding interviews differ by team?

The Core Platform team asks the most complex SQL. Queries often involve event streams, sessionization, or funnel drop-offs across microservices. For example: “Identify users who abandoned the signup flow after OTP but before email verification, then re-engaged via push notification within 7 days.”
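A funnel prompt like this one can be approached in several ways. The sketch below is one of them, written against a hypothetical events table with assumed event names (otp_verified, email_verified, push_opened). It also assumes that "abandoned" means no email verification between the OTP step and the push re-engagement, and that the 7-day window starts at the OTP timestamp. Each of those is a definition you would want to confirm with the interviewer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id TEXT, event_name TEXT, event_ts TEXT);
    INSERT INTO events VALUES
        ('u1', 'otp_verified',   '2025-06-01 10:00:00'),
        ('u1', 'push_opened',    '2025-06-03 09:00:00'),  -- re-engaged in time
        ('u2', 'otp_verified',   '2025-06-01 10:00:00'),
        ('u2', 'email_verified', '2025-06-01 10:05:00'),  -- never abandoned
        ('u2', 'push_opened',    '2025-06-02 09:00:00'),
        ('u3', 'otp_verified',   '2025-06-01 10:00:00'),
        ('u3', 'push_opened',    '2025-06-10 09:00:00'),  -- outside 7 days
        ('u4', 'otp_verified',   '2025-06-01 10:00:00');  -- never came back
""")

reengaged_users = conn.execute("""
    WITH otp AS (
        SELECT user_id, MIN(event_ts) AS otp_ts
        FROM events
        WHERE event_name = 'otp_verified'
        GROUP BY user_id
    )
    SELECT DISTINCT o.user_id
    FROM otp AS o
    JOIN events AS p
      ON p.user_id = o.user_id
     AND p.event_name = 'push_opened'
     AND p.event_ts > o.otp_ts
     AND p.event_ts <= datetime(o.otp_ts, '+7 days')
    WHERE NOT EXISTS (          -- no email verification before the push
        SELECT 1
        FROM events AS e
        WHERE e.user_id = o.user_id
          AND e.event_name = 'email_verified'
          AND e.event_ts > o.otp_ts
          AND e.event_ts < p.event_ts
    )
    ORDER BY o.user_id
""").fetchall()

print(reengaged_users)  # [('u1',)]
```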

Marketplace (Rides, Eats) focuses on time-series and metrics. Expect rolling averages, YoY growth, and cohort retention. One L5 candidate was asked to compute “driver utilization rate” using trip start/end times and handle overlapping rides. The debate in the HC was whether they accounted for timezone shifts during daylight saving.
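One way to handle overlapping rides in a utilization metric is to merge the intervals before summing, so time covered by two trips is counted once. The sketch below assumes utilization means "fraction of a time window covered by at least one trip" and that all timestamps are timezone-aware UTC values, which sidesteps daylight saving shifts. Both are assumptions, since the article does not give Uber's actual definition.

```python
from datetime import datetime, timezone


def utilization_rate(trips, window_start, window_end):
    """Fraction of [window_start, window_end) covered by at least one trip.

    trips is an iterable of (start, end) timezone-aware datetimes.
    """
    # Clip each trip to the window and drop trips that fall outside it.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in trips
        if min(end, window_end) > max(start, window_start)
    )

    busy_seconds = 0.0
    run_start = run_end = None
    for start, end in clipped:
        if run_end is None or start > run_end:
            # Gap before this trip: close out the previous merged interval.
            if run_end is not None:
                busy_seconds += (run_end - run_start).total_seconds()
            run_start, run_end = start, end
        else:
            # Overlapping or touching trip: extend the current interval.
            run_end = max(run_end, end)
    if run_end is not None:
        busy_seconds += (run_end - run_start).total_seconds()

    return busy_seconds / (window_end - window_start).total_seconds()


def utc(hour, minute=0):
    """Helper for building sample timestamps on an arbitrary day."""
    return datetime(2025, 6, 10, hour, minute, tzinfo=timezone.utc)


trips = [(utc(8), utc(9)), (utc(8, 30), utc(9, 30)), (utc(11), utc(11, 30))]
rate = utilization_rate(trips, utc(8), utc(12))
print(rate)  # 0.5
```

Summing raw trip durations here would give 2.5 busy hours out of 4. Merging first gives 2 hours, because the 08:30 to 09:00 overlap is only counted once.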

Risk and Fraud teams test anomaly detection logic. A real prompt: “Given transaction timestamps and amounts, flag sequences where amount increases monotonically over 5 events in 10 minutes.” The catch? You must avoid using external libraries. Solution requires sliding windows and state tracking.

In a hiring committee meeting, the Risk lead rejected a candidate who used pandas.DataFrame.rolling() but didn’t consider memory usage on large user histories. “We process millions of sequences. That approach doesn’t scale,” they said.
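A standard-library sketch of the sliding-window idea follows. It keeps at most five events of state per user, which speaks to the memory concern raised above. Several details are assumptions: events arrive sorted by time for a single user, timestamps are epoch seconds, "monotonically" is read as strictly increasing, and the 10 minutes is measured from the first to the fifth event of the run.

```python
from collections import deque


def flag_escalating_runs(events, run_length=5, window_seconds=600):
    """Return (start_ts, end_ts) for each run of strictly increasing amounts.

    events: iterable of (epoch_seconds, amount), sorted by time, one user.
    A run is flagged when run_length consecutive events have strictly
    increasing amounts and span at most window_seconds.
    """
    run = deque(maxlen=run_length)  # bounded state: last N events of the run
    flagged = []
    for ts, amount in events:
        if run and amount <= run[-1][1]:
            run.clear()             # increasing streak is broken
        run.append((ts, amount))
        if len(run) == run_length and ts - run[0][0] <= window_seconds:
            flagged.append((run[0][0], ts))
    return flagged


events = [
    (0, 10), (60, 20), (120, 30), (180, 40), (240, 50),    # 5 rising in 4 min
    (300, 5),                                              # streak resets
    (400, 6), (2000, 7), (2100, 8), (2200, 9),             # rising, but too slow
]
print(flag_escalating_runs(events))  # [(0, 240)]
```

Because the function consumes an iterable one event at a time, it can run over a stream or a generator without loading a user's full history into memory.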

Not all teams require the same depth.

Not all roles test the same functions.

Not all interviews reward the same style.

Generalists who prepare only for joins fail in Risk. Specialists who ignore time zones fail in Marketplace. The mismatch isn’t skill—it’s context blindness.

Preparation Checklist

Solve 30+ real Uber SQL questions from Glassdoor and LeetCode, focusing on time-series and self-joins
Practice writing Python without autocomplete—use a barebones editor to simulate the interview environment
Build a cheat sheet for window functions: LAG(), LEAD(), ROW_NUMBER(), and when to use each
Run timed mocks with ambiguous prompts—force yourself to ask clarifying questions before coding
Work through a structured preparation system (the PM Interview Playbook covers Uber-specific SQL patterns with real debrief examples from 2024–2025 cycles)
Review Levels.fyi salary data to calibrate expectations and understand level-based bar shifts
Record yourself explaining your code out loud—communication is scored in the final review
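For the window-function cheat sheet, a tiny runnable reference can be more useful than notes. The sketch below uses sqlite3 (window functions need SQLite 3.25 or newer, which recent Python builds include) and a made-up trips table. LAG() looks at the previous row in the partition, LEAD() at the next one, and ROW_NUMBER() assigns a sequence number within the partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (driver_id INTEGER, trip_date TEXT, fare INTEGER);
    INSERT INTO trips VALUES
        (1, '2025-06-01', 10), (1, '2025-06-02', 25), (1, '2025-06-03', 15),
        (2, '2025-06-01', 40);
""")

rows = conn.execute("""
    SELECT
        driver_id,
        trip_date,
        fare,
        LAG(fare)    OVER (PARTITION BY driver_id ORDER BY trip_date) AS prev_fare,
        LEAD(fare)   OVER (PARTITION BY driver_id ORDER BY trip_date) AS next_fare,
        ROW_NUMBER() OVER (PARTITION BY driver_id ORDER BY trip_date) AS trip_seq
    FROM trips
    ORDER BY driver_id, trip_date
""").fetchall()

for row in rows:
    print(row)
# (1, '2025-06-01', 10, None, 25, 1)
# (1, '2025-06-02', 25, 10, 15, 2)
# (1, '2025-06-03', 15, 25, None, 3)
# (2, '2025-06-01', 40, None, None, 1)
```

The None values at partition edges are worth noticing: LAG() and LEAD() return NULL when there is no neighboring row, and that has to be handled before doing arithmetic on the result.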

Mistakes to Avoid

BAD: Writing a SQL query without declaring assumptions. One candidate joined rides and drivers on driver_id but didn’t check if the join was 1:1 or 1:many. The data had duplicates due to role changes. Their counts were off by 200%. They were rejected despite “correct” syntax.

GOOD: Stating assumptions upfront: “I’m assuming driver_id is unique in the drivers table. If not, I’d dedupe using effective_date.” This signals data maturity.
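The stated assumption can also be checked in a few lines. The sketch below uses hypothetical rides and drivers tables in sqlite3, with a role column and an effective_date column assumed for illustration. It compares the naive join count against a join on the latest row per driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rides (ride_id INTEGER, driver_id INTEGER);
    CREATE TABLE drivers (driver_id INTEGER, role TEXT, effective_date TEXT);
    INSERT INTO rides VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO drivers VALUES
        (1, 'courier', '2024-01-01'),
        (1, 'driver',  '2025-01-01'),   -- role change adds a second row
        (2, 'driver',  '2024-06-01');
""")

# Is driver_id actually unique? Compare total rows to distinct keys.
total_rows, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT driver_id) FROM drivers"
).fetchone()

# Naive join: every ride for driver 1 is counted twice.
naive_count = conn.execute("""
    SELECT COUNT(*)
    FROM rides AS r
    JOIN drivers AS d ON d.driver_id = r.driver_id
""").fetchone()[0]

# Dedupe to the most recent row per driver, then join.
deduped_count = conn.execute("""
    WITH latest AS (
        SELECT driver_id, role,
               ROW_NUMBER() OVER (
                   PARTITION BY driver_id ORDER BY effective_date DESC
               ) AS rn
        FROM drivers
    )
    SELECT COUNT(*)
    FROM rides AS r
    JOIN latest AS d ON d.driver_id = r.driver_id AND d.rn = 1
""").fetchone()[0]

print(total_rows, distinct_ids)    # 3 2
print(naive_count, deduped_count)  # 5 3
```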

BAD: Using ORDER BY in a subquery without LIMIT. In PostgreSQL, this is allowed, but the outer query is not guaranteed to preserve the subquery’s ordering, so the sort is wasted work. In Uber’s Redshift cluster, it can cause planner issues. A candidate did this and couldn’t explain why it was problematic. The HC said, “They don’t understand execution order.”

GOOD: Explaining why you’re using a CTE instead of a subquery: “CTEs improve readability and allow intermediate validation—critical when debugging in production.”

BAD: Hardcoding dates like '2025-01-01'. Uber’s data is global and dynamic. One candidate used a fixed date range and was asked what happens when the query runs next month. They had no answer.

GOOD: Using relative dates: “WHERE request_time >= CURRENT_DATE - INTERVAL '7 days'”. This shows awareness of operational use.
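The same idea can be sketched in Python with sqlite3, where the date arithmetic is written as date(x, '-7 days') rather than the PostgreSQL-style INTERVAL syntax. The requests table and the as_of parameter are assumptions for illustration. Passing the reference date as a parameter keeps the query relative while still making it testable against a fixed day.

```python
import sqlite3
from datetime import date


def requests_last_7_days(conn, as_of=None):
    """Count requests from 7 days before as_of through the end of as_of.

    as_of is an ISO date string; it defaults to today's local date.
    """
    as_of = as_of or date.today().isoformat()
    return conn.execute(
        """
        SELECT COUNT(*)
        FROM requests
        WHERE request_time >= date(:as_of, '-7 days')
          AND request_time <  date(:as_of, '+1 day')
        """,
        {"as_of": as_of},
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (request_id INTEGER, request_time TEXT);
    INSERT INTO requests VALUES
        (1, '2025-05-01 08:00:00'),
        (2, '2025-06-01 08:00:00'),
        (3, '2025-06-10 08:00:00');
""")

print(requests_last_7_days(conn, as_of="2025-06-12"))  # 1
print(requests_last_7_days(conn, as_of="2025-06-05"))  # 1
print(requests_last_7_days(conn, as_of="2025-07-20"))  # 0
```

Running the same function a month later needs no code change, which is the answer the interviewer in the BAD example was looking for.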

FAQ

What’s the salary for an Uber data scientist in 2026?

Base salaries range from $131,000 for entry-level to $252,000 for senior roles, per Levels.fyi. The number depends on level (L4–L6), team, and stock refreshers. Total comp can exceed $400,000 at L5+, but coding performance directly impacts leveling. Underperform in the technical round, and you’ll be offered at a lower band.

Do Uber data scientist interviews include live coding?

Yes. All coding rounds are live, proctored, and use real-time editors. You’ll write SQL and Python from scratch. Pre-written code is not allowed. Interviewers assess not just output but approach—how you debug, structure, and communicate. Silence is penalized. One candidate was strong technically but said nothing while coding. The HC noted, “We can’t assess judgment if we don’t hear reasoning.”

How long does the Uber data scientist interview process take?

The process takes 2–4 weeks from recruiter call to decision. It includes a 30-minute recruiter screen, one coding round (60 minutes), one behavioral round (45 minutes), and one domain round (45 minutes). The coding round is the highest attrition point. Candidates who pass move to the hiring committee, where 30% are downleveled or rejected based on code quality and system thinking.
