DiDi data scientist interview questions 2026
TL;DR
DiDi’s data scientist interviews prioritize applied causal inference, ride-pooling simulation logic, and product-aware analytics over generic machine learning. Candidates fail not from technical weakness but from misaligned framing — treating problems as academic when DiDi evaluates product-impact tradeoffs. You will face 4–5 rounds, including a take-home case on supply-demand rebalancing, and a live coding session in Python focused on real-time decision systems.
Who This Is For
This is for experienced data scientists with 2–5 years in marketplace platforms, logistics, or mobility who are targeting mid-to-senior roles at DiDi and have already cleared HR screens. If you’ve worked on dispatch algorithms, dynamic pricing, or fleet utilization but haven’t internalized how DiDi measures incremental lift in driver retention or matched rate, you are not ready.
What kind of technical questions does DiDi ask in data scientist interviews?
DiDi asks technical questions rooted in marketplace dynamics, not textbook statistics. In a Q3 2025 debrief, a candidate correctly calculated p-values for an A/B test but lost points for not adjusting for network effects — a dealbreaker. The problem wasn’t the math; it was the assumption of independence in a two-sided platform where driver behavior affects rider wait times and vice versa.
Not precision, but robustness. DiDi doesn’t care if your confidence interval is exact if you ignore spillover effects between regions. One candidate proposed a clustered bootstrap for city-level geo-experiments — that moved them to hire committee. Another used standard t-tests on pooled ride data and was rejected despite flawless code.
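A city-clustered bootstrap like the one that impressed the committee could look like this minimal sketch (the function name and synthetic-data setup are illustrative, not DiDi's actual tooling): resample whole cities with replacement so within-city correlation survives, rather than treating rides as independent.

```python
import numpy as np

def clustered_bootstrap_ci(values, cities, treated, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap CI for a treatment-control difference in means,
    resampling whole cities (clusters) instead of individual rides
    so within-city correlation is preserved."""
    values = np.asarray(values, dtype=float)
    cities = np.asarray(cities)
    treated = np.asarray(treated, dtype=bool)
    # city-level randomization: each city is entirely treated or control
    t_cities = np.unique(cities[treated])
    c_cities = np.unique(cities[~treated])
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # resample treated and control cities separately, with replacement
        ts = rng.choice(t_cities, size=t_cities.size, replace=True)
        cs = rng.choice(c_cities, size=c_cities.size, replace=True)
        t_vals = np.concatenate([values[cities == c] for c in ts])
        c_vals = np.concatenate([values[cities == c] for c in cs])
        diffs[b] = t_vals.mean() - c_vals.mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

The point isn't the exact resampling scheme; it's showing you know the unit of randomization is the city, so the unit of resampling must be too.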
Expect Python coding on time-series filtering, spatial joins (e.g., matching drivers to zones), and simulating dispatch logic under latency constraints. You’ll write functions to compute ETA bias or rejection cascades, not build Random Forests. In a live coding round, one candidate was asked to simulate a surge trigger system with hysteresis — that’s the bar.
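A surge trigger with hysteresis is simpler than it sounds; the core idea is two thresholds, not one. This is a toy sketch (thresholds and function name are made up for illustration), but it captures what the interviewer is probing: the flag must not flap when demand/supply hovers near a single cutoff.

```python
def surge_states(ratios, on_at=1.5, off_at=1.2):
    """Flag surge with hysteresis: activate when the demand/supply
    ratio rises above `on_at`, deactivate only once it falls below
    `off_at`. The gap between the two thresholds prevents the flag
    from flapping on noisy ratios near a single cutoff."""
    active = False
    states = []
    for r in ratios:
        if not active and r > on_at:
            active = True
        elif active and r < off_at:
            active = False
        states.append(active)
    return states
```

In the live round, expect follow-ups on how you'd pick `on_at`/`off_at` and how long a surge should persist before re-evaluation.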
The framework isn’t CRISP-DM; it’s “measure → model → intervene → monitor.” Your answer must link analysis to operational impact. Saying “we should run an A/B test” is table stakes. Saying “we should hold back 5% of drivers in high-defection zones to measure counterfactual retention” shows judgment.
How does DiDi evaluate product sense in data scientist interviews?
Product sense at DiDi means quantifying tradeoffs in real-time systems, not listing features. In a 2024 HC debate, a candidate described improving ETA accuracy by 10% but didn’t address how that affected driver acceptance rates — the committee killed the offer. The silent rule: every metric improvement must be weighed against driver utility.
Not features, but friction. DiDi PMs and DSs co-own matched rate, not NPS. One candidate was given a scenario: “Riders complain about long pickup times. What would you measure?” The strong answer started with “Define the bottleneck: is it driver supply, dispatch inefficiency, or rider drop-off after booking?” The weak answer jumped to “build a better ETA model.”
You’ll be asked to design metrics for new products like shared cargo delivery or inter-city pooling. A 2025 case involved balancing driver earnings per hour against rider cancellation rates in a new subscription tier. The winning candidate built a multi-objective function with hard constraints on minimum driver income — that’s the expectation.
DiDi doesn’t want insights; it wants guardrails. When asked about reducing no-shows, one candidate proposed a penalty system. Another proposed modeling driver location drift during trip acceptance delay. The second got hired — because they treated drivers as constrained agents, not data points.
What’s on the DiDi data scientist take-home assignment?
The take-home is a 72-hour case on rebalancing idle drivers across urban zones, based on historical trip data and weather disruptions. It includes GPS traces, driver session logs, and service-level metrics. You must write SQL to extract patterns, build a simple simulation, and recommend a policy — all documented in a 2-page memo.
Not completeness, but clarity. Hiring managers scan for three things: whether you identified the right KPI (e.g., minutes-to-match, not utilization), how you handled missing session ends (drivers going offline without logging out), and if your policy scales under peak load. One candidate imputed session ends using kernel density — praised in debrief. Another assumed all missing ends were 8-hour shifts — failed.
Code must be production-adjacent. No Jupyter cells with unvectorized loops. DiDi expects PEP8-compliant scripts with error handling for null coordinates. In Q2 2025, a candidate used pandas .apply() for Haversine distance — rejected. Another used vectorized NumPy with pre-filtering for city boundaries — advanced to onsite.
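The vectorized-versus-`.apply()` distinction is concrete. A sketch of what "vectorized NumPy with pre-filtering" might look like (the bounding-box values and helper names here are hypothetical):

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Vectorized great-circle distance in km; inputs in degrees,
    as scalars or NumPy arrays (broadcasting applies)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def in_bbox(lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Cheap pre-filter: drop points outside a city bounding box before
    computing exact distances. NaN (null) coordinates fail every
    comparison, so they are filtered out here too."""
    return (lat >= lat_min) & (lat <= lat_max) & (lon >= lon_min) & (lon <= lon_max)
```

Row-wise `.apply()` of the same formula does identical math thousands of times slower; the pre-filter also doubles as null-coordinate handling, which graders explicitly look for.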
The memo is more important than the model. One candidate submitted a complex RL agent but buried the business impact in Appendix C — hiring manager said “didn’t read it.” The hire wrote: “Policy X reduces average wait time by 1.2 minutes but increases driver idle time by 8%. We recommend pilot in Zone 7 due to low baseline churn.” That’s the standard.
How important is coding in the DiDi data scientist interview?
Coding is evaluated on operational realism, not algorithmic complexity. You will write Python functions to process streaming ride events under memory and latency constraints, not solve Leetcode. In a 2024 round, a candidate wrote a perfectly correct KD-tree for nearest-driver search — rejected because DiDi uses grid hashing in production.
Not correctness, but context. DiDi’s backend runs on C++ and PySpark, but interviews use Python for accessibility. You must know when to use vectorization, when to chunk data, and how to handle out-of-order events. One candidate used sorted() on a 10M-row driver log — rejected. Another used heapq.nsmallest with a rolling time window — moved forward.
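The `heapq.nsmallest` pattern that moved that candidate forward is worth internalizing: you rarely need a full sort of a 10M-row log to answer a top-k question. A minimal sketch (event shape and function name are assumptions for illustration):

```python
import heapq

def top_k_in_window(events, now, window_s, k):
    """From a stream of (timestamp, eta) events, keep only those inside
    the rolling window [now - window_s, now] and return the k lowest
    ETAs. heapq.nsmallest is O(n log k) versus O(n log n) for a full
    sort, which matters on multi-million-row driver logs."""
    recent = [(ts, eta) for ts, eta in events if now - window_s <= ts <= now]
    return heapq.nsmallest(k, recent, key=lambda e: e[1])
```

The window filter also handles out-of-order events gracefully: late arrivals with in-window timestamps are still counted, and stale ones are dropped.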
Expect to implement core logic from DiDi’s systems: surge multipliers, cancel-scorers, or re-dispatch handlers. In a live session, a candidate was asked to write a function that flags “ghost trips” — bookings with no subsequent movement. The strong answer used velocity thresholds and GPS error ellipses. The weak answer used distance between pickup and first ping — too naive.
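A ghost-trip flagger along the lines of the strong answer might look like this sketch (thresholds, coordinate convention, and function name are all illustrative assumptions, not DiDi's production values):

```python
import math

def is_ghost_trip(pings, min_displacement_m=150.0, gps_error_m=30.0,
                  min_speed_mps=0.5):
    """Flag a booking as a 'ghost trip' when GPS pings show no real
    movement: total displacement stays within a noise floor AND no
    segment speed clears a walking-pace threshold after discounting
    per-ping GPS error. `pings` is a list of (t_seconds, x_m, y_m)
    in a local projected coordinate frame."""
    if len(pings) < 2:
        return True  # no movement evidence at all
    _, x0, y0 = pings[0]
    _, xn, yn = pings[-1]
    if math.hypot(xn - x0, yn - y0) > min_displacement_m:
        return False
    for (t1, x1, y1), (t2, x2, y2) in zip(pings, pings[1:]):
        dt = t2 - t1
        if dt <= 0:
            continue  # out-of-order or duplicate timestamp: skip segment
        # subtract worst-case GPS error on both endpoints before judging speed
        dist = max(math.hypot(x2 - x1, y2 - y1) - 2 * gps_error_m, 0.0)
        if dist / dt > min_speed_mps:
            return False
    return True
```

Note what makes this "strong" by the article's standard: the error ellipse is modeled as a per-ping distance discount, so GPS jitter alone can't clear the speed threshold, which is exactly where the naive pickup-to-first-ping answer fails.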
You will not be asked to build neural nets. You will be asked to simulate how a change in dispatch radius affects driver queuing. Code must include edge cases: what if a driver crosses zone boundaries mid-match? What if GPS signal drops for 30 seconds? These aren’t follow-ups — they’re part of the initial spec.
How does the onsite interview loop work at DiDi?
The onsite consists of 4 rounds over 5 hours: (1) technical deep dive on past projects, (2) live coding on ride data, (3) product case with a PM, and (4) values alignment with a senior leader. Each round is 60 minutes, with 15-minute breaks. Interviews start at 9:30 AM Beijing time — punctuality is silently scored.
Not depth, but consistency. In a 2025 debrief, a candidate gave a strong technical answer but contradicted their take-home memo on driver churn assumptions — red flag. The committee assumes either poor documentation or weak conviction. Another candidate referred back to their memo’s assumptions unprompted — praised for operational rigor.
The values round isn’t cultural fit — it’s decision-making under ambiguity. One candidate was asked: “Your model shows a new feature increases rider satisfaction but decreases driver earnings. What do you do?” The answer “I’d recommend launching with driver subsidies” passed. “I’d let the PM decide” failed — DSs at DiDi own tradeoffs.

Hiring managers review your calendar invites and note if you requested English or Chinese interviews. Bilingual fluency isn’t required, but opting into Chinese signals commitment. In one case, a candidate scheduled all interviews in English despite listing Mandarin as fluent — hiring manager noted “lack of immersion” in feedback.
Preparation Checklist
- Redo 2–3 past projects with a focus on marketplace metrics: matched rate, CPS (completed trips per session), driver idle time
- Practice writing SQL queries that handle sessionization with gaps (e.g., drivers going offline mid-shift)
- Simulate dispatch logic in Python: find nearest available driver, apply surge rules, handle cancellations
- Study DiDi’s public tech blog — especially posts on ETA optimization and dynamic pricing from 2023–2025
- Work through a structured preparation system (the PM Interview Playbook covers DiDi-specific tradeoff frameworks with real debrief examples)
- Build a small simulation of driver rebalancing using real OSM data and synthetic trip logs
- Prepare 2–3 stories where you influenced product decisions using data, focusing on driver-side impact
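The dispatch-simulation checklist item (nearest available driver, surge rules, cancellations) can be practiced with a toy like the following. Everything here is a hypothetical sketch: the fare rule, radius, and function names are placeholders, and distance uses an equirectangular approximation that is adequate at city scale.

```python
import math

def dispatch(request, drivers, demand_supply_ratio=1.0, max_radius_km=3.0,
             base_fare=10.0, surge_at=1.5, surge_multiplier=1.3):
    """Match a request to the nearest available driver inside the
    dispatch radius and apply a simple surge rule to the quoted fare.
    `request` = (lat, lon); `drivers` = iterable of (id, lat, lon, available).
    Returns (driver_id, fare), or (None, None) when no driver is in range."""
    req_lat, req_lon = request
    best_id, best_dist = None, float("inf")
    for driver_id, lat, lon, available in drivers:
        if not available:
            continue
        # equirectangular approximation: fine over a few km
        dx = math.radians(lon - req_lon) * math.cos(math.radians(req_lat))
        dy = math.radians(lat - req_lat)
        dist_km = 6371.0 * math.hypot(dx, dy)
        if dist_km <= max_radius_km and dist_km < best_dist:
            best_id, best_dist = driver_id, dist_km
    if best_id is None:
        return None, None
    fare = base_fare * (surge_multiplier if demand_supply_ratio > surge_at else 1.0)
    return best_id, fare

def redispatch_after_cancel(request, drivers, cancelled_id, **kwargs):
    """On cancellation, retry the match excluding the cancelled driver."""
    remaining = [d for d in drivers if d[0] != cancelled_id]
    return dispatch(request, remaining, **kwargs)
```

Once this works, extend it with the edge cases the article flags as part of the initial spec: drivers crossing zone boundaries mid-match and GPS dropouts.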
Mistakes to Avoid
- BAD: Building a machine learning model in the take-home when the problem only required descriptive analytics. One candidate trained an LSTM on pickup times — hiring manager wrote “overkill, missed the point.”
- GOOD: Using rolling averages and spatial heatmaps to identify chronic supply shortages, then proposing targeted driver bonuses — that’s what DiDi ships.
- BAD: Answering “How would you improve matching?” with “Better algorithms.” Vague and table stakes.
- GOOD: “Reduce dispatch radius in high-density zones during peak to cut ETA, but monitor driver acceptance rate — if it drops >5%, revert.” Shows tradeoff thinking.
- BAD: Saying “We should A/B test everything” without specifying holdback strategy or spillover controls.
- GOOD: “Randomize at city-cluster level, monitor spillover via border zone wait times, and use CUPED with driver session length as covariate.” That’s DiDi-grade rigor.
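CUPED itself is a few lines once you see it. A minimal sketch, assuming the pre-experiment covariate (here, prior driver session length) is available for every unit:

```python
import numpy as np

def cuped_adjust(metric, covariate):
    """CUPED variance reduction: adjust the experiment metric using a
    pre-experiment covariate (e.g., prior driver session length).
    theta is the OLS slope of metric on covariate; the adjusted metric
    keeps the same mean but has lower variance whenever the two are
    correlated, tightening the experiment's confidence intervals."""
    metric = np.asarray(metric, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    theta = np.cov(metric, covariate)[0, 1] / np.var(covariate, ddof=1)
    return metric - theta * (covariate - covariate.mean())
```

Being able to state why the mean is unchanged (the covariate is centered, and it pre-dates treatment, so it is independent of assignment) is the kind of one-sentence justification that reads as "DiDi-grade rigor."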
FAQ
What is the salary range for a data scientist at DiDi in 2026?
Senior data scientists in Beijing earn 600,000–850,000 RMB annually, including cash and restricted stock. Level matters: L6 starts at 600K, L7 at 750K. Offers below 600K are typically for L5 or non-Beijing roles. Signing bonuses exist but are rare — compensation is frontloaded in base and stock.
How long does the DiDi data scientist interview process take?
From resume submission to offer: 18–25 days. HR screen (2 days), take-home (3 days to complete, 4 days to grade), onsite scheduling (3–5 days), onsite (1 day), HC decision (3–7 days). Delays usually happen in calendar coordination or stock approval, not evaluation.
Do DiDi data scientists need to know deep learning?
No. Deep learning is used in specific teams (e.g., autonomous driving, NLP for support), but core marketplace roles rely on causal inference, simulation, and statistical modeling. Knowing how to train a transformer won’t help. Knowing how to estimate LATE (Local Average Treatment Effect) in a non-compliant A/B test will.
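The LATE point is concrete: under non-compliance, the Wald/IV estimator divides the intention-to-treat effect by the difference in take-up rates. A minimal sketch (function name and inputs are illustrative):

```python
def wald_late(outcome_t_mean, outcome_c_mean, takeup_t, takeup_c):
    """Wald/IV estimate of the Local Average Treatment Effect under
    non-compliance: the intention-to-treat effect (difference in mean
    outcomes by assignment) divided by the first stage (difference in
    actual treatment take-up rates between arms)."""
    itt = outcome_t_mean - outcome_c_mean
    compliance = takeup_t - takeup_c
    if compliance == 0:
        raise ValueError("no first stage: take-up identical across arms")
    return itt / compliance
```

For example, a 2-point ITT lift with 80% take-up in treatment and 30% contamination in control implies a LATE of 4 points for compliers, double the naive ITT.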
Ready to build a real interview prep system?
Get the full PM Interview Prep System →
The book is also available on Amazon Kindle.