Mobility Resilience Under Extreme Weather
Research Question
How does extreme precipitation affect rideshare demand across New York City neighborhoods, and do lower-income communities experience differential recovery trajectories?
Approach
The analysis uses a difference-in-differences (DiD) causal framework that treats extreme precipitation events as a natural experiment. The model estimates treatment effects with zone and time fixed effects, comparing trip volumes on weather-shock days against otherwise similar non-shock days within each taxi zone.
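As an illustration of the fixed-effects idea, the sketch below estimates a treatment effect on a synthetic, balanced zone-by-day panel by removing zone and day means before regressing demand on the treatment indicator. The function name `twfe_effect`, the panel dimensions, and the simulated effect size are all assumptions made for this example; it is not the project's estimation code and it omits standard errors, which in practice would likely come from a Statsmodels regression with clustered errors.

```python
import numpy as np


def twfe_effect(y, treated):
    """Two-way fixed-effects point estimate on a balanced zone x day panel.

    y, treated: arrays of shape (n_zones, n_days).
    Hypothetical helper for illustration only.
    """
    def demean(a):
        # Subtract zone means and day means, then add back the grand mean.
        # On a balanced panel this is equivalent to including zone and day dummies.
        return (a
                - a.mean(axis=1, keepdims=True)
                - a.mean(axis=0, keepdims=True)
                + a.mean())

    y_t = demean(np.asarray(y, dtype=float))
    d_t = demean(np.asarray(treated, dtype=float))
    # Univariate OLS slope on the demeaned variables.
    return float((d_t * y_t).sum() / (d_t ** 2).sum())


# Synthetic data (assumed values, not taken from the TLC records).
rng = np.random.default_rng(0)
n_zones, n_days = 50, 120
zone_fe = rng.normal(200, 50, size=(n_zones, 1))   # zone-specific baseline demand
day_fe = rng.normal(0, 20, size=(1, n_days))       # citywide daily shocks
treated = rng.random((n_zones, n_days)) < 0.1      # zone-days flagged as weather shocks
true_effect = 5.0
noise = rng.normal(0, 1, size=(n_zones, n_days))
y = zone_fe + day_fe + true_effect * treated + noise

beta = twfe_effect(y, treated)
print(f"estimated effect: {beta:.2f} (simulated truth: {true_effect})")
```

The demeaning shortcut only matches the dummy-variable regression when every zone appears on every day; an unbalanced panel would need explicit fixed-effect terms instead.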
Data Pipeline
- 100M+ NYC TLC HVFHV trip records (Parquet, Polars) across 262 NYC taxi zones
- NOAA hourly precipitation observations merged at the zone-event level
- Automated pipeline: raw loading → datetime normalization → weather merge → feature engineering
- Calendar features (hour, day-of-week, month) and lagged demand (1-7 day lags) as controls
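The feature-engineering step can be sketched roughly as follows. This is a minimal pandas example at daily granularity (so the hour feature is left out), and the column names `zone_id`, `date`, and `trips` plus the helper `add_features` are hypothetical; the actual pipeline is described as Polars-based and may be organized differently.

```python
import pandas as pd


def add_features(daily: pd.DataFrame, lags=range(1, 8)) -> pd.DataFrame:
    """Add calendar features and lagged demand to a zone-day table.

    Assumes one row per zone per day with no missing days, and
    hypothetical columns: zone_id, date, trips.
    """
    out = daily.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values(["zone_id", "date"]).reset_index(drop=True)

    # Calendar controls.
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday = 0
    out["month"] = out["date"].dt.month

    # Lagged demand, shifted within each zone so values never leak across zones.
    for k in lags:
        out[f"trips_lag_{k}"] = out.groupby("zone_id")["trips"].shift(k)
    return out


# Tiny synthetic example: two zones, ten days each.
dates = pd.date_range("2023-01-01", periods=10, freq="D")
daily = pd.DataFrame({
    "zone_id": [1] * 10 + [2] * 10,
    "date": list(dates) * 2,
    "trips": list(range(100, 110)) + list(range(200, 210)),
})
features = add_features(daily)
print(features[["zone_id", "date", "trips", "trips_lag_1", "trips_lag_7"]].head(10))
```

The first rows of each zone have missing lag values by construction; whether to drop or impute them is a modeling choice the sketch leaves open.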
Key Results
- Causal effect: Statistically significant increase of 97 trips per zone per day on extreme precipitation days (p < 0.0001)
- Demand forecasting: XGBoost/LightGBM models achieved R² = 0.96 with time-series cross-validation
- Top predictors: Lagged demand (1-day and 7-day) and precipitation intensity identified via SHAP feature importance
- Equity analysis: Zones stratified by neighborhood income quintile, with Mann-Whitney U tests for differences in recovery after weather shocks
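To make the equity comparison concrete, the sketch below runs a two-sided Mann-Whitney U test with SciPy on two synthetic samples standing in for the lowest and highest income quintiles. The recovery-time metric (days until demand returns to baseline), the sample sizes, and the simulated distributions are assumptions for illustration only and say nothing about the project's actual findings.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Synthetic recovery times in days (hypothetical metric and values).
low_income = rng.gamma(shape=2.0, scale=1.5, size=60)    # lowest income quintile
high_income = rng.gamma(shape=2.0, scale=1.0, size=60)   # highest income quintile

# Rank-based test: makes no normality assumption about recovery times.
result = mannwhitneyu(low_income, high_income, alternative="two-sided")
print(f"U = {result.statistic:.1f}, p = {result.pvalue:.4f}")
```

A rank-based test suits this kind of comparison because recovery times tend to be skewed, though it tests for a shift in distribution rather than a difference in means.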
Tech Stack
Python, Polars, Pandas, GeoPandas, NumPy, SciPy, Statsmodels, Scikit-learn, XGBoost, LightGBM, SHAP, Matplotlib, NOAA API, NYC TLC data. Build tooling via uv + pyproject.toml.