Research Question

How does extreme precipitation affect rideshare demand across New York City neighborhoods, and do lower-income communities experience differential recovery trajectories?

Approach

The analysis uses a difference-in-differences (DiD) framework that treats extreme precipitation events as a natural experiment. The model estimates treatment effects with zone and time fixed effects, comparing trip volumes on weather-shock days against otherwise similar non-shock days within each taxi zone.
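
A minimal sketch of a fixed-effects regression in this spirit, run on synthetic data, may help make the setup concrete. Everything here is an assumption for illustration: the column names (zone, day, dow, shock, trips), the use of day-of-week as the time fixed effect, the zone-clustered standard errors, and the simulated effect size, which is not the project's estimate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_zones, n_days = 20, 120
TRUE_EFFECT = 50.0  # synthetic effect built into the fake data, not a reported result

# Pick 12 synthetic "extreme precipitation" days shared by all zones.
shock_days = rng.choice(n_days, size=12, replace=False)
is_shock_day = np.isin(np.arange(n_days), shock_days)

# Hypothetical zone baselines and day-of-week demand pattern.
zone_base = rng.normal(500, 100, n_zones)
dow_effect = np.array([0, 5, 10, 15, 40, 60, 20], dtype=float)

# Balanced zone-by-day panel.
panel = pd.DataFrame({
    "zone": np.repeat(np.arange(n_zones), n_days),
    "day": np.tile(np.arange(n_days), n_zones),
})
panel["dow"] = panel["day"] % 7
panel["shock"] = is_shock_day[panel["day"].to_numpy()].astype(int)
panel["trips"] = (
    zone_base[panel["zone"].to_numpy()]
    + dow_effect[panel["dow"].to_numpy()]
    + TRUE_EFFECT * panel["shock"]
    + rng.normal(0, 20, len(panel))
)

# Zone and day-of-week fixed effects enter as categorical dummies;
# standard errors are clustered by zone.
model = smf.ols("trips ~ shock + C(zone) + C(dow)", data=panel)
result = model.fit(cov_type="cluster", cov_kwds={"groups": panel["zone"]})

estimated_effect = result.params["shock"]
print(f"estimated shock effect: {estimated_effect:.1f} trips/zone/day")
```

Note that in this toy setup the shock hits every zone on the same days, so a full set of per-day dummies would absorb the treatment entirely. That is why the sketch uses a coarser time effect; a design with zone-level variation in precipitation could use finer time fixed effects.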

Data Pipeline

  • 100M+ NYC TLC HVFHV trip records (Parquet, Polars) across 262 NYC taxi zones
  • NOAA hourly precipitation observations merged at the zone-event level
  • Automated pipeline: raw loading → datetime normalization → weather merge → feature engineering
  • Calendar features (hour, day-of-week, month) and lagged demand (1-7 day lags) as controls
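
The feature-engineering step could look roughly like the following pandas sketch. The function name add_calendar_and_lags and the columns zone, date, and trips are hypothetical, and the toy frame is at daily grain, so the hour feature is omitted. Lags are computed per zone so that one zone's history never leaks into another's.

```python
import pandas as pd

def add_calendar_and_lags(daily, lags=range(1, 8)):
    """Add calendar features and per-zone lagged demand to a daily zone panel.

    Assumes one row per (zone, date) with no missing days, so a shift of
    k rows within a zone corresponds to a lag of k days.
    """
    out = daily.sort_values(["zone", "date"]).copy()
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday = 0
    out["month"] = out["date"].dt.month
    for k in lags:
        # Shift within each zone; the first k rows of every zone become NaN.
        out[f"trips_lag_{k}"] = out.groupby("zone")["trips"].shift(k)
    return out

# Tiny synthetic panel: two zones, ten consecutive days each.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
daily = pd.DataFrame({
    "zone": [1] * 10 + [2] * 10,
    "date": list(dates) * 2,
    "trips": list(range(100, 110)) + list(range(200, 210)),
})

features = add_calendar_and_lags(daily)
print(features[["zone", "date", "trips", "trips_lag_1", "trips_lag_7"]].head(9))
```

If the real panel has gaps, reindexing each zone to a complete date range before shifting would keep the lags aligned to calendar days.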

Key Results

  • Causal effect: A statistically significant increase of 97 trips per zone per day on extreme-precipitation days (p < 0.0001)
  • Demand forecasting: XGBoost/LightGBM models achieved R² = 0.96 under time-series cross-validation
  • Top predictors: Lagged demand (1-day and 7-day) and precipitation intensity, as ranked by SHAP feature importance
  • Equity analysis: Zones stratified by neighborhood income quintile, with Mann-Whitney U tests for differences in recovery after weather shocks
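
As an illustration of the equity comparison, the sketch below runs a Mann-Whitney U test on a hypothetical recovery metric (days until demand returns to baseline) for the lowest and highest income quintiles. The metric definition, sample sizes, and distributions are assumptions, and the data are synthetic with a difference built in, so the output says nothing about the project's actual findings.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Synthetic recovery times in days, one value per zone-event (hypothetical).
low_income_recovery = rng.gamma(shape=4.0, scale=1.0, size=40)
high_income_recovery = rng.gamma(shape=2.0, scale=1.0, size=40)

# Two-sided test: do the two groups differ in their recovery distribution?
stat, p_value = mannwhitneyu(
    low_income_recovery, high_income_recovery, alternative="two-sided"
)

# Common-language effect size: estimated probability that a randomly drawn
# low-income observation exceeds a randomly drawn high-income one.
effect_size = stat / (len(low_income_recovery) * len(high_income_recovery))

print(f"U = {stat:.0f}, p = {p_value:.4g}, P(low > high) = {effect_size:.2f}")
```

A rank-based test fits here because recovery times are typically skewed and bounded below by zero, so a t-test's normality assumption would be hard to defend.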

Tech Stack

Python, Polars, Pandas, GeoPandas, NumPy, SciPy, Statsmodels, Scikit-learn, XGBoost, LightGBM, SHAP, Matplotlib, NOAA API, NYC TLC data. Build tooling via uv + pyproject.toml.