
- Parallelised ETL over 11M+ MTA entry/exit records with Spark SQL and DataFrames.
- Engineered temporal and weather features; trained Spark ML regressors (gradient-boosted trees, random forest).
- Achieved R² = 0.62 for 30-minute-ahead crowd prediction across all stations.
- Enabled real‑time dashboards to surface hotspots for commuters.
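
For readers unfamiliar with the metric, R² is the coefficient of determination, 1 − SS_res / SS_tot. The sketch below shows that calculation in plain Python rather than Spark. The function name `r_squared` and the sample counts are hypothetical, chosen only for illustration; they are not the project's code or data.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical 30-minute crowd counts for one station (illustrative only).
actual = [120.0, 150.0, 90.0, 200.0]
predicted = [110.0, 160.0, 100.0, 180.0]
score = r_squared(actual, predicted)
```

A score of 1.0 means perfect predictions and 0.0 means no better than always predicting the mean, so 0.62 indicates the model explains roughly 62% of the variance in the observed counts.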