The problem: You're sending the same email to 50,000 people. Some of them have been season ticket holders for a decade. Some signed up last week for a discount. Treating them the same leaves engagement, and revenue, on the table.

What I built: A full ML segmentation pipeline for a sports club, starting from raw fan data — demographics, purchase history, ticketing behaviour — through PySpark-based data cleaning and transformation, into a clustering model that actually separated fans into meaningful groups.

Not "high value vs low value." Proper behavioural segments you can run targeted campaigns against.

Results:

  • 35% increase in fan engagement

  • 5TB+ of daily merchandise and ticketing data processed reliably

  • Marketing team went from batch blasts to targeted, segment-level outreach

The model itself isn't magic. It's just applied k-means with sensible feature engineering. The hard part was the pipeline: getting messy, multi-source fan data into a shape the model could actually learn from. That's where most of these projects die.
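To make "k-means with sensible feature engineering" concrete, here is a minimal sketch of the idea. It is not the production pipeline: it uses plain Python instead of PySpark so it runs anywhere, and the sample events, the recency/frequency/spend features, and the helper names (`build_features`, `standardize`, `kmeans`) are all illustrative assumptions rather than the real schema.

```python
import math
import random
from collections import defaultdict

# Hypothetical raw events: (fan_id, days_since_purchase, amount_spent).
purchases = [
    ("fan_a", 5, 120.0), ("fan_a", 40, 95.0), ("fan_a", 70, 110.0),
    ("fan_b", 3, 130.0), ("fan_b", 35, 100.0), ("fan_b", 60, 90.0),
    ("fan_c", 200, 15.0),
    ("fan_d", 220, 20.0),
]

def build_features(events):
    """Aggregate raw events into per-fan [recency, frequency, total_spend]."""
    per_fan = defaultdict(list)
    for fan_id, days_ago, amount in events:
        per_fan[fan_id].append((days_ago, amount))
    return {
        fan: [min(d for d, _ in rows), len(rows), sum(a for _, a in rows)]
        for fan, rows in per_fan.items()
    }

def standardize(vectors):
    """Scale each column to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*vectors))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # guard constant columns
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, then recompute."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

features = build_features(purchases)
fan_ids = sorted(features)
labels = kmeans(standardize([features[f] for f in fan_ids]), k=2)
segments = dict(zip(fan_ids, labels))
```

On this toy data the frequent, high-spend fans land in one cluster and the one-off discount buyers in the other. The scaling step matters more than it looks: without it, total spend would swamp purchase count in the distance calculation. In a Spark pipeline the same steps would likely map onto `pyspark.ml.feature.StandardScaler` and `pyspark.ml.clustering.KMeans`, though the exact stages depend on the data.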
