The problem: You're sending the same email to 50,000 people. Some of them have been season ticket holders for a decade. Some signed up last week for a discount. Treating them the same leaves engagement, and revenue, on the table.
What I built: A full ML segmentation pipeline for a sports club. It starts from raw fan data (demographics, purchase history, ticketing behaviour), runs it through PySpark-based cleaning and transformation, and feeds a clustering model that actually separated fans into meaningful groups.
Not "high value vs low value." Proper behavioural segments you can run targeted campaigns against.
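To make the cleaning and transformation step concrete, here is a minimal sketch of turning raw, multi-source records into one feature row per fan. It uses plain Python rather than PySpark so it runs anywhere, and it is not the project's actual code: the field names (`fan_id`, `match_date`, `price`, `amount`), the sample rows, and the rule that a repeated (fan, match) pair is a duplicate are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records from two sources; all field names are illustrative.
ticket_rows = [
    {"fan_id": "F1", "match_date": "2024-03-02", "price": "45.00"},
    {"fan_id": "F1", "match_date": "2024-03-02", "price": "45.00"},  # duplicate row
    {"fan_id": "F2", "match_date": "2024-04-20", "price": ""},       # missing price
    {"fan_id": "F1", "match_date": "2024-05-11", "price": "50.00"},
]
merch_rows = [
    {"fan_id": "F1", "amount": "30.00"},
    {"fan_id": "F3", "amount": "12.50"},
]


def build_features(tickets, merch, as_of):
    """Collapse raw ticket and merchandise rows into one feature dict per fan."""
    feats = defaultdict(lambda: {
        "matches": 0,
        "ticket_spend": 0.0,
        "merch_spend": 0.0,
        "days_since_last_match": None,  # stays None for fans with no tickets
    })
    seen = set()
    for row in tickets:
        # Assumption: the same fan and match appearing twice is a duplicate.
        key = (row["fan_id"], row["match_date"])
        if key in seen:
            continue
        seen.add(key)
        f = feats[row["fan_id"]]
        f["matches"] += 1
        f["ticket_spend"] += float(row["price"] or 0)  # blank price counts as 0
        days = (as_of - date.fromisoformat(row["match_date"])).days
        if f["days_since_last_match"] is None or days < f["days_since_last_match"]:
            f["days_since_last_match"] = days
    for row in merch:
        feats[row["fan_id"]]["merch_spend"] += float(row["amount"] or 0)
    return dict(feats)


features = build_features(ticket_rows, merch_rows, as_of=date(2024, 6, 1))
```

The output is behavioural features (frequency, spend, recency) rather than a single "value" score, which is what lets a clustering model find more than a high/low split.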
Results:
35% increase in fan engagement
5TB+ of daily merchandise and ticketing data processed reliably
Marketing team went from batch blasting to actually targeted outreach
The model itself isn't magic — it's just applied k-means with sensible feature engineering. The hard part was the pipeline. Getting messy, multi-source fan data into a shape the model could actually learn from. That's where most of these projects die.
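For a sense of how little code the modelling side needs, here is a minimal k-means sketch in plain Python. It is an illustration under stated assumptions, not the production model: the three features and the six sample fans are made up, the features are z-scored before clustering, and the centroids are seeded with a deterministic farthest-first pick, a simple stand-in for the randomised initialisation a library implementation would typically use.

```python
import math


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical fans: [matches_per_season, total_spend, days_since_last_purchase]
fans = [
    [18, 950.0, 7],
    [20, 1100.0, 3],
    [17, 870.0, 10],
    [1, 40.0, 200],
    [2, 65.0, 150],
    [1, 25.0, 240],
]
labels, centroids = kmeans(standardize(fans), k=2)
```

On this toy data the three frequent attendees land in one cluster and the three occasional buyers in the other. Real fan data is nowhere near this clean, which is why the feature work upstream matters more than the clustering call itself.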