AI and Analytics for Business


Matt Horton of MLB.com Revisits ‘Successful Applications of Customer Analytics’

Matt Horton (MLB.com) and Diny Hurwitz (Milwaukee Brewers) co-presented “What Are the Best Indicators of a Baseball Season Seat Holder Renewing Their Season Seats?” You can re-watch the presentation below.

Matt Horton Covers Additional Questions Left in Queue from the Conference

How did W/L record compared to prior years correlate for new season ticket holders (under ~2 years)? How did you account for other external factors like weather/schedule for cross-season data?

There were quite a few questions about team performance not explicitly being in the model. It’s important to keep in mind that this model isn’t trying to predict the renewal rate; it’s prioritizing customers by their likelihood to renew their season seats. If the previous season’s winning percentage were put into the model, it would be the same value for everyone, so it could not help separate one customer from another and would not be significant in the model.

While winning percentage isn’t explicitly in the model, number of games attended is. This leads us to the conclusion that fans who purchased smaller packages, and fans who purchased large packages and only attended a few games, were less likely to renew their season seats. I believe indirectly this is taking the team’s field performance into account.
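The point about a season-wide winning percentage can be illustrated with a small sketch. Everything in it is hypothetical: the account data, the weights, and the `renewal_score` function are made up for illustration and are not the Brewers model. It only shows that a value shared by every customer moves all scores together and leaves the priority order unchanged.

```python
import math

def renewal_score(games_attended, package_size, win_pct, w_win=2.0):
    """Hypothetical logistic score; all weights are invented for illustration."""
    log_odds = -2.0 + 0.05 * games_attended + 0.01 * package_size + w_win * win_pct
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical accounts: (account id, games attended, package size in games)
accounts = [("A", 60, 81), ("B", 5, 81), ("C", 18, 20), ("D", 9, 10)]

def ranking(win_pct):
    """Order accounts from most to least likely to renew for a given team win %."""
    scored = [(renewal_score(g, p, win_pct), acct) for acct, g, p in accounts]
    return [acct for _, acct in sorted(scored, reverse=True)]

# The same win % applies to every account, so changing it shifts every
# score in the same direction and the ordering stays the same.
ranking_losing_year = ranking(0.420)
ranking_winning_year = ranking(0.590)
print(ranking_losing_year == ranking_winning_year)
```

The absolute scores rise in the better season, but a ranking built to prioritize outreach is identical in both cases.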

Why did you choose SAS for the analysis?

I am most familiar with SAS so it was used for this analysis. This analysis could also have been done in a multitude of other software packages (R, etc.).

Can this model be used for other forms of sport as well? Or are there a lot of customizations you have built in for baseball?

This model was developed specifically for the Brewers. There are differences between clubs and I wouldn’t recommend applying one club’s model to another club.

However, the methodology used to develop this model could be applied to other clubs or even different industries.

How do you include survey data with low response rates?

This is a great question! As with most surveys, we only had responses for about half of the population. For season seat holders who did not answer the survey, we replaced their missing values with a “neutral” answer.
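A minimal sketch of the “neutral” replacement described above, assuming a hypothetical 1-to-5 survey scale whose midpoint, 3, serves as the neutral answer. The actual scale and neutral value were not stated. The `responded` flag is an extra added here for illustration, not something the answer mentions.

```python
NEUTRAL = 3  # assumed midpoint of a hypothetical 1-5 scale

def impute_neutral(responses, neutral=NEUTRAL):
    """Replace missing answers (None) with the neutral value.

    Returns the filled-in answers and a 0/1 flag per holder recording
    whether a real response existed.
    """
    values = [neutral if r is None else r for r in responses]
    responded = [0 if r is None else 1 for r in responses]
    return values, responded

# Hypothetical answers from four season seat holders; None = no response.
values, responded = impute_neutral([5, None, 2, None])
print(values)     # [5, 3, 2, 3]
print(responded)  # [1, 0, 1, 0]
```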

How did you decide what data to use? Did you have hunches about what would be predictive before you started?

We started with data that was readily available (purchase history and demographic data). In each annual iteration of the model, we’ve added additional data sources (ticket usage data, email response data, and survey responses). The model has gotten more powerful (higher lift) and more accurate with the additional data sources added. However, the first model with fewer sources still provided quite a few insights.
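For readers unfamiliar with lift, here is one common way to compute it, as a sketch under assumptions: take the top-scored fraction of accounts and divide the outcome rate in that group by the overall outcome rate. The function name `lift_at`, the 20% cutoff, and the data are all hypothetical, and the original analysis may have defined lift differently.

```python
def lift_at(scores, outcomes, fraction=0.1):
    """Outcome rate among the top-scored fraction divided by the overall rate."""
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and equal length")
    base_rate = sum(outcomes) / len(outcomes)
    if base_rate == 0:
        raise ValueError("lift is undefined when no outcomes occurred")
    n_top = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    return top_rate / base_rate

# Hypothetical model scores and observed outcomes (1 = outcome occurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(lift_at(scores, outcomes, fraction=0.2))  # 2.5
```

A lift of 2.5 here means the top 20% of scored accounts show the outcome 2.5 times as often as the population as a whole.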

Did you experiment with step-wise regression (forward and/or backward)?

With nearly 1,000 predictors, step-wise regression wasn’t considered. In addition to the run time a step-wise approach would take, the resulting model wouldn’t be optimized for lift and accuracy, not to mention possible multicollinearity issues. Using PROC VARCLUS in SAS is a much more efficient way to build a model that is optimized for your desired metrics.
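The general idea of grouping redundant predictors before modeling can be sketched in a few lines. This is not the PROC VARCLUS algorithm, which uses its own clustering method. It is a simple greedy stand-in that groups variables by absolute Pearson correlation. The column names, data, and the 0.8 threshold are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def group_correlated(columns, threshold=0.8):
    """Greedy grouping: a variable joins the first group whose seed
    (first member) it correlates with at |r| >= threshold."""
    groups = []
    for name, values in columns.items():
        for group in groups:
            if abs(pearson(values, columns[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hypothetical predictors for five accounts.
columns = {
    "games_attended": [60, 5, 18, 9, 40],
    "tickets_scanned": [118, 11, 35, 20, 83],  # nearly proportional to games
    "emails_opened": [3, 12, 7, 1, 9],
}
print(group_correlated(columns))
```

Keeping one representative per group is one way to shrink a large predictor set and reduce multicollinearity before fitting a model.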