I am a huge football fan (I think they call it ‘soccer’ in the US), and the World Cup, held every four years, is an absolute feast for enthusiasts like me. Strangely, though, despite the progress of ‘big data’ in most industries, football coverage usually keeps only one statistic on screen: the goals scored by each team.
With three matches a day for the first two weeks of the four-week tournament, the six-hour time difference between San Francisco and Brazil, and having just started at Airbnb, I was never going to keep up with every game without some help.
So I wrote myself a Twitter bot in Matlab called Scotty Stats to tweet the score and each side's probability of winning the match. It posted the updates in pairs, one tweet for the home team and one for the away team. I also added a 🙂 at the end of a tweet if that team's probability of winning was particularly high, or a 😦 if it was particularly low!
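To make the tweet format concrete, here is a minimal Python sketch of how one pair of tweets could be built. It is not the original Matlab code: the message layout, the names `format_tweet` and `tweet_pair`, and the 75% / 25% cut-offs standing in for "particularly high or low" are all hypothetical, since the post does not give them. It only builds the text; posting to Twitter is left out.

```python
from typing import Tuple

# Hypothetical thresholds: the post only says "particularly high or low".
HIGH_PROB = 0.75
LOW_PROB = 0.25


def format_tweet(team: str, goals: int, opponent: str, opponent_goals: int,
                 minute: int, p_win: float) -> str:
    """Build one tweet from `team`'s point of view (hypothetical layout)."""
    text = (f"{team} {goals}-{opponent_goals} {opponent} ({minute}'): "
            f"{team} win probability {p_win:.0%}")
    # Append a smiley only when the win probability is extreme.
    if p_win >= HIGH_PROB:
        text += " 🙂"
    elif p_win <= LOW_PROB:
        text += " 😦"
    return text


def tweet_pair(home: str, away: str, home_goals: int, away_goals: int,
               minute: int, p_home: float, p_away: float) -> Tuple[str, str]:
    """Return (home tweet, away tweet) for the current state of a match."""
    return (format_tweet(home, home_goals, away, away_goals, minute, p_home),
            format_tweet(away, away_goals, home, home_goals, minute, p_away))


if __name__ == "__main__":
    # Made-up match state, for illustration only.
    for tweet in tweet_pair("Team A", "Team B", 1, 0, 60, 0.80, 0.05):
        print(tweet)
```

With those made-up inputs the two tweets are `Team A 1-0 Team B (60'): Team A win probability 80% 🙂` and `Team B 0-1 Team A (60'): Team B win probability 5% 😦`. Building both tweets from one function keeps the home and away messages symmetric.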
Ideally, I would have collected predictive data, such as the number of shots on goal and the possession so far in the game, the history of past meetings between the two teams, and injuries to key players before the match, and built a model to predict the probability of winning. But this would have been very time-intensive and would probably also have cost money to get sufficiently rich data.
So I did what every good hacker does and reused other people’s hard work. Who has live scores and probabilities of winning? The bookies, of course! Every few minutes, my Matlab program would visit my favourite sports betting website (e.g. Betfair or Sporting Index) and fetch the time elapsed, the latest score and the latest odds of either side winning. Converting betting odds to a probability is straightforward: with decimal odds, the implied probability is simply 1 divided by the odds, so odds of 4.0 imply a 25% chance. And although the odds have the bookmaker's profit margin built in, they were close enough to the fair odds for my purposes.
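As a rough illustration of that conversion, here is a Python sketch (again, not the original Matlab code). It assumes decimal odds for all three outcomes of a match (home win, draw, away win); `implied_probabilities` and `fractional_to_decimal` are hypothetical names. Because of the bookmaker's margin, the raw implied probabilities 1/odds add up to slightly more than 1, so the sketch rescales them proportionally to sum to 1. That is one simple way to approximate the fair odds, not necessarily what the bot did.

```python
from typing import Dict


def fractional_to_decimal(numerator: int, denominator: int) -> float:
    """Convert UK fractional odds such as 5/2 into decimal odds (3.5)."""
    return numerator / denominator + 1


def implied_probabilities(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Turn the decimal odds of every outcome in one market into probabilities.

    Each raw implied probability is 1 / odds. Because of the bookmaker's
    margin the raw values sum to more than 1 (the "overround"), so they are
    rescaled proportionally to sum to 1. This is only one simple way of
    estimating the fair probabilities.
    """
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())
    return {outcome: p / overround for outcome, p in raw.items()}


if __name__ == "__main__":
    # Made-up odds for a single match, for illustration only.
    odds = {"home": 2.0, "draw": 3.5, "away": fractional_to_decimal(3, 1)}
    for outcome, p in implied_probabilities(odds).items():
        print(f"{outcome}: {p:.1%}")
```

For these made-up odds the raw probabilities are 0.5, about 0.286 and 0.25, summing to about 1.036 (a margin of roughly 3.6%). After rescaling, the home win comes out at about 48.3%, the draw at about 27.6% and the away win at about 24.1%.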
The Twitter bot is no longer running, but it served me well during the World Cup, even if not everything went to plan: England crashed out early and the Germans won (again)!