Can judges predict whether a criminal will reoffend?

A colleague forwarded me a TED talk by Anne Milgram, former attorney general of New Jersey, in which she argues for the use of statistics and data science in the legal system.

Frustrated by the lack of data in the judicial system for measuring and understanding the level of crime and the impact of new policies, Anne built a team of data scientists to aggregate crime data and eventually build a predictive model for re-offense rates. Her hope is that judges throughout America can use it to better inform their decisions.

This is another example of the power of data and statistics for predicting human behaviour, something that I am also very interested in and actively work on at Airbnb. With current tools and data I would say it is more of a data art than a data science, but the hope is that at least very typical behaviour can be modelled accurately.

Send me any other cool talks or stuff you have read on this topic.


Tweet yourself sports stats

I am a huge football (I think they call it ‘soccer’ in the US) fan, and the football World Cup every four years is an absolute feast for enthusiasts like me. Strangely though, despite all the progress ‘big data’ has made in most industries, football usually keeps only one statistic up on screen: goals scored by each team.

With three matches per day for the first two weeks of the four-week tournament, the six-hour time difference between San Francisco and Brazil, and having recently started at Airbnb, it was going to be impossible to keep up with every game without some help.

So I wrote myself a Twitter bot in Matlab called Scotty Stats to tweet the scores and the probabilities of each side winning the match. It output tweets in pairs, one for the home team and one for the away team. I also included a 🙂 or 😦 at the end if the probability of winning was particularly high or low respectively!
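For anyone wanting to try something similar, here is a minimal sketch in Python (not the original Matlab) of how a pair of tweets like that could be put together. The function names, the wording of the tweet and the 75% / 25% cut-offs for the smiley are all assumptions made for illustration; the bot's real format and thresholds are not recorded here.

```python
def format_tweet(team, goals_for, goals_against, minute, p_win,
                 high=0.75, low=0.25):
    """Build one tweet for one team.

    `high` and `low` are illustrative thresholds (assumptions) for
    deciding when to append a happy or sad face.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be between 0 and 1")
    if p_win >= high:
        mood = " 🙂"
    elif p_win <= low:
        mood = " 😦"
    else:
        mood = ""
    return (f"{team} {goals_for}-{goals_against} ({minute}'): "
            f"{p_win:.0%} chance of winning{mood}")


def match_tweets(home, away, home_goals, away_goals, minute,
                 p_home_win, p_away_win):
    """Return the pair of tweets, home team first, then away team."""
    return (
        format_tweet(home, home_goals, away_goals, minute, p_home_win),
        format_tweet(away, away_goals, home_goals, minute, p_away_win),
    )


if __name__ == "__main__":
    for tweet in match_tweets("Brazil", "Germany", 0, 5, 30, 0.01, 0.97):
        print(tweet)
```

Posting the text to Twitter is left out on purpose, since that part depends on whichever API client and credentials you use.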

Ideally I would have collected predictive data during the game (number of shots on goal so far, possession so far, history of past meetings, pre-match injuries to key players) and built a model to predict the probability of winning. But this would have been very time-intensive, and getting rich enough data would probably have cost money too.

So I did what every good hack does and re-used other people’s hard work. Who has live scores and probabilities of winning? The bookies, of course! Every few minutes my Matlab program would go to one of my favourite sports betting websites, e.g. Betfair or Sporting Index, and fetch the time elapsed, the latest score, and the latest odds of either side winning. Transforming the betting odds into a probability is trivial. And although the betting odds have a profit premium built in, they were close enough to the fair odds for my purposes.
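To show just how trivial that transformation is, here is a small Python sketch (again, not the original Matlab). It assumes the odds are quoted in decimal form, where the implied probability is simply 1 divided by the odds. Because of the bookmaker's premium, those raw probabilities add up to slightly more than 1, so the sketch also rescales them proportionally. That rescaling is one simple option and an addition of mine; the bot itself treated the raw odds as close enough. The function name and the example odds are made up.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for mutually exclusive outcomes to probabilities.

    Returns (probabilities, overround), where overround is the sum of the
    raw 1/odds values; anything above 1.0 is the bookmaker's margin.
    """
    if any(odds <= 1.0 for odds in decimal_odds.values()):
        raise ValueError("decimal odds must be greater than 1")
    # Raw implied probability of each outcome: 1 / decimal odds.
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())
    # Proportional rescaling so the probabilities sum to exactly 1.
    probabilities = {outcome: p / overround for outcome, p in raw.items()}
    return probabilities, overround


if __name__ == "__main__":
    # Made-up odds for a home win, a draw and an away win.
    odds = {"home": 1.8, "draw": 3.6, "away": 4.5}
    probs, overround = implied_probabilities(odds)
    for outcome, p in probs.items():
        print(f"{outcome}: {p:.1%}")
    print(f"bookmaker margin: {overround - 1:.1%}")
```

With fractional odds such as 5/1, you would first convert to decimal by adding 1 to the fraction (5/1 becomes 6.0) and then proceed in the same way.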

The Twitter bot is not running anymore, but it served me well during the World Cup. That said, England crashed out early and the Germans won (again), so it didn’t all go to plan!