We came, we saw, we hacked!

Last weekend I spent Saturday and Sunday hacking on government data at this year’s BayesImpact hackathon – Bayes Hack 2016! Hosted at OpenDNS HQ, the event invited teams of Data Scientists, Engineers, Designers, and anyone else interested in data to hack for 24 hours.

For those unfamiliar with ‘hacking’: the premise is to build something in a very short amount of time. We call it ‘hacking’ because you have to cut corners and write some ugly code to get a product out quickly. It’s a world away from the day job, and very liberating!


I teamed up with four other Data Scientists from Airbnb and an Engineer, and we decided to look at the Department of Labor’s database of jobs and their associated skills, knowledge, and education requirements. Our prompt was the following:

Economic landscapes change dramatically, often outpacing a workforce lagging in its adaptation to new opportunities and industries. How can data scientists leverage predictive modeling to close the gap?

What did we build? We split into two teams: one built a recommendation engine that lets users enter their skills and abilities and get back job suggestions; the second, which I worked on, built an interactive visualisation of these recommendations so that users can explore related jobs.


For each of the 954 jobs in the database, we computed the coordinates of the job in the 35-dimensional space of skills. These skills include Reading Comprehension, Active Listening, Writing, etc. For each pair of jobs in this skills vector space, we computed the distance between them using the Kullback-Leibler divergence, giving a value between 0 and 1. The smaller the distance (divergence), the more similar two jobs are in terms of the skills required to be competent in them. The visualisation was made in Gephi and exported to sigma.js.
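The distance computation can be sketched as follows – a toy version with three skills instead of 35. The symmetrized divergence and the smoothing constant are my assumptions, and the original pipeline presumably rescaled the raw divergence into the [0, 1] range:

```python
import math

def _normalize(vec, eps=1e-12):
    """Turn a raw skill-importance vector into a probability distribution.

    The small eps keeps the logarithms finite when a skill is absent.
    """
    total = sum(vec) + eps * len(vec)
    return [(v + eps) / total for v in vec]

def skill_distance(p, q):
    """Symmetrized Kullback-Leibler divergence between two skill profiles."""
    p, q = _normalize(p), _normalize(q)
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)

# Made-up skill profiles over (Reading Comprehension, Writing, Manual Dexterity).
data_scientist = [0.9, 0.7, 0.2]
statistician = [0.8, 0.8, 0.1]
carpenter = [0.1, 0.2, 0.9]

# Jobs with similar skill profiles end up closer together.
assert skill_distance(data_scientist, statistician) < skill_distance(data_scientist, carpenter)
```

Symmetrizing makes the divergence usable as an (undirected) edge weight in the graph visualisation, since plain KL divergence is not symmetric in its arguments.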

We were one of the 8 finalists on the day but eventually lost out to the fantastic Go Bot Chat team, who worked on the Department of the Interior’s database. Their project provides parks and recreation recommendations through a chatbot built on top of Facebook’s Messenger.

The weekend was super fun, and it was inspiring to see how much can be done so quickly with so much openly available data. You can see our full source code on GitHub, along with all the other projects from the competition.


Talking Trust with Kellogg’s MBA class

I had the pleasure of video-conferencing into Kellogg’s MBA class on Social Media at Northwestern University yesterday. Brayden King kindly invited me to talk about how Airbnb thinks about Trust and the challenges facing sharing economies.


We spoke about the role of Data Science at the company and how it has changed over the years. As the volume of data has grown, we have increasingly moved away from explanatory models towards predictive Machine Learning algorithms.

One thing that stood out as top of mind for the MBA students was how Trust develops for first-time users. How does a first-time guest get accepted by a host on Airbnb? How does a first-time host get selected by a guest?

At Airbnb we have a team of highly skilled Data Scientists and Engineers working on matching algorithms designed to help first-time guests and hosts. More than this, though, the community is its own best resource: experienced hosts help new hosts manage their listings and new guests book their first experience.

At the heart of everything data-related we work on at Airbnb is the community, and enabling members to make more connections amongst themselves and with new users.

Airbnb launches first ever Kaggle competition!

In an exciting new partnership, Airbnb has teamed up with Kaggle to create an online Data Science challenge. We provide historical data on the first country guests book, and ask candidates to predict where new users will make their first booking.
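To get a feel for the task, a minimal baseline might simply predict the overall most common first destination for every new user. The country codes below are made-up illustration data, not the actual competition files:

```python
from collections import Counter

# Hypothetical training labels: each past user's first booking country.
train_countries = ["US", "US", "FR", "US", "NDF", "US", "IT", "NDF"]

# Majority-class baseline: always predict the most common destination.
most_common, _ = Counter(train_countries).most_common(1)[0]

# One prediction per (hypothetical) test user.
predictions = [most_common for _ in range(3)]
```

Any real submission would of course need to beat this kind of trivial baseline to be interesting.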


Try the challenge yourself! You have until February 11th 2016 to submit your entries, and if you have any questions you can ask on the forum – I will respond as soon as possible. Good luck, and I hope you have fun playing with our data!

Can we measure Trust? Stanford thinks we can…


Along with a team of Stanford University Sociologists led by Karen Cook and Paolo Parigi, I am conducting a study on behalf of Airbnb to understand the social consequences of sharing goods and services with strangers.

Karen has published multiple books on the formation of Trust in modern societies and more recently on the role of Trust in the online world. Paolo is also interested in social networks and has conducted previous studies of Trust in the sharing economy.

Together we will be surveying Airbnb members to better understand Trust inside and outside of the sharing economy, as well as what drives changes in Trust. Stay tuned for more!

The trees have confidence

My latest Machine Learning blog post, Confidence Splitting Criterions Can Improve Precision And Recall in Random Forest Classifiers, is out on the Airbnb Data blog:


The Trust and Safety Team maintains a number of models for predicting and detecting fraudulent online and offline behaviour. A common challenge we face is attaining high confidence in the identification of fraudulent actions: both classifying a fraudulent action as fraudulent (recall) and not classifying a good action as fraudulent (precision).

A classification model we often use is the Random Forest Classifier (RFC). By adjusting the logic of this algorithm slightly, so that we look for high-confidence regions of classification, we can significantly improve the recall and precision of its predictions. To do this we introduce a new splitting criterion (explained below) and show experimentally that it enables more accurate fraud detection.
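To illustrate the idea, here is a toy sketch of a confidence-based splitting score. It uses the lower bound of the Wilson score interval on each child node’s majority-class proportion – my stand-in for illustration, not necessarily the exact criterion from the post. The point is that it rewards splits which isolate large, demonstrably pure regions over small or mixed ones:

```python
import math

def wilson_lower(k, n, z=1.96):
    """Lower bound of the Wilson score interval for a proportion k/n.

    Unlike the raw proportion, this bound is penalized when n is small,
    so purity claims backed by little data score lower.
    """
    if n == 0:
        return 0.0
    p = k / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

def split_score(left_labels, right_labels):
    """Score a candidate split by the best lower-confidence-bound purity
    achieved in either child node (labels are 0 = good, 1 = fraud)."""
    def node_bound(labels):
        n = len(labels)
        k = max(labels.count(0), labels.count(1))  # majority-class count
        return wilson_lower(k, n)
    return max(node_bound(left_labels), node_bound(right_labels))

# A split that isolates a large, pure group of frauds scores higher
# than one producing evenly mixed children of the same size.
pure_big = split_score([1] * 50, [0] * 50)
mixed = split_score([1, 0] * 25, [0, 1] * 25)
assert pure_big > mixed
```

Contrast this with the standard Gini impurity, which measures average purity without any notion of statistical confidence: a tiny pure node and a huge pure node look equally good to Gini, but not to a confidence bound.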

Have a read and let me know what you think!