DJ Patil wants Silicon Valley to work on real problems

I was fortunate enough to attend a Q&A with United States Chief Data Scientist DJ Patil at the Commonwealth Club last week. DJ was keen to stress the challenges facing the US government and the Big Data available to help solve these problems, but he noted that the talent and progress we see in technology is not being applied to ‘real’ problems.


DJ gave examples from Law Enforcement and Health Care as areas that are ripe for disruption by data and technology. He also stressed that much public data is readily available online, both at the local and national level – and he invited the Data Scientists in the audience to start hacking for social solutions!


We came, we saw, we hacked!

Last weekend I spent Saturday and Sunday hacking on government data at this year’s Bayes Impact hackathon – Bayes Hack 2016! Located at OpenDNS HQ, the event invited teams of Data Scientists, Engineers, Designers, and anyone who is interested in data to hack for 24 hours.

For those unfamiliar with ‘hacking’: the premise is to build something in a very short amount of time. We call it ‘hacking’ because you have to cut corners and write some ugly code to get a product out quickly. It’s different to your normal job but very liberating!


I teamed up with four other Data Scientists from Airbnb and an Engineer, and we decided to look at the Department of Labor’s database on jobs and their associated skills, knowledge, and education requirements. Our prompt was the following:

Economic landscapes change dramatically, often outpacing a workforce lagging in its adaptation to new opportunities and industries. How can data scientists leverage predictive modeling to close the gap?

What did we build? We broke into two teams: one team built a recommendation engine where users enter their skills and abilities and get back job suggestions. The second team, which I worked on, built an interactive visualisation for these recommendations to enable users to explore related jobs.
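The recommendation idea can be sketched in a few lines. Everything below is hypothetical for illustration – the skill names, job profiles, and numbers are made up, and the real engine may have worked quite differently – but it shows one simple approach: represent the user's skills and each job as vectors over a shared skill vocabulary, then rank jobs by cosine similarity.

```python
import numpy as np

# Hypothetical skill vocabulary and job skill profiles (toy values,
# not the real Department of Labor data).
SKILLS = ["Reading Comprehension", "Active Listening", "Writing", "Programming"]
JOB_PROFILES = {
    "Technical Writer":   np.array([0.9, 0.6, 1.0, 0.3]),
    "Software Developer": np.array([0.7, 0.5, 0.4, 1.0]),
    "Counselor":          np.array([0.6, 1.0, 0.5, 0.1]),
}

def recommend(user_skills, top_k=2):
    """Rank jobs by cosine similarity between the user's skill vector
    and each job's skill profile."""
    # Build the user's vector over the shared skill vocabulary;
    # unmentioned skills default to 0.
    u = np.array([user_skills.get(s, 0.0) for s in SKILLS])
    scores = {
        job: float(u @ profile / (np.linalg.norm(u) * np.linalg.norm(profile)))
        for job, profile in JOB_PROFILES.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# A user strong in Programming with some Writing gets jobs whose
# profiles weight those skills most heavily.
suggestions = recommend({"Programming": 1.0, "Writing": 0.5})
```

Cosine similarity is just one plausible scoring choice here; any distance over the shared skill space would slot into the same structure.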


For each of the 954 jobs in the database, we computed the coordinates of the job in the 35-dimensional space of skills. These skills include Reading Comprehension, Active Listening, Writing, etc. For each pair of jobs in this skills vector space, we computed the distance between them using the Kullback-Leibler Divergence to give us a value between 0 and 1. The smaller the distance (divergence), the more similar two jobs are in terms of the skills required to be competent in them. The visualisation was made in Gephi and exported to SigmaJs.
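As a rough sketch of the pairwise computation (the job names and skill values below are invented, and the real pipeline may have normalised things differently): treat each job's skill vector as a probability distribution over skills, then compute the KL divergence between pairs. Since KL divergence is not symmetric, one common choice – assumed here, not necessarily what we did – is to average the two directions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Discrete KL divergence D(p || q); eps avoids log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()  # normalise skill weights into a distribution
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def job_distance(a, b):
    """Symmetrised KL divergence between two jobs' skill vectors."""
    return 0.5 * (kl_divergence(a, b) + kl_divergence(b, a))

# Toy skill vectors for three hypothetical jobs over four skills.
jobs = {
    "Data Scientist": [0.9, 0.6, 0.8, 0.2],
    "Statistician":   [0.9, 0.5, 0.7, 0.1],
    "Chef":           [0.2, 0.3, 0.1, 0.9],
}

# Jobs with similar skill profiles end up close; dissimilar ones far apart.
d_close = job_distance(jobs["Data Scientist"], jobs["Statistician"])
d_far = job_distance(jobs["Data Scientist"], jobs["Chef"])
```

Running this over all job pairs yields the distance matrix that the graph visualisation was built from.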

We were one of the 8 finalists on the day but eventually lost out to the fantastic Go Bot Chat team working on the Department of the Interior’s database. Their project provides parks and recreation recommendations to people using a chatbot service built on top of Facebook’s Messenger.

The weekend was super fun, and it was inspiring to see how much can be done so quickly with so much openly available data. You can see our full source code on GitHub, along with all the other projects from the competition.