Elon Musk wants to open source Artificial Intelligence


Taken from Wired’s article from two months ago:

This morning, OpenAI will release its first batch of AI software, a toolkit for building artificially intelligent systems by way of a technology called “reinforcement learning”—one of the key technologies that, among other things, drove the creation of AlphaGo, the Google AI that shocked the world by mastering the ancient game of Go. With this toolkit, you can build systems that simulate a new breed of robot, play Atari games, and, yes, master the game of Go.

He envisions OpenAI as the modern incarnation of Xerox PARC, the tech research lab that thrived in the 1970s. Just as PARC’s largely open and unfettered research gave rise to everything from the graphical user interface to the laser printer to object-oriented programming, Brockman and crew seek to delve even deeper into what we once considered science fiction. PARC was owned by, yes, Xerox, but it fed so many other companies, most notably Apple, because people like Steve Jobs were privy to its research. At OpenAI, Brockman wants to make everyone privy to its research.

But along with such promise comes deep anxiety. Musk and Altman worry that if people can build AI that can do great things, then they can build AI that can do awful things, too. They’re not alone in their fear of robot overlords, but perhaps counterintuitively, Musk and Altman also think that the best way to battle malicious AI is not to restrict access to artificial intelligence but expand it. That’s part of what has attracted a team of young, hyper-intelligent idealists to their new project.

Giving up control is the essence of the open source ideal. If enough people apply themselves to a collective goal, the end result will trounce anything you concoct in secret. But if AI becomes as powerful as promised, the equation changes. We’ll have to ensure that new AIs adhere to the same egalitarian ideals that led to their creation in the first place. Musk, Altman, and Brockman are placing their faith in the wisdom of the crowd. But if they’re right, one day that crowd won’t be entirely human.

You can read the full text here.


Hedge funds are luring away Tech’s AI superstars

An arms race has resumed among the world’s biggest hedge funds. A recent article notes that, seeing the potential of the technology produced by the most prolific machine learning groups at big tech companies such as Google and Facebook, hedge funds are poaching lead data scientists to build better alpha strategies.

In the past, algorithmic trading prided itself on hiring highly skilled statisticians to sculpt informative signals and combine them in a state-of-the-art model to predict price movements. With the success of AI systems such as IBM’s Watson, hedge funds now see potential in throwing their financial big data at these artificial intelligence black boxes to predict alpha.

Bridgewater hired David Ferrucci, the former IBM lead engineer behind Watson. Renaissance Technologies is led by Bob Mercer and Peter Brown, former speech and language researchers at IBM. And BlackRock recently hired Bill MacCartney, a former Google scientist.

For these AI rockstars moving from tech to finance, one downside is that their work becomes far more secretive. Algorithmic trading is very hush-hush, with every hedge fund in direct competition with the others. Unlike researchers publishing papers at IBM or Google, researchers at these funds will have to keep their advances to themselves, which is a loss for the rest of the scientific community.

CyberLaunch Accelerator launches Security ML challenge

I recently joined CyberLaunch, the world’s leading accelerator for information security (Infosec) and machine learning (ML), as a mentor for its startups.


Last week they launched a Startup Challenge to find the brightest solutions to challenging Infosec and ML problems. There are two prizes, each worth over $150,000.

It’s sure to be a very competitive field, and I am looking forward to the entries!

Airbnb launches first ever Kaggle competition!

In an exciting new partnership, Airbnb has teamed up with Kaggle to create an online data science challenge. In this challenge we provide historical data on the first country in which guests book, and ask participants to predict the first booking destinations of future guests.


Try the challenge yourself! You have until February 11th, 2016 to submit your entries. If you have any questions, post them on the forum and I will respond as soon as possible. Good luck, and I hope you have fun playing with our data!

Can judges predict whether a criminal will reoffend?

A colleague forwarded me a TED talk by Anne Milgram, former Attorney General of New Jersey, that argues for the use of statistics and data science in the legal system.

Frustrated by the lack of data in the judicial system for measuring crime levels and the impact of new policies, Milgram built a team of data scientists to aggregate crime data and eventually build a predictive model of reoffending rates. Her hope is that judges throughout America can use it to better inform their decisions.

This is another example of the power of data and statistics for predicting human behaviour, something I am also very interested in and actively work on at Airbnb. With current tools and data, I would say it is more data art than data science, but the hope is that at least very typical behaviour can be accurately modelled.

Send me any other cool talks or articles you have come across on this topic.

Beating the government: is big data crucial or creepy?

An article published on Thursday in the online tech journal Ars Technica UK reviews the surprising power of mobile communications data to track unemployment trends.

A PLOS ONE paper and a Journal of the Royal Society Interface paper, both published last week, look at changes in the frequency, location, and timing of people’s interactions as captured in their cellphone records. The correlations between these changes and observed layoffs can then be used to train models that predict future layoffs.
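To make the idea concrete, here is a toy sketch of the simplest version: compute the relative change in call volume per region and correlate it with layoff counts. The per-region numbers below are invented purely for illustration and do not come from either paper; the papers use richer features (location and timing as well as frequency) and proper predictive models, so treat this only as an illustration of "correlate changes in activity with layoffs".

```python
from math import sqrt
from typing import List


def pct_change(before: List[float], after: List[float]) -> List[float]:
    """Relative change in a per-region activity measure (e.g. call volume)."""
    return [(a - b) / b for b, a in zip(before, after)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Hypothetical per-region figures, invented purely for illustration.
calls_before = [1000, 2000, 1500, 800]
calls_after = [800, 1900, 1530, 680]
layoffs = [120, 30, 5, 90]

changes = pct_change(calls_before, calls_after)
r = pearson(changes, layoffs)
print(f"call-volume changes: {changes}")
print(f"correlation with layoffs: {r:.2f}")
```

In this made-up data, regions where calling dropped most saw the most layoffs, giving a strongly negative correlation; a real model would be trained on many such features and validated on held-out periods.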

The article asks: is this harvesting of phone records to get ahead of employment shocks a critical tool for planners and government officials? Or actually a very creepy and invasive use of personal information? Comments welcome!


This image, unrelated to the unemployment study, shows seasonal population changes in France and Portugal, measured by cellphone activity.

Check out my new Machine Learning blog post on Airbnb


While almost all members of the Airbnb community interact in good faith, there is an ever-shrinking group of bad actors that seek to take advantage of the platform for profit. This problem is not unique to Airbnb: social networks battle attempts to spam users or phish them for their details; ecommerce sites try to prevent the use of stolen credit cards. The Trust and Safety team at Airbnb works tirelessly to remove bad actors from the Airbnb community and to help make the platform a safer, more trustworthy place to experience belonging.

Missing Values In A Random Forest

We can train machine learning models to identify new bad actors (for more details, see the previous blog post Architecting a Machine Learning System for Risk). One particular family of models we use is Random Forest Classifiers (RFCs). An RFC is a collection of trees, each independently grown using labeled and complete input training data. By complete we explicitly mean that there are no missing values, i.e. no NULL or NaN values. In practice, however, the data often has (many) missing values. In particular, highly predictive features do not always have values available, so the missing values must be imputed before a random forest can be trained.
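One common way to satisfy the "complete data" requirement is median imputation, optionally with a 0/1 indicator column per feature so the trees can still split on whether a value was missing. Below is a minimal pure-Python sketch of that preprocessing step. It is a generic technique, not necessarily the method described in the full Airbnb post, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Optional

Row = List[Optional[float]]


def is_missing(v: Optional[float]) -> bool:
    """Treat both None (NULL) and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def fit_medians(rows: List[Row]) -> List[float]:
    """Learn one fill value per column from the training rows only."""
    fills = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not is_missing(r[j])]
        # Fall back to 0.0 if a column is entirely missing in training.
        fills.append(median(observed) if observed else 0.0)
    return fills


def impute(rows: List[Row], fills: List[float]) -> List[List[float]]:
    """Replace missing values and append a 0/1 'was missing' flag per column."""
    out = []
    for r in rows:
        filled = [fills[j] if is_missing(v) else float(v) for j, v in enumerate(r)]
        flags = [1.0 if is_missing(v) else 0.0 for v in r]
        out.append(filled + flags)
    return out


train = [[1.0, None], [3.0, 10.0], [float("nan"), 20.0]]
fills = fit_medians(train)
X_train = impute(train, fills)
# X_train is now complete and can be fed to any random forest implementation;
# the same `fills` must be reused when scoring new data.
```

Fitting the fill values on training data only, and reusing them at scoring time, keeps the model from seeing information about the test set through the imputation step.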


When to wait for flight prices to drop

Bing Flights Price Predictor

Kayak Flights Price Predictor

I’ve often heard people discuss the best time to book flights (apparently it’s Tuesday nights), and there has been a rise in airfare blogs such as Airfare Watchdog and CheapAir’s Blog.

Even online flight booking platforms such as Bing and Kayak are starting to offer advice on whether prices are trending up or down and whether now is the best time to buy.

Model Parameter Values Over Time

Recently, I came across a dataset of about six months’ worth of US domestic flight price data. For about 100 popular routes, the dataset recorded the observation time and the then-current price of each future flight. I wanted to see whether we could actually predict directional changes in price with any confidence.

I built a model to try to predict whether the price would drop by at least 10% within the next 7 days. Using only historical price returns as inputs, and re-estimating the model parameters weekly, I measured its daily out-of-sample performance. The results were much better than I expected.
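The setup above has two mechanical pieces that are easy to get subtly wrong: building the "drops at least 10% within 7 days" label, and the weekly walk-forward split used for out-of-sample testing. Here is a minimal sketch of both, assuming a daily price series for a single flight and interpreting "drop" as any price in the next 7 days falling 10% or more below today's. The 2-parameter model itself isn't specified in the post, so it is left out, and all names here are hypothetical.

```python
from typing import Iterator, List, Tuple


def drop_labels(prices: List[float], horizon: int = 7, threshold: float = 0.10) -> List[int]:
    """Label day t as 1 if any price in days t+1..t+horizon is at least
    `threshold` below the price on day t, else 0. The final `horizon` days
    lack a full look-ahead window and are left unlabelled."""
    labels = []
    for t in range(len(prices) - horizon):
        future_min = min(prices[t + 1 : t + 1 + horizon])
        labels.append(1 if future_min <= prices[t] * (1 - threshold) else 0)
    return labels


def weekly_walk_forward(
    n_days: int, min_train: int = 28, step: int = 7, embargo: int = 7
) -> Iterator[Tuple[range, range]]:
    """Yield (train_days, test_days) pairs for weekly re-estimation.

    The model is refit at each weekly cut and scored on the following `step`
    days. Training stops `embargo` days before the cut because a label at day
    t depends on prices up to t + horizon; without the gap, training labels
    would peek at prices inside the test week."""
    for cut in range(min_train, n_days, step):
        yield range(0, cut - embargo), range(cut, min(cut + step, n_days))


prices = [100, 100, 95, 89, 100, 100, 100, 100, 100, 100]
print(drop_labels(prices))  # [1, 1, 0]
print(list(weekly_walk_forward(42)))
```

Setting `embargo` equal to the label horizon is the design choice that keeps the reported out-of-sample numbers honest: every training label is fully determined by prices observed before the test week begins.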

Model R² in Test Data Over Time

First, the two parameters in my model were reasonably stable over time, a key property of a well-defined model. Second, the out-of-sample R² (a measure of predictive performance on unseen data) was consistently positive, at around 5%.
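The post doesn't say which benchmark its out-of-sample R² is measured against, so this sketch assumes the common convention of comparing the model's squared errors with those of a naive forecast equal to the training-period mean (here, the historical drop frequency). Under that definition, "consistently positive" means the model beat the naive forecast week after week.

```python
from typing import Sequence


def out_of_sample_r2(y_test: Sequence[float], y_pred: Sequence[float], train_mean: float) -> float:
    """Out-of-sample R^2 = 1 - SSE(model) / SSE(naive).

    The naive forecast predicts the training-period mean for every test point,
    so a positive value means the model beat that baseline on unseen data,
    and a negative value means it did worse."""
    sse_model = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    sse_naive = sum((y - train_mean) ** 2 for y in y_test)
    return 1.0 - sse_model / sse_naive


# Toy example: binary "price dropped" outcomes vs. predicted probabilities.
y_test = [1, 0, 0, 1]
y_pred = [0.6, 0.4, 0.4, 0.6]
print(round(out_of_sample_r2(y_test, y_pred, train_mean=0.5), 2))  # 0.36
```

Unlike in-sample R², this quantity can go negative, which is why a steady positive value is a meaningful signal even when it is only a few percent.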

More concretely: in the dataset I was looking at, the price actually dropped by at least 10% in the following week 18% of the time; the model predicted a drop 13% of the time, and it was correct in 73% of those predictions.
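The three percentages above also pin down the model's recall (the share of real drops it catches) and its lift over the base rate. A worked calculation using only those figures, rounded:

$$
\begin{aligned}
P(\text{drop}) &= 0.18, \quad P(\text{predict drop}) = 0.13, \quad P(\text{drop} \mid \text{predict drop}) = 0.73 \\
P(\text{drop} \cap \text{predict drop}) &= 0.13 \times 0.73 \approx 0.095 \\
\text{recall} = P(\text{predict drop} \mid \text{drop}) &= 0.095 / 0.18 \approx 0.53 \\
\text{lift} &= 0.73 / 0.18 \approx 4.1
\end{aligned}
$$

So the model flags roughly half of all 10% drops, and when it does flag one, a drop is about four times more likely than the 18% base rate would suggest.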

With more feature data, such as flight duration, number of connections, oil prices, and seasonality, I’m confident that the 13% could get closer to 18% and the 73% could be pushed even higher, maybe to 95%.