Check out my latest Airbnb blog post on how to prepare for moving from Academia to Data Science in industry.
Tag Archives: data science
Kaggle interviewed me on Data Science at Airbnb
Anonymisation will unlock data mining
It is widely accepted that many companies such as Facebook, Google, Amazon, etc. have vast amounts of data on our friends, interests, and spending habits, amongst other things. At times, for example for data mining or scientific collaboration, it can be useful for companies to access internal and external data. However, they are rightfully blocked by privacy laws.
Hence, there is increasing work on ways to properly anonymise data to enable mining. Aggregating data to strip out PII (personally identifiable information) is one way to achieve this, but it can lose important granularity and detail.
Raffael Strassnig, VP Data Scientist at Barclays retail bank, spoke at a summit last month to stress the importance of protecting privacy. Anonymising data at scale is a very hard problem, but Strassnig’s team have implemented an algorithm by a PhD candidate and modified it to work on Barclays’ Big Data. The method involves:
“clustering the data into k-means clusters, with no cluster overlapping, the clusters being a certain size to comply with k-anonymity constraint, and minimising the loss of data when applying the procedure to the dataset by using a dissimilarity measure”
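To make the quoted description more concrete, here is a minimal sketch of the general idea: disjoint clusters of at least k records, each record released as its cluster centroid, and information loss measured with a dissimilarity function. This is not Barclays' algorithm, whose details are not given here. The greedy grouping strategy, the squared Euclidean dissimilarity, the function names, and the sample (age, monthly spend) records are all assumptions made for illustration.

```python
from statistics import fmean


def dissimilarity(a, b):
    """Squared Euclidean distance between two numeric records (an assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(records):
    """Column-wise mean of a list of numeric records."""
    return tuple(fmean(col) for col in zip(*records))


def k_anonymous_clusters(records, k):
    """Greedily split record indices into disjoint clusters of size >= k."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= 2 * k:
        # Seed with the record farthest from the centre of what is left.
        centre = centroid([records[i] for i in remaining])
        seed = max(remaining, key=lambda i: dissimilarity(records[i], centre))
        # Take the k records closest to the seed (the seed is at distance 0).
        group = sorted(
            remaining, key=lambda i: dissimilarity(records[i], records[seed])
        )[:k]
        clusters.append(group)
        remaining = [i for i in remaining if i not in group]
    # Between k and 2k - 1 records are left; they form the final cluster.
    clusters.append(remaining)
    return clusters


def anonymise(records, k):
    """Replace each record with its cluster centroid and report the total loss."""
    released = [None] * len(records)
    loss = 0.0
    for members in k_anonymous_clusters(records, k):
        representative = centroid([records[i] for i in members])
        for i in members:
            released[i] = representative
            loss += dissimilarity(records[i], representative)
    return released, loss


if __name__ == "__main__":
    # Hypothetical (age, monthly spend) records, invented for this example.
    sample = [(23, 410), (25, 395), (24, 430), (51, 1200),
              (49, 1150), (53, 1250), (37, 800)]
    released, loss = anonymise(sample, k=3)
    for original, masked in zip(sample, released):
        print(original, "->", masked)
    print("total information loss:", loss)
```

Every released record is shared by at least k individuals, so no one can be singled out within their cluster on these attributes. In practice the attributes would likely need scaling first, since an unscaled distance lets the largest-valued column dominate the grouping.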
Future developments in the application of Machine Learning techniques may enable the use of PII without anonymisation. Until then, the Data Science team at Barclays is leading the way in protecting their users’ data while processing it.
Check out the UK’s new Government Digital Service centre
The UK has launched a new group of Data Scientists to help drive complex government policies with data and models. They are focusing on encouraging data literacy in the government.
You can read more about opportunities here on the government website.
Are Algorithms becoming our Overlords?
An apocalyptic piece from the Evening Standard last Easter highlights all the different points in our lives where algorithms control what we see, hear, or do. A few examples include:
- social network feeds
- travel websites
- song compositions
- pension investments.
Why should we be concerned? As Robert Colvile, author of The Great Acceleration, mentions in the context of financial markets:
‘The real danger is that it can all happen at speeds to which humans can’t react. Firms go bankrupt or markets get shattered before anyone’s really realised what’s going on, which is why it’s really important to have the right safeguards in place.’
Elon Musk wants to open source Artificial Intelligence
Taken from Wired’s article two months ago:
This morning, OpenAI will release its first batch of AI software, a toolkit for building artificially intelligent systems by way of a technology called “reinforcement learning”—one of the key technologies that, among other things, drove the creation of AlphaGo, the Google AI that shocked the world by mastering the ancient game of Go. With this toolkit, you can build systems that simulate a new breed of robot, play Atari games, and, yes, master the game of Go.
He envisions OpenAI as the modern incarnation of Xerox PARC, the tech research lab that thrived in the 1970s. Just as PARC’s largely open and unfettered research gave rise to everything from the graphical user interface to the laser printer to object-oriented programing, Brockman and crew seek to delve even deeper into what we once considered science fiction. PARC was owned by, yes, Xerox, but it fed so many other companies, most notably Apple, because people like Steve Jobs were privy to its research. At OpenAI, Brockman wants to make everyone privy to its research.
But along with such promise comes deep anxiety. Musk and Altman worry that if people can build AI that can do great things, then they can build AI that can do awful things, too. They’re not alone in their fear of robot overlords, but perhaps counterintuitively, Musk and Altman also think that the best way to battle malicious AI is not to restrict access to artificial intelligence but expand it. That’s part of what has attracted a team of young, hyper-intelligent idealists to their new project.
Giving up control is the essence of the open source ideal. If enough people apply themselves to a collective goal, the end result will trounce anything you concoct in secret. But if AI becomes as powerful as promised, the equation changes. We’ll have to ensure that new AIs adhere to the same egalitarian ideals that led to their creation in the first place. Musk, Altman, and Brockman are placing their faith in the wisdom of the crowd. But if they’re right, one day that crowd won’t be entirely human.
You can read the full text here.
Do you really need Data Scientists?
This and many more common questions about Data Science are tackled by Instacart VP Data Science Jeremy Stanley, and former LinkedIn data leader Daniel Tunkelang. The term Data Science was only coined a decade or so ago but has gathered so much momentum that most business leaders now feel like they should have a Data Science team – even if they don’t know what they would do with them.
Jeremy and Daniel take us through some common misconceptions and recommended ways for thinking about finding real impact from Data Science. Some of my favourite lines from the article:
The above may sound a lot like data analytics, and indeed the difference between analytics and decision science isn’t always clear. Still, decision science should do more than produce reports and dashboards.
But collecting data isn’t enough. Data science only matters if data drives action.
Similarly, data-driven decision making requires a top-down commitment. From the CEO down, the organization has to commit to making decisions using data, rather than based on the highest paid person’s opinion (or HiPPO).
Many people equate big data to data science, but size isn’t everything. Data science is about separating the signal in data from the noise.
Don’t hire a head of data or build a team until you have work for them to do. At the same time, ensure you’re collecting key data early on so that team can have an impact once you’re ready.
Build a company culture early that makes it a great place to practice data science, and you’ll reap dividends when they matter most.
Over time, the impact that a data science team has will be far higher if you build a diverse team with extremely different backgrounds, skill-sets, and world views.
Finally, focus early on hiring data scientists who reflect your company ideals. To be effective, data scientists must be trusted by their teams, the users of their products, and the decision makers they influence.
DJ Patil wants Silicon Valley to work on real problems
I was fortunate enough to attend a Q&A with United States Chief Data Scientist DJ Patil at the Commonwealth Club last week. DJ was keen to stress the challenges facing the US government and the Big Data available to help solve them. But he noted that the talent and progress we see in technology is not being applied to ‘real’ problems.
DJ gave examples from Law Enforcement and Health Care, amongst others, as areas that are ripe for disruption by data and technology. He also stressed that much public data is readily available online, at both the local and national level, and he invited the Data Scientists in the audience to start hacking for social solutions!
Joined the Advisory Board for Imperial’s MSc in Business Analytics
Delighted to have officially joined the Advisory Board for Imperial College’s MSc in Business Analytics at the Business School. This is the first year the MSc has been running and the progress so far has been phenomenal. The course is heavily subscribed by candidates from around the globe and the current class has a wonderful diversity of students and experience.
I attended the most recent Board meeting in March and was impressed with the content and ambition of the course. We are working hard to improve the course for the next class and strongly believe the course will be a world leader in delivering data science and business analytics over the years to come. Excited to be a part of this!
Airbnb’s Kaggle Competition Closed!
We had a phenomenal response to the Airbnb Data Science competition on Kaggle. Over 1,400 individuals submitted over 20,000 entries! Congratulations to everyone who participated, and we hope you enjoyed getting hands-on with our data.