Reliable data is 90% of the data scientist’s work

An article in the UK newspaper The Telegraph today claims to have uncovered a huge scandal in the measurement and use of temperature records from South American weather stations. The article claims that temperature readings have been reversed to show a 1 degree Celsius rise over the past 40 years when, according to the article, temperatures have in fact been cooling.

Whether or not the article's claims are true is of huge importance and interest. But the meta-message to take away here is that data has to be vetted, reliable, and trustworthy before models can be built and decisions taken.

It is no surprise, then, that a data scientist may find themselves spending much more time obtaining, cleaning, checking, and re-checking data than analysing it. This is just how it should be, and it is also why a data scientist is unique in their role as the curator of data. The article is a timely reminder that this responsibility should be taken extremely seriously and executed with the utmost integrity.


