Twitter’s latest SEC filings state that only 8.5% of its active accounts are probably bots. That’s about 24 million of its 284 million active accounts. However, a Wall Street Journal report in April last year suggested that as many as 44% of Twitter accounts have never tweeted.
The devil, as always, is in the detail: first in deciding what constitutes an ‘active’ account, and second in judging which criteria to use to classify an account as a ‘bot’.
Much of a data scientist’s work is precisely this problem of constructing sensible definitions, and of understanding and communicating the implications of alternative definitions. A successful data scientist, like an academic researcher, can accurately draw out what a dataset shows under different scenarios, and then act on the likeliest one.
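To make the point about definitions concrete, here is a minimal Python sketch of how the estimated share of bots moves when the definition of ‘active’ changes. Everything in it is an assumption for illustration: the eight accounts are made up, the `Account` fields and the `bot_share` helper are hypothetical names, and the three ‘active’ rules are examples rather than anything Twitter or the Wall Street Journal is known to use.

```python
from dataclasses import dataclass


@dataclass
class Account:
    tweets: int            # lifetime number of tweets posted
    days_since_login: int  # days since the account last logged in
    flagged_as_bot: bool   # verdict of some hypothetical bot classifier


# Made-up toy data, for illustration only.
accounts = [
    Account(120, 2, False),
    Account(0, 5, True),
    Account(0, 200, False),
    Account(45, 10, True),
    Account(3, 40, False),
    Account(0, 1, False),
    Account(900, 1, True),
    Account(10, 400, False),
]


def bot_share(accounts, is_active):
    """Fraction of accounts counted as active that are flagged as bots."""
    active = [a for a in accounts if is_active(a)]
    if not active:
        return 0.0
    return sum(a.flagged_as_bot for a in active) / len(active)


# Three alternative (hypothetical) definitions of an 'active' account.
definitions = {
    "logged in within 30 days": lambda a: a.days_since_login <= 30,
    "logged in within 30 days and has tweeted": (
        lambda a: a.days_since_login <= 30 and a.tweets > 0
    ),
    "has ever tweeted": lambda a: a.tweets > 0,
}

shares = {name: bot_share(accounts, rule) for name, rule in definitions.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0%} of active accounts flagged as bots")
```

On this toy data the same eight accounts give a bot share of 60%, 67% or 40% depending on which rule defines ‘active’, which is why a headline percentage means little without the definition behind it.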