
It’s dizzyingly strange to be a student again, five years after I first graduated. I am older (not a lot wiser), and in the interim I have got married and bought a house. I joke that nowadays, while many of my fellow students are out enjoying the nightlife, I am returning to my house in rural Devon and making compost! In addition to this, there is the peculiar fact that I was once an arts student and now find myself a sciences student. I am an odd sort of chrysalis. So you’ll have to forgive the somewhat predictable blog post from the historian-turned-data-scientist: it’s all part of my transition process.
We’ve now talked about what data science is – but where did it come from? Why had most of us not heard of it until recently? These are not necessarily simple questions to answer. Because ‘data’ and ‘science’ are well-established words in their own right, it can be quite difficult to see exactly when they stopped being two distinct words pushed together and became bound up in a new concept. It would have been much easier to trace back if someone had come up with a new word. (In fact someone did, in a way. In 1974 a Danish computer scientist called Peter Naur suggested the term ‘datalogy’ to describe data processing methods. Thankfully we’re not still using that today – I have tried saying it aloud several times and my enunciation is laughable. I would haltingly have to tell people that I was a ‘datalogist’ – it would not do.)
Data science as a term seems to have sprung into existence in 1996, with its inclusion in a statistical conference title: “Data Science, Classification, and Related Methods”. A year later, a Taiwanese-born statistician called C. F. Jeff Wu gave a lecture entitled, with admirable efficiency: “Statistics = Data Science?”. He answered this question in the affirmative – proposing that statistics be renamed data science. Then in 2001, an American statistician called William S. Cleveland proposed a new discipline of data science, which would be broader than statistics and would incorporate advances in computing. In 2002 the Data Science Journal began publication, followed a year later by the similarly named Journal of Data Science.
With this flurry of activity around the turn of the millennium, the term was born. But I still wasn’t aware of data science (and I don’t think this is solely because I was ten in 2003). Why has the term become so popular in recent years?
The answer, unsurprisingly, relates to the Internet. The Internet heralded an extraordinary boom in data, resulting in a sudden high demand for people who could study and manipulate it. Startups were built on data. Big Internet-based corporations ran on and churned out the stuff. Social media exploded. Data became the 21st-century commodity, and so the developing discipline of data science found its foothold.
This brings us to the present day, in which ‘data science’ and ‘big data’ have become buzzwords of the moment. But before it was fed by the Internet, we can see that data science was born of its parent disciplines, statistics and computer science. These have much longer histories, but I think tracing them is an exercise for another day.