Defining Data Science

Image generated using DALL·E 2

I get asked a lot what I’m studying at university. When I reply with “data science”, I have come to expect a momentary flicker of panic on the person’s face before they reply with “oh right!”. It’s an encounter that will be familiar to anyone who has done a degree in something outside the National Curriculum set of subjects. I am forever surprised at how few people give voice to the words printed across their expression: “what is that?”

If you’re someone who has ever felt embarrassed about asking this question, I can assure you that you shouldn’t be. Anyone who is studying or working in a weird-sounding subject should be willing and able to respond – after all, they had to learn the answer at some point! I didn’t know what data science was until two years ago, and it is a term that has only come into common parlance relatively recently.

What do I tell people data science is? I tell them it’s like statistics with more programming. This, of course, is a simplification but serves as a nice springboard for more discussion. In fact it’s a bastardisation of a definition my dad heard, which is that a data scientist is someone who knows more about computer science than a statistician and more about statistics than a computer scientist!

Wikipedia (that wonderfully comprehensive and well-cited trove of knowledge) has this to say: “data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.” That’s a lot more sophisticated than my single-breath answer. If we unpack it we can see the two dimensions emerge:

(1) Science. That’s the “processes, algorithms and systems” part. Science is experimentation and observation that “extracts knowledge” from evidence. And the evidence here is…

(2) Data. The raw facts, which can be “structured” or “unstructured”.

A question might be occurring to you. The above seems to be covered between scientists, statisticians and computer scientists. Why do we need data scientists?

Volume is significant here. We, as organisations, as countries, as a globe, have So. Much. Data. You have probably heard the phrase ‘big data’ being used. It’s an understatement. If Douglas Adams were talking about big data rather than space, he might say that it “is big. Really big. You just won’t believe how vastly hugely mindbogglingly big it is.” Big. As a starting point, think about the fact that every minute, over 350 thousand tweets are posted on Twitter, 72 hours of footage are uploaded to YouTube, over 200 thousand posts are made on Instagram, and over 200 million emails are sent. The Internet has opened the floodgates on data.

Big data deserves a blog post of its own. Many, many blog posts. In fact, a whole discipline. When, in the Harvard Business Review, Thomas H. Davenport and D.J. Patil described data science as “the sexiest job of the 21st century” (no wolf whistles, please) they explained: “if your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a ‘mashup’ of several analytical efforts, you’ve got a big data opportunity.” It’s a nice advert, encouraging people both to employ data scientists and to become them.

You begin to see why we might need people specially equipped for the extraction of knowledge from this data (or ‘data mining’ – to use another familiar phrase). Their armoury must include skills inherited from the fields of science, mathematics and computer science. These skills might be honed into specialisms such as data modelling, sentiment analysis or machine learning.

Finally – who needs data scientists? The most obvious application is in business, as consumer data can be studied to make predictions and inform strategies. But I would be keen to stress that the public sector has great need of data science too. In areas such as healthcare, education, infrastructure, and law and order, there is data with the potential to inform changes that could be of societal benefit.

In the future, when someone asks the question “what did data scientists ever do for us?”, let’s make the answer:

“A lot.”

