What is Data Science and Big Data?
Data Science and Big Data are the new big thing in business: buzzwords that everyone in the industry is throwing left and right, yet they still remain a mystery to many. In this post and many to follow, I will attempt to explain Data Science and Big Data, and their usefulness to many industries, in layman's terms.
Why did Big Data and Data Science only emerge now?
Data is becoming increasingly available, detailed, and affordable. Large amounts of data are being collected by a myriad of methods and devices that people use on a continuous basis: web searches and browsing histories, mobile phones and smartphones, sensors, instruments, transactions, and many more. At the same time, technology has advanced over the past few decades to the point where vast amounts of data can be organized, manipulated, and explored in very little time. This marriage between immense data availability and technological advancement naturally gave birth to the fields of Big Data and Data Science.
Why not just use the scientific method?
Much of our technological advancement can be attributed to the spread of the scientific method as a way of solving problems. The scientific method is mostly associated with the STEM (Science, Technology, Engineering, and Mathematics) fields. When a problem arises, one solves it by following these steps:
- Identify the problem through observations
- Create a hypothesis that is based on previous knowledge and observations
- Experiment and collect the relevant data, while keeping the experiment repeatable and controlling for external factors
- Analyze the data and test if the hypothesis was correct
- If correct, the problem is solved and you now have a working theory. If not, revise the hypothesis using the knowledge obtained from the collected data, and repeat until the problem is solved
This method has worked very well for the STEM fields, as is evident from the ever-increasing rate of technological advancement that accompanied its spread. However, the human, social, and economic fields did not benefit from it as much as science and engineering did. Fields that seek to quantify human behaviour face many hindrances when using the scientific method: the inability to run large-scale experiments, the lack of repeatability, the inability to control for the innumerable outside factors that can influence the results, and the speed at which cultures change over the years and from generation to generation.
The dawn of Big Data
Here is where Big Data made its name. There is less need to conduct costly and time-consuming experiments to collect data when so much of it is immediately available and is being updated in real time. The sheer size of the data collected constantly from a plethora of sources has facilitated the use of logic and data-driven decisions in traditionally qualitative fields. The data itself takes the place of the experiment: hypotheses can be tested and validated against data that has already been collected.
Moreover, one of Big Data's greatest advantages is that it opened the door to exploratory research. Not only can the data answer questions that couldn't previously be tested, it also contains answers to questions that haven't been asked yet. Big Data makes it possible to dive into the data and look for hints of correlations. Virtually every sector of the economy now has access to tons of data, and businesses are accumulating new data at a rate that exceeds their capacity to extract value from it. Here is where Data Scientists truly shine: they are explorers who dig deep and find those pesky correlations that no one saw before. Using statistically significant correlations in business strategy and decision making is now easier than ever.
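To make "statistically significant correlation" a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the ad spend and sales figures are made up, the function names are my own, and the permutation test used here is just one of several ways to check significance. The idea is to measure how strongly two columns move together, then shuffle one of them many times to see how often pure chance produces a correlation that strong.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data correlates at least as strongly
    as the real data. A small value suggests the link is not just chance."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Hypothetical, made-up data: weekly ad spend and weekly sales.
ad_spend = [10, 12, 15, 17, 20, 22, 25, 28, 30, 33]
sales = [101, 108, 112, 121, 124, 133, 139, 141, 152, 158]

r = pearson_r(ad_spend, sales)
p = permutation_p_value(ad_spend, sales)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A correlation close to 1 or -1 with a small p-value is a hint worth investigating, not a proof that one thing causes the other. When many pairs of columns are scanned this way, some will look significant by chance alone, so the threshold should be stricter.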