Not all of the data being generated in the world can be summed up under the banner of big data, and the term itself means different things to different people. Many think of it simply as a humongous collection of data that needs a lot of storage space. A growing number treat it as a synonym for predictive analytics. And then there are the purists, for whom it is any data set that traditional databases cannot process.


To understand big data more deeply, we need to understand its nuances. Let us look at some types of big data, so that when we need to refer to a particular kind, we know what to call it.

Big data: Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. To simplify, it can be defined by four Vs: high volume, velocity, variety, and veracity. But it’s not the amount of data that’s important; it’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves. It comprises the classic predictive analytics problems, where huge amounts of data are analyzed to gather insight about patterns, trends, etc. This analysis improves the accuracy of decisions.

Fast data: There are some situations where nothing except real-time analytics will work. A good-enough traffic forecast has value right now; the same forecast an hour later is of no use. Most companies have requirements for this kind of data, since analysis in real time can solve a lot of problems. For instance, the Federal Energy Regulatory Commission has estimated (considering the ability of software to monitor safety checks, back-up systems, and historical-to-current patterns) that 188 GW of power plants can be postponed by dynamically balancing energy loads.
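To make the "a forecast is only useful while it is fresh" idea concrete, here is a minimal sketch of a time-bounded window that discards readings once they are too old to matter. The class name `FreshWindow`, the one-hour cutoff, and the sample readings are all assumptions made for illustration; a real streaming system would be considerably more involved.

```python
from collections import deque


class FreshWindow:
    """Keeps only readings newer than max_age_s seconds and averages them.

    Hypothetical helper for illustration; not taken from any real system.
    """

    def __init__(self, max_age_s):
        self.max_age_s = max_age_s
        self._readings = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record a reading; timestamps are assumed to arrive in order."""
        self._readings.append((timestamp, value))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop readings from the front until the oldest one is fresh enough.
        while self._readings and now - self._readings[0][0] > self.max_age_s:
            self._readings.popleft()

    def average(self, now):
        """Average of the readings still fresh at time `now`, or None."""
        self._evict(now)
        if not self._readings:
            return None
        return sum(v for _, v in self._readings) / len(self._readings)


# Assumed example: traffic speed readings (km/h) with timestamps in seconds.
window = FreshWindow(max_age_s=3600)
window.add(0, 40.0)
window.add(1800, 20.0)
print(window.average(now=1800))  # 30.0 (both readings are fresh)
print(window.average(now=4000))  # 20.0 (the reading from t=0 has expired)
print(window.average(now=6000))  # None (nothing fresh is left)
```

The point of the sketch is the eviction step: once a reading passes the age limit it stops influencing the answer, which is the property that separates fast data from data analyzed at leisure.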

Dark data: This is data that you can obtain but that isn’t easily accessible: for example, photographs, handwritten notes, comments, feedback forms, video streams, and ingress-egress data from security checkposts. Like various other categories of data, it isn’t easy to tap into, not just because there’s a ton of it out there, but because it takes a huge amount of computing and data entry to find relevant material. Another problem is insufficient infrastructure; for instance, out of the 245 million surveillance cameras installed worldwide, only 2% are HD capable and only 20% are networked.

Lost data: This data is not technically lost; it is informational or operational data generated by manufacturing equipment, industrial machinery, and similar systems found within commercial establishments. Often, such data is landlocked in operational systems. For instance, McKinsey & Co. says that there are approximately 30,000 sensors on an oil rig, of which only 1% provide data that is used for decision making.

New data: This is data that you want to have but are not harvesting. TaKaDu, a startup in Israel, uses software that applies mathematical algorithms to detect and prevent leaks in water pipelines. For a desert nation like Israel, every drop of water is precious, and algorithms that save water are hard to beat. Another company, Enlighted, wants to use the motion sensors present in LED fixtures to analyze traffic movements. Behavioral patterns and location data from smartphone sensors are being used by Ginger.io to monitor patients with bipolar disorder who live in remote places, so that care can be provided more efficiently.
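The leak-detection idea mentioned above can be illustrated with a very simple statistical check. This is only a sketch under stated assumptions, not TaKaDu's actual algorithm, which is not described here: it assumes we have a baseline of normal flow readings and flags any recent reading that sits more than a chosen number of standard deviations away from the baseline mean. The function name `flag_anomalies`, the threshold of 3.0, and the sample figures are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, recent, threshold=3.0):
    """Return indexes of `recent` readings that deviate strongly from baseline.

    Illustrative z-score check; assumes `baseline` has at least two readings.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as unusual.
        return [i for i, x in enumerate(recent) if x != mu]
    return [i for i, x in enumerate(recent) if abs(x - mu) / sigma > threshold]


# Assumed example: night-time flow in litres per second for one pipe section.
baseline_flow = [10.0, 10.5, 9.5, 10.2, 9.8]
recent_flow = [10.1, 9.9, 14.0]
print(flag_anomalies(baseline_flow, recent_flow))  # [2]
```

Here only the third recent reading is flagged, because 14.0 is far outside the spread of the baseline. A production system would also have to account for seasonality, sensor faults, and legitimate changes in demand.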

Are there more categories that you have come across? Share in the comments section.