Big data technologies and practices are moving quickly. Here’s what one should know, according to Mitchell (2013), to stay ahead of the game:
a) Big data analytics in the cloud
This allows users to access highly scalable computing and storage resources through the Internet. It allows companies to obtain server capacity as needed and expand it rapidly to the enormous scale required to process big datasets and execute complicated mathematical models. Cloud computing reduces the cost of data storage because the resources are shared among many users, who pay only for the capacity they actually utilize. Companies can access this capability much more quickly, without the expense and time needed to set up their own systems, and they do not have to purchase enough capacity to accommodate peak usage.
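The pay-only-for-usage argument above can be made concrete with a toy cost comparison. The prices, usage figures, and function names below are invented for illustration; they are not real cloud rates.

```python
# Hypothetical illustration of pay-per-use cloud pricing vs. owning enough
# hardware to cover peak load. All numbers are made up for the example.

def on_demand_cost(hours_used, price_per_hour):
    """Pay only for the capacity actually consumed."""
    return hours_used * price_per_hour

def owned_capacity_cost(peak_hours, price_per_hour, upfront):
    """Own enough hardware for peak usage, whether it is busy or idle."""
    return upfront + peak_hours * price_per_hour

# A workload that typically needs 100 server-hours per month,
# but would require provisioning for a 1000-hour peak if self-hosted.
cloud = on_demand_cost(hours_used=100, price_per_hour=0.50)
owned = owned_capacity_cost(peak_hours=1000, price_per_hour=0.10, upfront=2000)
print(f"cloud: ${cloud:.2f}/month, owned: ${owned:.2f} first month")
# cloud: $50.00/month, owned: $2100.00 first month
```

The point is structural, not the specific numbers: the cloud bill tracks actual usage, while owned capacity must be sized for the peak.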
b) Hadoop: The new enterprise data operating system
Hadoop is by far the most popular open-source implementation of MapReduce, a programming model for processing Big Data. Because it is flexible, it works with multiple data sources: it can aggregate multiple sources of data for large-scale processing, or read data from a database to run processor-intensive machine learning jobs. It has many diverse applications, but one of its top uses is handling large volumes of constantly changing data. Such data may be web-based or social media data, location-based data from weather or traffic sensors, or machine-to-machine transactional data.
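The MapReduce model that Hadoop implements at cluster scale can be sketched on a single machine: a map step emits (key, value) pairs, and a reduce step aggregates all values that share a key. This is a minimal word-count sketch, not Hadoop's actual API.

```python
# Single-machine sketch of the MapReduce model: map emits (key, value)
# pairs, reduce aggregates all values sharing a key. In Hadoop, the same
# two phases run distributed across a cluster with a shuffle in between.
from collections import defaultdict

def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)          # emit (word, 1) per occurrence

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:                    # grouping by key stands in for the shuffle
        counts[word] += n
    return dict(counts)

docs = ["Big data moves fast", "big data scales"]
print(reduce_phase(map_phase(docs)))
# {'big': 2, 'data': 2, 'moves': 1, 'fast': 1, 'scales': 1}
```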
c) Big data lakes
Traditional database theory dictates that you design the schema before entering any data. A data lake, also known as an enterprise data lake or enterprise data hub, turns that model on its head: data is loaded in its raw form, and structure is applied only when the data is read. It offers tools for people to analyze the data, along with a high-level definition of what data exists in the lake.
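The schema-on-read idea behind data lakes can be sketched with a toy example: records of different shapes land in the lake as-is, and structure is imposed only when an analyst queries them. The field names below are invented for illustration.

```python
# Toy schema-on-read sketch: raw records of different shapes are stored
# untouched; a query decides at read time which fields it cares about.
import json

lake = [
    json.dumps({"user": "ana", "clicks": 3}),
    json.dumps({"sensor": "t-101", "temp_c": 21.5}),   # different shape, still accepted
    json.dumps({"user": "bo", "clicks": 7, "country": "DE"}),
]

def query_clicks(raw_records):
    """Schema applied at read time: use only records this query understands."""
    rows = [json.loads(r) for r in raw_records]
    return {row["user"]: row["clicks"] for row in rows if "user" in row}

print(query_clicks(lake))
# {'ana': 3, 'bo': 7}
```

A traditional database would have rejected the sensor record at load time; the lake keeps it for whatever query understands it later.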
d) More predictive analytics
Predictive analytics is the branch of data mining concerned with the prediction of future probabilities and trends. The central element of predictive analytics is the predictor: a variable that can be measured for an individual or other unit to predict future behavior.
With big data, analysts have not only more data to work with, but also the processing power to handle great numbers of records with many attributes.
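A "predictor" in the sense used above can be sketched with the simplest possible model: one measurable variable fitted against an outcome by ordinary least squares, then used to forecast. The data points and names below are invented for the example.

```python
# Minimal sketch of a single-predictor model: fit y ≈ a*x + b by ordinary
# least squares, then predict a future value. Data is invented for illustration.

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ≈ a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Predictor: months as a customer; outcome: purchases made.
months = [1, 2, 3, 4, 5]
purchases = [2, 4, 6, 8, 10]
a, b = fit_line(months, purchases)
print(f"predicted purchases at month 6: {a * 6 + b:.1f}")
# predicted purchases at month 6: 12.0
```

With big data, the same idea scales from one predictor to many attributes per record, which is where the extra processing power matters.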
e) In-memory analytics
It works by increasing the speed, reliability and performance of data queries. Business Intelligence deployments are typically disk-based; that is, the application queries data stored on physical disks. With in-memory analytics, by contrast, the queries and data reside in the server's random access memory (RAM).
The use of in-memory databases to speed up analytic processing is increasingly popular and highly valuable in the right setting. Many web application development companies use in-memory analytics to attain greater reliability and performance.
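The disk-based vs. in-memory contrast can be illustrated with SQLite, which supports both modes: the same query can run against a file-backed database or one held entirely in RAM via the `":memory:"` path. The table and figures are invented for the example.

```python
# The same analytic query against a SQLite database; passing ":memory:"
# keeps both the data and the query processing in RAM, avoiding disk I/O.
import sqlite3

def run_query(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("north", 120.0), ("south", 80.0), ("north", 50.0)])
    total = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = 'north'").fetchone()[0]
    conn.close()
    return total

print(run_query(":memory:"))   # in-memory: no physical disk involved
# 170.0
```

Passing a file path instead of `":memory:"` gives the conventional disk-based behavior; real in-memory analytics engines apply the same idea at much larger scale.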
f) More, better NoSQL
Alternatives to traditional SQL-based relational databases, termed NoSQL (short for “Not Only SQL”) databases, are rapidly gaining importance as tools for use in specific kinds of analytic applications.
According to Wikipedia, a working definition of NoSQL is as follows:
“A NoSQL (originally referring to ‘non-SQL’ or ‘non-relational’) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.”
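The non-tabular modeling in that definition can be sketched with a toy document store: each record is a free-form document keyed by an id, with no fixed columns. The class and method names below are invented for illustration and are not the API of any real NoSQL database.

```python
# Toy document store: records are free-form dicts rather than rows in a
# fixed-schema table, echoing the document model used by many NoSQL systems.

class TinyDocStore:
    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        """Store a document under an id; no schema is enforced."""
        self._docs[doc_id] = doc

    def find(self, **criteria):
        """Return documents whose fields match all given criteria."""
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

db = TinyDocStore()
db.put("u1", {"name": "Ana", "city": "Lagos", "tags": ["admin"]})
db.put("u2", {"name": "Bo", "city": "Lagos"})   # missing 'tags': no schema to violate
print(db.find(city="Lagos"))
```

In a relational table the second record would need a NULL in a predeclared column; here the documents simply have different shapes.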