Hadoop is an open-source framework written in Java that allows users to store terabytes or even petabytes of Big Data – both structured and unstructured – across a cluster of computers. Its unique storage mechanism, the Hadoop Distributed File System (HDFS), maps data across every part of a cluster.
This is the module, MapReduce, which offers a key selling point of Hadoop, as it ensures scalability. When data is received by Hadoop, it is processed over three different stages: map, shuffle and reduce.
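The three stages can be sketched on a single machine with a classic word count, which is a common way to illustrate the pattern. This is an illustrative simulation in plain Java, not Hadoop's actual API; the class and method names are assumptions for the sketch, and real Hadoop runs the same stages distributed across the cluster.

```java
import java.util.*;

// A minimal, single-machine sketch of the three MapReduce stages
// (map, shuffle, reduce), using word count as the example job.
// Names here are illustrative, not part of Hadoop's API.
public class MapReduceSketch {

    // Map stage: each input line is turned into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle stage: pairs are grouped by key, so each reducer
    // sees all the values recorded for one word.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce stage: the values for each word are summed into a count.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
            counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data big cluster", "data data");
        System.out.println(wordCount(lines)); // {big=2, cluster=1, data=3}
    }
}
```

In Hadoop itself, the map and reduce functions run in parallel on many nodes, and the shuffle moves intermediate pairs between them over the network; the logic, however, is exactly this pipeline.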
This is the name given to the storage system used by Hadoop. It utilises a master/slave set-up, in which one primary machine coordinates a large number of other machines, making it possible to access big data quickly across the Hadoop cluster. Files are divided into separate blocks, which are stored at speed on multiple nodes within the cluster.
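The storage scheme described above can be sketched as follows. This is a toy simulation of how a file is split into blocks and each block replicated on several nodes, with a master-style map recording where every copy lives; the block size, node names and round-robin placement are assumptions made for illustration (real HDFS defaults to much larger blocks, typically 128 MB, with a replication factor of 3 and rack-aware placement).

```java
import java.util.*;

// A toy sketch of HDFS-style storage: a file is split into
// fixed-size blocks, and each block is replicated on several
// nodes. The returned map plays the role of the master's record
// of where each replica lives. Values are illustrative only.
public class HdfsPlacementSketch {
    static final int BLOCK_SIZE = 4;   // bytes per block (toy value)
    static final int REPLICATION = 3;  // copies kept of each block

    // Returns, for each block index, the nodes holding a replica.
    static Map<Integer, List<String>> placeBlocks(int fileSize, List<String> nodes) {
        int blockCount = (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
        Map<Integer, List<String>> blockMap = new LinkedHashMap<>();
        for (int b = 0; b < blockCount; b++) {
            List<String> replicas = new ArrayList<>();
            // Round-robin placement onto distinct nodes.
            for (int r = 0; r < REPLICATION; r++) {
                replicas.add(nodes.get((b + r) % nodes.size()));
            }
            blockMap.put(b, replicas);
        }
        return blockMap;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        // A 10-byte "file" becomes 3 blocks, each copied to 3 of the 4 nodes.
        placeBlocks(10, nodes).forEach((block, replicas) ->
            System.out.println("block " + block + " -> " + replicas));
    }
}
```

Because every block exists on several nodes, reads can be served in parallel from many machines, and the loss of any single node does not lose data.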
The structure of Hadoop means that it can scale horizontally, unlike traditional relational databases, because data can be spread across anything from a single server to a cluster of hundreds.
Faster data processing is made possible by Hadoop's distributed file system and its parallel MapReduce processing.
Hadoop can generate value from both your structured and unstructured data, drawing useful insights from sources such as social media, daily logs and emails.
Hadoop replicates the data it stores across different servers in multiple locations, which increases reliability.
Utilising Hadoop makes it simple to store, manage and process large data sets, bringing effective data analysis in-house.