Hadoop Concepts

What is Hadoop: Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing.


Distributed Reliable File System

  •  Apache Hadoop Distributed File System (HDFS)
  •  Inspired by Google File System
  •  Single Logical View of distributed Linux File Systems

 Data is typically replicated 3 times

  •  Fault Tolerant
  •  Better I/O
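
The effect of replication can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names and node names are hypothetical, and the placement rule is a simple round-robin assumed for illustration (real HDFS uses its own placement policy). It shows the two consequences listed above: a block survives as long as one replica is on a live node, and raw storage grows with the replication factor.

```python
def raw_storage_bytes(logical_bytes: int, replication: int = 3) -> int:
    """Raw cluster capacity used when every block is stored `replication` times."""
    return logical_bytes * replication


def place_replicas(block_id: int, nodes: list[str], replication: int = 3) -> list[str]:
    """Pick `replication` distinct nodes for a block.

    Simplified round-robin placement, assumed for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def block_is_readable(replica_nodes: list[str], failed_nodes: set[str]) -> bool:
    """A block stays readable while at least one replica sits on a live node."""
    return any(node not in failed_nodes for node in replica_nodes)


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
replicas = place_replicas(block_id=0, nodes=nodes)
print(replicas)                                                  # ['node1', 'node2', 'node3']
print(block_is_readable(replicas, {"node1", "node2"}))           # True
print(block_is_readable(replicas, {"node1", "node2", "node3"}))  # False
print(raw_storage_bytes(10 * 1024**3) // 1024**3)                # 30 (GiB raw for 10 GiB of data)
```

With 3 replicas, a block tolerates the loss of any 2 of the nodes holding it. Because any live replica can serve a read, several copies also give readers more than one node to read from.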

Distributed Compute Framework & Resource Manager

  •  Apache MapReduce and YARN
  •  Inspired by Google MapReduce
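
The MapReduce programming model can be illustrated with a word count run entirely in one Python process. This is only a sketch of the map, shuffle, and reduce steps; it does not use the Hadoop API, and all function names are hypothetical. On a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data big clusters", "big data"]))
# {'big': 3, 'data': 2, 'clusters': 1}
```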

HDFS Blocks

Files are broken into chunks (blocks)
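
A minimal sketch of how a file divides into blocks, assuming a block size of 128 MB (the default in recent Hadoop releases; these notes do not state a size). The function name is hypothetical. Note that the last block only holds the remaining bytes rather than being padded to a full block.

```python
def split_into_blocks(file_size: int, block_size: int = 128 * 1024 * 1024) -> list[int]:
    """Return the size in bytes of each block for a file of `file_size` bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # final, partially filled block
    return sizes


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB)
print(len(blocks))                 # 3
print([b // MB for b in blocks])   # [128, 128, 44]
```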

