HDFS Architecture

This article covers the major daemons in Hadoop and how files are stored in HDFS.

Hadoop Daemons explained

HDFS follows a master/slave architecture. The master node is called the NameNode, and the slave nodes are called DataNodes.

  • NameNode - This is the master node, where the metadata about the stored data is kept. Only one active NameNode is allowed. The NameNode handles the metadata side of client read/write requests; the file data itself is read from and written to the DataNodes.

Hadoop 2.x introduced a feature called high availability, which adds a backup NameNode called the standby NameNode. The standby can take over if the active NameNode fails.

  • DataNode - This is the slave node that stores the actual data. A cluster (a set of machines) can have many DataNodes. Files are split into blocks and stored on the DataNodes. DataNodes scale out, i.e. we can add nodes as needed.

image.png

Storage Mechanism:

Files are split into chunks called blocks, 128 MB each by default. To achieve fault tolerance, each block is stored as 3 replicas by default. Both the block size and the replication factor are configurable, through the dfs.blocksize and dfs.replication properties.

If file1 is 300 MB in size, it is split into three blocks: block A (128 MB), block B (128 MB), and block C (44 MB). Each of the 3 blocks is stored as 3 replicas, so file1 ends up as 9 block replicas in HDFS.
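The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation, not HDFS code; the function names split_into_blocks and total_replicas are made up for illustration.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file is split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left over
    return sizes


def total_replicas(file_size_mb, block_size_mb=128, replication=3):
    """Return how many block replicas the cluster stores for one file."""
    return len(split_into_blocks(file_size_mb, block_size_mb)) * replication


print(split_into_blocks(300))  # [128, 128, 44]
print(total_replicas(300))     # 9
```

Note that the last block is not padded to 128 MB; it only takes up the 44 MB that remain.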

The above diagram shows how the file is distributed across 4 DataNodes as 9 block replicas.
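To get a feel for such a layout, here is a toy placement in Python. It assumes a simple round-robin rule and made-up node names (dn1 to dn4); the real HDFS placement policy is rack-aware and more involved, so treat this purely as an illustration of "each replica of a block lands on a different node".

```python
def place_replicas(block_ids, datanodes, replication=3):
    """Assign every block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for start, block in enumerate(block_ids):
        # Shift the starting node for each block so the load spreads out.
        placement[block] = [
            datanodes[(start + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


layout = place_replicas(["A", "B", "C"], ["dn1", "dn2", "dn3", "dn4"])
for block, nodes in layout.items():
    print(block, nodes)
```

Because no two replicas of the same block share a node, losing any single DataNode still leaves two copies of every block.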

Read/Write requests:

To write a file, the client asks the NameNode for available DataNodes. The NameNode returns the available DataNodes to the client. The client writes the first block to the nearest DataNode. That DataNode forms a pipeline with the other two DataNodes and passes the block along it for replication. The client never writes a block 3 times; it writes each block once, and the DataNodes take care of replicating it.
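The write path can be sketched as a small in-memory simulation. Everything here is hypothetical (the classes, the allocate_block and receive methods, and the rule of picking the first three nodes); it is not the HDFS client API, only a model of "the client writes once, the DataNodes forward the block down the pipeline".

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes stored on this node

    def receive(self, block_id, data, downstream):
        """Store the block, then forward it to the next node in the pipeline."""
        self.blocks[block_id] = data
        if downstream:
            downstream[0].receive(block_id, data, downstream[1:])


class NameNode:
    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.block_locations = {}  # metadata only: block id -> node names

    def allocate_block(self, block_id):
        # Assumption: simply pick the first `replication` nodes.
        pipeline = self.datanodes[: self.replication]
        self.block_locations[block_id] = [dn.name for dn in pipeline]
        return pipeline


def client_write(namenode, block_id, data):
    """The client sends the block to the first DataNode only."""
    pipeline = namenode.allocate_block(block_id)
    pipeline[0].receive(block_id, data, pipeline[1:])


nodes = [DataNode(f"dn{i}") for i in range(1, 5)]
namenode = NameNode(nodes)
client_write(namenode, "A", b"first block of file1")
print([dn.name for dn in nodes if "A" in dn.blocks])  # ['dn1', 'dn2', 'dn3']
```

The client makes a single receive call, yet three nodes end up holding the block, and the NameNode only records where the replicas live.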

To read a file from HDFS, the client contacts the NameNode, which holds the metadata (information about the stored data). The NameNode checks the permissions on the file and returns the block locations to the client. The client then reads the data directly from the DataNodes.
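The read path can be modelled the same way. Again, this is a hedged sketch with invented names (get_block_locations, client_read, the "readers" set used as a stand-in for file permissions), not the real HDFS interfaces.

```python
class NameNode:
    def __init__(self):
        # path -> {"readers": set of users, "blocks": [(block id, [node names])]}
        self.files = {}

    def get_block_locations(self, path, user):
        """Check permissions, then return block locations (never the data)."""
        meta = self.files[path]
        if user not in meta["readers"]:
            raise PermissionError(f"{user} may not read {path}")
        return meta["blocks"]


def client_read(namenode, datanodes, path, user):
    """Ask the NameNode where the blocks are, then fetch them from DataNodes."""
    data = b""
    for block_id, locations in namenode.get_block_locations(path, user):
        # Assumption: read from the first listed replica.
        data += datanodes[locations[0]][block_id]
    return data


# DataNodes modelled as plain dicts: node name -> {block id -> bytes}
datanodes = {
    "dn1": {"A": b"hello "},
    "dn2": {"A": b"hello ", "B": b"world"},
    "dn3": {"B": b"world"},
}
namenode = NameNode()
namenode.files["/file1"] = {
    "readers": {"alice"},
    "blocks": [("A", ["dn1", "dn2"]), ("B", ["dn2", "dn3"])],
}
print(client_read(namenode, datanodes, "/file1", "alice"))  # b'hello world'
```

The NameNode object never touches the file contents here, which mirrors the point that data flows directly between the client and the DataNodes.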

The diagrams below show the read/write operations of HDFS.

image.png