Elasticity Task — LVM on Hadoop

Pratyush Pathak
4 min read · Mar 16, 2021


Task 7.1: Elasticity Task

A. Integrating LVM with Hadoop and providing Elasticity to DataNode Storage.

So let's start!!

For this, we will first check the configuration files of the Hadoop NameNode and DataNode, i.e., the core-site.xml file and the hdfs-site.xml file.

1. For NameNode:

vim /etc/hadoop/core-site.xml
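Inside it, you set the address of the NameNode. A minimal sketch of that entry, assuming a Hadoop 1.x-style setup (the property name fs.default.name and port 9001 are assumptions; newer Hadoop versions use fs.defaultFS, and your port may differ):

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.43.97:9001</value>
    </property>
</configuration>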

My NameNode IP is 192.168.43.97, which I have updated here. Now, in the hdfs-site.xml file:

vim /etc/hadoop/hdfs-site.xml

Here I have first created a directory /nn and then allocated it to HDFS.
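A minimal sketch of what that entry could look like (dfs.name.dir is the Hadoop 1.x property name; newer releases call it dfs.namenode.name.dir):

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/nn</value>
    </property>
</configuration>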

Now we have to start the service:

hadoop-daemon.sh start namenode

You can confirm whether the service is running using the jps command.

2. For DataNode:

vim /etc/hadoop/core-site.xml

Here also, you have to provide the NameNode IP.
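It would look much the same as on the NameNode side, pointing at 192.168.43.97 (again, the property name and port are assumptions based on a Hadoop 1.x setup):

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.43.97:9001</value>
    </property>
</configuration>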

vim /etc/hadoop/hdfs-site.xml

Here also, I have first created a directory /dn1 and then allocated it to HDFS.
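And a sketch of the DataNode's hdfs-site.xml entry (dfs.data.dir is the Hadoop 1.x property; newer versions call it dfs.datanode.data.dir):

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/dn1</value>
    </property>
</configuration>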

Now we have to start the service:

hadoop-daemon.sh start datanode

You can check whether the service is running using the jps command.

Now you can check the report of the HDFS cluster:

hadoop dfsadmin -report

For now, only 1 DataNode is connected, and it is sharing around 50 GiB of storage.

And here our task is to provide elasticity to this storage shared by the DataNode, with the help of LVM.

To accomplish this we have to follow these steps:

  • Step 1: Add volume to the DataNode
  • Step 2: Create a physical volume
  • Step 3: Create a volume group
  • Step 4: Create a logical volume
  • Step 5: Format and mount the volume to the DataNode

So let's do it step by step.

Step 1:

To add volume to the DataNode, I have created 2 virtual disks (of size 10 GiB and 20 GiB), Slave-1_1.vdi and Slave-1_2.vdi, and attached them to the DataNode VM.
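If you want to script this part, here is a rough sketch of how the two disks could be created and attached with VBoxManage (the VM name Slave-1 and the controller name SATA are assumptions; match them to your own VirtualBox setup):

# create the two virtual disks (sizes are in MB)
VBoxManage createmedium disk --filename Slave-1_1.vdi --size 10240
VBoxManage createmedium disk --filename Slave-1_2.vdi --size 20480

# attach them to the DataNode VM on free SATA ports
VBoxManage storageattach "Slave-1" --storagectl "SATA" --port 1 --device 0 --type hdd --medium Slave-1_1.vdi
VBoxManage storageattach "Slave-1" --storagectl "SATA" --port 2 --device 0 --type hdd --medium Slave-1_2.vdi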

Now we can check whether the volumes of 10 GiB and 20 GiB are attached using the command:

fdisk -l

Step 2:

Now we will create physical volumes from both disks, /dev/sdb and /dev/sdc:

pvcreate /dev/sdb
pvcreate /dev/sdc

Now we can confirm whether they have been created by displaying the physical volumes:

pvdisplay

Step 3:

Now we will create a volume group containing both physical volumes, /dev/sdb and /dev/sdc. Here I'm naming the volume group test, and then we can display it using the vgdisplay command:

vgcreate test /dev/sdb /dev/sdc
vgdisplay

It shows that one volume group of size 30 GiB (10 GiB + 20 GiB) has been created.

Step 4:

Now we will create a logical volume of size 15 GiB from the volume group test, which is of size 30 GiB. Here I'm naming the logical volume lv1, and then we can display it using the lvdisplay command:

lvcreate --size 15G --name lv1 test
lvdisplay

Step 5:

Now we will format the logical volume before mounting it:

mkfs.ext4 /dev/test/lv1

It's formatted. Now we can mount it to the DataNode directory /dn1, and then we can check whether it is mounted using the df -lh command:

mount /dev/test/lv1  /dn1

df -lh

Almost done! Now, to make this storage available to the cluster, we have to restart the DataNode:

hadoop-daemon.sh stop datanode

hadoop-daemon.sh start datanode

jps

Now, to check the storage shared by the DataNode in the cluster, we can see the report:

hadoop dfsadmin -report
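This is where the elasticity comes in: since /dn1 now sits on a logical volume backed by the volume group test, the storage shared by the DataNode can later be grown online without unmounting anything. A minimal sketch of how that could look (the +5G is just an illustrative amount, and it assumes the volume group still has free space):

# extend the logical volume by 5 GiB using free space in the volume group test
lvextend --size +5G /dev/test/lv1

# grow the ext4 filesystem to fill the enlarged volume (works while mounted)
resize2fs /dev/test/lv1

After this, running hadoop dfsadmin -report again should reflect the increased capacity.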

Done!!

Thank you!! :)
