Configure Hadoop and start cluster services using Ansible Playbook

Let’s start!

To perform this task, we first have to configure the Ansible configuration file (ansible.cfg) and the inventory file:

vim /etc/ansible/ansible.cfg

Then we add the IP addresses and login credentials of the master and slave nodes to the inventory file. The inventory setting in ansible.cfg should point to this file:

vim /root/ip-hadoop.txt


Now we can check connectivity to the managed nodes with Ansible's ping module, for example by running ansible all -m ping.

Now let's start writing the Ansible playbook:

vim hadoop.yml

First, we write the tasks that create a directory on the managed nodes and copy the Hadoop and JDK software (rpm files) into it.
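As a rough sketch, this first play could look something like the following. The host pattern, the directory name, and the rpm file names are placeholders I made up for illustration, so replace them with your own values.

```yaml
# Sketch only: the directory, rpm file names, and host pattern are assumptions.
- hosts: all
  vars:
    software_dir: /root/hadoop-software    # hypothetical directory on the managed nodes
    packages:                              # hypothetical rpm file names
      - jdk.rpm
      - hadoop.rpm
  tasks:
    - name: Create a directory for the installers
      file:
        path: "{{ software_dir }}"
        state: directory

    - name: Copy the JDK and Hadoop rpm files from the control node
      copy:
        src: "/root/{{ item }}"            # assumed location on the control node
        dest: "{{ software_dir }}/{{ item }}"
      loop: "{{ packages }}"
```

Both modules are idempotent, so running the play a second time leaves an existing directory and unchanged files alone.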

To run the playbook, use the ansible-playbook command. Add -v (or -vv, -vvv, -vvvv) for increasingly verbose output:

ansible-playbook -v <file_name>

In our case:

ansible-playbook hadoop.yml

The files were copied successfully to both nodes.

Now we will install the JDK and Hadoop on both nodes.
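One possible way to write the installation tasks is with the yum module, which accepts a path to a local rpm file. The paths below are hypothetical placeholders, and your packages may need different handling.

```yaml
# Sketch only: paths and file names are placeholders, not the actual values.
- hosts: all
  tasks:
    - name: Install the JDK first, because Hadoop needs Java to run
      yum:
        name: /root/hadoop-software/jdk.rpm       # hypothetical path
        state: present
        disable_gpg_check: true                   # assumes the local rpm has no imported signing key

    - name: Install Hadoop
      yum:
        name: /root/hadoop-software/hadoop.rpm    # hypothetical path
        state: present
        disable_gpg_check: true
```

Because the yum module checks whether the package is already present, a repeated run reports ok rather than changed.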

In my case both packages were already installed, which is why the output shows “ok” instead of “changed”.

Master Configuration

Step 1: First, create the directory that the namenode will use for its metadata. Its path is referenced in the hdfs-site.xml file.
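A minimal sketch of this step, assuming an inventory group called master and a directory named /nn (both are my own placeholder names):

```yaml
# Sketch only: the group name and the path are assumptions.
- hosts: master              # assumed inventory group for the namenode host
  tasks:
    - name: Create the namenode metadata directory
      file:
        path: /nn            # hypothetical path; must match the value in hdfs-site.xml
        state: directory
```

Whatever path you choose, it has to be the same one given in hdfs-site.xml (the dfs.name.dir property in Hadoop 1.x, dfs.namenode.name.dir in later releases).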

Step 2: Now we have to configure Hadoop's core-site.xml and hdfs-site.xml files. I prepared these files in a local directory beforehand, and here I am copying them to the /etc/hadoop/ directory.

You can download the code from here: click here
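The copy could be sketched roughly like this. The local master/ directory and the group name are assumptions for illustration.

```yaml
# Sketch only: the source directory and the group name are assumptions.
- hosts: master
  tasks:
    - name: Copy the prepared master configuration files into /etc/hadoop/
      copy:
        src: "master/{{ item }}"        # hypothetical directory next to the playbook
        dest: "/etc/hadoop/{{ item }}"
      loop:
        - core-site.xml
        - hdfs-site.xml
```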

Step 3: Format the namenode
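Formatting wipes the namenode metadata, so it should happen only once. One way to guard it is shown below, assuming the /nn directory from my earlier placeholder and the hadoop command being on the PATH.

```yaml
# Sketch only: the group name and the /nn path are assumptions.
- hosts: master
  tasks:
    - name: Format the namenode once
      # "echo Y" answers the confirmation prompt that some Hadoop versions
      # show when the metadata directory already exists.
      shell: echo Y | hadoop namenode -format
      args:
        creates: /nn/current/VERSION    # skip the task when this file exists
```

The creates argument makes Ansible skip the command when the file is already there, which is the case after a successful format.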

Step 4: Start the namenode service
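Starting the daemon can be sketched with the command module. I am assuming hadoop-daemon.sh is on the PATH, and that it prints a "running as process" message and exits with an error when the daemon is already up, as the releases I have seen do.

```yaml
# Sketch only: the group name and the message check are assumptions.
- hosts: master
  tasks:
    - name: Start the namenode daemon
      command: hadoop-daemon.sh start namenode
      register: nn_start
      # Treat "already running" as nothing to do instead of a failure.
      failed_when: nn_start.rc != 0 and 'running as process' not in nn_start.stdout
      changed_when: nn_start.rc == 0
```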

Step 5: Check whether the service is running
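One way to do this check from the playbook is to run jps, which lists the running Java processes. This assumes jps from the JDK is on the PATH of the managed node.

```yaml
# Sketch only: the group name is an assumption.
- hosts: master
  tasks:
    - name: List the running Java processes
      command: jps
      register: jps_out
      changed_when: false     # a read-only check never changes the node

    - name: Show the process list
      debug:
        var: jps_out.stdout_lines

    - name: Fail if no NameNode process is listed
      assert:
        # The leading space avoids matching "SecondaryNameNode".
        that: "' NameNode' in jps_out.stdout"
```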


Slave Configuration

Step 1: First, create the directory that the datanode will use to store its data blocks. Its path is referenced in the hdfs-site.xml file.
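A minimal sketch for the slave side, assuming an inventory group called slave and a directory named /dn (placeholder names of my own):

```yaml
# Sketch only: the group name and the path are assumptions.
- hosts: slave               # assumed inventory group for the datanode hosts
  tasks:
    - name: Create the datanode storage directory
      file:
        path: /dn            # hypothetical path; must match the value in hdfs-site.xml
        state: directory
```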

Step 2: Now we have to configure Hadoop's core-site.xml and hdfs-site.xml files. I prepared these files in a local directory beforehand, and here I am copying them to the /etc/hadoop/ directory.

You can download the code from here: click here
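For the slave nodes the copy might look like this. The slave/ source directory is a hypothetical name, kept separate from the master files because the two roles use different hdfs-site.xml contents.

```yaml
# Sketch only: the source directory and the group name are assumptions.
- hosts: slave
  tasks:
    - name: Copy the prepared slave configuration files into /etc/hadoop/
      copy:
        src: "slave/{{ item }}"         # hypothetical directory next to the playbook
        dest: "/etc/hadoop/{{ item }}"
      loop:
        - core-site.xml
        - hdfs-site.xml
```

The core-site.xml on a slave has to point to the master's address so the datanode knows which namenode to register with.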

Step 3: Start the datanode service

Step 4: Check whether the service is running
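Steps 3 and 4 can be sketched together in one play. As before, I am assuming hadoop-daemon.sh and jps are on the PATH and that the group is called slave.

```yaml
# Sketch only: the group name and the message check are assumptions.
- hosts: slave
  tasks:
    - name: Start the datanode daemon
      command: hadoop-daemon.sh start datanode
      register: dn_start
      # Treat "already running" as nothing to do instead of a failure.
      failed_when: dn_start.rc != 0 and 'running as process' not in dn_start.stdout
      changed_when: dn_start.rc == 0

    - name: List the running Java processes
      command: jps
      register: jps_out
      changed_when: false

    - name: Fail if no DataNode process is listed
      assert:
        that: "' DataNode' in jps_out.stdout"
```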

We can check whether the cluster has started using the hadoop dfsadmin -report command.
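If you prefer to run the report from the playbook instead of logging in to a node, a sketch could look like this (the master group name is an assumption):

```yaml
# Sketch only: the group name is an assumption.
- hosts: master
  tasks:
    - name: Ask the namenode for a cluster report
      command: hadoop dfsadmin -report
      register: report
      changed_when: false     # the report only reads cluster state

    - name: Print the report
      debug:
        var: report.stdout_lines
```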


If you have trouble connecting to the server, the firewall may be the cause. Try again after stopping the firewall with the command: systemctl stop firewalld
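The same thing can be done from the playbook with the service module. This is only a sketch and assumes firewalld is installed on the nodes.

```yaml
# Sketch only: assumes firewalld is installed on every managed node.
- hosts: all
  tasks:
    - name: Stop firewalld so the Hadoop ports are reachable
      service:
        name: firewalld
        state: stopped
```

Like systemctl stop, this lasts only until the service is started again or the node reboots.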

We can also check from any node (master or slave) whether the cluster is ready to use and whether the datanode is sharing its storage, using the command:

hadoop dfsadmin -report

Done! That’s all

I hope it was helpful

For code: https://github.com/pkpathak143/hadoop-config-ansible-playbook

Thank you!

ARTH Learner