Configuring YARN High Availability using Cloudera Manager

This is the second article in the High Availability series. In the last post we configured High Availability for HDFS, and this time we will complete a fully highly available Hadoop cluster by configuring YARN for High Availability. This basically means configuring a standby ResourceManager, and it can be done very easily using Cloudera Manager.
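Under the hood, enabling HA comes down to a handful of yarn-site.xml properties that Cloudera Manager manages for you. The sketch below is illustrative, not a dump of this cluster's config: the logical IDs rm1/rm2, the second hostname, the cluster ID, and the ZooKeeper address are assumptions; cloudera6.lan is the active ResourceManager from this cluster.

```xml
<!-- Illustrative yarn-site.xml fragment for ResourceManager HA.
     rm1/rm2, the second hostname, clusterRM and the ZK address are assumed values. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>cloudera6.lan</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>cloudera5.lan</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>clusterRM</value>
</property>
<property>
  <!-- Lets the new active RM recover running applications after failover. -->
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>cloudera1.lan:2181</value>
</property>
```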

First, open the YARN service, then from the “Actions” button choose “Enable High Availability”:


Then select the additional host that will run the second ResourceManager:


Select from the list of nodes in the cluster:


Now Cloudera Manager will do some processing:


When it completes, the cluster has YARN high availability enabled and can sustain the loss of one ResourceManager.

To test it, I first checked which ResourceManager was active.

Go to YARN -> Instances and see which ResourceManager is the standby one and which is active:
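You can get the same information from the command line with yarn rmadmin. The RM IDs rm1/rm2 below are assumed; use whatever IDs appear in yarn.resourcemanager.ha.rm-ids on your cluster:

```shell
# Ask each ResourceManager for its HA state; each prints "active" or "standby".
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```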


cloudera6 was the active ResourceManager, so I ran a Hive select and shut down cloudera6 while the query was running. The Hive job paused when the failover happened, but then completed successfully without any unusual messages. There is not much to show in this case, because the job ran and completed like any other job:

hive> select avg(value) from sampledata_1;
Query ID = hive_20160807092828_6ad25b36-dfb6-4602-b0c9-fa22e6bb823c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1470549931568_0001, Tracking URL = http://cloudera6.lan:8088/proxy/application_1470549931568_0001/
Kill Command = /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/bin/hadoop job -kill job_1470549931568_0001
Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1
2016-08-07 09:29:54,095 Stage-1 map = 0%, reduce = 0%
2016-08-07 09:30:22,896 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 7.55 sec
2016-08-07 09:30:26,588 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 15.0 sec
2016-08-07 09:30:30,243 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 21.73 sec
2016-08-07 09:30:45,689 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 29.11 sec
2016-08-07 09:30:49,243 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 36.69 sec
2016-08-07 09:30:51,514 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 39.5 sec
2016-08-07 09:30:52,711 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 39.5 sec
2016-08-07 09:31:03,718 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 43.38 sec
MapReduce Total cumulative CPU time: 43 seconds 380 msec
Ended Job = job_1470549931568_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 5 Reduce: 1 Cumulative CPU: 43.38 sec HDFS Read: 1048090552 HDFS Write: 13 SUCCESS
Total MapReduce CPU Time Spent: 43 seconds 380 msec
Time taken: 163.259 seconds, Fetched: 1 row(s)
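Another way to confirm which ResourceManager is serving requests is the RM REST API: each ResourceManager reports its own HA state at /ws/v1/cluster/info. The hostname and port below come from the tracking URL in the job output above; the haState field is part of the Hadoop 2.x ResourceManager REST API:

```shell
# Query the cluster info endpoint and pull out the haState field
# (reports ACTIVE or STANDBY for the RM you queried).
curl -s http://cloudera6.lan:8088/ws/v1/cluster/info | grep -o '"haState":"[A-Z]*"'
```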

I can force a failover without shutting down the entire server: click the active ResourceManager, then, on the ResourceManager page, open the Actions menu and choose to stop this ResourceManager:

After a few seconds you can see that the standby ResourceManager has taken over:


We can now start the ResourceManager we stopped earlier, and it will take the standby role.
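A failover can also be triggered from the CLI. When automatic failover is enabled (as it is with Cloudera Manager's HA setup), yarn rmadmin refuses manual transitions unless you pass --forcemanual. A sketch, again assuming the logical ID rm2 for the standby:

```shell
# Force the standby (rm2) to become active.
# --forcemanual is required when automatic failover is enabled.
yarn rmadmin -transitionToActive --forcemanual rm2
```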

