This is the second article in the High Availability series. In the previous post we configured High Availability for HDFS, and this time we will complete a fully highly available Hadoop cluster by configuring YARN for High Availability. This basically means configuring a standby ResourceManager, which can be done very easily using Cloudera Manager.
First, open the YARN service, then from the “Actions” button choose “Enable High Availability”:
Then select the additional server which will host the second ResourceManager:
Select from the list of nodes in the cluster:
Cloudera Manager will then apply the configuration changes and restart the affected services:
When it completes, the cluster has YARN high availability and can survive the loss of one ResourceManager.
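Under the hood, the wizard sets the standard YARN HA properties for you. Cloudera Manager manages this configuration itself, so the fragment below is only a sketch of what the equivalent hand-written yarn-site.xml would look like; the rm1/rm2 ids and the second hostname are illustrative placeholders, not values taken from this cluster:

```xml
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>cloudera6.lan</value>
</property>
<property>
  <!-- hypothetical second host chosen in the wizard -->
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>second-host.lan</value>
</property>
<property>
  <!-- RM state is kept in ZooKeeper so a failover can recover running jobs -->
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```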
I wanted to test it, so I first found out which ResourceManager is the active one.
Go to YARN -> instances and see which ResourceManager is the standby one and which is active:
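Besides the Cloudera Manager UI, each ResourceManager also reports its HA state through the YARN REST API at `/ws/v1/cluster/info` (the `haState` field). A minimal sketch of extracting that field, where the sample payload is illustrative and abridged rather than captured from this cluster:

```python
import json

def ha_state(cluster_info_json: str) -> str:
    # The RM REST endpoint /ws/v1/cluster/info returns a JSON document
    # whose clusterInfo.haState field is "ACTIVE" or "STANDBY".
    return json.loads(cluster_info_json)["clusterInfo"]["haState"]

# Illustrative (abridged) payload, e.g. fetched with:
#   curl http://cloudera6.lan:8088/ws/v1/cluster/info
sample = '{"clusterInfo": {"haState": "ACTIVE", "rmStateStoreName": "ZKRMStateStore"}}'
print(ha_state(sample))  # ACTIVE
```

Querying each ResourceManager's port 8088 this way tells you which one is currently active without opening the UI.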
cloudera6 is the active ResourceManager. I then ran a Hive SELECT and shut down cloudera6 while the query was running. The Hive job paused when the failover happened, but then completed successfully without any unusual messages. There is not much to show in this case, because the job ran and completed like any other job:
hive> select avg(value) from sampledata_1;
Query ID = hive_20160807092828_6ad25b36-dfb6-4602-b0c9-fa22e6bb823c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1470549931568_0001, Tracking URL = http://cloudera6.lan:8088/proxy/application_1470549931568_0001/
Kill Command = /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/bin/hadoop job -kill job_1470549931568_0001
Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1
2016-08-07 09:29:54,095 Stage-1 map = 0%, reduce = 0%
2016-08-07 09:30:22,896 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 7.55 sec
2016-08-07 09:30:26,588 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 15.0 sec
2016-08-07 09:30:30,243 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 21.73 sec
2016-08-07 09:30:45,689 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 29.11 sec
2016-08-07 09:30:49,243 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 36.69 sec
2016-08-07 09:30:51,514 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 39.5 sec
2016-08-07 09:30:52,711 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 39.5 sec
2016-08-07 09:31:03,718 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 43.38 sec
MapReduce Total cumulative CPU time: 43 seconds 380 msec
Ended Job = job_1470549931568_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 5  Reduce: 1  Cumulative CPU: 43.38 sec  HDFS Read: 1048090552  HDFS Write: 13  SUCCESS
Total MapReduce CPU Time Spent: 43 seconds 380 msec
OK
16384.885447
Time taken: 163.259 seconds, Fetched: 1 row(s)
I can also force a failover without shutting down the entire server: click the active ResourceManager, then on the ResourceManager page open the actions menu and choose to shut down this ResourceManager:
After a few seconds you can see that the standby ResourceManager has taken over:
We can now start the ResourceManager we stopped earlier, and it will take the standby role.