Configuring CDH 5.10 with MIT Kerberos

Hadoop’s basic authentication simply trusts the OS user name reported by the client, with no password verification at all. OS users related to services such as hdfs, hive or yarn can access almost any data and perform any action related to their role. A more secure way is to use external authentication such as an LDAP or Kerberos server.
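To illustrate the problem, here is a hypothetical session on an unsecured cluster: any shell user can claim to be the hdfs superuser just by setting an environment variable (the path is made up for the example):

export HADOOP_USER_NAME=hdfs
hdfs dfs -rm -r /data/private    # executed as "hdfs", no password asked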

This post will demonstrate configuring a Cloudera cluster with an MIT Kerberos server.

Installing and configuring Kerberos server

One option for installing Kerberos is to download the sources offered by MIT here, and follow the instructions to compile, install and configure it.

I prefer to install the packages using Yum (I’m using CentOS), since it handles the creation of services and configuration files for you. So the following procedure will describe installation using Yum as shown here.

First, we have to install Kerberos packages, along with some utility packages:

yum install autoconf cc gcc bison byacc krb5-server krb5-libs krb5-auth-dialog krb5-workstation


Edit /etc/krb5.conf and change it so the realm name and server names fit your case. The realm name should be in uppercase.

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = MYREALM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true

[realms]
 MYREALM = {
  kdc = kerberos.lan
  admin_server = kerberos.lan
 }

Hadoop requires renewable tickets with a non-zero lifetime; this is controlled by max_life and max_renewable_life, which we will set in kdc.conf below.

Edit /var/kerberos/krb5kdc/kdc.conf and change the realm name:

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 MYREALM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
  max_life = 24h
  max_renewable_life = 7d
 }

If you did not install JCE (Java’s unlimited-strength encryption policy files), then you have to remove “aes256-cts:normal” to disable the use of AES-256 in the Kerberos server. You can read more on this here.
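If you are not sure whether the unlimited-strength policy files are installed, one quick check (assuming jrunscript from your JDK is on the PATH) is to ask the JVM for the maximum allowed AES key length:

jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'

It prints 128 without JCE, and 2147483647 when the unlimited policy is installed.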

Now we will use kdb5_util to create the Kerberos database and stash file. Sometimes there is not enough entropy, which will cause the command to hang for a long time on “Loading random data”. To avoid this, run this command first:

rngd -r /dev/urandom -o /dev/random
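You can check how much entropy is available before and after starting rngd; a value of only a few hundred bits explains the hang:

cat /proc/sys/kernel/random/entropy_avail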


Now kdb5_util runs successfully without waiting too long:

[root@kerberos krb5kdc]# kdb5_util create -r MYREALM -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'MYREALM',
master key name 'K/M@MYREALM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key:
Re-enter KDC database master key to verify:

You can see the newly created database files at /var/kerberos/krb5kdc. The hidden stash file is also present in this directory, named “.k5.MYREALM”.
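A quick listing of that directory shows the database and stash files (illustrative, for realm MYREALM):

ls -la /var/kerberos/krb5kdc/
# expect principal, principal.kadm5, principal.kadm5.lock, principal.ok and .k5.MYREALM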

In our kdc.conf file, we had a pointer to the ACL file, /var/kerberos/krb5kdc/kadm5.acl. This file lists the principals of the Kerberos administrators. You can read more about ACL files here. Add these lines to the ACL file to give access to the Cloudera users (this list may vary according to the services running in your cluster):

*/admin@MYREALM *
cloudera-scm@MYREALM * flume/*@MYREALM
cloudera-scm@MYREALM * hbase/*@MYREALM
cloudera-scm@MYREALM * hdfs/*@MYREALM
cloudera-scm@MYREALM * hive/*@MYREALM
cloudera-scm@MYREALM * httpfs/*@MYREALM
cloudera-scm@MYREALM * HTTP/*@MYREALM
cloudera-scm@MYREALM * hue/*@MYREALM
cloudera-scm@MYREALM * impala/*@MYREALM
cloudera-scm@MYREALM * mapred/*@MYREALM
cloudera-scm@MYREALM * oozie/*@MYREALM
cloudera-scm@MYREALM * solr/*@MYREALM
cloudera-scm@MYREALM * sqoop/*@MYREALM
cloudera-scm@MYREALM * yarn/*@MYREALM
cloudera-scm@MYREALM * zookeeper/*@MYREALM

The first line means that any principal with an admin instance in my realm (such as root/admin) has full administrative privileges on the Kerberos database; the remaining lines give the cloudera-scm principal the rights it needs over the cluster’s service principals.

We now have to add an administrative principal to the Kerberos database. This must be done by running the kadmin.local utility locally on the host where the master KDC runs, as one of the users listed in the ACL file:

[root@kerberos ~]# kadmin.local
Authenticating as principal root/admin@MYREALM with password.
kadmin.local: addprinc root/admin@MYREALM
WARNING: no policy specified for root/admin@MYREALM; defaulting to no policy
Enter password for principal "root/admin@MYREALM":
Re-enter password for principal "root/admin@MYREALM":
Principal "root/admin@MYREALM" created.
kadmin.local: exit

Now we can start the Kerberos services:

service krb5kdc start

service kadmin start

You can check that everything is running properly by running kinit with the administrator principal you created earlier. It should show no errors:

[root@kerberos log]# kinit root/admin@MYREALM
Password for root/admin@MYREALM:
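If the login succeeded, klist shows the ticket-granting ticket you just received (the output looks roughly like this; times will vary):

[root@kerberos log]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: root/admin@MYREALM

Valid starting     Expires            Service principal
...                ...                krbtgt/MYREALM@MYREALM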

We want both services to start automatically when the server boots, so run this:

chkconfig --add krb5kdc

chkconfig --add kadmin

chkconfig krb5kdc on

chkconfig kadmin on
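You can verify that both services are registered for automatic start:

chkconfig --list | grep -E 'krb5kdc|kadmin'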

Configure CDH with Kerberos using the wizard

Now that our MIT Kerberos server is in place, we start configuring CDH to use it.

Manually configuring Hadoop to use Kerberos is a tedious and error-prone process. We will take the easier way, which is using the wizard offered by Cloudera Manager. The whole procedure is described here. The first step is to create the CDH administrator principal:

[root@kerberos ~]# kadmin.local
Authenticating as principal root/admin@MYREALM with password.
kadmin.local: addprinc -pw manager cloudera-scm/admin@MYREALM
WARNING: no policy specified for cloudera-scm/admin@MYREALM; defaulting to no policy
Principal "cloudera-scm/admin@MYREALM" created.
kadmin.local: exit

Install these two packages on all nodes in the cluster:

yum install krb5-workstation krb5-libs
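If you have passwordless SSH from one node to the rest, a small loop saves some typing (a sketch; cloudera1.lan through cloudera4.lan are placeholders for your own host names):

for h in cloudera1.lan cloudera2.lan cloudera3.lan cloudera4.lan; do
  ssh root@$h 'yum install -y krb5-workstation krb5-libs'
done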

Now run the wizard by selecting “Enable Kerberos” from the cluster menu.

The next screen shows four tasks that should be completed before we can go on. If you followed the steps in this post, they should already be done, so just check the boxes on the left.


In the next step you will have to provide the host name of the server where you installed the KDC, and the realm name. Leave the other parameters at their default values.


The next page lets you choose whether you want Cloudera Manager to manage krb5.conf for you, or you want to do it manually (in which case you will have to copy the krb5.conf file to all nodes yourself). If you choose to let Cloudera Manager handle it, specify the same values you used in krb5.conf earlier in this post.
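If you choose to manage the file manually, a similar loop pushes /etc/krb5.conf to every node (again assuming passwordless SSH and placeholder host names):

for h in cloudera1.lan cloudera2.lan cloudera3.lan cloudera4.lan; do
  scp /etc/krb5.conf root@$h:/etc/krb5.conf
done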


Next you will have to enter the cloudera-scm principal we created earlier, along with the realm name.

When you press “Continue”, Cloudera Manager will import the account manager credentials from the KDC server.

On the next page you shouldn’t change anything.


You should add a principal in the KDC for each of these users, according to the actual services running on your cluster (run kadmin.local on the KDC server and add the principals):

addprinc -pw hive hive@MYREALM
addprinc -pw hdfs hdfs@MYREALM
addprinc -pw hue hue@MYREALM
addprinc -pw oozie oozie@MYREALM
addprinc -pw spark spark@MYREALM
addprinc -pw yarn yarn@MYREALM
addprinc -pw zookeeper zookeeper@MYREALM
addprinc -pw kafka_mirror_maker kafka_mirror_maker@MYREALM

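After adding them, you can list all principals from the shell to verify that everything is in place:

kadmin.local -q listprincs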


Next, the wizard will start configuring everything, including a cluster restart, and will show its progress.

And you will get a message when it is done.

Now let's test it. First, we will try to access HDFS the usual way, with su - hdfs. This should be blocked by Kerberos security:
[hdfs@cloudera1 ~]$ hdfs dfs -ls /tmp
17/05/14 14:29:18 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/05/14 14:29:18 WARN ipc.Client: Exception encountered while connecting to the server : GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/05/14 14:29:18 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "cloudera1.lan/"; destination host is: "cloudera1.lan":8020;

And it was indeed blocked. Now let’s try it the right way, using kinit:

[hdfs@cloudera1 ~]$ kinit hdfs@MYREALM
Password for hdfs@MYREALM:
[hdfs@cloudera1 ~]$ hdfs dfs -ls /tmp
Found 5 items
drwxrwxrwx - hdfs supergroup 0 2017-05-14 14:32 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r-- 3 hdfs supergroup 29 2017-05-07 22:45 /tmp/demo.txt
drwxr-xr-x - yarn supergroup 0 2017-04-14 23:03 /tmp/hadoop-yarn
drwx-wx-wx - hive supergroup 0 2017-03-26 23:31 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2017-04-14 23:03 /tmp/logs

Works!
Now our cluster is kerberized.
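As a final check (continuing the same hdfs session), destroy the ticket and verify that access is blocked again:

[hdfs@cloudera1 ~]$ kdestroy
[hdfs@cloudera1 ~]$ hdfs dfs -ls /tmp    # fails again with "No valid credentials provided"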

The Kerberos server now becomes a single point of failure, so it is a good idea to configure a slave KDC for it, but we will cover that in another post.
