Integrating Apache NiFi with Cloudera Manager

Apache NiFi is a powerful, easy-to-use tool for transferring data between different platforms. It was adopted by Hortonworks and is included in their HDP platform, but Cloudera, which is my favorite Hadoop distro, does not include it.

You can still run NiFi on the same servers where Cloudera runs, either with its embedded ZooKeeper or with the Cloudera cluster's ZooKeeper servers, but it will then run outside of Cloudera's management. I want to take it a step further and have Cloudera Manager manage NiFi as a native service.

To do this, we have to build a custom parcel around Apache NiFi. Fortunately, Prateek Rungta has already done this. Here is his parcel, with instructions on how to install it. The problem is that if you follow the instructions on the GitHub page as written, you will run into problems and won’t be able to build and install the parcel. Several changes and prerequisites needed to complete the process are not listed in the instructions (at least on my CentOS 6.9 based cluster). This is what worked for me.

I recommend doing all of this on the host where Cloudera Manager runs. First of all, install git using yum, then install Maven following the instructions from here.

The first step went smoothly without any change:

cd /tmp
git clone
cd cm_ext/validator
mvn install

The next step will download and build NiFi. However, it will fail validation in one file as described here. Since there is no solution for this bug yet, I decided to skip the unit tests completely. How to do it is described here. Edit the /tmp/nifi/pom.xml file and add the following to the Maven properties section:


Also, in the maven-surefire-plugin configuration, add this:


The whole process can run for almost an hour, but skipping the unit tests speeds it up significantly. We will also run the build in parallel, so it will be a bit faster still.

cd /tmp
git clone
cd nifi
mvn -T 2.0C clean install
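
As an alternative to editing pom.xml, Maven's standard `-DskipTests` flag can be passed on the command line; it sets the `skipTests` property that the Surefire plugin reads. The following is only a sketch, assuming the NiFi sources were cloned to /tmp/nifi as above. Whether it is enough on its own to get past the failure described earlier depends on the NiFi version you build.

```shell
NIFI_SRC=/tmp/nifi                                  # clone location used above
MVN_ARGS="-T 2.0C -DskipTests clean install"        # same build, unit tests skipped

# Run the build only when Maven and the NiFi sources are actually present
if command -v mvn >/dev/null 2>&1 && [ -f "$NIFI_SRC/pom.xml" ]; then
  (cd "$NIFI_SRC" && mvn $MVN_ARGS)
else
  echo "Skipping build: mvn or $NIFI_SRC/pom.xml not found" >&2
fi
```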

If the above step completed without errors, we can proceed. If you continue with the original instructions, you will hit the two issues described here. As the author suggests, the first issue is resolved by replacing these two lines:

  sed -i "" "s/<VERSION-FULL>/$FULL_VERSION/g"     $file
  sed -i "" "s/<VERSION-SHORT>/${SHORT_VERSION}/g" $file

with these:

  sed -i -e "s/<VERSION-FULL>/$FULL_VERSION/g"     $file
  sed -i -e "s/<VERSION-SHORT>/${SHORT_VERSION}/g" $file
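
The reason for the change is that `sed -i ""` is the BSD/macOS spelling of in-place editing, while GNU sed, which CentOS ships, does not accept a separate empty suffix argument after `-i`. A small sketch of the corrected form on a throwaway file; the version numbers and file contents here are made up for illustration and are not taken from the parcel scripts:

```shell
FULL_VERSION=1.2.0        # hypothetical values, for illustration only
SHORT_VERSION=1.2
file=$(mktemp)
printf 'version=<VERSION-FULL>\nshort=<VERSION-SHORT>\n' > "$file"

# GNU sed: in-place edit with no backup suffix
sed -i -e "s/<VERSION-FULL>/$FULL_VERSION/g"     "$file"
sed -i -e "s/<VERSION-SHORT>/${SHORT_VERSION}/g" "$file"

cat "$file"
```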

The second issue is caused by a Python feature that is not supported in versions prior to 2.7.

I used CentOS 6.9 which comes with built-in Python 2.6. Follow these instructions to install Python 2.7.

CentOS 6 relies on Python 2.6 for YUM, so you should install Python 2.7 alongside Python 2.6, not instead of it. Python 2.6 will remain in /usr/bin, while the new Python 2.7 will be installed in /usr/local/bin. Then put /usr/local/bin before /usr/bin in the PATH so that the new interpreter is picked up first.
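
To confirm that the ordering took effect in the current shell, something like the following can be used. It is only a sketch; it assumes the new interpreter was installed under /usr/local/bin as described and prepends the directory only if it is not already first.

```shell
# Put /usr/local/bin at the front of PATH unless it is already there
case "$PATH" in
  /usr/local/bin:*) ;;                       # already first, nothing to do
  *) PATH="/usr/local/bin:$PATH" ;;
esac
export PATH

# Show which interpreter a bare "python" now resolves to, if any
command -v python || echo "no python on PATH"
```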

After Python 2.7 is in place, you can continue:

export PATH=/usr/local/bin:$PATH
cd /tmp
git clone
cd nifi-parcel
POINT_VERSION=5 VALIDATOR_DIR=/tmp/cm_ext ./ /tmp/nifi/nifi-assembly/target/nifi-*-SNAPSHOT-bin.tar.gz
VALIDATOR_DIR=/tmp/cm_ext ./

Apache NiFi requires Java 1.8, so you will have to upgrade all of your Cloudera nodes to Java 1.8 before attempting to run NiFi on them. Here is how to do it.
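
A quick way to see what a node currently has is to parse the output of `java -version`. The helper below is a hypothetical sketch, not part of the parcel; it assumes the usual quoted version string and would need to be run on each node.

```shell
# Print the major Java version for a string such as "1.8.0_131" or "11.0.2"
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;   # legacy scheme: 1.8.x means Java 8
    *)   echo "$1" | cut -d. -f1 ;;   # newer scheme: 11.0.2 means Java 11
  esac
}

# `java -version` prints to stderr; extract the quoted version string
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  major=$(java_major "$ver")
  if [ "${major:-0}" -ge 8 ] 2>/dev/null; then
    echo "Java $ver meets the Java 1.8 minimum"
  else
    echo "Java '$ver' is older than 1.8 or could not be parsed"
  fi
fi
```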

Next, start a small Python web server to make the parcel files accessible over HTTP:

cd /tmp/nifi-parcel/build-parcel
python -m SimpleHTTPServer 14641
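
Cloudera Manager expects a parcel repository to contain a manifest.json next to the .parcel files. If no parcels show up later, a quick check of the directory being served can help. This is a minimal sketch; `check_parcel_repo` is a hypothetical helper, and it assumes the build step wrote its output to /tmp/nifi-parcel/build-parcel.

```shell
# Return 0 if the directory holds a manifest.json and at least one .parcel file
check_parcel_repo() {
  dir=$1
  [ -f "$dir/manifest.json" ] || { echo "missing $dir/manifest.json" >&2; return 1; }
  ls "$dir"/*.parcel >/dev/null 2>&1 || { echo "no .parcel files in $dir" >&2; return 1; }
  echo "$dir looks like a parcel repository"
}

check_parcel_repo /tmp/nifi-parcel/build-parcel || true
```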

Now go to Cloudera Manager, open the Parcels page and click the “Configuration” button. You will see the list of all Cloudera parcel repositories. Add an entry with the hostname where you ran the Python web server and port 14641, as shown below:

Then go back to Parcels and click “Check for New Parcels”. NiFi will now appear in the list of available parcels:

Download, distribute and activate it.

Now we have to copy the NiFi jar file to its place on the Cloudera Manager host. I will assume that you ran the whole build process on the Cloudera Manager host. If not, then you will have to copy the file over to the CM host.

cp /tmp/nifi-parcel/build-csd/NIFI-1.0.jar /opt/cloudera/csd

mkdir /opt/cloudera/csd/NIFI-1.0

cp /tmp/nifi-parcel/build-csd/NIFI-1.0.jar /opt/cloudera/csd/NIFI-1.0
cd /opt/cloudera/csd/NIFI-1.0

jar xvf NIFI-1.0.jar

rm -f NIFI-1.0.jar

service cloudera-scm-server restart

Wait until Cloudera Manager starts and then click Cluster -> Add Service. You will see NiFi in the list of available services:

Add it and select the nodes it will run on. It will now appear in the list of managed services. At this point you can stop, start and monitor NiFi from Cloudera Manager:

However, if you try to access one of the NiFi nodes directly, for example at http://cloudera2:8080/nifi, you will see that it runs in standalone mode, not as part of a cluster.

NiFi's integration with Cloudera is not as good as that of a native service. Cloudera Manager does not configure it as a cluster automatically, and it cannot control the NiFi configuration either. Unfortunately, you still have to edit the nifi.properties file on each node and configure it to work in a cluster and to use Cloudera’s ZooKeeper. Here are the configuration changes I had to make manually (the file is in /opt/cloudera/parcels/NIFI-0.0.5.nifi.p0.5/conf):


I left all other parameters at their default values. It can be tweaked further, but this is the minimal configuration that works. You should, of course, change host names and ports to fit your environment.
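
For illustration, changes of this kind can be scripted per node. The sketch below is not the exact list of edits from this post; it sets the standard NiFi 1.x cluster properties, and the ZooKeeper host names, the port 2181 and the node protocol port 11443 are placeholder assumptions that you must replace with your own values. `set_prop` is a hypothetical helper.

```shell
# Set key=value in a properties file, replacing the existing line or appending one
set_prop() {
  key=$1 value=$2 file=$3
  if grep -q "^${key}=" "$file"; then
    sed -i -e "s|^${key}=.*|${key}=${value}|" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
}

CONF=/opt/cloudera/parcels/NIFI-0.0.5.nifi.p0.5/conf/nifi.properties
NODE=$(hostname -f 2>/dev/null || uname -n)          # this node's host name
ZK="cloudera1:2181,cloudera2:2181,cloudera3:2181"    # hypothetical ZooKeeper quorum

if [ -f "$CONF" ]; then
  set_prop nifi.web.http.host "$NODE" "$CONF"
  set_prop nifi.cluster.is.node true "$CONF"
  set_prop nifi.cluster.node.address "$NODE" "$CONF"
  set_prop nifi.cluster.node.protocol.port 11443 "$CONF"   # hypothetical free port
  set_prop nifi.zookeeper.connect.string "$ZK" "$CONF"
fi
```

Keeping a backup copy of nifi.properties before running anything like this is a good idea, since the edits are made in place.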

After that, restart the NiFi service and it will start working as a cluster. Here is how it looks in the NiFi console, where you can see that this is a three-node cluster (it may take a few minutes for the nodes to elect a primary and form a cluster):


You can also choose Cluster from the menu to see the details of all nodes:

That’s it. It’s a bit tedious, but it’s worth the bother because you get to manage NiFi and Cloudera from a single console. Thanks to Prateek Rungta for enabling this with his custom parcel.
