I did not have the chance to install CDH 5.13 yet, but CDH 5.12 ships with a relatively old Spark 1.6.
It is a good idea to upgrade to Spark 2.2, the latest release at the time of writing, and Cloudera provides a special package for this purpose. You can download it and view the installation instructions here and here. You can also download the installation file directly from my site. Spark 2.2 can also coexist with the older 1.6 version.
You will need CDH 5.8-5.12 (5.13 is not officially supported at the time of writing this post), JDK 8, Cloudera Manager 5.8.3 or above, and Scala 2.11.
See here how to install Scala. If you have a large cluster, you may want to install it using a configuration management tool like Puppet or Chef.
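If you want to check the version requirements above in a script, here is a minimal sketch. The function names version_ge and version_in_range are my own, the sample version strings are placeholders for whatever your cluster reports, and the comparison assumes GNU sort with the -V option is available.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V option orders dotted version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when version $1 lies in the inclusive range [$2, $3].
version_in_range() {
    version_ge "$1" "$2" && version_ge "$3" "$1"
}

# Placeholder values: replace them with the versions of your own cluster.
cdh_version="5.12.0"
cm_version="5.12.0"

# 5.12.99 is only an illustrative upper bound that covers any 5.12.x release.
if version_in_range "$cdh_version" "5.8" "5.12.99"; then
    echo "CDH $cdh_version is in the supported range"
else
    echo "CDH $cdh_version is outside the supported range"
fi

if version_ge "$cm_version" "5.8.3"; then
    echo "Cloudera Manager $cm_version is new enough"
else
    echo "Cloudera Manager $cm_version is too old"
fi
```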
Copy the CSD file to /opt/cloudera/csd on the host where Cloudera Manager runs, then set its ownership and permissions:
cd /opt/cloudera/csd
chown cloudera-scm:cloudera-scm SPARK2_ON_YARN-2.2.0.cloudera1.jar
chmod 644 SPARK2_ON_YARN-2.2.0.cloudera1.jar
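Before restarting anything, you may want to confirm that the jar really ended up with the right owner and mode. The following is only a sketch: check_csd is a helper name I made up, and it assumes GNU stat (the -c option), which is what you get on the usual Linux hosts.

```shell
# Verifies that a CSD jar exists with the expected mode and owner.
# Arguments: path to the jar, expected octal mode, expected owner as user:group.
check_csd() {
    csd_file="$1"
    want_mode="$2"
    want_owner="$3"
    if [ ! -f "$csd_file" ]; then
        echo "missing: $csd_file"
        return 1
    fi
    # GNU stat: %a is the octal mode, %U and %G are the owner and group names.
    have_mode="$(stat -c '%a' "$csd_file")"
    have_owner="$(stat -c '%U:%G' "$csd_file")"
    if [ "$have_mode" != "$want_mode" ] || [ "$have_owner" != "$want_owner" ]; then
        echo "mismatch: $csd_file is $have_mode $have_owner, expected $want_mode $want_owner"
        return 1
    fi
    echo "ok: $csd_file"
}

# Example call for the file used in this post:
# check_csd /opt/cloudera/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar 644 cloudera-scm:cloudera-scm
```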
Restart SCM server:
service cloudera-scm-server restart
Also restart the Cloudera Management Service.
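The server takes a little while to come back after a restart. If you script this step, a small polling loop like the sketch below can tell you when the web UI is reachable again. wait_for_port is a hypothetical helper of mine. It assumes bash and the timeout command are installed, and that Cloudera Manager listens on its default web port 7180, so adjust that if your setup differs.

```shell
# Polls host:port until a TCP connection succeeds or the attempts run out.
# Arguments: host, port, maximum attempts, seconds to sleep between attempts.
wait_for_port() {
    wait_host="$1"
    wait_port="$2"
    wait_attempts="$3"
    wait_delay="$4"
    wait_i=1
    while [ "$wait_i" -le "$wait_attempts" ]; do
        # bash's /dev/tcp pseudo-device opens a TCP connection;
        # timeout bounds each attempt to two seconds.
        if timeout 2 bash -c "exec 3<>/dev/tcp/$wait_host/$wait_port" 2>/dev/null; then
            echo "port $wait_port on $wait_host is open"
            return 0
        fi
        sleep "$wait_delay"
        wait_i=$((wait_i + 1))
    done
    echo "port $wait_port on $wait_host did not open after $wait_attempts attempts"
    return 1
}

# Example: wait up to 5 minutes for the Cloudera Manager web UI.
# wait_for_port localhost 7180 60 5
```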
Now, add the parcel repository for Spark 2.2:
Go to Hosts -> Parcels. You will see a new line, showing Spark2:
Download, distribute and activate it.
From the cluster page, click the actions button and choose “add service”:
On the next page, choose the services that Spark2 depends on. I chose to include Hive in case I want to access Hive tables from Spark:
The next page handles TLS encryption settings. If you did not set up TLS in your cluster, you can just skip it.
Now assign roles. You should assign gateway roles to all the hosts in the cluster:
After some processing, your new Spark2 service will start. Go back to the cluster page and restart any stale services.
Finally, you can see Spark and Spark2 running at the same time:
You may now delete Spark 1.6 if you do not need it. Keep it if you still depend on it: for example, running Hive on Spark is only supported with Spark 1.6.
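To check from a gateway host which Spark command-line clients are available, something like the sketch below should do. The Spark2 parcel installs its clients under separate names (spark2-submit, spark2-shell, pyspark2) so they do not clash with the 1.6 ones. report_cli is a name I made up for this example.

```shell
# Prints whether a command is available on the PATH of the current host.
report_cli() {
    cli_name="$1"
    if command -v "$cli_name" >/dev/null 2>&1; then
        echo "$cli_name: found at $(command -v "$cli_name")"
    else
        echo "$cli_name: not found on PATH"
    fi
}

# Spark 1.6 clients keep the plain names; Spark2 clients carry a "2".
for cli_name in spark-submit spark2-submit spark2-shell pyspark2; do
    report_cli "$cli_name"
done

# Once a client is found, "spark2-submit --version" prints the Spark version it runs.
```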