When installing CDH using the administrator, there is a step that prompts you for the database you want to use for metastores of various components of the Cloudera platform. The supported database types are Oracle, MySQL and PostgreSQL. The default is to use a local PostgreSQL installation that is done for you by the Cloudera installer. However, using this local installation is not recommended for production, and if you use it you will see this warning whenever you open Cloudera manager:
There is an option to hide this warning, but it is still a good idea to use an external database.
Last time we saw how to configure external database for Hive/Impala and this time we will see how to complete this process and migrate the management services to an external Oracle database. Please note that the procedure described here will work only for enterprise data hub edition of Cloudera manager and not for the express edition, where you cannot import the configuration file.
Configuring an external metastore is relatively easy at installation time, but it gets a little more complicated to migrate an already running cluster to a new external database. This is the scenario we will cover in this post.
Backup current configuration
We do not want to lose the data we already have, so we should back up the current configuration. Use the Cloudera manager API to first find the version:
So our version is v16. Then use the API to extract Cloudera manager configuration:
curl -u admin:admin http://cloudera1:7180/api/v16/cm/deployment > /root/deployment.json
You will get a long JSON file containing the configuration. Keep it in a safe place for use later.
Then stop the scm server and the embedded PostgreSQL database services:
service cloudera-scm-server stop service cloudera-scm-server-db stop
Now backup the current PostgreSQL database and the scm server directory:
cd /var/lib/cloudera-scm-server-db mkdir /usr/backup tar -cvf /usr/backup/database.tar * cd&nbsp;/etc/cloudera-scm-server tar -cvf /usr/backup/properties.tar *
Preparing Oracle database
I already covered preparing the Oracle database in my last post, the description I gave there includes all required users for this post too.
Configuring Cloudera manager to use the new database
On the server where cloudera manager server runs there is a file, /etc/cloudera-scm-server/db.properties, that holds the information needed to connect to the cloudera manager database. Here is how it looks like when using the embedded database:
com.cloudera.cmf.db.type=postgresql com.cloudera.cmf.db.host=localhost:7432 com.cloudera.cmf.db.name=scm com.cloudera.cmf.db.user=scm com.cloudera.cmf.db.password=BEEkbbBpLR com.cloudera.cmf.db.setupType=EMBEDDED
You can see that the password is there in clear text, not very secured…
We should run a script that prepares the data structures for cloudera manager in the new Oracle database and also creates a new /etc/cloudera-scm-server/db.properties file that points scm server to the new database. The script is on the scm server in this path: “/usr/share/cmf/schema/scm_prepare_database.sh”. There are many options and they are all documented here, but this is how I ran it:
[[email protected] ~]# /usr/share/cmf/schema/scm_prepare_database.sh oracle -h oracle.lan cloudera scmserver scmserver JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera Verifying that we can write to /etc/cloudera-scm-server Creating SCM configuration file in /etc/cloudera-scm-server Executing: /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db. [ main] DbCommandExecutor INFO Successfully connected to database. All done, your SCM database is configured correctly!
Remove the old db.management.properties file that contains the old management services database information and start scm server:
mv /etc/cloudera-scm-server/db.mgmt.properties /etc/cloudera-scm-server/db.mgmt.properties.old service cloudera-scm-server start
You will have to wait a while until the server is up and running and ready to accept requests. Now we will upload the deployment backup JSON file we created earlier to the new database using the following command (This will work only for a system with enterprise data hub license and not for express edition:
curl --upload-file deployment.json -u admin:admin -H "Content-Type: application/json" http://cloudera1:7180/api/v16/cm/deployment?deleteCurrentDeployment=true
You should replace the file with the location and name of you own file, and the host name with the name of your scm server host. The version (v16) may also be different, I showed earlier in this post how to check the version. If everything is OK, then you will receive the contents of the configuration file as a return value.
Then restart cloudera-scm-server service.
Now scm server should start up and you will be able to login to the admin console and see all your cluster data. However, the scm server won’t be able to communicate with the agents. To fix this, run the following commands on each node:
rm -f /var/lib/cloudera-scm-agent/cm_guid service cloudera-scm-agent restart
At this point some management services such as report manager and activity monitor might not work. This is because you have to manually tell them to use Oracle database instead of PostgreSQL. Go to Cluster -> Configuration-> Database settings:
There you will find a list of database related properties for each management service, like database type, DB host name and port, user and password. For example, here is the definitions for Activity monitor:
Change all of them to refer to our new Oracle database and then click save. You may need to redeploy client configuration for the changes to take effect. After that all management services should work properly.