In the previous post we saw how to install a single Elasticsearch node. This time we will extend it to create a real cluster.
Configuring a small cluster is a breeze, even if done manually (as I will show below). However, configuring large clusters manually is painful and error-prone.
At the time of this post, Elasticsearch does not offer any easy way to mass-deploy its nodes. You can of course use third-party tools like Puppet or Chef to do that.
A few days ago Elasticsearch announced their new Elastic stack, which is still in alpha and is supposed to provide central administration and provisioning for all of an organization’s clusters (see here), but I have not had the chance to install or test it yet.
Elasticsearch nodes can form a cluster in one of two ways: multicast or unicast.
In multicast, no configuration is needed besides setting the cluster name to the same value in every node’s configuration file.
The nodes should then ping and discover each other and join the cluster automatically. The problem is that this does not always work, and it is sensitive to network changes.
In my setup I could not get it to work (maybe because I use my router as DNS).
Unicast is more reliable and thus recommended for production environments, and it requires just a bit more configuration.
So we will focus on the unicast method here.
You should edit the configuration file (its default location is /etc/elasticsearch/elasticsearch.yml) this way:
First, choose a name for the cluster and set it in the cluster.name parameter, using the same value on all nodes.
Then, give each node a unique name by changing the node.name parameter.
If you do not give a name to each node, Elasticsearch will give it a random Marvel character name which will not persist across restarts, so it’s a good idea to give it a fixed name.
Next, uncomment this line:
discovery.zen.ping.unicast.hosts: ["host1", "host2"]
and add this line to disable multicast detection:
discovery.zen.ping.multicast.enabled: false
Note: This post was written before Elasticsearch 5.x was released. From version 5.x on, only unicast is supported and you do not have to explicitly disable multicast. Trying to include the above line in the configuration file of version 5.x will raise an error.
Now, change the discovery.zen.ping.unicast.hosts parameter to include some of your real nodes. There is no need to include all the nodes, since each node connects to one of the listed nodes and then gets the full list of cluster members directly from that node, not from the configuration file.
So you should just specify two or three nodes for redundancy.
For example, I have set up a three node cluster with nodes ES1, ES2 and ES3, and in the list I only specified ES1.
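It is also a good idea to set discovery.zen.minimum_master_nodes to (number of master-eligible nodes / 2) + 1, with the division rounded down, to prevent a split-brain scenario. With three nodes that gives 2, so in my case I set it to 2.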
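To make the arithmetic concrete, here is a minimal Python sketch of the quorum formula. The function name is my own and is not part of Elasticsearch; it only illustrates the calculation, assuming every node in the cluster is master-eligible.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Return the quorum size: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    # Integer division rounds down, which is what the formula requires.
    return master_eligible_nodes // 2 + 1


# Show the value for a few cluster sizes.
for nodes in (1, 2, 3, 5):
    print(nodes, "nodes ->", minimum_master_nodes(nodes))
```

For the three-node cluster in this post the result is 2, which matches the value I used.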
Here is a summary of the changes I made to my configuration file:
cluster.name: guy
node.name: node-x (different for each node)
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["ES1.lan"] (in production you should specify more than one node here)
discovery.zen.minimum_master_nodes: 2
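Since only the node name differs between nodes, these lines are easy to generate. The following is a rough Python sketch, not an official tool: the function name and the node names node-1 to node-3 are hypothetical, and the multicast line applies only to versions before 5.x.

```python
def render_settings(cluster_name, node_name, unicast_hosts, master_eligible_nodes):
    """Return the elasticsearch.yml lines discussed in this post for one node."""
    # Quote each host and join them into a YAML inline list.
    hosts = ", ".join('"{}"'.format(host) for host in unicast_hosts)
    # Quorum: floor(n / 2) + 1 master-eligible nodes.
    quorum = master_eligible_nodes // 2 + 1
    return "\n".join([
        "cluster.name: {}".format(cluster_name),
        "node.name: {}".format(node_name),
        # Pre-5.x only; version 5.x and later reject this setting.
        "discovery.zen.ping.multicast.enabled: false",
        "discovery.zen.ping.unicast.hosts: [{}]".format(hosts),
        "discovery.zen.minimum_master_nodes: {}".format(quorum),
    ])


# Hypothetical node names, one settings block per node.
for name in ("node-1", "node-2", "node-3"):
    print(render_settings("guy", name, ["ES1.lan"], 3))
    print()
```

The output is meant to be pasted into (or merged with) each node's elasticsearch.yml; it does not touch any other settings in the file.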
Then I restarted the service on all nodes.
Now if we go to ES1 and look at its cluster log (/var/log/elasticsearch/guy.log), we can see that the other two nodes join the cluster:
We can also look in Elastic HQ and see that cluster guy has three nodes:
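The node count can also be checked without Elastic HQ by querying the cluster health endpoint (_cluster/health) over HTTP. Below is a minimal Python sketch; it assumes the node is reachable at http://ES1.lan:9200 (9200 being the default HTTP port) with no authentication, and the helper names are my own.

```python
import json
import urllib.request


def fetch_cluster_health(base_url, timeout=5.0):
    """Fetch the _cluster/health document from a node as a dict."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def has_expected_nodes(health, expected_nodes):
    """Return True if the health document reports the expected node count."""
    return health.get("number_of_nodes") == expected_nodes
```

Calling fetch_cluster_health("http://ES1.lan:9200") and passing the result to has_expected_nodes(health, 3) should return True once all three nodes have joined. The health document also carries the cluster_name and status fields, which are worth printing alongside the count.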