Using Logstash to load a CSV file into Elasticsearch

Logstash is a great tool offered by Elastic (the company behind Elasticsearch) for transferring data between Elasticsearch and various other sources and targets. It is built around plugins, so it is very versatile, and besides the official plugins there are many 3rd party plugins that fill the gaps and cover almost every existing technology.

You can find more information about the available plugins in the Logstash documentation.
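To check which plugins are already installed, Logstash can list them itself. Assuming the default /opt/logstash install location used later in this post (in 2.3 the script is called logstash-plugin; older releases shipped it as bin/plugin):

/opt/logstash/bin/logstash-plugin list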

The first experiment will be indexing the contents of a CSV file.

To install Logstash using yum:

First, get the Elasticsearch PGP key:

rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

Then create the yum repository and install Logstash:

echo "[logstash-2.3]

name=Logstash repository for 2.3.x packages




enabled=1" >> /etc/yum.repos.d/logstash.repo

yum install logstash

export PATH=$PATH:/opt/logstash/bin
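As a quick sanity check that the installation and the PATH change worked, you can ask Logstash for its version (the exact output depends on the minor release yum picked):

logstash --version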

We now have to find where Logstash expects its configuration. We can look at the service defaults (on RPM-based systems, /etc/sysconfig/logstash) and see that the packaged Logstash reads its pipeline configuration from /etc/logstash/conf.d.

In this directory we should create a file called logstash.conf with the following content:


input {
  file {
    path => "/home/hive/sampledata.csv"
    type => "record"
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => ["line","value","message"]
    separator => ","
  }
}

output {
  elasticsearch {
    action => "index"
    hosts => ""
    index => "sampledata1"
    workers => 1
  }
}

You can see that the file includes three sections: input, filter and output.

The input part specifies the location of the file, the type (which can be anything) and whether to start processing from the beginning of the file.

Logstash's file input is designed mainly for ingesting log files, where it monitors the end of the file for added lines. This is the default behavior, and if we want to change it we have to specify start_position => "beginning".

The filter section specifies the structure of the lines in the CSV file and the separator.
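To make that concrete, here is a made-up example (the real contents of sampledata.csv are not shown in this post). Given a line like:

1,42,hello world

the csv filter would produce the fields line => "1", value => "42" and message => "hello world". Note that all values arrive as strings unless you convert them, for example with the filter's convert option.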

The output section specifies the action that Logstash needs to perform, the host where Elasticsearch runs (a node), the index name to create and how many concurrent workers to use.
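Note that hosts accepts a list of host:port strings, so if Elasticsearch runs on the same machine on the default port (an assumption about your setup), the line would look like:

hosts => ["localhost:9200"]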

Now we should restart the Logstash service, or just run it from the command line, specifying the config file:

service logstash restart


logstash agent -f /etc/logstash/conf.d/logstash.conf

Logstash is not very verbose; the only message you will get is “Pipeline main started”. But you can see that the index was created, and documents start to pile up in it until the entire contents of the CSV file have been ingested.
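A quick way to watch the documents pile up, assuming Elasticsearch is listening on localhost:9200 (adjust to match your hosts setting), is to poll the count API:

curl "localhost:9200/sampledata1/_count?pretty"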

Here is the index in ElasticHQ:


If you need to set the index mapping in advance, you can create the index with its mapping first and then run the same commands as above; Logstash will simply index into the existing index.
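For example, here is a minimal sketch of pre-creating the index with an explicit mapping for the record type used above. The field types are assumptions, since the actual data isn't shown; the syntax is for Elasticsearch 2.x:

curl -XPUT "localhost:9200/sampledata1" -d '
{
  "mappings": {
    "record": {
      "properties": {
        "line":    { "type": "integer" },
        "value":   { "type": "integer" },
        "message": { "type": "string" }
      }
    }
  }
}'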


Logstash keeps the current location in the file in a special file called “sincedb”. It monitors the file for changes, and any lines added to it later will be sent to Elasticsearch and stored in the index (similar to the tail -f command in Linux/Unix). The start_position parameter is only used the first time a file is processed, to indicate whether it should be read from the beginning or only monitored for new lines. Once the offset data is in the sincedb file, Logstash doesn't use start_position anymore.
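A consequence worth knowing when experimenting: to force Logstash to re-read the whole file on every run, you can point the file input's sincedb_path at /dev/null so that no offsets survive between runs (a common trick; by default the sincedb file is written to the home directory of the user running Logstash):

input {
  file {
    path => "/home/hive/sampledata.csv"
    type => "record"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}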
