Introduction to Logstash: how to get and send data
Logstash overview
Logstash is a tool that fetches data from a source and sends it to a destination, applying transformations to it on the fly. It was designed first and foremost to send data to Elasticsearch.
It can take multiple inputs and outputs at a time, and can use conditionals to treat data differently based on criteria.
It uses a configuration file written in a format that looks like slightly modified JSON (described below).
Logstash configuration
The Logstash configuration is split into three parts:
- Input
- Filter
- Output
Each part can live in a different file (in the same folder); we then only need to give the folder to Logstash, which will concatenate the files.
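For example, assuming the three parts live in a hypothetical folder /etc/logstash/conf.d/ as input.conf, filter.conf and output.conf, you can point Logstash at the folder itself:

bin/logstash -f /etc/logstash/conf.d/

Logstash reads the files in lexical order, so prefixes like 01-input.conf help keep the concatenation predictable.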
Input
The different inputs go in the input field, and you can put more than one in it.
There is a long list of input plugins and we will look at a few of them here.
A lot of fields have default values that are fine for simple use. For more parameters, do check the plugin list above.
Read a file on a local file system
input {
  file {
    start_position => "beginning"
    sincedb_path => "/pathToSincedb/sincedb"
    path => "/pathToYourData/*"
  }
}
What does it do?
This will read, line by line and from the beginning, all the files in the pathToYourData folder.
The sincedb_path => "/pathToSincedb/sincedb" setting lets you choose the path of the sincedb file, which saves which files have already been read and up to where. So if a file is updated, the modifications are taken into account by Logstash.
Note that Logstash will ignore files that haven't been modified for more than 24 hours. A little ignore_older tweak will resolve this problem.
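If you need to pick up those older files anyway, a minimal sketch (the one-week value is just an illustration) is to raise the file input's ignore_older threshold, given in seconds:

input {
  file {
    start_position => "beginning"
    sincedb_path => "/pathToSincedb/sincedb"
    path => "/pathToYourData/*"
    ignore_older => 604800    # only skip files untouched for more than a week
  }
}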
Read data from Kafka
input {
  kafka {
    topics => ["myTopic1", "myTopic2"]
    auto_offset_reset => "earliest"
    bootstrap_servers => "localhost:9092,localhost:9093"
  }
}
What does it do?
This configuration makes Logstash consume the topics myTopic1 and myTopic2 from the last committed offset, or from the earliest message if no offset exists yet (thanks to the auto_offset_reset field), on the brokers localhost:9092 and localhost:9093.
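If you run several Logstash instances on the same topics, a useful sketch (the logstashGroup name is just an illustration) is to give them a common group_id so that Kafka balances the partitions between them; consumer_threads controls the parallelism inside one instance:

input {
  kafka {
    topics => ["myTopic1", "myTopic2"]
    auto_offset_reset => "earliest"
    bootstrap_servers => "localhost:9092,localhost:9093"
    group_id => "logstashGroup"    # consumers sharing this id split the partitions
    consumer_threads => 2          # consumer threads inside this instance
  }
}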
Read data from Elasticsearch
It is particularly important to be able to read from Elasticsearch, whether for reindexing or simply to get the data and put it elsewhere.
input {
  elasticsearch {
    hosts => ["localhost"]
    index => "myIndex"
    query => '{ "query": { "match_all": {} } }'
  }
}
What does it do?
This is one of the simplest configurations: it takes all the data from the myIndex index. The query option expects a query written in Elasticsearch's JSON query DSL, and match_all returns everything.
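The same input can also extract just a subset. As a minimal sketch (the status field and its error value are hypothetical), here is how you would pull only the matching documents:

input {
  elasticsearch {
    hosts => ["localhost"]
    index => "myIndex"
    query => '{ "query": { "match": { "status": "error" } } }'
  }
}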
Read data from Filebeat
input {
  beats {
    port => 5044
  }
}
What does it do?
With this configuration, Logstash will listen on port 5044, where Filebeat is supposed to send its data.
Read data from multiple inputs
If we want multiple inputs, we only have to put them in the input field one after another.
input {
  file {
    start_position => "beginning"
    sincedb_path => "/pathToSincedb/sincedb"
    path => "/pathToYourData/*"
  }
  kafka {
    topics => ["myTopic1", "myTopic2"]
    auto_offset_reset => "earliest"
    bootstrap_servers => "localhost:9092,localhost:9093"
  }
  elasticsearch {
    hosts => ["localhost"]
    index => "myIndex"
    query => '{ "query": { "match_all": {} } }'
  }
  beats {
    port => 5044
  }
}
Determine the origin of data
If you have multiple inputs, you won't know in the filter and output stages where your data comes from, as there is no notion of separate pipelines within a single Logstash configuration. To keep this information, you have to put a tag in each input, like below.
input {
  file {
    tags => ["FILE"]
    start_position => "beginning"
    sincedb_path => "/pathToSincedb/sincedb"
    path => "/pathToYourData/*"
  }
  kafka {
    tags => ["KAFKA"]
    topics => ["myTopic1", "myTopic2"]
    auto_offset_reset => "earliest"
    bootstrap_servers => "localhost:9092,localhost:9093"
  }
}
With that, in the filter and output we only have to test the tag to know where our data comes from.
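As a minimal sketch of that test (the fromFile index name is just an illustration), a conditional on the tag routes each event:

filter {
  if "KAFKA" in [tags] {
    # Kafka-specific transformations go here
  }
}

output {
  if "FILE" in [tags] {
    elasticsearch {
      hosts => ["localhost"]
      index => "fromFile"    # hypothetical index for events read from files
    }
  }
}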
Filter
WORK IN PROGRESS - COME BACK LATER FOR SOME MORE AMAZING CONTENT!