Elastic Search, Logstash and Kibana

Overview

Authors: Samuel Ambroj Pérez, Kajorn Pathomkeerati

Introduction to Elasticsearch and Logstash

Introduction to Kibana

Installation of ELK on a single machine (Debian 8)

Installation of Elasticsearch

Connect to the first machine (on the left side) that has been provided to you:

ssh gks@141.52.X.X

This machine is going to be used for a while, so do not connect yet to the second VM.

The gks user has sudo rights because it is included in the sudoers file, so from your gks user, execute:

sudo -i bash

Update and upgrade all the packages:

# apt-get update
# apt-get upgrade

Change the timezone (optional):

# dpkg-reconfigure tzdata

Installation of aptitude, curl, openjdk (open source java) and chkconfig:

# apt install -y aptitude curl openjdk-7-jdk chkconfig

Download and install the Public Signing Key:

# wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Save the repository definition:

# echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-1.7.list

Update the package lists so the new repository is ready to use, and install Elasticsearch:

# aptitude update && aptitude install elasticsearch

Start Elasticsearch:

# /etc/init.d/elasticsearch start

Check the status (two options):

# systemctl status elasticsearch
# /etc/init.d/elasticsearch status

Use chkconfig to start Elasticsearch at boot:

# chkconfig elasticsearch on
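
To verify that Elasticsearch answers requests, you can query it directly; it should reply with a short JSON block containing the node name, the version and its "You Know, for Search" tagline:

# curl 'localhost:9200/?pretty'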

Elasticsearch is now installed. Congratulations! Let's continue with the installation of Logstash and finally Kibana.

Installation of Logstash

Download the .deb file:

# wget -q https://download.elastic.co/logstash/logstash/packages/debian/logstash_1.5.4-1_all.deb 

Install it:

# dpkg -i logstash_1.5.4-1_all.deb

That would be enough, but while preparing the tutorial we saw the following warning:

WARN -- Concurrent: [DEPRECATED] Java 7 is deprecated, please use Java 8.
Java 7 support is only best effort, it may not work. It will be removed in next release (1.0).

So we install Oracle Java, version 8.

Installation of Oracle Java 8

Download the tarball:

# wget --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u60-b27/jdk-8u60-linux-x64.tar.gz

Create the folder /opt/jdk and move the tarball there:

# mkdir /opt/jdk
# mv jdk-8u60-linux-x64.tar.gz /opt/jdk/

Extract the tarball there:

# cd /opt/jdk
# tar -xzvf jdk-8u60-linux-x64.tar.gz

Update alternatives:

# update-alternatives --install /usr/bin/java java /opt/jdk/jdk1.8.0_60/bin/java 100
# update-alternatives --install /usr/bin/javac javac /opt/jdk/jdk1.8.0_60/bin/javac 100

Display the priorities and the version of Java:

# update-alternatives --display java
# java -version

java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-1~deb8u1)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

java is still not pointing to the Oracle version, so we increase the priority from 100 to 10000:

# update-alternatives --install /usr/bin/java java /opt/jdk/jdk1.8.0_60/bin/java 10000
# update-alternatives --install /usr/bin/javac javac /opt/jdk/jdk1.8.0_60/bin/javac 10000
# java -version

java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
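
As an alternative to raising priorities, you can also switch between the installed Java versions interactively (the menu entries depend on your system):

# update-alternatives --config java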


Installation of Kibana 4

Download Kibana4:

# wget https://download.elastic.co/kibana/kibana/kibana-4.1.1-linux-x64.tar.gz

Extract the file:

# tar xvf kibana-4.1.1-linux-x64.tar.gz

Move it to /opt and rename it to something shorter:

# mv kibana-4.1.1-linux-x64/ /opt/
# mv /opt/kibana-4.1.1-linux-x64/ /opt/kibana4

Launch Kibana:

# cd /opt/kibana4/
# ./bin/kibana > /dev/null &
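
Before opening a browser, you can check locally that Kibana is listening on its default port 5601; this should print an HTTP status code such as 200:

# curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5601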

Check that Kibana is working in a browser:

http://141.52.X.X:5601

Access is not secured yet. In order to make it more secure, we are going to install an nginx reverse proxy in the next section.

Installation and configuration of an nginx reverse proxy (Debian 8)

Elasticsearch offers a commercial plugin called Shield which allows you to easily protect your data with a username and password, while simplifying your architecture. Advanced security features like encryption, role-based access control, IP filtering, and auditing are also available when you need them.

A Gold or Platinum subscription is necessary to use it. We do not have one for the tutorial, so we secure our Kibana server using an nginx reverse proxy instead.

The steps are the following.

Install nginx and apache2-utils (see man aptitude in case you do not know the -y option):

# aptitude install -y nginx apache2-utils

Create the folder where the certificate and private key for your server will be saved:

# mkdir /etc/nginx/ssl

Generate a self-signed certificate and private key, valid for 365 days:

# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/nginx/ssl/nginx.key -out /etc/nginx/ssl/nginx.crt
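
You can inspect the subject and validity period of the generated certificate:

# openssl x509 -in /etc/nginx/ssl/nginx.crt -noout -subject -dates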

Create the user admin (you can choose any other name) which will be allowed to access your Kibana server (you will be prompted for a password):

# htpasswd -c /etc/nginx/htpasswd.users admin     

Replace the contents of the file /etc/nginx/sites-enabled/default with the following:

# cat /etc/nginx/sites-enabled/default
server {
    listen 80;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;

    server_name <your-server>;

    ssl on;
    # SSLv2 and SSLv3 are disabled because of known security holes (e.g. POODLE)
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    ssl_certificate /etc/nginx/ssl/nginx.crt;
    ssl_certificate_key /etc/nginx/ssl/nginx.key;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
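
Before restarting, check the configuration for syntax errors:

# nginx -t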

Restart the nginx service:

# /etc/init.d/nginx restart

In a browser (port 80 is now redirected to https):

https://141.52.X.X

Accept the certificate and access using your username and password.

Exercise 1: Daily value of Yahoo stock market (using csv plugin)

Since we cannot use real data containing sensitive information, our first example uses the publicly available data from the Yahoo Finance web page. We have selected the daily evolution of the Yahoo stock (YHOO) from its first trading day (April 1st, 1996) until today, but you can choose another symbol.

For the Yahoo index, you can get the data from here:

# wget -O yahoo-stock.csv 'http://ichart.finance.yahoo.com/table.csv?s=YHOO&c=1962'

Take a look at this file:

# vi yahoo-stock.csv
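
The first line is the header; every following line holds one trading day (values elided here, yours will differ):

# head -n 2 yahoo-stock.csv
Date,Open,High,Low,Close,Volume,Adj Close
...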

We would like to use Logstash in order to parse the data and send it to Elasticsearch. There is a filter plugin called csv, so take a look at it:

https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html

Please try to come up with a plausible Logstash configuration file before checking the following solution (save this file as /etc/logstash/conf.d/<filename>.conf):

input {
  file {
    path => "/root/csvkit-tutorial/yahoo-stock.csv"
    sincedb_path => "/dev/null"       # do not remember the read position between runs
    start_position => "beginning"     # read the file from the start, not only new lines
    type => "stock-market"
  }
}
filter {
  csv {
      separator => ","
      columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
  }
  # the csv filter extracts every column as a string, so convert the numeric fields to floats
  mutate {convert => ["High", "float"]}
  mutate {convert => ["Open", "float"]}
  mutate {convert => ["Low", "float"]}
  mutate {convert => ["Close", "float"]}
  mutate {convert => ["Volume", "float"]}
  mutate {convert => ["Adj Close", "float"]}
}
output {
    elasticsearch {
        action => "index"
        host => "localhost"
        index => "stock"
        workers => 1
    }
}
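
Before running it, you can let Logstash validate the file first; the --configtest flag only parses the configuration and exits:

# /opt/logstash/bin/logstash agent -f /etc/logstash/conf.d/stock-market.conf --configtest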
 

Execute logstash:

# /opt/logstash/bin/logstash agent -f /etc/logstash/conf.d/stock-market.conf  > /var/log/logstash/execution-log 2>&1 &
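
You can follow the log we redirected the output to and see whether Logstash starts cleanly:

# tail -f /var/log/logstash/execution-log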

Check if the data has been imported to Elasticsearch:

# curl 'localhost:9200/_cat/indices?v'

You will see a yellow status for the stock index because we are working on a single host and the number of replicas is non-zero.

To solve it, set the property index.number_of_replicas equal to zero in the file /etc/elasticsearch/elasticsearch.yml:

index.number_of_replicas: 0
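
You can watch the effect with the cluster health API; with zero replicas on a single node, newly created indices should be green instead of yellow:

# curl 'localhost:9200/_cluster/health?pretty'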

After changing the file, restart Elasticsearch so that the setting takes effect. It is also necessary to delete the indices in Elasticsearch and to kill the current logstash process (find it with ps aux | grep logstash).

In order to delete the indices with curl:

# curl -XDELETE 'localhost:9200/stock?pretty'

Run logstash again and check if the data has been imported again.

Now, it is time to play with Kibana (in a browser):

141.52.X.X

We will show you how to do it. When you are ready, call us.

Exercise 2: Daily value of Yahoo stock market (JSON import)

You have previously downloaded the .csv file and imported it to Elasticsearch using the csv filter plugin. It is also possible to ingest JSON data directly into Elasticsearch, which is why we are going to convert the csv file to JSON format and see how the direct JSON import works.

csvkit

The csvkit web page can be found here:

http://csvkit.readthedocs.org/en/0.9.1/

Install the following packages and then csvkit using pip (the PyPA-recommended tool for installing Python packages):

# aptitude install python-dev python-pip python-setuptools build-essential
# pip install csvkit

Convert csv to JSON with the --stream option (one JSON object per line):

# csvjson --stream yahoo-stock.csv > yahoo-stream.json
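
Each line of the output is now a single JSON object (illustrative; note that all values, including the numeric ones, are quoted strings):

# head -n 1 yahoo-stream.json
{"Date": "...", "Open": "...", "High": "...", "Low": "...", "Close": "...", "Volume": "...", "Adj Close": "..."}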

Delete the stock index:

# curl -XDELETE 'localhost:9200/stock?pretty'

Check that it has been deleted:

# curl 'localhost:9200/_cat/indices?v'

Try to import the obtained JSON file into Elasticsearch with the following command:

# curl -XPOST 'localhost:9200/stock/stock-market/_bulk?pretty' --data-binary @yahoo-stream.json

There will be an error because the bulk API expects an action line before every JSON document, telling Elasticsearch what to do with it (here: index it under a given _id).

We can solve this problem manually with a small bash loop (you can also try perl, python, awk, ...):

# COUNT=1; while IFS= read -r line; do echo "{\"index\":{\"_id\":\"$COUNT\"}}"; echo "$line"; COUNT=$(( $COUNT + 1 )); done < yahoo-stream.json > yahoo-stream-ELK.json
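
Check the result; every document is now preceded by an action line (illustrative, document values elided):

# head -n 4 yahoo-stream-ELK.json
{"index":{"_id":"1"}}
{"Date": "...", "Open": "...", ...}
{"index":{"_id":"2"}}
{"Date": "...", "Open": "...", ...}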

Import the bulk-formatted JSON file to Elasticsearch:

# curl -XPOST 'localhost:9200/stock/stock-market/_bulk?pretty' --data-binary @yahoo-stream-ELK.json

Navigate to your Kibana server and check the fields for the stock index. You will see that the fields Date, Open, High, Low, Close, Volume and Adj Close are strings.

You can convert a string to a number in a JSON file by simply removing the double quotes around the numeric fields.

Solution: a perl command for removing the quotes around numbers:

# perl -pe 's{"((\d+)?(?:\.\d+)?)"}{$1}g' yahoo-stream-ELK.json > out2.json
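
Inspect the result: the numeric fields should have lost their quotes, while Date remains a quoted string (illustrative, with made-up numbers; the _id in the action lines also loses its quotes, which the bulk API accepts):

# head -n 2 out2.json
{"index":{"_id":1}}
{"Date": "...", "Open": 33.10, "High": 33.62, ...}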

Delete the stock index again, re-run the import with the new JSON file and check your Kibana server.

Comment: there are also two codec plugins called json and json_lines. Do you think that they could be used?