ElasticSearch Notes

Page Contents

Intro

See: https://github.com/codingexplained/complete-guide-to-elasticsearch
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html
Elasticsearch is the distributed search and analytics engine ... Elasticsearch provides near real-time search and analytics for all types of data.
Often used for Application Performance Management (APM)
- Analayse application logs and system metrics, most often to detect errors or understand resource usage etc.
Log stash
- Input > Filters > Output
- Each stage can have plugins
- E.g. reading in log files
  - File is input and log stash treats each line as an event
  - Filter stage parses the input data to make sense of it: i.e., structure unstructured data
    - E.g. use of "grok pattern"
    - Take raw line and turn into fields, to send for eg to ElasticSearch
- Allows seperation of concerns - the apps sending the data don't need to known about how it should be processed - logstash handles that logic.
X-Pack
- Add user authentication and access control to ElasticStash
- Monitoring
- Reporting
- Elasticsearch SQL
  - Normally used Query DSL - this makes it easier for SQL-familiaar developers
  - Translates SQL to Query DSL.
  - Translate APIs also exist
  - Helper tool to get started - Query DSL probably best to use once familiar
Beats
- Collection of light weight data shippers
- Single pupose - send data to logstash or ES.
  - e.g. FileBeat - sends log files to ES
  - e.g. MetricBeat - sends resource usage info to ES.
Single node Docker image for learning:
- docker pull docker.elastic.co/elasticsearch/elasticsearch:7.16.3
- docker run -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.16.3
  - Took quite some time to start up for me on an i7 with 16bg ram although system was doing other things.
- curl http://127.0.0.1:9200 to figure out if it is running okay
Run Kibana and ES:
- See https://www.elastic.co/guide/en/kibana/current/docker.html
- docker network create elastic docker pull docker.elastic.co/elasticsearch/elasticsearch:7.16.3 docker run --name es01-test --net elastic -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.16.3
- docker pull docker.elastic.co/kibana/kibana:7.16.3 docker run --name kib01-test --net elastic -p 127.0.0.1:5601:5601 -e "ELASTICSEARCH_HOSTS=http://es01-test:9200" docker.elastic.co/kibana/kibana:7.16.3
- To access Kibana, go to http://localhost:5601

Some ES directory structure:

Notable bin utils

 |-- bin
 |   |-- ...
 |   |-- elasticsearch-cli
 |   |-- elasticsearch-plugin     # Install plugins
 |   |-- elasticsearch-sql-cli    # Do SQL-like queries instead of using Query DSM
 |

Config

|-- config
|   |-- elasticsearch-plugins.example.yml
|   |-- elasticsearch.keystore
|   |-- elasticsearch.yml                 #<<< This is the main config file
|   |-- jvm.options                       #<<< Runs on JVM - HEAP size best thing to mod.
|   |-- jvm.options.d
|   |-- log4j2.file.properties
|   |-- log4j2.properties
|   |-- role_mapping.yml
|   |-- roles.yml
|   |-- users
|   `-- users_roles

elasticsearch.yaml'
            * Commented out by default so defaults are used
            *

cluster.name

: Best practice to set this!
            *

node.name`: Best practice to set this!

Basic Architecture
- Node: Esentially an ES instance
  - Can run many nodes. Each node can store part of data set: distributed storage = large storage.
  - Node == ES instance so can run many nodes on one machine.
  - Each node belongs to a cluster
  - Node is always part of a cluster, even if single node in cluster.
- Cluster - collection of nodes.
  - Split is normally for logical seperation.
  - Document - A unit of data stored in a cluster.
    - JSON object.
    - Index: Every document is stored within an index.
      - Groups documents together logicially.
      - Provide scalability and availablility settings.
      - Search queries are run against indicies.
Basic cURL queries (Kibana makes this waaay easier!)
- Local ES
  - curl -XGET "http://localhost:9200/_cluster/health"
  - curl -XGET "http://localhost:9208/_cat/indicies?v"
  - curl -XGET "http://localhost:9208/.kibana/_search" -H 'Content-Type: applicaton/json' -d'{ "query": { "match_all": {} }}'
- Cloud ES - needs authentication
  - `curl -XGET -u username:password "https://3aodfff....sa.eu-central.aws.cloud.es.io:9243/.kibana/_search" -H 'Content-Type: applicaton/json' -d'{ "query": { "match_all": {} }}'
Sharding & Scalability
- To store 1TB can use 2 nodes of 0.5TB - can aggregate node storage
  - Done using sharding.
  - Sharding is a way to divide inidicies into smaller pieces, where each piece is a shard
  - Done at the index level!
    - One shard must be on a single node and can be placed on any node. Many to one.
  - Horizontally scale the data volume.
  - Each shard in an Apache Lucene index
  - Can improve performance by running queries on multiple shards at the same time.

Links...

ElasticSearch Notes

Page Contents

Intro