ElasticSearch Notes
Intro
- See: https://github.com/codingexplained/complete-guide-to-elasticsearch
- See: https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html
Elasticsearch is the distributed search and analytics engine ... Elasticsearch provides near real-time search and analytics for all types of data. 
- Often used for Application Performance Management (APM) - analyse application logs and system metrics, most often to detect errors or understand resource usage, etc.
 
- Logstash - Input > Filters > Output
- Each stage can have plugins
- E.g. reading in log files: the file is the input, and Logstash treats each line as an event
- The filter stage parses the input data to make sense of it, i.e. structures unstructured data - e.g. via a "grok" pattern
- Takes a raw line and turns it into fields, to send to e.g. Elasticsearch (see the pipeline sketch below)
 
 
- Allows separation of concerns - the apps sending the data don't need to know how it should be processed - Logstash handles that logic.
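A minimal Logstash pipeline sketch of the above - the log path, index name, and log line format are assumptions, not from any particular app:

```
input {
  file {
    path => "/var/log/myapp/app.log"    # each line becomes an event
  }
}

filter {
  grok {
    # structure a raw line like "2024-05-01T12:00:00 ERROR something broke" into fields
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs"
  }
}
```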
 
- X-Pack - adds user authentication and access control to the Elastic Stack
- Monitoring
- Reporting
- Elasticsearch SQL - normally you use the Query DSL; this makes things easier for SQL-familiar developers
- Translates SQL to Query DSL.
- A translate API also exists (see the sketch after this list)
- Helper tool to get started - Query DSL is probably best to use once familiar
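A sketch against a local node, assuming a made-up "logs" index with a "level" field: run a SQL query, then ask ES to translate the same SQL into Query DSL.

```sh
# Run SQL directly (tabular text output)
curl -XPOST "http://localhost:9200/_sql?format=txt" -H 'Content-Type: application/json' \
  -d'{ "query": "SELECT level, COUNT(*) FROM logs GROUP BY level" }'

# See the Query DSL that the same SQL translates to
curl -XPOST "http://localhost:9200/_sql/translate" -H 'Content-Type: application/json' \
  -d'{ "query": "SELECT level, COUNT(*) FROM logs GROUP BY level" }'
```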
 
 
- Beats - a collection of lightweight data shippers
- Single purpose - send data to Logstash or ES - e.g. Filebeat sends log files to ES
- e.g. Metricbeat sends resource usage info to ES (a minimal Filebeat config sketch follows below)
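A minimal Filebeat sketch of the "single-purpose shipper" idea - the log path and ES host are assumptions:

```yaml
# filebeat.yml (sketch) - tail an application log and ship each line to Elasticsearch
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log

output.elasticsearch:
  hosts: ["http://localhost:9200"]
```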
 
 
- Single-node Docker image for learning:
- docker pull docker.elastic.co/elasticsearch/elasticsearch:7.16.3
- docker run -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.16.3
- Took quite some time to start up for me on an i7 with 16 GB RAM, although the system was doing other things.
 
- curl http://127.0.0.1:9200 to check whether it is running okay
 
- Run Kibana and ES:- See https://www.elastic.co/guide/en/kibana/current/docker.html
- docker network create elastic
- docker pull docker.elastic.co/elasticsearch/elasticsearch:7.16.3
- docker run --name es01-test --net elastic -p 127.0.0.1:9200:9200 -p 127.0.0.1:9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.16.3
- docker pull docker.elastic.co/kibana/kibana:7.16.3
- docker run --name kib01-test --net elastic -p 127.0.0.1:5601:5601 -e "ELASTICSEARCH_HOSTS=http://es01-test:9200" docker.elastic.co/kibana/kibana:7.16.3
- To access Kibana, go to http://localhost:5601
 
- Some ES directory structure - notable binaries and config files:

```
bin/
  ...
  elasticsearch-cli
  elasticsearch-plugin          # Install plugins
  elasticsearch-sql-cli         # Run SQL-like queries instead of using the Query DSL
config/
  elasticsearch-plugins.example.yml
  elasticsearch.keystore
  elasticsearch.yml             # <<< The main config file
  jvm.options                   # <<< Runs on the JVM - heap size is the main thing to modify
  jvm.options.d
  log4j2.file.properties
  log4j2.properties
  role_mapping.yml
  roles.yml
  users
  users_roles
```

- elasticsearch.yml: settings are commented out by default, so defaults are used
- cluster.name: best practice to set this!
- node.name: best practice to set this!
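A minimal elasticsearch.yml sketch for those two settings (the names here are just examples):

```yaml
# elasticsearch.yml - everything is commented out by default; these two are worth setting
cluster.name: my-cluster
node.name: node-1
```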
- Basic Architecture
- Node: essentially an ES instance
- Can run many nodes. Each node can store part of the data set: distributed storage = large storage.
- Node == ES instance, so you can run many nodes on one machine.
- Each node belongs to a cluster
- A node is always part of a cluster, even if it is the only node in the cluster.

- Cluster - a collection of nodes. Splitting into multiple clusters is normally for logical separation.
- Document - a unit of data stored in a cluster. A JSON object.
- Index: every document is stored within an index. Groups documents together logically.
- Provides scalability and availability settings.
- Search queries are run against indices (see the create/index/get sketch below).
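A sketch against a local node - the index name and document fields are made up: create an index, store a JSON document in it, then fetch it back by id.

```sh
# Create an index (one primary shard, no replicas - fine for a single-node setup)
curl -XPUT "http://localhost:9200/products" -H 'Content-Type: application/json' \
  -d'{ "settings": { "number_of_shards": 1, "number_of_replicas": 0 } }'

# Store a document with id 1
curl -XPUT "http://localhost:9200/products/_doc/1" -H 'Content-Type: application/json' \
  -d'{ "name": "Coffee mug", "price": 7.99 }'

# Fetch it back
curl -XGET "http://localhost:9200/products/_doc/1"
```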
 
 
 
 
- Basic cURL queries (Kibana Dev Tools makes this way easier!)
- Local ES:
- curl -XGET "http://localhost:9200/_cluster/health"
- curl -XGET "http://localhost:9200/_cat/indices?v"
- curl -XGET "http://localhost:9200/.kibana/_search" -H 'Content-Type: application/json' -d'{ "query": { "match_all": {} } }'
 
- Cloud ES - needs authentication:
- curl -XGET -u username:password "https://3aodfff....sa.eu-central.aws.cloud.es.io:9243/.kibana/_search" -H 'Content-Type: application/json' -d'{ "query": { "match_all": {} } }'
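For comparison, the same local queries in the Kibana Dev Tools console - no headers or shell quoting needed, which is what makes it easier:

```
GET _cluster/health
GET _cat/indices?v

GET .kibana/_search
{
  "query": { "match_all": {} }
}
```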
 
 
- Sharding & Scalability
- To store 1 TB you can use 2 nodes of 0.5 TB each - node storage can be aggregated. Done using sharding.
- Sharding is a way to divide indices into smaller pieces, where each piece is a shard
- Done at the index level! A shard lives entirely on one node but can be placed on any node (shards to nodes is many-to-one).
- Horizontally scales the data volume.
- Each shard is an Apache Lucene index.
- Can improve performance by running queries on multiple shards at the same time (see the shard sketch below).
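A sketch of setting the shard count at index-creation time (the index name is made up), then inspecting where the shards were allocated:

```sh
# Create an index spread over 2 primary shards
curl -XPUT "http://localhost:9200/big-index" -H 'Content-Type: application/json' \
  -d'{ "settings": { "number_of_shards": 2, "number_of_replicas": 0 } }'

# See each shard and the node it lives on
curl -XGET "http://localhost:9200/_cat/shards/big-index?v"
```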
 
 