
Reading Notes: Elasticsearch 7.0 Cookbook, Fourth Edition

Getting Started

In this chapter, we will cover the following recipes:

  • Downloading and installing Elasticsearch
  • Setting up networking
  • Setting up a node
  • Setting up Linux systems
  • Setting up different node types
  • Setting up a coordinator node
  • Setting up an ingestion node
  • Installing plugins in Elasticsearch
  • Removing a plugin
  • Changing logging settings
  • Setting up a node via Docker
  • Deploying on Elasticsearch Cloud Enterprise

Technical requirements

Elasticsearch runs on Linux/macOS X/Windows, and its only requirement is Java 8.x. Generally, I suggest using the Oracle JDK. The code for this chapter can be found at https://github.com/aparo/elasticsearch-7.x-cookbook.

If you don't want to go into the details of installing and configuring your Elasticsearch instance, for a quick start, you can skip to the  Setting up a node via Docker recipe at the end of this chapter and fire up Docker Compose, which will install an Elasticsearch instance with Kibana and other tools quickly.

Downloading and installing Elasticsearch

Elasticsearch has an active community, and its release cycle is very fast.

Because Elasticsearch depends on a lot of common Java libraries (Lucene, Guice, and Jackson are the most famous ones), the Elasticsearch community tries to keep them updated and to fix bugs that are discovered in them and in the Elasticsearch core. The large user base is also a source of new ideas and features for improving Elasticsearch use cases.

For these reasons, if possible, it's best to use the latest available release (which is usually the more stable and bug-free one).

Getting ready

To install Elasticsearch, you need a supported operating system (Linux/macOS X/Windows) with a Java Virtual Machine (JVM) 1.8 or later installed (the Oracle JDK is preferred; for more information on this, visit http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). A web browser is required to download the Elasticsearch binary release. At least 1 GB of free disk space is required to install Elasticsearch.

How to do it…

We will start by downloading Elasticsearch from the web. The latest version is always downloadable at https://www.elastic.co/downloads/elasticsearch. The versions that are available for different operating systems are as follows:

  • elasticsearch-{version-number}.zip and elasticsearch-{version-number}.msi are for the Windows operating systems.
  • elasticsearch-{version-number}.tar.gz is for Linux/macOS X, while elasticsearch-{version-number}.deb is for Debian-based Linux distributions (this also covers the Ubuntu family); this is installable with Debian using the dpkg -i elasticsearch-*.deb command.
  • elasticsearch-{version-number}.rpm is for Red Hat-based Linux distributions (this also covers the CentOS family). This is installable with the rpm -i elasticsearch-*.rpm command.
The preceding packages contain everything you need to start Elasticsearch. This book targets version 7.x or higher. At the time of writing, the latest and most stable version of Elasticsearch was 7.0.0. To check whether this is the latest version or not, visit https://www.elastic.co/downloads/elasticsearch.

Extract the binary content. After downloading the correct release for your platform, the installation involves expanding the archive into a working directory.

Choose a working directory that is free from charset problems and does not have a long path. This prevents problems when Elasticsearch creates its directories to store index data.

A good directory in which to install Elasticsearch could be c:\es on Windows and /opt/es on Unix and macOS X.

To run Elasticsearch, you need a JVM 1.8 or later installed. For better performance, I suggest that you use the latest Sun/Oracle version.

If you are a macOS X user and you have installed Homebrew (http://brew.sh/), the first and second steps are automatically managed by the brew install elasticsearch command.

Let's start Elasticsearch to check that everything is working. To start your Elasticsearch server, just go to the install directory and, for Linux and macOS X, execute the following:

# bin/elasticsearch

Alternatively, you can type the following command line for Windows:

# bin\elasticsearch.bat

Your server should now start up and show logs similar to the following:

[2018-10-28T16:19:41,189][INFO ][o.e.n.Node ] [] initializing ...
 [2018-10-28T16:19:41,245][INFO ][o.e.e.NodeEnvironment ] [fyBySLM] using [1] data paths, mounts [[/ (/dev/disk1s1)]], net usable_space [141.9gb], net total_space [465.6gb], types [apfs]
 [2018-10-28T16:19:41,246][INFO ][o.e.e.NodeEnvironment ] [fyBySLM] heap size [989.8mb], compressed ordinary object pointers [true]
 [2018-10-28T16:19:41,247][INFO ][o.e.n.Node ] [fyBySLM] node name derived from node ID [fyBySLMcR3uqKiYC32P5Sg]; set [node.name] to override
 [2018-10-28T16:19:41,247][INFO ][o.e.n.Node ] [fyBySLM] version[6.4.2], pid[50238], build[default/tar/04711c2/2018-09-26T13:34:09.098244Z], OS[Mac OS X/10.14/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_181/25.181-b13]
 [2018-10-28T16:19:41,247][INFO ][o.e.n.Node ] [fyBySLM] JVM arguments [-Xms1g, -Xmx1g,
... truncated ...
 [2018-10-28T16:19:42,511][INFO ][o.e.p.PluginsService ] [fyBySLM] loaded module [aggs-matrix-stats]
 [2018-10-28T16:19:42,511][INFO ][o.e.p.PluginsService ] [fyBySLM] loaded module [analysis-common]
 ...truncated...
[2018-10-28T16:19:42,513][INFO ][o.e.p.PluginsService ] [fyBySLM] no plugins loaded
 ...truncated...
[2018-10-28T16:19:46,776][INFO ][o.e.n.Node ] [fyBySLM] initialized
 [2018-10-28T16:19:46,777][INFO ][o.e.n.Node ] [fyBySLM] starting ...
 [2018-10-28T16:19:46,930][INFO ][o.e.t.TransportService ] [fyBySLM] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
 [2018-10-28T16:19:49,983][INFO ][o.e.c.s.MasterService ] [fyBySLM] zen-disco-elected-as-master ([0] nodes joined)[, ], reason: new_master {fyBySLM}{fyBySLMcR3uqKiYC32P5Sg}{-pUWNdRlTwKuhv89iQ6psg}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
 ...truncated...
[2018-10-28T16:19:50,452][INFO ][o.e.l.LicenseService ] [fyBySLM] license [b2754b17-a4ec-47e4-9175-4b2e0d714a45] mode [basic] - valid

How it works…

The Elasticsearch package generally contains the following directories:

  • bin: This contains the scripts to start and manage Elasticsearch:
    • elasticsearch.bat: This is the main executable script to start Elasticsearch.
    • elasticsearch-plugin.bat: This is a script to manage plugins.
  • config: This contains the Elasticsearch configs. The most important ones are as follows:
    • elasticsearch.yml: This is the main config file for Elasticsearch
    • log4j2.properties: This is the logging config file
  • lib: This contains all the libraries required to run Elasticsearch.
  • logs: This directory is empty at installation time, but in the future, it will contain the application logs.
  • modules: This contains the Elasticsearch default plugin modules.
  • plugins: This directory is empty at installation time, but it's the place where custom plugins will be installed.

During Elasticsearch startup, the following events happen:

  • A node name is generated automatically (that is, fyBySLM) if it is not provided in elasticsearch.yml. The name is randomly generated, so it's a good idea to set it to a meaningful and memorable name instead.
  • A node name hash is generated for this node, for example, fyBySLMcR3uqKiYC32P5Sg.
  • The default installed modules are loaded. The most important ones are as follows:
    • aggs-matrix-stats: This provides support for aggregation matrix stats.
    • analysis-common: This is a common analyzer for Elasticsearch, which extends the language processing capabilities of Elasticsearch.
    • ingest-common: These include common functionalities for the ingest module.
    • lang-expression/lang-mustache/lang-painless: These are the default supported scripting languages of Elasticsearch. 
    • mapper-extras: This provides an extra mapper type to be used, such as token_count and scaled_float.
    • parent-join: This provides an extra query, such as has_children and has_parent.
    • percolator: This provides percolator capabilities.
    • rank-eval: This provides support for the experimental rank evaluation APIs. These are used to evaluate hit scoring based on queries.
    • reindex: This provides support for reindex actions (reindex/update by query).
    • x-pack-*: All the xpack modules depend on a subscription for their activation.
  • If there are plugins, they are loaded.
  • If not configured, Elasticsearch binds the following two ports on the localhost 127.0.0.1 automatically:
    • 9300: This port is used for internal intranode communication.
    • 9200: This port is used for the HTTP REST API.
  • After starting, if indices are available, they are restored and ready to be used.

If these port numbers are already bound, Elasticsearch automatically increments the port number and tries to bind until a port is available (that is, 9201, 9202, and so on).
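This port-fallback behavior can be sketched as follows (an illustrative Python helper, not Elasticsearch's actual code):

```python
import socket

def first_free_port(start: int, max_tries: int = 100) -> int:
    """Try to bind start, start+1, ... and return the first free port,
    mimicking how Elasticsearch falls back from 9200 to 9201, 9202, and so on."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded: this port is free
            except OSError:
                continue  # already in use: try the next port
    raise RuntimeError("no free port found")
```

If port 9200 is taken, this helper (like Elasticsearch) silently moves on to 9201, which is worth remembering when a second node seems to be "missing" from the expected port.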

Many more events are fired during Elasticsearch startup. We'll see them in detail in the other recipes.

There's more…

During a node's startup, a lot of required services are automatically started. The most important ones are as follows:

  • Cluster services: This helps you manage the cluster state and intranode communication and synchronization
  • Indexing service: This helps you manage all the index operations, initializing all active indices and shards
  • Mapping service: This helps you manage the document types stored in the cluster (we'll discuss mapping in Chapter 2, Managing Mapping)
  • Network services: This includes services such as HTTP REST services (default on port 9200), and internal Elasticsearch protocol (port 9300) if the thrift plugin is installed
  • Plugin service: This manages loading the plugin 
  • Aggregation services: This provides advanced analytics on stored Elasticsearch documents such as statistics, histograms, and document grouping
  • Ingesting services: This provides support for document preprocessing before ingestion such as field enrichment, NLP processing, types conversion, and automatic field population
  • Language scripting services: This allows you to add new language scripting support to Elasticsearch

See also

Setting up networking

Correctly setting up networking is very important for your nodes and cluster.

There are a lot of different installation scenarios and networking issues. The first step for configuring the nodes to build a cluster is to correctly set up the node discovery.

Getting ready

To change the configuration files, you will need a working Elasticsearch installation and a simple text editor, as well as your current networking configuration (your IP).

How to do it…

To set up networking, use the following steps:

  1. Using a standard Elasticsearch configuration config/elasticsearch.yml file, your node will be configured to bind on the localhost interface (by default) so that it can't be accessed by external machines or nodes.
  2. To allow another machine to connect to our node, we need to set network.host to our IP (for example, I have 192.168.1.164).
  3. To be able to discover other nodes, we need to list them in the discovery.zen.ping.unicast.hosts parameter. This means that it sends signals to the machines in the unicast list and waits for a response. If a node responds, it can join the cluster.
  4. In general, from Elasticsearch version 6.x, the node versions are compatible. You must have the same cluster name (the cluster.name option in elasticsearch.yml) to let nodes join each other.
The best practice is to have all the nodes installed with the same Elasticsearch version (major.minor.release). This suggestion is also valid for third-party plugins.
  5. To customize the network preferences, you need to change some parameters in the elasticsearch.yml file, as follows:
cluster.name: ESCookBook
node.name: "Node1"
network.host: 192.168.1.164
discovery.zen.ping.unicast.hosts: ["192.168.1.164","192.168.1.165[9300-9400]"]
  6. This configuration sets the cluster name to ESCookBook, as well as the node name and the network address, and it tries to bind the node to the addresses given in the discovery section. We can check the configuration during node loading.
  7. We can now start the server and check whether the networking is configured, as follows:
    [2018-10-28T17:42:16,386][INFO ][o.e.c.s.MasterService ] [Node1] zen-disco-elected-as-master ([0] nodes joined)[, ], reason: new_master {Node1}{fyBySLMcR3uqKiYC32P5Sg}{IX1wpA01QSKkruZeSRPlFg}{192.168.1.164}{192.168.1.164:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
     [2018-10-28T17:42:16,390][INFO ][o.e.c.s.ClusterApplierService] [Node1] new_master {Node1}{fyBySLMcR3uqKiYC32P5Sg}{IX1wpA01QSKkruZeSRPlFg}{192.168.1.164}{192.168.1.164:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {Node1}{fyBySLMcR3uqKiYC32P5Sg}{IX1wpA01QSKkruZeSRPlFg}{192.168.1.164}{192.168.1.164:9300}{ml.machine_memory=17179869184, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)[, ]]])
     [2018-10-28T17:42:16,403][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [Node1] publish_address {192.168.1.164:9200}, bound_addresses {192.168.1.164:9200}
     [2018-10-28T17:42:16,403][INFO ][o.e.n.Node ] [Node1] started
     [2018-10-28T17:42:16,600][INFO ][o.e.l.LicenseService ] [Node1] license [b2754b17-a4ec-47e4-9175-4b2e0d714a45] mode [basic] - valid

    As you can see from my screen dump, the transport is bound to 192.168.1.164:9300. The REST HTTP interface is bound to 192.168.1.164:9200.

    How it works…

    The following are the most important configuration keys for network management:

    • cluster.name: This sets up the name of the cluster. Only nodes with the same name can join together.
    • node.name: If not defined, this is automatically assigned by Elasticsearch.

    node.name allows you to define a name for the node. If you have a lot of nodes on different machines, it's useful to set their names to something meaningful so that you can easily locate them. A valid name is much easier to remember than a generated one, such as fyBySLMcR3uqKiYC32P5Sg.

    You must always  set up a node.name if you need to monitor your server. Generally, a node name is the same as a host server name for easy maintenance.

    network.host defines the IP of the machine that is used to bind the node. If your server is on a different LAN, or you want to limit the bind to a single LAN, you must set this value to your server IP.

    discovery.zen.ping.unicast.hosts allows you to define a list of hosts (with ports or port ranges) to be used to discover other nodes that can join the cluster. The preferred port is the transport one, usually 9300.

    The addresses of the host list can be a mix of the following:

    • Hostname, that is, myhost1
    • IP address, that is, 192.168.1.12
    • IP address or hostname with the port, that is, myhost1:9300, 192.168.1.2:9300
    • IP address or hostname with a range of ports, that is, myhost1:[9300-9400], 192.168.1.2:[9300-9400]
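The accepted entry formats can be illustrated with a small parser (a hypothetical helper written for this note, not Elasticsearch code; parse_host and its return shape are assumptions for illustration):

```python
import re

def parse_host(entry: str, default_port: int = 9300):
    """Parse a unicast host entry into (host, [ports]).

    Accepts 'host', 'host:9300', and 'host:[9300-9400]' forms,
    mirroring the formats listed for discovery.zen.ping.unicast.hosts.
    """
    m = re.match(r"^(?P<host>[^:]+):\[(?P<lo>\d+)-(?P<hi>\d+)\]$", entry)
    if m:
        # Port-range form: expand [lo-hi] into an inclusive list of ports.
        return m["host"], list(range(int(m["lo"]), int(m["hi"]) + 1))
    if ":" in entry:
        # Single host:port form.
        host, port = entry.rsplit(":", 1)
        return host, [int(port)]
    # Bare hostname/IP: fall back to the default transport port.
    return entry, [default_port]
```

For example, parse_host("myhost1:[9300-9302]") expands the range to the three ports 9300, 9301, and 9302.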

    See also

    The Setting up a node recipe in this chapter

    Setting up a node

    Elasticsearch allows you to customize several parameters in an installation. In this recipe, we'll look at the most frequently used ones, which define where to store data and how to improve overall performance.

    Getting ready

    As described in the Downloading and installing Elasticsearch recipe, you need a working Elasticsearch installation and a simple text editor to change configuration files.

    How to do it…

    The steps required for setting up a simple node are as follows:

    1. Open the config/elasticsearch.yml file with an editor of your choice.
    2. Set up the directories that store your server data, as follows:
    • For Linux or macOS X, add the following path entries (using /opt/data as the base path):
    path.conf: /opt/data/es/conf
    path.data: /opt/data/es/data1,/opt2/data/data2
    path.work: /opt/data/work
    path.logs: /opt/data/logs
    path.plugins: /opt/data/plugins

    • For Windows, add the following path entries (using c:\Elasticsearch as the base path):
    path.conf: c:\Elasticsearch\conf
    path.data: c:\Elasticsearch\data
    path.work: c:\Elasticsearch\work
    path.logs: c:\Elasticsearch\logs
    path.plugins: c:\Elasticsearch\plugins
    3. Set up the parameters to control the standard index shard and replication at creation. These parameters are as follows:
    index.number_of_shards: 1
    index.number_of_replicas: 1

    How it works…

    The path.conf parameter defines the directory that contains your configuration files, mainly elasticsearch.yml and logging.yml. The default is $ES_HOME/config, with ES_HOME being the directory in which your Elasticsearch server is installed.

    It's useful to set up the config directory outside your application directory so that you don't need to copy the configuration files every time you update your Elasticsearch server.

    The path.data parameter is the most important one. It allows us to define one or more directories (on different disks) in which to store index data. When you define more than one directory, they are managed in a way that is similar to RAID 0 (their space is summed up), with a preference for locations with the most free space.

    The path.work parameter is a location in which Elasticsearch stores temporary files.

    The path.logs parameter is where the log files are put. These control how logs are managed in logging.yml.

    The path.plugins parameter allows you to override the plugins path (the default is $ES_HOME/plugins). It's useful to put system-wide plugins in a shared path (usually using NFS) in case you want a single place in which to store your plugins for all the clusters.

    The main parameters that are used to control indices and shards are index.number_of_shards, which controls the standard number of shards for a newly created index, and index.number_of_replicas, which controls the initial number of replicas.
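As a quick sanity check on these settings, the total number of shard copies that a new index will allocate is the number of primaries multiplied by one plus the replica count (a tiny illustrative helper, not part of Elasticsearch):

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Total shard copies (primaries + replicas) a new index will allocate."""
    return number_of_shards * (1 + number_of_replicas)
```

With the values from the recipe (one shard, one replica), an index allocates two shard copies, so a two-node cluster can hold a full copy of the data on each node.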

    See also

    Setting up Linux systems

    If you are using a Linux system (typically in a production environment), you need to manage extra settings to improve performance or to resolve production problems with many indices.

    This recipe covers the following two common errors that happen in production:

    • Too many open files that can corrupt your indices and your data
    • Slow performance in search and indexing due to the garbage collector
    Big problems arise when you run out of disk space. In this scenario, some files can get corrupted. To prevent your indices from corruption and possible data loss, it is best to monitor the storage spaces. Default settings prevent index writing and block the cluster if your storage is over 80% full.

    Getting ready

    As we described in the Downloading and installing Elasticsearch recipe in this chapter, you need a working Elasticsearch installation and a simple text editor to change configuration files.

    How to do it…

    To improve performance on Linux systems, we will perform the following steps:

    1. First, you need to change the current limit for the user that runs the Elasticsearch server. In these examples, we will call this user elasticsearch.
    2. To allow Elasticsearch to manage a large number of files, you need to increment the number of file descriptors (number of files) that a user can manage. To do so, you must edit your /etc/security/limits.conf file and add the following lines at the end:
    elasticsearch - nofile 65536
    elasticsearch - memlock unlimited
    3. Then, a machine restart is required to be sure that the changes have been made.
    4. Newer versions of Ubuntu (that is, version 16.04 or later) can skip the /etc/security/limits.conf file in the init.d scripts. In these cases, you need to edit /etc/pam.d/ and remove the following comment line:
    # session required pam_limits.so
    5. To control memory swapping, you need to set up the following parameter in elasticsearch.yml:
    bootstrap.memory_lock: true
    6. To fix the memory usage size of the Elasticsearch server, we need to set up the same values for Xms and Xmx in $ES_HOME/config/jvm.options (that is, we set 1 GB of memory in this case), as follows:
    -Xms1g
    -Xmx1g

    How it works…

    The standard limit of file descriptors (https://www.bottomupcs.com/file_descriptors.xhtml), that is, the maximum number of open files for a user, is typically 1,024 or 8,096. When you store a lot of records in several indices, you run out of file descriptors very quickly, so your Elasticsearch server becomes unresponsive and your indices may become corrupted, causing you to lose your data.

    Changing the limit to a very high number means that your Elasticsearch doesn't hit the maximum number of open files.
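On Linux/macOS, you can verify the limit that a process actually inherited from a Python shell, using only the standard library (a quick diagnostic sketch; the 65536 threshold mirrors the limits.conf value set earlier in this recipe):

```python
import resource

# Query the current (soft) and maximum (hard) open-file limits for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft nofile limit: {soft}, hard nofile limit: {hard}")

# 65536 is the value configured in /etc/security/limits.conf above.
if soft != resource.RLIM_INFINITY and soft < 65536:
    print("warning: nofile limit is below 65536; Elasticsearch may run out of file descriptors")
```

Running this as the elasticsearch user after a reboot confirms that the limits.conf change actually took effect.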

    The other setting, for memory, prevents Elasticsearch from swapping memory and gives a performance boost in an environment. This setting is required because, during indexing and searching, Elasticsearch creates and destroys a lot of objects in memory. This large number of create/destroy actions fragments the memory and reduces performance. The memory then becomes full of holes and, when the system needs to allocate more memory, it suffers an overhead to find compacted memory. If you don't set bootstrap.memory_lock: true, Elasticsearch dumps the whole process memory on disk and defragments it back in memory, freezing the system. With this setting, the defragmentation step is done entirely in memory, with a huge performance boost.

    Setting up different node types

    Elasticsearch was designed for the cloud, so when you need to release a production environment with a huge number of records, and you need high availability and good performance, you need to aggregate more nodes in a cluster.

    Elasticsearch allows you to define different types of nodes to balance and improve overall performance.

    Getting ready

    As described in the Downloading and installing Elasticsearch recipe, you need a working Elasticsearch installation and a simple text editor to change configuration files.

    How to do it…

    For the advanced setup of a cluster, some parameters must be configured to define different node types.

    These parameters are in the config/elasticsearch.yml file, and they can be set with the following steps:

    1. Set up whether the node can be a master or not, as follows:
    node.master: true
    2. Set up whether a node must contain data or not, as follows:
    node.data: true
    3. Set up whether a node can work as an ingest node, as follows:
    node.ingest: true

    How it works…

    The node.master parameter establishes whether the node can become a master of the cloud. The default value for this parameter is true. A master node is an arbiter of the cloud; it takes decisions about shard management, keeps the cluster status, and is the main controller of every index action. If your master nodes are overloaded, the whole cluster will suffer performance penalties. The master node is the node that distributes the search across all the data nodes and aggregates/rescores the results so that they can be returned to the user. In big data terms, it's the reduce layer in Elasticsearch's map/reduce search.

    The number of master-eligible nodes should always be odd, so that a clear majority quorum can be formed.

    The node.data parameter allows you to store data in the node. The default value for this parameter is true. This node will be a worker that is responsible for indexing and searching data.

    By mixing these two parameters, it's possible to have different node types, as shown in the following table:

    node.master | node.data | Node description
    true        | true      | This is the default node. It can be a master and contain data.
    false       | true      | This node never becomes a master node; it only holds data. It can be defined as the workhorse of your cluster.
    true        | false     | This node only serves as a master, to avoid storing any data and to have free resources. This will be the coordinator of your cluster.
    false       | false     | This node acts as a search load balancer (fetching data from nodes, aggregating results, and so on). This kind of node is also called a coordinator or client node.

    The most frequently used node type is the first one, but if you have a very big cluster or special needs, you can change the scope of your nodes to better serve searches and aggregations.
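The four combinations in the table can be expressed as a small lookup (an illustrative helper; the role labels are this book's descriptions, not Elasticsearch API values):

```python
def node_role(master: bool, data: bool) -> str:
    """Classify a node by its node.master/node.data flags, per the table above."""
    if master and data:
        return "default (master-eligible and holds data)"
    if not master and data:
        return "data-only (workhorse of the cluster)"
    if master and not data:
        return "dedicated master (cluster coordinator, no data)"
    return "coordinator/client (search load balancer, no data)"
```

For example, node_role(False, False) describes the coordinator node configured in the next recipe.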

    There's more…

    Regarding the number of master nodes, there are settings that require at least half of them plus one to be active to ensure that the cluster is in a safe state (no risk of split brain: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/modules-node.html#split-brain). This setting is discovery.zen.minimum_master_nodes, and it must be set to the following equation:

    (master_eligible_nodes / 2) + 1

    To have a High Availability (HA) cluster, you need at least three master-eligible nodes, with the minimum_master_nodes value set to 2.
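The quorum equation uses integer division; as a sketch:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """discovery.zen.minimum_master_nodes quorum: (master_eligible / 2) + 1."""
    return master_eligible // 2 + 1
```

For a three-node HA cluster this yields 2, matching the recommendation above; note that four master-eligible nodes require a quorum of 3, which is why growing from three to four masters does not improve fault tolerance.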

    Setting up a coordinator node

    The master nodes that we have seen previously are the most important for cluster stability. To prevent queries and aggregations from creating instability in your cluster, coordinator (or client/proxy) nodes can be used to provide safe communication with the cluster.

    Getting ready

    As we described in the Downloading and installing Elasticsearch recipe in this chapter, you need a working Elasticsearch installation, as well as a simple text editor to change configuration files.

    How to do it…

    For the advanced setup of a cluster, some parameters must be configured to define different node types.

    These parameters are in the config/elasticsearch.yml file, and a coordinator node can be set up with the following steps:

    1. Set up the node so that it's not a master, as follows:
    node.master: false
    2. Set up the node to not contain data, as follows:
    node.data: false

    How it works…

    The coordinator node is a special node that works as a proxy/pass-through for the cluster. Its main advantages are as follows:

    • It can easily be killed or removed from the cluster without causing any problems. It's not a master, so it doesn't participate in cluster functionalities, and it doesn't contain data, so there are no data relocations/replications due to its failure.
    • It prevents the instability of the cluster due to developers' or users' bad queries. Sometimes, a user executes aggregations that are too large (that is, date histograms with a range of some years and intervals of 10 seconds), and the Elasticsearch node could crash. (In its newest versions, Elasticsearch has a structure called a circuit breaker to prevent similar issues, but there are always borderline cases that can bring instability, such as when using scripting. The coordinator node is not a master, and its overload doesn't cause any problems for cluster stability.)
    • If the coordinator or client node is embedded in the application, there are fewer round trips for the data, speeding up the application.
    • You can add them to balance the search and aggregation throughput without generating changes and data relocation in the cluster.

    Setting up an ingestion node

    The main goals of Elasticsearch are indexing, searching, and analytics, but it's often required to modify or enhance documents before storing them in Elasticsearch.

    The following are the most common scenarios in this case:

    • Preprocessing the log string to extract meaningful data
    • Enriching the content of textual fields with Natural Language Processing (NLP) tools
    • Enriching the content using machine learning (ML) computed fields
    • Adding data modification or transformation during ingestion, such as the following:
      • Converting IP in geolocalization
      • Adding datetime fields at ingestion time
      • Building custom fields (via scripting) at ingestion time
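Such transformations are declared as ingest pipelines (covered in Chapter 12, Using the Ingest Module). As a taste, here is a hypothetical pipeline body, built as a Python dict, that adds an ingest timestamp and extracts fields from a log line with the grok processor (the field names are invented for illustration):

```python
# A hypothetical ingest pipeline body: adds an ingestion timestamp and
# parses a log message into structured fields via grok patterns.
pipeline = {
    "description": "Example: timestamp + grok extraction at ingest time",
    "processors": [
        # The set processor fills a field at ingest time.
        {"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}},
        # The grok processor extracts structured fields from a log string.
        {"grok": {
            "field": "message",
            "patterns": ["%{IP:client_ip} %{WORD:http_method} %{URIPATHPARAM:path}"],
        }},
    ],
}
```

Registering such a body under a pipeline name (via the PUT _ingest/pipeline/<name> REST call) makes it runnable on any ingest-enabled node.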

    Getting ready

    You need a working Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe, as well as a simple text editor to change configuration files.

    How to do it…

    To set up an ingest node, you need to edit the config/elasticsearch.yml file and set the ingest property to true, as follows:

    node.ingest: true
    Every time you change your elasticsearch.yml file, a node restart is required.

    How it works…

    The default configuration for Elasticsearch is to set the node as an ingest node (refer to Chapter 12, Using the Ingest Module, for more information on the ingestion pipeline).

    As with the coordinator node, using the ingest node is a way to provide functionality to Elasticsearch without affecting cluster safety.

    If you want to prevent a node from being used for ingestion, you need to disable it with node.ingest: false. It's a best practice to disable this in the master and data nodes to prevent ingestion error issues and to protect the cluster. The coordinator node is the best candidate to be an ingest node.

    If you are using NLP, attachment extraction (via the attachment ingest plugin), or logs ingestion, the best practice is to have a pool of coordinator nodes (no master, no data) with ingestion active.

    The attachment and NLP plugins in previous versions of Elasticsearch were available in the standard data nodes or master nodes. These gave Elasticsearch a lot of problems, due to the following reasons:

    • High CPU usage for NLP algorithms that saturates all CPU on the data node, giving bad indexing and searching performances
    • Instability due to the bad format of attachment and/or Apache Tika bugs (the library used for managing document extraction)
    • NLP or ML algorithms require a lot of CPU or stress the Java garbage collector, decreasing the performance of the node

    The best practice is to have a pool of coordinator nodes with ingestion enabled to provide the best safety for the cluster and the ingestion pipeline.

    There's more…

    Having knowledge of the four kinds of Elasticsearch nodes, you can easily understand that a waterproof architecture designed to work with Elasticsearch should be similar to the following:

    [figure: reference node architecture diagram]

    Installing plugins in Elasticsearch

    One of the main features of Elasticsearch is the possibility of extending it with plugins. Plugins extend Elasticsearch features and functionality in several ways.

    In Elasticsearch, these plugins are native plugins. They are JAR files that contain application code, and are used for the following reasons:

    • Script engines
    • Custom analyzers, tokenizers, and scoring
    • Custom mapping
    • REST entry points
    • Ingestion pipeline stages
    • Supporting new storages (Hadoop, GCP Cloud Storage)
    • Extending X-Pack (that is, with a custom authorization provider)

    Getting ready

    You need a working Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe, as well as a prompt/shell to execute commands in the Elasticsearch install directory.

    How to do it…

    Elasticsearch provides a script for the automatic download and installation of plugins in the bin/ directory, called elasticsearch-plugin.

    The steps that are required to install a plugin are as follows:

    1. Call the elasticsearch-plugin install command with the plugin name reference.

    To install the ingest-attachment plugin, which is used to extract text from files, simply type the following command if you're using Linux:

    bin/elasticsearch-plugin install ingest-attachment

    For Windows, type the following command:

    elasticsearch-plugin.bat install ingest-attachment
    2. If the plugin needs to change security permissions, a warning is prompted, and you need to accept it if you want to continue.
    3. During the node's startup, check that the plugin is correctly loaded.

    In the following screenshot, you can see the installation and startup of the Elasticsearch server, with the installed plugin:

    [screenshot: Elasticsearch server startup with the installed plugin]

    Remember that a plugin installation requires an Elasticsearch server restart.

    How it works…

    The elasticsearch-plugin script (elasticsearch-plugin.bat on Windows) is a wrapper for the Elasticsearch plugin manager. It can be used to install or remove a plugin (with the remove option).

    There are several ways to install a plugin, such as the following:

    • Passing the URL of the plugin (ZIP archive), as follows:
    bin/elasticsearch-plugin install http://mywoderfulserve.com/plugins/awesome-plugin.zip
    • Passing the file path of the plugin (ZIP archive), as follows:
    bin/elasticsearch-plugin install file:///tmp/awesome-plugin.zip
    • Using the install parameter with the GitHub repository of the plugin. The install parameter, which must be given, is formatted in the following way:
    <username>/<repo>[/<version>]

    During the installation process, the Elasticsearch plugin manager is able to do the following:

    • Download the plugin
    • Create a plugins directory in ES_HOME/plugins, if it's missing
    • Optionally, ask if the plugin wants special permission to be executed
    • Unzip the plugin content in the plugin directory
    • Remove temporary files

    The installation process is fully automatic; no further actions are required. The user must only pay attention to the fact that the process ends with an Installed message to be sure that the installation process completed correctly.

    A server restart is always required to be sure that the plugins are correctly loaded by Elasticsearch.

    There's more…

    If your current Elasticsearch application depends on one or more plugins, a node can be configured to start up only if these plugins are installed and available. To achieve this behavior, you can provide the plugin.mandatory directive in the elasticsearch.yml configuration file.

    For the previous example (ingest-attachment), the configuration line to add is as follows:

    plugin.mandatory: ingest-attachment

    There are also some hints to remember while installing plugins: updating some plugins in a node environment can cause malfunctions due to different plugin versions on different nodes. If you have a big cluster, for safety, it's better to check for updates in a separate environment to prevent problems (and remember to upgrade the plugin on every node).

    To prevent an update of the Elasticsearch server version (which, in Elasticsearch 5.x or later, can break your custom binary plugins due to some internal API changes), plugins need to have the same version of the Elasticsearch server in their manifest.

    Upgrading an Elasticsearch server version means upgrading all the installed plugins, too.

    See also

    Removing a plugin

    You have installed some plugins, and now you need to remove one because it's no longer required. Removing an Elasticsearch plugin is easy if everything goes right; otherwise, you will need to remove it manually.

    This recipe covers both cases.

    Getting ready

    You need a working Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe, and a prompt or shell to execute commands in the Elasticsearch install directory. Before removing a plugin, it is safer to stop the Elasticsearch server to prevent errors caused by the deletion of the plugin JAR.

    How to do it…

    The steps to remove a plugin are as follows:

    1. Stop your running node to prevent exceptions caused by the removal of a file.
    2. Use the Elasticsearch plugin manager, which comes with its script wrapper (bin/elasticsearch-plugin).

    On Linux and macOS X, type the following command:

    elasticsearch-plugin remove ingest-attachment

    On Windows, type the following command:

    elasticsearch-plugin.bat remove ingest-attachment
    3. Restart the server.

    How it works…

    The plugin manager's remove command tries to detect the correct name of the plugin and removes the directory of the installed plugin.

    If there are undeletable files in your plugin directory (or strange astronomical events hit your server), the plugin script may fail to remove the plugin, so you need to do so manually by following these steps:

    1. Go into the plugins directory
    2. Remove the directory with your plugin name

    Changing logging settings

    The standard logging settings work very well for general usage.

    Changing the log level can be useful for checking for bugs, or for understanding malfunctions due to bad configuration or strange plugin behaviors. A verbose log can be used by the Elasticsearch community to solve such problems.

    If you need to debug your Elasticsearch server or change how the logging works (that is, to ship events remotely), you need to change the log4j2.properties file.

    Getting ready

    As we described in the Downloading and installing Elasticsearch recipe, you need a working Elasticsearch installation, as well as a simple text editor to change configuration files.

    How to do it…

    In the config directory of your Elasticsearch install directory, there is a log4j2.properties file that controls the working settings.

    The steps that are required to change the logging settings are as follows:

    1. To emit every kind of logging Elasticsearch could produce, you can change the current root level logging, which is as follows:
    rootLogger.level = info
    2. This needs to be changed to the following:
    rootLogger.level = debug
    3. Now, if you start Elasticsearch from the command line (with bin/elasticsearch -f), you should see a lot of information, like the following, which is not always useful (except to debug unexpected issues):

    [screenshot: verbose debug-level startup output]

    How it works…

    The Elasticsearch logging system is based on the log4j library (https://logging.apache.org/log4j/).

    Log4j is a powerful library that is used to manage logging. Covering all of its functionality is outside the scope of this book; if you need advanced usage, there are a lot of books and articles about it on the internet.

    Setting up a node via Docker

    Docker (https://www.docker.com/) has become a common way of deploying application servers for testing or production.

    Docker is a container system that makes it possible to easily deploy replicable installations of server applications. With Docker, you don't need to set up a host, configure it, download the Elasticsearch server, unzip it, or start the server; everything is done automatically by Docker.

    Getting ready

    You need a working Docker installation to be able to execute Docker commands (see https://www.docker.com/ for setup instructions).

    How to do it…

    1. If you want to start a vanilla server, just execute the following command:
    docker pull docker.elastic.co/elasticsearch/elasticsearch:7.0.0
    2. An output similar to the following will be shown:
    7.0.0: Pulling from elasticsearch/elasticsearch
     256b176beaff: Already exists
     1af8ca1bb9f4: Pull complete
     f910411dc8e2: Pull complete
     0c0400545052: Pull complete
     6e4d2771ff41: Pull complete
     a14f19907b79: Pull complete
     ea299a414bdf: Pull complete
     a644b305c472: Pull complete
     Digest: sha256:3da16b2f3b1d4e151c44f1a54f4f29d8be64884a64504b24ebcbdb4e14c80aa1
     Status: Downloaded newer image for docker.elastic.co/elasticsearch/elasticsearch:7.0.0
    3. After downloading the Elasticsearch image, we can start a development instance that can be accessed from outside Docker:
    docker run -p 9200:9200 -p 9300:9300 -e "http.host=0.0.0.0" -e "transport.host=0.0.0.0" docker.elastic.co/elasticsearch/elasticsearch:7.0.0

    You will see the output of the Elasticsearch server starting up.

    4. In another window/Terminal, to check if the Elasticsearch server is running, execute the following command:
    docker ps

    The output will be similar to the following:

    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
     b99b252732af docker.elastic.co/elasticsearch/elasticsearch:7.0.0 "/usr/local/bin/dock…" 2 minutes ago Up 2 minutes 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp gracious_bassi
    5. The default exported ports are 9200 and 9300.

    How it works…

    The Docker container provides a Debian Linux installation with Elasticsearch installed.

    The Elasticsearch Docker installation is easily repeatable and doesn't require a lot of editing and configuration.

    The default installation can be tuned in several ways, for example:

    1. You can pass a parameter to Elasticsearch via the command line using the -e flag, as follows:
    docker run -d docker.elastic.co/elasticsearch/elasticsearch:7.0.0 elasticsearch -e "node.name=NodeName"
    2. You can customize the default settings of the environment by providing a custom Elasticsearch configuration through a volume mount point at /usr/share/elasticsearch/config, as follows:
    docker run -d -v "$PWD/config":/usr/share/elasticsearch/config docker.elastic.co/elasticsearch/elasticsearch:7.0.0
    3. You can persist the data between Docker reboots by configuring a local data mount point to store index data. The path to be used as a mount point is /usr/share/elasticsearch/data, as follows:
    docker run -d -v "$PWD/esdata":/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:7.0.0

    There's more…

    The official Elasticsearch images are not the only ones available for Docker. There are also several custom images for different purposes. Some of these are optimized for large cluster deployments or for more complex Elasticsearch cluster topologies than the standard ones.

    Docker is very handy for testing several versions of Elasticsearch in a clean way, without having to install too much stuff on the host machine.

    In the code repository directory ch01/docker/, there is a docker-compose.yaml file that provides a full environment that will set up the following elements:

    • elasticsearch, which will be available at http://localhost:9200
    • kibana, which will be available at http://localhost:5601
    • cerebro, which will be available at http://localhost:9000

    To install all the applications, you can simply execute docker-compose up -d. All the required binaries will be downloaded and installed in Docker, and they will then be ready to be used.

    See also

    Deploying on Elasticsearch Cloud Enterprise

    Elastic, the company behind Elasticsearch, provides Elastic Cloud Enterprise (ECE), the same tool that is used in Elastic Cloud (https://www.elastic.co/cloud), and it is provided for free. This solution, which is available as a PaaS on AWS or GCP (Google Cloud Platform), can be installed on-premises to provide an enterprise solution on top of Elasticsearch.

    If you need to manage multiple Elastic deployments across teams or geographies, you can leverage ECE to centralize deployment management for the following capabilities:

    • Provisioning
    • Monitoring
    • Scaling
    • Replication
    • Upgrades
    • Backup and restoring

    Centralizing the management of deployments with ECE enforces uniform versioning, data governance, backup, and user policies. Increased hardware utilization through better management can also reduce the total cost.

    Getting ready

    Since this solution targets large installations with many servers, the minimum test requirement is a node with 8 GB of RAM. The ECE solution sits on top of Docker, which must be installed on the nodes.

    ECE supports only some operating systems, such as the following:

    • Ubuntu 16.04 with Docker 18.03
    • Ubuntu 14.04 with Docker 1.11
    • RHEL/CentOS 7+ with Red Hat Docker 1.13

    On other configurations, ECE may work, but it is not supported in the event of problems.

    How to do it…

    Before installing ECE, the following prerequisites need to be checked:

    1. Your user must be a Docker-enabled one. In the case of an error due to a non-Docker user, add your user with sudo usermod -aG docker $USER.
    2. In the case of an error when you try to access /mnt/data, give your user permission to access this directory.
    3. You need to add the following line to your /etc/sysctl.conf (a reboot is required): vm.max_map_count = 262144.
    4. To be able to use ECE, it must initially be installed on the first host, as follows:
    bash <(curl -fsSL https://download.elastic.co/cloud/elastic-cloud-enterprise.sh) install

    The installation process should manage all of these steps automatically.


    Finally, the installer should provide your credentials so that you can access your cluster, in an output similar to the following:

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     Elastic Cloud Enterprise installation completed successfully
    Ready to copy down some important information and keep it safe?
    Now you can access the Cloud UI using the following addresses:
    http://192.168.1.244:12400
    https://192.168.1.244:12443
    
    Admin username: admin
    Password: OCqHHqvF0JazwXPm48wfEHTKN0euEtn9YWyWe1gwbs8
    Read-only username: readonly
    Password: M27hoE3z3v6x5xyHnNleE5nboCDK43X9KoNJ346MEqO
    
    Roles tokens for adding hosts to this installation:
    Basic token (Don't forget to assign roles to new runners in the Cloud UI after installation.)
    eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJiZDI3NjZjZi1iNWExLTQ4YTYtYTRlZi1iYzE4NTlkYjQ5ZmEiLCJyb2xlcyI6W10sImlzcyI6ImN1cnJlbnQiLCJwZXJzaXN0ZW50Ijp0cnVlfQ.lbh9oYPiJjpy7gI3I-_yFBz9T0blwNbbwtWF_-c_D3M
    
    Allocator token (Simply need more capacity to run Elasticsearch clusters and Kibana? Use this token.)
    eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJjYTk4ZDgyNi1iMWYwLTRkZmYtODBjYS0wYWYwMTM3M2MyOWYiLCJyb2xlcyI6WyJhbGxvY2F0b3IiXSwiaXNzIjoiY3VycmVudCIsInBlcnNpc3RlbnQiOnRydWV9.v9uvTKO3zgaE4nr0SDfg6ePrpperIGtvcGVfZHtmZmY
    Emergency token (Lost all of your coordinators? This token will save your installation.)
    eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiI5N2ExMzg5Yi1jZWE4LTQ2MGItODM1ZC00MDMzZDllNjAyMmUiLCJyb2xlcyI6WyJjb29yZGluYXRvciIsInByb3h5IiwiZGlyZWN0b3IiXSwiaXNzIjoiY3VycmVudCIsInBlcnNpc3RlbnQiOnRydWV9._0IvJrBQ7RkqzFyeFGhSAQxyjCbpOO15qZqhzH2crZQ
    
    To add hosts to this Elastic Cloud Enterprise installation, include the following parameters when you install the software
    on additional hosts: --coordinator-host 192.168.1.244 --roles-token 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJiZDI3NjZjZi1iNWExLTQ4YTYtYTRlZi1iYzE4NTlkYjQ5ZmEiLCJyb2xlcyI6W10sImlzcyI6ImN1cnJlbnQiLCJwZXJzaXN0ZW50Ijp0cnVlfQ.lbh9oYPiJjpy7gI3I-_yFBz9T0blwNbbwtWF_-c_D3M'
    
    These instructions use the basic token, but you can substitute one of the other tokens provided. You can also generate your own tokens. For example:
    curl -H 'Content-Type: application/json' -u admin:OCqHHqvF0JazwXPm48wfEHTKN0euEtn9YWyWe1gwbs8 http://192.168.1.244:12300/api/v1/platform/configuration/security/enrollment-tokens -d '{ "persistent": true, "roles": [ "allocator"] }'
    
    To learn more about generating tokens, see Generate Role Tokens in the documentation.
    
    System secrets have been generated and stored in /mnt/data/elastic/bootstrap-state/bootstrap-secrets.json.
    Keep the information in the bootstrap-secrets.json file secure by removing the file and placing it into secure storage, for example.
    
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
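    The role tokens in the preceding output are JSON Web Tokens (JWTs), so you can inspect their role claims by base64-decoding the payload segment. Here is a small sketch in Python, using the allocator token from the sample output above:

    ```python
    import base64
    import json

    def jwt_payload(token: str) -> dict:
        """Decode the (unverified) payload segment of a JWT."""
        payload = token.split(".")[1]
        # JWTs use unpadded base64url; restore the padding before decoding
        payload += "=" * (-len(payload) % 4)
        return json.loads(base64.urlsafe_b64decode(payload))

    # Allocator token from the sample installer output above
    allocator_token = (
        "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9."
        "eyJzdWIiOiJjYTk4ZDgyNi1iMWYwLTRkZmYtODBjYS0wYWYwMTM3M2MyOWYiLCJyb2xlcyI6"
        "WyJhbGxvY2F0b3IiXSwiaXNzIjoiY3VycmVudCIsInBlcnNpc3RlbnQiOnRydWV9."
        "v9uvTKO3zgaE4nr0SDfg6ePrpperIGtvcGVfZHtmZmY"
    )
    print(jwt_payload(allocator_token)["roles"])  # → ['allocator']
    ```

    The basic token carries an empty roles list (roles are assigned later in the Cloud UI), while the emergency token carries the coordinator, proxy, and director roles.
    
    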
    2. In my case, I can access the installed interface at http://192.168.1.244:12400.

    After logging into the administration interface, you will see your current cloud status.


    3. You can now click on Create Deployment to fire up your first Elasticsearch cluster.


    4. You need to define a name (for example, book-cluster). Using the standard options here is okay. After clicking Create Deployment, ECE will start to build your cluster.


    5. After a few minutes, the cluster should be up and running.


      How it works…

      Elasticsearch Cloud Enterprise allows you to manage a large Elasticsearch cloud service, creating instances via deployments. By default, a standard deployment fires up an Elasticsearch node with 4 GB of RAM and 32 GB of disk, plus a Kibana instance.

      You can define many parameters during the deployment of Elasticsearch, such as the following:

      • The RAM used for the instances, from 1 GB to 64 GB. The storage is proportional to the memory, so you can go from 1 GB of RAM with 128 GB of storage up to 64 GB of RAM with 2 TB of storage.
      • Whether the node requires machine learning (ML).
      • The master configuration, if you have more than six data nodes.
      • The plugins that are required to be installed.

      For Kibana, you can only configure the memory (from 1 GB to 8 GB) and pass extra parameters (generally used for custom maps).

      ECE performs all the provisioning; if you need monitoring components and other X-Pack features, it is able to automatically configure your cluster to manage all the required functionalities.

      Elasticsearch Cloud Enterprise is very useful if you need to manage many Elasticsearch/Kibana clusters, because it takes care of all the infrastructure issues.

      One of the benefits of using a deployed Elasticsearch cluster is that a proxy is installed during the deployment. This is very handy for managing and debugging Elasticsearch calls.

      See also