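In the Elasticsearch ecosystem, it is important to monitor nodes and clusters in order to manage and improve their performance and state. Several issues can arise at the cluster level, such as the following:
- Node overload: Some nodes can have too many shards allocated and become a bottleneck for the entire cluster.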
- Node shutdown: This can happen due to a number of reasons, for example, full disks, hardware failures, and power problems.
- Shard relocation problems or corruption: Some shards cannot reach an online status.
- Shards that are too large: If a shard is too big, then the index performance decreases due to the merging of massive Lucene segments.
- Empty indices and shards: These waste memory and resources. Because each shard has a lot of active threads, a large number of unused indices and shards degrades the general cluster performance.
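Cluster-level malfunctions or poor performance can be detected through an API or through frontends (which we will look at in Chapter 11, User Interfaces). These frontends give users a working web dashboard on top of their Elasticsearch data; they work by monitoring the cluster health, backing up or restoring data, and allowing queries to be tested before they are implemented in code.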
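As a rough illustration of detecting cluster-level problems through an API, the following minimal sketch turns a cluster health response into a list of warnings. It assumes a node reachable at `http://localhost:9200` without authentication; the helper names `summarize_health` and `fetch_health` are hypothetical and exist only for this example. The fields it reads (`status`, `unassigned_shards`, `relocating_shards`, `initializing_shards`) are part of the `_cluster/health` response.

```python
import json
from urllib.request import urlopen


def summarize_health(health: dict) -> list:
    """Turn a _cluster/health response into human-readable warnings.

    Hypothetical helper for illustration; returns an empty list
    when nothing in the response looks suspicious.
    """
    warnings = []
    status = health.get("status")
    if status == "red":
        warnings.append("red: at least one primary shard is not allocated")
    elif status == "yellow":
        warnings.append("yellow: at least one replica shard is not allocated")
    # Shards that are not in a started state hint at allocation trouble.
    for field in ("unassigned_shards", "relocating_shards", "initializing_shards"):
        count = health.get(field, 0)
        if count > 0:
            warnings.append(f"{field}={count}")
    return warnings


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Fetch cluster health over HTTP (assumes no authentication is needed)."""
    with urlopen(f"{base_url}/_cluster/health") as response:
        return json.load(response)
```

Against a live cluster, `summarize_health(fetch_health())` would produce the warnings; the parsing function can also be exercised on a saved response, which keeps it easy to test without a running node.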
In this chapter, we will explore the following topics:
- Using the health API to check the health of the cluster
- Using the task API that controls jobs at a cluster level
- Using hot threads to check inside nodes for problems due to high CPU usage
- Learning how to monitor Lucene segments so that having too many of them does not reduce the performance of a node
In this chapter, we will cover the following recipes:
- Controlling the cluster health using an API
- Controlling the cluster state using an API
- Getting cluster node information using an API
- Getting node statistics using an API
- Using the task management API
- Using the hot threads API
- Managing the shard allocation
- Monitoring segments with the segment API
- Cleaning the cache