Monitoring

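Monitoring is a key concern of any microservice architecture, and of cloud-based architectures in particular. Whatever shape it takes, your architecture needs a monitoring platform that continuously observes the system's performance, reliability, resource availability and consumption, security, and storage.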

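Choosing the right platform can be difficult, however, because many components come into play. Implementing a monitoring solution properly involves the following tasks: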

Use one platform: Choose a platform that is capable of discovering and collecting information about the running systems, and of aggregating the results comprehensively using charts.
Identify metrics and events: An application is responsible for exposing these metrics, and the platform should take only the ones that are the most relevant.
Split data: Store application-monitoring data separately from infrastructure-monitoring data, but centralize the monitoring view.
Alert: Provide alerts when limits are met, for both the application and the infrastructure; for example, when an application is performing slowly or when storage is running out of space.
Observe user experience: Response times, throughput, latency, and errors.
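
The alerting task in the list above can be illustrated with a minimal Python sketch. It is not how any particular monitoring platform implements alerts; the metric names (app_response_seconds, disk_free_bytes) and the thresholds are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """A hypothetical alert rule: fire when the metric crosses the threshold."""
    metric: str
    threshold: float
    fire_when_above: bool  # True: alert on high values; False: alert on low values


def breached(limits, samples):
    """Return the names of the metrics whose current sample crosses its limit."""
    alerts = []
    for limit in limits:
        value = samples.get(limit.metric)
        if value is None:
            continue  # no sample for this metric, nothing to evaluate
        if limit.fire_when_above and value > limit.threshold:
            alerts.append(limit.metric)
        elif not limit.fire_when_above and value < limit.threshold:
            alerts.append(limit.metric)
    return alerts


# One application-level limit and one infrastructure-level limit (made-up values).
LIMITS = [
    Limit("app_response_seconds", 2.0, fire_when_above=True),
    Limit("disk_free_bytes", 1_000_000_000, fire_when_above=False),
]

if __name__ == "__main__":
    current = {"app_response_seconds": 3.5, "disk_free_bytes": 5_000_000_000}
    print(breached(LIMITS, current))
```

The same structure covers both kinds of alert mentioned in the list: a slow application is a value that is too high, and a disk running out of space is a value that is too low.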

In this chapter, we will cover the following topics:

Prometheus
Node-exporter
Grafana

Prometheus

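There are many tools for different purposes, and several of them provide some of the monitoring capabilities described previously. New Relic, Dynatrace, and SolarWinds (to name just a few) offer powerful SaaS-based monitoring and performance management for cloud environments. The most widely used open source solution nowadays, however, is Prometheus.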

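Prometheus is 100% open source and community-driven. All components are available on GitHub under the Apache License, Version 2.0. Prometheus is also a Cloud Native Computing Foundation member project.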

The following are the features of Prometheus, taken directly from the Prometheus website (https://prometheus.io/):

Dimensional data: Prometheus implements a highly dimensional data model. Time series are identified by a metric name and a set of key-value pairs.
Powerful queries: PromQL allows for the slicing and dicing of collected time series data to generate ad hoc graphs, tables, and alerts.
Great visualization: Prometheus has multiple modes for visualizing data: a built-in expression browser, Grafana integration, and a console template language.
Efficient storage: Prometheus stores time series in memory and on local disk in an efficient custom format. Scaling is achieved by functional sharding and federation.
Simple operation: Each server is independent for reliability, relying only on local storage. Written in Go, all binaries are statically linked and easy to deploy.
Precise alerting: Alerts are defined based on Prometheus's flexible PromQL and maintain dimensional information. An alert manager handles notifications and silencing.
Many client libraries: Client libraries allow for the easy instrumentation of services. Over ten languages are supported already and custom libraries are easy to implement.
Many integrations: Existing exporters allow for the bridging of third-party data into Prometheus. Examples include system statistics, as well as Docker, HAProxy, StatsD, and JMX metrics.
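
The dimensional data model from the feature list can be sketched in a few lines of Python. This is only an illustration of the idea that a time series is identified by a metric name plus a set of key-value labels; it is not Prometheus code, and the metric name and label values below are made up.

```python
def series_id(metric, labels):
    """Render a series identity as metric{key="value",...} with sorted label keys."""
    if not labels:
        return metric
    body = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{metric}{{{body}}}"


def select(series, metric, **matchers):
    """Keep the series with the given metric name whose labels equal every matcher."""
    return [
        (name, labels)
        for name, labels in series
        if name == metric
        and all(labels.get(key) == value for key, value in matchers.items())
    ]


# Two series that share a metric name but differ in their labels (made-up data).
SERIES = [
    ("http_requests_total", {"method": "GET", "code": "200"}),
    ("http_requests_total", {"method": "POST", "code": "500"}),
]

if __name__ == "__main__":
    for name, labels in select(SERIES, "http_requests_total", method="POST"):
        print(series_id(name, labels))
```

Filtering by label in this way is a rough analogue of what a PromQL selector with equality matchers does when it slices the collected series.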

Installing Prometheus

Before installing Prometheus, make sure that your OpenShift cluster is up and running by issuing the following command:

./oc cluster status
Web console URL: https://127.0.0.1:8443/console/
Config is at host directory
Volumes are at host directory
Persistent volumes are at host directory /opt/rh/okd/3.11/openshift.local.clusterup/openshift.local.pv
Data will be discarded when cluster is destroyed

If the cluster is not running, the command reports the following error instead:

Error: OpenShift cluster is not running

Start the cluster as follows:

./oc cluster up --server-loglevel=9
Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.11 ...
...
Server Information ...
OpenShift server started.
The server is accessible via web console at:
https://127.0.0.1:8443

Once the cluster is available, log in as the developer user, as follows:

./oc login -u developer -p developer

Create a project called monitoring:

./oc new-project monitoring

The command switches to the new project and reports that you are now using the monitoring project on the https://127.0.0.1:8443 server. Its output also suggests that you can add applications to this project with the new-app command, for example:

oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git

This would build a new example application in Ruby.

Now deploy the Prometheus platform, which is already available as a Docker image on Docker Hub, as follows:

./oc new-app prom/prometheus

--> Found Docker image 5517f70 (45 hours old) from Docker Hub for "prom/prometheus"

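An image stream tag will be created as prometheus:latest that will track this image. This image will be deployed in a deployment config named prometheus. Port 9090/tcp will be load balanced by the prometheus service. Other containers can access this service through the prometheus hostname. This image declares volumes and will default to using non-persistent, host-local storage.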

You can add persistent volumes later by running volume dc/prometheus --add ...:

--> Creating resources ...
imagestream.image.openshift.io "prometheus" created
deploymentconfig.apps.openshift.io "prometheus" created
service "prometheus" created
--> Success

Expose the service by executing the following commands:

'oc expose svc/prometheus'

Run oc status to view your app.
Point the browser to the web console and check the overview of the Prometheus deployment.

The web console should look like the following:


The Prometheus platform is not yet reachable from outside the cluster, so its service needs to be exposed as a route, as follows:

./oc expose service prometheus

route.route.openshift.io/prometheus exposed

Now, click on the link representing the route to view the Prometheus application. The page should be similar to the following:

To see where Prometheus is scraping metrics from, select Targets from the Status menu, as follows:

As you can see, Prometheus is scraping metrics from itself.

To see all of the available metrics, append /metrics to the URL of the Prometheus application; you should see a long list of metrics, along with their values.

A sample of this list can be found at https://github.com/PacktPublishing/Hands-On-Cloud-Native-Microservices-with-Jakarta-EE/tree/master/ch10/metrics
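
To make the format of that metrics page more concrete, here is a minimal Python sketch that parses a few lines of the Prometheus text exposition format. It is a simplified reader written for illustration, not the official parser: it ignores timestamps and assumes label values contain no escaped quotes. The sample values are invented.

```python
import re

# metric name, optional {labels}, then the sample value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue  # line shape not handled by this simplified sketch
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Invented sample in the shape of what the /metrics page returns.
SAMPLE = """\
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
prometheus_http_requests_total{code="200",handler="/metrics"} 42
"""

if __name__ == "__main__":
    for sample in parse_metrics(SAMPLE):
        print(sample)
```

Each non-comment line carries one sample: the metric name, an optional label set in braces, and the current value.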

If you execute the PromQL query process_cpu_seconds_total, which returns the total user and system CPU time spent in seconds, you should see the graph depicted in the following screenshot:
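
The same query can also be issued outside the browser through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The following Python sketch only builds the request URL and reads a response of the vector type; the base URL is a hypothetical placeholder for your Prometheus route, and the sample response contains invented values.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL of an instant query against the Prometheus HTTP API."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def extract_values(response):
    """Map each instance label to its value for a 'vector' query response."""
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % response.get("error"))
    values = {}
    for result in response["data"]["result"]:
        instance = result["metric"].get("instance", "")
        # Each value is a [timestamp, "value-as-string"] pair.
        values[instance] = float(result["value"][1])
    return values


# Invented response in the shape returned by /api/v1/query.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "process_cpu_seconds_total",
                    "instance": "localhost:9090",
                    "job": "prometheus",
                },
                "value": [1547910000.0, "12.5"],
            }
        ],
    },
}

if __name__ == "__main__":
    # prometheus.example.test is a placeholder, not a real route.
    print(instant_query_url("http://prometheus.example.test", "process_cpu_seconds_total"))
    print(extract_values(SAMPLE_RESPONSE))
```

Fetching the URL (for example, with urllib.request) is left out so that the sketch runs without a cluster.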

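The look and feel of the Prometheus interface is not the best you could wish for, which is why most monitoring dashboards rely on Grafana. We will discuss Grafana later in this chapter.

Before installing Grafana, it is worth mentioning that the metrics provided by Prometheus itself are not enough for a container-based cloud solution such as OpenShift. Metrics about hosts, containers, and pods are essential. This is where Node-exporter comes into play.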

Node-exporter

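Node-exporter is a tool that exposes metrics about servers running a Linux kernel. These metrics relate to CPU utilization, memory, and processes, and they can be imported into Prometheus as time series so that they can be represented graphically.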

Installing Node-exporter

Installing Node-exporter is very easy, because we can use its Docker image, which is available on Docker Hub, directly by referencing it in an OpenShift command.

To install Node-exporter, issue the following commands:

 ./oc new-app prom/node-exporter
--> Found Docker image b3e7f67 (7 weeks old) from Docker Hub for "prom/node-exporter"

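An image stream tag will be created as node-exporter:latest that will track this image. This image will be deployed in a deployment config named node-exporter. Port 9100/tcp will be load balanced by the node-exporter service.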

Other containers can access this service through the node-exporter host name:

--> Creating resources ...
 imagestream.image.openshift.io "node-exporter" created
 deploymentconfig.apps.openshift.io "node-exporter" created
 service "node-exporter" created
 --> Success

You can expose the service by executing the following commands:

'oc expose svc/node-exporter'

Run oc status to view your app. To expose Node-exporter, use the following command:

./oc expose service node-exporter

route.route.openshift.io/node-exporter exposed

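Exposing Node-exporter through a route is useful only for testing purposes. What we really need is for Prometheus to scrape the metrics that Node-exporter provides.

To achieve this, the Prometheus configuration file needs to be updated.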

If you access the Prometheus pod, you will be able to see the current configuration, as follows:

./oc get pods
NAME                    READY   STATUS    RESTARTS   AGE
node-exporter-1-lgpz2   1/1     Running   0          6m
prometheus-1-qgggp      1/1     Running   0          6h

./oc rsh prometheus-1-qgggp
$ ls -la
total 16
drwxr-xr-x 1 nobody nogroup 4096 Jan 15 20:13 .
drwxr-xr-x 1 root   root    4096 Jan 19 14:36 ..
lrwxrwxrwx 1 nobody nogroup   39 Jan 15 20:13 console_libraries -> /usr/share/prometheus/console_libraries
lrwxrwxrwx 1 nobody nogroup   31 Jan 15 20:13 consoles -> /usr/share/prometheus/consoles/
lrwxrwxrwx 1 root   root      11 Jan 15 20:13 data -> /prometheus
-rw-r--r-- 1 nobody nogroup  926 Jan 15 20:09 prometheus.yml

$ cat prometheus.yml
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

The following scrape job needs to be added to the Prometheus configuration:

- job_name: 'node-exporter'
  static_configs:
    - targets: ['node-exporter:9100']

These new settings can be placed into a ConfigMap and mounted as a volume.
Create a prometheus.yaml file with the following content:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
  - static_configs:
    - targets:

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

Save the file and issue the following command:

./oc create configmap prometheus-config-map --from-file=prometheus.yaml
configmap/prometheus-config-map created
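
It may help to see what the created object contains. With --from-file, the file name becomes a key in the ConfigMap's data section and the file content becomes its value. The following Python sketch builds an equivalent manifest as a dictionary; it is an illustration of the object's structure, not the output of the oc command, and the shortened file content is a stand-in for the real prometheus.yaml.

```python
import json


def configmap_manifest(name, filename, content):
    """Build a ConfigMap manifest whose single data key is the file name."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {filename: content},
    }


# Shortened stand-in for the content of prometheus.yaml.
CONTENT = "global:\n  scrape_interval: 15s\n"

MANIFEST = configmap_manifest("prometheus-config-map", "prometheus.yaml", CONTENT)

if __name__ == "__main__":
    print(json.dumps(MANIFEST, indent=2))
```

The key name matters later: when the ConfigMap is mounted as a volume, each key appears as a file with that name unless it is explicitly mapped to a different path.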

Now we need to reference the ConfigMap in the Prometheus deployment.

To do that, select the Prometheus deployment from the web console and edit its YAML configuration so that it references the ConfigMap and uses the updated configuration file. You can look at the resulting YAML file at this URL: https://github.com/PacktPublishing/Hands-On-Cloud-Native-Microservices-with-Jakarta-EE/blob/master/ch10/prometheus-dc.yaml.

These new settings allow Prometheus to load the new configuration file from the ConfigMap.
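
As a rough idea of what such an edit involves, the Python sketch below adds a ConfigMap volume and a matching volume mount to a deployment configuration represented as a dictionary. It is one possible approach and not necessarily what the linked prometheus-dc.yaml does. The volume name, the mount path /etc/prometheus, and the target file name prometheus.yml are assumptions; they rely on the image reading /etc/prometheus/prometheus.yml by default, which you should verify for your image version.

```python
import copy


def mount_configmap(dc, configmap_name, key, mount_path, file_name,
                    volume_name="prometheus-config"):
    """Return a copy of a DeploymentConfig dict with the ConfigMap mounted
    into the first container. volume_name is a hypothetical default."""
    patched = copy.deepcopy(dc)
    pod_spec = patched["spec"]["template"]["spec"]
    pod_spec.setdefault("volumes", []).append({
        "name": volume_name,
        "configMap": {
            "name": configmap_name,
            # Project the ConfigMap key under a different file name.
            "items": [{"key": key, "path": file_name}],
        },
    })
    container = pod_spec["containers"][0]
    container.setdefault("volumeMounts", []).append(
        {"name": volume_name, "mountPath": mount_path}
    )
    return patched


# Minimal stand-in for the real DeploymentConfig, which has many more fields.
DC = {"spec": {"template": {"spec": {"containers": [{"name": "prometheus"}]}}}}

PATCHED = mount_configmap(
    DC,
    configmap_name="prometheus-config-map",
    key="prometheus.yaml",
    mount_path="/etc/prometheus",  # assumption: the image's config directory
    file_name="prometheus.yml",    # assumption: the file name the image loads
)

if __name__ == "__main__":
    import json
    print(json.dumps(PATCHED["spec"]["template"]["spec"], indent=2))
```

Mapping the prometheus.yaml key to the prometheus.yml file name avoids changing the container's start-up arguments; the alternative is to keep the key name and point Prometheus at it with its --config.file flag.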

Now a new Prometheus deployment should be triggered automatically, as follows:

The new Prometheus deployment after loading the new configuration file

Going back to the Prometheus application, we should see the node-exporter available on Prometheus, as depicted in the following screenshot:
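
The same check can be scripted against the Prometheus HTTP API, which lists scrape targets at /api/v1/targets. The Python sketch below works on an already decoded response; the sample payload is invented (including the "down" state, used only to exercise the code) and keeps just the fields the sketch reads.

```python
def target_health(payload):
    """Map (job, instance) to the health reported for each active target."""
    return {
        (target["labels"]["job"], target["labels"]["instance"]): target["health"]
        for target in payload["data"]["activeTargets"]
    }


def missing_jobs(payload, expected_jobs):
    """Return the expected jobs that have no target in the 'up' state."""
    healthy = {
        job for (job, _instance), health in target_health(payload).items()
        if health == "up"
    }
    return sorted(set(expected_jobs) - healthy)


# Invented payload in the shape returned by /api/v1/targets.
SAMPLE_TARGETS = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"instance": "localhost:9090", "job": "prometheus"},
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
            },
            {
                "labels": {"instance": "node-exporter:9100", "job": "node-exporter"},
                "scrapeUrl": "http://node-exporter:9100/metrics",
                "health": "down",
            },
        ],
        "droppedTargets": [],
    },
}

if __name__ == "__main__":
    print(target_health(SAMPLE_TARGETS))
    print(missing_jobs(SAMPLE_TARGETS, ["prometheus", "node-exporter"]))
```

With the configuration from this chapter, both the prometheus and node-exporter jobs would be expected to report as up once the new deployment has started.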

Now it is time to bring everything together using Grafana.

Grafana

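Prometheus's best friend is Grafana. Grafana is open source and provides a good interface for creating and exposing dashboards. It is mainly used to visualize time series data with graphs, pie charts, plots, bars, and gauges.

Grafana supports querying Prometheus. The Grafana data source for Prometheus has been included since Grafana 2.5.0, which was released on October 28, 2015.

Nonetheless, in a cloud container environment, infrastructure metrics are a must, so keeping an eye on the hosts that run the containers has to be a key part of the monitoring view.

The exporter for this kind of metric is Node-exporter, which exposes machine-level metrics from the system's kernel, such as CPU, memory, disk space, disk I/O, network bandwidth, and motherboard temperature.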

Installing Grafana

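Installing Grafana is very easy, because we can use its Docker image, which is available on Docker Hub, directly by referencing it in the OpenShift commands given in the following steps.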

First of all, we need to deploy Grafana on OpenShift by issuing the following commands:

./oc new-app grafana/grafana
 
 --> Found Docker image d0454da (3 days old) from Docker Hub for "grafana/grafana"

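An image stream tag will be created as grafana:latest that will track this image. This image will be deployed in a deployment config named grafana. Port 3000/tcp will be load balanced by the grafana service.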

Other containers can access this service through the grafana host name:

--> Creating resources ...
imagestream.image.openshift.io "grafana" created
deploymentconfig.apps.openshift.io "grafana" created
service "grafana" created
--> Success

You can expose this service by executing the following command:

'oc expose svc/grafana'

Run oc status to view your app. Expose the Grafana service as follows:

./oc expose service grafana

route.route.openshift.io/grafana exposed

Now open your browser and point it to the route that's displayed in the Web console.

By default, Grafana is secured, and you can log in by providing admin as both the username and the password.

Next, you will be prompted to change the default password. Set it to be anything of your choice.
Next, Grafana asks you to set a data source, as depicted in the following screenshot:

Select the Prometheus data source:

Add the following settings:

For the URL, you should set the Prometheus service name, not the route. Save & Test the settings, and then click Back.
Select the Prometheus box and then select the Dashboards tab, as shown in the following screenshot:

Import all the available statistics:

By clicking on Prometheus 2.0 Stats, you should now see a better monitoring interface:

The monitoring interface data on Prometheus 2.0 Stats

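This is definitely a better look and feel than the one Prometheus provides. In addition, Grafana offers a range of plugins, available through its website, that you can use for your charts.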
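
The data source configured through the UI earlier can also be described as the JSON document that Grafana's HTTP API accepts for creating data sources (a POST to /api/datasources). The Python sketch below only builds that payload; the data source name is a hypothetical choice, and the URL assumes the prometheus service name and port 9090 used in this chapter.

```python
import json


def prometheus_datasource(service_host="prometheus", port=9090, name="Prometheus"):
    """Build a Grafana data source payload that points at the Prometheus service."""
    return {
        "name": name,  # hypothetical display name
        "type": "prometheus",
        # "proxy" means the Grafana server, not the browser, calls Prometheus,
        # so the in-cluster service name resolves.
        "access": "proxy",
        "url": "http://%s:%d" % (service_host, port),
        "isDefault": True,
    }


PAYLOAD = prometheus_datasource()

if __name__ == "__main__":
    print(json.dumps(PAYLOAD, indent=2))
```

This also shows why the URL uses the service name rather than the route: with server-side access, the request originates from the Grafana pod, which can reach the prometheus service directly inside the cluster.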

Summary

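This was the final chapter of the book. You learned how to deploy and configure Prometheus, a monitoring platform for containers. You also learned how to extend Prometheus's capabilities by integrating it with Node-exporter. Finally, you learned how to deploy and configure Grafana to display graphically all of the metrics collected and stored by Prometheus.

Our long journey ends here. I hope you enjoyed reading this book!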
