
Reading notes on "Hands-On Infrastructure Monitoring with Prometheus": Setting Up a Test Environment

Setting Up a Test Environment

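The best way to learn is by doing. This chapter helps you get a test environment running quickly, so that you can experiment safely without worrying about breaking anything. It provides several configuration examples and tips on how to run them. The same kind of environment is also used in several other scenarios throughout the book.

In brief, this chapter covers the following topics: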

  • Code organization
  • Machine requirements
  • Spinning up a new environment

Code organization

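The examples and code listings in this book can be used on their own, without any supporting material. Even so, a companion Git repository is provided to automate the setup of the test environments, so that you can follow along easily.

In this section, we explore how that repository is organized, explain some of the choices made when automating the test environments, and give some guidance on how to customize them: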

.
├── Makefile
├── README.md
├── cache/
├── chapter03/
├── ...
├── chapter14/
└── utils/

The root structure of the repository shown here should be easy to follow:

  • One directory per chapter that needs its own test environment (aptly named chapter, followed by the chapter number)
  • A cache directory, which will hold downloaded packages so that rebuilding test environments becomes as fast as possible
  • A utils directory, where default versions and parameters of the test environments can be found (and changed if wanted), along with some helper functions
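
The idea behind the cache directory can be sketched in a few lines of shell. This is only an illustration under assumptions: `cache_path`, `fetch_cached`, and the `CACHE_DIR` variable are hypothetical names, not the functions found in the repository's helpers.sh.

```shell
# Hypothetical sketch; the repository's helpers.sh may work differently.
CACHE_DIR="${CACHE_DIR:-./cache}"

# Print the path a URL's archive would occupy inside the cache.
cache_path() {
  printf '%s/%s\n' "$CACHE_DIR" "$(basename "$1")"
}

# Download a URL into the cache unless a non-empty copy already exists,
# then print the cached path.
fetch_cached() {
  fc_target="$(cache_path "$1")"
  mkdir -p "$CACHE_DIR" || return 1
  if [ ! -s "$fc_target" ]; then
    # Download to a temporary name so that an interrupted transfer
    # is never mistaken for a valid cached archive.
    curl -fsSL -o "$fc_target.part" "$1" || return 1
    mv "$fc_target.part" "$fc_target" || return 1
  fi
  printf '%s\n' "$fc_target"
}
```

A provisioning script could then call `fetch_cached` with an archive URL and unpack the path it prints, so only the first run touches the network.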

Next, we look more closely at each of these, as follows:

.
├── ...
└── utils/
    ├── defaults.sh
    ├── helpers.sh
    └── vagrant_defaults.rb

In the utils directory, you can find the following files:

  • defaults.sh: Here, the versions of each component in the Prometheus stack (such as Prometheus itself, along with exporters and Alertmanager, among others) that will be used in the test environments can be found.
  • vagrant_defaults.rb: This file controls a couple of tunable parameters for the virtual machines that are used to run the test environments, like the amount of RAM each virtual machine will have, which base image to use, and what the environment's internal network will look like.
  • helpers.sh: This is a shell library of helper functions, used by the provisioning scripts to manage the downloading and caching of archives.

Each chapter that needs a test environment has its own directory:

.
├── ...
├── chapter03/
│   ├── Vagrantfile
│   ├── configs/
│   └── provision/
└── ...

Although each test environment differs from chapter to chapter, the basic structure stays the same:

  • A Vagrantfile to describe how many virtual machines are needed for the test environment, along with how to configure and provision them
  • A configs directory to house configuration files that will be used in the provision step
  • A provision directory with scripts to download, install, and configure each of the Prometheus components that are required for the current test environment

The tree structure for this chapter shows an example:

.
├── ...
├── chapter03/
│   ├── Vagrantfile
│   ├── configs/
│   │   ├── alertmanager/
│   │   ├── grafana/
│   │   ├── node_exporter/
│   │   └── prometheus/
│   └── provision/
│       ├── alertmanager.sh*
│       ├── grafana.sh*
│       ├── hosts.sh*
│       ├── node_exporter.sh*
│       └── prometheus.sh*
└── ...

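The configs directory contains one subdirectory for each component used in this chapter. The provision directory follows the same model, with the addition of a hosts.sh shell script that automates the management of the /etc/hosts file on the guests.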
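
As a rough idea of what such a hosts script might do, the sketch below prints the entries for this chapter's three guests. It is an assumption-laden illustration: `hosts_entry`, `render_hosts`, and `DOMAIN` are hypothetical names, and the real provision/hosts.sh may be written quite differently. The addresses and the prom.inet domain are the ones used in the manual walkthrough later in this chapter.

```shell
# Hypothetical sketch; the repository's provision/hosts.sh may differ.
DOMAIN="prom.inet"

# Print one /etc/hosts line: IP, fully qualified name, short name.
hosts_entry() {
  printf '%s\t%s.%s\t%s\n' "$1" "$2" "$DOMAIN" "$2"
}

# Print the entries for the three guests used in this chapter.
render_hosts() {
  printf '127.0.0.1\tlocalhost\n'
  hosts_entry 192.168.42.10 prometheus
  hosts_entry 192.168.42.11 grafana
  hosts_entry 192.168.42.12 alertmanager
}

# Print the result for review; writing it to /etc/hosts requires root.
render_hosts
```

The fields are separated by tabs here, which /etc/hosts accepts just like the spaces used in the manual steps.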

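By now, the question "Why not just use configuration management?" may have crossed some readers' minds. All the provisioning automation is done in shell for the following reasons: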

  • This was a conscious effort to expose every detail, not abstract them away.
  • Shell scripting is the lowest common denominator of automation in Unix-like systems.
  • The purpose of this book is to focus on the inner workings of Prometheus, not the specific implementation of a given configuration management tool.

Machine requirements

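The machine requirements for this setup can comfortably be met by a modern laptop, provided it has CPU virtualization extensions enabled and its operating system is compatible with the software requirements. All the software requirements covered here were chosen with care. We use only free and open source software, so trying out the test environment incurs no additional cost.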

Hardware requirements

The following are the minimum requirements for the host on which the provided examples will be deployed:

  • At least 2 CPU cores
  • At least 4 GB of memory
  • At least 20 GB of free disk space

With these specifications, you should be able to start the test environment without any problems.
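
If you want to check a host against these minimums before starting, something like the following sketch could help. The function names are made up for this example, and the memory check is left out because the way to read total memory differs between Linux and macOS.

```shell
# Rough preflight check against the minimums listed above.

# Succeed if the first number is at least the second.
at_least() {
  [ "$1" -ge "$2" ]
}

# Number of online CPU cores (getconf works on both Linux and macOS).
cpu_cores() {
  getconf _NPROCESSORS_ONLN
}

# Free disk space, in whole GB, on the filesystem holding the given directory.
free_disk_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

at_least "$(cpu_cores)" 2 || echo "warning: fewer than 2 CPU cores"
at_least "$(free_disk_gb .)" 20 || echo "warning: less than 20 GB of free disk space"
```

Run it from the directory where the repository will be cloned, since the disk check looks at the filesystem of the current directory.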

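With regard to connectivity, the host should be able to reach the internet and resolve external DNS records. The provisioning scripts must download dependencies while they run, although most of them are cached locally so that they are not downloaded again on every deployment.

The default network for the example environments is 192.168.42.0/24. The following diagram illustrates the configuration when running this chapter's example: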

Figure 3.1: Virtual network configuration
When launching a test environment, the subnet 192.168.42.0/24 will be used. Each environment belongs to a specific chapter, and should be destroyed before switching to a new one. If you encounter a conflict with your local address space, you can change the test environment subnet by editing the NETWORK option in the provided ./utils/vagrant_defaults.rb file.
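
A rough first check for such a conflict is to see whether the host already holds an address inside 192.168.42.0/24. The sketch below assumes a Linux host with the iproute2 `ip` command (macOS users would need `ifconfig` instead), and the function names are invented for this example. It should be run before the environment is started, because VirtualBox then adds a host-side address in that same subnet.

```shell
# Succeed if any IPv4 address read from stdin (one per line) starts with
# the given three-octet prefix, for example "192.168.42".
conflicts_with_prefix() {
  while IFS= read -r cw_addr; do
    case "$cw_addr" in
      "$1".*) return 0 ;;
    esac
  done
  return 1
}

# List the host's IPv4 addresses, one per line (Linux iproute2 only).
host_ipv4_addresses() {
  ip -4 -o addr show | awk '{ split($4, a, "/"); print a[1] }'
}

if command -v ip >/dev/null 2>&1; then
  if host_ipv4_addresses | conflicts_with_prefix 192.168.42; then
    echo "192.168.42.0/24 is already in use on this host; consider changing NETWORK"
  fi
fi
```

This only detects addresses assigned to the host itself; a network that is merely routed through your gateway would not be noticed.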

Recommended software

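The environment was tested with the following software, so the standard disclaimer applies: other versions within the same major release may work without additional changes, but take care when using versions that differ from the ones we recommend:

Software Version
VirtualBox 6.0.4
Vagrant 2.2.4
Minikube 1.0.1
kubectl 1.14.1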

As for supported operating systems, all tests were run on the following versions of Linux and macOS:

  • Ubuntu 18.04 LTS (Bionic Beaver)
  • macOS 10.14.3 (Mojave)

Other operating systems or distributions may be able to run the test environment, although this is not guaranteed.
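
To compare what is installed on your host with the tested versions, a small sketch such as the following may be useful. The helper names are hypothetical; the only external commands used are `vagrant --version` and `VBoxManage --version`, and Minikube and kubectl are left out for brevity.

```shell
# Succeed if the named command is available on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Extract the first dotted version number (for example 2.2.4) from stdin.
first_version() {
  sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Report installed versions next to the ones the environment was tested with.
if have vagrant; then
  echo "Vagrant:    $(vagrant --version | first_version) (tested: 2.2.4)"
fi
if have VBoxManage; then
  echo "VirtualBox: $(VBoxManage --version | first_version) (tested: 6.0.4)"
fi
```

A tool that is missing is simply skipped, so no output for a given line means that the tool was not found on PATH.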

VirtualBox

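Oracle VirtualBox is a free and open source hypervisor that runs on all major operating systems (macOS, Linux, and Windows). Among other features, it lets you launch virtual machine images, create virtual networks, and mount host filesystem paths inside guests. The software requires hardware virtualization to be enabled.

You can find all the installation files for VirtualBox at https://www.virtualbox.org/wiki/Downloads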

Vagrant

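HashiCorp Vagrant allows the creation of portable environments. In the context of this book, it acts as the interface to VirtualBox for launching and provisioning virtual machines. For our examples, we chose the Chef Bento virtual machine images, which are recommended by HashiCorp.

You can find all the installation files for Vagrant at https://www.vagrantup.com/downloads.html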

Minikube

kubectl

Spinning up a new environment

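Once you have made sure that all the required software is available on your host, you can proceed with either or both of the following walkthroughs.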

Automated deployment walkthrough

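This method abstracts away all the deployment and configuration details, giving you a fully working test environment with just a few commands. You can still connect to each guest instance and change its configuration.

The steps to start the environment are as follows: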

  1. Clone this book's repository:
git clone https://github.com/PacktPublishing/Hands-On-Infrastructure-Monitoring-with-Prometheus.git
  2. Change into the newly created directory and then into this chapter's subdirectory:
cd Hands-On-Infrastructure-Monitoring-with-Prometheus/chapter03
  3. Spin up this chapter's test environment:
vagrant up
The first run takes a few minutes, because the Vagrant image and some software dependencies have to be downloaded. After this initial setup, subsequent runs are much faster, since all of these assets are kept in the cache.
  4. Now you can run vagrant status. You will see the following output:

Current machine states:

prometheus running (virtualbox)
grafana running (virtualbox)
alertmanager running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
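
If you would rather check this from a script than by eye, the human-readable output above can be filtered. The following is a sketch only: `not_running` is a made-up name, and it assumes the VirtualBox provider and the output layout shown here.

```shell
# Read `vagrant status` output on stdin and print the names of the machines
# that are not in the "running" state.
not_running() {
  awk '$NF == "(virtualbox)" && $2 != "running" { print $1 }'
}

# Example (run inside the chapter directory):
#   vagrant status | not_running
```

An empty result means every machine listed is running; any name printed points to a guest that needs another `vagrant up`.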

Prometheus

You can reach the Prometheus HTTP endpoint at http://192.168.42.10:9090/targets:

Figure 3.2: Prometheus HTTP endpoint – showing all configured targets
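
The same target information is exposed by the Prometheus HTTP API under /api/v1/targets, which is handy when no browser is at hand. The sketch below counts the targets reported as healthy. It relies on plain text matching rather than real JSON parsing, and `count_up_targets` is a name invented for this example.

```shell
# Count the targets whose health is "up" in a /api/v1/targets response
# read from stdin. This is plain text matching, not real JSON parsing.
count_up_targets() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Example (needs the test environment to be running):
#   curl -s http://192.168.42.10:9090/api/v1/targets | count_up_targets
```

For anything beyond a quick check, a proper JSON tool would be the safer choice.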

Grafana

You can reach the Grafana HTTP endpoint at http://192.168.42.11:3000.

The default credentials for Grafana are as follows:

Username Password
admin admin

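You will find two automatically provisioned dashboards. We cover dashboards later in this book, in Chapter 10, Discovering and Creating Grafana Dashboards.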

Figure 3.3: An automatically provisioned Grafana dashboard

Alertmanager

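You can reach the Alertmanager HTTP endpoint at http://192.168.42.12:9093.

We have also configured an always-firing alert in Prometheus, together with a custom Alertmanager integration through a webhook, so that you can start to get a feel for how the two relate. Alertmanager is covered in more detail in another chapter, but for now you can look at the log produced by the example alert, found at ./cache/alerting.log relative to the root of the code repository.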

Figure 3.4: Alertmanager – firing an example alert

Cleanup

When you have finished testing, make sure you are inside chapter03 and run the following command:

vagrant destroy -f

Don't worry too much: you can easily spin up the environment again whenever you want.

Advanced deployment walkthrough

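With this method, the guest virtual machines are started but no provisioning takes place, so you have to set up the environment by hand. We won't explain the available configuration files and command-line arguments in detail here; those are explored in depth in the following chapters. As a high-level overview, for each software component we will do the following: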

  • Set up basic networking between the virtual machines in the environment
  • Create an individual system user
  • Download and install the software
  • Create support files and directories
  • Start the daemon
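
For the download step, the release archives used later in this walkthrough all follow one URL pattern on GitHub. The sketch below builds that URL from a component name and version. `release_url` is a hypothetical helper written for this example, and the pattern is inferred from the download commands in this chapter rather than from any official guarantee.

```shell
# Build the GitHub release URL for a Prometheus project archive, following
# the pattern of the download commands used in this chapter.
release_url() {
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' "$1" "$2" "$1" "$2"
}

release_url prometheus 2.9.2
release_url alertmanager 0.17.0
release_url node_exporter 0.17.0
```

Keeping the version as a parameter makes it easier to try a different release, which is the same reason the repository collects its versions in utils/defaults.sh.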

First, clone this book's repository:

git clone https://github.com/PacktPublishing/Hands-On-Infrastructure-Monitoring-with-Prometheus.git

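Change into the newly created directory and this chapter's subdirectory, then run Vagrant without provisioning the guest instances. This leaves you with bare virtual machines that are ready to be configured: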

cd Hands-On-Infrastructure-Monitoring-with-Prometheus/chapter03
vagrant up --no-provision

Once all the guests are up, we provision our instances one by one.

Prometheus

Perform the following steps:

  1. Log in to the Prometheus guest instance:
vagrant ssh prometheus
  2. Inside the guest instance, switch to the root user:
sudo -i
  3. Add all the guests' addresses to the instance's hosts file:
cat <<EOF >/etc/hosts
127.0.0.1       localhost
192.168.42.10   prometheus.prom.inet    prometheus
192.168.42.11   grafana.prom.inet       grafana
192.168.42.12   alertmanager.prom.inet  alertmanager

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
EOF
  4. Create a new system user:
useradd --system prometheus
  5. Go into /tmp and download the Prometheus archive:
cd /tmp
curl -sLO "https://github.com/prometheus/prometheus/releases/download/v2.9.2/prometheus-2.9.2.linux-amd64.tar.gz"
  6. Uncompress the archive:
tar zxvf prometheus-2.9.2.linux-amd64.tar.gz
  7. Place every file in its correct location:
install -m 0644 -D -t /usr/share/prometheus/consoles prometheus-2.9.2.linux-amd64/consoles/*
install -m 0644 -D -t /usr/share/prometheus/console_libraries prometheus-2.9.2.linux-amd64/console_libraries/*
install -m 0755 prometheus-2.9.2.linux-amd64/prometheus prometheus-2.9.2.linux-amd64/promtool /usr/bin/
install -d -o prometheus -g prometheus /var/lib/prometheus
install -m 0644 -D /vagrant/chapter03/configs/prometheus/prometheus.yml /etc/prometheus/prometheus.yml
install -m 0644 -D /vagrant/chapter03/configs/prometheus/first_rules.yml /etc/prometheus/first_rules.yml
  8. Add a systemd unit file for the Prometheus service:
install -m 0644 /vagrant/chapter03/configs/prometheus/prometheus.service /etc/systemd/system/
systemctl daemon-reload
  9. Enable and start the Prometheus service:
systemctl enable prometheus
systemctl start prometheus

The Prometheus HTTP endpoint should now be reachable from your host.

  10. Exit the root account and then the Vagrant user account:
exit
exit

Grafana

Perform the following steps:

  1. Log in to the Grafana guest instance:
vagrant ssh grafana
  2. Inside the guest instance, switch to the root user:
sudo -i
  3. Add all the guests' addresses to the instance's hosts file:
cat <<EOF >/etc/hosts
127.0.0.1       localhost
192.168.42.10   prometheus.prom.inet    prometheus
192.168.42.11   grafana.prom.inet       grafana
192.168.42.12   alertmanager.prom.inet  alertmanager

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
EOF
  4. Go into /tmp and download the Grafana package:
cd /tmp
curl -sLO "https://dl.grafana.com/oss/release/grafana_6.1.6_amd64.deb"
  5. Install the package and all dependencies:
DEBIAN_FRONTEND=noninteractive apt-get install -y libfontconfig
dpkg -i "grafana_6.1.6_amd64.deb"
  6. Place all the provided configurations in their correct locations:
rsync -ru /vagrant/chapter03/configs/grafana/{dashboards,provisioning} /etc/grafana/
  7. Enable and start the Grafana service:
systemctl daemon-reload
systemctl enable grafana-server
systemctl start grafana-server

The Grafana HTTP endpoint should now be reachable from your host.

  8. Exit the root account and then the Vagrant user account:
exit
exit

Alertmanager

Perform the following steps:

  1. Log in to the Alertmanager guest instance:
vagrant ssh alertmanager
  2. Inside the guest instance, switch to the root user:
sudo -i
  3. Add all the guests' addresses to the instance's hosts file:
cat <<EOF >/etc/hosts
127.0.0.1       localhost
192.168.42.10   prometheus.prom.inet    prometheus
192.168.42.11   grafana.prom.inet       grafana
192.168.42.12   alertmanager.prom.inet  alertmanager

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
EOF
  4. Create a new system user:
useradd --system alertmanager
  5. Go into /tmp and download the Alertmanager archive:
cd /tmp
curl -sLO "https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz"
  6. Uncompress the archive:
tar zxvf alertmanager-0.17.0.linux-amd64.tar.gz
  7. Place every file in its correct location:
install -m 0755 alertmanager-0.17.0.linux-amd64/{alertmanager,amtool} /vagrant/chapter03/configs/alertmanager/alertdump /usr/bin/
install -d -o alertmanager -g alertmanager /var/lib/alertmanager
install -m 0644 -D /vagrant/chapter03/configs/alertmanager/alertmanager.yml /etc/alertmanager/alertmanager.yml
  8. Add a systemd unit file for the Alertmanager service:
install -m 0644 /vagrant/chapter03/configs/alertmanager/alertmanager.service /etc/systemd/system/
systemctl daemon-reload
  9. Enable and start the Alertmanager service:
systemctl enable alertmanager
systemctl start alertmanager

The Alertmanager HTTP endpoint should now be reachable from your host.

  10. Exit the root account and then the Vagrant user account:
exit
exit

Node Exporter

To make sure system-level metrics are collected, Node Exporter must be installed on all three virtual machines. To log in to each virtual machine, use the commands we explored in the previous sections:

  1. Inside the guest instance, switch to the root user:
sudo -i
  2. Create a new system user:
useradd --system node_exporter
  3. Go into /tmp and download the Node Exporter archive:
cd /tmp
curl -sLO "https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz"
  4. Place every file in its correct location:
tar zxvf "node_exporter-0.17.0.linux-amd64.tar.gz" -C /usr/bin --strip-components=1 --wildcards '*/node_exporter'
  5. Add a systemd unit file for the Node Exporter service:
install -m 0644 /vagrant/chapter03/configs/node_exporter/node-exporter.service /etc/systemd/system/
systemctl daemon-reload
  6. Enable and start the Node Exporter service:
systemctl enable node-exporter
systemctl start node-exporter
  7. Exit the root account and then the Vagrant user account:
exit
exit
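
To confirm that each guest is really exposing metrics, you could fetch its metrics page and look for Node Exporter's own build information metric. This is a sketch under assumptions: `looks_like_node_exporter` is a made-up name, and port 9100 is Node Exporter's default, which the provided unit file is assumed not to override.

```shell
# Succeed if the metrics text on stdin contains Node Exporter's
# build information metric.
looks_like_node_exporter() {
  grep -q '^node_exporter_build_info{'
}

# Example (run on the host once all three guests are provisioned):
#   for ip in 192.168.42.10 192.168.42.11 192.168.42.12; do
#     if curl -s "http://$ip:9100/metrics" | looks_like_node_exporter; then
#       echo "ok   $ip"
#     else
#       echo "FAIL $ip"
#     fi
#   done
```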

Validating your test environment

After completing these steps, you can validate your environment from your host machine using the following endpoints:

Service Endpoint
Prometheus http://192.168.42.10:9090
Grafana http://192.168.42.11:3000
Alertmanager http://192.168.42.12:9093
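
Services can take a few seconds to come up after they are started, so a scripted check benefits from a short retry loop. The sketch below is one way to do that with curl. `wait_for` and the `PROBE` variable are names chosen for this example, and the retry counts are arbitrary defaults.

```shell
# Command used to probe a URL; with -f, curl exits non-zero on HTTP errors.
PROBE="${PROBE:-curl -fsS -o /dev/null --max-time 5}"

# Retry a URL until it answers or the attempts run out.
# Usage: wait_for URL [ATTEMPTS] [DELAY_SECONDS]
wait_for() {
  wf_left="${2:-30}"
  while [ "$wf_left" -gt 0 ]; do
    # PROBE is left unquoted on purpose so that its options are split into words.
    $PROBE "$1" 2>/dev/null && return 0
    wf_left=$((wf_left - 1))
    sleep "${3:-2}"
  done
  return 1
}

# Example (needs the test environment to be running):
#   for url in http://192.168.42.10:9090 http://192.168.42.11:3000 http://192.168.42.12:9093; do
#     if wait_for "$url"; then echo "ok   $url"; else echo "FAIL $url"; fi
#   done
```

Keeping the probe command in a variable makes it possible to swap in a different check without touching the loop.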

Summary

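With a test environment at your disposal, you can now inspect, change, and validate configurations without worrying about breaking anything. This hands-on approach is used extensively throughout the book, because nothing beats experimentation when learning a new skill.

In the next chapter, we introduce the basics of Prometheus metrics. The test environment we have just built will help to demonstrate them.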

Questions

  1. What are the recommended tools to set up a reproducible test environment?
  2. Where can you change the default versions of the Prometheus components for the test environment?
  3. What is the default subnet that's used on all examples?
  4. At a high level, what are the steps to get a Prometheus instance up and running?
  5. Node Exporter is installed on every guest instance. How can you quickly validate if all of them are exposing metrics correctly?
  6. In our test environment, where can you find the alert log?
  7. How can you create a clean test environment from scratch?