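Building a PostgreSQL Monitoring System with Grafana + Prometheus
There are many ways to monitor PostgreSQL. This article covers one visually polished stack: Grafana + Prometheus + Alertmanager + pgSCV. It does not introduce the individual components, which are well documented elsewhere; the focus is on installing and integrating them.
The figure below shows a system monitoring architecture with Prometheus at its core (image from the Prometheus website).
Server list: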
| Node | IP | OS | Installed software | Notes |
|---|---|---|---|---|
| pg_node1 | 192.168.210.15 | CentOS 7.6 | Prometheus 2.28.1, Grafana 8.1.0, pgSCV 0.7.1, node_exporter 1.2.1, Alertmanager 0.22.2 | Hosts the monitoring services; also a monitored node |
| pg_node2 | 192.168.210.81 | CentOS 7.6 | pgSCV 0.7.1, node_exporter 1.2.1 | Monitored node |
| pg_node3 | 192.168.210.33 | CentOS 7.6 | pgSCV 0.7.1, node_exporter 1.2.1 | Monitored node |
Software installation directories
| Software | Host | Installation directory |
|---|---|---|
| grafana | 192.168.210.15 | /data/monitor/grafana |
| prometheus | 192.168.210.15 | /data/monitor/prometheus |
| alertmanager | 192.168.210.15 | /data/monitor/prometheus/alertmanager |
| node_exporter | 192.168.210.15, 192.168.210.81, 192.168.210.33 | /data/monitor/prometheus/plugin/node_exporter |
| pgscv | 192.168.210.15, 192.168.210.81, 192.168.210.33 | /data/monitor/prometheus/plugin/postgres_exporter |
Host configuration
Disable SELinux (all nodes)

    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
    setenforce 0

Configure the firewall
Open the ports required by the software deployed on each node.
grafana: 3000
prometheus: 9090
node_exporter: 9100
pgscv: 9890
alertmanager: 465/9093 (9093 is the web UI and API; 465 is the SMTP port used to send mail)

    firewall-cmd --add-port=9090/tcp --permanent
    firewall-cmd --add-port=3000/tcp --permanent
    firewall-cmd --add-port=9100/tcp --permanent
    firewall-cmd --add-port=9093/tcp --permanent
    firewall-cmd --add-port=465/tcp --permanent
    firewall-cmd --add-port=9890/tcp --permanent
    firewall-cmd --reload
    firewall-cmd --list-all

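
Once the services are installed and running, it can be handy to confirm from another machine that the opened ports actually answer. The following is a minimal sketch, not part of the original setup: the helper names `port_is_open` and `check_node` are hypothetical, and the port map simply mirrors the list above.

```python
import socket
from typing import Dict, List

# Ports taken from the list above (465 is outbound SMTP, so it is not checked here).
MONITORING_PORTS = {
    "grafana": 3000,
    "prometheus": 9090,
    "node_exporter": 9100,
    "pgscv": 9890,
    "alertmanager": 9093,
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host: str, services: List[str]) -> Dict[str, bool]:
    """Check the named services on one host and report which ports answer."""
    return {name: port_is_open(host, MONITORING_PORTS[name]) for name in services}
```

For example, `check_node("192.168.210.81", ["node_exporter", "pgscv"])` would report on the two collectors of pg_node2. A `False` result only says the connection failed; it does not distinguish a firewall block from a service that has not been started yet.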
Install Grafana
Download and unpack Grafana under /data/monitor:

    mkdir -p /data/monitor/
    cd /data/monitor/
    wget https://dl.grafana.com/oss/release/grafana-8.1.0.linux-amd64.tar.gz
    tar xf grafana-8.1.0.linux-amd64.tar.gz
    mv grafana-8.1.0 grafana
    rm -f grafana-8.1.0.linux-amd64.tar.gz
    mkdir -p /data/monitor/grafana/data/log

Edit grafana/conf/defaults.ini (vi grafana/conf/defaults.ini) and set the following parameters in the [paths] section:

    data = /data/monitor/grafana/data
    logs = /data/monitor/grafana/data/log
    plugins = /var/lib/grafana/plugins
    provisioning = /data/monitor/grafana/conf/provisioning

Create the systemd unit file:

    cat > /usr/lib/systemd/system/grafana-server.service <<EOF
    [Unit]
    Description=Grafana instance
    Documentation=http://docs.grafana.org
    Wants=network-online.target
    After=network-online.target
    #After=postgresql.service

    [Service]
    User=root
    Group=root
    Type=simple
    Restart=on-failure
    WorkingDirectory=/data/monitor/grafana
    RuntimeDirectory=grafana
    RuntimeDirectoryMode=0750
    ExecStart=/data/monitor/grafana/bin/grafana-server --config=/data/monitor/grafana/conf/defaults.ini
    LimitNOFILE=10000
    TimeoutStopSec=20

    [Install]
    WantedBy=multi-user.target
    EOF

Start the service:

    systemctl daemon-reload
    systemctl enable grafana-server
    systemctl start grafana-server

Install the plugins used by the dashboards, then restart Grafana so that they are loaded:

    ./grafana/bin/grafana-cli plugins install digiapulssi-breadcrumb-panel
    ./grafana/bin/grafana-cli plugins install grafana-polystat-panel
    ./grafana/bin/grafana-cli plugins install yesoreyeram-boomtable-panel
    systemctl restart grafana-server

Log in at the address below (the default credentials are admin/admin; the first login prompts for a password change):

    http://192.168.210.15:3000

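
Before opening the browser, the service can also be probed through Grafana's `/api/health` endpoint, which returns a small JSON document. The sketch below assumes that endpoint is reachable without authentication (the default) and that a healthy instance reports `"database": "ok"`; the function names are hypothetical.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True when the /api/health payload reports a healthy database."""
    return json.loads(body).get("database") == "ok"


def grafana_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Fetch <base_url>/api/health and report whether Grafana looks healthy."""
    url = base_url.rstrip("/") + "/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors; ValueError covers bad JSON.
        return False
```

With the address used in this article, the call would be `grafana_is_healthy("http://192.168.210.15:3000")`.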
Install Prometheus
Download and unpack Prometheus under /data/monitor:

    cd /data/monitor/
    wget https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
    tar xf prometheus-2.28.1.linux-amd64.tar.gz
    mv prometheus-2.28.1.linux-amd64 prometheus
    rm -f prometheus-2.28.1.linux-amd64.tar.gz
    cd prometheus

Edit prometheus.yml (vi prometheus.yml):

    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - 192.168.210.15:9093 # Alertmanager address

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - "rules/*.yml" # alerting rules

    # Scrape configurations: hosts (node_exporter), PostgreSQL (pgSCV), and Prometheus itself.
    scrape_configs:
      - job_name: Host
        metrics_path: /metrics
        # Note: when targets are loaded from a file, that file must already exist in
        # Prometheus's working directory before Prometheus starts; otherwise errors
        # may appear later during configuration.
        file_sd_configs:
          - files:
              - host.yml # created below
        relabel_configs:
          - source_labels: [__address__]
            regex: (.*)
            target_label: instance
            replacement: $1
          - source_labels: [__address__]
            regex: (.*)
            target_label: __address__
            replacement: $1:9100
      - job_name: pgscv
        metrics_path: /metrics
        # The same note applies: pgscv.yml must exist before Prometheus starts.
        file_sd_configs:
          - files:
              - pgscv.yml # created below
        relabel_configs:
          - source_labels: [__address__]
            regex: (.*)
            target_label: instance
            replacement: $1
          - source_labels: [__address__]
            regex: (.*)
            target_label: __address__
            replacement: $1:9890
      - job_name: prometheus
        static_configs:
          - targets:
              - localhost:9090

Configure host monitoring (vi host.yml):

    - labels:
        node_name: pg_node1
        service_name: pg_node1
      targets:
        - 192.168.210.15
    - labels:
        node_name: pg_node2
        service_name: pg_node2
      targets:
        - 192.168.210.81
    - labels:
        node_name: pg_node3
        service_name: pg_node3
      targets:
        - 192.168.210.33

Configure PostgreSQL monitoring (vi pgscv.yml):

    - labels:
        node_name: pg1
        service_name: pg1
      targets:
        - 192.168.210.15
    - labels:
        node_name: pg2
        service_name: pg2
      targets:
        - 192.168.210.81
    - labels:
        node_name: pg3
        service_name: pg3
      targets:
        - 192.168.210.33

Create the systemd unit file:

    cat > /usr/lib/systemd/system/prometheus.service <<EOF
    [Unit]
    Description=Prometheus instance
    Wants=network-online.target
    After=network-online.target
    #After=postgresql.service

    [Service]
    User=root
    Group=root
    Type=simple
    Restart=on-failure
    WorkingDirectory=/data/monitor/prometheus
    RuntimeDirectory=prometheus
    RuntimeDirectoryMode=0750
    ExecStart=/data/monitor/prometheus/prometheus --storage.tsdb.retention.time=30d --web.enable-lifecycle --web.enable-admin-api --config.file=/data/monitor/prometheus/prometheus.yml
    LimitNOFILE=10000
    TimeoutStopSec=20

    [Install]
    WantedBy=multi-user.target
    EOF

Do not start Prometheus yet; start it after the data collectors and the alerting component have been deployed:

    systemctl daemon-reload
    systemctl enable prometheus
    #systemctl start prometheus

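
The targets in host.yml and pgscv.yml are bare IP addresses, and the two relabel_configs entries of each job in prometheus.yml do the rest: the first copies the address into the `instance` label (so dashboards show the IP without a port), the second appends the exporter port to `__address__`, which is what Prometheus actually scrapes. As a rough illustration only, the sketch below mimics those two rules in Python; `relabel_target` is a hypothetical helper, not Prometheus code.

```python
import re
from typing import Dict


def relabel_target(address: str, port: int) -> Dict[str, str]:
    """Mimic the two relabel rules: keep the bare address as `instance`,
    then rewrite `__address__` to address:port."""
    labels = {"__address__": address}
    # Prometheus anchors relabel regexes at both ends, so fullmatch is the
    # closest Python analogue of `regex: (.*)`.
    match = re.fullmatch(r"(.*)", labels["__address__"])
    if match is None:
        return labels  # a non-matching rule leaves the labels untouched
    # Rule 1: target_label: instance, replacement: $1
    labels["instance"] = match.expand(r"\g<1>")
    # Rule 2: target_label: __address__, replacement: $1:<port>
    labels["__address__"] = match.expand(r"\g<1>") + ":" + str(port)
    return labels
```

Calling `relabel_target("192.168.210.15", 9100)` yields an `instance` of `192.168.210.15` and a scrape address of `192.168.210.15:9100`; with port 9890 the same target maps to the pgSCV endpoint.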
Install Alertmanager
Download and unpack Alertmanager under /data/monitor/prometheus:

    cd /data/monitor/prometheus
    wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
    tar xf alertmanager-0.22.2.linux-amd64.tar.gz
    mv alertmanager-0.22.2.linux-amd64 alertmanager
    rm -f alertmanager-0.22.2.linux-amd64.tar.gz

Edit the configuration (vi alertmanager/alertmanager.yml):

    global:
      resolve_timeout: 5m # time after which an alert that is no longer updated is marked resolved; default 5m
      smtp_smarthost: 'smtp.163.com:465'
      smtp_from: '[email protected]' # sender address
      smtp_auth_username: '[email protected]' # user name of the sender account
      smtp_auth_password: 'HTBPT***********' # authorization code of the sender account
      smtp_require_tls: false
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 1h
      receiver: 'default'
    receivers:
      - name: 'default'
        email_configs:
          - to: '[email protected]'
            send_resolved: true

Create the alerting rules referenced by rule_files in prometheus.yml. The following is one sample rule (mkdir -p rules, then vi rules/memory_over.yml):

    groups:
      - name: example
        rules:
          - alert: HostMemoryOverLimit
            expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes))) * 100 > 80
            for: 1m
            labels:
              severity: warning
            annotations:
              summary: "{{$labels.instance}}: host memory usage exceeds the alert threshold"
              description: "{{$labels.instance}}: memory usage is above 80% (current value: {{ $value }})"

Start Alertmanager:

    cd alertmanager
    nohup ./alertmanager --config.file=alertmanager.yml &

View it in the web UI:

    http://192.168.210.15:9093

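
The rule expression in memory_over.yml is plain arithmetic on two node_exporter gauges: the share of memory that is not available, as a percentage, compared against 80. A small sketch of the same calculation is shown below, with hypothetical function names and made-up sample values, to make the threshold easier to reason about.

```python
def memory_usage_percent(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """Same formula as the rule: (1 - MemAvailable / MemTotal) * 100."""
    return (1 - mem_available_bytes / mem_total_bytes) * 100


def memory_condition_holds(mem_available_bytes: float,
                           mem_total_bytes: float,
                           threshold_percent: float = 80.0) -> bool:
    """True when the usage is strictly above the threshold, as in `> 80`."""
    return memory_usage_percent(mem_available_bytes, mem_total_bytes) > threshold_percent
```

A host with 16 GiB of memory and 2 GiB available is at 87.5% and satisfies the condition, while one with 8 GiB available is at 50% and does not. Note that the rule also carries `for: 1m`, so the condition has to keep holding for a minute before the alert actually fires.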
Install the data collectors
Run these steps on every monitored node. First install node_exporter, the host metrics collector, and start it:

    mkdir -p /data/monitor/prometheus/plugin/postgres_exporter
    cd /data/monitor/prometheus/plugin
    wget https://github.com/prometheus/node_exporter/releases/download/v1.2.1/node_exporter-1.2.1.linux-amd64.tar.gz
    tar xf node_exporter-1.2.1.linux-amd64.tar.gz
    mv node_exporter-1.2.1.linux-amd64 node_exporter
    nohup node_exporter/node_exporter &

Then install pgSCV, the PostgreSQL metrics collector. The settings are passed as environment variables here; alternatively, pass a configuration file at startup with --config-file=, which also allows custom metrics to be defined:

    wget https://github.com/weaponry/pgscv/releases/download/0.7.1/pgscv_0.7.1_linux_amd64.tar.gz
    tar xf pgscv_0.7.1_linux_amd64.tar.gz -C postgres_exporter/
    cat >> /etc/profile <<EOF
    export PGSCV_LISTEN_ADDRESS="0.0.0.0:9890"
    export POSTGRES_DSN="postgresql://db_user:[email protected]:5432/postgres"
    EOF
    source /etc/profile
    nohup ./postgres_exporter/pgscv &

View the metrics collected by node_exporter:

    http://192.168.210.15:9100/metrics

View the metrics collected by pgSCV:

    http://192.168.210.15:9890/metrics

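
Both /metrics pages return plain text in the Prometheus exposition format: comment lines starting with `#`, then one sample per line as a series name (with optional labels) followed by a value. A minimal parsing sketch follows, assuming the samples carry no trailing timestamp, which is how these exporters normally expose them; `parse_metrics` is a hypothetical helper and not a replacement for a full parser.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (metric name plus any label set) to its sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # Assumption: no trailing timestamp, so the value is the last field.
        # Splitting from the right keeps label values with spaces intact.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # ignore malformed lines
        series, value = parts
        samples[series] = float(value)
    return samples
```

Fed with the body of one of the pages above, the result can be indexed by series, for example `parse_metrics(body)["node_memory_MemTotal_bytes"]`.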
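Start Prometheus
With the collectors and Alertmanager running, start the Prometheus service that was enabled earlier (systemctl start prometheus), then open the web UI:

    http://192.168.210.15:9090
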
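
The Status > Targets page of the web UI shows whether every scrape target is up. The same information is available from the Prometheus HTTP API at `/api/v1/targets`. The sketch below lists targets whose health is not `up`; the function names are hypothetical, and the response handling assumes the standard `data.activeTargets` layout.

```python
import json
import urllib.request
from typing import List, Tuple


def unhealthy_targets(body: str) -> List[Tuple[str, str, str]]:
    """Return (job, instance, lastError) for every active target that is not up."""
    payload = json.loads(body)
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            labels = target.get("labels", {})
            problems.append((labels.get("job", ""),
                             labels.get("instance", ""),
                             target.get("lastError", "")))
    return problems


def fetch_unhealthy_targets(prometheus_url: str,
                            timeout: float = 5.0) -> List[Tuple[str, str, str]]:
    """Query <prometheus_url>/api/v1/targets and list the failing targets."""
    url = prometheus_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return unhealthy_targets(resp.read().decode("utf-8"))
```

An empty list from `fetch_unhealthy_targets("http://192.168.210.15:9090")` means all configured targets were scraped successfully on their last attempt.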
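Configure Grafana
Configure the data source
Note: if your dashboards are imported from an offline JSON file, the Name of the data source must match the datasource referenced in the JSON template.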
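
To find out which data source names a dashboard file expects, the JSON can be scanned for its `datasource` fields. The following is a rough sketch under the assumption that the file comes from a Grafana 8.1-era export, where `datasource` is a plain string; newer Grafana versions store an object there instead, which this sketch ignores. `datasource_names` is a hypothetical helper.

```python
import json
from typing import Any, Set


def datasource_names(node: Any) -> Set[str]:
    """Recursively collect every string value stored under a "datasource" key."""
    names = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "datasource" and isinstance(value, str):
                names.add(value)
            else:
                names |= datasource_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= datasource_names(item)
    return names


def datasource_names_in_file(path: str) -> Set[str]:
    """Load a dashboard JSON file and return the data source names it references."""
    with open(path, encoding="utf-8") as handle:
        return datasource_names(json.load(handle))
```

If the result contains a literal name such as `Prometheus`, the data source in Grafana needs that exact Name. Entries that look like `${DS_PROMETHEUS}` are import-time placeholders that Grafana asks you to map when the dashboard is imported.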
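Import dashboards online
Import the host monitoring dashboard: https://grafana.com/grafana/dashboards/8919
Following the same steps, import the PostgreSQL monitoring dashboard (https://grafana.com/grafana/dashboards/14540).
More dashboards are available at https://github.com/percona/grafana-dashboards/releases; import whichever you need and install the corresponding collectors. You can also build your own panels, which takes more expertise; copying an existing template and modifying it is a practical way to start.
View the host monitoring dashboard
View the PostgreSQL monitoring dashboard; it covers essentially all the metrics of interest
View the alert email (lower the threshold to trigger an alert)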
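
As an alternative to lowering the threshold, which is not what this article does, the email path alone can be exercised by posting a hand-made alert to Alertmanager's v2 API (`POST /api/v2/alerts`). The sketch below is an assumption-laden illustration: the alert name `TestAlert`, its labels, and both function names are made up for this example.

```python
import json
import urllib.request
from typing import Dict, List


def build_test_alert(instance: str) -> List[Dict[str, Dict[str, str]]]:
    """Build a one-element alert list in the shape the v2 API accepts."""
    return [{
        "labels": {
            "alertname": "TestAlert",  # hypothetical name, not a rule from this article
            "severity": "warning",
            "instance": instance,
        },
        "annotations": {
            "summary": instance + ": test alert sent by hand",
        },
    }]


def send_alerts(alertmanager_url: str,
                alerts: List[Dict[str, Dict[str, str]]],
                timeout: float = 5.0) -> int:
    """POST the alerts to <alertmanager_url>/api/v2/alerts and return the HTTP status."""
    request = urllib.request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

A call such as `send_alerts("http://192.168.210.15:9093", build_test_alert("192.168.210.15"))` should make the alert appear in the Alertmanager web UI, and with the `group_wait: 30s` setting from alertmanager.yml the email would be expected roughly half a minute later.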
References
https://grafana.com/grafana/dashboards
https://github.com/weaponry/pgscv/wiki
https://github.com/prometheus/prometheus
https://www.cnblogs.com/ilifeilong/p/10543876.html