
Building a PostgreSQL Monitoring System with Grafana + Prometheus

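There are many ways to monitor PostgreSQL. This article presents a rather slick one: Grafana + Prometheus + Alertmanager + pgSCV. It does not introduce the components themselves (look them up if needed). Instead, it focuses on how to install them and wire them together.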

The figure below shows a system-monitoring architecture built around Prometheus (image from the Prometheus website).


Server list:

  • pg_node1, 192.168.210.15, CentOS 7.6: Prometheus 2.28.1, Grafana 8.1.0, pgSCV 0.7.1, node_exporter 1.2.1, Alertmanager 0.22.2. Runs the monitoring services and is also a monitored node.

  • pg_node2, 192.168.210.81, CentOS 7.6: pgSCV 0.7.1, node_exporter 1.2.1. Monitored node.

  • pg_node3, 192.168.210.33, CentOS 7.6: pgSCV 0.7.1, node_exporter 1.2.1. Monitored node.


Software installation directories:

  • grafana (192.168.210.15): /data/monitor/grafana

  • prometheus (192.168.210.15): /data/monitor/prometheus

  • alertmanager (192.168.210.15): /data/monitor/prometheus/alertmanager

  • node_exporter (192.168.210.15 / 192.168.210.81 / 192.168.210.33): /data/monitor/prometheus/plugin/node_exporter

  • pgscv (192.168.210.15 / 192.168.210.81 / 192.168.210.33): /data/monitor/prometheus/plugin/postgres_exporter


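Host configuration

Disable SELinux (all nodes)

# Permanently disable SELinux (takes effect after a reboot)
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# Switch to permissive mode for the current boot
setenforce 0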

Configure the firewall

On each node, open the ports used by the software deployed there.

  • grafana:3000

  • prometheus:9090

  • node_exporter:9100

  • pgscv:9890

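  • alertmanager: 9093 (465 is the SMTPS port Alertmanager uses to send alert email. That connection is outbound, and firewalld does not filter outbound traffic by default, so opening 465 is optional)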

firewall-cmd --add-port=9090/tcp --permanent
firewall-cmd --add-port=3000/tcp --permanent
firewall-cmd --add-port=9100/tcp --permanent
firewall-cmd --add-port=9093/tcp --permanent
firewall-cmd --add-port=465/tcp --permanent
firewall-cmd --add-port=9890/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
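The ports differ by node role, so the commands above can be wrapped in a small shell function. This is a minimal sketch. The role names `monitor` and `node`, the function names and the `FIREWALL_CMD` override are hypothetical. The port lists follow the port list above. Port 465 is omitted because it is the outbound SMTPS port Alertmanager uses to reach the mail server, and firewalld does not block outbound traffic by default.

```shell
# Hypothetical helpers: print / open the TCP ports a node needs, by role.
# "monitor" = pg_node1 (Grafana, Prometheus, Alertmanager and both exporters)
# "node"    = pg_node2 / pg_node3 (node_exporter and pgSCV only)
ports_for_role() {
  case $1 in
    monitor) echo "3000 9090 9093 9100 9890" ;;
    node)    echo "9100 9890" ;;
    *)       echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Open each port permanently, then reload firewalld (run as root).
# Set FIREWALL_CMD=echo for a dry run that only prints the arguments.
open_ports_for_role() {
  ports=$(ports_for_role "$1") || return 1
  for p in $ports; do
    "${FIREWALL_CMD:-firewall-cmd}" --permanent --add-port="$p/tcp"
  done
  "${FIREWALL_CMD:-firewall-cmd}" --reload
}

# Usage (as root):
#   open_ports_for_role monitor   # on pg_node1
#   open_ports_for_role node      # on pg_node2 and pg_node3
```

Keeping the port lists in one place makes it harder for the monitoring node and the monitored nodes to drift apart.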


Install Grafana

mkdir -p /data/monitor/
cd /data/monitor/
wget https://dl.grafana.com/oss/release/grafana-8.1.0.linux-amd64.tar.gz
tar xf grafana-8.1.0.linux-amd64.tar.gz
mv grafana-8.1.0 grafana
rm -f grafana-8.1.0.linux-amd64.tar.gz
mkdir -p /data/monitor/grafana/data/log
# Edit the relevant settings (in the [paths] section)
vi grafana/conf/defaults.ini
data = /data/monitor/grafana/data
logs = /data/monitor/grafana/data/log
plugins = /var/lib/grafana/plugins
provisioning = /data/monitor/grafana/conf/provisioning
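The same four edits can be scripted instead of made in `vi`. This is a minimal sketch: `set_ini_key` is a hypothetical helper that requires GNU `sed -i`. It changes a key only inside the named `[section]`, so a same-named key in another section is left alone. Values must not contain `|` or `&`, because these are special in the `sed` expression used here.

```shell
# Hypothetical helper: set KEY = VALUE inside [SECTION] of an INI file, in place.
set_ini_key() {
  file=$1 section=$2 key=$3 value=$4
  # Address range: from the [section] header up to the next "[..." header line.
  sed -i "/^\[$section\]/,/^\[/ s|^$key *=.*|$key = $value|" "$file"
}

# Usage, from /data/monitor:
#   ini=grafana/conf/defaults.ini
#   set_ini_key "$ini" paths data /data/monitor/grafana/data
#   set_ini_key "$ini" paths logs /data/monitor/grafana/data/log
#   set_ini_key "$ini" paths plugins /var/lib/grafana/plugins
#   set_ini_key "$ini" paths provisioning /data/monitor/grafana/conf/provisioning
```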
# Create the systemd unit file
cat > /usr/lib/systemd/system/grafana-server.service <<EOF
[Unit]
Description=Grafana instance
Documentation=http://docs.grafana.org
Wants=network-online.target
After=network-online.target
#After=postgresql.service

[Service]
User=root
Group=root
Type=simple
Restart=on-failure
WorkingDirectory=/data/monitor/grafana
RuntimeDirectory=grafana
RuntimeDirectoryMode=0750
ExecStart=/data/monitor/grafana/bin/grafana-server --config=/data/monitor/grafana/conf/defaults.ini
LimitNOFILE=10000
TimeoutStopSec=20

[Install]
WantedBy=multi-user.target
EOF

# Start
systemctl daemon-reload
systemctl enable grafana-server
systemctl start grafana-server

# Install the plugins used by the dashboards, then restart Grafana so it loads them
./grafana/bin/grafana-cli plugins install digiapulssi-breadcrumb-panel
./grafana/bin/grafana-cli plugins install grafana-polystat-panel
./grafana/bin/grafana-cli plugins install yesoreyeram-boomtable-panel
systemctl restart grafana-server

# Log in (default account admin/admin; you are asked to change the password on first login)
http://192.168.210.15:3000



Install Prometheus

cd /data/monitor/
wget https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
tar xf prometheus-2.28.1.linux-amd64.tar.gz
mv prometheus-2.28.1.linux-amd64 prometheus
rm -f prometheus-2.28.1.linux-amd64.tar.gz

# Edit the configuration
cd prometheus
vi prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.210.15:9093 # Alertmanager address

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml" # alerting rules

# Scrape configurations: host metrics, PostgreSQL metrics and Prometheus itself.
scrape_configs:
  - job_name: Host
    metrics_path: /metrics
    # Note: when targets are loaded from a file, make sure that file already exists in the
    # Prometheus working directory before Prometheus starts; otherwise errors may occur later.
    file_sd_configs:
      - files:
          - host.yml # created below
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)
        target_label: instance
        replacement: $1
      - source_labels: [__address__]
        regex: (.*)
        target_label: __address__
        replacement: $1:9100
  - job_name: pgscv
    metrics_path: /metrics
    # Same note as above: pgscv.yml must exist before Prometheus starts.
    file_sd_configs:
      - files:
          - pgscv.yml # created below
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)
        target_label: instance
        replacement: $1
      - source_labels: [__address__]
        regex: (.*)
        target_label: __address__
        replacement: $1:9890
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090
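The two relabel rules in each file-based job use a regex of `(.*)`. The first copies the bare IP into `instance`. The second rewrites `__address__` to IP:port, so the target files only need to list IPs. The sketch below mimics the default `replace` action with `sed`. `relabel_replace` is a hypothetical name. Prometheus anchors the regex and writes `$1`, while `sed` spells the group reference `\1`. The sketch only models the case where the regex matches.

```shell
# Hypothetical helper mimicking a relabel rule with action "replace":
# the regex is anchored at both ends and, on a match, the expanded replacement
# becomes the new value of target_label.
relabel_replace() {
  value=$1 regex=$2 replacement=$3
  printf '%s\n' "$value" | sed -E "s|^${regex}\$|${replacement}|"
}

# Second rule of the Host job: __address__ 192.168.210.81 -> 192.168.210.81:9100
relabel_replace 192.168.210.81 '(.*)' '\1:9100'
# Second rule of the pgscv job: 192.168.210.81 -> 192.168.210.81:9890
relabel_replace 192.168.210.81 '(.*)' '\1:9890'
```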
# Host monitoring targets
vi host.yml
- labels:
    node_name: pg_node1
    service_name: pg_node1
  targets:
    - 192.168.210.15
- labels:
    node_name: pg_node2
    service_name: pg_node2
  targets:
    - 192.168.210.81
- labels:
    node_name: pg_node3
    service_name: pg_node3
  targets:
    - 192.168.210.33
# PostgreSQL monitoring targets
vi pgscv.yml
- labels:
    node_name: pg1
    service_name: pg1
  targets:
    - 192.168.210.15
- labels:
    node_name: pg2
    service_name: pg2
  targets:
    - 192.168.210.81
- labels:
    node_name: pg3
    service_name: pg3
  targets:
    - 192.168.210.33
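The two target files differ only in their names, so they can also be generated from NAME:IP pairs. This is a minimal sketch: `gen_file_sd` is a hypothetical helper that writes the same `labels`/`targets` layout shown above, using the name for both `node_name` and `service_name`.

```shell
# Hypothetical helper: write a Prometheus file_sd target list.
# gen_file_sd OUTFILE NAME:IP [NAME:IP ...]
gen_file_sd() {
  out=$1; shift
  : > "$out"                      # truncate / create the output file
  for pair in "$@"; do
    name=${pair%%:*}              # text before the first ":"
    ip=${pair#*:}                 # text after the first ":"
    printf '%s\n' \
      "- labels:" \
      "    node_name: $name" \
      "    service_name: $name" \
      "  targets:" \
      "    - $ip" >> "$out"
  done
}

# Usage, in /data/monitor/prometheus:
#   gen_file_sd host.yml  pg_node1:192.168.210.15 pg_node2:192.168.210.81 pg_node3:192.168.210.33
#   gen_file_sd pgscv.yml pg1:192.168.210.15 pg2:192.168.210.81 pg3:192.168.210.33
```

Prometheus watches file_sd files for changes. A regenerated list is therefore picked up without a restart.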
# Create the systemd unit file
cat > /usr/lib/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus instance
Wants=network-online.target
After=network-online.target
#After=postgresql.service

[Service]
User=root
Group=root
Type=simple
Restart=on-failure
WorkingDirectory=/data/monitor/prometheus
RuntimeDirectory=prometheus
RuntimeDirectoryMode=0750
ExecStart=/data/monitor/prometheus/prometheus --storage.tsdb.retention.time=30d --web.enable-lifecycle --web.enable-admin-api --config.file=/data/monitor/prometheus/prometheus.yml
LimitNOFILE=10000
TimeoutStopSec=20

[Install]
WantedBy=multi-user.target
EOF

# Do not start it yet; start it after the exporters and Alertmanager are deployed
systemctl daemon-reload
systemctl enable prometheus
#systemctl start prometheus
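Because the unit passes `--web.enable-lifecycle`, later edits to `prometheus.yml` or the rule files can be applied with `POST /-/reload` instead of a restart. It is safer to validate first with `promtool`, which ships in the same tarball. `promtool check config` also checks the files listed under `rule_files`. This is a minimal sketch: `check_and_reload` and the `PROMTOOL` override are hypothetical.

```shell
# Hypothetical helper: validate the config, and reload only if it passes.
# PROMTOOL defaults to ./promtool (run from /data/monitor/prometheus).
check_and_reload() {
  config=${1:-prometheus.yml}
  url=${2:-http://localhost:9090}
  if "${PROMTOOL:-./promtool}" check config "$config"; then
    curl -s -X POST "$url/-/reload"
  else
    echo "promtool rejected $config; not reloading" >&2
    return 1
  fi
}

# Usage, once Prometheus is running:
#   cd /data/monitor/prometheus && check_and_reload
```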


Install Alertmanager

# Download into /data/monitor/prometheus
cd /data/monitor/prometheus
wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
tar xf alertmanager-0.22.2.linux-amd64.tar.gz
mv alertmanager-0.22.2.linux-amd64 alertmanager
rm -f alertmanager-0.22.2.linux-amd64.tar.gz

# Configure
vi alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m # an alert that stops being updated is marked resolved after this long (default 5m)
  smtp_smarthost: 'smtp.163.com:465'
  smtp_from: '[email protected]' # sender address
  smtp_auth_username: '[email protected]' # SMTP login of the sender address
  smtp_auth_password: 'HTBPT***********' # authorization code of the sender mailbox
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'default'
receivers:
  - name: 'default'
    email_configs:
      - to: '[email protected]'
        send_resolved: true
# Create the alerting rule file referenced by prometheus.yml above; this is a sample rule
mkdir -p rules
vi rules/memory_over.yml
groups:
  - name: example
    rules:
      - alert: HostMemoryOverLimit
        expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes))) * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: host memory usage exceeds the alert limit"
          description: "{{$labels.instance}}: memory usage is above 80% (current value: {{ $value }})"
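The rule's expression is used memory as a percentage: `(1 - MemAvailable / MemTotal) * 100`. node_exporter reads both values from `/proc/meminfo` and converts kB to bytes, which leaves the ratio unchanged. This is a minimal sketch with hypothetical helper names. It reproduces the arithmetic and lets you preview on a host what the rule would currently see.

```shell
# Hypothetical helpers reproducing the alert expression's arithmetic.
mem_usage_pct() {        # MEMTOTAL MEMAVAILABLE -> used percent, one decimal
  awk -v t="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (1 - a / t) * 100 }'
}

mem_over_threshold() {   # MEMTOTAL MEMAVAILABLE THRESHOLD -> exit 0 if above
  awk -v t="$1" -v a="$2" -v th="$3" 'BEGIN { exit !((1 - a / t) * 100 > th) }'
}

# Preview on the local host using the same /proc/meminfo fields (values in kB).
host_mem_usage_pct() {
  total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  mem_usage_pct "$total" "$avail"
}
```

For example, `mem_usage_pct 1000 150` prints `85.0`, which is above the 80 threshold. Prometheus would additionally wait for the `for: 1m` window before firing.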
# Start
cd alertmanager
nohup ./alertmanager --config.file=alertmanager.yml &

# Web UI
http://192.168.210.15:9093
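Alertmanager (and the exporters below) is started with `nohup`, so it will not come back after a reboot. One option is to give it a systemd unit like the Grafana and Prometheus ones. This is a minimal sketch: `write_unit` is a hypothetical helper that writes a unit with the same settings as those two units.

```shell
# Hypothetical helper: write a systemd unit modeled on the units above.
# write_unit DEST DESCRIPTION WORKDIR EXECSTART
write_unit() {
  dest=$1 desc=$2 workdir=$3 execstart=$4
  printf '%s\n' \
    "[Unit]" \
    "Description=$desc" \
    "Wants=network-online.target" \
    "After=network-online.target" \
    "" \
    "[Service]" \
    "User=root" \
    "Group=root" \
    "Type=simple" \
    "Restart=on-failure" \
    "WorkingDirectory=$workdir" \
    "ExecStart=$execstart" \
    "LimitNOFILE=10000" \
    "TimeoutStopSec=20" \
    "" \
    "[Install]" \
    "WantedBy=multi-user.target" > "$dest"
}

# Usage (as root), in place of the nohup start:
#   am=/data/monitor/prometheus/alertmanager
#   write_unit /usr/lib/systemd/system/alertmanager.service "Alertmanager instance" \
#     "$am" "$am/alertmanager --config.file=$am/alertmanager.yml"
#   systemctl daemon-reload && systemctl enable alertmanager && systemctl start alertmanager
```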




Install the data collectors

# Install the host metrics collector node_exporter (on every node)
mkdir -p /data/monitor/prometheus/plugin
cd /data/monitor/prometheus/plugin
wget https://github.com/prometheus/node_exporter/releases/download/v1.2.1/node_exporter-1.2.1.linux-amd64.tar.gz
tar xf node_exporter-1.2.1.linux-amd64.tar.gz
mv node_exporter-1.2.1.linux-amd64 node_exporter
# Start
nohup node_exporter/node_exporter &
# Install the PostgreSQL metrics collector pgSCV (on every node)
wget https://github.com/weaponry/pgscv/releases/download/0.7.1/pgscv_0.7.1_linux_amd64.tar.gz
mkdir -p postgres_exporter
tar xf pgscv_0.7.1_linux_amd64.tar.gz -C postgres_exporter/

# Set environment variables. Alternatively, pass a configuration file at startup with --config-file=, which also lets you define custom metrics.
cat >> /etc/profile <<EOF
export PGSCV_LISTEN_ADDRESS="0.0.0.0:9890"
export POSTGRES_DSN="postgresql://db_user:[email protected]:5432/postgres"
EOF
source /etc/profile

# Start
nohup ./postgres_exporter/pgscv &
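`POSTGRES_DSN` is a `postgresql://user:password@host:port/dbname` URI. If the password contains reserved characters such as `@`, `:` or `/`, they must be percent-encoded, or the URI will be split in the wrong place. This is a minimal sketch that handles ASCII input only. `urlencode` and `build_dsn` are hypothetical helpers, and the credentials in the usage comment are placeholders.

```shell
# Hypothetical helper: percent-encode a URI component (ASCII input only).
urlencode() {
  s=$1 out=
  while [ -n "$s" ]; do
    rest=${s#?}                  # everything after the first character
    c=${s%"$rest"}               # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;                    # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;            # e.g. "@" -> %40
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: assemble the DSN. USER PASSWORD HOST PORT DBNAME
build_dsn() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}

# Usage with placeholder credentials:
#   export POSTGRES_DSN="$(build_dsn pgscv_user 'S3cr@t' 127.0.0.1 5432 postgres)"
```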


# View the metrics collected by node_exporter
http://192.168.210.15:9100/metrics
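Besides opening the URL in a browser, you can summarize the text exposition format from the command line. This is a minimal sketch: `count_metric_families` is a hypothetical helper. It drops `# HELP`/`# TYPE` lines, strips labels and values, and counts the distinct sample names that start with a given prefix. Histogram `_bucket`/`_sum`/`_count` series therefore count as separate names.

```shell
# Hypothetical helper: count distinct sample names with a given prefix.
# Reads Prometheus text exposition format on stdin.
count_metric_families() {
  grep -v '^#' |                 # drop HELP / TYPE comment lines
    sed -E 's/[{ ].*//' |        # keep only the name (cut at "{" or space)
    grep "^$1" | sort -u | wc -l | tr -d ' '
}

# Usage:
#   curl -s http://192.168.210.15:9100/metrics | count_metric_families node_
```

A count of zero, or a `curl` error, usually points at the firewall or at an exporter that is not running.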




# View the metrics collected by pgscv
http://192.168.210.15:9890/metrics




Start Prometheus

systemctl start prometheus
# View Prometheus in the browser
http://192.168.210.15:9090
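The Status > Targets page shows whether every exporter is being scraped. The same information is available from the HTTP API at `GET /api/v1/targets`, where each active target carries a `health` field. This is a minimal sketch without `jq`. `count_down_targets` is a hypothetical helper, and it assumes the compact JSON (no spaces around `:`) that Prometheus returns.

```shell
# Hypothetical helper: count targets whose health is "down" in the
# JSON returned by GET /api/v1/targets (read from stdin).
count_down_targets() {
  grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Usage:
#   curl -s http://192.168.210.15:9090/api/v1/targets | count_down_targets
```

A non-zero count usually means a port is blocked or an exporter is not running on that node.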


Configure Grafana

  • Configure the data source



Note: if you import dashboards offline from JSON files, the data source Name must match the datasource referenced in the JSON template.
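Instead of clicking through the UI, Grafana can create the data source at startup from a YAML file in `provisioning/datasources`, under the provisioning path set in `defaults.ini`. This is a minimal sketch. `write_prometheus_datasource` is a hypothetical helper. The name `Prometheus` is an assumption: set it to whatever datasource name your dashboard JSON refers to.

```shell
# Hypothetical helper: write a Grafana data-source provisioning file.
# write_prometheus_datasource DIR NAME URL
write_prometheus_datasource() {
  dir=$1 name=$2 url=$3
  mkdir -p "$dir"
  printf '%s\n' \
    "apiVersion: 1" \
    "datasources:" \
    "  - name: $name" \
    "    type: prometheus" \
    "    access: proxy" \
    "    url: $url" \
    "    isDefault: true" > "$dir/prometheus.yaml"
}

# Usage on pg_node1 (as root); provisioning files are read when Grafana starts:
#   write_prometheus_datasource /data/monitor/grafana/conf/provisioning/datasources \
#     Prometheus http://192.168.210.15:9090
#   systemctl restart grafana-server
```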

Import dashboards online


Import the host monitoring dashboard: https://grafana.com/grafana/dashboards/8919



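Import the PostgreSQL monitoring dashboard the same way (https://grafana.com/grafana/dashboards/14540).
https://github.com/percona/grafana-dashboards/releases also offers a rich set of dashboards. Import the ones you need and install the corresponding collectors. You can also build your own panels, although that is more demanding. A practical starting point is to copy an existing template and modify it.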


View the host monitoring dashboard


View the PostgreSQL monitoring dashboard. It covers essentially all the metrics worth watching.


Check the alert email (lower the threshold to trigger an alert)
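One way to lower the threshold is to edit the number in the rule's `expr` and have Prometheus reload its rules through the lifecycle endpoint. This is a minimal sketch: `set_mem_threshold` is a hypothetical helper that requires GNU `sed -i`. It changes only the number after `* 100 > ` on the `expr:` line. Any percentage written in the rule's description text is left untouched.

```shell
# Hypothetical helper: replace the threshold in the memory rule's expr line.
# set_mem_threshold RULEFILE NEW_PERCENT
set_mem_threshold() {
  file=$1 pct=$2
  sed -i -E "s/(expr: .*\* 100 > )[0-9]+/\1$pct/" "$file"
}

# Usage, in /data/monitor/prometheus:
#   set_mem_threshold rules/memory_over.yml 10        # fires on almost any host
#   curl -s -X POST http://192.168.210.15:9090/-/reload
#   # ...after the email arrives, restore the original threshold:
#   set_mem_threshold rules/memory_over.yml 80
#   curl -s -X POST http://192.168.210.15:9090/-/reload
```

Expect some delay before the email arrives. The rule must hold for its `for: 1m` window, and Alertmanager then waits `group_wait` (30s) before sending the first notification for a group.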







The PostgreSQL Chinese Community welcomes article submissions from technical contributors.
Submission email: [email protected]