[Technical Guide] Monitoring a Kubernetes Cluster and the HPA Controller
1. Resource Monitoring and Resource Metrics
Kubernetes exposes many metrics worth collecting, and they fall broadly into two groups: metrics about the cluster itself and metrics about Pod objects. Cluster-level monitoring covers node resource usage, the number of nodes, and the number of running Pods; Pod-level monitoring covers Kubernetes object metrics, container metrics, and application metrics.
2. Resource Metrics and Their Use
Pull the metrics-server manifests:
]# git clone https://github.com/kubernetes-incubator/metrics-server.git
# Replace the image address in metrics-server/deploy/1.8+/metrics-server-deployment.yaml with a reachable mirror
]# sed -i "s#k8s.gcr.io/metrics-server-amd64:v0.3.3#mirrorgooglecontainers/metrics-server-amd64:v0.3.1#g" metrics-server/deploy/1.8+/metrics-server-deployment.yaml
# Add the following startup arguments in metrics-server/deploy/1.8+/metrics-server-deployment.yaml
command:
- /metrics-server
- --metric-resolution=30s
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
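For orientation, these arguments go under the metrics-server container in the Deployment; a rough sketch of that section after both edits (the field layout follows the upstream manifest, and the comments summarize each flag):
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        # scrape kubelets every 30s instead of the default 60s
        - --metric-resolution=30s
        # do not verify the kubelets' serving certificates
        - --kubelet-insecure-tls
        # address types to try, in order, when connecting to kubelets
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP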
# Create the resources (the whole deploy/1.8+/ directory, so that the APIService is registered as well)
]# kubectl create -f metrics-server/deploy/1.8+/
serviceaccount/metrics-server created
deployment.extensions/metrics-server created
. . . . . .
# Check the status of the related pod
]# kubectl get pods -n kube-system -l k8s-app=metrics-server -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
metrics-server-8445cfc4db-m4nff 1/1 Running 0 6m47s 10.244.2.63 node02.dayi123.com <none> <none>
# Verify that the metrics API is available
]# kubectl proxy --port=8080
Starting to serve on 127.0.0.1:8080
# In another terminal, query node metrics to verify availability
]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "master01.dayi123.com",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/master01.dayi123.com",
        "creationTimestamp": "2019-06-13T11:02:57Z"
      },
      "timestamp": "2019-06-13T11:03:19Z",
      "window": "30s",
      "usage": {
        "cpu": "222451436n",
        "memory": "1129748Ki"
      }
. . . . . .
# Fetch resource-usage data for all pod objects in the cluster
]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods
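The same API also serves per-object queries; for example, a single node or a single pod can be read directly (the pod name below is a placeholder):
# Metrics for a single node
]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes/node01.dayi123.com
# Metrics for a single pod in a given namespace
]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/<pod-name>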
# Show resource usage for each node
]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master01.dayi123.com 200m 10% 1106Mi 64%
node01.dayi123.com 44m 4% 649Mi 46%
node02.dayi123.com 84m 4% 929Mi 54%
# Show resource usage for pods matching a label selector
]# kubectl top pods -n kube-system -l k8s-app=kubernetes-dashboard
NAME CPU(cores) MEMORY(bytes)
kubernetes-dashboard-76479d66bb-9xhkf 1m 20Mi
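kubectl top pod also accepts a --containers flag, which breaks the usage down per container inside each pod:
# Per-container resource usage for the matching pods
]# kubectl top pods -n kube-system -l k8s-app=kubernetes-dashboard --containers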
3. Prometheus
# Download the kubernetes source to the local machine
]# wget https://github.com/kubernetes/kubernetes/releases/download/v1.13.2/kubernetes.tar.gz
]# tar -xf kubernetes.tar.gz
# Modify the image address
]# cd kubernetes/cluster/addons/prometheus/
]# sed -i "s#k8s.gcr.io/addon-resizer:1.8.4#googlecontainer/addon-resizer:1.8.4#g" kube-state-metrics-deployment.yaml
# Deploy kube-state-metrics
]# kubectl apply -f kube-state-metrics-rbac.yaml
]# kubectl create -f kube-state-metrics-deployment.yaml
]# kubectl create -f kube-state-metrics-service.yaml
# Check the status of the related pods
]# kubectl get pods -n kube-system -l k8s-app=kube-state-metrics
NAME READY STATUS RESTARTS AGE
kube-state-metrics-7f8bd888d9-cph86 2/2 Running 0 3m25s
# Create a client pod for testing
]# kubectl run client-$RANDOM --image=cirros -it --rm -- sh
/ # curl -s kube-state-metrics.kube-system:8080/metrics | tail
kube_service_spec_type{namespace="default",service="glusterfs-dynamic-d4149b1c-7ad2-11e9-8957-000c29063a23",type="ClusterIP"} 1
kube_service_spec_type{namespace="default",service="kubernetes",type="ClusterIP"} 1
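Besides service metadata, kube-state-metrics exposes object-state series such as kube_pod_status_phase and kube_deployment_status_replicas; from the same client pod they can be spot-checked like this (the metric names are the standard kube-state-metrics ones):
/ # curl -s kube-state-metrics.kube-system:8080/metrics | grep ^kube_pod_status_phase | head
/ # curl -s kube-state-metrics.kube-system:8080/metrics | grep ^kube_deployment_status_replicas | head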
# Install node_exporter
]# kubectl create -f node-exporter-ds.yml
]# kubectl create -f node-exporter-service.yaml
# Check the created node_exporter pods
]# kubectl get pods -n kube-system -l k8s-app=node-exporter
NAME READY STATUS RESTARTS AGE
node-exporter-9pz7b 1/1 Running 0 15s
node-exporter-q9449 1/1 Running 0 15s
# Test node_exporter
]# curl -s node01:9100/metrics | more
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.5346e-05
go_gc_duration_seconds{quantile="0.25"} 6.5346e-05
go_gc_duration_seconds{quantile="0.5"} 6.5346e-05
Prometheus alerting is implemented in two steps: the Prometheus server evaluates alerting rules and pushes firing alerts to Alertmanager, and Alertmanager then processes the alerts it receives (grouping, deduplicating, silencing and routing them to receivers).
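As a minimal sketch of what such an alerting rule looks like (the job label and the threshold are illustrative assumptions, not part of the add-on's default prometheus-configmap.yaml):
groups:
- name: node-alerts
  rules:
  - alert: NodeExporterDown
    # fires when a node-exporter target has been unreachable for 5 minutes
    expr: up{job="node-exporter"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "node-exporter on {{ $labels.instance }} is down"
Rules like this are loaded through the rule_files section of the Prometheus configuration, and Alertmanager decides how to notify based on its own routes and receivers.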
# Check the available PVs
]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
alert-pv01 2Gi RWO Retain Available
# Comment out the storageClassName in alertmanager-pvc.yaml, then create the PVC and check it
]# sed -i "s@storageClassName: nfs@#storageClassName: nfs @g" alertmanager-pvc.yaml
]# kubectl apply -f alertmanager-pvc.yaml
]# kubectl get pvc -n kube-system
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
alertmanager Bound alert-pv01 2Gi RWO 11m
# Create the other resources
]# kubectl apply -f alertmanager-configmap.yaml
]# kubectl apply -f alertmanager-deployment.yaml
# Before creating the service, change the service type to NodePort
]# sed -i "s#ClusterIP#NodePort#g" alertmanager-service.yaml
]# kubectl apply -f alertmanager-service.yaml
# Check the created svc
]# kubectl get svc -n kube-system -l kubernetes.io/name=Alertmanager
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager NodePort 10.106.61.11 <none> 80:32442/TCP 2m51s
# Create the configmap and RBAC resources first
]# kubectl apply -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
]# kubectl apply -f prometheus-configmap.yaml
# Before creating the StatefulSet, check the available storage classes
]# kubectl get storageclass
NAME PROVISIONER AGE
glusterfs kubernetes.io/glusterfs 24d
# Modify the storage definition in prometheus-statefulset.yaml as follows
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      storageClassName: glusterfs
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: "5Gi"
# Create the resource
]# kubectl create -f prometheus-statefulset.yaml
]# kubectl get pods -n kube-system -l k8s-app=prometheus
NAME READY STATUS RESTARTS AGE
prometheus-0   2/2   Running   0   7m44s
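Because the pods come from a StatefulSet, the volumeClaimTemplates entry yields one PVC per replica, named <template>-<pod> (here prometheus-data-prometheus-0), which can be confirmed with:
# PVCs created from the volume claim template
]# kubectl get pvc -n kube-system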
# Change the service type to NodePort
]# cat prometheus-service.yaml | tail -1
  type: "NodePort"
# Create the service and check it
]# kubectl create -f prometheus-service.yaml
]# kubectl get svc -n kube-system -l kubernetes.io/name=Prometheus
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus NodePort 10.111.119.52 <none> 9090:30143/TCP 48s
4. The HPA Controller
Manually adjusting the replica count of a Pod controller always lags behind the actual load, so Kubernetes provides several elastic scaling tools, chiefly HPA (Horizontal Pod Autoscaler), CA (Cluster Autoscaler), VPA (Vertical Pod Autoscaler) and AR (Addon Resizer).
First create a Deployment:
]# kubectl run mynginx --image=nginx:1.12 --replicas=2 --requests='cpu=5m,memory=128Mi' --limits='cpu=50m,memory=128Mi' --labels='app=mynginx' --expose --port=80
Create an HPA controller to manage the Pod replicas automatically:
]# kubectl autoscale deploy mynginx --min=2 --max=5 --cpu-percent=60
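The resulting HPA object and the metrics it observes can be checked with kubectl (the TARGETS column only shows values once metrics-server is serving pod metrics):
# Inspect the HPA created above
]# kubectl get hpa mynginx
]# kubectl describe hpa mynginx
kubectl autoscale only configures a CPU-utilization target; to scale on memory or other metrics as well, an HPA can be written against the autoscaling/v2beta1 API, as in the manifest below.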
]# cat hpav2-mynginx.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: mynginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mynginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 50Mi
The fields nested under spec are as follows: scaleTargetRef identifies the object to be scaled (here the mynginx Deployment), minReplicas and maxReplicas bound the replica count, and metrics lists the metrics used to compute the desired number of replicas (a CPU utilization target and an average memory value in this example).
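To watch the HPA react, some load can be generated against the mynginx service created by --expose above (a rough sketch; the client image and the loop are only for illustration):
# Generate load against the mynginx service, then watch the HPA adjust replicas
]# kubectl run load-$RANDOM --image=cirros -it --rm -- sh
/ # while true; do curl -s -o /dev/null mynginx; done
# In another terminal
]# kubectl get hpa mynginx -w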