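[Technical Deep Dive] Monitoring Kubernetes and the HPA Controller
1. Resource Monitoring and Resource Metrics
Kubernetes exposes many metrics that need to be collected. Broadly, they fall into two groups: monitoring the cluster itself and monitoring Pod objects. Cluster monitoring covers node resource status, the number of nodes, and the number of running Pods. Pod monitoring covers Kubernetes metrics, container metrics, and application metrics.
2. Resource Metrics and Their Use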
Pull the source:
git clone https://github.com/kubernetes-incubator/metrics-server

Replace the image address in metrics-server/deploy/1.8+/metrics-server-deployment.yaml:
sed -i "s#k8s.gcr.io/metrics-server-amd64:v0.3.3#mirrorgooglecontainers/metrics-server-amd64:v0.3.1#g" /tmp/metrics-server/deploy/1.8+/metrics-server-deployment.yaml
# Add the startup arguments to metrics-server/deploy/1.8+/metrics-server-deployment.yaml
command:
- /metrics-server
- --metric-resolution=30s
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP

# Create the resources
]# kubectl create -f metrics-server/deploy/1.8+/metrics-server-deployment.yaml
serviceaccount/metrics-server created
deployment.extensions/metrics-server created

# Check the status of the related pod
]# kubectl get pods -n kube-system -l k8s-app=metrics-server -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP            NODE                 NOMINATED NODE   READINESS GATES
metrics-server-8445cfc4db-m4nff   1/1     Running   0          6m47s   10.244.2.63   node02.dayi123.com   <none>           <none>

# Check that the metrics API is available
]# kubectl proxy --port=8080
Starting to serve on 127.0.0.1:8080

# In another terminal, query the node metrics
]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "master01.dayi123.com",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/master01.dayi123.com",
        "creationTimestamp": "2019-06-13T11:02:57Z"
      },
      "timestamp": "2019-06-13T11:03:19Z",
      "window": "30s",
      "usage": {
        "cpu": "222451436n",
        "memory": "1129748Ki"
      }
. . . . . .

# Get resource consumption data for every pod in the cluster
]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods
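The usage values returned by the metrics API are Kubernetes quantity strings: CPU in nanocores (suffix n) and memory in kibibytes (suffix Ki). The sketch below converts the two values from the output above into millicores and Mi, which are easier to read. It assumes only those two suffixes occur, the function names are hypothetical, and the integer division truncates, so the results may differ slightly from what other tools print.

```shell
#!/bin/sh
# Hypothetical helpers; they handle only the "n" and "Ki" suffixes seen above.

# Convert a nanocore quantity such as "222451436n" to whole millicores.
nano_to_milli() {
    echo $(( ${1%n} / 1000000 ))
}

# Convert a kibibyte quantity such as "1129748Ki" to whole mebibytes.
ki_to_mi() {
    echo $(( ${1%Ki} / 1024 ))
}

echo "cpu: $(nano_to_milli 222451436n)m"      # prints "cpu: 222m"
echo "memory: $(ki_to_mi 1129748Ki)Mi"        # prints "memory: 1103Mi"
```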
# Show resource usage for each node
]# kubectl top nodes
NAME                   CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master01.dayi123.com   200m         10%    1106Mi          64%
node01.dayi123.com     44m          4%     649Mi           46%
node02.dayi123.com     84m          4%     929Mi           54%

# Show resource usage for pods that match a label selector
]# kubectl top pods -n kube-system -l k8s-app=kubernetes-dashboard
NAME                                    CPU(cores)   MEMORY(bytes)
kubernetes-dashboard-76479d66bb-9xhkf   1m           20Mi
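The columnar output of kubectl top nodes is easy to post-process. As a small sketch, the hypothetical helper below reads that output on stdin and prints the nodes whose MEMORY% exceeds a threshold. It assumes the five-column layout shown above and uses the sample rows from this article rather than a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl top nodes` output on stdin and print the
# names of nodes whose MEMORY% (5th column) is above the given threshold.
high_mem_nodes() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)            # strip the trailing percent sign
        if (pct + 0 > limit + 0) print $1
    }'
}

# Sample rows copied from the output above.
sample='NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master01.dayi123.com 200m 10% 1106Mi 64%
node01.dayi123.com 44m 4% 649Mi 46%
node02.dayi123.com 84m 4% 929Mi 54%'

printf '%s\n' "$sample" | high_mem_nodes 60   # prints master01.dayi123.com
```

Against a running cluster the same filter would be used as kubectl top nodes | high_mem_nodes 60.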
3. Prometheus
# Download the Kubernetes source to the local machine
]# wget https://github.com/kubernetes/kubernetes/releases/download/v1.13.2/kubernetes.tar.gz
]# tar -xf kubernetes.tar.gz

# Change the image address
]# cd kubernetes/cluster/addons/prometheus/
]# sed -i "s#k8s.gcr.io/addon-resizer:1.8.4#googlecontainer/addon-resizer:1.8.4#g" kube-state-metrics-deployment.yaml

# Deploy kube-state-metrics
]# kubectl apply -f kube-state-metrics-rbac.yaml
]# kubectl create -f kube-state-metrics-deployment.yaml
]# kubectl create -f kube-state-metrics-service.yaml

# Check that the related pod is running
]# kubectl get pods -n kube-system -l k8s-app=kube-state-metrics
NAME                                  READY   STATUS    RESTARTS   AGE
kube-state-metrics-7f8bd888d9-cph86   2/2     Running   0          3m25s

# Start a client pod to test it
]# kubectl run client-$RANDOM --image=cirros -it --rm -- sh
/ # curl -s kube-state-metrics.kube-system:8080/metrics | tail
kube_service_spec_type{namespace="default",service="glusterfs-dynamic-d4149b1c-7ad2-11e9-8957-000c29063a23",type="ClusterIP"} 1
kube_service_spec_type{namespace="default",service="kubernetes",type="ClusterIP"} 1
# Install node_exporter
]# kubectl create -f node-exporter-ds.yml
]# kubectl create -f node-exporter-service.yaml

# Check the node_exporter pods that were created
]# kubectl get pods -n kube-system -l k8s-app=node-exporter
NAME                  READY   STATUS    RESTARTS   AGE
node-exporter-9pz7b   1/1     Running   0          15s
node-exporter-q9449   1/1     Running   0          15s

# Test node_exporter
]# curl node01:9100/metrics | more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.5346e-05
go_gc_duration_seconds{quantile="0.25"} 6.5346e-05
go_gc_duration_seconds{quantile="0.5"} 6.5346e-05
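The /metrics endpoint returns the Prometheus text format: comment lines starting with # (HELP and TYPE) and sample lines of the form series value. As a rough sketch, the hypothetical helper below pulls the value of one exact series out of such output. It assumes the series name and its labels contain no spaces, which holds for the sample lines above but not for every exporter.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one exact series from Prometheus
# text-format metrics read on stdin. Comment lines (HELP/TYPE) are skipped.
# Assumption: the series name and labels contain no whitespace.
metric_value() {
    awk -v series="$1" '/^#/ { next } $1 == series { print $2 }'
}

# Sample lines copied from the node_exporter output above.
sample='# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.5346e-05
go_gc_duration_seconds{quantile="0.25"} 6.5346e-05
go_gc_duration_seconds{quantile="0.5"} 6.5346e-05'

printf '%s\n' "$sample" | metric_value 'go_gc_duration_seconds{quantile="0.5"}'
```

Against a live exporter the input would come from curl -s node01:9100/metrics instead of the sample variable.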
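Prometheus alerting works in two steps: first, the Prometheus server evaluates its alerting rules and sends the resulting alerts to Alertmanager; Alertmanager then processes the alerts it receives.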
# List the PVs available for use
]# kubectl get pv
NAME         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
alert-pv01   2Gi        RWO            Retain           Available
# Create the PVC and check it (the storageClassName line is commented out first)
]# sed -i "s@storageClassName: nfs@#storageClassName: nfs@g" alertmanager-pvc.yaml
]# kubectl apply -f alertmanager-pvc.yaml
]# kubectl get pvc -n kube-system
NAME           STATUS   VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager   Bound    alert-pv01   2Gi        RWO                           11m

# Create the other resources
]# kubectl apply -f alertmanager-configmap.yaml
]# kubectl apply -f alertmanager-deployment.yaml

# Before creating the service, change its type to NodePort
]# sed -i "s#ClusterIP#NodePort#g" alertmanager-service.yaml
]# kubectl apply -f alertmanager-service.yaml

# Check the service that was created
]# kubectl get svc -n kube-system -l kubernetes.io/name=Alertmanager
NAME           TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
alertmanager   NodePort   10.106.61.11   <none>        80:32442/TCP   2m51s
# Create the RBAC resources and the ConfigMap first
]# kubectl apply -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
]# kubectl apply -f prometheus-configmap.yaml
# Before creating the StatefulSet, check which storage classes are available
kubectl get StorageClass
NAME        PROVISIONER               AGE
glusterfs   kubernetes.io/glusterfs   24d

# Change the storage provisioning in prometheus-statefulset.yaml as follows
volumeClaimTemplates:
- metadata:
    name: prometheus-data
  spec:
    storageClassName: glusterfs
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: "5Gi"

# Create the resource
kubectl create -f prometheus-statefulset.yaml
kubectl get pods -n kube-system -l k8s-app=prometheus
NAME   READY   STATUS    RESTARTS   AGE
       2/2     Running   0          7m44s

# Change the service type to NodePort
cat prometheus-service.yaml | tail -1
type: "NodePort"

# Create the service and check it
kubectl create -f prometheus-service.yaml
kubectl get svc -n kube-system -l kubernetes.io/name=Prometheus
NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus   NodePort   10.111.119.52   <none>        9090:30143/TCP   48s
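4. The HPA Controller
Manually adjusting the replica count of each Pod controller always lags behind changes in load, so Kubernetes provides several elastic scaling tools. The main ones are HPA (Horizontal Pod Autoscaler), CA (Cluster Autoscaler), VPA (Vertical Pod Autoscaler), and AR (Addon Resizer).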
First create a Deployment controller:
kubectl run mynginx --image=nginx:1.12 --replicas=2 --requests='cpu=5m,memory=128Mi' --limits='cpu=50m,memory=128Mi' --labels='app=mynginx' --expose --port=80

Then create an HPA controller to manage the Pod replica count automatically:
kubectl autoscale deploy mynginx --min=2 --max=5 --cpu-percent=60
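The autoscaler created above targets 60% average CPU utilization and keeps between 2 and 5 replicas. The Kubernetes documentation gives the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with the result clamped to the configured bounds. The sketch below reproduces only that arithmetic. The function name is hypothetical, the utilization figures are made up, and the controller's tolerance band and other safeguards are ignored, so it approximates real behaviour at best.

```shell
#!/bin/sh
# Hypothetical helper, not part of kubectl.
# Usage: hpa_desired CURRENT_REPLICAS CURRENT_UTIL_PCT TARGET_UTIL_PCT MIN MAX
hpa_desired() {
    # Integer ceiling of replicas * utilization / target.
    want=$(( ($1 * $2 + $3 - 1) / $3 ))
    if [ "$want" -lt "$4" ]; then want=$4; fi   # never below --min
    if [ "$want" -gt "$5" ]; then want=$5; fi   # never above --max
    echo "$want"
}

hpa_desired 2 90 60 2 5    # moderate load: prints 3
hpa_desired 2 200 60 2 5   # heavy load, capped at --max: prints 5
hpa_desired 2 10 60 2 5    # light load, held at --min: prints 2
```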
]# cat hpav2-mynginx.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: mynginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mynginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 50Mi
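The fields nested under spec are as follows:
scaleTargetRef: the resource whose replica count the autoscaler adjusts, here the mynginx Deployment.
minReplicas: the lower bound on the replica count.
maxReplicas: the upper bound on the replica count.
metrics: the list of metrics used to compute the desired replica count, here average CPU utilization across Pods (target 50%) and average memory usage per Pod (target 50Mi).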
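When several metrics are listed, as in the hpav2-mynginx.yaml manifest above, the controller evaluates each metric separately and scales to the largest of the proposed replica counts. The sketch below illustrates that rule for the CPU and memory targets in the manifest. The function names are hypothetical, the sample readings are invented, and the tolerance band is again ignored.

```shell
#!/bin/sh
# Hypothetical helpers; the sample readings at the bottom are invented.

# Integer ceiling of replicas * current / target for a single metric.
proposal() {
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Usage: hpa_multi REPLICAS CPU_PCT CPU_TARGET_PCT MEM_MI MEM_TARGET_MI MIN MAX
hpa_multi() {
    by_cpu=$(proposal "$1" "$2" "$3")
    by_mem=$(proposal "$1" "$4" "$5")
    want=$by_cpu
    if [ "$by_mem" -gt "$want" ]; then want=$by_mem; fi   # largest proposal wins
    if [ "$want" -lt "$6" ]; then want=$6; fi             # clamp to minReplicas
    if [ "$want" -gt "$7" ]; then want=$7; fi             # clamp to maxReplicas
    echo "$want"
}

# 2 replicas at 75% CPU (target 50%) and 120Mi average memory (target 50Mi):
# CPU proposes 3, memory proposes 5, so the result is 5.
hpa_multi 2 75 50 120 50 2 10
```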
