K8s Guide: HPA (Horizontal Pod Autoscaling)
Kubernetes offers three autoscaling modes, used as needed:
VPA: vertical scaling — adjusts the CPU/memory allocation of individual pods.
HPA: horizontal scaling — automatically grows or shrinks the number of pod replicas of a workload running on k8s according to resource utilization. It depends on the metrics-server service to collect pod resource metrics. Application resource usage typically has peaks and troughs, which is exactly the problem HPA was built for. It is also one of the clearest advantages over traditional operations: not only elastic, but fully automated.
(The third mode, CA — the Cluster Autoscaler — scales the nodes themselves and is out of scope here.)
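Under the hood, the HPA controller computes the desired replica count with a simple ratio: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A quick sketch of that arithmetic (the numbers are invented for illustration):

```shell
# ceil(current_replicas * current_utilization / target) via integer math:
# 2 pods averaging 100% of their CPU request against a 50% target -> 4 pods
current_replicas=2; current_util=100; target=50
echo $(( (current_replicas * current_util + target - 1) / target ))   # prints 4
```

This is why the target percentage is always measured against the pod's CPU *request* — without a request configured, the ratio is undefined, which is precisely the error we will hit below.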
Scaling on the CPU utilization metric
Case study: automatically scale the pod count based on pod CPU utilization.
Note: before using HPA, make sure the cluster's DNS service and metrics-server are running normally, and that the workload you create has resource requests configured.
# The resources section of a pod spec looks like this.
# You can get away with configuring only requests, but production experience says to add
# limits as well: only when both are set, and set to equal values, does the pod get the
# highest QoS class (Guaranteed). Under node resource pressure, pods with no resource
# configuration at all (BestEffort) are evicted first, then pods with only requests
# (Burstable), and Guaranteed pods last.
resources:
  limits:       # cap a single pod at 1 CPU core (1000m) and 2Gi of memory
    cpu: "1"
    memory: 2Gi
  requests:     # guarantee the pod this much at scheduling time
    cpu: "1"
    memory: 2Gi
Create the HPA (without touching the deployment config yet):
# Create an HPA for the deployment web01: at most 3 pods, at least 1, scale out once average pod CPU exceeds 50%
kubectl autoscale deployment web01 --max=3 --min=1 --cpu-percent=50
horizontalpodautoscaler.autoscaling/web01 autoscaled
[root@node-1 ~]# kubectl get hpa
NAME    REFERENCE          TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
web01   Deployment/web01   <unknown>/50%   1         3         1          69s
# Check this HPA's description after a while
# The condition below says the HPA cannot compute CPU utilization: the target pods have no CPU request configured
[root@node-1 ~]# kubectl describe hpa web01
Name: web01
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Wed, 25 Aug 2021 20:59:58 +0800
Reference: Deployment/web01
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 1
Max replicas: 3
Deployment pods: 1 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 3s (x9 over 2m5s) horizontal-pod-autoscaler failed to get cpu utilization: missing request for cpu
Warning FailedComputeMetricsReplicas 3s (x9 over 2m5s) horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu
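`kubectl autoscale` is the imperative shortcut; the same HPA can also be kept under version control as a manifest. A minimal sketch, assuming the autoscaling/v1 API (the API group the kubectl output above reports):

```shell
# Declarative equivalent of the kubectl autoscale command (autoscaling/v1 API)
cat > web-hpa.yaml <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: default
spec:
  scaleTargetRef:        # which workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50
EOF
# kubectl apply -f web-hpa.yaml
```

The apply line is commented out since it needs a live cluster; the manifest itself matches the `--min/--max/--cpu-percent` flags used above.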
Now let's put HPA into practice on the deployment web. Using the export method covered earlier, dump web's YAML and add a resources section:
# cat web.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: nginx
        name: nginx
        resources:
          limits:     # this is a test environment, so only 50m CPU (0.05 core) and 20Mi of memory
            cpu: "50m"
            memory: 20Mi
          requests:   # guarantee the pod this much at scheduling time
            cpu: "50m"
            memory: 20Mi
Update the web deployment:
# kubectl apply -f web.yaml
deployment.apps/web configured
Create the HPA:
# kubectl autoscale deployment web --max=3 --min=1 --cpu-percent=50
horizontalpodautoscaler.autoscaling/web autoscaled
# Wait a moment and the HPA will show live metrics (metrics-server scrapes pod metrics at roughly 60s intervals)
# kubectl get hpa -w
Now let's simulate a surge in traffic and watch the HPA scale out:
# Start a throwaway pod to hammer the web service with requests
# kubectl run -it --rm busybox --image=busybox -- sh
/ # while :;do wget -q -O- http://172.20.217.78;done
# Wait 2-3 minutes; note that k8s rate-limits how fast replicas are added, to avoid pod churn
[root@node-3 ~]# kubectl get hpa web -w
NAME   REFERENCE        TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   2%/50%     1         3         1          15m
web    Deployment/web   18%/50%    1         3         1          15m
web    Deployment/web   100%/50%   1         3         1          16m
web    Deployment/web   100%/50%   1         3         2          16m
web    Deployment/web   51%/50%    1         3         2          17m
# One more web pod was scaled out automatically
[root@node-1 wgw-tmp]# kubectl get pod
NAME                   READY   STATUS    RESTARTS   AGE
busybox                1/1     Running   0          6m30s
web-6485bd5c6b-7pg89   1/1     Running   0          2m17s
web-6485bd5c6b-bwssr   1/1     Running   0          17m
# Check the event log at the bottom of the HPA's description
# kubectl describe hpa web
Normal SuccessfulRescale 3m25s horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target
That covers automatic scale-out. Now stop the load test and watch the HPA scale back in:
# Notice that once traffic falls off, the HPA does not shrink the pod count immediately:
# it waits about 5 minutes before converging. This deliberate delay is k8s's way of
# avoiding thrashing from rapid pod creation/deletion.
[root@node-3 ~]# kubectl get hpa web -w
NAME   REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   0%/50%    1         3         2          23m
web    Deployment/web   0%/50%    1         3         2          23m
web    Deployment/web   0%/50%    1         3         1          23m
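That roughly-five-minute delay is the HPA's scale-down stabilization window (controller default 300s, set cluster-wide by the kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization). On clusters that serve the autoscaling/v2beta2 API (v1.18+), it can also be tuned per-HPA via spec.behavior — a sketch, not something run on the cluster above:

```shell
# Per-HPA scale-down tuning; requires the autoscaling/v2beta2 API (k8s 1.18+)
cat > web-hpa-behavior.yaml <<'EOF'
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # default is 300s (5 minutes)
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60              # remove at most 1 pod per minute
EOF
# kubectl apply -f web-hpa-behavior.yaml
```

Shortening the window makes scale-in snappier at the cost of more churn; the 300s default is a sensible trade-off for most workloads.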
# Review the full scaling history:
# kubectl describe deployments.apps web
Pod Template:
Labels: app=web
Containers:
nginx:
Image: nginx
Port: <none>
Host Port: <none>
Limits:
cpu: 50m
memory: 20Mi
Requests:
cpu: 50m
memory: 20Mi
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: web-6485bd5c6b (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 27m deployment-controller Scaled up replica set web-96d5df5c8 to 1
Normal ScalingReplicaSet 25m deployment-controller Scaled up replica set web-6485bd5c6b to 1
Normal ScalingReplicaSet 25m deployment-controller Scaled down replica set web-96d5df5c8 to 0
Normal ScalingReplicaSet 10m deployment-controller Scaled up replica set web-6485bd5c6b to 2
Normal ScalingReplicaSet 3m31s deployment-controller Scaled down replica set web-6485bd5c6b to 1