1. HPA overview

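HPA stands for Horizontal Pod Autoscaler. It scales the number of pod replicas out and in based on the current resource utilization of the pods (CPU and memory, or custom metrics), which relieves the pressure on each pod. When pod load reaches a threshold, HPA creates more pods according to the scaling policy to share the load. When the pods have been idle for a stable period, it automatically reduces the replica count again.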

Official documentation
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
https://github.com/kubernetes-sigs/metrics-server

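2. How HPA works

Kubernetes HPA is very similar to Alibaba Cloud's Auto Scaling.

With Auto Scaling, you define scaling rules based on business needs and policies. ECS instances are added automatically when demand grows, to guarantee compute capacity, and removed automatically when demand drops, to save cost. Auto Scaling suits applications with fluctuating traffic as well as applications with stable traffic.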

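Scaling targets:

HPA can automatically scale a ReplicationController, Deployment, ReplicaSet, or StatefulSet. It cannot scale a DaemonSet, because a DaemonSet runs exactly one replica per node and therefore has no replica count to adjust.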
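The scaling decision itself follows the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces only that formula plus the min/max bounds, using integer shell arithmetic. The function names desired_replicas and clamp are made up for illustration, and the real controller additionally applies a tolerance and a stabilization window that are left out here.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
# Computes ceil(CURRENT_REPLICAS * CURRENT_METRIC / TARGET_METRIC).
# All arguments are non-negative integers, e.g. CPU utilization in percent.
desired_replicas() {
    current_replicas=$1
    current_metric=$2
    target_metric=$3
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (current_replicas * current_metric + target_metric - 1) / target_metric ))
}

# clamp VALUE MIN MAX
# Keeps the result inside the HPA's minReplicas/maxReplicas bounds.
clamp() {
    value=$1
    min=$2
    max=$3
    if [ "$value" -lt "$min" ]; then value=$min; fi
    if [ "$value" -gt "$max" ]; then value=$max; fi
    echo "$value"
}

# 2 replicas at 90% utilization with a 50% target: ceil(3.6)
desired_replicas 2 90 50                    # prints 4
# 4 replicas at 200% with a 50% target would need 16, capped at max 10
clamp "$(desired_replicas 4 200 50)" 1 10   # prints 10
```

Because the ratio is rounded up, even a small overshoot of the target adds at least one replica once the controller decides to act.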

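3. HPA example

3.1 Installing Metrics Server

HPA reads the CPU and memory metrics of pods from Metrics Server to decide whether the scaling threshold has been reached.

If Metrics Server is not installed yet, see: https://www.cnblogs.com/wuxinchun/p/15273213.html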
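Before creating an HPA it can help to confirm that the metrics API is actually being served. The following is a minimal sketch, assuming Metrics Server registers the usual v1beta1.metrics.k8s.io APIService; the function name metrics_api_available is hypothetical.

```shell
# metrics_api_available
# Succeeds (exit status 0) when the metrics.k8s.io APIService reports
# the condition Available=True, and fails otherwise.
metrics_api_available() {
    status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
        -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
    [ "$status" = "True" ]
}

# Usage against a real cluster (not executed here):
#   metrics_api_available && kubectl top nodes
```

If the check fails, kubectl top and the HPA will both be unable to read metrics, so it is worth fixing the Metrics Server installation first.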

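3.2 Creating the Deployment

1) YAML file

To test HPA, this example uses php-apache, a simple web service. Later steps send requests to its pod to simulate load going up and down, so that the change in the number of pods can be observed.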

[root@k8s-master metrics-server]# pwd
/root/k8s_practice/metrics-server
[root@k8s-master metrics-server]# cat php-apache.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
        - name: php-apache
          image: "registry.cn-shenzhen.aliyuncs.com/cookcodeblog/hpa-example:latest"
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 500m
            # HPA computes CPU utilization relative to this request
            requests:
              cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
    - port: 80
  selector:
    run: php-apache

2) Create php-apache and verify

[root@k8s-master metrics-server]# kubectl apply -f php-apache.yaml
[root@k8s-master metrics-server]# kubectl get deploy,svc php-apache
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/php-apache   1/1     1            1           2m26s

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/php-apache   ClusterIP   10.100.133.191   <none>        80/TCP    2m26s
[root@k8s-master metrics-server]# kubectl get pod -l run=php-apache -o wide
NAME                          READY   STATUS    RESTARTS   AGE    IP            NODE        NOMINATED NODE   READINESS GATES
php-apache-5b58575b9d-bc56q   1/1     Running   0          3m6s   10.244.1.38   k8s-node1   <none>           <none>

3.3 Creating the HPA

1) Create the HPA. By default, the HPA gets the same name as the object being autoscaled.

# Use --name to give the HPA a different name
[root@k8s-master metrics-server]# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled

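Note: this creates an HPA for the Deployment php-apache with a minimum of 1 replica and a maximum of 10. The HPA scales the Deployment so that the average CPU utilization across all of its pods stays at or below roughly 50%.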
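The same HPA can also be written declaratively. The sketch below writes a manifest that should be roughly equivalent to the kubectl autoscale command, assuming a cluster that serves the autoscaling/v2 API (Kubernetes 1.23 or later). The file name php-apache-hpa.yaml is an arbitrary choice for illustration.

```shell
# Write a declarative HPA manifest for the php-apache Deployment.
cat > php-apache-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # same meaning as --cpu-percent=50
EOF

# Apply it against a cluster (not executed here):
#   kubectl apply -f php-apache-hpa.yaml
```

Keeping the HPA in a file makes it easier to version alongside the Deployment than a one-off kubectl autoscale call.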

In this example each pod sets resources.requests.cpu to 200m (200 millicores), so the HPA tries to keep the average CPU usage per pod at or below 100m, which is 50% of 200m.
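The relationship between the CPU request, the target percentage, and the utilization figure can be checked with a few lines of integer arithmetic. This is only an illustrative sketch: the function names are made up, and the 150m usage value is a hypothetical reading, not a measurement from this cluster.

```shell
# target_millicores REQUEST_MILLICORES TARGET_PERCENT
# Average CPU usage per pod (in millicores) that corresponds to the target.
target_millicores() {
    echo $(( $1 * $2 / 100 ))
}

# utilization_percent USAGE_MILLICORES REQUEST_MILLICORES
# Utilization as the HPA expresses it: usage relative to the request.
utilization_percent() {
    echo $(( $1 * 100 / $2 ))
}

target_millicores 200 50       # prints 100
utilization_percent 150 200    # prints 75 (hypothetical 150m usage)
```

Note that utilization is measured against requests, not limits, so a pod using 300m with a 200m request reports 150%.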

2) Check the HPA status with kubectl get hpa (per-pod CPU usage can be viewed with kubectl top pods).

[root@k8s-master metrics-server]# kubectl get hpa php-apache
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          17s

Note: the TARGETS column has the format <actual>/<expected>. If the actual value stays at <unknown>, the HPA cannot obtain metrics from Metrics Server; see section 3.1, Installing Metrics Server.

By default, the HPA controller evaluates metrics every 15 seconds to decide whether to scale:
The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag (with a default value of 15 seconds).

Metrics Server scrapes metrics every 60 seconds by default:
Default 60 seconds, can be changed using metric-resolution flag. We are not recommending setting values below 15s, as this is the resolution of metrics calculated within Kubelet.

3.4 Simulating increased load

1) Generate load

Open a new terminal and create a temporary pod named load-generator. Inside that pod, send requests to the php-apache service in an endless loop to raise the load (CPU utilization) on php-apache.

[root@k8s-master metrics-server]# kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
If you don't see a command prompt, try pressing enter.
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!O

Note: if the service cannot be reached at http://php-apache, check that kube-dns and the cluster network are configured correctly; see: https://cookcode.blog.csdn.net/article/details/109424100

2) After a few minutes of load, check the HPA:

[root@k8s-master metrics-server]# kubectl get hpa php-apache
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   75%/50%   1         10        5          10m

The actual value in TARGETS (CPU utilization) has risen to 75%, above the expected 50%, and REPLICAS has been scaled out automatically from 1 to 5.

Note: if the worker nodes of the Kubernetes cluster run short of CPU, the scale-out fails and the newly created pods stay in the Pending state. Running kubectl describe on such a pod shows an "Insufficient cpu" message.
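To spot this situation quickly, the Pending pods and the related scheduling events can be listed directly. The helper names below are hypothetical, and the sketch assumes the run=php-apache label from the Deployment and that everything runs in the current namespace.

```shell
# pending_pods
# Prints the names of php-apache pods that are still in the Pending phase.
pending_pods() {
    kubectl get pods -l run=php-apache \
        --field-selector=status.phase=Pending -o name
}

# scheduling_failures
# Prints FailedScheduling events in the current namespace; messages such
# as "Insufficient cpu" show up in these events.
scheduling_failures() {
    kubectl get events --field-selector=reason=FailedScheduling
}

# Usage against a real cluster (not executed here):
#   pending_pods
#   scheduling_failures
```

If pods remain Pending, either add node capacity or lower the CPU request or the HPA's maximum replica count.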

3) Check the Deployment and pods:

[root@k8s-master metrics-server]# kubectl get deployment php-apache
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   8/8     8            8           16m
[root@k8s-master metrics-server]# kubectl get pods -l run=php-apache -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
php-apache-5b58575b9d-5kxzq   1/1     Running   0          2m45s   10.244.2.40   k8s-node2   <none>           <none>
php-apache-5b58575b9d-6r4hn   1/1     Running   0          4m46s   10.244.2.39   k8s-node2   <none>           <none>
php-apache-5b58575b9d-bc56q   1/1     Running   0          17m     10.244.1.38   k8s-node1   <none>           <none>
php-apache-5b58575b9d-bf2lk   1/1     Running   0          4m31s   10.244.1.40   k8s-node1   <none>           <none>
php-apache-5b58575b9d-gs4gl   1/1     Running   0          2m45s   10.244.1.41   k8s-node1   <none>           <none>
php-apache-5b58575b9d-jrd72   1/1     Running   0          4m46s   10.244.2.38   k8s-node2   <none>           <none>
php-apache-5b58575b9d-nhb8q   1/1     Running   0          4m46s   10.244.1.39   k8s-node1   <none>           <none>
php-apache-5b58575b9d-rm8dm   1/1     Running   0          2m45s   10.244.2.41   k8s-node2   <none>           <none>

4) The CPU utilization observed by the HPA has dropped to 46%, below the 50% target:

[root@k8s-master metrics-server]# kubectl get hpa php-apache
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   46%/50%   1         10        8          13m

Note: by scaling out to 8 replicas, the HPA spread the load so that the average CPU utilization across all pods stays within the target.
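The numbers in the outputs above are consistent with the documented formula desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). The following worked calculation is a sketch under the simplifying assumption that the total load stayed constant while the HPA went from 5 to 8 replicas; the variable names are illustrative.

```shell
# Values taken from the kubectl get hpa output at 5 replicas.
current_replicas=5
current_util=75
target_util=50

# ceil(5 * 75 / 50) = ceil(7.5), using integer ceiling division
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
echo "$desired"          # prints 8

# If the same total load is spread over the new replica count, the
# average utilization becomes 375 / 8 = 46.875, truncated to 46 here.
expected_util=$(( current_replicas * current_util / desired ))
echo "$expected_util"    # prints 46
```

Under that assumption the estimate lines up with the 8 replicas and the 46%/50% reading reported by the HPA.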

The autoscaling events can be viewed with kubectl describe hpa php-apache.

3.5 Simulating decreased load

In the terminal running load-generator, press Ctrl + C to stop the process.

Wait a few minutes, then check the HPA:

[root@k8s-master metrics-server]# kubectl get hpa php-apache
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          22m

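Note: to avoid interrupting service during scale-in and to prevent thrashing caused by frequent scaling, Kubernetes waits for a period after one scale-down before it scales down again. This is called the scaling cooldown, and it defaults to 5 minutes.

kubectl describe hpa php-apache shows the autoscaling events, including "horizontal-pod-autoscaler New size: 1; reason: All metrics below target".

If the HPA has not scaled down yet, wait a little longer.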

The cooldown is set by a kube-controller-manager flag:
--horizontal-pod-autoscaler-downscale-stabilization: The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).

See: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-cooldown-delay
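Besides the cluster-wide flag, the autoscaling/v2 API lets a single HPA set its own scale-down window through spec.behavior.scaleDown.stabilizationWindowSeconds. The sketch below writes a merge patch for that field. The 120-second value and the file name hpa-scaledown-patch.yaml are illustrative assumptions, not recommendations.

```shell
# Write a merge patch that shortens the scale-down stabilization window
# for one HPA only (the cluster-wide default stays untouched).
cat > hpa-scaledown-patch.yaml <<'EOF'
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120   # illustrative value
EOF

# Apply it to the php-apache HPA (not executed here):
#   kubectl patch hpa php-apache --type merge --patch-file hpa-scaledown-patch.yaml
```

A shorter window makes scale-in react faster after load drops, at the cost of more replica churn when traffic fluctuates.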

Similarly, see the cooldown time setting of Alibaba Cloud Auto Scaling.
