How to resolve an Azure AKS scale-out failure
I have an AKS cluster with 3 nodes and tried to manually scale it from 3 to 4 nodes. The scale-out itself succeeded, but about 20 minutes later all 4 nodes went into NotReady status and none of the kube-system services were Ready.
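For reference, a manual scale-out like the one described here is typically done with the Azure CLI; the resource group and cluster names below are placeholders, not values from the original post:

az aks scale \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 4

Shortly after the scale-out, all four nodes reported Ready: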
NAME                                STATUS   ROLES   AGE   VERSION
aks-agentpool-40760006-vmss000000   Ready    agent   16m   v1.18.14
aks-agentpool-40760006-vmss000001   Ready    agent   17m   v1.18.14
aks-agentpool-40760006-vmss000002   Ready    agent   16m   v1.18.14
aks-agentpool-40760006-vmss000003   Ready    agent   11m   v1.18.14
About 20 minutes later the nodes flipped to NotReady (note that vmss000001 no longer appears in the list):
NAME                                STATUS     ROLES   AGE   VERSION
aks-agentpool-40760006-vmss000000   NotReady   agent   23m   v1.18.14
aks-agentpool-40760006-vmss000002   NotReady   agent   24m   v1.18.14
aks-agentpool-40760006-vmss000003   NotReady   agent   19m   v1.18.14
k get po -n kube-system
NAME                                  READY   STATUS        RESTARTS   AGE
coredns-748cdb7bf4-7frq2              0/1     Pending       0          10m
coredns-748cdb7bf4-vg5nn              0/1     Pending       0          10m
coredns-748cdb7bf4-wrhxs              1/1     Terminating   0          28m
coredns-autoscaler-868b684fd4-2gb8f   0/1     Pending       0          10m
kube-proxy-p6wmv                      1/1     Running       0          28m
kube-proxy-sksz6                      1/1     Running       0          23m
kube-proxy-vpb2g                      1/1     Running       0          28m
metrics-server-58fdc875d5-sbckj       0/1     Pending       0          10m
tunnelfront-5d74798f6b-w6rvn          0/1     Pending       0          10m
The node events show:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 25m kubelet Starting kubelet.
Normal NodeHasSufficientMemory 25m (x2 over 25m) kubelet Node aks-agentpool-40760006-vmss000000 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 25m (x2 over 25m) kubelet Node aks-agentpool-40760006-vmss000000 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 25m (x2 over 25m) kubelet Node aks-agentpool-40760006-vmss000000 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 25m kubelet Updated Node Allocatable limit across pods
Normal Starting 25m kube-proxy Starting kube-proxy.
Normal NodeReady 24m kubelet Node aks-agentpool-40760006-vmss000000 status is now: NodeReady
Warning FailedToCreateRoute 5m5s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 50.264754ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m55s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 45.945658ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m45s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 46.180158ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m35s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 46.550858ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m25s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 44.74355ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m15s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 42.428456ms: timed out waiting for the condition
Warning FailedToCreateRoute 4m5s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 41.664858ms: timed out waiting for the condition
Warning FailedToCreateRoute 3m55s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 48.456954ms: timed out waiting for the condition
Warning FailedToCreateRoute 3m45s route_controller Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 38.611964ms: timed out waiting for the condition
Warning FailedToCreateRoute 65s (x16 over 3m35s) route_controller (combined from similar events): Could not create route e496c1aa-be11-412b-b820-178d83b42f29 10.244.2.0/24 for node aks-agentpool-40760006-vmss000000 after 13.972487ms: timed out waiting for the condition
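The repeated FailedToCreateRoute warnings mean the route controller could not write the pod CIDR routes to the cluster's route table. On a kubenet-based AKS cluster this usually points to the cluster's identity lacking permissions on the route table in the node resource group. A quick way to inspect this, assuming placeholder names (subscription ID, resource group, and region in the scope below are not values from the original post):

# Show which identity the cluster runs under (managed identity or service principal)
az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --query identity

# List role assignments on the node resource group to verify the identity
# has Network Contributor (or an equivalent role) covering the route table
az role assignment list \
  --scope /subscriptions/<subscription-id>/resourceGroups/MC_myResourceGroup_myAKSCluster_eastus \
  --output table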
Solution
You can use the cluster autoscaler option to avoid this kind of situation in the future.

To keep up with application demand in Azure Kubernetes Service (AKS), you may need to adjust the number of nodes that run your workloads. The cluster autoscaler component watches for pods in the cluster that can't be scheduled because of resource constraints. When it detects a problem, it increases the number of nodes in the node pool to meet the application demand. It also regularly checks for nodes that are no longer needed to run pods and scales down the number of nodes as needed. This ability to automatically scale the number of nodes in your AKS cluster up or down lets you run an efficient, cost-effective cluster.

You can Update an existing AKS cluster to enable the cluster autoscaler, using your current resource group:
az aks update \
--resource-group myResourceGroup \
--name myAKSCluster \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 3
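If you later need to adjust the range, the same command can update the limits on a cluster where the autoscaler is already enabled; the counts below are just example values:

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --update-cluster-autoscaler \
  --min-count 1 \
  --max-count 5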
It looks fine now. It turned out I did not have the rights needed to scale the nodes up.