How to fix a Kubernetes API server container that keeps dying
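I have just installed a small Kubernetes test cluster from scratch on 4 Armbian/Odroid_MC1 (Debian 10) nodes. The installation process is this one [1], nothing fancy: add the k8s apt repo and install with apt.
The problem is that the API server keeps dying, roughly every 5 to 10 minutes. The controller-manager and the scheduler die first, apparently at the same moment, and the API server follows. The API is then unavailable for about a minute. All three services restart, everything works for the next four to nine minutes, and the cycle repeats. The logs are here [2]. This is an excerpt: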
$ kubectl get pods -o wide --all-namespaces
The connection to the server 192.168.1.91:6443 was refused - did you specify the right host or port?
(a minute later)
$ kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-74ff55c5b-8pm9r 1/1 Running 2 88m 10.244.0.7 mc1 <none> <none>
kube-system coredns-74ff55c5b-pxdqz 1/1 Running 2 88m 10.244.0.6 mc1 <none> <none>
kube-system etcd-mc1 1/1 Running 2 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-apiserver-mc1 0/1 Running 12 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-controller-manager-mc1 1/1 Running 5 31m 192.168.1.91 mc1 <none> <none>
kube-system kube-flannel-ds-fxg2s 1/1 Running 5 45m 192.168.1.94 mc4 <none> <none>
kube-system kube-flannel-ds-jvvmp 1/1 Running 5 48m 192.168.1.92 mc2 <none> <none>
kube-system kube-flannel-ds-qlvbc 1/1 Running 6 45m 192.168.1.93 mc3 <none> <none>
kube-system kube-flannel-ds-ssb9t 1/1 Running 3 77m 192.168.1.91 mc1 <none> <none>
kube-system kube-proxy-7t9ff 1/1 Running 2 45m 192.168.1.93 mc3 <none> <none>
kube-system kube-proxy-8jhc7 1/1 Running 2 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-proxy-cg75m 1/1 Running 2 45m 192.168.1.94 mc4 <none> <none>
kube-system kube-proxy-mq8j7 1/1 Running 2 48m 192.168.1.92 mc2 <none> <none>
kube-system kube-scheduler-mc1 1/1 Running 5 31m 192.168.1.91 mc1 <none> <none>
$ docker ps -a # (check the exited and restarted services)
CONTAINER ID NAMES STATUS IMAGE NETWORKS PORTS
0e179c6495db k8s_kube-apiserver_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_13 Up About a minute 66eaad223e2c
2ccb014beb73 k8s_kube-scheduler_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_6 Up 3 minutes 21e17680ca2d
3322f6ec1546 k8s_kube-controller-manager_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_6 Up 3 minutes a1ab72ce4ba2
583129da455f k8s_kube-apiserver_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_12 Exited (137) About a minute ago 66eaad223e2c
72268d8e1503 k8s_install-cni_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_0 Exited (0) 5 minutes ago 263b01b3ca1f
fe013d07f186 k8s_kube-controller-manager_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_5 Exited (255) 3 minutes ago a1ab72ce4ba2
34ef8757b63d k8s_kube-scheduler_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_5 Exited (255) 3 minutes ago 21e17680ca2d
fd8e0c0ba27f k8s_coredns_coredns-74ff55c5b-8pm9r_kube-system_3b813dc9-827d-4cf6-88cc-027491b350f1_2 Up 32 minutes 15c1a66b013b
f44e2c45ed87 k8s_coredns_coredns-74ff55c5b-pxdqz_kube-system_c3b7fbf2-2064-4f3f-b1b2-dec5dad904b7_2 Up 32 minutes 15c1a66b013b
04fa4eca1240 k8s_POD_coredns-74ff55c5b-8pm9r_kube-system_3b813dc9-827d-4cf6-88cc-027491b350f1_42 Up 32 minutes k8s.gcr.io/pause:3.2 none
f00c36d6de75 k8s_POD_coredns-74ff55c5b-pxdqz_kube-system_c3b7fbf2-2064-4f3f-b1b2-dec5dad904b7_42 Up 32 minutes k8s.gcr.io/pause:3.2 none
a1d6814e1b04 k8s_kube-flannel_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_3 Up 32 minutes 263b01b3ca1f
94b231456ed7 k8s_kube-proxy_kube-proxy-8jhc7_kube-system_cc637e27-3b14-41bd-9f04-c1779e500a3a_2 Up 33 minutes 377de0f45e5c
df91856450bd k8s_POD_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_2 Up 34 minutes k8s.gcr.io/pause:3.2 host
b480b844671a k8s_POD_kube-proxy-8jhc7_kube-system_cc637e27-3b14-41bd-9f04-c1779e500a3a_2 Up 34 minutes k8s.gcr.io/pause:3.2 host
1d4a7bcaad38 k8s_etcd_etcd-mc1_kube-system_14b7b6d6446e21cc57f0b40571ae3958_2 Up 35 minutes 2e91dde7e952
e5d517a9c29d k8s_POD_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_1 Up 35 minutes k8s.gcr.io/pause:3.2 host
3a3da7dbf3ad k8s_POD_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_2 Up 35 minutes k8s.gcr.io/pause:3.2 host
eef29cdebf5f k8s_POD_etcd-mc1_kube-system_14b7b6d6446e21cc57f0b40571ae3958_2 Up 35 minutes k8s.gcr.io/pause:3.2 host
3631d43757bc k8s_POD_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_1 Up 35 minutes k8s.gcr.io/pause:3.2 host
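I do not see anything strange in the logs (I am a k8s beginner). This setup worked until a month ago, when I reinstalled it for practice. This is probably my tenth installation attempt; I have tried different options and versions and googled a lot, but could not find a solution.
What could be the cause? What else can I try? How can I get to the root of the problem?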
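As a starting point for the root-cause question, the exit statuses in the `docker ps -a` listing above can be decoded. The sketch below is not from the original post: `describe_exit` is a hypothetical helper, and the causes it names are common possibilities rather than a diagnosis. It relies on the convention that a status above 128 means "128 + signal number", so 137 is SIGKILL. The follow-up commands in the comments reuse the container ID of the exited API server from the listing.

```shell
#!/bin/sh
# Hypothetical helper: explain a container exit status from `docker ps -a`.
describe_exit() {
    case "$1" in
        0)   echo "exit 0: clean exit" ;;
        137) echo "exit 137: SIGKILL (128+9), killed from outside, for example after a failed liveness probe or by the OOM killer" ;;
        143) echo "exit 143: SIGTERM (128+15), asked to stop" ;;
        255) echo "exit 255: the process exited on its own with an error, check its logs" ;;
        *)   echo "exit $1: other status, check the container logs" ;;
    esac
}

describe_exit 137   # kube-apiserver in the listing
describe_exit 255   # kube-controller-manager and kube-scheduler in the listing

# Possible follow-up checks on the control-plane node (not run by this script):
#   docker logs --tail 50 583129da455f                      # last lines before the kill
#   docker inspect -f '{{.State.OOMKilled}}' 583129da455f   # true if the OOM killer hit it
#   journalctl -u kubelet --since "10 minutes ago"          # kubelet's reason for restarting
```

If the API server turns out to be killed from outside while the other two exit with an error by themselves, the kubelet journal and the last log lines of each exited container are the places to look next.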
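UPDATE 2021/02/06
The problem no longer occurs. Apparently it was tied to one specific version in this particular case. I did not file an issue because I found no clue about what specific problem to report.
The installation procedure was the same in all cases: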
# swapoff -a
# curl -sL get.docker.com|sh
# usermod -aG docker rodolfoap
# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
# apt-get update
# apt-get install -y kubeadm kubectl kubectx # Master
# kubeadm config images pull
# kubeadm init --apiserver-advertise-address=0.0.0.0 --pod-network-cidr=10.244.0.0/16
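- Armbian-20.08.1 worked fine. My installation procedure has not changed since then.
- Armbian-20.11.3 had the issue: the API, scheduler and coredns restarted every 5 minutes, blocking access to the API for about 5 out of every 8 minutes on average.
- Armbian-21.02.1 works fine. It worked on the first installation, same procedure.
All versions were updated to the latest kernel available at installation time; the current one is 5.10.12-odroidxu4.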
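Since the behaviour differed only between OS releases, it can help to record the exact versions on each node before and after a reinstall. The sketch below is not part of the original procedure: `report_versions` is a hypothetical helper, and it assumes that Armbian images ship an `/etc/armbian-release` file with a `VERSION=` line. Tools that are not installed are reported instead of causing a failure.

```shell
#!/bin/sh
# Hypothetical helper: print the versions that differed between the installs.
report_versions() {
    echo "kernel: $(uname -r)"
    # Assumption: Armbian stores its release number here; skipped on other systems.
    if [ -r /etc/armbian-release ]; then
        grep '^VERSION=' /etc/armbian-release
    fi
    for tool in kubeadm kubelet docker; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not installed"
            continue
        fi
        case "$tool" in
            kubeadm) echo "kubeadm: $(kubeadm version -o short)" ;;
            kubelet) echo "kubelet: $(kubelet --version)" ;;
            docker)  echo "docker: $(docker version --format '{{.Server.Version}}' 2>/dev/null)" ;;
        esac
    done
}

report_versions
```

Saving this output next to the logs of a failing install gives something concrete to attach to a bug report.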
As you can see, after more than two hours there are no API restarts:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE LABELS
kube-system coredns-74ff55c5b-gnvf2 1/1 Running 0 173m 10.244.0.2 mc1 k8s-app=kube-dns,pod-template-hash=74ff55c5b
kube-system coredns-74ff55c5b-wvnnz 1/1 Running 0 173m 10.244.0.3 mc1 k8s-app=kube-dns,pod-template-hash=74ff55c5b
kube-system etcd-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=etcd,tier=control-plane
kube-system kube-apiserver-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-apiserver,tier=control-plane
kube-system kube-controller-manager-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-controller-manager,tier=control-plane
kube-system kube-flannel-ds-c4jgv 1/1 Running 0 123m 192.168.1.93 mc3 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-flannel-ds-cl6n5 1/1 Running 0 75m 192.168.1.94 mc4 app=flannel,tier=node
kube-system kube-flannel-ds-z2nmw 1/1 Running 0 75m 192.168.1.92 mc2 app=flannel,tier=node
kube-system kube-flannel-ds-zqxh7 1/1 Running 0 150m 192.168.1.91 mc1 app=flannel,tier=node
kube-system kube-proxy-bd596 1/1 Running 0 75m 192.168.1.94 mc4 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-n6djp 1/1 Running 0 75m 192.168.1.92 mc2 controller-revision-hash=b89db7f56,pod-template-generation=1
kube-system kube-proxy-rf4cr 1/1 Running 0 173m 192.168.1.91 mc1 controller-revision-hash=b89db7f56,pod-template-generation=1
kube-system kube-proxy-xhl95 1/1 Running 0 123m 192.168.1.93 mc3 controller-revision-hash=b89db7f56,pod-template-generation=1
kube-system kube-scheduler-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-scheduler,tier=control-plane
The cluster is fully functional :)
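To repeat the "no restarts" check without reading the whole table, the RESTARTS column can be filtered. This is a minimal sketch, not something from the original post: `restarted_pods` is a hypothetical helper, and it assumes the column layout shown in the listings above, where RESTARTS is the fifth field of `kubectl get pods --all-namespaces`.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on stdin
# and print the name and restart count of every pod that has restarted.
restarted_pods() {
    # NR > 1 skips the header row; $5 is the RESTARTS column, $2 the pod NAME.
    awk 'NR > 1 && $5 + 0 > 0 { print $2, $5 }'
}

# Usage on the control-plane node (no output means nothing has restarted):
#   kubectl get pods --all-namespaces | restarted_pods
```

Run against the healthy listing above it would print nothing, while the earlier listing would report every pod, with kube-apiserver-mc1 at 12 restarts.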