Cluster information:
Kubernetes version: v1.28.2
Cloud being used: Virtualbox
Installation method: Kubernetes Cluster VirtualBox
Host OS: Ubuntu 22.04.3 LTS
CNI and version: calico
CRI and version: containerd://1.7.2
The cluster consists of one master node and two worker nodes. It looks healthy for a short while (1-2 minutes after startup).
lab@master:~$ kubectl -nkube-system get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-7ddc4f45bc-4qx7l 1/1 Running 12 (2m11s ago) 13d 10.10.219.98 master <none> <none>
calico-node-bqlnm 1/1 Running 3 (2m11s ago) 4d2h 192.168.1.164 master <none> <none>
calico-node-mrd86 1/1 Running 105 (2d20h ago) 4d2h 192.168.1.165 worker01 <none> <none>
calico-node-r6w9s 1/1 Running 110 (2d20h ago) 4d2h 192.168.1.166 worker02 <none> <none>
coredns-5dd5756b68-njtpf 1/1 Running 11 (2m11s ago) 13d 10.10.219.100 master <none> <none>
coredns-5dd5756b68-pxn8l 1/1 Running 10 (2m11s ago) 13d 10.10.219.99 master <none> <none>
etcd-master 1/1 Running 67 (2m11s ago) 13d 192.168.1.164 master <none> <none>
kube-apiserver-master 1/1 Running 43 (2m11s ago) 13d 192.168.1.164 master <none> <none>
kube-controller-manager-master 1/1 Running 47 (2m11s ago) 13d 192.168.1.164 master <none> <none>
kube-proxy-ffnzb 1/1 Running 122 (95s ago) 12d 192.168.1.165 worker01 <none> <none>
kube-proxy-hf4mx 1/1 Running 108 (78s ago) 12d 192.168.1.166 worker02 <none> <none>
kube-proxy-ql576 1/1 Running 15 (2m11s ago) 13d 192.168.1.164 master <none> <none>
kube-scheduler-master 1/1 Running 46 (2m11s ago) 13d 192.168.1.164 master <none> <none>
metrics-server-54cb77cffd-q292x 0/1 CrashLoopBackOff 68 (18s ago) 3d21h 10.10.30.94 worker02 <none> <none>
However, after a few minutes, pods in the kube-system namespace start thrashing/crashing.
lab@master:~$ kubectl -nkube-system get po
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7ddc4f45bc-4qx7l 1/1 Running 12 (19m ago) 13d
calico-node-bqlnm 0/1 Running 3 (19m ago) 4d2h
calico-node-mrd86 0/1 CrashLoopBackOff 111 (2m28s ago) 4d2h
calico-node-r6w9s 0/1 CrashLoopBackOff 116 (2m15s ago) 4d2h
coredns-5dd5756b68-njtpf 1/1 Running 11 (19m ago) 13d
coredns-5dd5756b68-pxn8l 1/1 Running 10 (19m ago) 13d
etcd-master 1/1 Running 67 (19m ago) 13d
kube-apiserver-master 1/1 Running 43 (19m ago) 13d
kube-controller-manager-master 1/1 Running 47 (19m ago) 13d
kube-proxy-ffnzb 0/1 CrashLoopBackOff 127 (42s ago) 12d
kube-proxy-hf4mx 0/1 CrashLoopBackOff 113 (2m17s ago) 12d
kube-proxy-ql576 1/1 Running 15 (19m ago) 13d
kube-scheduler-master 1/1 Running 46 (19m ago) 13d
metrics-server-54cb77cffd-q292x 0/1 CrashLoopBackOff 73 (64s ago) 3d22h
Describing the pod shows repeated events. I have no idea what is going wrong.
lab@master:~$ kubectl -nkube-system describe po kube-proxy-ffnzb
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 2d20h (x50 over 3d1h) kubelet Stopping container kube-proxy
Warning BackOff 2d20h (x1146 over 3d1h) kubelet Back-off restarting failed container kube-proxy in pod kube-proxy-ffnzb_kube-system(79f808ba-f450-4103-80a9-0e75af2e77cf)
Normal Pulled 8m11s (x3 over 10m) kubelet Container image "registry.k8s.io/kube-proxy:v1.28.6" already present on machine
Normal Created 8m10s (x3 over 10m) kubelet Created container kube-proxy
Normal Started 8m10s (x3 over 10m) kubelet Started container kube-proxy
Normal SandboxChanged 6m56s (x4 over 10m) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Killing 4m41s (x4 over 10m) kubelet Stopping container kube-proxy
Warning BackOff 12s (x28 over 10m) kubelet Back-off restarting failed container kube-proxy in pod kube-proxy-ffnzb_kube-system(79f808ba-f450-4103-80a9-0e75af2e77cf)
Note! This situation does not prevent me from deploying some sample workloads (nginx); those seem to run stably. However, when I tried to add the metrics server, it crashed (probably related to the CrashLoopBackOff pods in the kube-system namespace).
Any ideas what could be wrong, or where I should look to troubleshoot this?
Answer 1
I was advised to check the SystemdCgroup setting in the containerd configuration file, /etc/containerd/config.toml, as described in this link. In my case it turned out to be missing on the master node.
- Generate the default config:
sudo containerd config default | sudo tee /etc/containerd/config.toml
- Set SystemdCgroup = true in /etc/containerd/config.toml
- Restart the containerd service: systemctl restart containerd
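For reference, in the default config generated by the command above, the flag lives under the runc runtime options. The section path below is taken from containerd 1.7 defaults; verify it against your own generated file:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```

This makes containerd use the systemd cgroup driver, which must match the cgroup driver the kubelet is configured with (kubeadm defaults to systemd).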
However, this left my cluster in the following state:
lab@master:~$ kubectl -nkube-system get po
The connection to the server master:6443 was refused - did you specify the right host or port?
lab@master:~$ kubectl get nodes
The connection to the server master:6443 was refused - did you specify the right host or port?
So I restored the setting to false on the master and restarted containerd, but left it set to true on the worker nodes.
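The flip-back is a one-line sed edit. The sketch below runs on a temporary stand-in file rather than the real /etc/containerd/config.toml, so it is safe to try as-is; on the master you would run the same sed against the real path with sudo, then restart containerd:

```shell
# Safe demonstration of the flip on a temp copy
# (on the real node: sudo sed -i ... /etc/containerd/config.toml)
cfg=$(mktemp)
printf '  SystemdCgroup = true\n' > "$cfg"   # stand-in line for the real config
sed -i 's/SystemdCgroup = true/SystemdCgroup = false/' "$cfg"
grep 'SystemdCgroup' "$cfg"                  # prints "  SystemdCgroup = false"
rm -f "$cfg"
```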
That resolved the issue:
lab@master:~$ kubectl -nkube-system get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-7ddc4f45bc-4qx7l 1/1 Running 8 (18m ago) 14d 10.10.219.86 master <none> <none>
calico-node-c4rxp 1/1 Running 7 (14m ago) 89m 192.168.1.166 worker02 <none> <none>
calico-node-dhzr8 1/1 Running 7 (18m ago) 14d 192.168.1.164 master <none> <none>
calico-node-wqv8w 1/1 Running 1 (14m ago) 27m 192.168.1.165 worker01 <none> <none>
coredns-5dd5756b68-njtpf 1/1 Running 7 (18m ago) 14d 10.10.219.88 master <none> <none>
coredns-5dd5756b68-pxn8l 1/1 Running 6 (18m ago) 14d 10.10.219.87 master <none> <none>
etcd-master 1/1 Running 62 (18m ago) 14d 192.168.1.164 master <none> <none>
kube-apiserver-master 1/1 Running 38 (18m ago) 14d 192.168.1.164 master <none> <none>
kube-controller-manager-master 1/1 Running 42 (18m ago) 14d 192.168.1.164 master <none> <none>
kube-proxy-mgsdr 1/1 Running 7 (14m ago) 89m 192.168.1.166 worker02 <none> <none>
kube-proxy-ql576 1/1 Running 10 (18m ago) 14d 192.168.1.164 master <none> <none>
kube-proxy-zl68t 1/1 Running 8 (14m ago) 106m 192.168.1.165 worker01 <none> <none>
kube-scheduler-master 1/1 Running 41 (18m ago) 14d 192.168.1.164 master <none> <none>
metrics-server-98bc7f888-xtdxd 1/1 Running 7 (14m ago) 99m 10.10.5.8 worker01 <none> <none>
Side note: I also disabled apparmor (on both the master and the workers):
sudo systemctl stop apparmor && sudo systemctl disable apparmor