CrashLoopBackOff for pods in the kube-system namespace

Cluster information:

Kubernetes version: v1.28.2
Cloud being used: Virtualbox
Installation method: Kubernetes Cluster VirtualBox
Host OS: Ubuntu 22.04.3 LTS
CNI and version: calico
CRI and version: containerd://1.7.2

The cluster consists of one master node and two worker nodes. It looks healthy for a short while (1-2 minutes after startup).

lab@master:~$ kubectl -nkube-system get po -o wide
NAME                                       READY   STATUS             RESTARTS          AGE     IP              NODE       NOMINATED NODE   READINESS GATES
calico-kube-controllers-7ddc4f45bc-4qx7l   1/1     Running            12 (2m11s ago)    13d     10.10.219.98    master     <none>           <none>
calico-node-bqlnm                          1/1     Running            3 (2m11s ago)     4d2h    192.168.1.164   master     <none>           <none>
calico-node-mrd86                          1/1     Running            105 (2d20h ago)   4d2h    192.168.1.165   worker01   <none>           <none>
calico-node-r6w9s                          1/1     Running            110 (2d20h ago)   4d2h    192.168.1.166   worker02   <none>           <none>
coredns-5dd5756b68-njtpf                   1/1     Running            11 (2m11s ago)    13d     10.10.219.100   master     <none>           <none>
coredns-5dd5756b68-pxn8l                   1/1     Running            10 (2m11s ago)    13d     10.10.219.99    master     <none>           <none>
etcd-master                                1/1     Running            67 (2m11s ago)    13d     192.168.1.164   master     <none>           <none>
kube-apiserver-master                      1/1     Running            43 (2m11s ago)    13d     192.168.1.164   master     <none>           <none>
kube-controller-manager-master             1/1     Running            47 (2m11s ago)    13d     192.168.1.164   master     <none>           <none>
kube-proxy-ffnzb                           1/1     Running            122 (95s ago)     12d     192.168.1.165   worker01   <none>           <none>
kube-proxy-hf4mx                           1/1     Running            108 (78s ago)     12d     192.168.1.166   worker02   <none>           <none>
kube-proxy-ql576                           1/1     Running            15 (2m11s ago)    13d     192.168.1.164   master     <none>           <none>
kube-scheduler-master                      1/1     Running            46 (2m11s ago)    13d     192.168.1.164   master     <none>           <none>
metrics-server-54cb77cffd-q292x            0/1     CrashLoopBackOff   68 (18s ago)      3d21h   10.10.30.94     worker02   <none>           <none>

However, after a few minutes the pods in the kube-system namespace start thrashing/crashing:

lab@master:~$ kubectl -nkube-system get po
NAME                                       READY   STATUS             RESTARTS          AGE
calico-kube-controllers-7ddc4f45bc-4qx7l   1/1     Running            12 (19m ago)      13d
calico-node-bqlnm                          0/1     Running            3 (19m ago)       4d2h
calico-node-mrd86                          0/1     CrashLoopBackOff   111 (2m28s ago)   4d2h
calico-node-r6w9s                          0/1     CrashLoopBackOff   116 (2m15s ago)   4d2h
coredns-5dd5756b68-njtpf                   1/1     Running            11 (19m ago)      13d
coredns-5dd5756b68-pxn8l                   1/1     Running            10 (19m ago)      13d
etcd-master                                1/1     Running            67 (19m ago)      13d
kube-apiserver-master                      1/1     Running            43 (19m ago)      13d
kube-controller-manager-master             1/1     Running            47 (19m ago)      13d
kube-proxy-ffnzb                           0/1     CrashLoopBackOff   127 (42s ago)     12d
kube-proxy-hf4mx                           0/1     CrashLoopBackOff   113 (2m17s ago)   12d
kube-proxy-ql576                           1/1     Running            15 (19m ago)      13d
kube-scheduler-master                      1/1     Running            46 (19m ago)      13d
metrics-server-54cb77cffd-q292x            0/1     CrashLoopBackOff   73 (64s ago)      3d22h

Describing the pods shows repeated events. I have no idea what is going wrong.

lab@master:~$ kubectl -nkube-system describe po kube-proxy-ffnzb
.
.
.
Events:
  Type     Reason          Age                      From     Message
  ----     ------          ----                     ----     -------
  Normal   Killing         2d20h (x50 over 3d1h)    kubelet  Stopping container kube-proxy
  Warning  BackOff         2d20h (x1146 over 3d1h)  kubelet  Back-off restarting failed container kube-proxy in pod kube-proxy-ffnzb_kube-system(79f808ba-f450-4103-80a9-0e75af2e77cf)
  Normal   Pulled          8m11s (x3 over 10m)      kubelet  Container image "registry.k8s.io/kube-proxy:v1.28.6" already present on machine
  Normal   Created         8m10s (x3 over 10m)      kubelet  Created container kube-proxy
  Normal   Started         8m10s (x3 over 10m)      kubelet  Started container kube-proxy
  Normal   SandboxChanged  6m56s (x4 over 10m)      kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Killing         4m41s (x4 over 10m)      kubelet  Stopping container kube-proxy
  Warning  BackOff         12s (x28 over 10m)       kubelet  Back-off restarting failed container kube-proxy in pod kube-proxy-ffnzb_kube-system(79f808ba-f450-4103-80a9-0e75af2e77cf)
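Beyond the events above, a few standard places usually reveal the actual crash reason. A sketch of those checks (pod name taken from the listings above; paths assume a kubeadm install with containerd):

```shell
# Logs of the previous (crashed) container instance usually hold the real error
kubectl -n kube-system logs kube-proxy-ffnzb --previous

# Exit codes and full event history for the pod
kubectl -n kube-system describe pod kube-proxy-ffnzb

# On the affected node: kubelet and container runtime logs around the restarts
journalctl -u kubelet --since "15 minutes ago" | grep -iE 'error|fail'
journalctl -u containerd --since "15 minutes ago" | grep -iE 'error|fail'

# Containers as seen by the runtime itself, including exited ones
sudo crictl ps -a
```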

Note! This situation does not prevent me from deploying a sample Deployment (nginx); it seems to run stably. However, when I tried to add the metrics server, it kept crashing (probably related to the CrashLoopBackOff pods in the kube-system namespace).

Any ideas what could be wrong, or where I should look to troubleshoot this?

Answer 1

I was advised to check the SystemdCgroup setting in the containerd configuration file, as described in this link.

It turned out that in my case /etc/containerd/config.toml was missing on the master node.

  • Generate it:
    sudo containerd config default | sudo tee /etc/containerd/config.toml

  • Set SystemdCgroup = true in /etc/containerd/config.toml
  • Restart the containerd service:
    systemctl restart containerd
    
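The steps above can also be scripted in one go. A sketch, assuming the default config path and that the generated config contains SystemdCgroup = false:

```shell
# Regenerate the default containerd config, flip SystemdCgroup to true,
# and restart the service (default config path assumed).
sudo containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd

# Verify the change took effect
grep 'SystemdCgroup' /etc/containerd/config.toml
```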

However, this left my cluster in the following state:

lab@master:~$ kubectl -nkube-system get po
The connection to the server master:6443 was refused - did you specify the right host or port?
lab@master:~$ kubectl get nodes
The connection to the server master:6443 was refused - did you specify the right host or port?

I reverted the setting to false on the master and restarted containerd. On the worker nodes, however, I kept SystemdCgroup = true.
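A mismatch between the cgroup driver kubelet expects (cgroupDriver in its config) and the one containerd uses (SystemdCgroup) is a well-known cause of exactly this kind of container thrashing, which is why the right value can differ per node. A quick per-node check (a sketch; paths assume kubeadm and containerd defaults):

```shell
# Check whether containerd and kubelet agree on the cgroup driver on this node.
# Both should effectively say "systemd" (or both "cgroupfs").
grep -n 'SystemdCgroup' /etc/containerd/config.toml
grep -n 'cgroupDriver' /var/lib/kubelet/config.yaml

# Which cgroup version the host actually runs ("cgroup2fs" means cgroup v2,
# where the systemd driver is the recommended choice)
stat -fc %T /sys/fs/cgroup/
```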

This resolved the problem:

lab@master:~$ kubectl -nkube-system get po -o wide
NAME                                       READY   STATUS    RESTARTS       AGE    IP              NODE       NOMINATED NODE   READINESS GATES
calico-kube-controllers-7ddc4f45bc-4qx7l   1/1     Running   8 (18m ago)    14d    10.10.219.86    master     <none>           <none>
calico-node-c4rxp                          1/1     Running   7 (14m ago)    89m    192.168.1.166   worker02   <none>           <none>
calico-node-dhzr8                          1/1     Running   7 (18m ago)    14d    192.168.1.164   master     <none>           <none>
calico-node-wqv8w                          1/1     Running   1 (14m ago)    27m    192.168.1.165   worker01   <none>           <none>
coredns-5dd5756b68-njtpf                   1/1     Running   7 (18m ago)    14d    10.10.219.88    master     <none>           <none>
coredns-5dd5756b68-pxn8l                   1/1     Running   6 (18m ago)    14d    10.10.219.87    master     <none>           <none>
etcd-master                                1/1     Running   62 (18m ago)   14d    192.168.1.164   master     <none>           <none>
kube-apiserver-master                      1/1     Running   38 (18m ago)   14d    192.168.1.164   master     <none>           <none>
kube-controller-manager-master             1/1     Running   42 (18m ago)   14d    192.168.1.164   master     <none>           <none>
kube-proxy-mgsdr                           1/1     Running   7 (14m ago)    89m    192.168.1.166   worker02   <none>           <none>
kube-proxy-ql576                           1/1     Running   10 (18m ago)   14d    192.168.1.164   master     <none>           <none>
kube-proxy-zl68t                           1/1     Running   8 (14m ago)    106m   192.168.1.165   worker01   <none>           <none>
kube-scheduler-master                      1/1     Running   41 (18m ago)   14d    192.168.1.164   master     <none>           <none>
metrics-server-98bc7f888-xtdxd             1/1     Running   7 (14m ago)    99m    10.10.5.8       worker01   <none>           <none>

Side note: I also disabled apparmor (on the master and the workers):

sudo systemctl stop apparmor && sudo systemctl disable apparmor
