How do I connect to 2000+ nodes via SSH? I have found the following:

ssh has no problem being run in parallel to 500 servers if any of these is true:
- The servers are not on the same LAN (i.e., they are reached through a router).
- The servers are Docker containers on the same machine (localhost).
- No more than one ssh is started every 30 ms.
So all of these work:
head -n 500 ext.ipaddr | parallel -j 500 ssh {} uptime
head -n 500 localhost.docker.ipaddr | parallel -j 500 ssh {} uptime
head -n 500 local.lan.docker.ipaddr | parallel --delay 0.03 -j 500 ssh {} uptime
What I do not understand is why this does not work:
head -n 500 local.lan.docker.ipaddr | parallel -j 500 ssh {} uptime
with 500 Docker containers on a server on the local LAN and no delay between the sshs (it sometimes fails with as few as 5 Docker containers). This gives a lot of "No route to host".
I have come to the conclusion that it has to do with ARP.
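For reference, the traces below come from a capture along these lines (run as root on the container server; the interface name is a placeholder for whatever NIC or Docker bridge carries the traffic):

```shell
# Capture only ARP traffic, without resolving names (-n),
# on the relevant interface; eth0 is a placeholder.
tcpdump -n -i eth0 arp
```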
In the working cases I see this:
06:15:06.605997 ARP, Request who-has 172.24.0.113 tell 172.24.254.254, length 28
06:15:06.617110 ARP, Reply 172.24.0.113 is-at 02:42:ac:18:00:71, length 46
06:15:06.636660 ARP, Request who-has 172.24.0.115 tell 172.24.254.254, length 28
06:15:06.648457 ARP, Reply 172.24.0.115 is-at 02:42:ac:18:00:73, length 46
06:15:06.660832 ARP, Request who-has 172.22.0.116 tell 172.22.254.254, length 28
06:15:06.672328 ARP, Reply 172.22.0.116 is-at 02:42:ac:16:00:74, length 46
06:15:06.692116 ARP, Request who-has 172.21.0.117 tell 172.21.254.254, length 28
06:15:06.703215 ARP, Reply 172.21.0.117 is-at 02:42:ac:15:00:75, length 46
06:15:06.717891 ARP, Request who-has 172.23.0.117 tell 172.23.254.254, length 28
06:15:06.729403 ARP, Reply 172.23.0.117 is-at 02:42:ac:17:00:75, length 46
06:15:06.752089 ARP, Request who-has 172.24.0.114 tell 172.24.254.254, length 28
06:15:06.764744 ARP, Reply 172.24.0.114 is-at 02:42:ac:18:00:72, length 46
06:15:06.783677 ARP, Request who-has 172.24.0.116 tell 172.24.254.254, length 28
06:15:06.795258 ARP, Reply 172.24.0.116 is-at 02:42:ac:18:00:74, length 46
06:15:06.809392 ARP, Request who-has 172.23.0.118 tell 172.23.254.254, length 28
06:15:06.820770 ARP, Reply 172.23.0.118 is-at 02:42:ac:17:00:76, length 46
06:15:06.842422 ARP, Request who-has 172.21.0.118 tell 172.21.254.254, length 28
06:15:06.853491 ARP, Reply 172.21.0.118 is-at 02:42:ac:15:00:76, length 46
06:15:06.871436 ARP, Request who-has 172.22.0.117 tell 172.22.254.254, length 28
06:15:06.882957 ARP, Reply 172.22.0.117 is-at 02:42:ac:16:00:75, length 46
06:15:06.902872 ARP, Request who-has 172.23.0.120 tell 172.23.254.254, length 28
06:15:06.913643 ARP, Reply 172.23.0.120 is-at 02:42:ac:17:00:78, length 46
06:15:06.932819 ARP, Request who-has 172.21.0.119 tell 172.21.254.254, length 28
06:15:06.944045 ARP, Reply 172.21.0.119 is-at 02:42:ac:15:00:77, length 46
So each request is immediately followed by a reply.

In the failing case I get:
06:17:35.764287 ARP, Request who-has 172.21.0.169 tell 172.21.254.254, length 28
06:17:35.768654 ARP, Request who-has 172.22.0.169 tell 172.22.254.254, length 28
06:17:35.771642 ARP, Request who-has 172.24.0.169 tell 172.24.254.254, length 28
06:17:35.772369 ARP, Request who-has 172.24.0.109 tell 172.24.254.254, length 28
06:17:35.772384 ARP, Request who-has 172.23.0.110 tell 172.23.254.254, length 28
06:17:35.772387 ARP, Request who-has 172.21.0.111 tell 172.21.254.254, length 28
06:17:35.772388 ARP, Request who-has 172.22.0.109 tell 172.22.254.254, length 28
06:17:35.772395 ARP, Request who-has 172.23.0.107 tell 172.23.254.254, length 28
06:17:35.776378 ARP, Request who-has 172.22.0.108 tell 172.22.254.254, length 28
06:17:35.776398 ARP, Request who-has 172.24.0.108 tell 172.24.254.254, length 28
06:17:35.776401 ARP, Request who-has 172.23.0.106 tell 172.23.254.254, length 28
06:17:35.776408 ARP, Request who-has 172.21.0.109 tell 172.21.254.254, length 28
06:17:35.777417 ARP, Request who-has 172.21.0.170 tell 172.21.254.254, length 28
06:17:35.783320 ARP, Request who-has 172.24.0.170 tell 172.24.254.254, length 28
06:17:35.789594 ARP, Request who-has 172.21.0.171 tell 172.21.254.254, length 28
06:17:35.792286 ARP, Request who-has 172.22.0.171 tell 172.22.254.254, length 28
06:17:35.798649 ARP, Request who-has 172.24.0.171 tell 172.24.254.254, length 28
06:17:35.803277 ARP, Request who-has 172.23.0.173 tell 172.23.254.254, length 28
06:17:35.804366 ARP, Request who-has 172.23.0.112 tell 172.23.254.254, length 28
06:17:35.804383 ARP, Request who-has 172.23.0.113 tell 172.23.254.254, length 28
06:17:35.804385 ARP, Request who-has 172.24.0.110 tell 172.24.254.254, length 28
06:17:35.804387 ARP, Request who-has 172.21.0.112 tell 172.21.254.254, length 28
06:17:35.804388 ARP, Request who-has 172.22.0.112 tell 172.22.254.254, length 28
06:17:35.804389 ARP, Request who-has 172.21.0.114 tell 172.21.254.254, length 28
06:17:35.804390 ARP, Request who-has 172.22.0.111 tell 172.22.254.254, length 28
06:17:35.804391 ARP, Request who-has 172.23.0.109 tell 172.23.254.254, length 28
06:17:35.804393 ARP, Request who-has 172.23.0.108 tell 172.23.254.254, length 28
06:17:35.806772 ARP, Request who-has 172.22.0.170 tell 172.22.254.254, length 28
06:17:35.811874 ARP, Request who-has 172.22.0.172 tell 172.22.254.254, length 28
06:17:35.816238 ARP, Request who-has 172.21.0.172 tell 172.21.254.254, length 28
06:17:35.820150 ARP, Request who-has 172.23.0.174 tell 172.23.254.254, length 28
06:17:35.826595 ARP, Request who-has 172.23.0.175 tell 172.23.254.254, length 28
06:17:35.832707 ARP, Request who-has 172.21.0.173 tell 172.21.254.254, length 28
06:17:35.835588 ARP, Request who-has 172.23.0.176 tell 172.23.254.254, length 28
06:17:35.836369 ARP, Request who-has 172.23.0.114 tell 172.23.254.254, length 28
06:17:35.836384 ARP, Request who-has 172.24.0.112 tell 172.24.254.254, length 28
06:17:35.836392 ARP, Request who-has 172.21.0.113 tell 172.21.254.254, length 28
06:17:35.840372 ARP, Request who-has 172.21.0.115 tell 172.21.254.254, length 28
06:17:35.840394 ARP, Request who-has 172.22.0.110 tell 172.22.254.254, length 28
06:17:35.840397 ARP, Request who-has 172.23.0.111 tell 172.23.254.254, length 28
06:17:35.840400 ARP, Request who-has 172.24.0.111 tell 172.24.254.254, length 28
06:17:35.840408 ARP, Request who-has 172.22.0.113 tell 172.22.254.254, length 28
06:17:35.842467 ARP, Request who-has 172.24.0.172 tell 172.24.254.254, length 28
06:17:35.844844 ARP, Request who-has 172.22.0.173 tell 172.22.254.254, length 28
06:17:35.853446 ARP, Request who-has 172.21.0.174 tell 172.21.254.254, length 28
06:17:35.855394 ARP, Request who-has 172.24.0.173 tell 172.24.254.254, length 28
06:17:35.860520 ARP, Request who-has 172.23.0.178 tell 172.23.254.254, length 28
06:17:35.865012 ARP, Request who-has 172.21.0.175 tell 172.21.254.254, length 28
06:17:35.868369 ARP, Request who-has 172.22.0.116 tell 172.22.254.254, length 28
06:17:35.868391 ARP, Request who-has 172.23.0.116 tell 172.23.254.254, length 28
06:17:35.868394 ARP, Request who-has 172.22.0.115 tell 172.22.254.254, length 28
06:17:35.868395 ARP, Request who-has 172.21.0.117 tell 172.21.254.254, length 28
06:17:35.868397 ARP, Request who-has 172.24.0.113 tell 172.24.254.254, length 28
06:17:35.868398 ARP, Request who-has 172.23.0.115 tell 172.23.254.254, length 28
Plenty of requests, but no replies. That explains the "No route to host". But it raises a new question: why are there no replies?
Running dmesg and looking in syslog on the container server shows nothing: no "the server is being ARP flooded, so it has stopped answering ARP requests" or anything similar.
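Concretely, what I searched for (and found nothing relevant) was something like:

```shell
# Search kernel messages for anything neighbour-/ARP-related;
# prints a marker line if nothing matches (or if dmesg needs root).
dmesg 2>/dev/null | grep -iE 'neigh|arp' || echo "nothing ARP-related in dmesg"
```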
The LAN is 1 Gbit/s with no other traffic; traffic peaks at around 1 Mbit/s.
The container server is 90% idle, but top shows several cores maxed out by ksoftirqd:
top - 06:38:38 up 6:33, 4 users, load average: 3.94, 5.16, 4.34
Tasks: 17106 total, 7 running, 17098 sleeping, 1 stopped, 0 zombie
%Cpu(s): 0.8 us, 1.1 sy, 0.0 ni, 91.7 id, 0.0 wa, 0.0 hi, 6.4 si, 0.0 st
GiB Mem : 503.9 total, 162.4 free, 303.6 used, 37.9 buff/cache
GiB Swap: 200.0 total, 200.0 free, 0.0 used. 199.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25 root 20 0 0 0 0 R 100.0 0.0 24:13.88 ksoftirqd/2
31 root 20 0 0 0 0 R 100.0 0.0 1:39.16 ksoftirqd/3
37 root 20 0 0 0 0 R 99.8 0.0 14:31.28 ksoftirqd/4
49 root 20 0 0 0 0 R 99.8 0.0 22:29.80 ksoftirqd/6
2899 root 20 0 20.5g 1.3g 51704 S 29.2 0.3 171:22.30 dockerd
3230170 root 20 0 29912 23504 3428 R 25.7 0.0 0:39.58 top
This maxing out happens exactly when the sshs are run with no delay. With a 30 ms delay the cores are not maxed out but run at around 30%.
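One way to confirm that it is network-receive softirqs (rather than timers, RCU, etc.) keeping ksoftirqd busy is to watch the per-CPU NET_RX counters while the sshs run. This is a generic Linux diagnostic, not specific to my setup:

```shell
# /proc/softirqs holds per-CPU counters for each softirq type;
# the NET_RX row grows as received packets (including ARP) are
# processed. Print it twice, one second apart, to see which
# CPUs are doing the work.
grep -E 'CPU|NET_RX' /proc/softirqs
sleep 1
grep -E 'CPU|NET_RX' /proc/softirqs
```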
So a possible explanation is that ksoftirqd starts servicing an ARP request, but is interrupted by new requests before it finishes sending the reply. If that is the case, it looks like bad design: it could be used to DoS the containers. It would be better to simply ignore incoming ARP requests while one is already being processed.
Is this the explanation? Or is there another reason? Is there a workaround other than the delay?
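For what it is worth, one guess I have considered (untested; the sysctl names are standard Linux, but I do not know whether these limits are actually what is being hit) is that the neighbour table, or the queue of packets waiting for ARP resolution, overflows, in which case raising the limits might help:

```shell
# Current neighbour-table size limits and the per-neighbour byte
# quota for packets queued while waiting for ARP resolution
# (standard Linux sysctls, read via /proc):
cat /proc/sys/net/ipv4/neigh/default/gc_thresh1
cat /proc/sys/net/ipv4/neigh/default/gc_thresh3
cat /proc/sys/net/ipv4/neigh/default/unres_qlen_bytes

# Raising them (as root) would look like:
#   sysctl -w net.ipv4.neigh.default.gc_thresh3=8192
#   sysctl -w net.ipv4.neigh.default.unres_qlen_bytes=1048576
```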
Both the server and the clients run Ubuntu 20.04.