일부 EC2 서버에서는 며칠 또는 몇 주 동안 메모리 누수가 발생했습니다. 결국 free
에는 및 같은 도구 에 따라 수 기가바이트의 메모리를 사용하게 되며 htop
서버를 다시 시작하지 않으면 프로세스가 OOM에 의해 종료되기 시작합니다.
그러한 서버 중 하나에는 15GB의 메모리가 있습니다. 출력은 다음과 같습니다 free -m
.
total used free shared buffers cached
Mem: 15039 3921 11118 0 0 7
-/+ buffers/cache: 3913 11126
Swap: 0 0 0
서버가 유휴 상태입니다. 대부분의 사용자 영역 프로세스를 종료했습니다. htop의 어떤 프로세스도 >100k VIRT를 표시하지 않습니다. 최근에 실행했는데 echo 3 > /proc/sys/vm/drop_caches
효과가 없었습니다(이유는 다음과 같습니다. buffers
크기 cached
가 너무 작았습니다). 또한:
- 주변을 탐색해도 유망한 내용이 표시되지 않았습니다
/proc/slabinfo
.slabtop
- /run/shm에는 아무것도 없습니다
출력은 다음과 같습니다 cat /proc/meminfo
.
MemTotal: 15400880 kB
MemFree: 11385688 kB
Buffers: 564 kB
Cached: 7792 kB
SwapCached: 0 kB
Active: 27668 kB
Inactive: 2012 kB
Active(anon): 21368 kB
Inactive(anon): 380 kB
Active(file): 6300 kB
Inactive(file): 1632 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 21380 kB
Mapped: 7208 kB
Shmem: 380 kB
Slab: 39260 kB
SReclaimable: 16456 kB
SUnreclaim: 22804 kB
KernelStack: 1352 kB
PageTables: 2872 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 7700440 kB
Committed_AS: 39072 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 30336 kB
VmallocChunk: 34359691552 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 36864 kB
DirectMap2M: 15822848 kB
MemFree
와 MemTotal
다른 meminfo 측정항목이 설명할 수 없는 큰 격차가 있음을 알 수 있습니다 .
이 메모리가 어디로 가는지, 추가로 디버깅할 수 있는 방법은 무엇입니까?
추가 서버 정보(해당되는 경우):
$ lsb_release -d
Description: Ubuntu 14.04.1 LTS
$ uname -r
3.13.0-36-generic
고쳐 쓰다:추가 명령과 출력은 다음과 같습니다.
# dmesg | fgrep 'Memory:'
[ 0.000000] Memory: 15389980K/15728244K available (7373K kernel code, 1144K rwdata, 3404K rodata, 1336K init, 1440K bss, 338264K reserved)
# awk '{print $2 " " $1}' /proc/modules | sort -nr | head -5
106678 psmouse
97812 raid6_pq
86484 raid456
69418 floppy
55624 aesni_intel
# cat /proc/mounts | grep tmp
udev /dev devtmpfs rw,relatime,size=7695004k,nr_inodes=1923751,mode=755 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=1540088k,mode=755 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
# df -h /dev /run /sys/fs/cgroup /run/lock /run/shm /run/user
Filesystem Size Used Avail Use% Mounted on
udev 7.4G 12K 7.4G 1% /dev
tmpfs 1.5G 368K 1.5G 1% /run
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 7.4G 0 7.4G 0% /run/shm
none 100M 0 100M 0% /run/user
업데이트 2:전체 출력은 다음과 같습니다 ps aux
.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 33636 2368 ? Ss 2015 0:03 /sbin/init
root 2 0.0 0.0 0 0 ? S 2015 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 2015 0:11 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< 2015 0:00 [kworker/0:0H]
root 7 0.0 0.0 0 0 ? S 2015 1:31 [rcu_sched]
root 8 0.0 0.0 0 0 ? S 2015 0:30 [rcuos/0]
root 9 0.0 0.0 0 0 ? S 2015 0:25 [rcuos/1]
root 10 0.0 0.0 0 0 ? S 2015 0:33 [rcuos/2]
root 11 0.0 0.0 0 0 ? S 2015 0:25 [rcuos/3]
root 12 0.0 0.0 0 0 ? S 2015 0:14 [rcuos/4]
root 13 0.0 0.0 0 0 ? S 2015 0:14 [rcuos/5]
root 14 0.0 0.0 0 0 ? S 2015 0:14 [rcuos/6]
root 15 0.0 0.0 0 0 ? S 2015 0:33 [rcuos/7]
root 16 0.0 0.0 0 0 ? S 2015 0:00 [rcuos/8]
root 17 0.0 0.0 0 0 ? S 2015 0:00 [rcuos/9]
root 18 0.0 0.0 0 0 ? S 2015 0:00 [rcuos/10]
root 19 0.0 0.0 0 0 ? S 2015 0:00 [rcuos/11]
root 20 0.0 0.0 0 0 ? S 2015 0:00 [rcuos/12]
root 21 0.0 0.0 0 0 ? S 2015 0:00 [rcuos/13]
root 22 0.0 0.0 0 0 ? S 2015 0:00 [rcuos/14]
root 23 0.0 0.0 0 0 ? S 2015 0:00 [rcu_bh]
root 24 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/0]
root 25 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/1]
root 26 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/2]
root 27 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/3]
root 28 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/4]
root 29 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/5]
root 30 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/6]
root 31 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/7]
root 32 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/8]
root 33 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/9]
root 34 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/10]
root 35 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/11]
root 36 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/12]
root 37 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/13]
root 38 0.0 0.0 0 0 ? S 2015 0:00 [rcuob/14]
root 39 0.0 0.0 0 0 ? S 2015 0:01 [migration/0]
root 40 0.0 0.0 0 0 ? S 2015 0:06 [watchdog/0]
root 41 0.0 0.0 0 0 ? S 2015 0:05 [watchdog/1]
root 42 0.0 0.0 0 0 ? S 2015 0:00 [migration/1]
root 43 0.0 0.0 0 0 ? S 2015 0:08 [ksoftirqd/1]
root 45 0.0 0.0 0 0 ? S< 2015 0:00 [kworker/1:0H]
root 46 0.0 0.0 0 0 ? S 2015 0:05 [watchdog/2]
root 47 0.0 0.0 0 0 ? S 2015 0:01 [migration/2]
root 48 0.0 0.0 0 0 ? S 2015 0:08 [ksoftirqd/2]
root 50 0.0 0.0 0 0 ? S< 2015 0:00 [kworker/2:0H]
root 51 0.0 0.0 0 0 ? S 2015 0:06 [watchdog/3]
root 52 0.0 0.0 0 0 ? S 2015 0:01 [migration/3]
root 53 0.0 0.0 0 0 ? S 2015 0:17 [ksoftirqd/3]
root 55 0.0 0.0 0 0 ? S< 2015 0:00 [kworker/3:0H]
root 56 0.0 0.0 0 0 ? S 2015 0:07 [watchdog/4]
root 57 0.0 0.0 0 0 ? S 2015 0:01 [migration/4]
root 58 0.0 0.0 0 0 ? S 2015 0:02 [ksoftirqd/4]
root 60 0.0 0.0 0 0 ? S< 2015 0:00 [kworker/4:0H]
root 61 0.0 0.0 0 0 ? S 2015 0:06 [watchdog/5]
root 62 0.0 0.0 0 0 ? S 2015 0:01 [migration/5]
root 63 0.0 0.0 0 0 ? S 2015 0:07 [ksoftirqd/5]
root 65 0.0 0.0 0 0 ? S< 2015 0:00 [kworker/5:0H]
root 66 0.0 0.0 0 0 ? S 2015 0:06 [watchdog/6]
root 67 0.0 0.0 0 0 ? S 2015 0:01 [migration/6]
root 68 0.0 0.0 0 0 ? S 2015 0:04 [ksoftirqd/6]
root 70 0.0 0.0 0 0 ? S< 2015 0:00 [kworker/6:0H]
root 71 0.0 0.0 0 0 ? S 2015 0:06 [watchdog/7]
root 72 0.0 0.0 0 0 ? S 2015 0:02 [migration/7]
root 73 0.0 0.0 0 0 ? S 2015 0:17 [ksoftirqd/7]
root 74 0.0 0.0 0 0 ? S 2015 0:14 [kworker/7:0]
root 75 0.0 0.0 0 0 ? S< 2015 0:00 [kworker/7:0H]
root 76 0.0 0.0 0 0 ? S< 2015 0:00 [khelper]
root 77 0.0 0.0 0 0 ? S 2015 0:00 [kdevtmpfs]
root 78 0.0 0.0 0 0 ? S< 2015 0:00 [netns]
root 79 0.0 0.0 0 0 ? S 2015 0:00 [xenwatch]
root 80 0.0 0.0 0 0 ? S 2015 0:00 [xenbus]
root 81 0.0 0.0 0 0 ? S 2015 0:39 [kworker/0:1]
root 82 0.0 0.0 0 0 ? S< 2015 0:00 [writeback]
root 83 0.0 0.0 0 0 ? S< 2015 0:00 [kintegrityd]
root 84 0.0 0.0 0 0 ? S< 2015 0:00 [bioset]
root 86 0.0 0.0 0 0 ? S< 2015 0:00 [kblockd]
root 88 0.0 0.0 0 0 ? S< 2015 0:00 [ata_sff]
root 89 0.0 0.0 0 0 ? S 2015 0:00 [khubd]
root 90 0.0 0.0 0 0 ? S< 2015 0:00 [md]
root 91 0.0 0.0 0 0 ? S< 2015 0:00 [devfreq_wq]
root 92 0.0 0.0 0 0 ? S 2015 0:12 [kworker/1:1]
root 95 0.0 0.0 0 0 ? S 2015 0:10 [kworker/4:1]
root 97 0.0 0.0 0 0 ? S 2015 0:11 [kworker/6:1]
root 99 0.0 0.0 0 0 ? S 2015 0:00 [khungtaskd]
root 100 0.0 0.0 0 0 ? S 2015 7:26 [kswapd0]
root 101 0.0 0.0 0 0 ? SN 2015 0:00 [ksmd]
root 102 0.0 0.0 0 0 ? SN 2015 0:29 [khugepaged]
root 103 0.0 0.0 0 0 ? S 2015 0:00 [fsnotify_mark]
root 104 0.0 0.0 0 0 ? S 2015 0:00 [ecryptfs-kthrea]
root 105 0.0 0.0 0 0 ? S< 2015 0:00 [crypto]
root 117 0.0 0.0 0 0 ? S< 2015 0:00 [kthrotld]
root 119 0.0 0.0 0 0 ? S 2015 0:00 [scsi_eh_0]
root 120 0.0 0.0 0 0 ? S 2015 0:00 [scsi_eh_1]
root 141 0.0 0.0 0 0 ? S< 2015 0:00 [deferwq]
root 142 0.0 0.0 0 0 ? S< 2015 0:00 [charger_manager]
root 199 0.0 0.0 0 0 ? S< 2015 0:00 [kpsmoused]
root 223 0.0 0.0 0 0 ? S< 2015 0:00 [bioset]
root 265 0.0 0.0 0 0 ? S< 2015 0:00 [raid5wq]
root 291 0.0 0.0 0 0 ? S 2015 0:22 [jbd2/xvda1-8]
root 292 0.0 0.0 0 0 ? S< 2015 0:00 [ext4-rsv-conver]
root 445 0.0 0.0 0 0 ? S 2015 0:16 [jbd2/md0-8]
root 446 0.0 0.0 0 0 ? S< 2015 0:00 [ext4-rsv-conver]
root 516 0.0 0.0 19604 564 ? S 2015 0:00 upstart-udev-bridge --daemon
root 522 0.0 0.0 49864 1048 ? Ss 2015 0:00 /lib/systemd/systemd-udevd --daemon
root 671 0.0 0.0 15256 408 ? S 2015 0:00 upstart-socket-bridge --daemon
root 800 0.0 0.0 10220 2900 ? Ss 2015 0:00 dhclient -1 -v -pf /run/dhclient.eth0.pid -l
message+ 1048 0.0 0.0 39224 1048 ? Ss 2015 0:00 dbus-daemon --system --fork
root 1077 0.0 0.0 0 0 ? S Jan04 0:00 [kworker/u30:2]
root 1082 0.0 0.0 43448 1196 ? Ss 2015 0:00 /lib/systemd/systemd-logind
root 1116 0.0 0.0 15272 512 ? S 2015 0:00 upstart-file-bridge --daemon
root 1339 0.0 0.0 14536 412 tty4 Ss+ 2015 0:00 /sbin/getty -8 38400 tty4
root 1344 0.0 0.0 14536 416 tty5 Ss+ 2015 0:00 /sbin/getty -8 38400 tty5
root 1360 0.0 0.0 14536 408 tty2 Ss+ 2015 0:00 /sbin/getty -8 38400 tty2
root 1361 0.0 0.0 14536 416 tty3 Ss+ 2015 0:00 /sbin/getty -8 38400 tty3
root 1363 0.0 0.0 14536 404 tty6 Ss+ 2015 0:00 /sbin/getty -8 38400 tty6
root 1418 0.0 0.0 61364 1296 ? Ss 2015 0:07 /usr/sbin/sshd -D
root 1432 0.0 0.0 23652 552 ? Ss 2015 0:02 cron
daemon 1433 0.0 0.0 19136 180 ? Ss 2015 0:00 atd
root 1461 0.0 0.0 19316 644 ? Ss 2015 1:57 /usr/sbin/irqbalance
root 1518 0.0 0.0 4364 404 ? Ss 2015 0:00 acpid -c /etc/acpi/events -s /var/run/acpid.
root 1521 0.0 0.0 0 0 ? S 2015 0:00 [kworker/5:1]
root 1641 0.0 0.0 0 0 ? S Jan04 0:00 [kworker/u30:1]
root 1863 0.0 0.0 14536 404 tty1 Ss+ 2015 0:00 /sbin/getty -8 38400 tty1
root 1864 0.0 0.0 12784 388 ttyS0 Ss+ 2015 0:00 /sbin/getty -8 38400 ttyS0
ntp 2075 0.0 0.0 31448 1252 ? Ss 2015 1:17 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 10
root 2087 0.0 0.0 0 0 ? S 2015 0:00 [kauditd]
ubuntu 2393 0.0 0.0 105628 2028 ? S Jan04 0:00 sshd: ubuntu@notty
root 2496 0.0 0.0 0 0 ? S Jan04 0:00 [kworker/2:2]
root 2713 0.0 0.0 0 0 ? S 2015 0:00 [kworker/6:2]
root 2722 0.0 0.0 0 0 ? S 2015 0:12 [kworker/5:2]
root 3678 0.0 0.0 0 0 ? S Jan05 0:01 [kworker/0:0]
root 3716 0.0 0.0 0 0 ? S Jan05 0:00 [kworker/3:0]
root 3941 0.0 0.0 0 0 ? S Jan05 0:00 [kworker/2:0]
root 4732 0.0 0.0 0 0 ? S Jan05 0:00 [kworker/1:2]
root 6896 0.0 0.0 105628 4228 ? Ss 08:00 0:00 sshd: ubuntu [priv]
ubuntu 7008 0.0 0.0 105628 1876 ? S 08:00 0:00 sshd: ubuntu@pts/0
ubuntu 7014 0.0 0.0 21308 3908 pts/0 Ss 08:00 0:00 -bash
root 7234 0.0 0.0 63668 2096 pts/0 S 08:10 0:00 sudo su
root 7235 0.0 0.0 63248 1776 pts/0 S 08:10 0:00 su
root 7236 1.0 0.0 21088 3456 pts/0 S 08:10 0:00 bash
root 7248 0.0 0.0 17164 1320 pts/0 R+ 08:10 0:00 ps aux
root 13299 0.0 0.0 0 0 ? S 2015 0:19 [kworker/3:2]
root 19933 0.0 0.0 0 0 ? S 2015 0:00 [kworker/7:1]
root 20305 0.0 0.0 0 0 ? S 2015 0:00 [kworker/4:2]
root 29814 0.0 0.0 0 0 ? S< Jan04 0:00 [kworker/u31:2]
root 30693 0.0 0.0 0 0 ? S< Jan04 0:00 [kworker/u31:1]
답변1
대규모 시스템에서 메모리 누수를 추적하는 것은 정말 힘들고 매우 실망스러울 수 있습니다. 문제를 파악하기 위해 한 번에 하나의 서비스를 시작하면서 전체 서버를 테스트 환경에 복사해 보았습니다.
각 서비스(사용자 모드 프로세스)를 개별적으로 확인한 후에도 여전히 누출 원인을 찾을 수 없는 경우 커널을 확인해야 합니다. 커널을 다루려면 시간과 경험이 필요하므로 커널 전문가와 상담하는 것이 좋습니다.
또 다른 가능성은 악성 코드의 존재입니다. 맬웨어를 다루는 것은 완전히 다른 문제입니다.
때로는 바로가기가 없는 경우도 있습니다.\