Ansible triggers the oom-killer

ArchLinux
Output of uname -a:

Linux localhost 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux

16 GB of memory, 14 GB of swap space

Running a large Ansible job triggers the oom-killer. I would expect 16 GB to be enough for this kind of job, but I am no expert on OOM logs (or on Linux memory). The log follows.

Feb 14 11:35:36 localhost kernel: Out of memory: Kill process 22698 (systemd-coredum) score 503 or sacrifice child
Feb 14 11:35:36 localhost kernel: Killed process 22698 (systemd-coredum) total-vm:880316kB, anon-rss:37604kB, file-rss:67380kB, shmem-rss:0kB
Feb 14 11:42:52 localhost kernel: ansible invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
Feb 14 11:42:52 localhost kernel: ansible cpuset=/ mems_allowed=0
Feb 14 11:42:52 localhost kernel: CPU: 0 PID: 27123 Comm: ansible Not tainted 4.7.2-1-ARCH #1
Feb 14 11:42:52 localhost kernel: Hardware name: Dell Inc. OptiPlex 7020/08WKV3, BIOS A02 11/20/2014
Feb 14 11:42:52 localhost kernel:  0000000000000286 00000000a544d0e1 ffff8803b3147b48 ffffffff812eb132
Feb 14 11:42:52 localhost kernel:  ffff8803b3147d28 ffff88024193f000 ffff8803b3147bb8 ffffffff811f6e5c
Feb 14 11:42:52 localhost kernel:  ffff8803b3148000 0000000000000000 ffffffff81b28920 ffffffff811789c0
Feb 14 11:42:52 localhost kernel: Call Trace:
Feb 14 11:42:52 localhost kernel:  [<ffffffff812eb132>] dump_stack+0x63/0x81
Feb 14 11:42:52 localhost kernel:  [<ffffffff811f6e5c>] dump_header+0x60/0x1e8
Feb 14 11:42:52 localhost kernel:  [<ffffffff811789c0>] ? page_alloc_cpu_notify+0x50/0x50
Feb 14 11:42:52 localhost kernel:  [<ffffffff811762fa>] oom_kill_process+0x22a/0x440
Feb 14 11:42:52 localhost kernel:  [<ffffffff8117696a>] out_of_memory+0x40a/0x4b0
Feb 14 11:42:52 localhost kernel:  [<ffffffff812ffe08>] ? find_next_bit+0x18/0x20
Feb 14 11:42:52 localhost kernel:  [<ffffffff8117c05b>] __alloc_pages_nodemask+0xf0b/0xf30
Feb 14 11:42:52 localhost kernel:  [<ffffffff8117c3d4>] alloc_kmem_pages_node+0x54/0xd0
Feb 14 11:42:52 localhost kernel:  [<ffffffff81077c06>] copy_process.part.8+0x136/0x19a0
Feb 14 11:42:52 localhost kernel:  [<ffffffff811a974a>] ? handle_mm_fault+0xa7a/0x1f60
Feb 14 11:42:52 localhost kernel:  [<ffffffff81079647>] _do_fork+0xd7/0x3d0
Feb 14 11:42:52 localhost kernel:  [<ffffffff810655f5>] ? __do_page_fault+0x1f5/0x510
Feb 14 11:42:52 localhost kernel:  [<ffffffff810799e9>] SyS_clone+0x19/0x20
Feb 14 11:42:52 localhost kernel:  [<ffffffff81003c07>] do_syscall_64+0x57/0xb0
Feb 14 11:42:52 localhost kernel:  [<ffffffff815de861>] entry_SYSCALL64_slow_path+0x25/0x25
Feb 14 11:42:52 localhost kernel: Mem-Info:
Feb 14 11:42:52 localhost kernel: active_anon:548787 inactive_anon:232682 isolated_anon:0
                                   active_file:28394 inactive_file:24931 isolated_file:8
                                   unevictable:0 dirty:1 writeback:0 unstable:0
                                   slab_reclaimable:1897009 slab_unreclaimable:19547
                                   mapped:51240 shmem:28342 pagetables:20339 bounce:0
                                   free:1284106 free_pcp:446 free_cma:0
Feb 14 11:42:52 localhost kernel: Node 0 DMA free:15628kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900k
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 3468 15978 15978
Feb 14 11:42:52 localhost kernel: Node 0 DMA32 free:1221320kB min:14632kB low:18288kB high:21944kB active_anon:274224kB inactive_anon:273556kB active_file:40556kB inactive_file:36556kB unevictable:0kB isolated(anon):0kB isolated(file):32k
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 0 12510 12510
Feb 14 11:42:52 localhost kernel: Node 0 Normal free:3899476kB min:52884kB low:66104kB high:79324kB active_anon:1920924kB inactive_anon:657172kB active_file:73020kB inactive_file:63168kB unevictable:0kB isolated(anon):0kB isolated(file):0
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 0 0 0
Feb 14 11:42:52 localhost kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (ME) = 15628kB
Feb 14 11:42:52 localhost kernel: Node 0 DMA32: 166992*4kB (UME) 68889*8kB (UE) 7*16kB (H) 11*32kB (H) 11*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1220760kB
Feb 14 11:42:52 localhost kernel: Node 0 Normal: 721354*4kB (UME) 126667*8kB (UEH) 16*16kB (H) 2*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3899072kB
Feb 14 11:42:52 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb 14 11:42:52 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 14 11:42:52 localhost kernel: 125644 total pagecache pages
Feb 14 11:42:52 localhost kernel: 43931 pages in swap cache
Feb 14 11:42:52 localhost kernel: Swap cache stats: add 2753281, delete 2709350, find 730647/1154037
Feb 14 11:42:52 localhost kernel: Free swap  = 12677364kB
Feb 14 11:42:52 localhost kernel: Total swap = 14124028kB
Feb 14 11:42:52 localhost kernel: 4179504 pages RAM
Feb 14 11:42:52 localhost kernel: 0 pages HighMem/MovableOnly
Feb 14 11:42:52 localhost kernel: 84923 pages reserved
Feb 14 11:42:52 localhost kernel: 0 pages hwpoisoned
(...)
Feb 14 11:42:52 localhost kernel: Out of memory: Kill process 27876 (firefox) score 41 or sacrifice child
Feb 14 11:42:52 localhost kernel: Killed process 27876 (firefox) total-vm:4003016kB, anon-rss:1091960kB, file-rss:41516kB, shmem-rss:80216kB
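
For readers who, like the asker, do not read OOM logs every day: the Mem-Info counters are page counts, so a quick conversion makes them easier to compare with the 16 GB of RAM. The sketch below is only an illustration and not part of the original post. It assumes 4 kB pages (the usual x86_64 size, and the smallest block size in the free lists above), and the helper names are made up here. The values are copied from the log.

```python
# Assumption: 4 kB pages, the x86_64 default.
PAGE_KB = 4

# Page counts copied from the Mem-Info block in the log above.
mem_info_pages = {
    "active_anon": 548787,
    "inactive_anon": 232682,
    "slab_reclaimable": 1897009,
    "slab_unreclaimable": 19547,
    "free": 1284106,
}


def pages_to_gib(pages, page_kb=PAGE_KB):
    """Convert a page count to GiB."""
    return pages * page_kb / (1024 * 1024)


def order_to_kb(order, page_kb=PAGE_KB):
    """Size in kB of one contiguous allocation of the given order (2**order pages)."""
    return (2 ** order) * page_kb


for name, pages in mem_info_pages.items():
    print(f"{name:20s} {pages_to_gib(pages):6.2f} GiB")

# The first oom-killer line reports order=2.
print(f"order=2 request: {order_to_kb(2)} kB contiguous")
```

With these inputs, free memory comes out at roughly 4.9 GiB and reclaimable slab at roughly 7.2 GiB, and the order=2 request in the "ansible invoked oom-killer" line corresponds to a 16 kB contiguous block.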

I set the following sysctl values, which helped a little, but it still happens on larger jobs.

vm.overcommit_memory = 2
vm.overcommit_ratio = 100
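
As a rough illustration of what these two settings mean: with vm.overcommit_memory = 2 the kernel refuses allocations once committed address space would exceed a limit computed from RAM, swap and vm.overcommit_ratio. The sketch below follows the formula the kernel documentation gives for CommitLimit. The function name is hypothetical, the RAM figure is approximated as "pages RAM" minus "pages reserved" from the log (assuming 4 kB pages), and the authoritative value on a live system is the CommitLimit line in /proc/meminfo.

```python
# Assumption: 4 kB pages, the x86_64 default.
PAGE_KB = 4


def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio, hugetlb_kb=0):
    """Approximate CommitLimit (kB) in strict mode (vm.overcommit_memory = 2).

    (RAM - huge pages) * overcommit_ratio / 100 + swap
    """
    return (ram_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_kb


# Approximation from the log: "pages RAM" minus "pages reserved".
ram_kb = (4179504 - 84923) * PAGE_KB
# "Total swap" from the log.
swap_kb = 14124028

limit_kb = commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=100)
print(f"approximate CommitLimit: {limit_kb} kB ({limit_kb / (1024 * 1024):.1f} GiB)")
```

Under these assumptions the limit is about 29 GiB, that is, essentially all RAM plus all swap, so a ratio of 100 only restricts allocations that go beyond the combined total.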

Is it really true that some of my Ansible jobs are using up all of the system's memory plus swap space?

Answer 1

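Ansible definitely should not be using that much memory. Could you describe in more detail what you are doing? (How many hosts or tasks are involved, what they do, which modules are used, examples, and so on.) I also see Firefox being killed there. Are you launching a lot of things with Firefox?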
