Understanding and debugging frequent "hard CPU lockups"

My Ubuntu box hangs frequently (several times a day), leaving messages like the following (sometimes truncated) in syslog and kern.log.

Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843824] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843826] Modules linked in: nls_utf8 btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) snd_hda_codec_hdmi nls_iso8859_1 eeepc_wmi asus_wmi sparse_keymap intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic aesni_intel aes_x86_64 lrw gf128mul glue_helper input_leds ablk_helper cryptd serio_raw snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep sb_edac edac_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq lpc_ich snd_seq_device snd_timer snd mei_me mei soundcore shpchp 8250_fintek mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid nouveau mxm_wmi video i2c_algo_bit ttm drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e drm ahci libahci ptp nvme pps_core fjes wmi
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843881] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G           OE   4.4.0-34-generic #53-Ubuntu
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843883] Hardware name: ASUS All Series/X99-A/USB 3.1, BIOS 3005 04/11/2016
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843884] task: ffff8807fb493700 ti: ffff8807fb4a8000 task.ti: ffff8807fb4a8000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843885] RIP: 0010:[<ffffffff816c3f61>]  [<ffffffff816c3f61>] cpuidle_enter_state+0x111/0x2b0
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843890] RSP: 0018:ffff8807fb4abe70  EFLAGS: 00000246
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843891] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000018
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843892] RDX: 00195eb06e5732b1 RSI: 0000000000500101 RDI: 0000000000000000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843892] RBP: ffff8807fb4abea8 R08: 000000000032b396 R09: 0000000000000018
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843893] R10: ffff8807fb4abe20 R11: 000000000000bf7e R12: 0000000000000004
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843894] R13: ffffe8ffffd40a00 R14: 0000032c4dc034f3 R15: ffffffff81eb1f38
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843895] FS:  0000000000000000(0000) GS:ffff8807ff540000(0000) knlGS:0000000000000000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843895] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843896] CR2: 00001496c022a008 CR3: 0000000002e0a000 CR4: 00000000003426e0
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843898] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843898] Stack:
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843899]  00000000ff553b00 0000032c4e87d60d ffffffff81f36140 ffff8807fb4ac000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843900]  ffffe8ffffd40a00 ffffffff81eb1da0 ffff8807fb4a8000 ffff8807fb4abeb8
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843901]  ffffffff816c4137 ffff8807fb4abed0 ffffffff810c3fe2 ffffffff816c4113
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843903] Call Trace:
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843905]  [<ffffffff816c4137>] cpuidle_enter+0x17/0x20
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843908]  [<ffffffff810c3fe2>] call_cpuidle+0x32/0x60
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843910]  [<ffffffff816c4113>] ? cpuidle_select+0x13/0x20
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843911]  [<ffffffff810c42a0>] cpu_startup_entry+0x290/0x350
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843914]  [<ffffffff810516e4>] start_secondary+0x154/0x190
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843915] Code: 48 41 89 c4 e8 01 1a a3 ff 48 89 45 d0 0f 1f 44 00 00 31 ff e8 41 ff 9f ff 8b 45 cc 85 c0 0f 85 31 01 00 00 fb 66 0f 1f 44 00 00 <48> 8b 5d d0 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f3 48 89 d8 48 
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682295] INFO: rcu_sched detected stalls on CPUs/tasks:
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682301]  13-...: (1 GPs behind) idle=54b/1/0 softirq=125133/125133 fqs=13613 
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682302]  (detected by 9, t=15002 jiffies, g=108062, c=108061, q=1917)
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682304] Task dump for CPU 13:
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682305] swapper/13      R  running task        0     0      1 0x00000008
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682307]  ffff8807fb4abe70 0000000000000018 00000000ff553b00 0000032c4e87d60d
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682308]  ffffffff81f36140 ffff8807fb4ac000 ffffe8ffffd40a00 ffffffff81eb1da0
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682309]  ffff8807fb4a8000 ffff8807fb4abeb8 ffffffff816c4137 ffff8807fb4abed0
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682311] Call Trace:
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682317]  [<ffffffff816c4137>] ? cpuidle_enter+0x17/0x20
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682320]  [<ffffffff810c3fe2>] ? call_cpuidle+0x32/0x60
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682322]  [<ffffffff816c4113>] ? cpuidle_select+0x13/0x20
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682323]  [<ffffffff810c42a0>] ? cpu_startup_entry+0x290/0x350
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682326]  [<ffffffff810516e4>] ? start_secondary+0x154/0x190
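As a first diagnostic step, it can help to tally how often the watchdog fires, on which CPU, and in which function the CPU was executing at the time. The following is a minimal Python sketch, assuming the message format shown in the log above and the usual Ubuntu log path /var/log/kern.log. The helper name `summarize` and the regular expressions are illustrative only, not part of any existing tool, and other kernel versions may word these messages differently.

```python
import re
import sys
from collections import Counter

# Regexes written against the message format in the log above (an assumption;
# other kernel versions may format these lines differently).
LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
# Captures the symbol on the RIP line, e.g. "cpuidle_enter_state" from
# "RIP: 0010:[<ffffffff816c3f61>]  [<ffffffff816c3f61>] cpuidle_enter_state+0x111/0x2b0"
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:\[<[0-9a-f]+>\]\s+\[<[0-9a-f]+>\]\s+(\S+?)\+0x")
RCU_RE = re.compile(r"rcu_sched detected stalls")


def summarize(lines):
    """Hypothetical helper: count hard-lockup reports per CPU, the function
    named on the RIP line that follows each report, and RCU stall reports."""
    cpus = Counter()
    functions = Counter()
    rcu_stalls = 0
    pending = False  # True after a lockup line until its RIP line is seen
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            cpus[int(m.group(1))] += 1
            pending = True
            continue
        if pending:
            m = RIP_RE.search(line)
            if m:
                functions[m.group(1)] += 1
                pending = False
                continue
        if RCU_RE.search(line):
            rcu_stalls += 1
    return cpus, functions, rcu_stalls


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 summarize_lockups.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as f:
        cpus, functions, stalls = summarize(f)
    print("hard lockups per CPU:", dict(cpus))
    print("functions at lockup (RIP):", dict(functions))
    print("RCU stall reports:", stalls)
```

In the trace above, the RIP line points to cpuidle_enter_state and the running task is swapper/13 (PID 0, the idle task), so CPU 13 was in the idle loop when the watchdog fired. Checking whether that pattern repeats across many events, or whether the CPU and function vary, is one way to narrow down which of the commonly suggested fixes is even plausible.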

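The Internet is full of advice on how to fix problems like this: installing drivers, removing drivers, changing kernel settings, changing BIOS settings, and various other kinds of magic. What I have not seen is any explanation of how to choose a particular remedy in a given situation.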

How should I start debugging a "hard lockup" like this? What does the message output mean? How do I act on the information it contains to start fixing the problem?

Related information