Random kernel panics, no obvious culprit

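A while ago I converted an old desktop into a Debian server, and it ran flawlessly for six months.

I then decided to move the machine to a place with a better internet connection and add hard drives to turn it into a proper storage server (a homebrew NAS, so to speak).

Since then, the server has been crashing randomly. Sometimes it takes more than a month to crash, sometimes only a day. Lately the crashes come every 2 to 3 days.

Judging by dmesg, each crash seems to have a different cause. I have no idea what is triggering them.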

Setup

  • CPU: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
  • Motherboard: MSI MS-7821/Z87-G45 GAMING
  • The machine runs Debian Stretch on Linux 4.9.0-8-amd64
  • Kdump is installed
  • The system is installed on a Samsung SSD 840 PRO (128GB)
  • 5 × 8TB Western Digital Red HDDs for storage
  • The HDDs were originally configured as software RAID5 with mdadm, but are now managed by ZFS as raidz2
  • Running Apache2 (with Nextcloud) and transmission-daemon

Information

dmesg.201904140557
[230866.137537] PANIC: double fault, error_code: 0x0
[230866.137548] PANIC: double fault, error_code: 0x0
[230866.137550] CPU: 2 PID: 25608 Comm: apache2 Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[230866.137551] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[230866.137551] task: ffff8d7d1eabe0c0 task.stack: ffffa02483d5c000
[230866.137555] RIP: 0010:[<ffffffffad8192fa>]  [<ffffffffad8192fa>] syscall_return_via_sysret+0x3e/0x4d
[230866.137556] RSP: 0018:ffffa02483d5ff50  EFLAGS: 00010002
[230866.137556] RAX: 0000000510035080 RBX: 0000000000000000 RCX: 00007fec9d79eacf
[230866.137557] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[230866.137557] RBP: 0000000000000000 R08: 00007fec6461ee20 R09: 0000000000000000
[230866.137558] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[230866.137558] R13: 0000000000000000 R14: 00007fec6461ee20 R15: 0000000000000000
[230866.137559] FS:  00007fec6461f700(0000) GS:ffff8d7e9fb00000(0000) knlGS:0000000000000000
[230866.137560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[230866.137560] CR2: ffffa02483d5ff48 CR3: 0000000510034000 CR4: 0000000000160670
[230866.137561] Stack:
[230866.137563]  0000000000000000 0000000000000000 00007fec6461ee20 0000000000000000
[230866.137564]  0000000000000000 0000000000000000 0000000000000000 0000000000000293
[230866.137565]  0000000000000000 0000000000000000 00007fec6461ee20 0000000000000000
[230866.137565] Call Trace:
[230866.137580] Code: 50 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70 50 90 0f 20 d8 65 48 0b 04 25 e0 02 01 00 78 08 65 88 04 25 e7 02 01 00 0f 22 d8 <58> 48 8b a4 24 98 00 00 00 0f 01 f8 48 0f 07 50 90 0f 20 d8 65 
[230866.137580] Kernel panic - not syncing: Machine halted.
[230866.137581] CPU: 2 PID: 25608 Comm: apache2 Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[230866.137582] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[230866.137583]  0000000000000000 ffffffffad534524 ffff8d7e9fb07f00 ffff8d7e9fb07f18
[230866.137584]  ffffffffad380ecd ffffffff00000008 ffff8d7e9fb07f28 ffff8d7e9fb07ec0
[230866.137585]  88dd6d6a799c212f 00000000000000c8 0000000000000092 0000000000000000
[230866.137585] Call Trace:
[230866.137589]  <#DF> 
[230866.137589]  [<ffffffffad534524>] ? dump_stack+0x5c/0x78
[230866.137591]  [<ffffffffad380ecd>] ? panic+0xe4/0x23f
[230866.137592]  [<ffffffffad258ac9>] ? df_debug+0x29/0x30
[230866.137594]  [<ffffffffad227b0f>] ? do_double_fault+0x9f/0x130
[230866.137595]  [<ffffffffad81a038>] ? double_fault+0x28/0x30
[230866.137596]  [<ffffffffad8192fa>] ? syscall_return_via_sysret+0x3e/0x4d

dmesg.201904172335
[322137.449206] general protection fault: 0000 [#1] SMP
[322137.464088] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc xt_multiport iptable_filter wireguard(O) ip6_udp_tunnel udp_tunnel overlay nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic zfs(PO) intel_rapl zunicode(PO) x86_pkg_temp_thermal zavl(PO) intel_powerclamp zcommon(PO) znvpair(PO) snd_hda_intel kvm_intel spl(O) kvm i915 snd_hda_codec irqbypass snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul iTCO_wdt ghash_clmulni_intel drm_kms_helper intel_cstate mei_me iTCO_vendor_support snd_timer drm intel_uncore snd
[322137.678356]  soundcore evdev i2c_algo_bit mxm_wmi mei efi_pstore intel_rapl_perf lpc_ich sg shpchp serio_raw mfd_core pcspkr efivars wmi intel_smartconnect video button nfsd auth_rpcgss oid_registry nfs_acl lockd grace nct6775 hwmon_vid coretemp sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic usbhid hid dm_mod sd_mod xhci_pci ahci ehci_pci xhci_hcd ehci_hcd crc32c_intel libahci libata aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper psmouse cryptd scsi_mod i2c_i801 i2c_smbus alx usbcore mdio thermal usb_common fan
[322137.867812] CPU: 2 PID: 2034 Comm: transmission-da Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[322137.898560] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[322137.922267] task: ffff9d0366de8040 task.stack: ffffb6ca48838000
[322137.940254] RIP: 0010:[<ffffffffc0dc49e2>]  [<ffffffffc0dc49e2>] zio_create+0x52/0x470 [zfs]
[322137.965860] RSP: 0018:ffffb6ca4883b970  EFLAGS: 00010282
[322137.982034] RAX: fbff9cff4e756040 RBX: fbff9cff4e756040 RCX: fbff9cff4e756040
[322138.003667] RDX: 0000000000000000 RSI: 0000000002404200 RDI: fbff9cff4e756048
[322138.025297] RBP: ffff9d03710ec680 R08: 000039c6a0245fd0 R09: 0000000000000002
[322138.046929] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb6ca4883bb30
[322138.068560] R13: 0000000000000001 R14: 00000000000f99d1 R15: ffff9cff040b1a10
[322138.090191] FS:  00007fee5e413700(0000) GS:ffff9d039fb00000(0000) knlGS:0000000000000000
[322138.114681] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[322138.132151] CR2: 000056466d3a1060 CR3: 00000005e6e22000 CR4: 0000000000160670
[322138.153783] Stack:
[322138.160066]  0000000000004000 ffff9cfebc544000 ffff9d0373c44000 ffff9d03710ec680
[322138.182681]  ffffffffc0d1eae0 ffff9cff040b1a10 ffff9cfebc544000 0000000000004000
[322138.205299]  ffff9d0373c44000 ffffffffc0dc551c ffffffffc0d1eae0 ffff9d027d98eaa8
[322138.227918] Call Trace:
[322138.235528]  [<ffffffffc0d1eae0>] ? arc_hdr_destroy+0x1e0/0x1e0 [zfs]
[322138.255086]  [<ffffffffc0dc551c>] ? zio_read+0xcc/0xe0 [zfs]
[322138.272293]  [<ffffffffc0d1eae0>] ? arc_hdr_destroy+0x1e0/0x1e0 [zfs]
[322138.291847]  [<ffffffffc0d21eb0>] ? arc_read+0x520/0xa30 [zfs]
[322138.309576]  [<ffffffffc0d28b8e>] ? dbuf_read+0x29e/0x7d0 [zfs]
[322138.327569]  [<ffffffffc0d294f8>] ? __dbuf_hold_impl+0x438/0x4d0 [zfs]
[322138.347379]  [<ffffffffc0d295fb>] ? dbuf_hold_impl+0x6b/0x90 [zfs]
[322138.366147]  [<ffffffffc0d298fb>] ? dbuf_hold+0x2b/0x60 [zfs]
[322138.383622]  [<ffffffffc0d30799>] ? dmu_buf_hold_array_by_dnode+0xf9/0x460 [zfs]
[322138.406034]  [<ffffffffc0d313d0>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
[322138.426487]  [<ffffffffc0d323cd>] ? dmu_read_uio_dbuf+0x3d/0x60 [zfs]
[322138.446691]  [<ffffffffc0db0b97>] ? zfs_read+0x127/0x3b0 [zfs]
[322138.465045]  [<ffffffffc0dcae24>] ? zpl_read_common_iovec+0x84/0xd0 [zfs]
[322138.486274]  [<ffffffffc0dcb8e1>] ? zpl_iter_read+0xa1/0xe0 [zfs]
[322138.505406]  [<ffffffff8ae0aacd>] ? new_sync_read+0xdd/0x130
[322138.523175]  [<ffffffff8ae0b261>] ? vfs_read+0x91/0x130
[322138.539686]  [<ffffffff8ae0c8f0>] ? SyS_pread64+0x90/0xb0
[322138.556649]  [<ffffffff8ac03b7d>] ? do_syscall_64+0x8d/0xf0
[322138.574196]  [<ffffffff8b21924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[322138.595828] Code: 10 31 f6 4c 89 44 24 08 4c 89 0c 24 4c 8b a4 24 88 00 00 00 44 8b ac 24 90 00 00 00 e8 68 02 f4 ff 48 8d 78 08 48 89 c1 48 89 c3 <48> c7 00 00 00 00 00 48 c7 80 30 04 00 00 00 00 00 00 31 c0 48 
[322138.656162] RIP  [<ffffffffc0dc49e2>] zio_create+0x52/0x470 [zfs]
[322138.675286]  RSP <ffffb6ca4883b970>

dmesg.201904260559
[72133.666580] general protection fault: 0000 [#1] SMP
[72133.681200] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter overlay wireguard(O) ip6_udp_tunnel udp_tunnel nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp zfs(PO) zunicode(PO) kvm_intel snd_hda_codec_realtek kvm zavl(PO) snd_hda_codec_generic irqbypass crct10dif_pclmul zcommon(PO) crc32_pclmul snd_hda_intel znvpair(PO) i915 snd_hda_codec spl(O) ghash_clmulni_intel intel_cstate snd_hda_core snd_hwdep snd_pcm intel_uncore iTCO_wdt efi_pstore iTCO_vendor_support drm_kms_helper snd_timer drm
[72133.895207]  mxm_wmi intel_rapl_perf mei_me sg snd serio_raw mei i2c_algo_bit lpc_ich pcspkr soundcore mfd_core evdev efivars shpchp wmi video intel_smartconnect button nct6775 hwmon_vid coretemp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic dm_mod usbhid hid sd_mod ahci libahci ehci_pci xhci_pci xhci_hcd ehci_hcd crc32c_intel libata aesni_intel psmouse aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd i2c_i801 scsi_mod i2c_smbus alx mdio usbcore usb_common fan thermal
[72134.084709] CPU: 3 PID: 4246 Comm: java Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[72134.112335] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[72134.135784] task: ffff8dbb009d7100 task.stack: ffffb42103b38000
[72134.153510] RIP: 0010:[<ffffffffa9eea7a8>]  [<ffffffffa9eea7a8>] hrtimer_active+0x28/0x50
[72134.178049] RSP: 0018:ffffb42103b3be28  EFLAGS: 00010046
[72134.193962] RAX: 0000000000000000 RBX: ffff8dbb00c3c600 RCX: 0000000000000023
[72134.215337] RDX: fffd8dbb1fb94c00 RSI: 0000000000000008 RDI: ffff8dbb00c3c600
[72134.236710] RBP: 0000000000000000 R08: ffffffffaaa3eee0 R09: ffff8dbac7341380
[72134.258082] R10: 0000000000000013 R11: ffff8dbb01041b38 R12: ffff8dbb00c3c600
[72134.279452] R13: ffffb42103b3bec0 R14: 0000000000000000 R15: 0000000000000000
[72134.300824] FS:  00007fd2336ce700(0000) GS:ffff8dbb1fb80000(0000) knlGS:0000000000000000
[72134.325054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[72134.342261] CR2: 00007f36d94688a0 CR3: 00000005f211e000 CR4: 0000000000160670
[72134.363633] Stack:
[72134.369656]  ffffffffa9eeac77 0000000000000000 8a7c0674a85ffec5 ffff8dbb00c3c688
[72134.392008]  ffffb42103b3beb0 ffff8dbb00c3c600 ffffffffaa057b59 00007fd24811c410
[72134.414343]  ffffb42103b3bee0 ffff8dbb01041b00 0000000000000001 8a7c0674a85ffec5
[72134.436702] Call Trace:
[72134.444039]  [<ffffffffa9eeac77>] ? hrtimer_try_to_cancel+0x27/0x110
[72134.463080]  [<ffffffffaa057b59>] ? do_timerfd_settime+0x119/0x430
[72134.481590]  [<ffffffffaa058127>] ? SyS_timerfd_settime+0x57/0xb0
[72134.499837]  [<ffffffffa9e03b7d>] ? do_syscall_64+0x8d/0xf0
[72134.516529]  [<ffffffffaa41924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[72134.537380] Code: 00 00 00 0f 1f 44 00 00 48 8b 57 30 eb 1d 80 7f 38 00 75 32 48 3b 78 08 74 2c 39 50 04 75 e9 48 8b 57 30 48 8b 0a 48 39 c8 74 21 <48> 8b 02 8b 50 04 f6 c2 01 74 d8 f3 90 8b 50 04 f6 c2 01 75 f6 
[72134.596590] RIP  [<ffffffffa9eea7a8>] hrtimer_active+0x28/0x50
[72134.614098]  RSP <ffffb42103b3be28>

dmesg.201904270957
[100366.341655] general protection fault: 0000 [#1] SMP
[100366.356517] Modules linked in: veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter overlay wireguard(O) ip6_udp_tunnel udp_tunnel nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel zfs(PO) zunicode(PO) kvm zavl(PO) irqbypass zcommon(PO) crct10dif_pclmul znvpair(PO) crc32_pclmul spl(O) ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic i915 intel_cstate iTCO_wdt iTCO_vendor_support snd_hda_intel intel_uncore mxm_wmi evdev serio_raw efi_pstore intel_rapl_perf snd_hda_codec pcspkr snd_hda_core
[100366.570669]  snd_hwdep drm_kms_helper mei_me sg snd_pcm lpc_ich snd_timer drm snd mfd_core mei i2c_algo_bit soundcore shpchp intel_smartconnect wmi efivars video button nct6775 hwmon_vid coretemp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic dm_mod usbhid hid sd_mod ahci libahci libata xhci_pci crc32c_intel aesni_intel ehci_pci psmouse aes_x86_64 glue_helper i2c_i801 lrw xhci_hcd ehci_hcd gf128mul i2c_smbus ablk_helper cryptd usbcore alx scsi_mod mdio usb_common fan thermal
[100366.760030] CPU: 3 PID: 28567 Comm: apache2 Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[100366.788960] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[100366.812667] task: ffff8c41b1eb4100 task.stack: ffffac678f30c000
[100366.830659] RIP: 0010:[<ffffffff8549800a>]  [<ffffffff8549800a>] __task_pid_nr_ns+0x3a/0x90
[100366.855979] RSP: 0018:ffffac678f30fcc8  EFLAGS: 00010282
[100366.872152] RAX: 0000000000000508 RBX: ffff8c4292b7ba40 RCX: 0000000000000001
[100366.893787] RDX: ffffffff86045d20 RSI: 0000000000000004 RDI: f7ff8c428aaa95c8
[100366.915418] RBP: ffffac678f30ff30 R08: 0000000000000000 R09: 0000000000000000
[100366.937052] R10: 0000000000000000 R11: 0000000000000000 R12: ffffac678f30fd78
[100366.958683] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
[100366.980317] FS:  00007f29e0c20700(0000) GS:ffff8c445fb80000(0000) knlGS:0000000000000000
[100367.004809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100367.022279] CR2: 00007f773f92a1f8 CR3: 00000002575ee000 CR4: 0000000000160670
[100367.043913] Stack:
[100367.050195]  ffffffff8569cb93 00007f29e0c1fe20 0000000000000000 0000000000000000
[100367.072811]  0000000000000000 ffffffff8608b548 ffff8c400bc4ef80 ffff8c4292b7bb08
[100367.095407]  ffffac678f30fd20 00000000000b0008 0000000000000000 ffffac678f30fd20
[100367.118027] Call Trace:
[100367.125627]  [<ffffffff8569cb93>] ? SYSC_semtimedop+0x3b3/0xc50
[100367.143623]  [<ffffffff8552bd04>] ? __seccomp_filter+0x74/0x270
[100367.161615]  [<ffffffff8542f1f0>] ? recalibrate_cpu_khz+0x10/0x10
[100367.180130]  [<ffffffff854f01dc>] ? ktime_get_ts64+0x4c/0xf0
[100367.197342]  [<ffffffff85620bbf>] ? poll_select_copy_remaining+0xdf/0x150
[100367.217934]  [<ffffffff85403337>] ? syscall_trace_enter+0x117/0x2c0
[100367.236964]  [<ffffffff85403b7d>] ? do_syscall_64+0x8d/0xf0
[100367.253918]  [<ffffffff85a1924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[100367.275029] Code: 00 00 00 74 4e 85 f6 b8 08 05 00 00 74 1a 83 fe 04 74 0e 89 f6 48 8d 04 76 48 8d 04 c5 08 05 00 00 48 8b bf d0 04 00 00 48 01 c7 <48> 8b 0f 48 85 c9 74 20 8b b2 30 08 00 00 31 c0 3b 71 04 77 0d 
[100367.334428] RIP  [<ffffffff8549800a>] __task_pid_nr_ns+0x3a/0x90
[100367.352738]  RSP <ffffac678f30fcc8>
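
To compare the dumps above at a glance, the fields that differ between them can be pulled out with a few regular expressions. The following is a minimal Python sketch, assuming the 4.9-era oops layout shown in these logs; the function name `summarize` and the patterns are illustrative, not part of any kernel tooling.

```python
import re

# Fault type as it appears in the first line of each dump.
FAULT_RE = re.compile(r"(PANIC: double fault|general protection fault)")
# Name of the process that was running when the fault hit.
COMM_RE = re.compile(r"Comm: (\S+)")
# Symbol+offset after the bracketed addresses on the "RIP:" line,
# optionally followed by the owning module in square brackets.
RIP_RE = re.compile(r"RIP: .*\] +(\S+\+0x[0-9a-f]+/0x[0-9a-f]+)(?: \[(\w+)\])?")


def summarize(dmesg_text):
    """Return (fault, process, symbol, module) using the first match of each field."""
    fault = FAULT_RE.search(dmesg_text)
    comm = COMM_RE.search(dmesg_text)
    rip = RIP_RE.search(dmesg_text)
    return (
        fault.group(1) if fault else None,
        comm.group(1) if comm else None,
        rip.group(1) if rip else None,
        rip.group(2) if rip and rip.group(2) else "core kernel",
    )


# Three lines copied from dmesg.201904172335 above.
example = (
    "[322137.449206] general protection fault: 0000 [#1] SMP\n"
    "[322137.867812] CPU: 2 PID: 2034 Comm: transmission-da Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1\n"
    "[322137.940254] RIP: 0010:[<ffffffffc0dc49e2>]  [<ffffffffc0dc49e2>] zio_create+0x52/0x470 [zfs]\n"
)
print(summarize(example))
```

Run over all four files, this kind of summary makes the spread visible: different processes (apache2, transmission-da, java) and different faulting functions, only one of which lives in the zfs module.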

Command output

# uname -a
Linux example.com 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
# lsmod
Module                  Size  Used by
ipt_REJECT             16384  6
nf_reject_ipv4         16384  1 ipt_REJECT
veth                   16384  0
xt_nat                 16384  1
xt_tcpudp              16384  3
ipt_MASQUERADE         16384  2
nf_nat_masquerade_ipv4    16384  1 ipt_MASQUERADE
nf_conntrack_netlink    36864  0
nfnetlink              16384  2 nf_conntrack_netlink
xfrm_user              36864  1
xfrm_algo              16384  1 xfrm_user
iptable_nat            16384  1
nf_conntrack_ipv4      16384  2
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_nat_ipv4            16384  1 iptable_nat
xt_addrtype            16384  2
xt_conntrack           16384  1
nf_nat                 24576  3 xt_nat,nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack          114688  6 nf_conntrack_ipv4,nf_conntrack_netlink,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_nat
br_netfilter           24576  0
bridge                135168  1 br_netfilter
stp                    16384  1 bridge
llc                    16384  2 bridge,stp
xt_multiport           16384  1
iptable_filter         16384  1
wireguard             217088  0
ip6_udp_tunnel         16384  1 wireguard
udp_tunnel             16384  1 wireguard
overlay                49152  1
nls_ascii              16384  1
nls_cp437              20480  1
vfat                   20480  1
fat                    69632  1 vfat
snd_hda_codec_hdmi     49152  1
intel_rapl             20480  0
x86_pkg_temp_thermal    16384  0
intel_powerclamp       16384  0
kvm_intel             200704  0
kvm                   598016  1 kvm_intel
zfs                  2707456  8
irqbypass              16384  1 kvm
crct10dif_pclmul       16384  0
zunicode              331776  1 zfs
crc32_pclmul           16384  0
zavl                   16384  1 zfs
ghash_clmulni_intel    16384  0
zcommon                53248  1 zfs
intel_cstate           16384  0
znvpair                90112  2 zcommon,zfs
snd_hda_codec_realtek    90112  1
snd_hda_codec_generic    69632  1 snd_hda_codec_realtek
snd_hda_intel          36864  0
i915                 1257472  2
snd_hda_codec         135168  4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
drm_kms_helper        155648  1 i915
intel_uncore          118784  0
spl                    98304  3 znvpair,zcommon,zfs
snd_hda_core           90112  5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
iTCO_wdt               16384  0
mei_me                 36864  0
efi_pstore             16384  0
snd_hwdep              16384  1 snd_hda_codec
mxm_wmi                16384  0
iTCO_vendor_support    16384  1 iTCO_wdt
evdev                  24576  2
drm                   360448  3 i915,drm_kms_helper
snd_pcm               110592  4 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
snd_timer              32768  1 snd_pcm
intel_rapl_perf        16384  0
efivars                20480  1 efi_pstore
serio_raw              16384  0
lpc_ich                24576  0
sg                     32768  0
snd                    86016  8 snd_hda_intel,snd_hwdep,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek,snd_pcm
pcspkr                 16384  0
mei                   102400  1 mei_me
i2c_algo_bit           16384  1 i915
soundcore              16384  1 snd
mfd_core               16384  1 lpc_ich
shpchp                 36864  0
wmi                    16384  1 mxm_wmi
intel_smartconnect     16384  0
video                  40960  1 i915
button                 16384  1 i915
nfsd                  331776  13
auth_rpcgss            61440  1 nfsd
oid_registry           16384  1 auth_rpcgss
nfs_acl                16384  1 nfsd
lockd                  90112  1 nfsd
grace                  16384  2 nfsd,lockd
sunrpc                344064  18 auth_rpcgss,nfsd,nfs_acl,lockd
nct6775                57344  0
hwmon_vid              16384  1 nct6775
coretemp               16384  0
efivarfs               16384  1
ip_tables              24576  2 iptable_filter,iptable_nat
x_tables               36864  9 xt_multiport,ipt_REJECT,xt_nat,ip_tables,iptable_filter,xt_tcpudp,ipt_MASQUERADE,xt_addrtype,xt_conntrack
autofs4                40960  3
ext4                  585728  2
crc16                  16384  1 ext4
jbd2                  106496  1 ext4
fscrypto               28672  1 ext4
ecb                    16384  0
mbcache                16384  3 ext4
raid10                 49152  0
raid456               106496  0
async_raid6_recov      20480  1 raid456
async_memcpy           16384  2 raid456,async_raid6_recov
async_pq               16384  2 raid456,async_raid6_recov
async_xor              16384  3 async_pq,raid456,async_raid6_recov
async_tx               16384  5 async_xor,async_pq,raid456,async_memcpy,async_raid6_recov
xor                    24576  1 async_xor
raid6_pq              110592  3 async_pq,raid456,async_raid6_recov
libcrc32c              16384  1 raid456
crc32c_generic         16384  0
raid1                  36864  0
raid0                  20480  0
multipath              16384  0
linear                 16384  0
md_mod                135168  6 raid1,raid10,multipath,linear,raid0,raid456
hid_generic            16384  0
usbhid                 53248  0
hid                   122880  2 hid_generic,usbhid
dm_mod                118784  6
sd_mod                 49152  14
ehci_pci               16384  0
xhci_pci               16384  0
xhci_hcd              188416  1 xhci_pci
ahci                   40960  8
ehci_hcd               81920  1 ehci_pci
crc32c_intel           24576  5
libahci                32768  1 ahci
aesni_intel           167936  1
aes_x86_64             20480  1 aesni_intel
libata                249856  2 ahci,libahci
glue_helper            16384  1 aesni_intel
lrw                    16384  1 aesni_intel
usbcore               253952  6 usbhid,ehci_hcd,xhci_pci,xhci_hcd,ehci_pci
gf128mul               16384  1 lrw
ablk_helper            16384  1 aesni_intel
i2c_i801               24576  0
cryptd                 24576  3 ablk_helper,ghash_clmulni_intel,aesni_intel
psmouse               135168  0
i2c_smbus              16384  1 i2c_i801
alx                    45056  0
scsi_mod              225280  3 sd_mod,libata,sg
mdio                   16384  1 alx
usb_common             16384  1 usbcore
fan                    16384  0
thermal                20480  0

Edit

I ran memtest86 (the original version from memtest86.com) both before and after reseating the RAM modules (memtest log).

No errors were found.

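Edit

Reseating the RAM modules had no effect, so I explored new hypotheses.

I checked for electrical interference, but found no correlation between crash times and motor usage.

I also checked for a correlation between disk access and crashes. Crashes can occur even with little disk activity, but certain disk activity makes them happen much sooner. For example, if I read all the disks in parallel (cat /dev/sdX > /dev/null), the machine can crash within an hour. The SMART data, however, shows nothing wrong. Here is the output of smartctl -a /dev/sdb (the other disks look the same):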

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       112
  3 Spin_Up_Time            0x0007   160   160   024    Pre-fail  Always       -       401 (Average 420)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   140   140   020    Pre-fail  Offline      -       15
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       7274
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       260
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       260
194 Temperature_Celsius     0x0002   224   224   000    Old_age   Always       -       29 (Min/Max 10/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged
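
Since the same check has to be repeated for five disks, the attribute table can be screened mechanically. The following is a minimal Python sketch, assuming the ten-column layout printed above; the function names and the choice of "watched" attribute IDs (5, 197, 198, 199) are assumptions made for illustration. It flags an attribute when its normalized VALUE is at or below a non-zero THRESH, or when a watched attribute has a non-zero raw value.

```python
# Attribute IDs whose raw value is expected to be 0 on a healthy disk.
# This selection is an assumption for the sketch, not a smartctl rule.
WATCHED = {5, 197, 198, 199}


def parse_smart_attributes(text):
    """Parse the attribute rows of `smartctl -a` output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 9)
        # Attribute rows start with a numeric ID and have ten columns.
        if len(parts) == 10 and parts[0].isdigit():
            rows.append({
                "id": int(parts[0]),
                "name": parts[1],
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": parts[9].strip(),
            })
    return rows


def suspicious(rows):
    """Return the names of attributes that look unhealthy."""
    flagged = []
    for r in rows:
        below = r["thresh"] > 0 and r["value"] <= r["thresh"]
        first = r["raw"].split()[0]
        nonzero = r["id"] in WATCHED and first.isdigit() and int(first) != 0
        if below or nonzero:
            flagged.append(r["name"])
    return flagged
```

Feeding it the table above yields an empty list, which matches the reading that SMART reports nothing wrong on this disk.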

So the crashes are somehow related to the disks, but I don't know how.
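
The parallel read test mentioned above (one `cat /dev/sdX > /dev/null` per disk) can also be scripted so that all readers start together. The following is a minimal Python sketch under stated assumptions: the device paths are hypothetical placeholders, reading block devices needs root, and the function names are made up for this example. It only reads and discards data; it never writes to the devices.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # read 1 MiB at a time


def drain(path):
    """Read `path` sequentially to the end, discard the data, return the byte count."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                return total
            total += len(block)


def read_all_parallel(paths):
    """Read every path at the same time, one thread per path."""
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        return dict(zip(paths, pool.map(drain, paths)))


# Hypothetical usage (device names are placeholders, run as root):
#   read_all_parallel(["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"])
```

Threads are sufficient here because the work is I/O-bound; the point is only to keep all disks busy at once, as the shell one-liner does.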

Answer 1

I looked at your logs, and the kernel is tainted, i.e. running in an unsupported state:

Tainted: P IO

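The list of taint flags can be found in the kernel documentation. The P and O flags indicate kernel modules that carry a non-GPL-compatible license and were built externally; here that specifically means ZFS and its related modules. One of the log fragments you provided shows a general protection fault in the ZFS module, but the others are elsewhere in the kernel. Also, GPFs and double faults are raised by the processor itself, so the module may not be at fault.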
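
One way to see that these faults come from the processor rather than from a check inside a module: on x86-64, dereferencing a non-canonical address raises a general protection fault. The following is a minimal Python sketch, assuming 48-bit virtual addresses (4-level paging, as this kernel generation uses); the register values are copied from the three GPF dumps in the question, and the register picked for each is the one the faulting instruction appears to dereference. This is an observation about the dumps, not a diagnosis of the root cause.

```python
def is_canonical(addr, va_bits=48):
    """True if bits (va_bits - 1)..63 of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Values taken from the register dumps in the question.
registers = {
    "zio_create RAX": 0xFBFF9CFF4E756040,
    "hrtimer_active RDX": 0xFFFD8DBB1FB94C00,
    "__task_pid_nr_ns RDI": 0xF7FF8C428AAA95C8,
    "zio_create RBP (for comparison)": 0xFFFF9D03710EC680,
}

for name, value in registers.items():
    state = "canonical" if is_canonical(value) else "non-canonical"
    print(f"{name}: {value:#018x} is {state}")
```

The three pointers involved in the faults are non-canonical, while a neighbouring kernel pointer from the same dump is canonical. That is consistent with the faults landing in unrelated code paths.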

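What concerns me more is the I taint flag. The I flag means "working around a bug in the platform firmware". It indicates a potentially serious problem in the system's UEFI/BIOS firmware that could be causing the faults. Did you perform a BIOS update before starting this work? Was this flag already set before you did the hardware upgrade?

Unfortunately, the link to the full log no longer works, so I can't offer more specific help. The full log could provide details about the firmware bug the system is working around, as well as other possible error indicators.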
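
To check the flags on any other oops line, the taint field can be decoded with a small lookup table. The following is a minimal Python sketch; the table is a paraphrased subset of the letters listed in the kernel's taint documentation, and the helper names are made up for this example. The extraction regex assumes the taint field is followed by a kernel version that starts with a digit, as in the logs above.

```python
import re

# Paraphrased subset of the taint letters from the kernel documentation.
TAINT_MEANINGS = {
    "G": "only GPL-compatible modules loaded (not a taint)",
    "P": "proprietary module was loaded",
    "F": "module was force-loaded",
    "M": "processor reported a machine check exception",
    "D": "kernel died recently (oops or BUG)",
    "W": "kernel issued a warning",
    "I": "working around a bug in the platform firmware",
    "O": "externally built (out-of-tree) module was loaded",
    "E": "unsigned module was loaded",
}


def extract_taint(line):
    """Pull the flag letters out of a 'Tainted:' oops line, e.g. 'PIO'."""
    m = re.search(r"Tainted: ([A-Z ]+?)\s+\d", line)
    return "".join(m.group(1).split()) if m else ""


def decode_taint(flags):
    """Map each flag letter to its meaning; unknown letters are passed through."""
    return [
        (c, TAINT_MEANINGS.get(c, "not in this table, see the kernel documentation"))
        for c in flags
        if not c.isspace()
    ]


line = "CPU: 2 PID: 25608 Comm: apache2 Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1"
for letter, meaning in decode_taint(extract_taint(line)):
    print(letter, "-", meaning)
```

Comparing the decoded flags from a boot before the hardware change with a current one would show whether the I flag is new.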
