openSUSE Tumbleweed system hard lockups caused by the NVIDIA driver

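After a recent update, my laptop started crashing at random within a few hours of booting. When it crashes, the last image stays on the monitor but the computer becomes completely unresponsive (the Num Lock light no longer toggles). I'm running openSUSE Tumbleweed, kernel 5.12.2-1, NVIDIA driver 460.73.01, on a ThinkPad P71 with a Quadro M620 mobile GPU and an i7-7700HQ.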

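On a possibly unrelated note, the network interface keeps going up and down every few seconds. In the boots where a crash occurs, there are no journalctl entries for several minutes before the crash apart from the ones about the network interface flapping. I use the internal Ethernet card through the official docking station, managed by NetworkManager. Here is a sample of journalctl output from before a crash; the same pattern repeats in the log for hours: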

May 12 07:10:31 thiccboii nscd[1153]: 1153 checking for monitored file `/etc/services': No such file or directory
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1084] device (enp0s31f6): carrier: link connected
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1086] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1094] policy: auto-activating connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1099] device (enp0s31f6): Activation: starting connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1101] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 12 07:10:32 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.2154] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.2221] device (enp0s31f6): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.2225] dhcp4 (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.2238] dhcp4 (enp0s31f6): dhclient started with pid 27289
May 12 07:10:38 thiccboii NetworkManager[1332]: <info>  [1620828638.2172] device (enp0s31f6): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:38 thiccboii NetworkManager[1332]: <info>  [1620828638.2497] dhcp4 (enp0s31f6): canceled DHCP transaction, DHCP client pid 27289
May 12 07:10:38 thiccboii NetworkManager[1332]: <info>  [1620828638.2497] dhcp4 (enp0s31f6): state changed unknown -> terminated
May 12 07:10:39 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 12 07:10:39 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 12 07:10:43 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8755] device (enp0s31f6): carrier: link connected
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8758] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8765] policy: auto-activating connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8770] device (enp0s31f6): Activation: starting connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8771] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.9856] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.9943] device (enp0s31f6): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.9946] dhcp4 (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.9959] dhcp4 (enp0s31f6): dhclient started with pid 27301
May 12 07:10:49 thiccboii NetworkManager[1332]: <info>  [1620828649.9870] device (enp0s31f6): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:50 thiccboii NetworkManager[1332]: <info>  [1620828650.0196] dhcp4 (enp0s31f6): canceled DHCP transaction, DHCP client pid 27301
May 12 07:10:50 thiccboii NetworkManager[1332]: <info>  [1620828650.0196] dhcp4 (enp0s31f6): state changed unknown -> terminated
May 12 07:10:50 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 12 07:10:50 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 12 07:10:52 thiccboii nscd[1153]: 1153 checking for monitored file `/etc/services': No such file or directory
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5960] device (enp0s31f6): carrier: link connected
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5964] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5971] policy: auto-activating connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5976] device (enp0s31f6): Activation: starting connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5978] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 12 07:10:55 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.7096] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.7164] device (enp0s31f6): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.7168] dhcp4 (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.7181] dhcp4 (enp0s31f6): dhclient started with pid 27309
May 12 07:11:01 thiccboii NetworkManager[1332]: <info>  [1620828661.7110] device (enp0s31f6): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:11:01 thiccboii NetworkManager[1332]: <info>  [1620828661.7434] dhcp4 (enp0s31f6): canceled DHCP transaction, DHCP client pid 27309
May 12 07:11:01 thiccboii NetworkManager[1332]: <info>  [1620828661.7435] dhcp4 (enp0s31f6): state changed unknown -> terminated
May 12 07:11:03 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 12 07:11:03 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
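
To check how regular the flapping is, and whether anything other than network messages was logged before a crash, a journal dump can be summarised with a short script. The Python sketch below is illustrative only: it assumes the default short journal format in an English (C) locale, and the NOISE_TAGS list is a hypothetical filter tuned to this excerpt. In the excerpt above, the link drops at 07:10:39, 07:10:50 and 07:11:03, i.e. roughly every 11 to 13 seconds.

```python
import re
import sys
from datetime import datetime

# Matches e1000e kernel messages as they appear in the excerpt, e.g.
# "May 12 07:10:39 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down"
LINK_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: e1000e \S+ "
    r"(?P<iface>\S+): NIC Link is (?P<state>Up|Down)"
)

# Substrings treated as "network noise" (hypothetical; tuned to this particular log).
NOISE_TAGS = ("NetworkManager[", "nscd[", "kernel: e1000e")


def link_events(lines):
    """Return (timestamp, interface, 'Up'|'Down') tuples for e1000e link messages."""
    events = []
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            # Short journal timestamps carry no year; a fixed leap year is assumed,
            # so only differences between timestamps are meaningful.
            ts = datetime.strptime("2000 " + m["ts"], "%Y %b %d %H:%M:%S")
            events.append((ts, m["iface"], m["state"]))
    return events


def down_intervals(events):
    """Seconds between consecutive 'Link is Down' events."""
    downs = [ts for ts, _, state in events if state == "Down"]
    return [(later - earlier).total_seconds() for earlier, later in zip(downs, downs[1:])]


def without_network_noise(lines):
    """Drop the flapping-related lines so any remaining messages stand out."""
    return [line for line in lines if not any(tag in line for tag in NOISE_TAGS)]


if __name__ == "__main__":
    # Usage sketch (flaps.py is a hypothetical name): journalctl -b -1 | python3 flaps.py
    lines = sys.stdin.read().splitlines()
    events = link_events(lines)
    gaps = down_intervals(events)
    print("link-down events:", sum(1 for _, _, state in events if state == "Down"))
    if gaps:
        print("mean seconds between drops:", round(sum(gaps) / len(gaps), 1))
    # Last 20 lines that are not part of the flapping pattern.
    print("\n".join(without_network_noise(lines)[-20:]))
```

Reading the crashed boot with `journalctl -b -1` only works if the journal is stored persistently; otherwise the previous boot's messages are gone after the hard reset.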

Also, during a normal shutdown, journalctl -k shows a potentially interesting warning:

May 13 18:07:45 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 13 18:07:49 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 13 18:07:57 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 13 18:07:57 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 13 18:08:01 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 13 18:08:09 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 13 18:08:09 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 13 18:08:13 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 13 18:08:19 thiccboii kernel: ------------[ cut here ]------------
May 13 18:08:19 thiccboii kernel: WARNING: CPU: 6 PID: 16754 at /usr/src/kernel-modules/nvidia-460.73.01-default/nvidia-drm/nvidia-drm-drv.c:531 nv_drm_master_set+0x22/0x30 [nvidia_drm]
May 13 18:08:19 thiccboii kernel: Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast ccm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct af_packet nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nvidia_drm(POE) nvidia_modeset(POE) ip_set nfnetlink ebtable_filter ebtables nvidia_uvm(POE) ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter nvidia(POE) cmac algif_hash algif_skcipher af_alg bnep dmi_sysfs uas snd_usb_audio snd_usbmidi_lib usb_storage snd_rawmidi snd_seq_device btusb btrtl btbcm btintel bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev ecdh_generic mc ecc
May 13 18:08:19 thiccboii kernel:  snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common iwlmvm snd_hda_codec_realtek mac80211 snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi libarc4 snd_hda_codec ee1004 iTCO_wdt intel_pmc_bxt iTCO_vendor_support mei_hdcp snd_hda_core iwlwifi x86_pkg_temp_thermal snd_hwdep intel_powerclamp coretemp thinkpad_acpi pcspkr cfg80211 joydev platform_profile efi_pstore snd_pcm wmi_bmof intel_wmi_thunderbolt i2c_i801 mei_me intel_lpss_pci ledtrig_audio intel_lpss rfkill snd_timer i2c_smbus mei idma64 intel_pch_thermal thermal snd soundcore ac tiny_power_button acpi_pad nls_iso8859_1 nls_cp437 vfat fat fuse binfmt_misc configfs hid_generic usbhid i915 kvm_intel kvm rtsx_pci_sdmmc crct10dif_pclmul crc32_pclmul mmc_core ghash_clmulni_intel aesni_intel i2c_algo_bit e1000e(OE) drm_kms_helper crypto_simd cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci cec xhci_pci_renesas xhci_hcd rc_core rtsx_pci drm nvme serio_raw usbcore nvme_core wmi battery
May 13 18:08:19 thiccboii kernel:  i2c_hid_acpi i2c_hid video pinctrl_sunrisepoint button vfio_mdev mdev vhost_net tun tap vhost vhost_iotlb vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr bbswitch(O) efivarfs
May 13 18:08:19 thiccboii kernel: CPU: 6 PID: 16754 Comm: plymouthd Tainted: P     U     OE     5.12.0-2-default #1 openSUSE Tumbleweed
May 13 18:08:19 thiccboii kernel: Hardware name: LENOVO 20HK0013US/20HK0013US, BIOS N1TET56W (1.30 ) 02/10/2020
May 13 18:08:19 thiccboii kernel: RIP: 0010:nv_drm_master_set+0x22/0x30 [nvidia_drm]
May 13 18:08:19 thiccboii kernel: Code: f4 2c 44 d7 0f 1f 40 00 0f 1f 44 00 00 48 8b 47 38 48 8b 78 20 48 8b 05 9c 5c 00 00 48 8b 40 28 e8 d3 9f 7e d7 84 c0 74 01 c3 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 7c
May 13 18:08:19 thiccboii kernel: RSP: 0018:ffffb41680933bd0 EFLAGS: 00010246
May 13 18:08:19 thiccboii kernel: RAX: 0000000000000000 RBX: ffff999cd278d000 RCX: 0000000000000008
May 13 18:08:19 thiccboii kernel: RDX: ffffffffc37a7e58 RSI: 0000000000000292 RDI: ffffffffc37a7e20
May 13 18:08:19 thiccboii kernel: RBP: ffff999f567c19c0 R08: 0000000000000008 R09: ffffb41680933bb8
May 13 18:08:19 thiccboii kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff999c0cb25800
May 13 18:08:19 thiccboii kernel: R13: 0000000000000000 R14: ffff999c0cb25800 R15: 000000001370a9a8
May 13 18:08:19 thiccboii kernel: FS:  00007fd15d540740(0000) GS:ffff99a577580000(0000) knlGS:0000000000000000
May 13 18:08:19 thiccboii kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 13 18:08:19 thiccboii kernel: CR2: 00007fd15d8fa000 CR3: 000000017f3d2001 CR4: 00000000003706e0
May 13 18:08:19 thiccboii kernel: Call Trace:
May 13 18:08:19 thiccboii kernel:  drm_new_set_master+0x7a/0x100 [drm]
May 13 18:08:19 thiccboii kernel:  drm_master_open+0x68/0x90 [drm]
May 13 18:08:19 thiccboii kernel:  drm_open+0xf5/0x240 [drm]
May 13 18:08:19 thiccboii kernel:  drm_stub_open+0xab/0x130 [drm]
May 13 18:08:19 thiccboii kernel:  chrdev_open+0xed/0x210
May 13 18:08:19 thiccboii kernel:  ? cdev_device_add+0x90/0x90
May 13 18:08:19 thiccboii kernel:  do_dentry_open+0x14e/0x380
May 13 18:08:19 thiccboii kernel:  path_openat+0xaf6/0x10a0
May 13 18:08:19 thiccboii kernel:  ? release_pages+0x153/0x4a0
May 13 18:08:19 thiccboii kernel:  ? flush_tlb_func_common.constprop.0+0x93/0x1e0
May 13 18:08:19 thiccboii kernel:  ? free_unref_page+0x99/0xb0
May 13 18:08:19 thiccboii kernel:  do_filp_open+0x99/0x140
May 13 18:08:19 thiccboii kernel:  ? __check_object_size+0x136/0x150
May 13 18:08:19 thiccboii kernel:  do_sys_openat2+0x97/0x150
May 13 18:08:19 thiccboii kernel:  __x64_sys_openat+0x54/0x90
May 13 18:08:19 thiccboii kernel:  do_syscall_64+0x33/0x80
May 13 18:08:19 thiccboii kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
May 13 18:08:19 thiccboii kernel: RIP: 0033:0x7fd15d7cbffb
May 13 18:08:19 thiccboii kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
May 13 18:08:19 thiccboii kernel: RSP: 002b:00007ffd4e3fa1b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
May 13 18:08:19 thiccboii kernel: RAX: ffffffffffffffda RBX: 00007fd15d5406c8 RCX: 00007fd15d7cbffb
May 13 18:08:19 thiccboii kernel: RDX: 0000000000000002 RSI: 000056549ac3d730 RDI: 00000000ffffff9c
May 13 18:08:19 thiccboii kernel: RBP: 000056549ac3d730 R08: 000056549ac3c930 R09: 00007fd15d89ea60
May 13 18:08:19 thiccboii kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
May 13 18:08:19 thiccboii kernel: R13: 00007fd15d8c5da8 R14: 0000000000000000 R15: 000056549ac3d080
May 13 18:08:19 thiccboii kernel: ---[ end trace 24fb17530164c622 ]---
May 13 18:08:19 thiccboii kernel: usb 1-4.3.1: reset high-speed USB device number 13 using xhci_hcd
May 13 18:08:21 thiccboii kernel: wlp4s0: deauthenticating from 3c:37:86:14:73:fa by local choice (Reason: 3=DEAUTH_LEAVING)
May 13 18:08:21 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 13 18:08:21 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 13 18:08:23 thiccboii kernel: kauditd_printk_skb: 44 callbacks suppressed
May 13 18:08:23 thiccboii kernel: audit: type=1305 audit(1620954503.216:16943): op=set audit_pid=0 old=1577 auid=4294967295 ses=4294967295 subj==unconfined res=1
May 13 18:08:23 thiccboii kernel: audit: type=1131 audit(1620954503.216:16944): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 13 18:08:23 thiccboii kernel: audit: type=1131 audit(1620954503.216:16945): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
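
The warning can be read off mechanically. According to the call trace, plymouthd (PID 16754) opened a DRM device node, and the openat path went through drm_stub_open, drm_open, drm_master_open and drm_new_set_master before hitting the WARN in nv_drm_master_set (nvidia-drm-drv.c:531). The Python sketch below pulls those fields out of a journalctl -k dump so warnings from several shutdowns can be compared; frames prefixed with "?" are skipped because the kernel marks them as unreliable. It is only an illustration, and nothing in the log establishes that this warning is connected to the lockups.

```python
import re

# "WARNING: CPU: 6 PID: 16754 at <file>:<line> <function>+0x../0x.."
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at (?P<path>[^:]+):(?P<line>\d+) (?P<func>[\w.]+)\+"
)
# "CPU: 6 PID: 16754 Comm: plymouthd Tainted: ..."
COMM_RE = re.compile(r"CPU: \d+ PID: \d+ Comm: (?P<comm>\S+)")
# A call-trace frame such as "drm_open+0xf5/0x240 [drm]"; a leading "? " marks an unreliable frame.
FRAME_RE = re.compile(r"(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_warning(lines):
    """Summarise one kernel WARNING: source location, triggering process, reliable frames."""
    info = {"frames": []}
    in_trace = False
    for line in lines:
        # Strip the journal prefix ("May 13 18:08:19 thiccboii kernel: ").
        msg = line.split("kernel:", 1)[-1].strip()
        if (m := WARN_RE.match(msg)):
            info.update(cpu=int(m["cpu"]), pid=int(m["pid"]), path=m["path"],
                        line=int(m["line"]), func=m["func"])
        elif (m := COMM_RE.match(msg)):
            info["comm"] = m["comm"]
        elif msg == "Call Trace:":
            in_trace = True
        elif in_trace:
            frame = FRAME_RE.match(msg)
            if frame is None:
                in_trace = False  # e.g. the user-space "RIP: 0033:..." line ends the trace
            elif not frame["unreliable"]:
                info["frames"].append(frame["func"])
    return info
```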

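Let me know if I should ask this somewhere else. I've run a RAM test, and since I've been using this computer for over a year, I don't think this is one of the CPUs with the known C-state issue. Troubleshooting has been painful because the crash takes anywhere from 2 to 20 hours to occur, but please let me know if there is anything else I should try.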

Answer 1

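I saw something similar last week. It may be happening with kernel-default-5.12.13-1 and NVIDIA driver 460.84, but it did not start right after installing them, so it could be related to other updates (Plasma, Chrome, etc.). It still happens with kernel-default 5.13.0-1.1. This has happened three times on a desktop that had been running stably for quite a long time.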

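Chrome caused something similar a few years ago, so in google-chrome-beta 92.0.4515.80-1 I turned off the advanced option for GPU acceleration. So far I haven't seen another lockup. However, I'm now also running kernel-default 5.13.0-1.2 and chrome-beta 92.0.4515.93-1, so those may have changed things too.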

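I would normally ask about this on the NVIDIA forums (I've heard NVIDIA support staff there have been very helpful in the past), but I'm hesitant to do so until I see a pattern or something interesting in the logs or in /var/log/Xorg.0.log. If you have a /var/log/Xorg.0.log from a recent crash, it may contain clues.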
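
When looking for those clues, it helps that Xorg tags each message with a severity marker, (EE) for errors and (WW) for warnings. X rewrites Xorg.0.log each time it starts and normally keeps the previous session's log as Xorg.0.log.old, so after a hard lockup and reboot the crashed session is usually in the .old file (and if X runs rootless, the log may live under ~/.local/share/xorg/ instead). The Python sketch below filters those lines; it assumes every message line starts with the bracketed uptime timestamp, which also keeps it from matching the marker legend at the top of the file.

```python
import re
import sys
from pathlib import Path

# Assumes Xorg's usual layout: "[  uptime] (EE) message". Requiring the bracketed
# timestamp skips the marker legend at the top of the file, whose continuation
# lines also contain "(WW)" and "(EE)".
MARKER_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+\((?P<marker>EE|WW)\)")


def xorg_problems(text, markers=("EE", "WW")):
    """Return (marker, line) pairs for error/warning lines in an Xorg log."""
    hits = []
    for line in text.splitlines():
        m = MARKER_RE.match(line)
        if m and m["marker"] in markers:
            hits.append((m["marker"], line.strip()))
    return hits


if __name__ == "__main__":
    # Defaults to the previous session's log; pass another path as the first argument.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "/var/log/Xorg.0.log.old")
    for _, line in xorg_problems(path.read_text(errors="replace")):
        print(line)
```

Passing markers=("EE",) narrows the output to errors only, which is usually the first thing worth checking near the end of a crashed session's log.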
