Kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!

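During peak hours, when traffic is high, the server has started crashing almost daily, and syslog is always spammed with repeated eth0 resets. Eventually the network goes down completely, and the machine has to be rebooted before it can be accessed remotely again.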

Does this error mean the NIC is damaged, or is it just a software problem?

Running kernel: 4.19.0-10-amd64, operating system: Debian 10

    Jan 25 18:00:41 Debian-83-jessie-64-minimal kernel: [161879.702795] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:00:45 Debian-83-jessie-64-minimal kernel: [161883.545928] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:04:41 Debian-83-jessie-64-minimal kernel: [162119.835193] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:04:45 Debian-83-jessie-64-minimal kernel: [162123.214074] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:05:50 Debian-83-jessie-64-minimal kernel: [162188.695254] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:05:54 Debian-83-jessie-64-minimal kernel: [162192.610229] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:06:14 Debian-83-jessie-64-minimal kernel: [162212.759251] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:06:18 Debian-83-jessie-64-minimal kernel: [162216.990139] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:07:27 Debian-83-jessie-64-minimal kernel: [162285.975361] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:07:31 Debian-83-jessie-64-minimal kernel: [162289.814340] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:07:47 Debian-83-jessie-64-minimal kernel: [162305.687558] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:07:51 Debian-83-jessie-64-minimal kernel: [162309.506389] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:07:59 Debian-83-jessie-64-minimal systemd[1]: session-247.scope: Succeeded.
    Jan 25 18:08:48 Debian-83-jessie-64-minimal kernel: [162366.871583] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:08:52 Debian-83-jessie-64-minimal kernel: [162370.734613] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:09:01 Debian-83-jessie-64-minimal CRON[27975]: (root) CMD (  [ -x /usr/lib/php5/sessionclean ] && /usr/lib/php5/sessionclean)
    Jan 25 18:09:01 Debian-83-jessie-64-minimal CRON[27974]: (root) CMD (  [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
    Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: Starting Clean php session files...
    Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: phpsessionclean.service: Succeeded.
    Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: Started Clean php session files.
    Jan 25 18:09:42 Debian-83-jessie-64-minimal kernel: [162420.891568] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:09:46 Debian-83-jessie-64-minimal kernel: [162424.734698] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:10:57 Debian-83-jessie-64-minimal kernel: [162495.895693] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:11:01 Debian-83-jessie-64-minimal kernel: [162499.750608] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.895786] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.915877] ------------[ cut here ]------------
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.915964] kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916048] invalid opcode: 0000 [#1] SMP PTI
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916126] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G        W         4.19.0-10-amd64 #1 Debian 4.19.132-1
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916222] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.7.0.SR.2 for D3401-H1x                11/25/2015
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916328] Workqueue: events e1000_reset_task [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916410] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916486] Code: ff ff 31 c0 31 ed 66 41 89 45 20 e9 a8 fe ff ff 4c 89 e7 e8 89 f3 ff ff e9 af fe ff ff 4c 89 e7 e8 7c f3 ff ff e9 30 fe ff ff <0f> 0b 4c 89 e7 e8 6d f3 ff ff eb ac 4c 89 e7 e8 63 f3 ff ff e9 68
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916615] RSP: 0018:ffffaf708629fde0 EFLAGS: 00010202
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916689] RAX: 0000000000000067 RBX: ffff9043211f48c0 RCX: 000000000000007d
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916780] RDX: 0000000000000067 RSI: 0000000000000246 RDI: 0000000000000246
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916872] RBP: 000000003103f0fa R08: 0000000000000002 R09: ffffaf708629fdc4
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916963] R10: 00000000000000fe R11: 0000000000000000 R12: ffff9043211f4e38
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917055] R13: ffff90432a33f800 R14: 0000000004008000 R15: ffff9043211f4940
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917147] FS:  0000000000000000(0000) GS:ffff904331200000(0000) knlGS:0000000000000000
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917240] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917315] CR2: 00007f31849487f8 CR3: 00000005a660a003 CR4: 00000000003606f0
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917497] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917588] Call Trace:
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917663]  e1000e_reset+0x574/0x790 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917743]  e1000e_down+0x1cf/0x200 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917819]  e1000e_reinit_locked+0x46/0x60 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917898]  process_one_work+0x1a7/0x3a0
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917974]  worker_thread+0x30/0x390
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918046]  ? create_worker+0x1a0/0x1a0
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918118]  kthread+0x112/0x130
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918188]  ? kthread_bind+0x30/0x30
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918260]  ret_from_fork+0x35/0x40
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918331] Modules linked in: unix_diag ip6t_rpfilter ipt_rpfilter binfmt_misc veth ip6t_MASQUERADE ipt_MASQUERADE xt_CHECKSUM xt_comment xt_tcpudp bridge stp llc dm_mod ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat nf_nat_ipv6 ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter nf_tables nfnetlink cpufreq_conservative cpufreq_userspace cpufreq_powersave fuse intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul evdev crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore squashfs iTCO_wdt pcc_cpufreq sg iTCO_vendor_support intel_pch_thermal intel_rapl_perf fujitsu_laptop wmi loop sparse_keymap video acpi_pad button ip_tables x_tables autofs4
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918698]  ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 md_mod sd_mod crc32c_intel ahci xhci_pci libahci xhci_hcd libata aesni_intel e1000e usbcore scsi_mod aes_x86_64 crypto_simd cryptd glue_helper i2c_i801 usb_common thermal fan
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918920] ---[ end trace fc8f12793b39335d ]---
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918998] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919078] Code: ff ff 31 c0 31 ed 66 41 89 45 20 e9 a8 fe ff ff 4c 89 e7 e8 89 f3 ff ff e9 af fe ff ff 4c 89 e7 e8 7c f3 ff ff e9 30 fe ff ff <0f> 0b 4c 89 e7 e8 6d f3 ff ff eb ac 4c 89 e7 e8 63 f3 ff ff e9 68
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919206] RSP: 0018:ffffaf708629fde0 EFLAGS: 00010202
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919281] RAX: 0000000000000067 RBX: ffff9043211f48c0 RCX: 000000000000007d
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919372] RDX: 0000000000000067 RSI: 0000000000000246 RDI: 0000000000000246
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919464] RBP: 000000003103f0fa R08: 0000000000000002 R09: ffffaf708629fdc4
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919555] R10: 00000000000000fe R11: 0000000000000000 R12: ffff9043211f4e38
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919647] R13: ffff90432a33f800 R14: 0000000004008000 R15: ffff9043211f4940
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919739] FS:  0000000000000000(0000) GS:ffff904331200000(0000) knlGS:0000000000000000
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919937] CR2: 00007f31849487f8 CR3: 00000005a660a003 CR4: 00000000003606f0
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.920030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.920123] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Answer 1

kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!
kernel: [162559.916048] invalid opcode: 0000 [#1] SMP PTI
kernel: [162559.916126] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G        W         4.19.0-10-amd64 #1 Debian 4.19.132-1
kernel: [162559.916222] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.7.0.SR.2 for D3401-H1x                11/25/2015
kernel: [162559.916328] Workqueue: events e1000_reset_task [e1000e]
kernel: [162559.916410] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]

You are running Debian's distribution kernel, which carries some patches on top of the upstream source code, so a quick analysis may not be entirely accurate. However, line 3804 of drivers/net/ethernet/intel/e1000e/netdev.c in the upstream 4.19.170 source code reads:

BUG_ON(tdt != tx_ring->next_to_use);

If the given condition is true, the kernel BUG at... message, the stack trace and everything that follows are triggered.
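
To look up the statement behind a "kernel BUG at FILE:LINE" message yourself, print that line from the matching source tree. The sketch below is a minimal, hypothetical helper (show_line is not a standard tool); the example in its comment assumes a local clone of the linux-stable git repository that has the v4.19.170 tag.

```shell
#!/bin/sh
# show_line FILE LINE
# Print line LINE of FILE, prefixed with its line number.
# Hypothetical helper for illustration; pass "-" as FILE to read stdin.
show_line() {
    awk -v n="$2" 'NR == n { printf "%d: %s\n", NR, $0; exit }' "$1"
}

# Example (assumes a linux-stable clone with the v4.19.170 tag):
#   git show v4.19.170:drivers/net/ethernet/intel/e1000e/netdev.c | show_line - 3804
```

git show REV:PATH prints a file as it exists at that revision without checking it out, so no build tree is needed for this lookup.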

This line is in the function e1000_flush_tx_ring(), which is called by the function e1000_flush_desc_rings(), and the latter is what the error message reports as the instruction pointer location:

RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]

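The compiler has probably inlined e1000_flush_tx_ring() or otherwise optimized it, so it does not appear as a recognizable symbol on the RIP: line, but the location seems to match. The call trace strongly suggests that the driver is resetting the NIC, and flushing the TX ring is evidently part of that process.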
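
The RIP: and call-trace entries use the kernel's symbol+offset/size notation: e1000_flush_desc_rings+0x2a9/0x2f0 means the faulting instruction lies 0x2a9 bytes into e1000_flush_desc_rings, a function whose machine code is 0x2f0 bytes long. An inlined callee has no entry of its own in this notation; its code is counted inside the caller. As a minimal sketch, the hypothetical helper below only splits such an entry into its parts and converts the hex numbers to decimal.

```shell
#!/bin/sh
# split_rip 'func+0xOFF/0xSIZE [module]'
# Prints the symbol name, then the offset and size in decimal.
# Hypothetical helper for illustration; the "[module]" suffix is optional.
split_rip() {
    spec=${1%% *}        # drop a trailing " [module]" if present
    func=${spec%%+*}     # symbol name before the '+'
    rest=${spec#*+}      # "0xOFF/0xSIZE"
    off=${rest%%/*}      # "0xOFF"
    size=${rest#*/}      # "0xSIZE"
    printf '%s %d %d\n' "$func" "$off" "$size"
}

split_rip 'e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]'
# prints: e1000_flush_desc_rings 681 752
```

To map such an offset to an actual source line, the kernel tree ships scripts/faddr2line, which takes an object file built with debug info plus the same string, e.g. ./scripts/faddr2line e1000e.ko e1000_flush_desc_rings+0x2a9/0x2f0. For a Debian kernel, the debug build of the module would come from the matching -dbg package (assumed here to be linux-image-4.19.0-10-amd64-dbg; check the exact name with apt search).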

But why is the reset needed in the first place? It turns out that Intel has released a specification update for the I218/I219 NICs:

5. Buffer Overrun While the I219 Is Processing DMA Transactions

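Problem: Intel® 100/200 Series Chipset platforms reduced the round-trip latency of LAN Controller DMA accesses, which in some high-performance conditions causes a buffer overrun while the I219 LAN Connected Device is processing DMA transactions.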

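Implication: Under very heavy UDP traffic combined with repeated reconnection of Ethernet cables, I219LM and I219V devices may get stuck in an unrecoverable Tx hang. The LAN controller recovers from the Tx hang only when the system is rebooted.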

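Workaround: Slightly slow down DMA accesses by reducing the number of outstanding requests. This workaround may affect TCP traffic performance, with a degradation of up to 5% to 15% depending on the platform. Disabling TSO eliminates the TCP performance loss without a noticeable impact on CPU performance.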

Status: Intel® 100/200 Series Chipsets - NoFix

Intel® 300 Series Chipsets - Fixed

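So the root cause appears to be a fault in the hardware (or in the network card's firmware). The driver detected that the TX ring buffer structure was corrupted and assumed this was caused by a bug in the driver itself, which is why it hit BUG_ON. In this case, however, the network card itself appears to be at fault.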

The recommended workaround is to disable the NIC's TCP Segmentation Offload feature (tso):

ethtool -K eth0 tso off
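
The -K change applies immediately but does not survive a reboot. As a minimal sketch for confirming that it took effect, the hypothetical tso_state helper reads the output of ethtool -k (lowercase k lists the current offload settings) and prints the state of tcp-segmentation-offload.

```shell
#!/bin/sh
# tso_state: read "ethtool -k IFACE" output on stdin and print the
# state ("on" or "off") of tcp-segmentation-offload.
# Hypothetical helper for illustration.
tso_state() {
    awk -F': *' '$1 == "tcp-segmentation-offload" {
        split($2, word, " ")   # drop suffixes such as "[fixed]"
        print word[1]
        exit
    }'
}

# Example (requires root and an interface named eth0):
#   ethtool -K eth0 tso off
#   ethtool -k eth0 | tso_state    # should print "off"
```

To reapply the setting at boot on a Debian system that uses ifupdown, one common approach is a line such as post-up ethtool -K eth0 tso off in the eth0 stanza of /etc/network/interfaces.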

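The Fujitsu D3401-H1 appears to come with an Intel Core i7-6700 processor, which is a Skylake-generation part, so I would expect it to have an Intel 100 Series chipset. Since no fix seems to be available for that chipset, you will probably need to apply the workaround.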
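
The erratum quoted above concerns I219 devices, so it is worth confirming the NIC model before relying on it. As a minimal sketch, the hypothetical find_i219 helper filters lspci output for Ethernet controllers whose name contains I219; the exact device string depends on the installed pci.ids database, so the pattern is an assumption. The kernel log already shows the NIC at PCI address 0000:00:1f.6, so lspci -s 00:1f.6 narrows the listing to that device.

```shell
#!/bin/sh
# find_i219: read lspci output on stdin and print the Ethernet
# controller lines that name an I219 device. The exit status is 0
# only if at least one such line was found.
# Hypothetical helper for illustration.
find_i219() {
    grep -E 'Ethernet controller:.*I219'
}

# Example:
#   lspci -s 00:1f.6 | find_i219 || echo "not an I219 NIC"
```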
