Guest VM running FreeNAS stuck in KDB mode on a host with an AMD R9 5950X

Problem description:

I previously installed Ubuntu 20.04 and KVM on a host with an AMD R3 3100 and started a FreeNAS VM, and everything worked fine. After I swapped the CPU, however, the FreeNAS guest stopped working, while another guest running Ubuntu still works.

FreeNAS guest log:

db> reboot
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 1
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.3-RELEASE-p14 #0 r325575+c936002dbe2(HEAD): Mon Sep 28 10:48:27 EDT 2020
    [email protected]:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/FreeNAS.amd64-DEBUG amd64
FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): text 80x25
CPU: AMD EPYC-Milan Processor (3400.05-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0xa00f11  Family=0x19  Model=0x1  Stepping=1
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0xfff83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0xc003f7<LAHF,CMP,SVM,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,Topology,PCXC>
  Structured Extended Features=0x211c07ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLWB,SHA>
  Structured Extended Features2=0x40060c<UMIP,PKU,RDPID>
  Structured Extended Features3=0xac000010<IBPB,STIBP,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x69<RDCL_NO,SKIP_L1DFL_VME>
  AMD Extended Feature Extensions ID EBX=0x300d205<CLZERO,XSaveErPtr>
  SVM: NP,NRIP,NAsids=16
Hypervisor: Origin = "KVMKVMKVM"
real memory  = 8489271296 (8096 MB)
avail memory = 8143572992 (7766 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <BOCHS  BXPCAPIC>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s)
WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
ioapic0 <Version 1.1> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
random: entropy device external interface
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
mlx5en: Mellanox Ethernet driver 3.5.1 (April 2019)
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
padlock0: No ACE support.
acpi0: <BOCHS BXPCRSDT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71,0x72-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x608-0x60b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc1a0-0xc1af at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> port 0xc100-0xc11f mem 0xf4000000-0xf7ffffff,0xf8000000-0xfbffffff,0xfc094000-0xfc095fff irq 10 at device 2.0 on pci0
vgapci0: Boot video device
virtio_pci0: <VirtIO PCI Network adapter> port 0xc120-0xc13f mem 0xfc096000-0xfc096fff,0xfebf0000-0xfebf3fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 52:54:00:9b:85:3a
pci0: <multimedia, HDA> at device 4.0 (no driver attached)
uhci0: <Intel 82801I (ICH9) USB controller> port 0xc140-0xc15f irq 10 at device 5.0 on pci0
usbus0 on uhci0
usbus0: 12Mbps Full Speed USB v1.0
uhci1: <Intel 82801I (ICH9) USB controller> port 0xc160-0xc17f irq 10 at device 5.1 on pci0
usbus1 on uhci1
usbus1: 12Mbps Full Speed USB v1.0
uhci2: <Intel 82801I (ICH9) USB controller> port 0xc180-0xc19f irq 11 at device 5.2 on pci0
usbus2 on uhci2
usbus2: 12Mbps Full Speed USB v1.0
ehci0: <Intel 82801I (ICH9) USB 2.0 controller> mem 0xfc097000-0xfc097fff irq 11 at device 5.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
usbus3: 480Mbps High Speed USB v2.0
virtio_pci1: <VirtIO PCI Console adapter> port 0xc080-0xc0bf mem 0xfc098000-0xfc098fff,0xfebf4000-0xfebf7fff irq 10 at device 6.0 on pci0
virtio_pci2: <VirtIO PCI Balloon adapter> port 0xc0c0-0xc0ff mem 0xfebf8000-0xfebfbfff irq 11 at device 7.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci2
virtio_pci3: <VirtIO PCI Block adapter> port 0xc000-0xc07f mem 0xfc099000-0xfc099fff,0xfebfc000-0xfebfffff irq 11 at device 8.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci3
vtblk0: 5723166MB (11721045168 512 byte sectors)
acpi_syscontainer0: <System Container> on acpi0
acpi_syscontainer1: <System Container> port 0xaf00-0xaf0b on acpi0
acpi_syscontainer2: <System Container> port 0xafe0-0xafe3 on acpi0
acpi_syscontainer3: <System Container> port 0xae00-0xae13 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
orm0: <ISA Option ROM> at iomem 0xe9800-0xeffff on isa0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 10.000 msec
freenas_sysctl: adding account.
freenas_sysctl: adding directoryservice.
freenas_sysctl: adding middlewared.
freenas_sysctl: adding network.
freenas_sysctl: adding services.
ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled
ugen2.1: <Intel UHCI root HUB> at usbus2
ugen3.1: <Intel EHCI root HUB> at usbus3
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen0.1: <Intel UHCI root HUB> at usbus0
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel UHCI root HUB> at usbus1
uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <QEMU HARDDISK 2.5+> ATA-7 device
ada0: Serial Number QM00001
ada0: 16.700MB/s transfers (WDMA2, PIO 8192bytes)
ada0: 61440MB (125829120 512 byte sectors)
cd0 at ata0 bus 0 scbus0 target 1 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00002
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from zfs:freenas-boot/ROOT/default []...
Root mount waiting for: usbus3 usbus2 usbus1 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub3: 2 ports with 2 removable, self powered
Root mount waiting for: usbus3
Root mount waiting for: usbus3
uhub1: 6 ports with 6 removable, self powered
Root mount waiting for: usbus3
ugen3.2: <QEMU QEMU USB Tablet> at usbus3
Starting devd.
warning: KLD '/boot/kernel-debug/uhid.ko' is newer than the linker.hints file
lo0: link state changed to UP


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xfffffe02311f30c0
fault code      = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff81016d09


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xfffffe02311c60c0
stack pointer           = 0x28:0xfffffe02311f1eb0
frame pointer           = 0x28:0xfffffe02311f1eb0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 99 (python3.7)
trap number     = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02311f1b70
vpanic() at vpanic+0x17e/frame 0xfffffe02311f1bd0
panic() at panic+0x43/frame 0xfffffe02311f1c30
trap_fatal() at trap_fatal+0x369/frame 0xfffffe02311f1c80
trap_pfault() at trap_pfault+0x62/frame 0xfffffe02311f1cd0
trap() at trap+0x2b3/frame 0xfffffe02311f1de0
calltrap() at calltrap+0x8/frame 0xfffffe02311f1de0
--- trap 0xc, rip = 0xffffffff81016d09, rsp = 0xfffffe02311f1eb0, rbp = 0xfffffe02311f1eb0 ---
bcopy() at bcopy+0x19/frame 0xfffffe02311f1eb0
fpugetregs() at fpugetregs+0x192/frame 0xfffffe02311f1f00
get_mcontext() at get_mcontext+0x1b4/frame 0xfffffe02311f1f50
sys_getcontext() at sys_getcontext+0x56/frame 0xfffffe02311f2300
amd64_syscall() at amd64_syscall+0x792/frame 0xfffffe02311f2430
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe02311f2430
--- syscall (421, FreeBSD ELF64, sys_getcontext), rip = 0x801c26280, rsp = 0x7fffffffd188, rbp = 0x7fffffffdcf0 ---
KDB: enter: panic
[ thread pid 99 tid 100490 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

The host CPU information reported by the BIOS is as follows:

dmidecode | grep "Processor Information" -A 54
Processor Information
    Socket Designation: AM4
    Type: Central Processor
    Family: Zen
    Manufacturer: Advanced Micro Devices, Inc.
    ID: 10 0F A2 00 FF FB 8B 17
    Signature: Family 25, Model 33, Stepping 0
    Flags:
        FPU (Floating-point unit on-chip)
        VME (Virtual mode extension)
        DE (Debugging extension)
        PSE (Page size extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        MCE (Machine check exception)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        SEP (Fast system call)
        MTRR (Memory type range registers)
        PGE (Page global enable)
        MCA (Machine check architecture)
        CMOV (Conditional move instruction supported)
        PAT (Page attribute table)
        PSE-36 (36-bit page size extension)
        CLFSH (CLFLUSH instruction supported)
        MMX (MMX technology supported)
        FXSR (FXSAVE and FXSTOR instructions supported)
        SSE (Streaming SIMD extensions)
        SSE2 (Streaming SIMD extensions 2)
        HTT (Multi-threading)
    Version: AMD Ryzen 9 5950X 16-Core Processor
    Voltage: 1.1 V
    External Clock: 100 MHz
    Max Speed: 5050 MHz
    Current Speed: 3400 MHz
    Status: Populated, Enabled
    Upgrade: Socket AM4
    L1 Cache Handle: 0x0013
    L2 Cache Handle: 0x0014
    L3 Cache Handle: 0x0015
    Serial Number: Unknown
    Asset Tag: Unknown
    Part Number: Unknown
    Core Count: 16
    Core Enabled: 16
    Thread Count: 32
    Characteristics:
        64-bit capable
        Multi-Core
        Hardware Thread
        Execute Protection
        Enhanced Virtualization
        Power/Performance Control
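For comparison with the guest log: the dmidecode ID field starts with the CPUID leaf 1 EAX value in little-endian byte order (10 0F A2 00 means 0x00A20F10), while the guest kernel printed its own signature as Id=0xa00f11. A minimal shell sketch that decodes both with the standard CPUID rules (the extended family is added only when the base family is 0xF; the extended model is prepended for base families 0x6 and 0xF). The helper names decode_cpuid and le_bytes_to_hex are hypothetical, used only for illustration.

```shell
#!/usr/bin/env bash
# Decode a CPUID leaf 1 EAX signature into family/model/stepping.
# decode_cpuid is a hypothetical helper name.
decode_cpuid() {
  local eax=$(( $1 ))
  local stepping=$((     eax        & 0xf ))
  local base_model=$((  (eax >> 4)  & 0xf ))
  local base_family=$(( (eax >> 8)  & 0xf ))
  local ext_model=$((   (eax >> 16) & 0xf ))
  local ext_family=$((  (eax >> 20) & 0xff ))

  local family=$base_family model=$base_model
  # The extended family only counts when the base family is 0xF.
  if (( base_family == 0xf )); then
    family=$(( base_family + ext_family ))
  fi
  # The extended model is prepended for base families 0x6 and 0xF.
  if (( base_family == 0x6 || base_family == 0xf )); then
    model=$(( (ext_model << 4) | base_model ))
  fi
  echo "family=$family model=$model stepping=$stepping"
}

# dmidecode prints the EAX bytes little-endian: "10 0F A2 00" -> 0x00A20F10.
# le_bytes_to_hex is a hypothetical helper name.
le_bytes_to_hex() {
  printf '0x%s%s%s%s\n' "$4" "$3" "$2" "$1"
}

decode_cpuid "$(le_bytes_to_hex 10 0F A2 00)"   # host, from dmidecode
decode_cpuid 0xa00f11                            # guest, from the FreeBSD boot log
```

Running it prints family=25 model=33 stepping=0 for the host (matching the dmidecode Signature line) and family=25 model=1 stepping=1 for the guest, i.e. the guest saw the virtual "AMD EPYC-Milan Processor" model named in its boot log rather than the host's own model.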

After resetting from kdb, the guest panicked again with the following message:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xfffffe02311d00c0
fault code      = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff81016d09
stack pointer           = 0x28:0xfffffe02311ceeb0
frame pointer           = 0x28:0xfffffe02311ceeb0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 99 (python3.7)
trap number     = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:

What I have tried:

  1. I reinstalled the guest, but it failed with the same problem: it drops into kdb mode again.
  2. I restarted the host, but that did not solve the problem.

Questions:

  1. How can I collect more detailed information from KDB?
  2. How can I fix the problem?
  3. Does FreeNAS not support the AMD Ryzen 9 5950X 16-Core Processor?

Answer 1

With Wu's help, I was able to create a test virtual machine from the FreeNAS OS image using the following command:

virt-install \
--name test \
--memory 8096 \
--vcpus 2 \
--cpu host-model-only \
--cdrom /var/lib/libvirt/isos/TrueNAS-12.0-U5.1.iso \
--disk size=30,bus=virtio \
--network type=direct,source=enp42s0,source_mode=bridge \
--os-type=linux  \
--os-variant freebsd11.3 \
--graphics vnc,listen=0.0.0.0,port=20012 \
--video vga --input tablet,bus=usb
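Once the test VM exists, the CPU definitions of the two domains can be compared side by side. A minimal shell sketch, assuming the domains are named freeNas and test as in this post and that virsh, awk and bash are available on the host; cpu_section is a hypothetical helper (not part of libvirt) that prints the <cpu> element, multi-line or self-closing, from domain XML on stdin.

```shell
#!/usr/bin/env bash
# Print the first <cpu ...> element of a libvirt domain XML read from stdin.
# cpu_section is a hypothetical helper; "<cpu[ >]" does not match <vcpu> or <cputune>.
cpu_section() {
  awk '
    /<cpu[ >]/ { inside = 1 }
    inside     { print }
    # Stop at </cpu>, or right away if the element is self-closing (<cpu .../>).
    inside && (/<\/cpu>/ || (/<cpu[ >]/ && /\/>[ \t]*$/)) { exit }
  '
}

# Only query libvirt when virsh is installed, i.e. when run on the KVM host.
if command -v virsh >/dev/null 2>&1; then
  diff <(virsh dumpxml freeNas | cpu_section) \
       <(virsh dumpxml test | cpu_section)
fi
```

Any lines reported by diff are the CPU model and feature differences between the failing VM and the working test VM.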

After comparing the XML of the freeNas VM and the test VM, I changed the CPU element as follows:

  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>EPYC-Rome</model>
    <feature policy='require' name='ibpb'/>
    <feature policy='require' name='spec-ctrl'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='virt-ssbd'/>
  </cpu>

Then I ran the following commands:

virsh destroy freeNas
virsh start freeNas

Finally, the VM came back up.

For now I have no idea why this happens, because the fix came from trial and error rather than from theory.