Why does my RAID 5 keep resyncing? (Are device names not persistent?)

2024-5-22

I set up an Intel RST software RAID on Ubuntu 14.04 using mdadm (4 x 6 TB drives in RAID 5, created with /dev device names):

sudo mdadm -C /dev/md/imsm /dev/sda /dev/sdb /dev/sdh /dev/sdi -n 4 -e imsm
sudo mdadm -C /dev/md/vol0 /dev/md/imsm -n 4 -l 5

This is what it reports now (after a few reboots):

sudo mdadm --query --detail /dev/md/vol0 
/dev/md/vol0:
      Container : /dev/md/imsm0, member 0
     Raid Level : raid5
     Array Size : 17581557760 (16767.08 GiB 18003.52 GB)
  Used Dev Size : -1
   Raid Devices : 4
  Total Devices : 4

          State : clean, resyncing 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 128K

  Resync Status : 54% complete


           UUID : 9adaf3f8:d899c72b:fdf41fd1:07ee0399
    Number   Major   Minor   RaidDevice State
       3       8      112        0      active sync   /dev/sdh
       2       8       16        1      active sync   /dev/sdb
       1       8        0        2      active sync   /dev/sda
       0       8      144        3      active sync   /dev/sdj

Could the constant resyncing be caused by the system assigning device names inconsistently across boots (e.g. /dev/sda suddenly becoming /dev/sdi)?
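
One way to check whether the names really move between boots is to record which persistent identifier points at which kernel name and compare the result after each reboot. The following is a minimal sketch, not a fix: the function name `list_disk_ids` is hypothetical, and it assumes the drives appear under /dev/disk/by-id with an `ata-` prefix (disks behind other controllers may use `scsi-` or `wwn-` instead).

```shell
#!/bin/sh
# list_disk_ids (hypothetical helper): print "kernel-name persistent-id"
# pairs for whole disks, sorted by kernel name.
# The directory is a parameter only so the function can be tried on a test tree.
list_disk_ids() {
    by_id_dir="${1:-/dev/disk/by-id}"
    for link in "$by_id_dir"/ata-*; do
        # Skip anything that is not a symlink (also covers an unmatched glob).
        [ -L "$link" ] || continue
        # Skip partition entries such as ...-part1.
        case "$link" in *-part[0-9]*) continue ;; esac
        target=$(readlink "$link")          # e.g. ../../sda
        printf '%s %s\n' "${target##*/}" "${link##*/}"
    done | sort
}

# Usage: save the mapping after each boot and compare the files.
# list_disk_ids > "/tmp/disk-ids.$(date +%s)"
```

The by-id names normally embed the drive serial, so they can be matched against the `Disk00 Serial` ... `Disk03 Serial` lines printed by `mdadm --examine`. If the serials stay in the same slots while only the sdX letters change, the renaming on its own is less likely to be the cause, since mdadm generally identifies members by their metadata rather than by device name.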

sudo mdadm --detail --scan
ARRAY /dev/md/imsm0 metadata=imsm UUID=e409a30d:353a9b11:1f9a221a:7ed7cd21
ARRAY /dev/md/vol0 container=/dev/md/imsm0 member=0 UUID=9adaf3f8:d899c72b:fdf41fd1:07ee0399

Output from the mdadm tool:

sudo mdadm --examine /dev/md/imsm0 
/dev/md/imsm0:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.3.00
    Orig Family : 68028309
         Family : 68028309
     Generation : 00002f29
     Attributes : All supported
           UUID : e409a30d:353a9b11:1f9a221a:7ed7cd21
       Checksum : 85b1b0cb correct
    MPB Sectors : 2
          Disks : 4
   RAID Devices : 1

  Disk00 Serial : WD-WXL1H84E5WHF
          State : active
             Id : 00000002
    Usable Size : 11721038862 (5589.03 GiB 6001.17 GB)

[vol0]:
           UUID : 9adaf3f8:d899c72b:fdf41fd1:07ee0399
     RAID Level : 5 <-- 5
        Members : 4 <-- 4
          Slots : [UUUU] <-- [UUUU]
    Failed disk : none
      This Slot : 0
     Array Size : 35163115520 (16767.08 GiB 18003.52 GB)
   Per Dev Size : 11721038848 (5589.03 GiB 6001.17 GB)
  Sector Offset : 0
    Num Stripes : 45785308
     Chunk Size : 128 KiB <-- 128 KiB
       Reserved : 0
  Migrate State : repair
      Map State : normal <-- normal
     Checkpoint : 5191081 (1024)
    Dirty State : clean

  Disk01 Serial : WD-WX51DA476UL6
          State : active
             Id : 00000001
    Usable Size : 11721038862 (5589.03 GiB 6001.17 GB)

  Disk02 Serial : WD-WX51DA476P65
          State : active
             Id : 00000000
    Usable Size : 11721038862 (5589.03 GiB 6001.17 GB)

  Disk03 Serial : WD-WX51DA476HS5
          State : active
             Id : 00000003
    Usable Size : 11721038862 (5589.03 GiB 6001.17 GB)

So the Dirty State is clean? If so, why does it need to resync? Does anyone have an idea where a potential problem might lie?
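
To watch how these fields change from boot to boot, the volume state lines can be pulled out of the `--examine` output. This is a small sketch; `imsm_state` is a hypothetical helper name, and it only relies on the field labels visible in the output above.

```shell
#!/bin/sh
# imsm_state (hypothetical helper): read `mdadm --examine` text on stdin and
# print the volume state fields as key=value, dropping the "<-- ..." suffix.
imsm_state() {
    awk -F' : ' '
        /^ *(Migrate State|Map State|Checkpoint|Dirty State) :/ {
            key = $1; sub(/^ +/, "", key)     # strip the leading padding
            val = $2; sub(/ <-- .*$/, "", val) # keep only the current value
            print key "=" val
        }'
}

# Usage:
# sudo mdadm --examine /dev/md/imsm0 | imsm_state
```

In the output above, `Dirty State` is `clean` while `Migrate State` is `repair` with a non-zero `Checkpoint`. One possible reading, which is an assumption and not something the output confirms, is that the metadata records a repair pass that has started but not yet finished, independently of the dirty flag.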

The tail of my dmesg output shows the following. Should the SAS port ata7 not be there at all (I believe the Marvell SAS controller is turned off in the BIOS)? The board has only 6 SATA ports and 2 SAS ports (turned off).

[ 4064.913017] sr 0:0:0:0: command ffff8802fc4ccc00 timed out
[ 4064.913043] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 4064.913048] sas: ata7: end_device-0:0: cmd error handler
[ 4064.913092] sas: ata7: end_device-0:0: dev error handler
[ 4064.913529] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[ 4064.913874] sr 0:0:0:0: command ffff8802fb703b00 timed out
[ 4064.913896] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 4064.913900] sas: ata7: end_device-0:0: cmd error handler
[ 4064.913984] sas: ata7: end_device-0:0: dev error handler
[ 4064.914356] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[ 4064.915269] sr 0:0:0:0: command ffff8802fc4ccc00 timed out
[ 4064.915297] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 4064.915302] sas: ata7: end_device-0:0: cmd error handler
[ 4064.915382] sas: ata7: end_device-0:0: dev error handler
[ 4064.915777] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[ 4064.923419] md: md127 stopped.
[ 4064.927256] md: bind<sdc>
[ 4064.927350] md: bind<sdb>
[ 4064.927427] md: bind<sda>
[ 4064.927505] md: bind<sdi>
[ 4065.497163] sr 0:0:0:0: command ffff880304de9700 timed out
[ 4065.497181] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 4065.497184] sas: ata7: end_device-0:0: cmd error handler
[ 4065.497255] sas: ata7: end_device-0:0: dev error handler
[ 4065.497650] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[ 4065.498026] sr 0:0:0:0: command ffff8802fb703e00 timed out
[ 4065.498041] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 4065.498043] sas: ata7: end_device-0:0: cmd error handler
[ 4065.498106] sas: ata7: end_device-0:0: dev error handler
[ 4065.498503] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[ 4065.499352] sr 0:0:0:0: command ffff880304de9700 timed out
[ 4065.499372] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[ 4065.499375] sas: ata7: end_device-0:0: cmd error handler
[ 4065.499483] sas: ata7: end_device-0:0: dev error handler
[ 4065.499803] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[ 4071.294317] md: md126 stopped.
[ 4071.294421] md: bind<sdi>
[ 4071.294481] md: bind<sda>
[ 4071.294533] md: bind<sdb>
[ 4071.294596] md: bind<sdc>
[ 4071.296579] md/raid:md126: not clean -- starting background reconstruction
[ 4071.296595] md/raid:md126: device sdc operational as raid disk 0
[ 4071.296596] md/raid:md126: device sdb operational as raid disk 1
[ 4071.296597] md/raid:md126: device sda operational as raid disk 2
[ 4071.296598] md/raid:md126: device sdi operational as raid disk 3
[ 4071.296900] md/raid:md126: allocated 0kB
[ 4071.296920] md/raid:md126: raid level 5 active with 4 out of 4 devices, algorithm 0
[ 4071.296922] RAID conf printout:
[ 4071.296923]  --- level:5 rd:4 wd:4
[ 4071.296925]  disk 0, o:1, dev:sdc
[ 4071.296926]  disk 1, o:1, dev:sdb
[ 4071.296927]  disk 2, o:1, dev:sda
[ 4071.296929]  disk 3, o:1, dev:sdi
[ 4071.296944] md126: detected capacity change from 0 to 18003515146240
[ 4071.297632]  md126: unknown partition table
[ 4072.773368] md: md126 switched to read-write mode.
[ 4072.773686] md: resync of RAID array md126
[ 4072.773690] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 4072.773692] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 4072.773698] md: using 128k window, over a total of 5860519424k.
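
The last lines give enough to estimate how long one full resync pass takes: a total of 5860519424k per disk, a guaranteed minimum of 1000 KB/sec and a ceiling of 200000 KB/sec. The arithmetic can be sketched as below; `resync_hours` is a hypothetical helper, and it assumes the "k" and "KB" in the kernel messages are the same unit.

```shell
#!/bin/sh
# resync_hours (hypothetical helper): whole hours needed to resync
# size_k kilobytes at rate_k_per_s kilobytes per second (integer division).
resync_hours() {
    size_k=$1
    rate_k_per_s=$2
    echo $(( size_k / rate_k_per_s / 3600 ))
}

resync_hours 5860519424 200000   # prints 8    (best case, at the speed ceiling)
resync_hours 5860519424 1000     # prints 1627 (worst case, at the guaranteed minimum)
```

So even at the maximum rate a pass takes more than 8 hours. If the machine is rebooted or shut down before a pass completes, the array would presumably still be marked as resyncing at the next boot; that is an assumption to verify, for example by checking whether `Resync Status` ever reaches 100%.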
