RAID 5 with 4 disks fails to run with 1 failed disk?

Answer 1

This is the fundamental problem with RAID5: bad blocks during a rebuild are a killer.

Oct  2 15:08:51 it kernel: [1686185.573233] md/raid:md0: device xvdc operational as raid disk 0
Oct  2 15:08:51 it kernel: [1686185.580020] md/raid:md0: device xvde operational as raid disk 2
Oct  2 15:08:51 it kernel: [1686185.588307] md/raid:md0: device xvdd operational as raid disk 1
Oct  2 15:08:51 it kernel: [1686185.595745] md/raid:md0: allocated 4312kB
Oct  2 15:08:51 it kernel: [1686185.600729] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
Oct  2 15:08:51 it kernel: [1686185.608928] md0: detected capacity change from 0 to 2705221484544
⋮

The array has been assembled, degraded. It was assembled with xvdc, xvde, and xvdd. Apparently there is a hot spare:

Oct  2 15:08:51 it kernel: [1686185.615772] md: recovery of RAID array md0
Oct  2 15:08:51 it kernel: [1686185.621150] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Oct  2 15:08:51 it kernel: [1686185.627626] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Oct  2 15:08:51 it kernel: [1686185.634024]  md0: unknown partition table
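Oct  2 15:08:51 it kernel: [1686185.645882] md: using 128k window, over a total of 880605952k.

The "partition table" message is unrelated. The other messages tell you that md is attempting a recovery, probably onto a hot spare (which might be the device that failed out earlier, if you have tried removing and re-adding it).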

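The two sizes in these logs are consistent with how RAID5 allocates space: one device's worth of capacity goes to parity, so an N-device array holds N-1 devices' worth of data, and a degraded array still exposes its full size (the missing member's data is computed from parity on each read). A minimal sketch of the arithmetic, assuming (as the numbers suggest) that the "over a total of 880605952k" figure is the per-device size in KiB; the function name is made up for illustration:

```sh
#!/bin/sh
# raid5_capacity_bytes DEVICES PER_DEVICE_KIB
# Hypothetical helper: usable size in bytes of an md RAID5 array.
# RAID5 spends one device's worth of space on parity, so DEVICES-1 hold data.
raid5_capacity_bytes() {
  local devices="$1" per_dev_kib="$2"
  echo $(( (devices - 1) * per_dev_kib * 1024 ))
}

# Per-device size taken from "md: using 128k window, over a total of 880605952k."
raid5_capacity_bytes 4 880605952   # prints 2705221484544
```

The result matches "detected capacity change from 0 to 2705221484544", even though only 3 of the 4 devices are present.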
⋮
Oct  2 15:24:19 it kernel: [1687112.817845] end_request: I/O error, dev xvde, sector 881423360
Oct  2 15:24:19 it kernel: [1687112.820517] raid5_end_read_request: 1 callbacks suppressed
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423360 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: Disk failure on xvde, disabling device.
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: Operation continuing on 2 devices.

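Here md tries to read a sector from xvde (one of the three remaining devices). The read fails (probably a bad sector), and because the array is already degraded, md cannot reconstruct that data from parity. It therefore kicks the disk out of the array, and with two disks failed, your RAID5 is dead.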
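To find these events in a long kernel log, a small filter can help. The sketch below (md_failures is a hypothetical name) uses sed to pull the device and sector out of md's "read error not correctable" messages and the device out of "Disk failure on" messages. It assumes the message wording shown in the log above; other kernel versions may phrase them differently.

```sh
#!/bin/sh
# md_failures: read kernel log text on stdin and print
#   "read-error DEVICE SECTOR" for each uncorrectable md read error, and
#   "kicked DEVICE" for each device md disabled.
# Usage (hypothetical): dmesg | md_failures
md_failures() {
  sed -n \
    -e 's|.*md/raid:[^:]*: read error not correctable (sector \([0-9]*\) on \([^)]*\)).*|read-error \2 \1|p' \
    -e 's|.*md/raid:[^:]*: Disk failure on \([^,]*\),.*|kicked \1|p'
}
```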

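I'm not sure why it is labeled as a spare. That's odd (though I normally look at /proc/mdstat, so maybe that's just how mdadm labels it). Also, I thought newer kernels were much more hesitant to kick a disk out of the array over a bad block, but perhaps you're running an older kernel?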

What can you do about this?

Good backups. They are always an essential part of any strategy for keeping data safe.

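Make sure the array is routinely scrubbed for bad blocks. Your OS may already include a cron job for this. Otherwise, you can do it by echoing either repair or check into /sys/block/md0/md/sync_action. "repair" also fixes any parity errors it discovers (e.g., parity that doesn't match the data on the disks).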

# echo repair > /sys/block/md0/md/sync_action
#
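Starting a scrub by hand can be wrapped in a slightly safer helper. This is a sketch only: start_scrub is a hypothetical name, it assumes the array is md0 unless you pass another md sysfs directory, and writing to sysfs needs root. It accepts only check or repair and refuses to start unless sync_action reads idle, i.e. when no resync, recovery, or other scrub is already running.

```sh
#!/bin/sh
# start_scrub ACTION [MD_SYSFS_DIR]
# Hypothetical helper: write "check" or "repair" to the md sync_action file.
# MD_SYSFS_DIR defaults to /sys/block/md0/md (assumes the array is md0).
start_scrub() {
  local action="$1" md_sysfs="${2:-/sys/block/md0/md}" current
  case "$action" in
    check|repair) ;;
    *) echo "usage: start_scrub check|repair [md-sysfs-dir]" >&2; return 2 ;;
  esac
  if [ ! -w "$md_sysfs/sync_action" ]; then
    echo "cannot write $md_sysfs/sync_action (not root, or no such array?)" >&2
    return 1
  fi
  # Only start from "idle"; any other value means md is already busy.
  current=$(cat "$md_sysfs/sync_action")
  if [ "$current" != idle ]; then
    echo "array busy: sync_action is '$current'" >&2
    return 1
  fi
  echo "$action" > "$md_sysfs/sync_action"
  # After the pass, "$md_sysfs/mismatch_cnt" reports the mismatches it found.
}

# Usage (as root): start_scrub check
```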

Progress can be watched with cat /proc/mdstat or through the various files in that sysfs directory. (Some reasonably up-to-date documentation is in the Linux Raid Wiki mdstat article.)
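As a sketch of reading progress from sysfs directly: while an operation runs, the kernel's sync_completed file holds "done / total" (in sectors), and it reads "none" otherwise. The function name sync_progress and the default md0 path are assumptions for illustration.

```sh
#!/bin/sh
# sync_progress [MD_SYSFS_DIR]
# Hypothetical helper: print the current sync_action and percent complete,
# or just the action (e.g. "idle") when nothing is running.
sync_progress() {
  local md_sysfs="${1:-/sys/block/md0/md}" action completed cur max
  action=$(cat "$md_sysfs/sync_action")
  completed=$(cat "$md_sysfs/sync_completed")
  if [ "$completed" = none ]; then
    echo "$action"
    return 0
  fi
  cur=${completed%% /*}    # sectors done
  max=${completed##*/ }    # sectors in total
  echo "$action $(( cur * 100 / max ))%"
}

# Usage: sync_progress    (or: watch -n 10 cat /proc/mdstat)
```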

Note: On older kernels (I'm not sure of the exact version), check may not fix bad blocks.

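A final option is to switch to RAID6. This requires another disk (you can run RAID6 on four or even three disks, but you probably don't want to). With a new enough kernel, bad blocks are repaired on the fly whenever possible. RAID6 survives two disk failures, so with one disk failed it can still survive a bad block: it will both map out the bad block and continue the rebuild.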
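A rough sketch of the conversion, assuming your kernel and mdadm support changing the level with --grow: add the new disk, then reshape to RAID6 with one more device. A 5-device RAID6 has the same usable capacity as the 4-device RAID5 (three devices' worth of data either way). The function name, the APPLY switch, the device names, and the backup-file path are all hypothetical; by default the function only prints the commands. Check mdadm(8) for your version before running them, since whether --backup-file is needed varies.

```sh
#!/bin/sh
# raid5_to_raid6 MD NEW_DISK CURRENT_DEVICES [BACKUP_FILE]
# Hypothetical helper: print (or, with APPLY=1, run) the mdadm steps.
raid5_to_raid6() {
  local md="$1" new_disk="$2" cur_devices="$3"
  local backup="${4:-/root/md-reshape.backup}"
  local run=echo                       # dry run: just print the commands
  if [ "${APPLY:-0}" = 1 ]; then run=; fi
  $run mdadm --add "$md" "$new_disk"
  $run mdadm --grow "$md" --level=6 --raid-devices=$((cur_devices + 1)) \
    --backup-file="$backup"
}

# Usage: raid5_to_raid6 /dev/md0 /dev/sde1 4
```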

Answer 2

Suppose you create your RAID5 array like this:

$ mdadm --create /dev/md0 --level=5 --raid-devices=4 \
       /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

This isn't exactly what you want. Instead, you need to add the disks like this:

$ mdadm --create /dev/md0 --level=5 --raid-devices=3 \
       /dev/sda1 /dev/sdb1 /dev/sdc1
$ mdadm --add /dev/md0 /dev/sdd1

Or you can use mdadm's --spare-devices option to add the spare when creating the array, as shown below:

$ mdadm --create /dev/md0 --level=5 --raid-devices=3 --spare-devices=1 \
       /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

The last drive in the list becomes the spare.
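Whichever way the spare is added, /proc/mdstat marks spare members with (S), so you can confirm which device ended up as the spare. A minimal sketch, assuming the usual "md0 : active raid5 ..." line layout; mdstat_spares is a hypothetical name, and its second argument lets you point it at a saved copy of the file.

```sh
#!/bin/sh
# mdstat_spares [ARRAY] [MDSTAT_FILE]
# Hypothetical helper: print the members of ARRAY that mdstat marks "(S)",
# e.g. "sdd1" from a line like "md0 : active raid5 sdd1[3](S) sdc1[2] ...".
mdstat_spares() {
  local array="${1:-md0}" mdstat="${2:-/proc/mdstat}"
  awk -v a="$array" '$1 == a && $2 == ":" {
    for (i = 3; i <= NF; i++)
      if ($i ~ /\(S\)$/) { sub(/\[.*/, "", $i); print $i }
  }' "$mdstat"
}
```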

Excerpt from the mdadm man page:

-n, --raid-devices=
      Specify the number of active devices in the array.  This, plus the 
      number of spare devices (see below) must  equal the  number  of  
      component-devices (including "missing" devices) that are listed on 
      the command line for --create. Setting a value of 1 is probably a 
      mistake and so requires that --force be specified first.  A  value 
      of  1  will then be allowed for linear, multipath, RAID0 and RAID1.  
      It is never allowed for RAID4, RAID5 or RAID6. This  number  can only 
      be changed using --grow for RAID1, RAID4, RAID5 and RAID6 arrays, and
      only on kernels which provide the necessary support.

-x, --spare-devices=
      Specify the number of spare (eXtra) devices in the initial array.  
      Spares can also be  added  and  removed  later. The  number  of component
      devices listed on the command line must equal the number of RAID devices 
      plus the number of spare devices.
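The counting rule quoted above can be checked before running mdadm --create. A minimal sketch: check_create_counts is a hypothetical helper, not part of mdadm (which does its own validation), and it simply mirrors the man page rule, counting a literal "missing" as a listed device.

```sh
#!/bin/sh
# check_create_counts RAID_DEVICES SPARE_DEVICES DEVICE...
# Hypothetical helper: succeed only if the number of listed component devices
# equals RAID_DEVICES + SPARE_DEVICES, as mdadm(8) requires for --create.
check_create_counts() {
  local raid_devices="$1" spare_devices="$2" expected
  shift 2
  expected=$((raid_devices + spare_devices))
  if [ "$#" -ne "$expected" ]; then
    echo "expected $expected component devices, got $#" >&2
    return 1
  fi
}

# Usage: check_create_counts 3 1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```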
