Replacing old SATA cables: can older ones cause HDD errors in dmesg?


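When I built my server with new WD Red 3TB disks, I probably made a serious error of judgment and used old, very old SATA (Wikipedia) cables. My question is essentially about the hardware background, with some logs from a Debian 10 Linux system.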

There are 2 cables I am talking about:

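  • One is over 5 years old, but still a SATA III certified data cable. IMHO this one could cause problems if it were bent too much, but I am not aware of any abuse, and shielding has presumably gotten better over the years (?), so I would guess it holds up.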

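  • The other one, to my surprise in the same server, is a SATA cable that may be over 15 years old, i.e. original Serial ATA only. I am writing this because, ever since I set up this server, my mdadm RAID 1 array has had more and more small problems, in the form of various disk I/O error messages in dmesg, ultimately ending with a degraded array. I wonder whether one or both of these cables could be causing errors when the array is read and written.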
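Since the degraded RAID 1 array is one of the symptoms, here is a minimal sketch for spotting that state from /proc/mdstat, where each array has a member-status field such as [UU], with U for an active member and _ for a missing or failed one (e.g. [U_]). The function name md_degraded is hypothetical; `mdadm --detail /dev/mdN` remains the authoritative view of an array.

```shell
#!/bin/sh
# md_degraded (hypothetical helper): read /proc/mdstat-style text on stdin
# and print the name of every md array whose member-status field (e.g. [UU])
# contains "_", i.e. at least one missing or failed member.
md_degraded() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # array header, e.g. "md0 : active raid1 ..."
        name != "" && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name
            name = ""                          # only the first status field per array
        }
    '
}

# Usage on a real system: md_degraded < /proc/mdstat
```

An empty result means every array listed has all of its members active.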

Replacing the cables

What I did today was buy 2 new, German-made (it says so on the sticker), presumably higher-quality, SATA III certified cables, and see what happens.


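  1. I started the server, unmounted the array, and then stopped it.

  2. I started two separate disk read commands, one per disk, to run overnight:

    pv < /dev/sdX > /dev/null

  3. I also started monitoring dmesg for errors, and nmon for speed. One hour in, there are no errors or slowdowns so far...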
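The read test from the steps above can be sketched as a small script that reads every member disk in parallel and reports whether any read failed. This is an illustration under assumptions, not the exact commands used: the array name /dev/md0, the mount point /mnt/raid and the disk names /dev/sdX, /dev/sdY are placeholders, and dd is used only as a fallback when pv is not installed. Both tools only read from the disks.

```shell
#!/bin/sh
# Sketch of the overnight read test. ASSUMED names, not from the post:
# array /dev/md0 mounted at /mnt/raid, member disks /dev/sdX and /dev/sdY.
# Prepare by hand first, e.g.:
#   umount /mnt/raid && mdadm --stop /dev/md0
# and follow new kernel messages in another terminal with: dmesg -w

# read_whole_device DEV: read all of DEV and discard the data (read-only).
read_whole_device() {
    if command -v pv >/dev/null 2>&1; then
        pv < "$1" > /dev/null
    else
        # Fallback when pv is not installed.
        dd if="$1" of=/dev/null bs=1M 2>/dev/null
    fi
}

# read_all_parallel DEV...: read every DEV at the same time and return
# non-zero if any read failed (a read error usually makes pv/dd fail).
read_all_parallel() {
    pids=""
    for dev in "$@"; do
        read_whole_device "$dev" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

if [ "$#" -gt 0 ]; then
    read_all_parallel "$@"
fi
```

Saved as, say, readtest.sh (a hypothetical name), it would be run as root with `sh readtest.sh /dev/sdX /dev/sdY`. A zero exit status only says the reads completed; the dmesg output is still where link resets and I/O errors show up.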


Assuming I wake up and find these drives fully read with no errors in dmesg, can I assume the old cables were the cause of the errors, or is there something I have not considered?

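I could not decide whether to post this here or on Super User. If another site is more appropriate and I get several comments saying so, I will repost it in the morning. Anyway, thank you for your time.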



smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-9-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N6EZXNSD
LU WWN Device Id: 5 0014ee 210a9a0ef
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 20 08:47:05 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (40380) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 405) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x703d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   178   021    Pre-fail  Always       -       6050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       31
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2443
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2423
194 Temperature_Celsius     0x0022   116   109   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 2033 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2033 occurred at disk power-on lifetime: 424 hours (17 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 02 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 a0 00   2d+13:04:28.795  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+13:04:28.794  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   2d+13:04:28.794  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 00   2d+13:04:28.794  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+13:04:28.793  IDENTIFY DEVICE

Error 2032 occurred at disk power-on lifetime: 424 hours (17 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 00   2d+13:04:28.794  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 00   2d+13:04:28.794  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+13:04:28.793  IDENTIFY DEVICE
  c8 00 08 00 00 00 e0 00   2d+13:04:28.779  READ DMA
  ef 10 02 00 00 00 a0 00   2d+13:04:28.779  SET FEATURES [Enable SATA feature]

Error 2031 occurred at disk power-on lifetime: 424 hours (17 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 02 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 a0 00   2d+13:04:28.794  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+13:04:28.793  IDENTIFY DEVICE
  c8 00 08 00 00 00 e0 00   2d+13:04:28.779  READ DMA
  ef 10 02 00 00 00 a0 00   2d+13:04:28.779  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+13:04:28.778  IDENTIFY DEVICE

Error 2030 occurred at disk power-on lifetime: 424 hours (17 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 08 00 00 00 e0  Device Fault; Error: ABRT 8 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 00 00 00 e0 00   2d+13:04:28.779  READ DMA
  ef 10 02 00 00 00 a0 00   2d+13:04:28.779  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+13:04:28.778  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   2d+13:04:28.778  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 00   2d+13:04:28.778  SET FEATURES [Enable SATA feature]

Error 2029 occurred at disk power-on lifetime: 424 hours (17 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 02 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 a0 00   2d+13:04:28.779  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+13:04:28.778  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   2d+13:04:28.778  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 00   2d+13:04:28.778  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 00   2d+13:04:28.777  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-9-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N5EKLTNX
LU WWN Device Id: 5 0014ee 2bb548051
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun 20 08:50:48 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (39540) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 397) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x703d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   180   179   021    Pre-fail  Always       -       5975
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2443
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2443
194 Temperature_Celsius     0x0022   115   107   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 45 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 45 occurred at disk power-on lifetime: 2416 hours (100 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 02 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 a0 08      04:26:20.066  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      04:26:20.066  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      04:26:20.066  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 08      04:26:20.065  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      04:26:20.065  IDENTIFY DEVICE

Error 44 occurred at disk power-on lifetime: 2416 hours (100 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 08      04:26:20.066  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 08      04:26:20.065  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      04:26:20.065  IDENTIFY DEVICE
  ef 10 02 00 00 00 a0 08      04:26:20.046  SET FEATURES [Enable SATA feature]

Error 43 occurred at disk power-on lifetime: 2416 hours (100 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 02 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 a0 08      04:26:20.065  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      04:26:20.065  IDENTIFY DEVICE
  ef 10 02 00 00 00 a0 08      04:26:20.046  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      04:26:20.046  IDENTIFY DEVICE

Error 42 occurred at disk power-on lifetime: 2416 hours (100 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 02 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 10 02 00 00 00 a0 08      04:26:20.046  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      04:26:20.046  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      04:26:20.046  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 08      04:26:20.045  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      04:26:20.045  IDENTIFY DEVICE

Error 41 occurred at disk power-on lifetime: 2416 hours (100 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 08      04:26:20.046  SET FEATURES [Set transfer mode]
  ef 10 02 00 00 00 a0 08      04:26:20.045  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      04:26:20.045  IDENTIFY DEVICE
  ef 10 02 00 00 00 a0 08      04:26:20.030  SET FEATURES [Enable SATA feature]

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Both drives are under warranty, so they can be replaced if they turn out to be faulty.


After replacing the SATA cables (especially the SATA v1 cable)

So what exactly happened after I replaced the two SATA cables?

  • First, as I mentioned in my question, both drives were read without errors!

  • Second, knowing that these errors could be write-related, I also performed a write test!


[Screenshot: after replacing the SATA cables]

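You can see for yourself that there are no more errors in dmesg, which makes me happy and confirms my theory. I am a bit sad, since I did not realize how old that cable was when I built the server. Anyway, the problem is solved, at least for now.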


The only thing I can think of is to use smartctl to check whether the hard drives have any problems. Run:

smartctl -a -x /dev/sdX

replacing the trailing X with the appropriate letter, and look carefully for any uncorrectable sectors, or for bad sectors that keep accumulating.
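To make that check repeatable, here is a sketch that pulls out the attributes usually watched for this: 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) for bad or unreadable sectors, plus 199 (UDMA_CRC_Error_Count), which counts interface CRC errors and is the one commonly linked to cabling. It assumes the default attribute table printed by `smartctl -a` or `smartctl -A`; the `-x` output uses a shorter "brief" layout that this parser does not handle. The function names are hypothetical.

```shell
#!/bin/sh
# smart_watch_attrs (hypothetical): read the default-format attribute table
# from `smartctl -A` / `smartctl -a` on stdin and print "ID NAME RAW_VALUE"
# for attributes 5, 197, 198 and 199. Rows with fewer than 10 columns
# (e.g. the selective self-test log) are ignored.
smart_watch_attrs() {
    awk '$1 ~ /^(5|197|198|199)$/ && NF >= 10 { print $1, $2, $10 }'
}

# smart_check_attrs (hypothetical): same listing, but exit with status 1
# if any of those raw values is non-zero.
smart_check_attrs() {
    smart_watch_attrs | awk '
        { print }
        $3 + 0 > 0 { bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (device name is a placeholder): smartctl -A /dev/sdX | smart_check_attrs
```

In the two reports posted in the question, all four raw values are 0 on both drives.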

In particular, if the SMART report does not indicate a problem with the hard drive and you see an improvement after replacing the cable, I think it is reasonable to say the cable was the problem.
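One way to make "an improvement after replacing the cable" concrete is to compare the drive's cumulative error counter before and after the swap: save one report before, run the same load again with the new cable, save a second report, and check whether the counter moved. In this thread the counters stood at 2033 and 45. The sketch below assumes the summary log printed by `smartctl -a` ("ATA Error Count: N"); it also accepts "Device Error Count: N", which is how I understand the extended log shown by `smartctl -x` labels it. The function names and file names are hypothetical.

```shell
#!/bin/sh
# ata_error_count (hypothetical): print the cumulative error counter from
# smartctl output on stdin. Matches "ATA Error Count: N" (smartctl -a) and
# "Device Error Count: N" (extended log). Prints 0 if neither is present.
ata_error_count() {
    awk '/^(ATA|Device) Error Count:/ { n = $4 } END { print n + 0 }'
}

# error_count_grew BEFORE AFTER (hypothetical): compare two saved reports;
# exit 0 if the counter increased between them, 1 otherwise.
error_count_grew() {
    before=$(ata_error_count < "$1")
    after=$(ata_error_count < "$2")
    echo "before=$before after=$after"
    [ "$after" -gt "$before" ]
}

# Usage:
#   smartctl -a /dev/sdX > before.txt   # old cable
#   ... swap the cable, repeat the read/write load ...
#   smartctl -a /dev/sdX > after.txt
#   error_count_grew before.txt after.txt && echo "new errors logged"
```

A counter that stays flat under the same load is consistent with the cable theory, though it does not rule out other causes on its own.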


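The relevant SATA versions were released in 2002 (1.5 Gb/s), 2005 (3.0 Gb/s) and 2008 (6.0 Gb/s), so your cable is from the 1.5 or 3.0 Gb/s era. In theory an older cable should work with newer, faster devices, but problems with this combination are well known.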

You can check the current SATA link speed with:

smartctl -a /dev/sda | grep SATA
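The line that command prints looks like `SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)`, where the first figure is the drive's maximum and "current" is the negotiated link rate. A current value below the maximum means the link came up slower, which can happen when libata lowers the speed after repeated link errors, or when it is forced down. A small parsing sketch follows; the function names are hypothetical.

```shell
#!/bin/sh
# sata_link_speeds (hypothetical): read smartctl output on stdin and print
# "max=X current=Y" from the "SATA Version is:" line. Prints nothing if the
# line does not include a "(current: ...)" value.
sata_link_speeds() {
    sed -n 's|^SATA Version is:.*, \([0-9.]*\) Gb/s (current: \([0-9.]*\) Gb/s).*|max=\1 current=\2|p'
}

# link_downgraded (hypothetical): exit 0 if the negotiated speed is lower
# than the maximum, 1 otherwise (including when no speed line was found).
link_downgraded() {
    sata_link_speeds | awk -F'[= ]' '
        { if ($4 + 0 < $2 + 0) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (device name is a placeholder): smartctl -a /dev/sda | sata_link_speeds
```

Both drives in the question report `6.0 Gb/s (current: 6.0 Gb/s)`, i.e. the links were running at full speed when those reports were taken.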

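You can make the kernel configure the link at a lower speed with the kernel parameter libata.force=1.5. If the problem goes away with the old cable plus this kernel parameter, I would be quite sure the cable is the problem.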
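On Debian the parameter can be tried for a single boot by pressing e in the GRUB menu and appending it to the `linux` line, or made persistent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running `update-grub`. The kernel's parameter documentation spells the value as `1.5Gbps` and also allows limiting one port only (e.g. `libata.force=1:1.5Gbps`; the port number here is just an example). After rebooting, the sketch below checks that the parameter is active and which speed each link came up at, by parsing the kernel's "ataN: SATA link up X Gbps" messages. The function names are hypothetical, and reading dmesg may require root.

```shell
#!/bin/sh
# libata_force_arg (hypothetical): print any libata.force=... option found in
# a kernel command line read on stdin (use: libata_force_arg < /proc/cmdline).
libata_force_arg() {
    tr ' ' '\n' | grep '^libata\.force='
}

# link_up_speeds (hypothetical): read kernel log lines on stdin and print
# "PORT SPEED" for every "SATA link up" message (use: dmesg | link_up_speeds).
link_up_speeds() {
    sed -n 's/.*\(ata[0-9.]*\): SATA link up \([0-9.]*\) Gbps.*/\1 \2/p'
}
```

If the forced limit took effect, the affected ports should report 1.5, and the same read test can then be repeated on the old cable.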

