Does this smartctl output indicate that my hard drive is faulty?

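About a week ago I found that the file system on one of my drives was corrupted. I repaired it with fsck several times, but the problems kept coming back, so I suspect a hardware fault. I recently learned about smartmontools, which makes use of the self-logging features built into some disks. After running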

smartctl -t long -C /dev/sda

and then waiting the specified time, I accessed the log with the following command:

smartctl -a /dev/sda

This is the output. The drive is marked "PASSED", but 5 errors were detected. What do you think?

smartctl 7.3 2022-02-28 r5338 [armv7l-linux-5.15.74-gentoo] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Gold
Device Model:     WDC WD102KRYZ-01A5AB0
Serial Number:    VCJ720GP
LU WWN Device Id: 5 000cca 0b0df652b
Firmware Version: 01.01H01
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Jan 25 04:29:49 2023 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (   87) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (1103) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       96
  3 Spin_Up_Time            0x0007   253   253   024    Pre-fail  Always       -       85 (Average 82)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       618
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       8776
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       618
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1959
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1959
194 Temperature_Celsius     0x0002   105   105   000    Old_age   Always       -       57 (Min/Max 20/75)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 5
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 occurred at disk power-on lifetime: 7233 hours (301 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  27 00 00 00 00 00 e0 08      00:00:13.239  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      00:00:13.238  IDENTIFY DEVICE

Error 4 occurred at disk power-on lifetime: 7233 hours (301 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  27 00 00 00 00 00 e0 08      00:00:18.904  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      00:00:18.903  IDENTIFY DEVICE
  a1 00 00 00 00 00 a0 08      00:00:17.385  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 08      00:00:17.385  IDENTIFY DEVICE
  2f 00 01 10 00 00 a0 08      00:00:17.384  READ LOG EXT

Error 3 occurred at disk power-on lifetime: 7168 hours (298 days + 16 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 08 80 ff 3f 40  Error: UNC 8 sectors at LBA = 0x003fff80 = 4194176

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 80 ff 3f e0 08      00:01:38.837  READ DMA EXT
  ec 00 00 00 00 00 a0 08      00:01:38.833  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      00:01:38.685  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 08      00:01:38.683  IDENTIFY DEVICE
  25 00 08 80 ff 3f e0 08      00:01:36.902  READ DMA EXT

Error 2 occurred at disk power-on lifetime: 7168 hours (298 days + 16 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 08 00 00 00 40  Error: UNC 8 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 00 00 00 e0 08      00:01:24.305  READ DMA
  ec 00 00 00 00 00 a0 08      00:01:24.211  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      00:01:24.207  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 08      00:01:24.205  IDENTIFY DEVICE
  c8 00 08 00 00 00 e0 08      00:01:23.967  READ DMA

Error 1 occurred at disk power-on lifetime: 7168 hours (298 days + 16 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 08 00 00 00 40  Error: UNC 8 sectors at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 00 00 00 e0 08      00:01:23.968  READ DMA
  ec 00 00 00 00 00 a0 08      00:01:23.912  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      00:01:23.572  SET FEATURES [Set transfer mode]
  ef c3 01 00 00 00 a0 08      00:01:23.485  SET FEATURES [Sense Data Reporting]
  ec 00 00 00 00 00 a0 08      00:01:23.483  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay

Answer 1

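The SMART values in the VALUE column are normalized so that roughly 100 is healthy, and lower is worse. These values all look healthy. In particular, there are no uncorrectable or reallocated sectors, and the raw read error rate is good as well.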
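As a rough illustration of that reading, the attribute table can be checked mechanically. The sketch below is an assumption-laden helper, not part of smartmontools: `check_smart_attrs` is a hypothetical function name, and it expects the column layout shown in the question (ID, name, flag, VALUE, WORST, THRESH, type, updated, when-failed, raw). It flags an attribute whose normalized value has dropped to or below a nonzero threshold, and warns on nonzero raw counts for attributes 5, 197 and 198.

```shell
# Hypothetical helper: reads a "smartctl -A" style attribute table on stdin.
check_smart_attrs() {
    awk '
        # Attribute rows start with a numeric ID and have a 0x... flag in column 3.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ {
            id = $1 + 0; value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
            # An attribute counts as failed when VALUE <= THRESH (threshold 0 never trips).
            if (thresh > 0 && value <= thresh)
                printf "FAILING %s (value %d <= threshold %d)\n", $2, value, thresh
            # Nonzero raw counts for reallocated, pending or uncorrectable sectors.
            if ((id == 5 || id == 197 || id == 198) && raw > 0)
                printf "WARN %s raw=%d\n", $2, raw
            seen++
        }
        END { printf "checked %d attributes\n", seen }
    '
}
```

It could be used as `smartctl -A /dev/sda | check_smart_attrs`. On the table in the question it should print no FAILING or WARN lines, since every value is above its threshold and the raw counts for attributes 5, 197 and 198 are all 0.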

So run a long self-test to see whether any bad sectors turn up. You can also run a badblocks scan yourself, just in case.
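A minimal sketch of those two checks follows. Note that the output in the question says "No self-tests have been logged", so it is worth confirming that the long test actually ran to completion. The wrapper names `run` and `disk_checks` are made up for illustration, and the `-b 4096` block size is an assumption chosen to match the drive's 4096-byte physical sectors and keep the block count manageable on a 10 TB disk. By default the script only prints the commands; set `RUN=1` (as root, on the real device) to execute them.

```shell
# Print each command; execute it only when RUN=1 (needs root and a real disk).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Hypothetical wrapper around the checks suggested above.
disk_checks() {
    dev=$1
    run smartctl -t long "$dev"          # start the extended self-test
    run smartctl -c "$dev"               # self-test execution status and progress
    run smartctl -l selftest "$dev"      # self-test log; the result appears here when done
    run badblocks -b 4096 -sv "$dev"     # read-only scan (no -n or -w, so non-destructive)
}
```

Calling `disk_checks /dev/sda` lists the commands in order. The self-test runs inside the drive, so the log should be read only after the recommended polling time (1103 minutes in the output above) has passed, and the badblocks scan is best started after the self-test has finished.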

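Based on the information so far, I would also start looking for other causes, such as failing memory. It would be very helpful to know what was corrupted, what fsck repaired, and whether the damage relates to specific sectors of the hard drive or is spread all over the place.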
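One way to start on the "specific sectors or all over the place" question is to pull the addresses out of the ATA error log itself. The sketch below assumes the error-log text format shown in the question; `summarize_ata_errors` is a hypothetical helper name. It prints one line per uncorrectable (UNC) read error with the power-on hour at which it was logged.

```shell
# Hypothetical helper: reads "smartctl -l error" style text on stdin.
summarize_ata_errors() {
    awk '
        # Header line of each entry: "Error N occurred at disk power-on lifetime: H hours ..."
        /^Error [0-9]+ occurred at disk power-on lifetime:/ {
            num = $2; hours = $8
        }
        # Register line with a decoded UNC error; the decimal LBA is the last field.
        /Error: UNC/ {
            printf "error %s at %s power-on hours: UNC at LBA %s\n", num, hours, $NF
        }
    '
}
```

Used as `smartctl -l error /dev/sda | summarize_ata_errors`. For the log in the question this would list errors 3, 2 and 1, all at 7168 power-on hours, at LBA 4194176, 0 and 0, while the attribute table reports 8776 power-on hours. Comparing those addresses and times with what fsck reported may help narrow things down.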
