"kernel: Buffer I/O error on device" - do I have a hardware problem on my server?


We have a Linux DB server running Red Hat 7.2.

We are seeing many messages like the following for all of the mounted disks. How can we tell whether this behavior is related to a hardware problem?

From /var/log/messages:

Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4980*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4981*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4982*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4983*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4984*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4985*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4986*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4987*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4988*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4989*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4990*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4991*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4992*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4993*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4994*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4995*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4996*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4997*

We also saw these messages:

Mar 27 09:18:08 server_DB smartd[1734]: Monitoring 0 ATA and 26 SCSI devices
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:02*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:02*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:01*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:01*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:80/0000:80*CO*/0000:81*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:80/0000:80*CO*/0000:81*CO*': not supported by any plugin

We also checked the disk:

smartctl -a -d megaraid,0 /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST600MM0238
Revision:             BS04
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Formatted with type 2 protection
Logical block provisioning type unreported, LBPME=0, LBPRZ=0
Rotation Rate:        10000 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c500a0f28343
Serial number:        W0M0LYD2
Device type:          disk
Transport protocol:   SAS
Local Time is:        Wed Mar 27 10:51:30 2019 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     24 C
Drive Trip Temperature:        60 C

Manufactured in week 45 of year 2017
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  50
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  177
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 412242328
  Blocks received from initiator = 3213595579
  Blocks read from cache and sent to initiator = 312462212
  Number of read and write commands whose size <= segment size = 31915885
  Number of read and write commands whose size > segment size = 0

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 3178.45
  number of minutes until next internal SMART test = 12

Answer 1

The I/O error messages are there to warn you about a hardware error on sdb, for example with the disk itself or its cable.
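One quick check on the disk itself is to pull the headline health fields out of the `smartctl -a` output shown in the question. The sketch below is a minimal example that assumes the field labels printed by smartctl 6.2 for this SAS drive ("SMART Health Status:" and "Elements in grown defect list:"); ATA drives and other smartctl versions label things differently, and the function names are hypothetical.

```python
import re

def summarize_smartctl(text):
    """Extract a few health indicators from `smartctl -a` output (SAS drive).

    Assumption: labels match the smartctl 6.2 SAS output in the question.
    Missing fields are left as None.
    """
    summary = {"health": None, "grown_defects": None}
    for line in text.splitlines():
        m = re.match(r"\s*SMART Health Status:\s*(\S+)", line)
        if m:
            summary["health"] = m.group(1)
        m = re.match(r"\s*Elements in grown defect list:\s*(\d+)", line)
        if m:
            summary["grown_defects"] = int(m.group(1))
    return summary

def looks_healthy(summary):
    # Treat the drive as healthy only if it reports OK and has no remapped
    # (grown) defects; a missing field counts as "not known to be healthy".
    return summary["health"] == "OK" and summary["grown_defects"] == 0
```

For the output in the question this gives a health status of OK with 0 grown defects. A clean SMART report only speaks for the drive; it does not rule out the cable or the controller.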

If many disks are showing errors at the same time, it is very unlikely that the disks themselves are all faulty :-). It could be a bug in the disk controller.
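To see how many disks are actually affected, you could count the "Buffer I/O error" lines per device. This is a rough sketch: the regex assumes the message format shown in the question, and folding partitions such as `sdb1` into `sdb` is a heuristic that only covers `sdX` names. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines like "kernel: Buffer I/O error on device sdb, logical block 4980"
BUFFER_IO_RE = re.compile(r"Buffer I/O error on device (\S+?),")

def errors_per_device(lines):
    """Count 'Buffer I/O error' messages per whole disk."""
    counts = Counter()
    for line in lines:
        m = BUFFER_IO_RE.search(line)
        if m:
            # Heuristic: fold sdX partitions (sdb1) into the whole disk (sdb)
            dev = re.sub(r"^(sd[a-z]+)\d+$", r"\1", m.group(1))
            counts[dev] += 1
    return counts

def suspect(counts):
    """Turn per-disk counts into a coarse hint about where to look."""
    if not counts:
        return "no buffer I/O errors"
    if len(counts) == 1:
        return "single disk: check that disk and its cable"
    return "several disks: suspect a shared component such as the controller"
```

You would feed it the lines of /var/log/messages (reading that file usually needs root). The verdict is only a hint in the spirit of the answer above, not a diagnosis.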

If you see "Buffer I/O error" but no specific messages with ATA or SCSI error codes, or the usual retries, that might give you a clue. But I really don't know :-).
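A rough way to check for that pattern is to scan the same log for lower-level ATA/SCSI error lines alongside the buffer I/O errors. The patterns below are illustrative assumptions based on common kernel wording (for example "ataN.NN: exception ..." or "[sdb] ... FAILED Result: ..."); exact messages vary by kernel version and driver, and a RAID controller such as the MegaRAID one here may log in its own format. The function name is hypothetical.

```python
import re

# Assumed, illustrative patterns for low-level ATA/SCSI error messages.
LOW_LEVEL_PATTERNS = [
    re.compile(r"\bata\d+(\.\d+)?: (exception|failed command|error)"),
    re.compile(r"\[sd[a-z]+\].*(FAILED Result|Sense Key|Add\. Sense)"),
]

def classify(lines):
    """Report whether buffer I/O errors come with ATA/SCSI detail."""
    buffer_errors = 0
    low_level = 0
    for line in lines:
        if "Buffer I/O error on device" in line:
            buffer_errors += 1
        elif any(p.search(line) for p in LOW_LEVEL_PATTERNS):
            low_level += 1
    if buffer_errors == 0:
        return "no buffer I/O errors"
    if low_level == 0:
        return "buffer I/O errors only: no ATA/SCSI detail, cause unclear"
    return "buffer I/O errors with ATA/SCSI errors: read the low-level messages"
```

This does not correlate messages by timestamp or device, so on a busy log it is only a first filter before reading the surrounding lines yourself.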

Of course, a software bug can make any message appear :-).

As an example of a software bug, although I know it is not the same error: Fedora bug 1553979, where the kernel logs "Buffer I/O error" but there are no ATA or SCSI error messages and no retries.


The "Buffer" part means the error happened during a request for file data that can be cached in the page cache. For historical reasons, people sometimes call these requests "buffered IO".
