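I'm trying to set up a new file server with the same storage configuration as an existing one, but the process fails and I can't figure out why. My goal is to create a TrueCrypt volume on top of a RAID 10 volume. However, as soon as I start truecrypt -c, the RAID volume degrades. The same procedure worked on the old server, so I don't understand what is going on.
My procedure: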
# create a data partition on each disk (/dev/sdb, /dev/sdc, /dev/sdd, /dev/sde):
fdisk /dev/sdX
new, p, 1, 4096, 2930273071, type, da, write
# combine data partitions into raid10 array:
mdadm --create /dev/md0 -v --raid-devices=4 --chunk=512 --level=raid10 --layout=f2 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
# create a truecrypt volume on the new data partition /dev/md0:
truecrypt -c /dev/md0
Shortly after truecrypt starts, mdadm reports a component failure on one or more of the disks.
$ cat /proc/mdstat; echo; mdadm --misc --detail /dev/md0
Personalities : [raid10]
md0 : active raid10 sdd1[4] sdc1[1] sde1[3] sdb1[0](F)
2930006016 blocks super 1.2 512K chunks 2 far-copies [4/3] [_UUU]
unused devices: <none>
/dev/md0:
Version : 1.2
Creation Time : Fri Sep 21 14:27:31 2012
Raid Level : raid10
Array Size : 2930006016 (2794.27 GiB 3000.33 GB)
Used Dev Size : 1465003008 (1397.14 GiB 1500.16 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Tue Sep 25 10:54:06 2012
State : active, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Layout : far=2
Chunk Size : 512K
Name : emma:0 (local to host emma)
UUID : 21c2f9b7:923dacab:805375f8:96a2959b
Events : 33268
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 33 1 active sync /dev/sdc1
4 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
0 8 17 - faulty spare /dev/sdb1
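The degraded state is visible in the member-status string `[_UUU]` above, where `_` marks a missing or failed member. As a rough sketch (the function name `degraded_arrays` is made up for illustration, and it assumes the two-line mdstat layout shown above), the same check can be scripted:

```shell
#!/bin/sh
# List md arrays whose member-status string (e.g. [_UUU]) contains "_",
# which marks a missing or failed member.
# Reads /proc/mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }              # remember the current array
        name != "" && match($0, /\[[U_]+\]/) {   # status string such as [_UUU]
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name, status
            name = ""
        }
    '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

Fed the mdstat output above, this should report md0 together with its `[_UUU]` status; a healthy array produces no output.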
dmesg showed the following error messages:
[326876.652057] ata3.00: status: { DRDY ERR }
[326876.652801] ata3.00: error: { ABRT }
[326876.653543] ata3.00: failed command: WRITE FPDMA QUEUED
[326876.654301] ata3.00: cmd 61/80:f0:80:f6:58/00:00:57:00:00/40 tag 30 ncq 65536 out
[326876.654301] res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
[326876.655812] ata3.00: status: { DRDY ERR }
[326876.656563] ata3.00: error: { ABRT }
[326876.657326] ata3: hard resetting link
[326876.657328] ata3: nv: skipping hardreset on occupied port
[326877.124117] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[326877.138346] ata3.00: configured for UDMA/133
[326877.138397] sd 2:0:0:0: [sdb]
[326877.138399] Result: hostbyte=0x00 driverbyte=0x08
[326877.138402] sd 2:0:0:0: [sdb]
[326877.138404] Sense Key : 0xb [current] [descriptor]
[326877.138408] Descriptor sense data with sense descriptors (in hex):
[326877.138411] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326877.138423] 00 00 00 00
[326877.138428] sd 2:0:0:0: [sdb]
[326877.138430] ASC=0x0 ASCQ=0x0
[326877.138434] sd 2:0:0:0: [sdb] CDB:
[326877.138435] cdb[0]=0x2a: 2a 00 57 58 f0 80 00 00 80 00
[326877.138446] end_request: I/O error, dev sdb, sector 1465446528
[326877.138844] sd 2:0:0:0: [sdb]
[326877.138846] Result: hostbyte=0x00 driverbyte=0x08
[326877.138847] sd 2:0:0:0: [sdb]
[326877.138849] Sense Key : 0xb [current] [descriptor]
[326877.138851] Descriptor sense data with sense descriptors (in hex):
[326877.138852] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326877.138860] 00 00 00 00
[326877.138864] sd 2:0:0:0: [sdb]
[326877.138865] ASC=0x0 ASCQ=0x0
[326877.138867] sd 2:0:0:0: [sdb] CDB:
[326877.138868] cdb[0]=0x2a: 2a 00 57 58 f1 00 00 00 80 00
[326877.138875] end_request: I/O error, dev sdb, sector 1465446656
[326877.139208] sd 2:0:0:0: [sdb]
[326877.139210] Result: hostbyte=0x00 driverbyte=0x08
[326877.139212] sd 2:0:0:0: [sdb]
[326877.139213] Sense Key : 0xb [current] [descriptor]
[326877.139215] Descriptor sense data with sense descriptors (in hex):
[326877.139217] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326877.139224] 00 00 00 00
...
[326877.155726] sd 2:0:0:0: [sdb]
[326877.155727] ASC=0x0 ASCQ=0x0
[326877.155729] sd 2:0:0:0: [sdb] CDB:
[326877.155730] cdb[0]=0x2a: 2a 00 57 58 f6 80 00 00 80 00
[326877.155736] end_request: I/O error, dev sdb, sector 1465448064
[326877.155987] ata3: EH complete
[326877.281684] md/raid10:md0: Disk failure on sdb1, disabling device.
[326877.281684] md/raid10:md0: Operation continuing on 3 devices.
[326877.801033] RAID10 conf printout:
[326877.801038] --- wd:3 rd:4
[326877.801040] disk 0, wo:1, o:0, dev:sdb1
[326877.801042] disk 1, wo:0, o:1, dev:sdc1
[326877.801044] disk 2, wo:0, o:1, dev:sdd1
[326877.801046] disk 3, wo:0, o:1, dev:sde1
[326877.801071] RAID10 conf printout:
[326877.801074] --- wd:3 rd:4
[326877.801076] disk 1, wo:0, o:1, dev:sdc1
[326877.801078] disk 2, wo:0, o:1, dev:sdd1
[326877.801079] disk 3, wo:0, o:1, dev:sde1
[326899.233166] ata4: EH in SWNCQ mode,QC:qc_active 0x7 sactive 0x7
[326899.233384] ata4: SWNCQ:qc_active 0x1 defer_bits 0x6 last_issue_tag 0x0
[326899.233384] dhfis 0x1 dmafis 0x0 sdbfis 0x0
[326899.233643] ata4: ATA_REG 0x41 ERR_REG 0x4
[326899.233775] ata4: tag : dhfis dmafis sdbfis sactive
[326899.234078] ata4: tag 0x0: 1 0 0 1
[326899.234458] ata4.00: exception Emask 0x1 SAct 0x7 SErr 0x0 action 0x6 frozen
[326899.234843] ata4.00: Ata error. fis:0x41
[326899.235230] ata4.00: failed command: WRITE FPDMA QUEUED
[326899.235617] ata4.00: cmd 61/80:00:80:e0:5b/00:00:57:00:00/40 tag 0 ncq 65536 out
[326899.235617] res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
[326899.236423] ata4.00: status: { DRDY ERR }
[326899.236818] ata4.00: error: { ABRT }
[326899.237200] ata4.00: failed command: WRITE FPDMA QUEUED
[326899.237609] ata4.00: cmd 61/80:08:00:e1:5b/00:00:57:00:00/40 tag 1 ncq 65536 out
[326899.237609] res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
[326899.238428] ata4.00: status: { DRDY ERR }
[326899.238865] ata4.00: error: { ABRT }
[326899.239288] ata4.00: failed command: WRITE FPDMA QUEUED
[326899.239730] ata4.00: cmd 61/80:10:80:e1:5b/00:00:57:00:00/40 tag 2 ncq 65536 out
[326899.239730] res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
[326899.240682] ata4.00: status: { DRDY ERR }
[326899.241162] ata4.00: error: { ABRT }
[326899.241653] ata4: hard resetting link
[326899.241654] ata4: nv: skipping hardreset on occupied port
[326899.760685] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[326899.774644] ata4.00: configured for UDMA/133
[326899.774695] sd 3:0:0:0: [sdc]
[326899.774698] Result: hostbyte=0x00 driverbyte=0x08
[326899.774700] sd 3:0:0:0: [sdc]
[326899.774702] Sense Key : 0xb [current] [descriptor]
[326899.774707] Descriptor sense data with sense descriptors (in hex):
[326899.774709] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326899.774721] 00 00 00 00
[326899.774727] sd 3:0:0:0: [sdc]
[326899.774728] ASC=0x0 ASCQ=0x0
[326899.774732] sd 3:0:0:0: [sdc] CDB:
[326899.774734] cdb[0]=0x2a: 2a 00 57 5b e0 80 00 00 80 00
[326899.774744] end_request: I/O error, dev sdc, sector 1465639040
[326899.775097] sd 3:0:0:0: [sdc]
[326899.775098] Result: hostbyte=0x00 driverbyte=0x08
[326899.775100] sd 3:0:0:0: [sdc]
[326899.775102] Sense Key : 0xb [current] [descriptor]
[326899.775104] Descriptor sense data with sense descriptors (in hex):
[326899.775105] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326899.775113] 00 00 00 00
[326899.775117] sd 3:0:0:0: [sdc]
[326899.775118] ASC=0x0 ASCQ=0x0
[326899.775120] sd 3:0:0:0: [sdc] CDB:
[326899.775121] cdb[0]=0x2a: 2a 00 57 5b e1 00 00 00 80 00
[326899.775128] end_request: I/O error, dev sdc, sector 1465639168
[326899.775404] sd 3:0:0:0: [sdc]
[326899.775405] Result: hostbyte=0x00 driverbyte=0x08
[326899.775407] sd 3:0:0:0: [sdc]
[326899.775408] Sense Key : 0xb [current] [descriptor]
[326899.775410] Descriptor sense data with sense descriptors (in hex):
[326899.775412] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326899.775420] 00 00 00 00
[326899.775423] sd 3:0:0:0: [sdc]
[326899.775424] ASC=0x0 ASCQ=0x0
[326899.775427] sd 3:0:0:0: [sdc] CDB:
[326899.775428] cdb[0]=0x2a: 2a 00 57 5b e1 80 00 00 80 00
[326899.775434] end_request: I/O error, dev sdc, sector 1465639296
[326899.775691] ata4: EH complete
[326899.830768] Buffer I/O error on device md0p1, logical block 1474688
[326899.830965] lost page write due to I/O error on md0p1
[326899.831257] Buffer I/O error on device md0p1, logical block 1474689
[326899.831419] lost page write due to I/O error on md0p1
[326899.831424] Buffer I/O error on device md0p1, logical block 1474690
[326899.831585] lost page write due to I/O error on md0p1
[326899.831589] Buffer I/O error on device md0p1, logical block 1474691
[326899.831751] lost page write due to I/O error on md0p1
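As a cross-check on the log above, the failing sector can be recovered from the CDB bytes themselves. Opcode 0x2a is WRITE(10), whose bytes 2 to 5 hold the big-endian LBA and bytes 7 to 8 the transfer length in sectors. A minimal sketch, with `cdb10_decode` as a hypothetical helper name and no validation of the opcode:

```shell
#!/bin/sh
# Decode the LBA and transfer length from a 10-byte READ(10)/WRITE(10) CDB
# given as space-separated hex bytes, as printed in the dmesg output.
cdb10_decode() {
    set -- $1                      # split the hex string into $1..$10
    # Bytes 2-5 (here $3..$6): big-endian logical block address.
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    # Bytes 7-8 (here $8..$9): big-endian transfer length in sectors.
    len=$(( (0x$8 << 8) | 0x$9 ))
    echo "lba=$lba sectors=$len"
}

cdb10_decode '2a 00 57 58 f0 80 00 00 80 00'
# prints: lba=1465446528 sectors=128
```

The decoded LBA matches the "I/O error, dev sdb, sector 1465446528" line, and 128 sectors of 512 bytes is 65536 bytes, which matches the "ncq 65536 out" in the failed command.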
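I don't think these are real disk errors, because a) smartd finds nothing wrong with the disks and b) I can create a full TrueCrypt volume on each disk individually.
I also tried creating a partition on /dev/md0 (both 83/Linux and da/Non-FS data) and then creating the TrueCrypt volume on the /dev/md0p1 partition (which is where this dmesg output comes from), but that didn't work either.
I assume TrueCrypt is somehow clobbering important mdadm metadata. Strangely, though, this procedure worked fine before. What is going on here?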
[root@emma]# uname -a
Linux emma 3.5.4-1-ARCH #1 SMP PREEMPT Sat Sep 15 08:12:04 CEST 2012 x86_64 GNU/Linux
[root@emma]# mdadm --version
mdadm - v3.2.5 - 18th May 2012
[root@emma]# truecrypt --version
TrueCrypt 7.1a
[root@emma]# fdisk -l
Disk /dev/sda: 160.0 GB, 160040803840 bytes
255 heads, 63 sectors/track, 19457 cylinders, total 312579695 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe256e256
Device Boot Start End Blocks Id System
/dev/sda1 * 63 208844 104391 83 Linux
/dev/sda2 208845 738989 265072+ 82 Linux swap / Solaris
/dev/sda3 738990 62187614 30724312+ 83 Linux
/dev/sda4 62187615 312579694 125196040 83 Linux
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
16 heads, 62 sectors/track, 2953908 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcbb904fc
Device Boot Start End Blocks Id System
/dev/sdb1 4096 2930273071 1465134488 da Non-FS data
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
16 heads, 62 sectors/track, 2953908 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6978c214
Device Boot Start End Blocks Id System
/dev/sdc1 4096 2930273071 1465134488 da Non-FS data
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
16 heads, 62 sectors/track, 2953908 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x8dd1e314
Device Boot Start End Blocks Id System
/dev/sdd1 4096 2930273071 1465134488 da Non-FS data
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
16 heads, 62 sectors/track, 2953908 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x70b7ece7
Device Boot Start End Blocks Id System
/dev/sde1 4096 2930273071 1465134488 da Non-FS data
Disk /dev/md0: 3000.3 GB, 3000326160384 bytes
2 heads, 3 sectors/track, 976668672 cylinders, total 5860012032 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 2097152 bytes
Disk identifier: 0xce6d6f88
Device Boot Start End Blocks Id System
/dev/md0p1 4096 4294967294 2147481599+ 83 Linux
Edit: When this procedure worked, I was probably using TrueCrypt 6. I'll try version 6, see what happens, and update with the results.
Answer 1
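Those error messages read like disk errors, not metadata corruption. They come from libata, not from mdraid.
The actual problem may not be the disks themselves, though. It could be, for example, a faulty SATA driver, a faulty SATA controller, a bad connector, or a bad cable.
Because the I/O pattern is different, you may only see the problem when writing to the mdraid array. But even if everything else appears to work, I'm fairly sure the setup won't be reliable, because the driver or the hardware really is flaky.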
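One way to judge whether a single disk or a shared component is involved is to count the block-layer I/O errors per device; in the dmesg excerpt from the question they appear on both sdb and sdc. A rough sketch, assuming the "I/O error, dev sdX, sector N" message format shown in that log (the function name `io_errors_by_device` is made up for illustration):

```shell
#!/bin/sh
# Count block-layer "I/O error, dev <name>" lines per device.
# Reads kernel log text on stdin; prints "<device> <count>" sorted by device.
io_errors_by_device() {
    awk '
        /I\/O error, dev / {
            for (i = 1; i < NF; i++)
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)      # strip the trailing comma
                    count[dev]++
                }
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}

# Typical use on a live system:
#   dmesg | io_errors_by_device
```

Lines such as "Buffer I/O error on device md0p1" are deliberately not counted, since they are a consequence reported against the array rather than against a member disk.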
By the way, what did smartctl -x report? And smartctl -a? Are there any SATA error counters?
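For collecting that output from all four array members, something like the following could be used. It is only a sketch: `smart_check_commands` is a hypothetical helper that prints the commands rather than running them, and the disk names are taken from the question. `-x` is smartctl's extended report and `-l sataphy` prints just the SATA Phy event counters.

```shell
#!/bin/sh
# Print (not run) the smartctl invocations for each named disk,
# so they can be reviewed before being executed as root.
smart_check_commands() {
    for disk in "$@"; do
        echo "smartctl -x /dev/$disk"           # extended SMART and non-SMART report
        echo "smartctl -l sataphy /dev/$disk"   # SATA Phy event counters only
    done
}

# Typical use (as root):
#   smart_check_commands sdb sdc sdd sde | sh
```

Counters that grow while the array is being written to would point toward the link (cable, connector, controller) rather than the disk media.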