Ceph Storage MDS reports slow metadata IOs

I use Ceph storage in my lab and have only one server, so I am trying to install all services (MON, OSD, MDS, and so on) on a single machine.

I created two disks using loop devices. (The server has SSD disks, so the speed is very good.)

root@ceph2# losetup -a
/dev/loop1: [64769]:26869770 (/root/100G-2.img)
/dev/loop0: [64769]:26869769 (/root/100G-1.img)

This is what my ceph -s output looks like:

root@ceph2# ceph -s
  cluster:
    id:     1106ae5c-e5bf-4316-8185-3e559d246ac5
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            Reduced data availability: 65 pgs inactive
            Degraded data redundancy: 65 pgs undersized

  services:
    mon: 1 daemons, quorum ceph2 (age 8m)
    mgr: ceph2(active, since 9m)
    mds: 1/1 daemons up
    osd: 2 osds: 2 up (since 20m), 2 in (since 38m)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 0 objects, 0 B
    usage:   11 MiB used, 198 GiB / 198 GiB avail
    pgs:     100.000% pgs not active
             65 undersized+peered

I don't know where the MDS slow IO warning comes from, and ceph mds stat stays in the creating state.

root@ceph2# ceph mds stat
cephfs:1 {0=ceph2=up:creating}

Here is the health detail:

root@ceph2# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 65 pgs inactive; Degraded data redundancy: 65 pgs undersized
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ceph2(mds.0): 31 slow metadata IOs are blocked > 30 secs, oldest blocked for 864 secs
[WRN] PG_AVAILABILITY: Reduced data availability: 65 pgs inactive
    pg 1.0 is stuck inactive for 22m, current state undersized+peered, last acting [1]
    pg 2.0 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.2 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.3 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.4 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.5 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.6 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.7 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.8 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.c is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.d is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.e is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.f is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.10 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.11 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.12 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.13 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.14 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.15 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.16 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.17 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.18 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.19 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1a is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1b is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.0 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.2 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.3 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.4 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.5 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.6 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.7 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.9 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.c is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.d is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.e is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.f is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.10 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.11 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.12 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.13 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.14 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.15 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.16 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.17 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.18 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.19 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1a is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1b is stuck inactive for 14m, current state undersized+peered, last acting [0]
[WRN] PG_DEGRADED: Degraded data redundancy: 65 pgs undersized
    pg 1.0 is stuck undersized for 22m, current state undersized+peered, last acting [1]
    pg 2.0 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.2 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.3 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.4 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.5 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.6 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.7 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.8 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.c is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.d is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.e is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.f is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.10 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.11 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.12 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.13 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.14 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.15 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.16 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.17 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.18 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.19 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1a is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1b is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.0 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.2 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.3 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.4 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.5 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.6 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.7 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.9 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.c is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.d is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.e is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.f is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.10 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.11 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.12 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.13 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.14 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.15 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.16 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.17 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.18 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.19 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1a is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1b is stuck undersized for 14m, current state undersized+peered, last acting [0]

What could be wrong here? Do you think it is because I have only one server and only two OSDs?

Answer 1

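The MDS reports slow metadata IOs because it cannot reach any PGs: all of your PGs are "inactive". Once the PGs become active, the warning will eventually go away. The default CRUSH rule gives each pool a size of 3, which can never be satisfied with only two OSDs. You will also have to change osd_crush_chooseleaf_type to 0 so that the OSD, not the host, is the CRUSH failure domain. Then change the pool size to 2 so that every PG fits on your two OSDs. Note that a pool size of 2 is only suitable for testing and is not recommended for production use if you care about your data.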
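
The two changes described above could be applied to an existing cluster roughly as follows. This is a minimal sketch, not a verified procedure for this cluster. It assumes that the CRUSH root is named default and that a new replicated rule with the OSD as failure domain is acceptable; the rule name and pool names are hypothetical placeholders. As far as I know, osd_crush_chooseleaf_type only affects the rule generated when the initial CRUSH map is created, which is why the sketch creates a new rule instead. The helper only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: prints the ceph commands by default (dry run).
# Set RUN=1 in the environment to execute them against a real cluster.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Usage: relax_replication RULE_NAME SIZE POOL...
# RULE_NAME is a hypothetical name for the new CRUSH rule (an assumption,
# not something taken from the question).
relax_replication() {
    rule=$1
    size=$2
    shift 2
    # Replicated rule under the assumed "default" root with "osd" as the
    # failure domain, so replicas may land on two OSDs of the same host.
    run ceph osd crush rule create-replicated "$rule" default osd
    for pool in "$@"; do
        # Point the pool at the new rule, then lower its replica count.
        run ceph osd pool set "$pool" crush_rule "$rule"
        run ceph osd pool set "$pool" size "$size"
    done
}
```

For example, relax_replication replicated_osd 2 followed by the pool names reported by ceph osd pool ls would print one rule-creation command and two pool set commands per pool. Reviewing the printed commands before running with RUN=1 is the reason for the dry-run default.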