Q1

2024-5-23

bash text-processing slurm infiniband


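First things first: no knowledge of slurm or Infiniband is needed. This is purely a text-processing problem.
Second: I know about ib2slurm. The code is broken in some way and probably out of date; it produces a core dump every time it is run, regardless of the presence or format of the map file.

I can reduce the output of ibnetdiscover to 37-line chunks of the form: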

Switch  36 "S-0002c90200423e70"     # "MF0;ibsw20:SX6036/U1" enhanced port 0 lid 3 lmc 0
[1] "H-0002c903000c26f2"[1](2c903000c26f3)      # "compute061 HCA-1" lid 49 4xQDR
[2] "H-0002c903000bf36e"[1](2c903000bf36f)      # "compute060 HCA-1" lid 1 4xQDR
[3] "H-0002c903000bf35a"[1](2c903000bf35b)      # "compute063 HCA-1" lid 28 4xQDR
[4] "H-0002c903000c2646"[1](2c903000c2647)      # "compute062 HCA-1" lid 25 4xQDR
[5] "H-0002c903000bf35e"[1](2c903000bf35f)      # "compute064 HCA-1" lid 31 4xQDR
[6] "H-0002c903000c26de"[1](2c903000c26df)      # "compute065 HCA-1" lid 47 4xQDR
[7] "S-0002c90200423e80"[31]        # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
[8] "S-0002c90200423e80"[32]        # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
[9] "S-0002c90200423e80"[33]        # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
[10]    "S-0002c90200423e80"[34]        # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
[11]    "S-0002c90200423e80"[35]        # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
[12]    "S-0002c90200423e80"[36]        # "Infiniscale-IV Mellanox Technologies" lid 6 4xQDR
[13]    "S-0002c90200423eb8"[35]        # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
[14]    "S-0002c90200423eb8"[36]        # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
[15]    "S-0002c90200423eb8"[33]        # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
[16]    "S-0002c90200423eb8"[34]        # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
[17]    "S-0002c90200423eb8"[31]        # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
[18]    "S-0002c90200423eb8"[32]        # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
[19]    "S-0002c90200423ee0"[31]        # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
[20]    "S-0002c90200423ee0"[32]        # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
[21]    "S-0002c90200423ee0"[33]        # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
[22]    "S-0002c90200423ee0"[34]        # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
[23]    "S-0002c90200423ee0"[35]        # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
[24]    "S-0002c90200423ee0"[36]        # "Infiniscale-IV Mellanox Technologies" lid 15 4xQDR
[25]    "H-0002c903000c26fa"[1](2c903000c26fb)      # "compute046 HCA-1" lid 112 4xQDR
[26]    "H-0002c903000c26e2"[1](2c903000c26e3)      # "compute047 HCA-1" lid 63 4xQDR
[27]    "H-0002c903000c263a"[1](2c903000c263b)      # "compute048 HCA-1" lid 59 4xQDR
[28]    "H-0002c903000c27c2"[1](2c903000c27c3)      # "compute049 HCA-1" lid 117 4xQDR
[29]    "H-0002c903000c27a6"[1](2c903000c27a7)      # "compute051 HCA-1" lid 34 4xQDR
[30]    "H-0002c903000c2732"[1](2c903000c2733)      # "compute050 HCA-1" lid 22 4xQDR
[31]    "H-0002c903000c265e"[1](2c903000c265f)      # "compute052 HCA-1" lid 29 4xQDR
[32]    "H-0002c903000c266a"[1](2c903000c266b)      # "compute055 HCA-1" lid 32 4xQDR
[33]    "H-0002c903000c264e"[1](2c903000c264f)      # "compute054 HCA-1" lid 26 4xQDR
[34]    "H-0002c903000c26ee"[1](2c903000c26ef)      # "compute056 HCA-1" lid 48 4xQDR
[35]    "H-0002c903000bf246"[1](2c903000bf247)      # "compute057 HCA-1" lid 33 4xQDR
[36]    "H-0002c903000c27ca"[1](2c903000c27cb)      # "compute053 HCA-1" lid 44 4xQDR

And I can use awk or sed to extract the node names, such as compute061.

I want to get one line for each block, starting with the switch name and followed by the node names, i.e.: ibsw20 compute061 compute060 compute063 compute062 compute064 compute065 compute046 compute047 compute048 compute049 compute051 compute050 compute052 compute055 compute054 compute056 compute057 compute053

I plan to use slurm's scontrol show hostlist "<nodename> <nodename> ..." to compress the multiple nodes into a single entity and write it out in the format of slurm's topology.conf file:

SwitchName=ibsw20 Nodes=compute[046-057,060-065]

Any ideas?

After all the switch mappings are done, the ibnetdiscover file goes on to do the reverse, mapping each node to its switch in a format like this:

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903000bf371
caguid=0x2c903000bf36e
Ca  1 "H-0002c903000bf36e"      # "compute060 HCA-1"
[1](2c903000bf36f)  "S-0002c90200423e70"[2]     # lid 1 lmc 0 "MF0;ibsw20:SX6036/U1" lid 3 4xQDR

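Each block is separated by a blank line.

A simpler question to start with: how do I parse multiple lines of text into a single line, extracting a different part of each line (handling header and body lines differently) and dropping lines that contain no relevant data?

Edit: The blocks may not be full. If nothing is connected to a given port on a given switch, the output skips that line, which can result in something like this: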

Switch  36 "S-0002c90200423e70"     # "MF0;ibsw20:SX6036/U1" enhanced port 0 lid 3 lmc 0
[2] "H-0002c903000bf36e"[1](2c903000bf36f)      # "compute060 HCA-1" lid 1 4xQDR
[3] "H-0002c903000bf35a"[1](2c903000bf35b)      # "compute063 HCA-1" lid 28 4xQDR
[4] "H-0002c903000c2646"[1](2c903000c2647)      # "compute062 HCA-1" lid 25 4xQDR
[15]    "S-0002c90200423eb8"[33]        # "Infiniscale-IV Mellanox Technologies" lid 11 4xQDR
[33]    "H-0002c903000c264e"[1](2c903000c264f)      # "compute074 HCA-1" lid 26 4xQDR
[34]    "H-0002c903000c26ee"[1](2c903000c26ef)      # "compute076 HCA-1" lid 48 4xQDR

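So I cannot rely on each Switch line being followed by 36 lines, or on [36] always being the last line of a switch block.

Answer 1

Q1

This awk command extracts a sorted list of unique computer names from the file.

Assuming the source file is much longer and contains a block of lines for each switch, the following script sorts each switch block (assuming the Switch line is always the first line of each switch's run of consecutive lines) and removes duplicate nodes. It uses asorti, which is a GNU awk extension, so awk here needs to be gawk: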

awk -v FS='[#"]' '
    BEGIN{c=0}
    $1~/Switch/     {c++; j=0; split($5,arr,"[;:]" ); sw[c,0]=arr[2] }
    $1~/\[[0-9]+\]/ {     j++; split($5,arr," "    ); sw[c,j]=arr[1] }
    END {
            print("final count of switches=" c)
            for (i=1; i<=c; i++) {
                print( "switch=" i, sw[i,0] )     # show switch number.
                split("", out , ":" )             # delete array "out".
                split("", indices , ":" )         # delete array "indices".
                j=0
                while (sw[i,++j]) {               # for all array elements.
                    if (out[sw[i,j]]++ < 1) {     # Is it a new value?
                        indices[sw[i,j]]=j        # add to array "indices".
                    }
                }
                n=asorti(indices)                 # sort the keys of indices
                printf( "%s ", sw[i,0] )
                for (k=1; k<=n; k++) {            # all values for a switch.
                    printf( "%s ", indices[k] )
                }
                printf( "%s\n", "" )
            }
    }
    ' infile

Result:

final count of switches=3
switch=1 ibsw20
ibsw20 Infiniscale-IV compute060 compute061 compute062 compute063
compute064 compute065 compute066 compute067 compute068 compute069
compute070 compute071 compute072 compute073 compute074 compute075
compute076 compute077 
switch=2 ibsw21
ibsw21 Infiniscale-IV compute060 compute061 compute062 compute063
compute064 compute065 compute066 compute067 compute068 compute069
compute070 compute071 compute072 compute073 compute074 compute075
compute076 compute077 
switch=3 ibsw22
ibsw22 Infiniscale-IV compute060 compute062 compute063 compute074 
compute076

I am not sure whether Infiniscale-IV should be removed, or whether you are asking for further processing to get:

SwitchName=ibsw20 Nodes=compute[060-077]
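
If that further processing is wanted and scontrol is not at hand, the range folding can be approximated in plain awk. This is only a rough sketch, not a replacement for scontrol show hostlist: the function name compress_nodes is made up for illustration, and it assumes every node name on a line is one shared prefix followed by a fixed-width number. Names that do not end in digits (such as Infiniscale-IV) are skipped.

```shell
#!/bin/sh
# compress_nodes (hypothetical name): read lines of the form
#   <switch> <name> <name> ...
# and print SwitchName=<switch> Nodes=<prefix>[<ranges>].
# Assumption: all node names share one prefix and a fixed-width number.
compress_nodes() {
    awk '
    {
        n = 0; prefix = ""; width = 0
        for (f = 2; f <= NF; f++) {
            if (match($f, /[0-9]+$/) == 0) continue   # skip names without a numeric suffix
            prefix = substr($f, 1, RSTART - 1)
            width  = RLENGTH
            num[++n] = substr($f, RSTART) + 0
        }
        if (n == 0) next                              # no nodes on this switch

        # insertion sort of the numeric suffixes
        for (a = 2; a <= n; a++) {
            v = num[a]
            for (b = a - 1; b >= 1 && num[b] > v; b--) num[b + 1] = num[b]
            num[b + 1] = v
        }

        # fold consecutive numbers into start-end ranges
        fmt = "%0" width "d"
        out = ""; start = num[1]; prev = num[1]
        for (a = 2; a <= n + 1; a++) {
            if (a <= n && num[a] == prev) continue                      # duplicate
            if (a <= n && num[a] == prev + 1) { prev = num[a]; continue }
            r = sprintf(fmt, start)
            if (prev > start) r = r "-" sprintf(fmt, prev)
            out = (out == "" ? r : out "," r)
            if (a <= n) { start = num[a]; prev = num[a] }
        }
        nodes = (out ~ /[,-]/) ? prefix "[" out "]" : prefix out
        printf "SwitchName=%s Nodes=%s\n", $1, nodes
    }'
}

# Example: the third switch line from the result above
printf '%s\n' 'ibsw22 Infiniscale-IV compute060 compute062 compute063 compute074 compute076' |
    compress_nodes
```

For that sample line this prints SwitchName=ibsw22 Nodes=compute[060,062-063,074,076]. On a real cluster scontrol show hostlist remains the safer choice, since it understands naming schemes this sketch does not.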

Q2

From "man awk":

If RS is set to the null string, then records are separated by blank lines.

That is, set the record separator (RS) to the null string:

awk -v RS='' 'script to process lines' file
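
As a minimal sketch of that paragraph mode applied to the Ca blocks from the question: with RS='' each blank-line-separated block is one record, and with newline as the field separator each line of the block is one field. The function name ca_to_switch is made up for illustration. The sketch assumes each block has a Ca header line and at least one port line pointing at an "S-" switch whose quoted name has the MF0;name:model form shown in the question; blocks that do not fit are silently skipped.

```shell
#!/bin/sh
# ca_to_switch (hypothetical name): print "<switch> <node>" for each
# blank-line-separated Ca block read from stdin.
ca_to_switch() {
    awk -v RS='' -F '\n' '
    {
        node = ""; sw = ""
        for (i = 1; i <= NF; i++) {            # one field per line of the block
            if ($i ~ /^Ca[ \t]/) {
                # header line: first word of the quoted comment is the node name
                split($i, q, "\"")
                split(q[4], w, " ")
                node = w[1]
            } else if ($i ~ /^\[[0-9]+\]/ && $i ~ /"S-/) {
                # port line: switch name sits between ; and : in the second quoted string
                split($i, q, "\"")
                split(q[4], w, "[;:]")
                sw = w[2]
            }
        }
        if (node != "" && sw != "") print sw, node
    }'
}

# Example: the Ca block quoted in the question
ca_to_switch <<'EOF'
vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903000bf371
caguid=0x2c903000bf36e
Ca  1 "H-0002c903000bf36e"      # "compute060 HCA-1"
[1](2c903000bf36f)  "S-0002c90200423e70"[2]     # lid 1 lmc 0 "MF0;ibsw20:SX6036/U1" lid 3 4xQDR
EOF
```

For the block above this prints ibsw20 compute060. If a node has several port lines, the last matching switch wins, which may or may not be what is wanted.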

Answer 2

This is basically BinaryZebra's answer, modified to generate a slurm topology.conf file.

ibnetdiscover | awk -v FS='[#"]' '
    BEGIN{c=0}
    $1~/Switch/     {c++; j=0; split($5,arr,"[;:]" ); sw[c,0]=arr[2] }
    $1~/\[[0-9]+\]/ && $2~/^H-/ {     j++; split($5,arr," "    ); sw[c,j]=arr[1] }
    END {
            # print("final count of switches=" c)
            for (i=1; i<=c; i++) {
                printf( "SwitchName=s%d", i )           # placeholder switch name s<i>.
                split("", out , ":" )             # delete array "out".
                split("", indices , ":" )         # delete array "indices".
                j=0
                while (sw[i,++j]) {               # for all array elements.
                    if (out[sw[i,j]]++ < 1) {     # Is it a new value?
                        indices[sw[i,j]]=j        # add to array "indices".
                    }
                }
                n=asorti(indices)                 # sort the keys of indices
                # printf( "%s ", sw[i,0] )
                printf ( " Nodes=" )
                for (k=1; k<n; k++) {            # all values for a switch.
                    printf( "%s,", indices[k] )
                }
                printf( "%s\n", indices[n] )
            }
    }
    ' | sed -r '/Nodes=$/d' | awk '{sub(/[0-9]+/, ++i)}1; END{printf( "SwitchName=s%s Switches=s[1-%s]\n", NR+1, NR )}'

If the host list needs to be compressed, each Nodes= line can be passed through scontrol show hostlist. The last stage of the pipeline, modified accordingly, becomes:

| awk -F= '{sub(/[[:digit:]]+/, ++i) ; cmd= "scontrol show hostlist " $3 ; cmd | getline line ; close(cmd) ; printf( "%s=%s=%s\n" , $1, $2, line ) } END{printf( "SwitchName=s%s Switches=s[1-%s]\n", NR+1, NR )}'
