Conditional if in an awk script

Question 1

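The following assumes that when you say "any of the size/date/repo-name/repo-path has no value" you mean that, for example, some block has a repo-name= line with nothing after the =, not that the repo-name= line is missing altogether.

Here is how to really do what you want using awk, plus column for the final column spacing: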

$ cat tst.sh
#!/usr/bin/env bash

awk '
BEGIN { OFS="\t" }
{
    sub(/^@/,"")                  # instead of `| tr -d @`
    ++numTags
    tag = val = $0
    sub(/ *=.*/,"",tag)
    sub(/[^=]+= */,"",val)
    tags[numTags] = tag
    vals[numTags] = val
}
numTags == 4 {
    if ( !doneHdr++ ) {
        for ( i=1; i<=numTags; i++ ) {
            tag = ( tags[i] == "date" ? "creationTime" : tags[i] )  # instead of `| sed s/date/creationTime/`
            printf "%s%s", tag, (i<numTags ? OFS : ORS)
        }
    }
    vals[3] = substr(vals[3],1,10)     # instead of `| awk {$3=substr($3,0,10}1`
    for ( i=1; i<=numTags; i++ ) {
        val = ( vals[i] == "" ? 0 : vals[i] )
        printf "%s%s", val, (i<numTags ? OFS : ORS)
    }
    numTags = 0
}
' "${@:--}" |
column -s$'\t' -t

$ cat file
size=190000
date=1603278566981
repo-name=testupload
repo-path=
size=140000
date=1603278566981
repo-name=
repo-path=/home/test/testupload2
size=
date=1603278566981
repo-name=testupload3
repo-path=/home/test/testupload3

$ ./tst.sh file
size    creationTime   repo-name    repo-path
190000  1603278566981  testupload   0
140000  1603278566981  0            /home/test/testupload2
0       1603278566981  testupload3  /home/test/testupload3
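
To see the tag/value split from the script above in isolation, here is a minimal sketch of just that step. The function name split_kv is made up for illustration and is not part of tst.sh; the two sub() calls are the ones the script uses.

```shell
#!/usr/bin/env bash

# split_kv is a hypothetical helper: it prints "tag<TAB>value" for each
# tag=value line read from stdin, splitting only at the first "=".
split_kv() {
    awk '
    BEGIN { OFS = "\t" }
    {
        tag = val = $0
        sub(/ *=.*/, "", tag)       # drop everything from the first "=" onward
        sub(/[^=]+= */, "", val)    # drop the tag and the first "="
        print tag, val
    }'
}

# A value that itself contains "=" and a line with an empty value.
printf '%s\n' 'repo-path=/home/foo=bar/uploadX' 'size=' | split_kv
```

The first line should come out with /home/foo=bar/uploadX intact as the value, and the second with an empty value after the tab.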

Changes from your existing code:

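awk no longer has to read the whole file into memory at once. I assume column has to do that to work out the spacing; without column, awk itself would have to hold all of the input in memory, because it would need a two-pass approach: first find the maximum field length in each column, then print using those maximum widths in printf.
It no longer relies on the values in the data (apart from mapping date to creationTime in the header line, which you currently do with a pipe to sed) and just needs 4 lines of data at a time. It would be easy to change it to trigger the printing on a specific tag line instead, if that is more useful, e.g. change numTags == 4 to tag == "repo-path".
It no longer pipes to sed to change date to creationTime in the column header: that not only requires an extra pipe and command, it would also break if the string date appeared in the input, e.g. repo-path=/home/date/uploadX.
It no longer uses = as the FS value, as that would fail if a value in the input contained =, e.g. repo-path=/home/foo=bar/uploadX.
Instead of piping the output to tr -d @ you would use gsub(/@/,"") if you wanted to remove every @ from the data, but I think you really only want to do that for the header names (tags). Either of those would break data that contains an @, e.g. repo-path=/home/foo@bar/uploadX, so I have used sub(/^@/,"") to remove only an @ at the start of a tag.
Instead of adding a pipe to a second awk script to truncate the third field to 10 characters, the script calls substr(vals[3],1,10) on vals[] before the loop that prints. By the way, the second argument to substr() (the start position) counts from 1, not 0.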
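
As a rough sketch of the tag == "repo-path" variant mentioned in the list above, the following prints a row whenever a repo-path line is seen rather than after every fourth line. The function name rows_by_tag is hypothetical, the header and the substr() truncation are left out to keep it short, and it assumes repo-path is always the last line of a block.

```shell
#!/usr/bin/env bash

# rows_by_tag is a hypothetical name; it prints one tab-separated row per
# block, flushing when the repo-path line is seen instead of counting lines.
rows_by_tag() {
    awk '
    BEGIN { OFS = "\t" }
    {
        tag = val = $0
        sub(/ *=.*/, "", tag)
        sub(/[^=]+= */, "", val)
        vals[++numTags] = (val == "" ? 0 : val)   # empty values become 0
    }
    tag == "repo-path" {            # assumed to be the last line of every block
        for ( i=1; i<=numTags; i++ ) {
            printf "%s%s", vals[i], (i<numTags ? OFS : ORS)
        }
        numTags = 0
    }'
}

printf '%s\n' 'size=190000' 'date=1603278566981' \
    'repo-name=testupload' 'repo-path=' | rows_by_tag
```

Blocks no longer have to be exactly four lines long with this approach, but every block must end with a repo-path line, and a block with a missing line produces a shorter row rather than an empty column.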

Question 2

If the last field is empty, you can set it to 0 with

if ($NF == "") $NF = 0

so you would get something like

/^@repo-name/ {
  if (++count2 == 1) header = header OFS $1 ","
  if ($NF == "") $NF = 0

  repoNameArr[count] = $NF
  next
}

or, to avoid duplicating the code,

$NF == "" { $NF = 0 }

# ...

/^@repo-name/ {
  if (++count2 == 1) header = header OFS $1 ","
  repoNameArr[count] = $NF
  next
}

(Your data contains no lines that match ^@repo-name.)

In this case, I would probably take a simpler approach. Assuming that each record is always 4 lines, you can rearrange the data into four tab-separated columns using paste:
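
The effect of the standalone $NF == "" rule can be checked on a couple of sample lines. This is only a sketch: the function name zero_empty_last is made up, and using = as both FS and OFS is an assumption about the rest of your script, which is not shown here in full.

```shell
#!/usr/bin/env bash

# zero_empty_last is a hypothetical name. FS="=" and OFS="=" are assumptions
# about the surrounding script, made so the example is self-contained.
zero_empty_last() {
    awk -F'=' -v OFS='=' '
    $NF == "" { $NF = 0 }   # placed first, so it runs before any later rule
    { print }
    '
}

printf '%s\n' '@repo-name=' '@repo-path=/home/test/testupload2' | zero_empty_last
```

The first line should be printed as @repo-name=0 and the second unchanged, since only an empty last field is touched.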

$ cat file
size=
date=1603278566981
repo-name=testupload
repo-path=/home/test/testupload
size=140000
date=
repo-name=testupload2
repo-path=/home/test/testupload2
size=170000
date=1603278566981
repo-name=
repo-path=/home/test/testupload3
size=170000
date=1603278566981
repo-name=testupload3
repo-path=/home/test/testupload3

$ paste - - - - <file
size=   date=1603278566981      repo-name=testupload    repo-path=/home/test/testupload
size=140000     date=   repo-name=testupload2   repo-path=/home/test/testupload2
size=170000     date=1603278566981      repo-name=      repo-path=/home/test/testupload3
size=170000     date=1603278566981      repo-name=testupload3   repo-path=/home/test/testupload3
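
Since this relies on every record being exactly 4 lines, it may be worth checking that before running paste. A minimal sketch, where paste_records is a hypothetical wrapper name and the check only looks at the total line count, not at which tags appear on which line:

```shell
#!/usr/bin/env bash

# paste_records is a hypothetical wrapper: it refuses to run paste unless
# the number of lines in the file is a multiple of 4.
paste_records() {
    # awk exits with status 1 when the line count is not a multiple of 4
    if awk 'END { exit (NR % 4 != 0) }' "$1"; then
        paste - - - - <"$1"
    else
        echo "error: $1 does not contain a whole number of 4-line records" >&2
        return 1
    fi
}
```

It would be called as paste_records file in place of paste - - - - <file.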

You can then convert this to CSV using mlr (Miller):

$ paste - - - - <file | mlr --ifs tab --ocsv cat
size,date,repo-name,repo-path
,1603278566981,testupload,/home/test/testupload
140000,,testupload2,/home/test/testupload2
170000,1603278566981,,/home/test/testupload3
170000,1603278566981,testupload3,/home/test/testupload3

mlr can also replace the missing values with 0:

$ paste - - - - <file | mlr --ifs tab --ocsv put 'for (k,v in $*) { is_null(v) { $[k] = 0 } }'
size,date,repo-name,repo-path
0,1603278566981,testupload,/home/test/testupload
140000,0,testupload2,/home/test/testupload2
170000,1603278566981,0,/home/test/testupload3
170000,1603278566981,testupload3,/home/test/testupload3
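
If Miller is not available, roughly the same result can be sketched in awk. The function name records_to_csv is hypothetical; it takes the header from the first record and, unlike mlr, does no CSV quoting, so it assumes that no value contains a comma, a double quote, or a tab.

```shell
#!/usr/bin/env bash

# records_to_csv is a hypothetical name. It reads the tab-separated output
# of "paste - - - -" and writes CSV, taking the header from the first record.
records_to_csv() {
    awk -F'\t' '
    {
        hdr = row = ""
        for ( i=1; i<=NF; i++ ) {
            eq  = index($i, "=")            # position of the first "="
            key = substr($i, 1, eq-1)
            val = substr($i, eq+1)
            if ( val == "" ) val = 0        # replace an empty value with 0
            hdr = hdr (i>1 ? "," : "") key
            row = row (i>1 ? "," : "") val
        }
        if ( NR == 1 ) print hdr
        print row
    }'
}

printf '%s\n' 'size=' 'date=1603278566981' 'repo-name=testupload' \
    'repo-path=/home/test/testupload' | paste - - - - | records_to_csv
```

With the real data this would be run as paste - - - - <file | records_to_csv.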

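Use --otsv in place of --ocsv to get tab-separated values (TSV) instead of CSV, or --opprint or --ojson to get "pretty-printed" tabular output or JSON.

The above assumes that the input data looks like the data in the question. If the data in the question is a processed variant of data that was originally in a structured format (e.g. XML or JSON), it would be better to work with the original data directly.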

Answer