How can I parse stdout that is a mix of CSV and JSON?


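I'm currently working on code for a class where submissions go to an autograder, which returns the results. The returned format is a little difficult to parse visually, so I wanted to write a script that I can use in a pipeline to make it easier to read.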

The results from the autograder look like this:

Problem,Correct?,Correct Answer,Agent's Answer
"Challenge Problem B-04",0,4,-1
"Basic Problem B-12",0,1,-1
"Challenge Problem B-05",0,6,-1
"Challenge Problem B-07",0,6,-1
"Challenge Problem B-06",0,3,-1
"Basic Problem B-11",0,1,-1
"Basic Problem B-10",0,3,-1
"Challenge Problem B-03",0,3,-1
"Challenge Problem B-02",0,1,-1
"Challenge Problem B-01",0,6,-1
"Challenge Problem B-09",0,4,-1
"Challenge Problem B-08",0,4,-1
"Basic Problem B-08",0,6,-1
"Basic Problem B-09",0,5,-1
"Basic Problem B-04",0,3,-1
"Basic Problem B-05",0,4,-1
"Basic Problem B-06",0,5,-1
"Basic Problem B-07",0,6,-1
"Basic Problem B-01",0,2,-1
"Basic Problem B-02",0,5,-1
"Basic Problem B-03",0,1,-1
"Challenge Problem B-10",0,4,-1
"Challenge Problem B-11",0,5,-1
"Challenge Problem B-12",0,1,-1
{
    "Basic Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Basic Problems B"
    },
    "Challenge Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Challenge Problems B"
    }
}

It's a mix of comma-separated values and JSON. It would be nice to have all of this in a tidy table that I can read.

Currently I have the following:

python submit.py --provider gt --assignment error-check | column -t -s, | less -S

which outputs:

{
    "Basic Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Basic Problems B"
    },
    "Challenge Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Challenge Problems B"
    }
}
Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1

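That gets me most of the way there. Now I'm wondering whether there is a way to handle the JSON as well.

I can't split the output at a fixed line number, but I suppose I could look for the first {.

I'd like to keep this as lightweight as possible so that I can share it with my classmates, so the fewer dependencies the better.

I have seen other JSON-parsing posts that suggest using external code.

The ideal output would look like this: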

Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1

Set                   Incorrect Skipped Correct
Basic Problems B      0         12      0
Challenge Problems B  0         12      0

Answer 1

Separating the JSON from the rest is very easy. This gives you everything that is not JSON:

python submit.py --provider gt --assignment error-check | sed '/{/,$d' 

And this gives you only the JSON:

python submit.py --provider gt --assignment error-check | sed -n '/{/,$p' 

To illustrate, I saved your example input as file:

$ sed '/{/,$d' file
Problem,Correct?,Correct Answer,Agent's Answer
"Challenge Problem B-04",0,4,-1
"Basic Problem B-12",0,1,-1
"Challenge Problem B-05",0,6,-1
"Challenge Problem B-07",0,6,-1
"Challenge Problem B-06",0,3,-1
"Basic Problem B-11",0,1,-1
"Basic Problem B-10",0,3,-1
"Challenge Problem B-03",0,3,-1
"Challenge Problem B-02",0,1,-1
"Challenge Problem B-01",0,6,-1
"Challenge Problem B-09",0,4,-1
"Challenge Problem B-08",0,4,-1
"Basic Problem B-08",0,6,-1
"Basic Problem B-09",0,5,-1
"Basic Problem B-04",0,3,-1
"Basic Problem B-05",0,4,-1
"Basic Problem B-06",0,5,-1
"Basic Problem B-07",0,6,-1
"Basic Problem B-01",0,2,-1
"Basic Problem B-02",0,5,-1
"Basic Problem B-03",0,1,-1
"Challenge Problem B-10",0,4,-1
"Challenge Problem B-11",0,5,-1
"Challenge Problem B-12",0,1,-1

And:

$ sed -n '/{/,$p' file
{
    "Basic Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Basic Problems B"
    },
    "Challenge Problems B": {
        "Incorrect": "0",
        "Skipped": "12",
        "Correct": "0",
        "Set": "Challenge Problems B"
    }
}
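
As a minimal sketch of why these two commands are complementary: the address range /{/,$ selects everything from the first line containing { through the last line, so d removes exactly what -n ... p keeps. The sample variable below is a made-up miniature of the input, not real autograder output.

```shell
#!/bin/sh
# Hypothetical miniature of the autograder output: two CSV lines, then JSON.
sample='Problem,Correct?
"Basic Problem B-01",0
{
    "Basic Problems B": {
        "Set": "Basic Problems B"
    }
}'

# /{/,$ = from the first line containing "{" through the last line ($).
# Deleting that range leaves only the CSV part.
csv_part=$(printf '%s\n' "$sample" | sed '/{/,$d')

# With -n nothing is printed by default; p prints only the range: the JSON part.
json_part=$(printf '%s\n' "$sample" | sed -n '/{/,$p')

printf 'CSV part:\n%s\n' "$csv_part"
printf 'JSON part:\n%s\n' "$json_part"
```

This only works because no CSV line contains a { character; if a problem name ever did, the split would happen too early.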

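Now, you already handle the non-JSON part nicely, so I'll leave that as it is. Ideally, you should use a JSON parser such as jq to parse the JSON data. Without jq, the following will at least do what you want (replace cat file with python submit.py --provider gt --assignment error-check). Note that it relies on GNU tools: tac, and the arrays of arrays (data[set][...]) that GNU awk 4.0 and later provide: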

$ cat file | sed -n 's/[,"]//g; s/^ *//; /{/,$p'  | tac | awk -F': ' 'BEGIN{printf "%-30s%-10s%-10s%-10s\n", "Set", "Incorrect", "Skipped", "Correct"} NF==2 && !/\{/{if($1=="Set"){set=$2;data[set]["Incorrect"] = 0;data[set]["Skipped"] = 0;data[set]["Correct"] = 0;} data[set][$1]=$2}END{for(set in data){printf "%-30s%-10s%-10s%-10s\n", set,data[set]["Incorrect"],data[set]["Skipped"],data[set]["Correct"]}}' 
Set                           Incorrect Skipped   Correct   
Challenge Problems B          0         12        0         
Basic Problems B              0         12        0      
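
To make the one-liner easier to follow, here is a small sketch of its first two stages on a made-up fragment of the JSON (the json variable is hypothetical sample data). The sed step strips commas, double quotes and leading spaces, which leaves plain "key: value" lines; awk then splits those on ": ". The tac in the full command is there because "Set" is the last key of each object: reversing the lines lets awk see the set name before the values that belong to it.

```shell
#!/bin/sh
# Hypothetical fragment of the JSON part of the output.
json='    "Basic Problems B": {
        "Skipped": "12",
        "Set": "Basic Problems B"
    },'

# Stage 1: remove commas and double quotes, then strip leading spaces.
cleaned=$(printf '%s\n' "$json" | sed 's/[,"]//g; s/^ *//')
printf '%s\n' "$cleaned"

# Stage 2: split on ": ". Key/value lines have exactly two fields;
# lines that open an object also have two, so those containing "{" are skipped.
pairs=$(printf '%s\n' "$cleaned" | awk -F': ' 'NF==2 && !/\{/ { print $1 "=" $2 }')
printf '%s\n' "$pairs"
# -> Skipped=12
# -> Set=Basic Problems B
```

Because this treats JSON as plain text, it breaks as soon as a value contains a comma, a double quote or ": ", which is the main reason a real parser is preferable.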

Putting all of that into a shell script gives:

#!/bin/bash

tmpFile=$(mktemp)
python submit.py --provider gt --assignment error-check > "$tmpFile";

sed '/{/,$d' "$tmpFile" | column -t -s, 
sed -n 's/[,"]//g; s/^ *//; /{/,$p' "$tmpFile" |
  tac |
  awk -F': ' '
    BEGIN{
      printf "%-30s%-10s%-10s%-10s\n", "Set", "Incorrect", "Skipped", "Correct"
    }
    NF==2 && !/\{/{
      if($1=="Set"){
         set=$2;
         data[set]["Incorrect"] = 0;
         data[set]["Skipped"] = 0;
         data[set]["Correct"] = 0;
      } 
      data[set][$1]=$2
    }
    END{
       for(set in data){
         printf "%-30s%-10s%-10s%-10s\n", set, 
                                     data[set]["Incorrect"], 
                                     data[set]["Skipped"], 
                                     data[set]["Correct"]}
    }' 
rm "$tmpFile"

which produces the following output:

$ foo.sh
Problem                   Correct?  Correct Answer  Agent's Answer
"Challenge Problem B-04"  0         4               -1
"Basic Problem B-12"      0         1               -1
"Challenge Problem B-05"  0         6               -1
"Challenge Problem B-07"  0         6               -1
"Challenge Problem B-06"  0         3               -1
"Basic Problem B-11"      0         1               -1
"Basic Problem B-10"      0         3               -1
"Challenge Problem B-03"  0         3               -1
"Challenge Problem B-02"  0         1               -1
"Challenge Problem B-01"  0         6               -1
"Challenge Problem B-09"  0         4               -1
"Challenge Problem B-08"  0         4               -1
"Basic Problem B-08"      0         6               -1
"Basic Problem B-09"      0         5               -1
"Basic Problem B-04"      0         3               -1
"Basic Problem B-05"      0         4               -1
"Basic Problem B-06"      0         5               -1
"Basic Problem B-07"      0         6               -1
"Basic Problem B-01"      0         2               -1
"Basic Problem B-02"      0         5               -1
"Basic Problem B-03"      0         1               -1
"Challenge Problem B-10"  0         4               -1
"Challenge Problem B-11"  0         5               -1
"Challenge Problem B-12"  0         1               -1
Set                           Incorrect Skipped   Correct   
Challenge Problems B          0         12        0         
Basic Problems B              0         12        0         

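It feels hacky, though, so hopefully someone can come up with a cleaner solution that uses a dedicated JSON parser.

I'm glad to see that steeldriver provided a proper jq solution in the comments; including it gives a simpler and safer script: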

#!/bin/bash

tmpFile=$(mktemp)
python submit.py --provider gt --assignment error-check > "$tmpFile";

sed '/{/,$d' "$tmpFile" | column -t -s, 
sed -n '/{/,$p' "$tmpFile" | 
  jq -r '["Set","Incorrect","Skipped","Correct"], (.[] | [.Set,.Incorrect,.Skipped,.Correct]) | @tsv'
rm "$tmpFile"

Answer 2

With the autograder output saved as input, you can use Miller (https://github.com/johnkerl/miller) and run:

# get the CSV and transform it into a pretty print table
<input grep -P '^("|\w)' | mlr --c2p cat >out
# add an empty line between the two tables
echo "" >> out
# convert the json into a pretty print table and add it to the output
<input grep -vP '^("|\w)'  | mlr --j2p cat -n then reshape -r "(Basi|Chal)" -o i,v \
then nest --explode --values --across-fields --nested-fs ":" -f i \
then reshape -s i_2,v \
then cut -x -f i_1,n \
then reorder -f Set >>out

You will get:

Problem                Correct? Correct Answer Agent's Answer
Challenge Problem B-04 0        4              -1
Basic Problem B-12     0        1              -1
Challenge Problem B-05 0        6              -1
Challenge Problem B-07 0        6              -1
Challenge Problem B-06 0        3              -1
Basic Problem B-11     0        1              -1
Basic Problem B-10     0        3              -1
Challenge Problem B-03 0        3              -1
Challenge Problem B-02 0        1              -1
Challenge Problem B-01 0        6              -1
Challenge Problem B-09 0        4              -1
Challenge Problem B-08 0        4              -1
Basic Problem B-08     0        6              -1
Basic Problem B-09     0        5              -1
Basic Problem B-04     0        3              -1
Basic Problem B-05     0        4              -1
Basic Problem B-06     0        5              -1
Basic Problem B-07     0        6              -1
Basic Problem B-01     0        2              -1
Basic Problem B-02     0        5              -1
Basic Problem B-03     0        1              -1
Challenge Problem B-10 0        4              -1
Challenge Problem B-11 0        5              -1
Challenge Problem B-12 0        1              -1

Set                  Incorrect Skipped Correct
Basic Problems B     0         12      0
Challenge Problems B 0         12      0

Answer 3

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
# From the first "{" onward, split on the quoting around JSON keys and values.
/{/ { FS="(^|\":)[[:space:]]+\"|\",?" }
# CSV part: rebuild the record with tabs between the fields.
FS == "," { $1=$1; print; next }
# JSON part: remember each key/value pair.
{ f[$2] = $3 }
# Only an indented "}" closes a problem set; the outermost "}" must not print a row.
/^[[:space:]]+}/ {
    if ( !doneHdr++ ) {
        print "Set", "Incorrect", "Skipped", "Correct"
    }
    print f["Set"], f["Incorrect"], f["Skipped"], f["Correct"]
}

Running it:

$ awk -f tst.awk file | column -s$'\t' -t
Problem                   Correct?   Correct Answer  Agent's Answer
"Challenge Problem B-04"  0          4               -1
"Basic Problem B-12"      0          1               -1
"Challenge Problem B-05"  0          6               -1
"Challenge Problem B-07"  0          6               -1
"Challenge Problem B-06"  0          3               -1
"Basic Problem B-11"      0          1               -1
"Basic Problem B-10"      0          3               -1
"Challenge Problem B-03"  0          3               -1
"Challenge Problem B-02"  0          1               -1
"Challenge Problem B-01"  0          6               -1
"Challenge Problem B-09"  0          4               -1
"Challenge Problem B-08"  0          4               -1
"Basic Problem B-08"      0          6               -1
"Basic Problem B-09"      0          5               -1
"Basic Problem B-04"      0          3               -1
"Basic Problem B-05"      0          4               -1
"Basic Problem B-06"      0          5               -1
"Basic Problem B-07"      0          6               -1
"Basic Problem B-01"      0          2               -1
"Basic Problem B-02"      0          5               -1
"Basic Problem B-03"      0          1               -1
"Challenge Problem B-10"  0          4               -1
"Challenge Problem B-11"  0          5               -1
"Challenge Problem B-12"  0          1               -1
Set                       Incorrect  Skipped         Correct
Basic Problems B          0          12              0
Challenge Problems B      0          12              0
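
The central trick in tst.awk is the field separator it switches to once the JSON starts. As a rough sketch, applying that same regular expression to a single key/value line (the line variable is made-up sample data) shows where the key and the value end up. The [[:space:]] class assumes an awk with POSIX character classes, such as gawk or a recent mawk.

```shell
#!/bin/sh
# One JSON key/value line, indented as in the autograder output (sample data).
line='        "Skipped": "12",'

# Separators matched by the regex, left to right:
#   leading spaces + opening quote   -> $1 is empty
#   quote, colon, space, quote       -> $2 is the key
#   closing quote + optional comma   -> $3 is the value
result=$(printf '%s\n' "$line" |
  awk -F'(^|":)[[:space:]]+"|",?' '{ print $2 "=" $3 }')
printf '%s\n' "$result"
```

This is why the script can simply store f[$2] = $3 for every JSON line and print the collected values when a set's closing brace is reached.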
