Formatting data as a table

Question 1

Using any awk in any shell on every Unix system:

$ cat tst.awk
BEGIN {
    numTags = split("Name City Age Couse",nums2tags)
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        tag = nums2tags[tagNr]
        tags2nums[tag] = tagNr
        wids[tagNr] = ( length(tag) > length("null") ? length(tag) : length("null") )
    }
    OFS=" | "
}
(NR==1) || (prevTag=="Couse") {
    numRecs++
}
{
    gsub(/^"|"$/,"")
    tag = val = $0
    sub(/".*/,"",tag)
    sub(/[^"]+":"/,"",val)

    tagNr = tags2nums[tag]
    vals[numRecs,tagNr] = val

    wid = length(val)
    wids[tagNr] = ( wid > wids[tagNr] ? wid : wids[tagNr] )

    prevTag = tag
}
END {
    # Uncomment these 3 lines if you'd like a header line printed:
    # for (tagNr=1; tagNr<=numTags; tagNr++) {
    #   printf "%-*s%s", wids[tagNr], nums2tags[tagNr], (tagNr<numTags ? OFS : ORS)
    # }

    for (recNr=1; recNr<=numRecs; recNr++) {
        for (tagNr=1; tagNr<=numTags; tagNr++) {
            val = ( (recNr,tagNr) in vals ? vals[recNr,tagNr] : "null" )
            printf "%-*s%s", wids[tagNr], val, (tagNr<numTags ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
asxadadad  ,aaf dsf | Mum  | 23  | BBS
null                | Ors  | 11  | MB
adad sf             | Kol  | 21  | BB
pqr                 | null | 21  | NN

or, if you'd rather not hard-code the list of tags (field/column names):

$ cat tst.awk
BEGIN { OFS=" | " }
(NR==1) || (prevTag=="Couse") {
    numRecs++
}
{
    gsub(/^"|"$/,"")
    tag = val = $0
    sub(/".*/,"",tag)
    sub(/[^"]+":"/,"",val)

    if ( !(tag in tags2nums) ) {
        tagNr = ++numTags
        tags2nums[tag] = tagNr
        nums2tags[tagNr] = tag
        wids[tagNr] = ( length(tag) > length("null") ? length(tag) : length("null") )
    }

    tagNr = tags2nums[tag]
    vals[numRecs,tagNr] = val

    wid = length(val)
    wids[tagNr] = ( wid > wids[tagNr] ? wid : wids[tagNr] )

    prevTag = tag
}
END {
    for (tagNr=1; tagNr<=numTags; tagNr++) {
        printf "%-*s%s", wids[tagNr], nums2tags[tagNr], (tagNr<numTags ? OFS : ORS)
    }

    for (recNr=1; recNr<=numRecs; recNr++) {
        for (tagNr=1; tagNr<=numTags; tagNr++) {
            val = ( (recNr,tagNr) in vals ? vals[recNr,tagNr] : "null" )
            printf "%-*s%s", wids[tagNr], val, (tagNr<numTags ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
Name                | City | Age | Couse
asxadadad  ,aaf dsf | Mum  | 23  | BBS
null                | Ors  | 11  | MB
adad sf             | Kol  | 21  | BB
pqr                 | null | 21  | NN

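The column order in the second script's output is the order in which the tags first appear in the input. A header line is therefore needed to identify the values, unless the tags happen to appear in the input in the order you want them output.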
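
As a cross-check of the logic, here is a minimal Python sketch of the same idea as the second script: tags are discovered in order of first appearance, a record ends at the Couse tag, and missing values are shown as null. The sample input is an assumption reconstructed from the output above (the input file itself is not shown), and the names parse_records and format_table are made up for this illustration.

```python
def parse_records(lines, end_tag="Couse"):
    """Group '"tag":"value"' lines into records; a record ends at end_tag."""
    tags = []       # tag names, in order of first appearance
    records = []    # one dict per completed record
    current = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop one leading and one trailing double quote, like the awk gsub().
        if line.startswith('"'):
            line = line[1:]
        if line.endswith('"'):
            line = line[:-1]
        # Split on the first '":"' only, so values may contain ':' or ','.
        tag, _, val = line.partition('":"')
        if tag not in tags:
            tags.append(tag)
        current[tag] = val
        if tag == end_tag:
            records.append(current)
            current = {}
    if current:     # trailing record without an end tag
        records.append(current)
    return tags, records


def format_table(tags, records, sep=" | ", missing="null"):
    """Return header plus data rows, padded to the widest cell per column."""
    rows = [tags] + [[rec.get(t, missing) for t in tags] for rec in records]
    widths = [max(len(row[i]) for row in rows) for i in range(len(tags))]
    return [
        sep.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]


# Assumed sample input, reconstructed from the output shown above.
sample = '''"Name":"asxadadad  ,aaf dsf"
"City":"Mum"
"Age":"23"
"Couse":"BBS"
"City":"Ors"
"Age":"11"
"Couse":"MB"
"Name":"adad sf"
"City":"Kol"
"Age":"21"
"Couse":"BB"
"Name":"pqr"
"Age":"21"
"Couse":"NN"
'''.splitlines()

tags, records = parse_records(sample)
table = format_table(tags, records)
print("\n".join(table))
```

Unlike the awk scripts, this sketch strips trailing padding from the last column; the column contents are otherwise the same.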

Question 2

In Perl. I'd like to add more explanation of what it does and how it works, but I think the comments in the code cover everything.

#!/usr/bin/perl

use strict;

my @people; # Array-of-Arrays (AoA) to hold each record
my %person; # hash to hold the current record as it's being read in.

# list of valid field names, in the order you want them printed
my @names   = qw(Name City Age Couse);
my $end_key = 'Couse';

# build a regex from the valid names
my $names    = join('|',@names);
my $names_re = qr/^(?:$names)$/;

# Initialise field widths, with a minimum of 4 (for 'null').
my %widths = map {$_ => (length > 4 ? length : 4) } @names;

while(<>) {
  chomp;

  s/^"|"$//g;                       # strip leading and trailing quotes
  my ($key,$val) = split /"?:"?/, $_, 2;   # split on the first : only, with optional quotes.

  if ($key =~ m/$names_re/) {
    $widths{$key} = length($val) if ($widths{$key} < length($val) );

    $person{$key} = $val;

    if ($key eq $end_key) {
      # push an array into the @people array, containing the values of
      # the valid fields, in order.  Use null as the default value
      # if any field is empty/undefined.
      push @people, [ map { $person{$_} || 'null' } @names ];
      %person = ();
    };
  } else {
    print STDERR "Error on input line $.: unrecognised data\n";
  };
};

# build a printf format string, using the longest width of each field.
my $fmt = join(' | ', map { "%-$widths{$_}s" } @names) . "\n";

# optional header line, comment out if not wanted
printf $fmt, @names;

# optional ruler line, comment out if not wanted
print join('-|-', map { '-' x $widths{$_} } @names) . "\n";

foreach my $p (@people) {
  printf $fmt, @{ $p };
}

Save it as, e.g., columns.pl and make it executable with chmod +x.

Output:

$ chmod +x columns.pl 
$ ./columns.pl demo.txt 
Name                | City | Age  | Couse
--------------------|------|------|------
asxadadad  ,aaf dsf | Mum  | 23   | BBS 
null                | Ors  | 11   | MB  
adad sf             | Kol  | 21   | BB  
pqr                 | null | 21   | NN

Question 3

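A short GNU awk-compatible solution (GNU awk is needed for the RS defined as a regular expression) that works regardless of where the tags occur within a record block, prints the columns in the order given in the tags variable, and treats the last tag, Couse, as the end-of-record identifier.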

<infile awk -F'\n' -v tags='Name,City,Age,Couse' '
BEGIN{ tagsNum=split(tags, tgs, ","); RS="\n?\""tgs[tagsNum]"\":[^\n]*\n" }

function tbl(tag, field) {
    if(index(field, "\""tag"\"")==1 && !key[tag]++ || field==RT){
        gsub(/(^[^:]*:"|"\n?)/, "", field)
        key[tag]=field
    }
}
{ for(i=1; i<=NF; i++){ for(k in tgs) tbl(tgs[k], $i); tbl(RT, RT) }
  for(i=1; i<tagsNum; i++)
      printf "%s", (key[tgs[i]]!=""? key[tgs[i]]:"null") OFS; print key[RT]
  delete key
}' OFS='@|' |column -ts'@'

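For each tag name in the tgs array we call the function to match the field in which that tag occurs and store its value back in the key array. Then, for each record, we print the values (or null where a value is missing), delete the array, and do the same for the next block.

We use column -ts'@' to tabulate the output; the @ character comes from OFS='@|'. With this approach, column aligns the output fields on that character and then removes it from the output, so we assume the @ character does not occur in the input data (if it can, change it to a different character). If you have column from the util-linux package, you can change OFS='@|' |column -ts'@' to OFS='|' |column -t -s'|' -o' | '.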

asxadadad  ,aaf dsf  |Mum   |23  |BBS
null                 |Ors   |11  |MB
adad sf              |Kol   |21  |BB
pqr                  |null  |21  |NN
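
To illustrate what the column step contributes, here is a rough Python sketch of that alignment: split each line on the sentinel character, pad every column to its widest cell, and join with two spaces. It only approximates column -t -s for this kind of input; the function name align and the sample lines (what the awk part would emit before column runs) are assumptions for illustration.

```python
def align(lines, sentinel="@", gap="  "):
    """Approximate `column -t -s SENTINEL`: pad columns, drop the sentinel."""
    rows = [line.split(sentinel) for line in lines]
    ncols = max(len(row) for row in rows)
    widths = [
        max((len(row[i]) for row in rows if i < len(row)), default=0)
        for i in range(ncols)
    ]
    out = []
    for row in rows:
        # Pad every cell except the last one, which needs no trailing spaces.
        cells = [c.ljust(widths[i]) for i, c in enumerate(row[:-1])] + row[-1:]
        out.append(gap.join(cells))
    return out


# Assumed intermediate output of the awk command with OFS='@|'.
raw = [
    "asxadadad  ,aaf dsf@|Mum@|23@|BBS",
    "null@|Ors@|11@|MB",
    "adad sf@|Kol@|21@|BB",
    "pqr@|null@|21@|NN",
]

aligned = align(raw)
print("\n".join(aligned))
```

Because the | stays attached to the start of each field, it survives the split and ends up as the visible column separator.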

Question 4

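The data looks as if it was modified from an original JSON document.

Let's try to restore the JSON document structure:

Add [{ at the start and }] at the end of the document.
Add },{ at the end of each line that starts with the exact string "Couse" (but not on the last line).
Add a comma at the end of each line that was not otherwise modified (i.e. that still ends in a double quote).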

sed -e '1 s/^/[{/' -e '$ s/$/}]/' \
    -e '/^"Couse"/ { $! s/$/},{/; }' \
    -e 's/"$/&,/' file

Pretty-printed, the document now looks like this:

[
  {
    "Name": "asxadadad  ,aaf dsf",
    "City": "Mum",
    "Age": "23",
    "Couse": "BBS"
  },
  {
    "City": "Ors",
    "Age": "11",
    "Couse": "MB"
  },
  {
    "Name": "adad sf",
    "City": "Kol",
    "Age": "21",
    "Couse": "BB"
  },
  {
    "Name": "pqr",
    "Age": "21",
    "Couse": "NN"
  }
]
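
The three restoration steps can also be sketched in Python, which makes it easy to check that the result really parses as JSON. This is only an illustration of the same edits the sed command makes, not a replacement for it; the function name restore_json and the sample lines are assumptions (the input is reconstructed from the pretty-printed document above).

```python
import json


def restore_json(lines, end_tag="Couse"):
    """Apply the three edits described above and parse the result as JSON."""
    fixed = []
    last = len(lines) - 1
    for i, line in enumerate(lines):
        if i == 0:
            line = "[{" + line              # open the array and first object
        if i == last:
            line = line + "}]"              # close the last object and array
        elif line.startswith('"%s"' % end_tag):
            line = line + "},{"             # end this record, start the next
        if line.endswith('"'):
            line = line + ","               # line not otherwise modified
        fixed.append(line)
    return json.loads("\n".join(fixed))


# Assumed input lines, reconstructed from the document shown above.
sample = '''"Name":"asxadadad  ,aaf dsf"
"City":"Mum"
"Age":"23"
"Couse":"BBS"
"City":"Ors"
"Age":"11"
"Couse":"MB"
"Name":"adad sf"
"City":"Kol"
"Age":"21"
"Couse":"BB"
"Name":"pqr"
"Age":"21"
"Couse":"NN"
'''.splitlines()

docs = restore_json(sample)
print(json.dumps(docs, indent=2))
```

As with the sed version, this relies on every record ending with the Couse line and on every value being a quoted string.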

We can then convert this to CSV with jq (adding some column headers and replacing null values with the string null):

jq -r '    [ "Name", "City", "Age", "Couse" ],
    (.[] | [ .Name,  .City,  .Age,  .Couse  ]) |
    map(. // "null") | @csv'

This generates

"Name","City","Age","Couse"
"asxadadad  ,aaf dsf","Mum","23","BBS"
"null","Ors","11","MB"
"adad sf","Kol","21","BB"
"pqr","null","21","NN"

We can then use csvlook from the csvkit toolkit to produce a nice-looking table.

The final pipeline is:

sed -e '1 s/^/[{/' -e '$ s/$/}]/' \
    -e '/^"Couse"/ { $! s/$/},{/; }' \
    -e 's/"$/&,/' file |
jq -r '    [ "Name", "City", "Age", "Couse" ],
    (.[] | [ .Name,  .City,  .Age,  .Couse  ]) |
    map(. // "null") | @csv' |
csvlook --blanks

We use csvlook with its --blanks option to keep the null strings intact (they would otherwise be removed).

The result is

| Name                | City | Age | Couse |
| ------------------- | ---- | --- | ----- |
| asxadadad  ,aaf dsf | Mum  |  23 | BBS   |
| null                | Ors  |  11 | MB    |
| adad sf             | Kol  |  21 | BB    |
| pqr                 | null |  21 | NN    |

or, rendered as Markdown:

Name	City	Age	Couse
asxadadad  ,aaf dsf	Mum	23	BBS
null	Ors	11	MB
adad sf	Kol	21	BB
pqr	null	21	NN
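
For comparison, the jq step of the pipeline (header row, null substitution, fully quoted CSV) could be sketched in Python roughly as follows. The function name to_csv is made up for illustration, and docs is assumed to be the list of records obtained from the restored JSON document.

```python
import csv
import io


def to_csv(docs, columns=("Name", "City", "Age", "Couse"), missing="null"):
    """Emit a header row plus one fully quoted CSV row per record."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(columns)
    for doc in docs:
        # Absent (or null) fields become the literal string "null".
        writer.writerow(
            [missing if doc.get(col) is None else doc[col] for col in columns]
        )
    return buf.getvalue()


# Assumed records, matching the restored JSON document shown earlier.
docs = [
    {"Name": "asxadadad  ,aaf dsf", "City": "Mum", "Age": "23", "Couse": "BBS"},
    {"City": "Ors", "Age": "11", "Couse": "MB"},
    {"Name": "adad sf", "City": "Kol", "Age": "21", "Couse": "BB"},
    {"Name": "pqr", "Age": "21", "Couse": "NN"},
]

csv_text = to_csv(docs)
print(csv_text, end="")
```

Quoting every field keeps the comma inside the first Name value from being read as a column separator.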
