Search for a pattern and create a file with the same name.

Question 1

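The file consists of a sequence of JSON objects. Each object contains a .location_country key. From each object, we can create a shell command that writes a serialized copy of the object itself to a file named after the value of the .location_country key. These shell commands can then be executed by the shell.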

Using jq,

jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt

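The serialized object can be created using the @json operator in jq, which emits a JSON-encoded string containing the input document (in this case, the current object). This string is then fed through @sh so that it is properly quoted for the shell. The @sh operator is also used to create part of the output filename, based on the value of the .location_country key.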

The command essentially generates shell code that calls printf, outputs the current object, and redirects the output to a specific file.

Given the example data in file.txt, it emits the following:

printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
printf "%s\n" '{"full_name":"name2","location_country":"united states"}' >'united states'.txt
printf "%s\n" '{"full_name":"name3","location_country":"china"}' >'china'.txt

You can redirect this to a separate file and run it with sh to execute the commands, or use eval to execute it directly in your shell:

eval "$( jq ...as above... )"

Since we are using a proper JSON parser, jq, the code above works even if the input JSON document is not formatted with one object per line.

$ cat file.txt
{
  "full_name": "name1",
  "location_country": "united kingdom"
}
{
  "full_name": "name2",
  "location_country": "united states"
}
{
  "full_name": "name3",
  "location_country": "china"
}

$ jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt
printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
printf "%s\n" '{"full_name":"name2","location_country":"united states"}' >'united states'.txt
printf "%s\n" '{"full_name":"name3","location_country":"china"}' >'china'.txt

$ eval "$( jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt )"
$ ls
china.txt           file.txt            united kingdom.txt  united states.txt
$ cat 'united kingdom.txt'
{"full_name":"name1","location_country":"united kingdom"}

Question 2

Using awk

Input

$ cat input_file
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
{"full name":"name12","location":"china"}
{"full name":"name11","location":"china"}

awk -F"[\"|:]" '$10~/[A-Za-z]/ { print > ($10 ".txt") }' input_file

Output

$ cat china.txt
{"full_name":"name3","location_country":"china"}
{"full name":"name12","location":"china"}
{"full name":"name11","location":"china"}

$ cat united\ kingdom.txt
{"full_name":"name1","location_country":"united kingdom"}

$ cat united\ states.txt
{"full_name":"name2","location_country":"united states"}
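To see why field 10 holds the country, it can help to list the fields that awk produces with this separator. The following is a minimal sketch, assuming a POSIX shell and any awk; the sample line is taken from input_file above, and the variable names line and fields are only for illustration.

```shell
# Split one sample line on the same separator set (double quote, pipe, colon)
# and list every non-empty field together with its index.
line='{"full_name":"name1","location_country":"united kingdom"}'
fields=$(printf '%s\n' "$line" |
    awk -F"[\"|:]" '{
        for (i = 1; i <= NF; i++)
            if ($i != "") printf "%d=%s\n", i, $i
    }')
printf '%s\n' "$fields"
```

Inside the bracket expression the | is a literal pipe character, not alternation. Fields 3, 4, 8 and 9 are empty because the separators in ":" are adjacent, which is why the country value ends up in field 10. This only holds while every line has the same two-key layout.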

Question 3

Given the comments below, this should do what you want, using GNU awk for the third argument to match() and for handling many simultaneously open files:

awk 'match($0,/"location_country":"([^"]+)"/,a) { print > (a[1] ".txt") }' file

For speed of execution, though, a decorate/sort/use/undecorate approach is best, for example:

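awk -v OFS='"' 'match($0,/"location_country":"[^"]+"/) { print substr($0,RSTART+20,RLENGTH-21), $0 }' file |
sort -t'"' -k1,1 |
awk -F'"' '$1!=prev { close(out); out=$1 ".txt"; prev=$1 } { sub(/^[^"]*"/,""); print > out }'

The above will work using any sort and awk. The country prefix is stripped by sub() in the last awk, since that awk writes straight to the output files rather than to the pipeline.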
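To make the decorate step concrete, here is a small sketch of what the stream looks like after the first awk and the sort, before anything is written to files. The three sample lines and the variable name decorated are assumptions for illustration.

```shell
# Decorate: prefix each line with its country value and a double quote,
# then sort on that prefix (field 1 when splitting on double quotes).
decorated=$(printf '%s\n' \
    '{"full_name":"name1","location_country":"united kingdom"}' \
    '{"full_name":"name3","location_country":"china"}' \
    '{"full_name":"name2","location_country":"united states"}' |
    awk -v OFS='"' 'match($0,/"location_country":"[^"]+"/) {
        # RSTART+20 skips the 20 characters of "location_country":"
        # RLENGTH-21 drops those 20 characters plus the closing quote
        print substr($0, RSTART+20, RLENGTH-21), $0
    }' |
    sort -t'"' -k1,1)
printf '%s\n' "$decorated"
```

Each line now starts with the country followed by a double quote, so lines for the same country are adjacent after sorting and only one output file needs to be open at a time.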

Original answer:

If your data is always that simple/regular, all you need is this, using GNU awk (for handling many simultaneously open output files):

awk -F'"' '{ print > ($5 ".txt") }' file

or this, using any awk:

awk -F'"' '{
    out = $5 ".txt"
    if ( !seen[out]++ ) {
        printf "" > out
    }
    print >> out
    close(out)
}' file

The above will work no matter how large your input file is, as long as you have enough disk space available to create the output files.

If you like, you can do this more efficiently by sorting on the country name first:

sort -t'"' -k5,5 file |
awk -F'"' '$5 != prev{ close(out); out=$5 ".txt"; prev=$5 } { print > out }'

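That last script will work using any sort and awk, but it may rearrange the order of the input lines for each country. If you care about that and have GNU sort, add the -s argument. If you care and don't have GNU sort, let me know, as there is a fairly simple workaround.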
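The workaround itself is not spelled out here. One possible approach, not necessarily the one the author had in mind, is to decorate each line with its input line number as well as its country, and sort on both. This is a sketch only: the sample data, the scratch directory, and the use of the "location_country":"..." layout from the match() commands above are all assumptions.

```shell
# Work in a scratch directory (hypothetical) so the output files are isolated.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample input, one JSON object per line.
cat > file <<'EOF'
{"full_name":"name1","location_country":"china"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
{"full_name":"name4","location_country":"china"}
EOF

# Decorate each line with its country and input line number, sort by country
# and then numerically by line number, and strip both prefixes again while
# writing each line to its per-country file.
awk -v OFS='"' 'match($0,/"location_country":"[^"]+"/) {
    print substr($0, RSTART+20, RLENGTH-21), NR, $0
}' file |
sort -t'"' -k1,1 -k2,2n |
awk -F'"' '
    $1 != prev { close(out); out = $1 ".txt"; prev = $1 }
    { sub(/^[^"]*"[^"]*"/, ""); print > out }
'

cat china.txt
```

Because the line number is unique, the second sort key settles every tie within a country, so the original input order is preserved without relying on a stable sort.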

Answer