Filter a .CSV file based on the value in column 5 and print the matching records to a new file.

Question 1

awk -F '","'  'BEGIN {OFS=","} { if (toupper($5) == "STRING 1")  print }' file1.csv > file2.csv

Output

"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

I think this is what you want.
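For anyone who wants to try the command before running it on real data, here is a minimal sketch on made-up rows (the sample file and the temporary directory are assumptions for illustration, not the asker's data). It relies on every field being double-quoted, which is what makes the `","` separator safe.

```shell
# Hypothetical sample data; every field is double-quoted, as in the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1.csv" <<'EOF'
"1","a","x, y","p","string 1","USD"
"2","b","x","p, q","String 2","USD"
"3","c","x","p","STRING 1","CAD"
EOF

# Split on the three-character sequence "," so commas inside fields are kept.
# toupper() makes the comparison case-insensitive; a true pattern prints the row.
awk -F '","' 'toupper($5) == "STRING 1"' "$tmpdir/file1.csv" > "$tmpdir/file2.csv"

cat "$tmpdir/file2.csv"
```

Rows 1 and 3 end up in file2.csv. With this separator the first and last fields each keep one stray quote, so a test on $1 or on the last column would need to allow for it.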


Question 2

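The problem with CSV is that there is no standard. If you need to process CSV-formatted data often, you may want to look for something more robust than simply using "," as the field separator. In this case, Perl's Text::CSV CPAN modules (Text::CSV_XS below) are well suited to the job: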

$ perl -mText::CSV_XS -WlanE '
    BEGIN {our $csv = Text::CSV_XS->new;} 
    $csv->parse($_); 
    my @fields = $csv->fields(); 
    print if $fields[4] =~ /string 1/i;
' file1.csv
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""


Question 3

csvgrep from csvkit

The most robust way to do this with awk seems to be FPAT (a gawk extension), as described at: https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk/45420607#45420607 Unfortunately, even FPAT cannot handle literal newlines inside quoted fields.
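To make the newline limitation concrete, here is a small sketch. The sample records are made up, the FPAT pattern is written in the style of the gawk manual rather than copied from the linked answer, and the FPAT part only runs when gawk is installed.

```shell
# Two logical records (made-up data); the first has a newline inside quotes.
data=$(printf '00,01,"a, b",03,string 1,"04\n05"\n10,11,12,13,string 2,15')

# Any awk reads physical lines, so it counts three records instead of two.
records=$(printf '%s\n' "$data" | awk 'END { print NR }')
echo "awk saw $records records"

# FPAT is gawk-only: it describes what a field looks like, not the separator.
if command -v gawk >/dev/null 2>&1; then
    # The quoted comma in field 3 is handled, but only the first physical
    # line of the matching record is printed, so the record is truncated.
    fpat_out=$(printf '%s\n' "$data" | gawk -v FPAT='([^,]*)|("[^"]+")' '$5 == "string 1"')
    printf '%s\n' "$fpat_out"
fi
```

The filter finds the right row, but the second half of the quoted last field is lost because it sits on the next physical line.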

Instead, if you want to be smarter about it, there are various CSV CLI tools you can use. One that is very easy to install through pip (although, being Python-based, not necessarily the fastest) is csvgrep from csvkit.

pip install csvkit

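Then you can select the rows whose fifth column matches with:

csvgrep -H -c5 -r '^string 1$' mytest.csv

Explanation of the options:

-H: the first line is not a header line
-i: invert the match (add it to get the non-matching rows instead)
-c5: operate on the fifth column
-r: match against the following regular expression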

Concrete example:

printf '00,01,02,03,string 1,"04,\n""05"\n10,11,12,13,string 2,"14,\n""15"\n' > nohead.csv
printf 'col1,col2,col3,col4,col5,col6\n00,01,02,03,string 1,"04,\n""05"\n10,11,12,13,string 2,"14,\n""15"\n' > head.csv

Then:

csvgrep -H -c5 -r '^string 1$' nohead.csv | tail -n+2

Output:

00,01,02,03,string 1,"04,
""05"

We pipe to tail because -H adds an annoying dummy header:

a,b,c,d,e,f
00,01,02,03,string 1,"04,
""05"

We can invert the match with -i:

csvgrep -H -i -c5 -r '^string 1$' nohead.csv | tail -n+2

Output:

10,11,12,13,string 2,"14,
""15"

If the file has a header, you can use column names:

csvgrep -c col5 -r '^string 1$' head.csv

Output:

col1,col2,col3,col4,col5,col6
00,01,02,03,string 1,"04,
""05"

Tested on csvkit 1.0.7, Ubuntu 23.04.

Answer