How do I remove commas from a field in a CSV?

Question 1

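If the fields are properly quoted, as shown, the embedded commas are not an issue (assuming you read the data with a CSV-aware parser).

If you still feel that you need to remove the commas from the field named name, use a CSV-aware tool such as csvkit or Miller (mlr) to process the data.

An example using Miller: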

mlr --csv put '$name = gsub($name, ",", "")' file.csv

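This reads the data from the CSV file file.csv, removes all commas from the field named name using a substitution function similar to the gsub() found in awk, and then outputs the possibly modified records.

Example: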

$ cat file.csv
age,name,note
47,"Hatter, Mad","Isn't actually ""mad"""
39,"Rabbit, White",Drinks too much tea
2,"Dormouse, The",Sleeps most of the time
$ mlr --csv put '$name = gsub($name, ",", "")' file.csv
age,name,note
47,Hatter Mad,"Isn't actually ""mad"""
39,Rabbit White,Drinks too much tea
2,Dormouse The,Sleeps most of the time
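
For comparison, the same field-level edit can be sketched with Python's standard csv module. This is not part of the original answer; the function name strip_commas and the inline sample are illustrative assumptions, and the sketch only shows that a CSV-aware reader and writer handle the quoting.

```python
import csv
import io

def strip_commas(text, field="name"):
    """Return CSV text with every comma removed from one named column."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The parser has already removed the CSV quoting, so a plain
        # string replacement on the field value is safe here.
        row[field] = row[field].replace(",", "")
        writer.writerow(row)
    return out.getvalue()

# Same data as file.csv above, inlined so the sketch is self-contained.
sample = '''age,name,note
47,"Hatter, Mad","Isn't actually ""mad"""
39,"Rabbit, White",Drinks too much tea
2,"Dormouse, The",Sleeps most of the time
'''

print(strip_commas(sample), end="")
```

The writer quotes a field again only when it needs to, which is why the note on the first record stays quoted while the name fields do not.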

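With csvformat (from csvkit) and tr, the following removes all commas from the document, in every field and not only in name, by temporarily changing the delimiter to a semicolon: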

csvformat -D ';' file.csv | tr -d , | csvformat -d ';'

Example:

$ csvformat -D ';' file.csv | tr -d , | csvformat -d ';'
age,name,note
47,Hatter Mad,"Isn't actually ""mad"""
39,Rabbit White,Drinks too much tea
2,Dormouse The,Sleeps most of the time
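
The whole-document variant can also be sketched in Python, again as an illustration and not as part of the original answer. The function name strip_all_commas and the two-line sample are made up for the example; the sample includes a semicolon to show that field content other than commas is left alone.

```python
import csv
import io

def strip_all_commas(text):
    """Return CSV text with commas removed from every field."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        # Only commas that are part of the data are touched; the
        # delimiters are re-created by the writer.
        writer.writerow([cell.replace(",", "") for cell in row])
    return out.getvalue()

# Hypothetical sample data, not taken from the thread.
sample = 'id,name,note\n1,"Hatter, Mad","tea, cake; jam"\n'

print(strip_all_commas(sample), end="")
```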

Alternatively, csvsql (also from csvkit) can be used to remove the commas from the name field through some SQL:

csvsql --query 'UPDATE file SET name = REPLACE(name, ",", "")' \
    --query 'SELECT * FROM file' file.csv

Question 2

Given a CSV like this one, from @Kusalananda's answer:

$ cat file.csv
age,name,note
47,"Hatter, Mad","Isn't actually ""mad"""
39,"Rabbit, White",Drinks too much tea
2,"Dormouse, The",Sleeps most of the time

A concise but fragile approach using any awk (it fails if the first field is also quoted or if the second field contains escaped quotes):

$ awk 'BEGIN{FS=OFS="\""} {sub(/,/,"",$2)} 1' file.csv
age,name,note
47,"Hatter Mad","Isn't actually ""mad"""
39,"Rabbit White",Drinks too much tea
2,"Dormouse The",Sleeps most of the time

A less concise but more robust approach (it works for any field content except newlines, assuming that quotes within fields are escaped by doubling them, as per RFC 4180), using GNU awk for FPAT:

$ awk 'BEGIN{FPAT="([^,]*)|(\"([^\"]|\"\")*\")"; OFS=","} {sub(/,/,"",$2)} 1' file.csv
age,name,note
47,"Hatter Mad","Isn't actually ""mad"""
39,"Rabbit White",Drinks too much tea
2,"Dormouse The",Sleeps most of the time

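Which one you should use depends on the contents of your CSV. Both scripts call sub(), which removes only the first comma in the field; use gsub() instead if a field can contain more than one.

If the field you are referring to can contain newlines, or you have a CSV for which the first script above does not work and you do not have access to GNU awk, then you need a different solution. See, for example, "What's the most robust way to efficiently parse CSV using awk?"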
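
The idea behind FPAT, describing what a field looks like instead of what separates fields, can be sketched in Python with a regular expression. This is an illustration under assumptions, not a translation of the gawk script: the names FIELD, split_fields, and drop_first_comma are made up, the quoted alternative is listed first because Python tries alternatives in order where awk takes the longest match, and fields containing newlines are not handled.

```python
import re

# A field is either a quoted string (with "" as an escaped quote) or a
# run of non-comma characters. The leading (?:^|,) consumes the
# separator so that findall() does not report spurious empty matches.
FIELD = re.compile(r'(?:^|,)("(?:[^"]|"")*"|[^,]*)')

def split_fields(line):
    """Split one CSV record into fields, keeping any surrounding quotes."""
    return FIELD.findall(line)

def drop_first_comma(line, index=1):
    """Remove the first comma inside one field, like sub(/,/,"",$2) in awk."""
    fields = split_fields(line)
    fields[index] = fields[index].replace(",", "", 1)
    return ",".join(fields)

line = '47,"Hatter, Mad","Isn\'t actually ""mad"""'
print(split_fields(line))
print(drop_first_comma(line))
```

As in the awk version, the quotes around the field are preserved because the record is only split and rejoined, never re-encoded.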

Question 3

Borrowing @Kusalananda's CSV example, you can use Ruby with its built-in CSV parser:

$ ruby -r csv -e 'data=CSV.parse($<.read, **{:headers=>true})
data["name"]=data["name"].map{|e| e.gsub(/,/,"")}
puts data' file.csv
age,name,note
47,Hatter Mad,"Isn't actually ""mad"""
39,Rabbit White,Drinks too much tea
2,Dormouse The,Sleeps most of the time

Or, if you want to swap the name parts so that they read better without the comma:

$ ruby -r csv -e 'data=CSV.parse($<.read, **{:headers=>true})
data["name"]=data["name"].map{|e| e.split(/,\s*/,2).reverse.join(" ")}
puts data' file.csv
age,name,note
47,Mad Hatter,"Isn't actually ""mad"""
39,White Rabbit,Drinks too much tea
2,The Dormouse,Sleeps most of the time
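
The split-and-reverse step can be sketched in Python as well, for readers who want to see it outside a one-liner. This is an assumption-labeled illustration and not part of the original answer: swap_name and reorder_names are hypothetical names, and the sample data is inlined from the example above.

```python
import csv
import io
import re

def swap_name(value):
    """Turn 'Last, First' into 'First Last'; leave other values unchanged."""
    parts = re.split(r",\s*", value, maxsplit=1)
    return " ".join(reversed(parts))

def reorder_names(text, field="name"):
    """Return CSV text with the name column rewritten by swap_name()."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[field] = swap_name(row[field])
        writer.writerow(row)
    return out.getvalue()

sample = '''age,name,note
47,"Hatter, Mad","Isn't actually ""mad"""
39,"Rabbit, White",Drinks too much tea
2,"Dormouse, The",Sleeps most of the time
'''

print(reorder_names(sample), end="")
```

Splitting at most once keeps any further commas in the second part, which mirrors the limit of 2 passed to split in the Ruby version.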

Question 4

Using Raku (formerly known as Perl_6)

Parse the CSV with Raku's Text::CSV module:

~$ raku -MText::CSV -e 'my @a = csv(in => $*IN, strict => True);  \
         @a.skip>>.[1] = @a.skip>>.[1].map: *.trans( "," => "");  \
         .join("\t").put for @a;'  <  file.csv

Sample Input (from @Kusalananda):

age,name,note
47,"Hatter, Mad","Isn't actually ""mad"""
39,"Rabbit, White",Drinks too much tea
2,"Dormouse, The",Sleeps most of the time

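As the code above directs, the output below is tab-separated for readability (use join(",") instead for comma-separated output, although a plain join does not add any CSV quoting).

Sample Output (1)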

age name    note
47  Hatter Mad  Isn't actually "mad"
39  Rabbit White    Drinks too much tea
2   Dormouse The    Sleeps most of the time

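If whitespace in the columns is a problem, you can use .trim and similar methods. The OP may also want to reverse the contents of column 2 so that it reads first name followed by last name. If so, splitting the field on the comma is probably the best approach. Changing the final statement above to csv(in => @a, out => $*OUT, sep_char => "\t") generates an actual TSV (or CSV) file: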

~$ raku -MText::CSV -e 'my @a = csv(in => $*IN, strict => True);  \
         @a.skip>>.[1] = @a.skip>>.[1].map: *.split(", ").reverse;  \
         csv(in => @a, out => $*OUT, sep_char => "\t");'  < file.csv

Sample Output (2)

age name    note
47  "Mad Hatter"    "Isn't actually ""mad"""
39  "White Rabbit"  "Drinks too much tea"
2   "The Dormouse"  "Sleeps most of the time"

https://modules.raku.org/dist/Text::CSV:cpan:HMBRAND
https://github.com/Tux/CSV
https://raku.org

Answer