Find and replace duplicates

Answer 1

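Here is a sed solution that works on the exact input format given in the question and runs quickly (see the note on very large files in the comments below).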

sed -rz 's:[ \t]+:,:g;s:$:,:mg;:l;s:,([^,]+),(.*),\1,:,\1,\2,:;tl;s:,$::mg;s:^([^,]+),:\1\t:mg' file.csv

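How it works:

The "-z" flag makes sed split its input on NUL bytes instead of newlines, so a file that contains no NUL bytes is loaded as a single record. The following code is therefore applied once to the whole file rather than once per line, as it would be by default.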

#transform input format to actual CSV format
s:[ \t]+:,:g;s:$:,:mg;
#loop while the s command can still find and replace
:l;
    #main code: find two identical cell values anywhere and delete the latter
    #on a very big file this can suffer from backtracking nightmare
    s:,([^,]+),(.*),\1,:,\1,\2,:;
tl;
#transform format back
s:,$::mg;s:^([^,]+),:\1\t:mg
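The two ideas above (one record with "-z", and a substitute-until-nothing-changes loop) can be tried in isolation on a toy input. This is only a sketch and assumes GNU sed, since "-r" and "-z" are GNU extensions. The helper name dedupe_cells and the sample string are made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Assumes GNU sed: -r (extended regex) and -z (NUL-separated records) are GNU extensions.

# 1) Without -z the script runs once per line, so ^ matches on every line.
printf 'a\nb\n' | sed 's/^/>/'
# With -z the whole input is one record, so ^ matches only at its very start.
printf 'a\nb\n' | sed -z 's/^/>/'

# 2) The loop from the answer, on a single comma-delimited string.
#    dedupe_cells is a hypothetical helper name used only in this sketch.
dedupe_cells() {
    # :l  defines a label; "t l" jumps back to it only if the s command replaced something,
    # so one later duplicate is removed per pass until no duplicate is left.
    sed -r ':l;s:,([^,]+),(.*),\1,:,\1,\2,:;tl'
}

# Every cell must be wrapped in commas, which is what the first step of the answer arranges.
printf ',a,b,a,c,b,\n' | dedupe_cells
```

The last command prints ",a,b,c,": the first pass drops the second "a", the second pass drops the second "b", and the third pass finds nothing to replace, so the loop ends.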

Answer 2

If your file is an actual CSV file (simple CSV) like the one below, you can use the following awk command.

Input:

[email protected]
[email protected]
[email protected],[email protected],[email protected]

Command:

awk -F, '{ COMMA=""; for (i=1; i<=NF; i++) {
           if (!seen[$i]++) { printf "%s%s", COMMA, $i; COMMA="," } }; print ""
}' infile.csv

Output:

[email protected]
[email protected]
[email protected],[email protected]
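Because the addresses in the sample above are obscured, here is a small reproducible run of the same idea with made-up example.com addresses. It is a sketch, not the answer's exact command: the function name dedupe_simple_csv and the sample data are assumptions made for illustration.

```shell
#!/bin/sh
# dedupe_simple_csv is a hypothetical helper name; the addresses are invented sample data.
dedupe_simple_csv() {
    awk -F, '{
        sep = ""
        for (i = 1; i <= NF; i++) {
            # seen[x]++ is 0 (false) only the first time x appears anywhere in the file
            if (!seen[$i]++) {
                printf "%s%s", sep, $i   # data is passed as an argument, never as the format
                sep = ","                # a separator is needed only after something was printed
            }
        }
        print ""
    }'
}

printf '%s\n' \
    'a@example.com' \
    'b@example.com' \
    'a@example.com,b@example.com,c@example.com' | dedupe_simple_csv
```

With this input the third line keeps only c@example.com, because the other two addresses were already seen on earlier lines. A line whose addresses have all been seen before comes out empty.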

Otherwise, if your input is like the one given in the question, you can use the following:

awk 'NR==1; NR>1{id=$1"\t"; COMMA=$1=""; n=split($0, ar, /,| /);
    for(i=1;i<=n;i++){if(ar[i]!="" && !seen[ar[i]]++){printf "%s%s%s", id, COMMA, ar[i]; COMMA=","; id=""}
} print ""}' infile

Output:

id  emails
1       [email protected]
2       [email protected]
3       [email protected],[email protected]
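For the id-prefixed layout, the same approach can be checked on made-up data. The sketch below is a variant rather than the command above: it collects the surviving addresses first and always prints the id, even when every address on the line is a repeat. The function name dedupe_id_emails, the example.com addresses and the exact spacing of the sample input are assumptions.

```shell
#!/bin/sh
# dedupe_id_emails is a hypothetical helper name; the input below is invented sample data.
dedupe_id_emails() {
    awk 'NR == 1 { print; next }         # pass the header line through unchanged
    {
        id = $1; $1 = ""                 # remember the id, then blank it; $0 is rebuilt
        n = split($0, ar, /[, ]+/)       # split the rest on runs of commas and spaces
        out = ""
        for (i = 1; i <= n; i++)         # numeric loop keeps the addresses in input order
            if (ar[i] != "" && !seen[ar[i]]++)
                out = out (out == "" ? "" : ",") ar[i]
        print id "\t" out
    }'
}

printf '%s\n' \
    'id  emails' \
    '1   a@example.com' \
    '2   b@example.com' \
    '3   a@example.com, b@example.com, c@example.com' | dedupe_id_emails
```

Here row 3 ends up with only c@example.com, since the other two addresses already belong to rows 1 and 2. Iterating with a numeric index instead of "for (i in ar)" matters because awk does not guarantee the traversal order of "in".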
