Linux CSV column with multiple comma delimiters - sed regex match and replace with quotes

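I am having a bit of difficulty hacking together a solution.

One column of a SQL table exported to CSV contains comma-delimited strings of the following kind. Some rows hold just a string with no delimiters, but others look like the ones below.

I need to match the whole string with a regex and enclose it in double quotes.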
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1
SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1

This is where I am stuck.

The sample file looks like this:

export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1,6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK

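The format below is the end goal: the whole string enclosed in double quotes, matched with a regular expression and with a double quote added at each end. In the example below, I first matched it with the regex in grep and then applied sed.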

grep -Eo "\w{10}\=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]" export.csv | sed 's/^/"/;s/$/"/'
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1"
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1"
"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1"

However, using sed -i instead of grep does not modify the text matching the regex.

For example:

tail export.csv | sed -e 's/"\w{10}\=\w{12},\w{10}=\w{3}.*,\w{10}=\d$"/"/;s/$/"/'
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1,4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1,6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK"

It only adds a " at the end of the last column.

Answer 1

If the point is to put quotes around sequences of ,-delimited fields that match \w+=\w+, with perl:

$ perl -lpe 's/(?<![^,])(\w+=\w+)(,(?1))*(?![^,])/"$&"/g' your-file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L2-TDD,"AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L3-TDD,"AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1",AntennaUnitGroup=L1-TDD,"AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK

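Note that AntennaUnitGroup=L2-TDD is left outside the quotes because - is not a word character (\w). Replace \w with [\w-] to include -, or with [^,"=] to match any character other than a comma, a double quote or =.

Add the -i option to modify the file in place.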

Answer 2

Using awk:

$ awk 'BEGIN{FS=OFS=","} {$12="\""$12; $19=$19"\""} 1' file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK

Or, with a sed that supports -E for EREs, such as GNU or BSD sed:

$ sed -E 's/(([^,]*,){11})(([^,]*,){7}[^,]*)/\1"\3"/' file
export_pl,XYSS04S,58.38,800,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L2-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,NULL,0.0604592,NULL,NULL,NULL,NULL,0,NULL,0.666667,0,0,NULL,NULL,NULL,NULL,1703.12,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,250,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L3-TDD,AntennaNearUnit=1,RetSubUnit=1",4,0,28169.8,0,0,NULL,NULL,99.9459,3.88318,0,NULL,25.7521,0.666667,888,50848,43,1966,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
export_pl,XYSS04S,58.38,340,39544,0,6,NULL,NULL,0,0,"SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,MeContext=XYSS04S,ManagedElement=XYSS04S,Equipment=1,AntennaUnitGroup=L1-TDD,AntennaNearUnit=1,RetSubUnit=1",6,0,28169.8,0,0,NULL,NULL,100,100,0.000163336,NULL,0.0388318,1,43,1966,888,50848,NULL,NULL,NULL,RT,2023-10-08 00:00:00,2023-10-10 00:00:00,20:00:00;21:00:00;22:00:00;14:00:00,1,OK
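
Both commands in this answer work purely by position: they assume the list always occupies fields 12 to 19. The question mentions that some rows hold a single string in that column, and for such rows those positions would point at other data. Below is a minimal sketch of a guarded variant. The helper name quote_dn is hypothetical, and the check that the list starts with SubNetwork= and ends with RetSubUnit= is an assumption drawn only from the sample rows.

```shell
#!/bin/sh
# Sketch of a guarded version of the positional awk approach.
# Assumption (from the sample rows only): the list starts with SubNetwork=
# in field 12 and ends with RetSubUnit= in field 19.
quote_dn() {
  awk 'BEGIN { FS = OFS = "," }
       $12 ~ /^SubNetwork=/ && $19 ~ /^RetSubUnit=/ {
         $12 = "\"" $12   # opening quote before the first pair
         $19 = $19 "\""   # closing quote after the last pair
       }
       1' "$@"            # rows that do not fit pass through unchanged
}
```

It could be called as quote_dn file, in the same way as the awk command above. Rows that fail the check are printed as they are, so they can be reviewed separately.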

Answer 3

Using awk:

$ awk '{ sub(/[_[:alpha:]]{10}=[_[:alpha:]]{12},[_[:alpha:]]{10}=[_[:alpha:]]{3}.*,[_[:alpha:]]{10}=[[:alnum:]]+/, "\"&\"")}1'

# Using GNU awk
$ awk '{sub(/\w{10}=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]{1,}/, "\"&\"" ) }1'

Or use the GNU awk gensub() function:

$ awk '{print gensub(/(^.*)(\w{10}=\w{12},\w{10}=\w{3}.*,\w{10}=[[:alnum:]]{1,})(.*$)/, "\\1\"\\2\"\\3", "g"); }'
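
The same match-and-wrap idea carries over to sed, which is what the question originally attempted. That attempt could not match for three reasons: the pattern contains literal " characters that are not in the data, {10} is not an interval without -E (or \{10\} in a BRE), and \d is not a digit class in sed. A minimal sketch follows, assuming a sed that supports -E; the helper name quote_with_sed is hypothetical, and [[:alnum:]_] stands in for \w to avoid relying on a GNU extension.

```shell
#!/bin/sh
# Sketch: wrap the matched key=value run in quotes with sed.
# -E enables EREs so that {10} is an interval; & in the replacement is the
# whole matched text. quote_with_sed is a hypothetical helper name.
quote_with_sed() {
  sed -E 's/[[:alnum:]_]{10}=[[:alnum:]_]{12},[[:alnum:]_]{10}=[[:alnum:]_]{3}.*,[[:alnum:]_]{10}=[[:alnum:]]+/"&"/' "$@"
}

# Shortened, made-up row that keeps the name lengths the pattern relies on.
printf '%s\n' 'a,0,SubNetwork=ONRM_ROOT_AB,SubNetwork=XYZ,Equipment=1,RetSubUnit=1,4,OK' | quote_with_sed
```

With GNU sed, adding -i (for example sed -i -E '...' export.csv) would apply the change to the file itself rather than printing it.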
