![Moving the carriage return character added when using join](https://linux55.com/image/140175/Join%20%EC%82%AC%EC%9A%A9%20%EC%8B%9C%20%EC%B6%94%EA%B0%80%EB%90%9C%20%EC%BA%90%EB%A6%AC%EC%A7%80%20%EB%A6%AC%ED%84%B4%20%EB%AC%B8%EC%9E%90%20%EC%9D%B4%EB%8F%99.png)
I am trying to join two pipe-delimited files, but after running my join command, which is:
join -a 1 -i -t"|" -o 1.3 1.1 2.2 1.4 1.5 2.3 2.4 2.5 2.6 2.7 2.8 2.9 <(sort -d -t"|" -z alt.csv) <(sort -d -t"|" -z ../original/alt.csv) > ../out/alt.csv
the output file contains a carriage return character at the point where the join occurs. For example:
IRN|EADUnitID|EADPhysicalTechnical|AdmPublishWebNoPassword|AdmPublishWebPassword
|EADUnitTitle|EADBiographyOrHistory|EADScopeAndContent|EADArrangement|EADAcquisitionInformationRef|EADRelatedMaterial|BibBibliographyRef_tab
51899|ga.1.1|GLS Add. GA 1/1|Yes|Yes
|Photographic negatives ||<p>The albums comprise of negatives of Gypsies and Gypsy life in Germany and eastern Europe. The albums have been indexed and the negatives numbered by Althaus in series I-IV; VII-VIII, though numbering is not continuous. The majority of the negatives have duplicates in slide or photograph format (GA 1/2 and GA 3) and reference has been made to these. The captions are those taken from the index except for unindexed negatives, whereupon the caption has been taken from a duplicate photograph or slide. Where there is no duplicate, the caption simply describes what can be seen in the negative. The list also includes 22 negatives that are indexed in the albums but are missing. There is a closed section from GA 1/1/53 - GA 1/1/68 due to the sensitive nature of the negatives. </p>||||
51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes
|Ehepaar Weltzel. ||||||
51901|ga.1.1.2|GLS Add. GA 1/1/2|Yes|Yes
|Ehepaar Weltzel. ||||||
51902|ga.1.1.3|GLS Add. GA 1/1/3|Yes|Yes
|Roßlau, Dessauerstr Kegli. Julius Braun, Bitterfield, 1939 Koitsch. ||||||
However, for the file to be processed correctly, the carriage return needs to come after the last column:
IRN|EADUnitID|EADPhysicalTechnical|AdmPublishWebNoPassword|AdmPublishWebPassword|EADUnitTitle|EADBiographyOrHistory|EADScopeAndContent|EADArrangement|EADAcquisitionInformationRef|EADRelatedMaterial|BibBibliographyRef_tab
51899|ga.1.1|GLS Add. GA 1/1|Yes|Yes|Photographic negatives ||<p>The albums comprise of negatives of life in Germany and eastern Europe. The albums have been indexed and the negatives numbered by Althaus in series I-IV; VII-VIII, though numbering is not continuous. The majority of the negatives have duplicates in slide or photograph format (GA 1/2 and GA 3) and reference has been made to these. The captions are those taken from the index except for unindexed negatives, whereupon the caption has been taken from a duplicate photograph or slide. Where there is no duplicate, the caption simply describes what can be seen in the negative. The list also includes 22 negatives that are indexed in the albums but are missing. There is a closed section from GA 1/1/53 - GA 1/1/68 due to the sensitive nature of the negatives. </p>||||
51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes|Ehepaar Weltzel. ||||||
51901|ga.1.1.2|GLS Add. GA 1/1/2|Yes|Yes|Ehepaar Weltzel. ||||||
51902|ga.1.1.3|GLS Add. GA 1/1/3|Yes|Yes|Roßlau, Dessauerstr Kegli. Julius Braun, Bitterfield, 1939 Koitsch. ||||||
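Is there a way to use sed or awk to get the desired result? Would I first need to add another pipe to the end of the last column and then replace based on the number of occurrences?
Answer 1
I found a solution, although it is not particularly elegant. I decided to append an extra pipe to the second file before joining, because that lets me do some additional processing to get the format right.
These are the steps I now take: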
# add pipe to the end of the line for ORIGINAL files only
sed -i 's/$/|/' ../original/alt.csv
# --- do the join and write the joined file to ../out/alt.csv ---
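# append a carriage return to each line (in GNU sed, \| is alternation rather than a literal pipe, so this pattern matches the whole line; \0 is GNU's synonym for &)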
sed -i 's/\(.*\)\|/\0\r/' ../out/alt.csv
# remove carriage return where join occurred (the use of pipe is simply to locate carriage return) and replace with pipe
sed -i 's/\r|/|/' ../out/alt.csv
# remove all blank lines
sed -i '/^\s*$/d' ../out/alt.csv
# remove pipe at the end of the line of output file and add a carriage return
sed -i 's/[^\r\n].$/\r/' ../out/alt.csv
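As a possible shortcut, the sed passes above might be collapsed into one pipeline. This is only a sketch, assuming the records themselves are separated by line feeds and that no field legitimately contains a carriage return. It swaps in `tr` and `awk` for the sed sequence, and `fix_joined` is a hypothetical helper name, not part of the original steps.

```shell
# fix_joined (hypothetical helper): drop every carriage return from the
# joined output, then terminate each record with CR LF.
fix_joined() {
    tr -d '\r' | awk '{ printf "%s\r\n", $0 }'
}

# Demo on one record with a stray carriage return where the join happened.
fixed=$(mktemp)
printf '51900|ga.1.1.1|Yes\r|Ehepaar Weltzel. ||\r\n' | fix_joined > "$fixed"
```

With real data the joined file would be fed to `fix_joined` on standard input and the result redirected to a new file. Under these assumptions the extra pipe on the second file should not be needed.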
If there is an easier way to achieve this, I would be glad to hear it.
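One route that may be simpler, offered as a sketch rather than a tested fix for this data: if the input files use CR LF line endings (an assumption, though it would explain why the carriage return travels with the last field of the first file), the carriage returns can be stripped before the join instead of repaired afterwards. `strip_cr_join` is a hypothetical helper name, and the field list is shortened to three columns for the demo.

```shell
# strip_cr_join (hypothetical helper): remove carriage returns from both
# inputs, sort them on the join field, then join on field 1.
strip_cr_join() {
    work=$(mktemp -d)
    tr -d '\r' < "$1" | LC_ALL=C sort -t '|' -k 1,1 > "$work/left"
    tr -d '\r' < "$2" | LC_ALL=C sort -t '|' -k 1,1 > "$work/right"
    LC_ALL=C join -a 1 -t '|' -o 1.1,1.2,2.2 "$work/left" "$work/right"
    rm -rf "$work"
}

# Demo with two small CR LF files (sample values, not the real data set).
demo=$(mktemp -d)
printf '51900|No\r\n51899|Yes\r\n' > "$demo/left.csv"
printf '51899|Photographic negatives\r\n51900|Ehepaar Weltzel.\r\n' > "$demo/right.csv"

# Re-add the carriage return at the end of each line only if the
# consumer of the file really needs CR LF endings.
strip_cr_join "$demo/left.csv" "$demo/right.csv" |
    awk '{ printf "%s\r\n", $0 }' > "$demo/out.csv"
```

Sorting and joining under the same `LC_ALL=C` locale keeps the two tools in agreement about ordering, which `join` requires of its inputs.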