How do I remove duplicate entries within the same row of a CSV file?

2024-6-1

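I have a CSV file with about 4000 rows, each containing between 2 and 30 comma-separated names. The names include titles (e.g. Mr. X Adams or Ms. Y Sanders). Some names appear more than once in the same row, and I want to remove the repeated names within each row. The data is in the file "input.csv", and the final result should go into another file, "output.csv".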

For example, this:

mr. 1,mr. 2,mr. 3,mr. 1,mr. 4
prof. x,prof. y,prof. x
mr. 1,prof y

should become:

mr. 1,mr. 2,mr. 3,mr. 4   (mr. 1 was already mentioned so it should be removed)
prof. x,prof. y           (prof. x was already mentioned so it should be removed)
mr. 1,prof y              (even though both were already mentioned in the same file, they were not mentioned within this line so they may remain)

Answer 1

You can try:

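#!/bin/bash

# Process input.csv one line at a time and write the result to output.csv.
while IFS= read -r line; do
    # Split on commas, drop duplicates (sort -u also reorders the names), rejoin with commas.
    printf '%s\n' "$line" | tr ',' '\n' | sort -u | paste -sd, -
done < input.csv > output.csv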
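
Because `sort -u` sorts the names while removing duplicates, the order within a row can change. If the original order should be kept, a single awk pass is one possible alternative. The following is only a minimal sketch: it assumes two names count as duplicates only when they match exactly (same case, same spacing), and the function name `dedupe_rows` is made up for illustration.

```shell
#!/bin/bash

# dedupe_rows (hypothetical helper name): remove repeated names within each
# row, keeping the first occurrence and the original order of the names.
dedupe_rows() {
    awk -F',' '{
        split("", seen)              # clear the per-row lookup table
        out = ""
        n = 0
        for (i = 1; i <= NF; i++) {
            if (!($i in seen)) {     # first time this name appears in the row
                seen[$i] = 1
                out = (n++ ? out "," : "") $i
            }
        }
        print out
    }' "$@"
}

# Usage: dedupe_rows input.csv > output.csv
```

Since the comparison is exact, "prof. y" and "prof y" are treated as different names, as are names that differ only by surrounding spaces. This also avoids starting several processes per row, which may matter little for 4000 rows but keeps the script simple.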
