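I have a CSV file that I want to edit before importing it into an SQLite database. It has thousands of lines, and I want to copy part of each line and append it to the end, using the pipe "|" as the separator, so that it can be split off easily when imported into the database.
The CSV contains lines like the following: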
989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked - http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) - http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|
I want to copy the number starting with 765 and append it to the end of the line, like this:
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591
I want to do this for every line in the CSV, so I might need a for loop. I don't know.
Answer 1
sed solution:
sed -E 's/.*\/profiles\/([0-9]+).*/&\1/' file.csv
Example output:
989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked - http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|76561198402636850
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) - http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|76561198006267103
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591
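One thing worth checking with this approach is what happens to lines that contain no profile URL: the substitution simply does not match, so such lines pass through unchanged. A minimal sketch, assuming a hypothetical helper name append_profile_id and a made-up second input line without a link:

```shell
# Hypothetical helper: the same substitution as above, reading from stdin.
append_profile_id() {
    sed -E 's/.*\/profiles\/([0-9]+).*/&\1/'
}

# First line: taken from the sample data. Second line: invented for illustration.
printf '%s\n' \
    '989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) - http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|' \
    '1|2018-04-25|14:49:14|GMT|report|"no link here"|0|0|0|' |
    append_profile_id
# The first line gains 76561198006267103 at the end; the second is printed as-is.
```

If every row must end up with the extra column, it may be worth counting the rows that still end in "|" after the edit.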
Answer 2
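With awk:
awk -F'["/]' '{ id = $(NF-1); sub(/[^0-9].*/, "", id); print $0 id }' infile > outfile
The field separator, set with -F, is the bracket expression '["/]', so fields are split at every double quote " or slash /. The second-to-last field, $(NF-1), is then the text after the last slash: the profile number, followed by " …" in the sample data. sub() removes everything from the first non-digit onward from a copy of that field, and print writes the whole line $0 followed by the number. The input is read from infile and the result is saved to outfile.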
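To see which fields awk produces with this separator, the following sketch prints the last three fields of one sample line in brackets. The function name show_last_fields is only illustrative.

```shell
# Hypothetical helper: print the last three fields, one per line, in brackets.
show_last_fields() {
    awk -F'["/]' '{ for (i = NF - 2; i <= NF; i++) printf "[%s]\n", $i }'
}

line='989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|'
printf '%s\n' "$line" | show_last_fields
# [profiles]
# [76561198161608591 …]
# [|0|0|0|]
```

Note that the second-to-last field carries the trailing " …" from the tweet text along with the number.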
Answer 3
$ sed -E 'h;s/.*(http[^ ]*).*/\1/;s/.*\///;H;x;s/\n//' file
989155126903533568|2018-04-25|14:52:14|GMT|report|"""Умственно отстал"" was checked - http://steamcommunity.com/profiles/76561198402636850 …"|0|0|0|76561198402636850
989154874184085505|2018-04-25|14:51:14|GMT|report|"""Clavicus Vile"" was checked (8 reports) - http://steamcommunity.com/profiles/76561198006267103 …"|0|0|0|76561198006267103
989154622890823685|2018-04-25|14:50:14|GMT|report|"""~TAKA~"" was checked (3 reports) - http://steamcommunity.com/profiles/76561198161608591 …"|0|0|0|76561198161608591
Annotated sed script:
h # save a copy of the current line in the "hold space"
s/.*(http[^ ]*).*/\1/ # remove everything but the URL
s/.*\/// # trim the URL so that only the last bit (the number) is left
H # add that last bit to the "hold space" (with a newline in-between)
x # swap the "hold space" and the "pattern space"
s/\n// # delete that inserted newline
# (implicit print at the end)
This assumes that the URL is the only URL on the line and that it is always delimited by a space character.
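Both assumptions can be checked before running the script. A rough sketch, where check_url_assumptions is a hypothetical helper name and "URL" is taken to mean anything starting with http:// or https://; it prints nothing for lines that satisfy both assumptions:

```shell
# Hypothetical helper: report lines that break either assumption.
check_url_assumptions() {
    awk '
        # gsub with "&" replaces each match by itself, so it only counts matches
        { n = gsub(/https?:\/\//, "&") }
        n != 1                 { printf "line %d: %d URLs\n", NR, n; next }
        !/\/profiles\/[0-9]+ / { printf "line %d: profile id not followed by a space\n", NR }
    '
}

# Made-up input: only the first line satisfies both assumptions.
printf '%s\n' \
    'a|"checked - http://steamcommunity.com/profiles/1 …"|0|' \
    'b|"no url"|0|' \
    'c|"http://steamcommunity.com/profiles/1 http://steamcommunity.com/profiles/2 …"|0|' \
    'd|"checked - http://steamcommunity.com/profiles/3"|0|' |
    check_url_assumptions
# line 2: 0 URLs
# line 3: 2 URLs
# line 4: profile id not followed by a space
```

On the real data this would be run as check_url_assumptions < file, and an empty result suggests the sed script is safe to apply.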