I have a CSV file called scenario1.csv whose second column contains group names such as "0-4 years high risk" and "65+ first responder". There are 20 of these values, and they repeat: row 21, column 2 holds the same entry as row 1, column 2. I want to rename these values to p1 through p20, so that line 21 ends up with the label p1 again. Everything without quotes. There are 150 files, named scenario1.csv, scenario2.csv, and so on. How do I do this? Here is an example of a shorter file:
t, group, 1, 3, 5
0, 0-4 years low risk, 0, 0, 0
0, 0-4 years high risk, 0, 0, 1
....., ....
0, 0-4 years low risk, 0, 0, 0
Expected output per file:
t, group, 1, 3, 5
0, p1, 0, 0, 0
0, p2, 0, 0, 0
....., ....
0, p1, 0, 0, 0
This is the dictionary I need:
0-4 years first responder p1
0-4 years high risk p2
.......
65+ years low risk p19
65+ years pregnant women p20
Answer 1
Since neither sponge (from moreutils) nor GNU AWK (whose -i inplace extension could edit the file directly) is installed, you can rewrite the file in place with this shell trick:
<<<"$(<treatables-000.csv)" awk -F ',' -v OFS=',' 'NR!=1{$2="p"(NR-2)%20+1}1' >treatables-000.csv
-F ','
: sets the input field separator to a comma.
-v OFS=','
: sets the output field separator to a comma.
NR!=1{$2="p"(NR-2)%20+1}1
: if the current record number is greater than 1, sets the second field to a string composed of the character p followed by the result of the expression (NR-2)%20+1, then prints the record.
% cat treatables-000.csv
t,group,1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111,113,115,117,119,121,123,125,127,129,131,133,135,137,139,141,143,145,147,149,151,153,155,157,159,161,163,165,167,169,171,173,175,177,179,181,183,185,187,189,191,193,195,197,199,201,203,205,207,209,211,213,215,217,219,221,223,225,227,229,231,233,235,237,239,241,243,245,247,249,251,253,255,257,259,261,263,265,267,269,271,273,275,277,279,281,283,285,287,289,291,293,295,297,299,301,303,305,307,309,311,313,315,317,319,321,323,325,327,329,331,333,335,337,339,341,343,345,347,349,351,353,355,357,359,361,363,365,367,369,371,373,375,377,379,381,383,385,387,389,391,393,395,397,399,401,403,405,407,409,411,413,415,417,419,421,423,425,427,429,431,433,435,437,439,441,443,445,447,449,451,453,455,457,459,461,463,465,467,469,471,473,475,477,479,481,483,485,487,489,491,493,495,497,499,501,503,505,507
0,0-4 years low risk,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0-4 years high risk,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
% <<<"$(<treatables-000.csv)" awk -F ',' -v OFS=',' 'NR!=1{$2="p"(NR-2)%20+1}1' >treatables-000.csv
% cat treatables-000.csv
t,group,1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111,113,115,117,119,121,123,125,127,129,131,133,135,137,139,141,143,145,147,149,151,153,155,157,159,161,163,165,167,169,171,173,175,177,179,181,183,185,187,189,191,193,195,197,199,201,203,205,207,209,211,213,215,217,219,221,223,225,227,229,231,233,235,237,239,241,243,245,247,249,251,253,255,257,259,261,263,265,267,269,271,273,275,277,279,281,283,285,287,289,291,293,295,297,299,301,303,305,307,309,311,313,315,317,319,321,323,325,327,329,331,333,335,337,339,341,343,345,347,349,351,353,355,357,359,361,363,365,367,369,371,373,375,377,379,381,383,385,387,389,391,393,395,397,399,401,403,405,407,409,411,413,415,417,419,421,423,425,427,429,431,433,435,437,439,441,443,445,447,449,451,453,455,457,459,461,463,465,467,469,471,473,475,477,479,481,483,485,487,489,491,493,495,497,499,501,503,505,507
0,p1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,p2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
To repeat this for every file in the current working directory that matches the wildcard pattern treatables-???.csv, you can use a Bash for loop:
for f in treatables-???.csv; do <<<"$(<"$f")" awk -F ',' -v OFS=',' 'NR!=1{$2="p"(NR-2)%20+1}1' >"$f"; done
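Note that the <<<"$(<file)" trick holds the whole file in a shell string while the redirection truncates the original. An equivalent sketch that goes through a temporary file instead (the sample file here is fabricated for illustration; mktemp is assumed to be available):

```shell
# Create a small sample file to demonstrate on (stand-in for the real data).
printf 't,group,1,3,5\n0,0-4 years low risk,0,0,0\n0,0-4 years high risk,0,0,1\n' > treatables-000.csv

# Rewrite each matching file via a temporary file instead of holding the
# whole file in a shell string; mv replaces the original only if awk succeeded.
for f in treatables-???.csv; do
    tmp=$(mktemp) &&
    awk -F ',' -v OFS=',' 'NR!=1{$2="p"(NR-2)%20+1}1' "$f" > "$tmp" &&
    mv "$tmp" "$f"
done

cat treatables-000.csv
```

This variant also works in plain POSIX sh, where the <<< herestring syntax is not available.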
Answer 2
You can do the job in a loop with nl (Number Lines) and sed (Stream EDitor):
for f in scenario*.csv
do
    # number every line except the first (the header does not start with a digit)
    nl -bp^[0-9] -nln -w1 "$f" |
    sed '
        # prepend «p» to the line number
        s/^[0-9]/p&/
        # move «pNUM» into the place of the second field, which starts with «NUM-NUM»
        s/\(^p[0-9]*\)\s*\([0-9]*,\s*\)[0-9]-[0-9][^,]*/\2\1/
        # strip leading whitespace (affects the header line only)
        s/^\s*//
    ' > out.tmp    # write the changed lines to a temporary file
    mv out.tmp "$f"    # move the temp file over the original
done
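If it is unclear what the nl stage contributes, this sketch (on a fabricated sample.csv) runs it on its own: data lines, which begin with a digit, get a left-aligned line number, while the header is left unnumbered with leading padding that the final s/^\s*// later strips. (Note also that the second sed expression above matches only groups of the form «N-N …», so «65+ …» groups would need an adjusted pattern.)

```shell
# Build a two-line sample and run only the numbering stage of the pipeline.
printf 't, group, 1, 3, 5\n0, 0-4 years low risk, 0, 0, 0\n' > sample.csv

# -b p^[0-9]: number only lines matching ^[0-9]
# -n ln:      left-justified numbers without leading zeros
# -w 1:       number field width of 1
nl -bp'^[0-9]' -nln -w1 sample.csv
```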
Answer 3
From your question I gather that you have a list of unique phrases and want to replace the first phrase in the list with 'p1', the second with 'p2', and so on. If you want to preserve the column widths, you can do it like this:
for filename in *.csv; do
    awk '
        BEGIN {
            FS = ","
            n = 0
        }
        {
            if (NR > 1) {
                if (!($2 in p)) {
                    n++
                    p[$2] = n
                }
                $2 = "p" p[$2]
            }
            for (i = 1; i <= NF; i++) {
                sub("^[ ]+", "", $i)
                if (i != NF) {
                    $i = $i ","
                }
            }
            # Add more columns and adjust the column widths to
            # your liking here.
            printf "%-3s%-10s%-3s%-3s%-3s\n", $1, $2, $3, $4, $5
        }
    ' "$filename" > "$filename.tmp"
    mv "$filename.tmp" "$filename"
done
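One difference from the modulo approach in answer 1: this version assigns p-numbers in order of first appearance within each file, so p1 through p20 match your dictionary only if the 20 groups first occur in dictionary order. A minimal sketch of that numbering logic on made-up values:

```shell
# Each distinct value gets the next number the first time it is seen,
# and the same number on every later occurrence: prints p1 p2 p1 p3.
printf 'low\nhigh\nlow\npregnant\n' |
awk '{ if (!($1 in p)) p[$1] = ++n; print "p" p[$1] }'
```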
Answer 4
Here is a Perl script that does the job:
You can add more patterns and replacements to the %patterns hash as needed. Don't forget the comma at the end of each line.
The patterns are interpreted as regular expressions rather than literal strings, so if a pattern contains regex special characters (e.g. *, (, ), ?, +), you have to escape them with \ (i.e. \*, \(, \), \?, \+).
The script changes the output slightly because it joins all fields with ,\t (a comma and a single tab), whereas the original input sometimes has multiple spaces. If that matters, you can adjust the print statement to produce identical or similar output (e.g. by using printf instead of print join()).
$ cat bissi.pl
#! /usr/bin/perl

use strict;

# optimisation: use qr// for the search patterns so that
# the hash keys are pre-compiled regular expressions.
# this makes the for loop later MUCH faster if there are
# lots of patterns and lots of input lines to process.
my %patterns = (
    qr/0-4 years low risk/        => 'p1',
    qr/0-4 years high risk/       => 'p2',
    qr/65\+ years low risk/       => 'p19',
    qr/65\+ years pregnant women/ => 'p20',
);

while (<>) {
    chomp;
    my @line = split /,\s*/;

    foreach my $key (keys %patterns) {
        # perl arrays are zero based, so $line[1] is the 2nd field
        if ($line[1] =~ m/$key/) {
            $line[1] = $patterns{$key};
            last;
        }
    }

    print join(",\t", @line), "\n";
}
That produces output like this:
$ ./bissi.pl input.txt
t, group, 1, 3, 5
0, p1, 0, 0, 1
0, p2, 0, 0, 0
0, p1, 0, 0, 0
To convert all 150 files, you can wrap it in a shell for loop, something like this:
mkdir -p new
for i in {1..150} ; do
./bissi.pl "scenario$i.csv" > "new/scenario$i.csv"
done