I have a large input file like the one below. For every record in which "PRIMER_LEFT_NUM_RETURNED" is not 0, I want to print SEQUENCE_ID, SEQUENCE_TEMPLATE, PRIMER_LEFT_NUM_RETURNED, PRIMER_RIGHT_NUM_RETURNED, PRIMER_INTERNAL_NUM_RETURNED, PRIMER_PAIR_NUM_RETURNED, PRIMER_PAIR_0_PENALTY, and every field whose name contains "_0".
Input file:
SEQUENCE_ID=Contig1
SEQUENCE_TEMPLATE=AAGTCGCCCCTCCAT
PRIMER_LEFT_NUM_RETURNED=2
PRIMER_RIGHT_NUM_RETURNED=2
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=5
PRIMER_PAIR_0_PENALTY=7.398828
PRIMER_LEFT_0_SEQ=AAGTCGCCCC
PRIMER_RIGHT_0_SEQ=aaaagaca
PRIMER_LEFT_0=0,20
PRIMER_RIGHT_0=69,20
PRIMER_LEFT_0_END_STABILITY=2
PRIMER_RIGHT_0_END_STABILITY=3
PRIMER_PAIR_0_PRODUCT_SIZE=70
PRIMER_PAIR_1_PENALTY=7.78
PRIMER_LEFT_1_PENALTY=1
PRIMER_RIGHT_1_PENALTY=6.7
PRIMER_LEFT_1_SEQ=AAGTCGCCCCTCCA
PRIMER_RIGHT_1_SEQ=aaaagacagagaG
PRIMER_LEFT_1=0,19
PRIMER_RIGHT_1=69,25
PRIMER_LEFT_1_END_STABILITY=3
PRIMER_RIGHT_1_END_STABILITY=3
PRIMER_PAIR_1_PRODUCT_SIZE=70
=
SEQUENCE_ID=Contig31
SEQUENCE_TEMPLATE=ACCCCTTTTT
PRIMER_LEFT_NUM_RETURNED=0
PRIMER_RIGHT_NUM_RETURNED=0
=
SEQUENCE_ID=Contig22
SEQUENCE_TEMPLATE=CCGTCGCCCC
PRIMER_LEFT_NUM_RETURNED=3
PRIMER_RIGHT_NUM_RETURNED=3
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=5
PRIMER_PAIR_0_PENALTY=7
PRIMER_LEFT_0_SEQ=AAGTCGCC
PRIMER_RIGHT_0_SEQ=agaGAGTA
PRIMER_LEFT_0=0,20
PRIMER_RIGHT_0=69,20
PRIMER_LEFT_0_END_STABILITY=2
PRIMER_RIGHT_0_END_STABILITY=3
PRIMER_PAIR_0_PRODUCT_SIZE=70
PRIMER_PAIR_1_PENALTY=7
PRIMER_LEFT_1_PENALTY=1
PRIMER_RIGHT_1_PENALTY=6
PRIMER_LEFT_1_SEQ=AAGTCGC
PRIMER_RIGHT_1_SEQ=aaaag
PRIMER_LEFT_1=0,19
PRIMER_RIGHT_1=69,25
PRIMER_LEFT_1_END_STABILITY=3
PRIMER_RIGHT_1_END_STABILITY=3
PRIMER_PAIR_1_PRODUCT_SIZE=73
PRIMER_PAIR_2_PENALTY=7
PRIMER_LEFT_2_PENALTY=1
PRIMER_RIGHT_2_PENALTY=6
PRIMER_LEFT_2_SEQ=AAGTCGC
PRIMER_RIGHT_2_SEQ=aaaag
PRIMER_LEFT_2=0,19
PRIMER_RIGHT_2=69,25
PRIMER_LEFT_2_END_STABILITY=3
PRIMER_RIGHT_2_END_STABILITY=3
PRIMER_PAIR_2_PRODUCT_SIZE=75
=
Expected output:
SEQUENCE_ID=Contig1
SEQUENCE_TEMPLATE=AAGTCGCCCCTCCAT
PRIMER_LEFT_NUM_RETURNED=2
PRIMER_RIGHT_NUM_RETURNED=2
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=5
PRIMER_PAIR_0_PENALTY=7.398828
PRIMER_LEFT_0_SEQ=AAGTCGCCCC
PRIMER_RIGHT_0_SEQ=aaaagaca
PRIMER_LEFT_0=0,20
PRIMER_RIGHT_0=69,20
PRIMER_LEFT_0_END_STABILITY=2
PRIMER_RIGHT_0_END_STABILITY=3
PRIMER_PAIR_0_PRODUCT_SIZE=70
=
SEQUENCE_ID=Contig22
SEQUENCE_TEMPLATE=CCGTCGCCCC
PRIMER_LEFT_NUM_RETURNED=3
PRIMER_RIGHT_NUM_RETURNED=3
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=5
PRIMER_PAIR_0_PENALTY=7
PRIMER_LEFT_0_SEQ=AAGTCGCC
PRIMER_RIGHT_0_SEQ=agaGAGTA
PRIMER_LEFT_0=0,20
PRIMER_RIGHT_0=69,20
PRIMER_LEFT_0_END_STABILITY=2
PRIMER_RIGHT_0_END_STABILITY=3
PRIMER_PAIR_0_PRODUCT_SIZE=70
=
I tried to solve this with awk, but could not, because the data is not laid out in rows and columns. I would really appreciate any help with this. Thank you.
Answer 1
Your input data is paragraph-oriented, so read it one paragraph at a time instead of one line at a time.
awk -v RS="\n=\n" '
    # Each record is one "="-delimited block; keep those with a nonzero left-primer count.
    /PRIMER_LEFT_NUM_RETURNED=[^0]/ {
        n = split($0, lines, /\n/)
        for (i=1; i<=n; i++) {
            # Print the summary fields plus every field whose name contains "_0".
            if (lines[i] ~ /^(SEQUENCE_ID|SEQUENCE_TEMPLATE|PRIMER_LEFT_NUM_RETURNED|PRIMER_RIGHT_NUM_RETURNED|PRIMER_INTERNAL_NUM_RETURNED|PRIMER_PAIR_NUM_RETURNED|[^=]+_0[^=]*)=/)
                print lines[i]
        }
        print "="
    }
' input.file
Or the Perl equivalent (which allows a more readable "extended" regex):
perl -0777 -ne '
    BEGIN {
        $wanted = qr{
            ^                       # at the beginning of the string
            (?: SEQUENCE_ID         # match one of these words
              | SEQUENCE_TEMPLATE
              | PRIMER_LEFT_NUM_RETURNED
              | PRIMER_RIGHT_NUM_RETURNED
              | PRIMER_INTERNAL_NUM_RETURNED
              | PRIMER_PAIR_NUM_RETURNED
              | [^=]+_0[^=]*
            )
            =                       # followed by an equal sign
        }x
    }
    for (split /^=$/m) {
        if (/PRIMER_LEFT_NUM_RETURNED=[^0]/) {
            print join("\n", grep {$_ =~ $wanted} split /\n/), "\n=\n";
        }
    }
    # or, as a single command:
    #
    #   print
    #       map  {join("\n", grep {$_ =~ $wanted} split /\n/) . "\n=\n"}
    #       grep {/PRIMER_LEFT_NUM_RETURNED=[^0]/}
    #       split /^=$/m
' input.file
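If your awk does not accept a multi-character RS (POSIX leaves that case unspecified), the same filtering can be done line by line. The following is only a minimal sketch: the function name filter_records is made up for illustration, and it assumes every record, including the last one, ends with a line containing only =, as in the sample input.

```shell
# filter_records is a hypothetical helper name; it reads the file line by
# line, so it does not depend on a multi-character RS.
filter_records() {
    awk '
        # A line holding only "=" ends the current record.
        $0 == "=" {
            if (keep) printf "%s=\n", buf
            buf = ""; keep = 0
            next
        }
        # Flag the record when the left-primer count is a nonzero number.
        /^PRIMER_LEFT_NUM_RETURNED=/ {
            keep = (substr($0, index($0, "=") + 1) + 0 != 0)
        }
        # Buffer only the summary fields and the fields of primer set 0.
        /^(SEQUENCE_ID|SEQUENCE_TEMPLATE|PRIMER_(LEFT|RIGHT|INTERNAL|PAIR)_NUM_RETURNED)=/ ||
        /^[^=]*_0(_[^=]*)?=/ {
            buf = buf $0 "\n"
        }
    ' "$@"
}
```

Usage: filter_records input.file. The name pattern here requires _0 to be followed by _ or =, which is slightly stricter than the [^=]+_0[^=]* pattern above.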
Answer 2
You can set the record separator to \n=\n, that is, a newline, an =, and another newline. After that, you can use awk as usual:
$ awk -v RS='\n=\n' -v OFS="\n" '!/PRIMER_LEFT_NUM_RETURNED=0/{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,"="}' file
SEQUENCE_ID=Contig1
SEQUENCE_TEMPLATE=AAGTCGCCCCTCCAT
PRIMER_LEFT_NUM_RETURNED=2
PRIMER_RIGHT_NUM_RETURNED=2
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=5
PRIMER_PAIR_0_PENALTY=7.398828
PRIMER_LEFT_0_SEQ=AAGTCGCCCC
PRIMER_RIGHT_0_SEQ=aaaagaca
PRIMER_LEFT_0=0,20
PRIMER_RIGHT_0=69,20
PRIMER_LEFT_0_END_STABILITY=2
PRIMER_RIGHT_0_END_STABILITY=3
PRIMER_PAIR_0_PRODUCT_SIZE=70
=
SEQUENCE_ID=Contig22
SEQUENCE_TEMPLATE=CCGTCGCCCC
PRIMER_LEFT_NUM_RETURNED=3
PRIMER_RIGHT_NUM_RETURNED=3
PRIMER_INTERNAL_NUM_RETURNED=0
PRIMER_PAIR_NUM_RETURNED=5
PRIMER_PAIR_0_PENALTY=7
PRIMER_LEFT_0_SEQ=AAGTCGCC
PRIMER_RIGHT_0_SEQ=agaGAGTA
PRIMER_LEFT_0=0,20
PRIMER_RIGHT_0=69,20
PRIMER_LEFT_0_END_STABILITY=2
PRIMER_RIGHT_0_END_STABILITY=3
PRIMER_PAIR_0_PRODUCT_SIZE=70
=
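However, this assumes that the _0 fields are always in the same positions (the first 14 lines of each record). If that is not the case, you can do this instead: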
$ awk -v RS='\n=\n' '!/PRIMER_LEFT_NUM_RETURNED=0/{printf "%s\n=\n", $0}' file |
    grep -E 'SEQUENCE_ID|SEQUENCE_TEMPLATE|PRIMER_LEFT_NUM_RETURNED|PRIMER_RIGHT_NUM_RETURNED|PRIMER_INTERNAL_NUM_RETURNED|PRIMER_PAIR_NUM_RETURNED|_0|^=$'
Answer 3
A direct approach:
awk '$3 !~ "=0"{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,RS}' RS='=\n' FS='\n' OFS='\n' file
This adds a blank line after each =, which is easy to remove if you need to.
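One way to avoid the blank line, as a rough sketch: print a literal = instead of RS, so that only the default ORS newline follows it. The wrapper name first_set_only is hypothetical, and the command still assumes the 14-line layout and an awk that accepts a multi-character RS (gawk and mawk do).

```shell
# first_set_only is a hypothetical wrapper name. It assumes the wanted fields
# are the first 14 lines of each record and that awk accepts a
# multi-character RS.
first_set_only() {
    awk '
        # $3 is the PRIMER_LEFT_NUM_RETURNED line; skip records whose count is 0.
        $3 !~ /=0$/ {
            print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14
            print "="    # literal separator, so no blank line follows
        }
    ' RS='=\n' FS='\n' OFS='\n' "$@"
}
```

Usage: first_set_only file. Anchoring the test as /=0$/ also keeps a count such as 10 or 20 from being confused with 0.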