I have the following data:
##sequence-region Q75T13 1 641
Q75T13,UniProtKB,Chain,1,641,.,.,.,ID
Q75T13,UniProtKB,Topological domain,1,60,.,.,.,Note=Cytoplasmic
Q75T13,UniProtKB,Transmembrane,61,85,.,.,.,Note=Helical
Q75T13,UniProtKB,Topological domain,86,641,.,.,.,Note=Lumenal
##sequence-region Q9BRR3 1 403
Q9BRR3,UniProtKB,Chain,1,403,.,.,.,ID
Q9BRR3,UniProtKB,Topological domain,1,22,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Transmembrane,23,43,.,.,.,Note=Helical
Q9BRR3,UniProtKB,Topological domain,44,259,.,.,.,Note=Cytoplasmic
##sequence-region Q96FM1 1 250
Q96FM1,UniProtKB,Topological domain,120,135,.,.,.,Note=Cytoplasmic
Q96FM1,UniProtKB,Transmembrane,136,156,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,157,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Transmembrane,170,190,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,191,250,.,.,.,Note=Lumenal
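I would like to know what the awk code for this would look like.
For each line containing the word Lumenal: if the previous line contains the word Transmembrane, subtract 12 from column 4 and print the Lumenal line. If the line after the Lumenal line contains the word Transmembrane, add 12 to column 5 and print the Lumenal line. A Lumenal line with a Transmembrane line on both sides is therefore printed twice. The final file should look like this: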
Q75T13,UniProtKB,Topological domain,74,641,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Topological domain,1,34,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
Can anyone help me? I'm a bit stuck. I'm trying to use awk and grep.
Answer 1
Try the following command:
root@u2004:~# cat test
##sequence-region Q75T13 1 641
Q75T13,UniProtKB,Chain,1,641,.,.,.,ID
Q75T13,UniProtKB,Topological domain,1,60,.,.,.,Note=Cytoplasmic
Q75T13,UniProtKB,Transmembrane,61,85,.,.,.,Note=Helical
Q75T13,UniProtKB,Topological domain,86,641,.,.,.,Note=Lumenal
##sequence-region Q9BRR3 1 403
Q9BRR3,UniProtKB,Chain,1,403,.,.,.,ID
Q9BRR3,UniProtKB,Topological domain,1,22,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Transmembrane,23,43,.,.,.,Note=Helical
Q9BRR3,UniProtKB,Topological domain,44,259,.,.,.,Note=Cytoplasmic
##sequence-region Q96FM1 1 250
Q96FM1,UniProtKB,Topological domain,120,135,.,.,.,Note=Cytoplasmic
Q96FM1,UniProtKB,Transmembrane,136,156,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,157,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Transmembrane,170,190,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,191,250,.,.,.,Note=Lumenal
root@u2004:~#
root@u2004:~# awk -F, -v OFS=, '{while(1){if($0~/Lumenal/){a=$0; $4-=12;p=$0; $0=a;$5+=12;n=$0; if(index(pre,"Transmembrane")>0)print p; if(getline>0){if(index($0,"Transmembrane"))print n; if($0~/Lumenal/){pre=$0; continue}}} break}} {pre=$0}' test
Q75T13,UniProtKB,Topological domain,74,641,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Topological domain,1,34,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
root@u2004:~#
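The one-liner relies on saving `$0`, changing a field, and then restoring `$0` so that the second change starts from the original line rather than from the already shifted one. A minimal sketch of just that step, run on a single line from the sample data (the function name `shift_both` is made up for illustration):

```shell
# Hypothetical helper: print each input line twice, once with column 4
# reduced by 12 and once with column 5 increased by 12.
shift_both() {
    awk -F, -v OFS=, '{
        orig = $0          # keep the untouched line
        $4 -= 12; print    # assigning to a field rebuilds $0 using OFS
        $0 = orig          # restore, which also re-splits the fields
        $5 += 12; print
    }'
}

echo 'Q96FM1,UniProtKB,Topological domain,157,169,.,.,.,Note=Lumenal' | shift_both
# Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
# Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
```

Without the restore, the second line would carry both changes (145 and 181).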
Answer 2
Keep a rolling buffer of three lines (previous, current, next) and test the current line against its neighbours:
$ cat tst.awk
BEGIN { FS=OFS="," }
{
    nxt = $0
    prt()
}
END {
    prt()
}

function prt() {
    if ( cur ~ /Lumenal/ ) {
        if ( pre ~ /Transmembrane/ ) {
            $0 = cur
            $4 -= 12
            print
        }
        if ( nxt ~ /Transmembrane/ ) {
            $0 = cur
            $5 += 12
            print
        }
    }
    pre = cur
    cur = nxt
    nxt = ""
}
$ awk -f tst.awk file
Q75T13,UniProtKB,Topological domain,74,641,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Topological domain,1,34,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
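For comparison, the same result can probably be had without a buffer by reading the file twice. This is a different technique from the one above (the common `NR == FNR` two-pass idiom), sketched here on the Q96FM1 region only; the file name `lumenal_demo.csv` and the function name `two_pass` are made up for illustration.

```shell
# Sample: the Q96FM1 region from the question.
cat > lumenal_demo.csv <<'EOF'
##sequence-region Q96FM1 1 250
Q96FM1,UniProtKB,Topological domain,120,135,.,.,.,Note=Cytoplasmic
Q96FM1,UniProtKB,Transmembrane,136,156,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,157,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Transmembrane,170,190,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,191,250,.,.,.,Note=Lumenal
EOF

# Hypothetical helper: the file is passed to awk twice.
two_pass() {
    awk -F, -v OFS=, '
    # Pass 1: remember the line numbers of all Transmembrane lines.
    NR == FNR { if (/Transmembrane/) tm[FNR]; next }
    # Pass 2: look up the neighbours of each Lumenal line.
    /Lumenal/ {
        line = $0
        if ((FNR - 1) in tm) { $4 -= 12; print; $0 = line }
        if ((FNR + 1) in tm) { $5 += 12; print }
    }' "$1" "$1"
}

two_pass lumenal_demo.csv
# Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
# Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
# Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
```

The trade-off is that the input has to be a real file that can be read twice, so this variant does not work on a pipe, whereas the buffer approach does.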
Answer 3
I solved this before the column separator was changed to commas. The first job was to convert the runs of multiple spaces in the test file to tabs.
$ cat indata
##sequence-region Q75T13 1 641
Q75T13 UniProtKB Chain 1 641 . . . ID
Q75T13 UniProtKB Topological domain 1 60 . . . Note=Cytoplasmic
Q75T13 UniProtKB Transmembrane 61 85 . . . Note=Helical
Q75T13 UniProtKB Topological domain 86 641 . . . Note=Lumenal
##sequence-region Q9BRR3 1 403
Q9BRR3 UniProtKB Chain 1 403 . . . ID
Q9BRR3 UniProtKB Topological domain 1 22 . . . Note=Lumenal
Q9BRR3 UniProtKB Transmembrane 23 43 . . . Note=Helical
Q9BRR3 UniProtKB Topological domain 44 259 . . . Note=Cytoplasmic
##sequence-region Q96FM1 1 250
Q96FM1 UniProtKB Topological domain 120 135 . . . Note=Cytoplasmic
Q96FM1 UniProtKB Transmembrane 136 156 . . . Note=Helical
Q96FM1 UniProtKB Topological domain 157 169 . . . Note=Lumenal
Q96FM1 UniProtKB Transmembrane 170 190 . . . Note=Helical
Q96FM1 UniProtKB Topological domain 191 250 . . . Note=Lumenal
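The space-to-tab conversion itself is not shown. One way it might be done, assuming the columns in the original file were always separated by two or more spaces while the single space inside "Topological domain" has to survive (the function name `spaces_to_tabs` is made up for illustration):

```shell
# Hypothetical helper: turn every run of two or more spaces into one tab.
# Reads the files given as arguments, or standard input if there are none.
spaces_to_tabs() {
    awk '{ gsub(/  +/, "\t") } 1' "$@"
}

printf 'Q75T13  UniProtKB   Topological domain  1  60\n' | spaces_to_tabs
```

The regular expression is two literal spaces followed by `+`, so a single space is never touched. If some columns in the real file were separated by only one space, this would not split them.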
Here is the script. The third argument to the split function is the tab character.
#!/bin/bash
awk '
function add12(out_line) {
    iarr = split( out_line, arr, "\t" )    # split on tabs so "Topological domain" stays one field
    arr[5] = 12 + arr[5]
    printf( "%s", arr[1])
    for (i=2 ; i<=iarr ; i++) printf( "\t%s", arr[i] )
    printf( "\n" )
}

function sub12(out_line) {
    iarr = split( out_line, arr, "\t" )
    arr[4] = arr[4] - 12
    printf( "%s", arr[1])
    for (i=2 ; i<=iarr ; i++) printf( "\t%s", arr[i] )
    printf( "\n" )
}

NR == 1 { last_line = $0 ; next }
NR == 2 { test_line = $0 ; next }

# test_line is the line under test, last_line the one before it, $0 the one after it
test_line ~ /Lumenal/ {
    if (last_line ~ /Transmembrane/) sub12( test_line )
    if ($0 ~ /Transmembrane/) add12( test_line )
}
{
    last_line = test_line
    test_line = $0
}
END {
    # the final line has no successor, so only the previous line is checked
    if (test_line ~ /Lumenal/) {
        if (last_line ~ /Transmembrane/) sub12( test_line )
    }
}
' "$1"
And "the proof is in the pudding":
$ ./doit indata
Q75T13 UniProtKB Topological domain 74 641 . . . Note=Lumenal
Q9BRR3 UniProtKB Topological domain 1 34 . . . Note=Lumenal
Q96FM1 UniProtKB Topological domain 145 169 . . . Note=Lumenal
Q96FM1 UniProtKB Topological domain 157 181 . . . Note=Lumenal
Q96FM1 UniProtKB Topological domain 179 250 . . . Note=Lumenal
I then made doit2:
$ diff doit*
4c4
< iarr = split( out_line, arr, "\t" )
---
> iarr = split( out_line, arr, "," )
7c7
< for (i=2 ; i<=iarr ; i++) printf( "\t%s", arr[i] )
---
> for (i=2 ; i<=iarr ; i++) printf( ",%s", arr[i] )
12c12
< iarr = split( out_line, arr, "\t" )
---
> iarr = split( out_line, arr, "," )
15c15
< for (i=2 ; i<=iarr ; i++) printf( "\t%s", arr[i] )
---
> for (i=2 ; i<=iarr ; i++) printf( ",%s", arr[i] )
which works on the CSV file:
$ ./doit2 comma
Q75T13,UniProtKB,Topological domain,74,641,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Topological domain,1,34,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
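Since doit and doit2 differ only in the separator, a single script that takes the separator as a parameter might be easier to maintain. The following is a sketch of that idea, not the script used above: it sets FS and OFS and edits the fields directly instead of calling split, and the names `shift_lumenal`, `emit` and `lumenal_demo.csv` are made up for illustration.

```shell
# Hypothetical helper. Usage: shift_lumenal SEPARATOR FILE
shift_lumenal() {
    awk -v FS="$1" -v OFS="$1" '
    # Print "line" with field number "col" changed by "delta".
    function emit(line, col, delta) {
        $0 = line
        $col += delta
        print
    }
    {
        nxt = $0    # save the incoming line, because emit() overwrites $0
        if (cur ~ /Lumenal/) {
            if (prev ~ /Transmembrane/) emit(cur, 4, -12)
            if (nxt  ~ /Transmembrane/) emit(cur, 5,  12)
        }
        prev = cur
        cur = nxt
    }
    END {
        # The last line has no successor, so only the previous line matters.
        if (cur ~ /Lumenal/ && prev ~ /Transmembrane/) emit(cur, 4, -12)
    }' "$2"
}

# Sample: the Q96FM1 region from the question.
cat > lumenal_demo.csv <<'EOF'
##sequence-region Q96FM1 1 250
Q96FM1,UniProtKB,Topological domain,120,135,.,.,.,Note=Cytoplasmic
Q96FM1,UniProtKB,Transmembrane,136,156,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,157,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Transmembrane,170,190,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,191,250,.,.,.,Note=Lumenal
EOF

shift_lumenal ',' lumenal_demo.csv
# Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
# Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
# Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
```

For the tab-separated file the call would be `shift_lumenal "$(printf '\t')" indata`.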