How can I use awk to subtract or add 12 depending on the previous and next rows?


I have data like this:

##sequence-region Q75T13 1 641
Q75T13,UniProtKB,Chain,1,641,.,.,.,ID
Q75T13,UniProtKB,Topological domain,1,60,.,.,.,Note=Cytoplasmic
Q75T13,UniProtKB,Transmembrane,61,85,.,.,.,Note=Helical
Q75T13,UniProtKB,Topological domain,86,641,.,.,.,Note=Lumenal


##sequence-region Q9BRR3 1 403
Q9BRR3,UniProtKB,Chain,1,403,.,.,.,ID
Q9BRR3,UniProtKB,Topological domain,1,22,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Transmembrane,23,43,.,.,.,Note=Helical
Q9BRR3,UniProtKB,Topological domain,44,259,.,.,.,Note=Cytoplasmic

##sequence-region Q96FM1 1 250
Q96FM1,UniProtKB,Topological domain,120,135,.,.,.,Note=Cytoplasmic
Q96FM1,UniProtKB,Transmembrane,136,156,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,157,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Transmembrane,170,190,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,191,250,.,.,.,Note=Lumenal

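I would like to know what the awk code for this would look like.

For each row containing the word Lumenal: if the previous row contains the word Transmembrane, subtract 12 from column 4 and print the Lumenal row. If the row after the Lumenal row contains the word Transmembrane, add 12 to column 5 and print the Lumenal row. A row with Transmembrane on both sides is therefore printed twice. The final file should look like this: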

Q75T13,UniProtKB,Topological domain,74,641,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Topological domain,1,34,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal

Can anyone help me? I am a bit stuck. I am trying to do this with awk and grep.

Answer 1

Try the following command:

root@u2004:~# cat test
##sequence-region Q75T13 1 641
Q75T13,UniProtKB,Chain,1,641,.,.,.,ID
Q75T13,UniProtKB,Topological domain,1,60,.,.,.,Note=Cytoplasmic
Q75T13,UniProtKB,Transmembrane,61,85,.,.,.,Note=Helical
Q75T13,UniProtKB,Topological domain,86,641,.,.,.,Note=Lumenal


##sequence-region Q9BRR3 1 403
Q9BRR3,UniProtKB,Chain,1,403,.,.,.,ID
Q9BRR3,UniProtKB,Topological domain,1,22,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Transmembrane,23,43,.,.,.,Note=Helical
Q9BRR3,UniProtKB,Topological domain,44,259,.,.,.,Note=Cytoplasmic

##sequence-region Q96FM1 1 250
Q96FM1,UniProtKB,Topological domain,120,135,.,.,.,Note=Cytoplasmic
Q96FM1,UniProtKB,Transmembrane,136,156,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,157,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Transmembrane,170,190,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,191,250,.,.,.,Note=Lumenal
root@u2004:~# 
root@u2004:~# awk -F, -v OFS=, '{while(1){if($0~/Lumenal/){a=$0; $4-=12;p=$0; $0=a;$5+=12;n=$0; if(index(pre,"Transmembrane")>0)print p; if(getline>0){if(index($0,"Transmembrane"))print n; if($0~/Lumenal/){pre=$0; continue}}} break}} {pre=$0}' test
Q75T13,UniProtKB,Topological domain,74,641,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Topological domain,1,34,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
root@u2004:~#
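
The one-liner above relies on getline to look one line ahead, which is compact but hard to read. As a hedged alternative (not the answerer's method), the same rule can be sketched with the common two-pass idiom: the file is named twice on the command line, the first pass stores every line by number, and the second pass checks the stored neighbours of each Lumenal row. The sample data is a trimmed copy of the question's input, and the variable names (`line`, `orig`, `out`, `tmp`) are illustrative. It assumes the file fits in memory.

```shell
#!/bin/sh
# Sample input: the Q96FM1 block from the question (comma-separated).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
##sequence-region Q96FM1 1 250
Q96FM1,UniProtKB,Topological domain,120,135,.,.,.,Note=Cytoplasmic
Q96FM1,UniProtKB,Transmembrane,136,156,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,157,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Transmembrane,170,190,.,.,.,Note=Helical
Q96FM1,UniProtKB,Topological domain,191,250,.,.,.,Note=Lumenal
EOF

out=$(awk -F, -v OFS=, '
    # Pass 1: remember every line by its line number.
    NR == FNR { line[FNR] = $0; next }

    # Pass 2: for each Lumenal row, inspect the stored neighbours.
    /Lumenal/ {
        orig = $0
        if (line[FNR-1] ~ /Transmembrane/) { $4 -= 12; print; $0 = orig }
        if (line[FNR+1] ~ /Transmembrane/) { $5 += 12; print }
    }
' "$tmp" "$tmp")
rm -f "$tmp"

printf '%s\n' "$out"
```

On this sample it should print the three Q96FM1 lines from the expected output. Restoring `$0` from `orig` between the two checks keeps the second adjustment from inheriting the first.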

Answer 2

Keep a rolling 3-line buffer (previous, current, next) and test the lines on either side of the current one:

$ cat tst.awk
BEGIN { FS=OFS="," }
{
    nxt = $0
    prt()
}
END {
    prt()
}

function prt() {
    if ( cur ~ /Lumenal/ ) {
        if ( pre ~ /Transmembrane/ ) {
            $0 = cur
            $4 -= 12
            print
        }

        if ( nxt ~ /Transmembrane/ ) {
            $0 = cur
            $5 += 12
            print
        }
    }

    pre = cur
    cur = nxt
    nxt = ""
}

$ awk -f tst.awk file
Q75T13,UniProtKB,Topological domain,74,641,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Topological domain,1,34,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
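
To see how the three-line buffer above slides, here is a minimal trace sketch on a toy three-line input. It keeps the same `pre`/`cur`/`nxt` names but replaces the Lumenal logic with a hypothetical `show()` function that only prints the window, so the timing of the extra call in `END` is visible.

```shell
#!/bin/sh
out=$(printf '%s\n' a b c | awk '
    { nxt = $0; show() }   # the newest line always enters as "nxt"
    END { show() }         # one extra call so the last line is examined as "cur"

    function show() {
        if (cur != "") print "pre=[" pre "] cur=[" cur "] nxt=[" nxt "]"
        pre = cur; cur = nxt; nxt = ""   # slide the window by one line
    }
')

printf '%s\n' "$out"
# pre=[] cur=[a] nxt=[b]
# pre=[a] cur=[b] nxt=[c]
# pre=[b] cur=[c] nxt=[]
```

Each input line is processed one step late, once its successor is known. That is why the `END` block is needed: without it the last line would never become `cur`.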

Answer 3

I solved this before the column separator was changed to commas. The first task was to convert the runs of multiple spaces in the test file to tabs.

$ cat indata 
##sequence-region Q75T13 1 641
Q75T13  UniProtKB   Chain   1   641 .   .   .   ID
Q75T13  UniProtKB   Topological domain  1   60  .   .   .   Note=Cytoplasmic    
Q75T13  UniProtKB   Transmembrane   61  85  .   .   .   Note=Helical
Q75T13  UniProtKB   Topological domain  86  641 .   .   .   Note=Lumenal


##sequence-region Q9BRR3 1 403
Q9BRR3  UniProtKB   Chain   1   403 .   .   .   ID
Q9BRR3  UniProtKB   Topological domain  1   22  .   .   .   Note=Lumenal
Q9BRR3  UniProtKB   Transmembrane   23  43  .   .   .   Note=Helical
Q9BRR3  UniProtKB   Topological domain  44  259 .   .   .   Note=Cytoplasmic

##sequence-region Q96FM1 1 250
Q96FM1  UniProtKB   Topological domain  120 135 .   .   .   Note=Cytoplasmic
Q96FM1  UniProtKB   Transmembrane   136 156 .   .   .   Note=Helical
Q96FM1  UniProtKB   Topological domain  157 169 .   .   .   Note=Lumenal
Q96FM1  UniProtKB   Transmembrane   170 190 .   .   .   Note=Helical
Q96FM1  UniProtKB   Topological domain  191 250 .   .   .   Note=Lumenal

Here is the script. The third argument to the split function is a tab character.

#!/bin/bash
awk '
        function add12(out_line) {
                iarr = split( out_line, arr, "\t" )
                arr[5] = 12 + arr[5]
                printf( "%s", arr[1])
                for (i=2 ; i<=iarr ; i++) printf( "\t%s", arr[i] )
                printf( "\n" )
        }

        function sub12(out_line) {
                iarr = split( out_line, arr, "\t" )
                arr[4] = arr[4] - 12
                printf( "%s", arr[1])
                for (i=2 ; i<=iarr ; i++) printf( "\t%s", arr[i] )
                printf( "\n" )
        }

        NR == 1 { last_line = $0 ; next }
        NR == 2 { test_line = $0 ; next }

        test_line ~ /Lumenal/ {
                if (last_line ~ /Transmembrane/) sub12( test_line )
                if ($0  ~ /Transmembrane/) add12( test_line )
        }

        {
                last_line = test_line
                test_line = $0
        }

        END {
                if (test_line ~ /Lumenal/) {
                        if (last_line ~ /Transmembrane/) sub12( test_line )
                }
        }
' "$1"

And "the proof is in the pudding":

$ ./doit indata
Q75T13  UniProtKB   Topological domain  74  641 .   .   .   Note=Lumenal
Q9BRR3  UniProtKB   Topological domain  1   34  .   .   .   Note=Lumenal
Q96FM1  UniProtKB   Topological domain  145 169 .   .   .   Note=Lumenal
Q96FM1  UniProtKB   Topological domain  157 181 .   .   .   Note=Lumenal
Q96FM1  UniProtKB   Topological domain  179 250 .   .   .   Note=Lumenal

I then made doit2 for the comma-separated version:

$ diff doit*
4c4
<       iarr = split( out_line, arr, "\t" )
---
>       iarr = split( out_line, arr, "," )
7c7
<       for (i=2 ; i<=iarr ; i++) printf( "\t%s", arr[i] )
---
>       for (i=2 ; i<=iarr ; i++) printf( ",%s", arr[i] )
12c12
<       iarr = split( out_line, arr, "\t" )
---
>       iarr = split( out_line, arr, "," )
15c15
<       for (i=2 ; i<=iarr ; i++) printf( "\t%s", arr[i] )
---
>       for (i=2 ; i<=iarr ; i++) printf( ",%s", arr[i] )

Running it on the csv file:

$ ./doit2 comma
Q75T13,UniProtKB,Topological domain,74,641,.,.,.,Note=Lumenal
Q9BRR3,UniProtKB,Topological domain,1,34,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,145,169,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,157,181,.,.,.,Note=Lumenal
Q96FM1,UniProtKB,Topological domain,179,250,.,.,.,Note=Lumenal
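
Since doit and doit2 differ only in the separator, one possible consolidation is to pass the separator in as a parameter and let awk do the splitting and joining through FS and OFS. The sketch below is an assumption-laden rewrite, not the answerer's script: `lumenal_shift`, its optional second argument, and the `emit` function are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): lumenal_shift FILE [SEP]
# SEP is the field separator; it defaults to a comma.
lumenal_shift() {
    awk -v FS="${2:-,}" -v OFS="${2:-,}" '
        # Print "line" with column "col" shifted by "delta".
        function emit(line, col, delta) {
            $0 = line
            $col += delta
            print
        }
        { nxt = $0 }                 # save the new line before $0 is reused
        cur ~ /Lumenal/ {
            if (pre ~ /Transmembrane/) emit(cur, 4, -12)
            if (nxt ~ /Transmembrane/) emit(cur, 5, 12)
        }
        { pre = cur; cur = nxt }
        END {                        # the last line has no next line
            if (cur ~ /Lumenal/ && pre ~ /Transmembrane/) emit(cur, 4, -12)
        }
    ' "$1"
}

tmp=$(mktemp)

# Comma-separated sample taken from the question.
printf '%s\n' \
    'Q9BRR3,UniProtKB,Topological domain,1,22,.,.,.,Note=Lumenal' \
    'Q9BRR3,UniProtKB,Transmembrane,23,43,.,.,.,Note=Helical' > "$tmp"
out_csv=$(lumenal_shift "$tmp")

# Tab-separated sample; awk expands the \t escape in the -v assignment.
printf 'Q75T13\tUniProtKB\tTransmembrane\t61\t85\t.\t.\t.\tNote=Helical\n' > "$tmp"
printf 'Q75T13\tUniProtKB\tTopological domain\t86\t641\t.\t.\t.\tNote=Lumenal\n' >> "$tmp"
out_tsv=$(lumenal_shift "$tmp" '\t')

rm -f "$tmp"
printf '%s\n' "$out_csv" "$out_tsv"
```

Assigning to a field makes awk rebuild the record with OFS, so the manual printf loop is no longer needed. The trade-off is that the separator must be a single literal character such as a comma or tab for FS and OFS to round-trip cleanly.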
