
KEGG Reconstruct Pathway
I need to reformat the output. file1 contains the following:
00550 Peptidoglycan biosynthesis (2)
K01000
K02563
00511 Other glycan degradation (8) K01190 K01191
K01192
K01201
K01227
K12309
I need file2 to look like this:
00550 Peptidoglycan biosynthesis (2) K01000 K02563
00511 Other glycan degradation (6) K01190 K01191 K01192 K01201 K01227 K12309
How can I reformat this on Linux or in Python?
Thanks
Answer 1
How far would this get you?
awk '
!NF        {next                                      # skip empty lines
           }
/^[0-9]+ / {sub (/\([0-9]*\)/, "(" CNT ")", PRT)      # pathway lines (leading number):
                                                      # correct the count in parentheses
            if (PRT) print PRT                        # print the PRT buffer (still empty on the first line)
            PRT = ""                                  # empty it after print
            CNT = gsub (/K[0-9]+/, "&") - 1           # count the "K..." IDs on this line, minus 1 for the increment below
           }
           {PRT = sprintf ("%s%s%s", PRT, PRT?" ":"", $0)   # append this line to the buffer
            CNT++                                     # increment the "K..." count
           }
END        {sub (/\([0-9]*\)/, "(" CNT ")", PRT)      # see above
            print PRT
           }
' file1
00550 Peptidoglycan biosynthesis (2) K01000 K02563
00511 Other glycan degradation (6) K01190 K01191 K01192 K01201 K01227 K12309
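Since the question also mentions Python, here is a minimal sketch of the same idea, not a tested production script. It assumes that pathway lines start with a numeric ID followed by whitespace, that every pathway line carries a "(n)" count, and that KO identifiers look like K plus five digits. The names reformat and reformat_file are made up for this example.

```python
import re

HEADER = re.compile(r"^\d+\s")       # assumed: pathway lines start with a numeric ID
KO_ID = re.compile(r"\bK\d{5}\b")    # assumed: KO identifiers look like K01190
COUNT = re.compile(r"\(\d*\)")       # the "(n)" count on a pathway line


def reformat(lines):
    """Join each pathway line with its KO lines and recompute the count."""
    records = []                     # one (title, [KO ids]) pair per pathway
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                 # skip empty lines
        kos = KO_ID.findall(line)
        if HEADER.match(line):
            # keep the text before "(n)"; KO ids after it are collected in kos
            title = COUNT.split(line, maxsplit=1)[0].rstrip()
            records.append((title, kos))
        elif records:
            records[-1][1].extend(kos)   # KO-only line: attach to current pathway
    return [f"{title} ({len(kos)}) {' '.join(kos)}".rstrip()
            for title, kos in records]


def reformat_file(src_path, dst_path):
    """Read src_path (e.g. file1) and write the joined lines to dst_path (e.g. file2)."""
    with open(src_path) as src:
        out = reformat(src)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(out) + "\n")
```

Calling reformat_file("file1", "file2") should write the two joined lines shown above. Unlike the awk version, this counts the KO IDs it actually finds instead of counting lines, so a line holding several IDs is still counted correctly.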