I have the following file:
Nt01 maker mRNA 143295 155540 . + . ID=Nitab4.5_0006317g0010.1;Parent=Nitab4.5_0006317g0010;Name=Nitab4.5_0006317g0010.1;_AED=0.08;_eAED=0.08;_QI=0|0.45|0.25|1|0.90|0.75|12|0|1011;Note="Peptidase S59%2C nucleoporin"
Nt01 maker mRNA 170633 173860 . + . ID=Nitab4.5_0006317g0020.1;Parent=Nitab4.5_0006317g0020;Name=Nitab4.5_0006317g0020.1;_AED=0.26;_eAED=0.26;_QI=15|0|0|0.83|0.6|0.33|6|0|424;Note="Putative S-adenosyl-L-methionine-dependent methyltransferase"
awk 'BEGIN{OFS="\t"} {print $1,$9,$4,$5}' test.txt | head
Nt01 ID=Nitab4.5_0006317g0010.1;Parent=Nitab4.5_0006317g0010;Name=Nitab4.5_0006317g0010.1;_AED=0.08;_eAED=0.08;_QI=0|0.45|0.25|1|0.90|0.75|12|0|1011;Note="Peptidase S59%2C nucleoporin" 143295 155540
Nt01 ID=Nitab4.5_0006317g0020.1;Parent=Nitab4.5_0006317g0020;Name=Nitab4.5_0006317g0020.1;_AED=0.26;_eAED=0.26;_QI=15|0|0|0.83|0.6|0.33|6|0|424;Note="Putative S-adenosyl-L-methionine-dependent methyltransferase" 170633 173860
How can I shorten this to the following?
Nt01 Nitab4.5_0006317g0010.1 143295 155540
Nt01 Nitab4.5_0006317g0020.1 170633 173860
Answer 1
Assuming the whitespace in your input is tabs and that you want the output tab-separated too:
$ awk -F'[\t=;]' -v OFS='\t' '{print $1, $10, $4, $5}' file
Nt01 Nitab4.5_0006317g0010.1 143295 155540
Nt01 Nitab4.5_0006317g0020.1 170633 173860
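To see why `$10` is the right field, it can help to number every field awk produces when the separator is any of tab, `=` or `;`. This is a minimal sketch, assuming tab-separated input. The sample is the first record from the question with column 9 shortened, and `show_fields` is a hypothetical helper name.

```shell
# Sketch: print "index<TAB>value" for every field awk sees when FS is any
# of tab, '=' or ';' (the same -F used in Answer 1).
# show_fields is a hypothetical helper name; input is assumed tab-separated.
show_fields() {
  awk -F'[\t=;]' -v OFS='\t' '{ for (i = 1; i <= NF; i++) print i, $i }'
}

# First record from the question, column 9 shortened to two attributes.
line=$(printf 'Nt01\tmaker\tmRNA\t143295\t155540\t.\t+\t.\tID=Nitab4.5_0006317g0010.1;Parent=Nitab4.5_0006317g0010')
printf '%s\n' "$line" | show_fields
```

Field 9 is the literal key `ID` and field 10 is its value, which is why Answer 1 prints `$10`. That numbering only holds while `ID=` is the first attribute in column 9.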
Answer 2
Using GNU sed with extended regular expression support (-E):
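# field construction helper variables:
#   t = a literal tab, T = one non-tab character
#   F = one field plus its trailing tab; F2/F3 = two/three such fields
#   FT = two fields with no trailing tab
t=$'\t'; T="[^$t]"; F=$T+$t
F2=$F$F; F3=$F2$F; FT=$F$T+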
# use the helper variables in rearranging
# and pruning the pattern space
sed -Ee "
s/^($F)$FT($t$FT)$t${F3}ID=([^;]*);.*/\1\3\2/
" file.tsv
Result:
Nt01 Nitab4.5_0006317g0010.1 143295 155540
Nt01 Nitab4.5_0006317g0020.1 170633 173860
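Both answers rely on `ID=` being the first attribute in column 9. If that is not guaranteed, one alternative is to look the attribute up by name. This is a minimal awk sketch, assuming tab-separated GFF3-style input where column 9 holds `key=value` pairs separated by `;`. The names `extract_id` and `sample` are hypothetical, and the sample records are the two from the question with column 9 shortened.

```shell
# Sketch: print columns 1, 4 and 5 plus the value of the ID attribute,
# found by name in column 9 rather than by position.
# extract_id is a hypothetical helper; input is assumed tab-separated.
extract_id() {
  awk -F'\t' -v OFS='\t' '{
    id = ""
    n = split($9, attrs, ";")            # attributes are separated by ";"
    for (i = 1; i <= n; i++)
      if (attrs[i] ~ /^ID=/) {           # keep the text after "ID="
        id = substr(attrs[i], 4)
        break
      }
    print $1, id, $4, $5
  }'
}

# The two records from the question, column 9 shortened.
sample=$(printf 'Nt01\tmaker\tmRNA\t143295\t155540\t.\t+\t.\tID=Nitab4.5_0006317g0010.1;Parent=Nitab4.5_0006317g0010\nNt01\tmaker\tmRNA\t170633\t173860\t.\t+\t.\tID=Nitab4.5_0006317g0020.1;Parent=Nitab4.5_0006317g0020')
printf '%s\n' "$sample" | extract_id
```

Because the loop matches the `ID=` prefix, the same command still works if `Parent=` or another attribute comes first; a record with no `ID` attribute gets an empty second column.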