Remove only the string after the first space in the first column

Answer 1

awk 'BEGIN{ OFS=FS="\t" } 
  !/^#/{ sub(/ [0-9]+$/, "", $1) }
  1
' LAB330_TE_annotation.gff3 > LAB330_TE_annotation.fix.gff3

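This leaves header lines (those starting with #) unmodified. On every other line, it replaces a space followed by at least one digit at the end of the first field with the empty string.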
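To see the effect on a small input, the sketch below runs the same awk program on two made-up lines. The sample record and its trailing " 1" are assumptions for illustration only and are not taken from the real GFF3 file.

```shell
# Hypothetical sample: one header line and one record whose first
# tab-separated field ends in a space followed by digits.
sample=$(printf '##gff-version 3\nNbLab330C00 1\tEDTA\t2\t3364\n')

fixed=$(printf '%s\n' "$sample" | awk 'BEGIN{ OFS=FS="\t" }
  !/^#/{ sub(/ [0-9]+$/, "", $1) }   # strip " <digits>" from the end of field 1
  1                                  # print every line, modified or not
')

printf '%s\n' "$fixed"
```

The header line contains no tab, so its first field is the whole line. Without the !/^#/ guard, the trailing " 3" of "##gff-version 3" would be stripped as well, which is why comment lines are skipped.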

Answer 2

You can delete the second column with cut. The default delimiter is a tab, so there is no need to specify the -d option.

$ cut -f 1,3- LAB330_TE_annotation.gff3
##gff-version 3
##date Sun Feb 14 08:41:36 UTC 2021
##Identity: Sequence identity (0-1) between the library sequence and the target region.
##ltr_identity: Sequence identity (0-1) between the left and right LTR regions.
##tsd: target site duplication.
##seqid source sequence_ontology start end score strand phase attributes
NbLab330C00 EDTA    Gypsy_LTR_retrotransposon   2   3364    20798   -   .   ID=TE_homo_0;Name=TE_00007365_INT;Classification=LTR/Gypsy;Sequence_ontology=SO:0002265;Identity=0.868;Method=homology
NbLab330C00 EDTA    Gypsy_LTR_retrotransposon   3367    4198    3385    -   .   ID=TE_homo_1;Name=TE_00008087_LTR;Classification=LTR/Gypsy;Sequence_ontology=SO:0002265;Identity=0.865;Method=homology
NbLab330C00 EDTA    hAT_TIR_transposon  4424    4715    1278    +   .   ID=TE_homo_2;Name=TE_00003964;Classification=DNA/DTA;Sequence_ontology=SO:0002279;Identity=0.834;Method=homology
NbLab330C00 EDTA    hAT_TIR_transposon  5236    5453    835 +   .   ID=TE_homo_3;Name=TE_00001425;Classification=DNA/DTA;Sequence_ontology=SO:0002279;Identity=0.828;Method=homology

Alternatively, with GNU cut:

$ cut -f 2 --complement LAB330_TE_annotation.gff3
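As a quick sanity check, the following sketch applies both cut invocations to a made-up three-column line and prints the two results. The sample line is hypothetical. Note that --complement is a GNU coreutils extension, so the second command may not work with other cut implementations.

```shell
# Hypothetical tab-separated sample line with three columns.
line=$(printf 'seq1\tdrop_me\tkeep_me')

# Keep field 1 and everything from field 3 onward.
by_list=$(printf '%s\n' "$line" | cut -f 1,3-)

# GNU cut only: select every field except field 2.
by_complement=$(printf '%s\n' "$line" | cut -f 2 --complement)

printf '%s\n%s\n' "$by_list" "$by_complement"
```

Both forms drop the second column. The field-list form (-f 1,3-) is the more portable choice.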

