Grep different kinds of patterns


I have a file like this:

1>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp  (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS=  (626 aa)
37.3% identity
>>tr|N1NG13|N1NG13_ARAHY Seed storage protein Ara h1 OS=  (626 aa)
37.3% identity
>>tr|Q6PSU6|Q6PSU6_ARAHY Conarachin (Fragment) OS=Arachi  (303 aa)
29.4% identity
>>tr|Q6PSU3|Q6PSU3_ARAHY Conarachin (Fragment) OS=Arachi  (580 aa)
29.4% identity
>>tr|A5Z1Q5|A5Z1Q5_ARADU Ara d 6 OS=Arachis duranensis O  (145 aa)
23.7% identity
>>sp|P43237|ALL11_ARAHY Allergen Ara h 1, clone P17 OS=A  (614 aa)
29.4% identity
>>tr|A8VT50|A8VT50_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
>>tr|A1YQB2|A1YQB2_BOVIN Alpha lactabumin (Fragment) OS=  (52 aa)
50.0% identity
>>tr|A5Z1Q8|A5Z1Q8_ARADU Ara d 2.01 OS=Arachis duranensi  (160 aa)
44.8% identity
>>tr|A8VT44|A8VT44_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
>>tr|A8VT41|A8VT41_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
>>tr|N1NEW2|N1NEW2_ARADU Seed storage protein Ara h1 OS=  (614 aa)
29.4% identity
>>tr|B3IXL2|B3IXL2_ARAHY Main allergen Ara h1 OS=Arachis  (614 aa)
29.4% identity
>>tr|A8VT50|A8VT50_ARADU Conglutin OS=Arachis duranensis  (160 aa)
2>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp  (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS=  (626 aa)
37.3% identity

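I want to grep the lines with an identity of 35% or higher, together with the line just above each of them, while keeping the >>> header lines. The expected output is: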

1>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp  (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS=  (626 aa)
37.3% identity
>>tr|N1NG13|N1NG13_ARAHY Seed storage protein Ara h1 OS=  (626 aa)
37.3% identity
>>tr|A8VT50|A8VT50_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
>>tr|A1YQB2|A1YQB2_BOVIN Alpha lactabumin (Fragment) OS=  (52 aa)
50.0% identity
>>tr|A5Z1Q8|A5Z1Q8_ARADU Ara d 2.01 OS=Arachis duranensi  (160 aa)
44.8% identity
>>tr|A8VT44|A8VT44_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
>>tr|A8VT41|A8VT41_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
2>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp  (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS=  (626 aa)
37.3% identity

I tried the following, but no luck so far:

grep -B1 "35\+.*" -e '>>>' file > output_file

Any help would be appreciated! Thanks!

Answer 1

I would avoid using regular expressions for numerical comparisons. Also, -B is a global option, so it will inevitably apply to the >>> matches as well.
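To illustrate the first point: a regex matches characters, not magnitudes. In GNU grep's basic syntax, 35\+ means "3 followed by one or more 5s", so it never matches 37.3 or 44.8. (In the question's command, because -e is present, "35\+.*" is actually taken as a file name rather than as a pattern.) The sketch below uses sample lines from the question; the function names sample, regex_hits and numeric_hits are made up for illustration.

```shell
#!/bin/sh
# Sketch: why a regex is a poor fit for "identity >= 35%".
# The sample lines are copied from the question's file.

sample() {
    printf '%s\n' '37.3% identity' '44.8% identity' '29.4% identity' '50.0% identity'
}

# Regex attempt: "35\+" means "3 followed by one or more 5s", so it matches
# none of the sample lines. "|| true" keeps the exit status 0 when grep
# finds nothing.
regex_hits() {
    sample | grep -- '35\+' || true
}

# Numeric comparison: awk turns the leading "37.3" of "37.3%" into a number
# and compares it against 35.
numeric_hits() {
    sample | awk '$1 + 0 >= 35'
}
```

regex_hits prints nothing, while numeric_hits prints the 37.3%, 44.8% and 50.0% lines.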

In awk you could do something like this:

$ awk '/>>>/ {print} /^>>/ {last = $0} $1+0 >= 35 {print last; print}' file
1>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp  (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS=  (626 aa)
37.3% identity
>>tr|N1NG13|N1NG13_ARAHY Seed storage protein Ara h1 OS=  (626 aa)
37.3% identity
>>tr|A8VT50|A8VT50_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
>>tr|A1YQB2|A1YQB2_BOVIN Alpha lactabumin (Fragment) OS=  (52 aa)
50.0% identity
>>tr|A5Z1Q8|A5Z1Q8_ARADU Ara d 2.01 OS=Arachis duranensi  (160 aa)
44.8% identity
>>tr|A8VT44|A8VT44_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
>>tr|A8VT41|A8VT41_ARADU Conglutin OS=Arachis duranensis  (160 aa)
44.8% identity
2>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp  (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS=  (626 aa)
37.3% identity

Converting the percentage string with $1 + 0 appears to be supported by at least gawk and mawk.
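A slightly more defensive variant of the same awk idea, sketched as a shell function. The name filter_hits and its threshold argument are hypothetical, chosen for illustration, and it assumes the layout shown in the question: N>>> query headers, >> hit lines (>>tr| or >>sp|), and identity lines directly after their hit line. POSIX specifies that awk converts a string to a number from its leading numeric part, so "37.3%" should become 37.3 in any conforming awk; the sketch still strips the % explicitly to make the intent obvious, and it only compares lines that contain "% identity".

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper name: filter_hits).
# Usage: filter_hits THRESHOLD [FILE...]   (reads stdin if no FILE is given)
filter_hits() {
    threshold=$1
    shift
    awk -v min="$threshold" '
        # Query header such as "1>>>PROKKA_00001 ...": always keep it.
        />>>/ { print; next }
        # Hit line (">>tr|..." or ">>sp|..."): remember it for later.
        /^>>/ { last = $0; next }
        # Identity line: strip the "%" explicitly, then compare numerically.
        /% identity/ {
            pct = $1
            sub(/%$/, "", pct)
            if (pct + 0 >= min + 0) { print last; print }
        }
    ' "$@"
}
```

For example, filter_hits 35 file > output_file should reproduce the expected output from the question. Because header lines hit next before any numeric test, a header such as 35>>>... cannot be mistaken for an identity value.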
