How to remove duplicate rows based on the second column

I have the following file:

chr11_pilon3.g3568.t1   transcript:OIT01734 transcript:OIT01734 1.1e-107    389.8   1000    218 992 1   216 130 345 MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDA    MDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDA    MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDAR*  MKVWERVVEARVREMTSISVNQFGFMPGRSTTEAIHLVRRLVEHFRDKKKDLHMVFIDLENAYDKVPREVLWRCLEAKSVPEAYIRVIKDMYDGAKTRVRTVGGDSDHFPVVMGLHQGSALSPLLFALVMDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDAPVRIYKSAILGHLNSHGSQNALAGPVEAEENRQKTKKEVMEEIIQKSKFFKAQKAKDREENDELTEQLDKDFTSLVESKALLSLTQPDKINALKALVNKNISVGNVKKDEVADVPRKASIGKEKPDTYEMLVSEMALDMRARPSDRTKTPEEIAQEEKERLELLEQEXXXXXXXXXXXXXXDGNASDDNSKLVKDPRTVSGDDLGDDLEEVPRTKLGWIGEILRRKENELESEDAASSGDSDDGEDEGXXXXXXXXXXXXXXXXXXXXDEEQGKTQTIKDWEQSDDDIIDTELEDDDEGFGDDAKKVVKIKDHKEENLSITVAAENKKKMQVFYGVLLQYFAVLANKKPLNSKLLNLLVKPLMEMSAVSPYFAAICARQRLQRTRAQFCEDLKNTGKSSWPSLKTIFLLRLWSMIFPCSDFRHCVMTPAILLMCEYLMRCTIISGRDIAIASFLCSLLLSVIKQSQKFCPEAIVFIQTLLMAALDRKQRSNSQLDNLMEIKELGPLLCIRSSKVEMDSLDFLTLMDLPEDSQYFHSDNYRTSMLVTVLETLQGFVNVYKELISFPEIFMLISKLLCKMAGENHIPDALREKIKDVSQLIDTKAQEHHMLRQPLKMRKKKPVPIRMLNPKFEENFVKGRDYDPDRERA    389.8   1000    216 85.6    185 31  200 0   0   92.6    0   22IV6AV2SN4IV11IL12GSDA1PS1GE3ED1MK4AV6VF9DE29IV1HQ6FY2MV5FL1EG10IV14CR1HL4KR1KR5QE5PL2KE2GR6FY6GR3 85.6    1.1e-107    99.1
gene.9403.0.4.p1    transcript:OIT35479 transcript:OIT35479 8.5e-191    667.5   1721    690 406 1   378 1   378 MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVLSTCRSFSKSGVPFHSMVVTGGFCQRTQLENLRQELDILIATPGRFMFLIKEGYLQLTNLKCAVLDEVDILFSDEDFETAFQCLINSSPITTQYLFVTATLPMDIYNKLVESFPDCELVSGPGMHRTSPGLEEFLVDCSGDETAEKSPDTAFINKKNALLHLVEDSPVPKTIVFCNKIDSCRKVENALKRFDRKGFSIKILPFHAALDQRRRLANMEEFRRSKMENVSLFLVCTDRASRGIDFEGVDHVVLFDYPRDPSEYVRRVGRTARGAGGKGKAFIFAVGKQVSLARRIMERNKKGHPVHDVPSILT*  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVCQISSSIKGTFATYSPYCSATTHTKRKK  667.5   1721    378 91.0    
344 34  352 0   0   93.1    0   6VASP14PQ3VG50IV25PSXPXDXNXNXHXPXPXPXTXQXSXSXSDN38ND3ITAT14DG20DE1KR2FS11GD14IS4QH30DE4EQ1QR2GD102  91.0    8.5e-191    54.8
gene.9403.0.5.p1    transcript:OIT35479 transcript:OIT35479 8.5e-191    667.5   1721    690 406 1   378 1   378 MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVLSTCRSFSKSGVPFHSMVVTGGFCQRTQLENLRQELDILIATPGRFMFLIKEGYLQLTNLKCAVLDEVDILFSDEDFETAFQCLINSSPITTQYLFVTATLPMDIYNKLVESFPDCELVSGPGMHRTSPGLEEFLVDCSGDETAEKSPDTAFINKKNALLHLVEDSPVPKTIVFCNKIDSCRKVENALKRFDRKGFSIKILPFHAALDQRRRLANMEEFRRSKMENVSLFLVCTDRASRGIDFEGVDHVVLFDYPRDPSEYVRRVGRTARGAGGKGKAFIFAVGKQVSLARRIMERNKKGHPVHDVPSILT*  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVCQISSSIKGTFATYSPYCSATTHTKRKK  667.5   1721    378 91.0    
344 34  352 0   0   93.1    0   6VASP14PQ3VG50IV25PSXPXDXNXNXHXPXPXPXTXQXSXSXSDN38ND3ITAT14DG20DE1KR2FS11GD14IS4QH30DE4EQ1QR2GD102  91.0    8.5e-191    54.8
gene.69001.9.9.p1   NisylKD955766g0010.1    NisylKD955766g0010.1    1.4e-294    1011.9  2615    531 530 1   530 1   530 MKEMCLAVAPLPFRLGNNLIFHNPLSIGSSSHMDVTRLNSMGGTTTSLYAESAEKDLSDTVSSSRSEGVPLLHMISENESNNWISGDAVVRESEDDEILSLDGDQMSCSLSVVSDSSSLCGDDFIGFEVASEIFGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKIEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGHRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQEQWKKAFTNCFLMVDDEVGGTGNHEAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPTALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRAIQKGSKDNITVIVVDLKAQRKFKSKT  MKEMCLAVAPLPFRLGNNLIFRNPPSIGSSSHMDATRLNSMGDTTTSLYAESAEKDLSDTVSSSRSEGVPLLPMISENDRNNWIAGDAVVRESEDDEILSLDGDQVSCSLSVVSDSSSLCGDDFIGFEVASDIYGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKLEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGDRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQDQWKKAFTNCFLKVDDEVGGTGNREAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPMALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRATQKGSKDNITVIVVDLKAQRKFKSKT  MKEMCLAVAPLPFRLGNNLIFHNPLSIGSSSHMDVTRLNSMGGTTTSLYAESAEKDLSDTVSSSRSEGVPLLHMISENESNNWISGDAVVRESEDDEILSLDGDQMSCSLSVVSDSSSLCGDDFIGFEVASEIFGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKIEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGHRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQEQWKKAFTNCFLMVDDEVGGTGNHEAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPTALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRAIQKGSKDNITVIVVDLKAQRKFKSKT* 
MKEMCLAVAPLPFRLGNNLIFRNPPSIGSSSHMDATRLNSMGDTTTSLYAESAEKDLSDTVSSSRSEGVPLLPMISENDRNNWIAGDAVVRESEDDEILSLDGDQVSCSLSVVSDSSSLCGDDFIGFEVASDIYGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKLEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGDRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQDQWKKAFTNCFLKVDDEVGGTGNREAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPMALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRATQKGSKDNITVIVVDLKAQRKFKSKT  1011.9  2615    530 96.6    512 18  519 0   0   97.9    0   21HR2LP9VA7GD29HP5EDSR4SA20MV25ED1FY40IL74HD62ED11MK10HR40TM127IT25 96.6    1.4e-294    99.8
gene.9403.9.5.p1    transcript:OIT35479 transcript:OIT35479 8.5e-191    667.5   1721    690 406 1   378 1   378 MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVLSTCRSFSKSGVPFHSMVVTGGFCQRTQLENLRQELDILIATPGRFMFLIKEGYLQLTNLKCAVLDEVDILFSDEDFETAFQCLINSSPITTQYLFVTATLPMDIYNKLVESFPDCELVSGPGMHRTSPGLEEFLVDCSGDETAEKSPDTAFINKKNALLHLVEDSPVPKTIVFCNKIDSCRKVENALKRFDRKGFSIKILPFHAALDQRRRLANMEEFRRSKMENVSLFLVCTDRASRGIDFEGVDHVVLFDYPRDPSEYVRRVGRTARGAGGKGKAFIFAVGKQVSLARRIMERNKKGHPVHDVPSILT*  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVCQISSSIKGTFATYSPYCSATTHTKRKK  667.5   1721    378 91.0    
344 34  352 0   0   93.1    0   6VASP14PQ3VG50IV25PSXPXDXNXNXHXPXPXPXTXQXSXSXSDN38ND3ITAT14DG20DE1KR2FS11GD14IS4QH30DE4EQ1QR2GD102  91.0    8.5e-191    54.8

The file above contains similar IDs:

gene.9403.0.4.p1
gene.9403.0.5.p1
gene.9403.9.5.p1    

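If I keep only the gene.9403 part, these IDs become identical. The remaining columns are also identical for the gene.9403 rows, so I would like to remove the duplicates.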

I used this command:

awk -F"\t" '!seen[$2, $3, $4, $5, $6, $7,$8, $9,$10,$11,$12, $13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31]++' select-results2.txt

and got the correct result for the example above:

chr11_pilon3.g3568.t1   transcript:OIT01734 transcript:OIT01734 1.1e-107    389.8   1000    218 992 1   216 130 345 MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDA    MDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDA    MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDAR*  MKVWERVVEARVREMTSISVNQFGFMPGRSTTEAIHLVRRLVEHFRDKKKDLHMVFIDLENAYDKVPREVLWRCLEAKSVPEAYIRVIKDMYDGAKTRVRTVGGDSDHFPVVMGLHQGSALSPLLFALVMDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDAPVRIYKSAILGHLNSHGSQNALAGPVEAEENRQKTKKEVMEEIIQKSKFFKAQKAKDREENDELTEQLDKDFTSLVESKALLSLTQPDKINALKALVNKNISVGNVKKDEVADVPRKASIGKEKPDTYEMLVSEMALDMRARPSDRTKTPEEIAQEEKERLELLEQEXXXXXXXXXXXXXXDGNASDDNSKLVKDPRTVSGDDLGDDLEEVPRTKLGWIGEILRRKENELESEDAASSGDSDDGEDEGXXXXXXXXXXXXXXXXXXXXDEEQGKTQTIKDWEQSDDDIIDTELEDDDEGFGDDAKKVVKIKDHKEENLSITVAAENKKKMQVFYGVLLQYFAVLANKKPLNSKLLNLLVKPLMEMSAVSPYFAAICARQRLQRTRAQFCEDLKNTGKSSWPSLKTIFLLRLWSMIFPCSDFRHCVMTPAILLMCEYLMRCTIISGRDIAIASFLCSLLLSVIKQSQKFCPEAIVFIQTLLMAALDRKQRSNSQLDNLMEIKELGPLLCIRSSKVEMDSLDFLTLMDLPEDSQYFHSDNYRTSMLVTVLETLQGFVNVYKELISFPEIFMLISKLLCKMAGENHIPDALREKIKDVSQLIDTKAQEHHMLRQPLKMRKKKPVPIRMLNPKFEENFVKGRDYDPDRERA    389.8   1000    216 85.6    185 31  200 0   0   92.6    0   22IV6AV2SN4IV11IL12GSDA1PS1GE3ED1MK4AV6VF9DE29IV1HQ6FY2MV5FL1EG10IV14CR1HL4KR1KR5QE5PL2KE2GR6FY6GR3 85.6    1.1e-107    99.1
gene.9403.0.4.p1    transcript:OIT35479 transcript:OIT35479 8.5e-191    667.5   1721    690 406 1   378 1   378 MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVLSTCRSFSKSGVPFHSMVVTGGFCQRTQLENLRQELDILIATPGRFMFLIKEGYLQLTNLKCAVLDEVDILFSDEDFETAFQCLINSSPITTQYLFVTATLPMDIYNKLVESFPDCELVSGPGMHRTSPGLEEFLVDCSGDETAEKSPDTAFINKKNALLHLVEDSPVPKTIVFCNKIDSCRKVENALKRFDRKGFSIKILPFHAALDQRRRLANMEEFRRSKMENVSLFLVCTDRASRGIDFEGVDHVVLFDYPRDPSEYVRRVGRTARGAGGKGKAFIFAVGKQVSLARRIMERNKKGHPVHDVPSILT*  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVCQISSSIKGTFATYSPYCSATTHTKRKK  667.5   1721    378 91.0    
344 34  352 0   0   93.1    0   6VASP14PQ3VG50IV25PSXPXDXNXNXHXPXPXPXTXQXSXSXSDN38ND3ITAT14DG20DE1KR2FS11GD14IS4QH30DE4EQ1QR2GD102  91.0    8.5e-191    54.8
gene.69001.9.9.p1   NisylKD955766g0010.1    NisylKD955766g0010.1    1.4e-294    1011.9  2615    531 530 1   530 1   530 MKEMCLAVAPLPFRLGNNLIFHNPLSIGSSSHMDVTRLNSMGGTTTSLYAESAEKDLSDTVSSSRSEGVPLLHMISENESNNWISGDAVVRESEDDEILSLDGDQMSCSLSVVSDSSSLCGDDFIGFEVASEIFGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKIEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGHRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQEQWKKAFTNCFLMVDDEVGGTGNHEAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPTALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRAIQKGSKDNITVIVVDLKAQRKFKSKT  MKEMCLAVAPLPFRLGNNLIFRNPPSIGSSSHMDATRLNSMGDTTTSLYAESAEKDLSDTVSSSRSEGVPLLPMISENDRNNWIAGDAVVRESEDDEILSLDGDQVSCSLSVVSDSSSLCGDDFIGFEVASDIYGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKLEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGDRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQDQWKKAFTNCFLKVDDEVGGTGNREAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPMALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRATQKGSKDNITVIVVDLKAQRKFKSKT  MKEMCLAVAPLPFRLGNNLIFHNPLSIGSSSHMDVTRLNSMGGTTTSLYAESAEKDLSDTVSSSRSEGVPLLHMISENESNNWISGDAVVRESEDDEILSLDGDQMSCSLSVVSDSSSLCGDDFIGFEVASEIFGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKIEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGHRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQEQWKKAFTNCFLMVDDEVGGTGNHEAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPTALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRAIQKGSKDNITVIVVDLKAQRKFKSKT* 
MKEMCLAVAPLPFRLGNNLIFRNPPSIGSSSHMDATRLNSMGDTTTSLYAESAEKDLSDTVSSSRSEGVPLLPMISENDRNNWIAGDAVVRESEDDEILSLDGDQVSCSLSVVSDSSSLCGDDFIGFEVASDIYGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKLEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGDRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQDQWKKAFTNCFLKVDDEVGGTGNREAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPMALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRATQKGSKDNITVIVVDLKAQRKFKSKT  1011.9  2615    530 96.6    512 18  519 0   0   97.9    0   21HR2LP9VA7GD29HP5EDSR4SA20MV25ED1FY40IL74HD62ED11MK10HR40TM127IT25 96.6    1.4e-294    99.8

However, I am worried that if I do not take the gene.9403 part into account, I might delete the wrong rows. Is there a way to also consider the first column?

Thanks in advance.

Answer 1

Try this:

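awk '
  # Shorten the first field to its first two dot-separated parts
  # (gene.9403.0.4.p1 -> gene.9403). gensub() is specific to GNU awk.
  {line = gensub(/^([^.]+\.[^.]+)[^[:blank:]]*/, "\\1", 1, $0)}
  # Print a row only the first time its shortened form is seen.
  !seen[line]++
' file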
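
Since gensub() is a GNU awk extension, the command above may not run under other awk implementations. The following is a minimal sketch of the same idea using only split() and substr(). It assumes the input is tab-separated and that the part of the ID worth comparing is the first two dot-separated pieces of column 1. The function name dedup_by_prefix is made up for illustration.

```shell
# Hypothetical helper: dedup_by_prefix [FILE...]
# Keeps the first row for each combination of
#   (first two dot-separated parts of column 1) + (all remaining columns).
# Assumption: columns are separated by tabs.
dedup_by_prefix() {
  awk -F'\t' '
    {
      # Split column 1 on dots and keep only the first two parts,
      # e.g. gene.9403.0.4.p1 -> gene.9403
      n = split($1, parts, ".")
      prefix = (n >= 2) ? parts[1] "." parts[2] : $1

      # Everything after column 1 (including the leading tab), unchanged.
      rest = substr($0, length($1) + 1)
    }
    # Print the row only the first time this key is seen.
    !seen[prefix rest]++
  ' "$@"
}
```

It could be called as dedup_by_prefix select-results2.txt > deduplicated.txt (the output file name is only an example). Rows whose shortened ID matches but whose other columns differ are kept, which addresses the worry about deleting the wrong rows.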
