Append some values from file1 to a column in file2 if there is a match

Answer 1

Assuming the answer to the question above is 'yes', here is one way to do it with awk.

parse.awk

FNR == NR {              # Only for the first file
  h[$3] = $1 "," $2      # Store columns one and two in hash 'h', keyed by column three
  next
}

{ split($2, a, ",") }    # Split the second column of the second file to array 'a'

a[1] in h {              # If the first element of the second column of the 
  $2 = h[a[1]] "," $2    # second file is in 'h' then prepend the value to $2
}

1                        # Print all lines

Run it like this:

awk -f parse.awk FS=',' file1 FS='\t' OFS='\t' file2

Output:

NC_038375   Baculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus
NC_000867   Corticoviridae,Corticovirus,Pseudoalteromonas_virus_PM2
NC_000866   Prokaryote,Caudovirales,Myoviridae,Tequatrovirus,Escherichia_virus_T4
NC_000929   Prokaryote,Caudovirales,Myoviridae,Muvirus,Escherichia_virus_Mu
NC_004166   Prokaryote,Caudovirales,Siphoviridae,,Bacillus_phage_SPP1
NC_005859   Prokaryote,Caudovirales,Siphoviridae,Tequintavirus,Escherichia_virus_T5
NC_002166   Prokaryote,Caudovirales,Siphoviridae,Hendrixvirus,Escherichia_virus_HK022
NC_008720   Prokaryote,Caudovirales,Podoviridae,Enquatrovirus,Escherichia_virus_N4
NC_002371   Prokaryote,Caudovirales,Podoviridae,Lederbergvirus,Salmonella_virus_P22
NC_011048   Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_phi29
NNC_001929  Geminiviridae,Begomovirus,Abutilon_mosaic_virus
NC_002649   Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_GA1
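
To try this without the original data, here is a minimal, self-contained sketch. The contents of file1 and file2 below are assumptions reconstructed from the output above (file1 comma-separated with the family in column three, file2 tab-separated with the family as the first comma-separated item of column two); they are not the asker's real files. The awk program is the same logic as parse.awk, inlined so the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical sample inputs, inferred from the output above.
tmp=$(mktemp -d)

# file1: comma-separated; column three is the lookup key.
printf '%s\n' 'Prokaryote,Caudovirales,Myoviridae' \
              'Prokaryote,Caudovirales,Siphoviridae' > "$tmp/file1"

# file2: tab-separated; column two starts with the family name.
printf 'NC_038375\tBaculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus\n' > "$tmp/file2"
printf 'NC_000866\tMyoviridae,Tequatrovirus,Escherichia_virus_T4\n' >> "$tmp/file2"
printf 'NC_004166\tSiphoviridae,,Bacillus_phage_SPP1\n' >> "$tmp/file2"

# FS=',' takes effect before file1 is read; FS and OFS switch to tab
# before file2 is read, because awk applies these assignments in
# argument order.
result=$(awk '
  FNR == NR { h[$3] = $1 "," $2; next }   # first file: build the lookup
  { split($2, a, ",") }                   # second file: isolate the family
  a[1] in h { $2 = h[a[1]] "," $2 }       # prepend the matched values
  1                                       # print every line
' FS=',' "$tmp/file1" FS='\t' OFS='\t' "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With these sample inputs the Baculoviridae line passes through unchanged, while the Myoviridae and Siphoviridae lines get Prokaryote,Caudovirales prepended to their second column. Note that the FNR == NR test only identifies the first file reliably when that file is not empty.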
