How to print only the first two words of a specific column using awk

Answer 1

Try split() on the second column, splitting on whitespace, and print as many words as you want.

awk 'BEGIN{ FS=OFS="\t" }
{ split($2, tmp, " "); print $0, tmp[1], tmp[2] }' infile
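
For readers who want to check the logic outside awk, here is a minimal Python sketch of the same idea. The function name append_first_words and the sample line are illustrative assumptions, not part of the original answer, and the sketch assumes every line has at least two tab-separated fields.

```python
def append_first_words(line, column=1, count=2):
    # Hypothetical helper: mirrors split($2, tmp, " ") followed by
    # print $0, tmp[1], tmp[2] with a tab as the output separator.
    line = line.rstrip("\n")
    fields = line.split("\t")
    # str.split() with no argument splits on runs of whitespace,
    # like awk's split() with a single-space separator.
    words = fields[column].split()[:count]
    # Pad with empty strings so a short column still yields `count`
    # extra fields, as awk does when tmp[2] is unset.
    words += [""] * (count - len(words))
    return "\t".join([line] + words)


sample = "BC02\tStreptococcus oralis  chromosome, complete genome\t2712"
print(append_first_words(sample))
```

Note that, like the awk command, this appends each word as its own tab-separated field rather than as one space-joined field.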

Answer 2

For more complex TSV files, for example when a field contains an embedded tab, awk will not work properly. In that case you should use a proper CSV parser module, such as Python's csv.

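#!/usr/bin/env python3
import csv

# newline='' lets the csv module handle line endings itself
with open('A.tsv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    for row in reader:
        # append the first two whitespace-separated words of column 2
        row.append(' '.join(row[1].split()[:2]))
        print('\t'.join(row))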
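
To illustrate why a parser matters, the following sketch compares naive tab splitting with csv.reader on a made-up two-row sample. It assumes the embedded tab is protected by double quotes, which is how the csv module recognizes it by default; the data and variable names are hypothetical.

```python
import csv
import io

# Hypothetical sample: in the second row, column 2 is quoted and
# contains an embedded tab between "Streptococcus" and "oralis".
data = (
    'BC01\tStreptococcus oralis chromosome\t2712\n'
    'BC02\t"Streptococcus\toralis chromosome"\t2712\n'
)

# Naive approach: split every line on tabs, ignoring quoting.
naive = [line.split("\t") for line in data.splitlines()]

# Parser approach: csv.reader honours the quotes around the field.
parsed = list(csv.reader(io.StringIO(data), delimiter="\t"))

print(len(naive[1]))             # number of fields seen by naive splitting
print(len(parsed[1]))            # number of fields seen by csv.reader
print(parsed[1][1].split()[:2])  # first two words of the real column 2
```

Naive splitting breaks the quoted field in two, so "column 2" no longer holds the full organism name, while the parser keeps the row at three fields.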

Answer 3

Using GNU awk for gensub() and \s/\S:

$ awk '{print gensub(/\S+\s+(\S+\s+\S+).*/,"&\t\\1",1)}' file
BC02    Streptococcus oralis  chromosome, complete genome   2712    94  0   99.073  2053209 CP023507.1  1597    Streptococcus oralis
BC02    Staphylococcus aureus  chromosome, complete genome  2712    94  0   99.073  2053209 CP023507.1  1597    Staphylococcus aureus
BC02    Streptococcus sp.  chromosome, complete genome  2712    94  0   99.073  2053209 CP023507.1  1597        Streptococcus sp.

Or, for brevity, using GNU sed:

$ sed -E 's/\S+\s+(\S+\s+\S+).*/&\t\1/' file
BC02    Streptococcus oralis  chromosome, complete genome   2712    94  0   99.073  2053209 CP023507.1  1597    Streptococcus oralis
BC02    Staphylococcus aureus  chromosome, complete genome  2712    94  0   99.073  2053209 CP023507.1  1597    Staphylococcus aureus
BC02    Streptococcus sp.  chromosome, complete genome  2712    94  0   99.073  2053209 CP023507.1  1597        Streptococcus sp.

The above assumes that the first field contains no whitespace.
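
The same substitution can be sketched in Python's re module, which may help when checking what the pattern captures. The function name append_first_two_words and the sample line are assumptions made for illustration; the pattern itself is the one used in the awk and sed commands.

```python
import re

# Same pattern as above: skip the first field, then capture the next
# two whitespace-separated words.
PATTERN = re.compile(r"\S+\s+(\S+\s+\S+).*")


def append_first_two_words(line):
    # \g<0> is the whole match (the role of & in sed and gensub),
    # \1 is the captured pair of words.
    return PATTERN.sub(r"\g<0>\t\1", line, count=1)


sample = "BC02\tStreptococcus oralis  chromosome, complete genome\t2712"
print(append_first_two_words(sample))
```

As with the awk and sed versions, this relies on the first field containing no whitespace; otherwise the capture group would start inside the first field.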

Answer 4

Using Raku (formerly known as Perl_6)

raku -ne 'print $_, "\t"; .split(/\t/).[1].words.[0..1].put;'

Sample input:

BC02    Streptococcus oralis  chromosome, complete genome   2712    94  0   99.073  2053209 CP023507.1  1597
BC02    Staphylococcus aureus  chromosome, complete genome  2712    94  0   99.073  2053209 CP023507.1  1597
BC02    Streptococcus sp.  chromosome, complete genome  2712    94  0   99.073  2053209 CP023507.1  1597

Breaking the above code into three parts:

1). Split on tabs and take the second element (remember, Raku starts numbering from zero):

raku -ne '.split(/\t/).[1].put;'

This gives the sample output:

Streptococcus oralis  chromosome, complete genome
Staphylococcus aureus  chromosome, complete genome
Streptococcus sp.  chromosome, complete genome

2). Break on whitespace using words and take the first two (2):

raku -ne '.split(/\t/).[1].words.[0..1].put;'

This gives the sample output:

Streptococcus oralis
Staphylococcus aureus
Streptococcus sp.

3). Combine the above with the entire original line by first printing the Raku topic variable $_ (followed by a \t):

raku -ne 'print $_, "\t"; .split(/\t/).[1].words.[0..1].put;'

This gives the sample output:

BC02    Streptococcus oralis  chromosome, complete genome   2712    94  0   99.073  2053209 CP023507.1  1597    Streptococcus oralis
BC02    Staphylococcus aureus  chromosome, complete genome  2712    94  0   99.073  2053209 CP023507.1  1597    Staphylococcus aureus
BC02    Streptococcus sp.  chromosome, complete genome  2712    94  0   99.073  2053209 CP023507.1  1597    Streptococcus sp.
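
For comparison, the three steps above can be sketched in Python as follows. This is only an illustration of the same logic, not a translation of Raku semantics in general; the shortened sample line and the variable names are assumptions.

```python
# Hypothetical, shortened input line (tab-separated).
line = "BC02\tStreptococcus oralis  chromosome, complete genome\t2712\t94"

# Step 1: split on tabs and take the second element (index 1).
second = line.split("\t")[1]

# Step 2: break on whitespace and keep the first two words.
first_two = second.split()[:2]

# Step 3: print the original line, a tab, then the two words joined by
# a space (Raku's put also separates list elements with a space).
result = line + "\t" + " ".join(first_two)
print(result)
```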

https://raku.org/
