Parsing files in a directory

Answer

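Here is one way to do it. First, read the FASTA file and build an array keyed by gene name. The value for each key is the header line and the line that follows it, joined by a newline. Then, for each CSV row, look up the partner gene and print its stored record.

The output is saved to match*.txt files, one per CSV file (match1.txt, match2.txt, ...).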

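awk -F '|' '
  # At the start of each file, detect its type: FS is "|" only while
  # reading the FASTA file (it is switched to "," for the CSV files)
  FNR==1 { inCsv = !(inFasta = FS == "|") }

  # FASTA header: keep the whole line, extract the gene name
  # (">gene88 " -> "gene88") and remember the number of the next line
  inFasta && /^>/ {
    t=$0; gene=$1
    gsub(/^.|[[:space:]]*$/, "", gene)
    nxtln=NR+1
  }
  # The line after the header is the sequence: store header + sequence
  inFasta && NR==nxtln { a[gene] = t ORS $0 }

  # CSV file: on its first line, close the previous output file and
  # open a new one, even if that first line has 3 fields or fewer
  inCsv && FNR==1 {
    close(outf)
    outf = "match" (++k) ".txt"
  }
  # If field 1 equals field 3, the partner gene is field 4, otherwise
  # field 3; print that gene's FASTA record to the current output file
  inCsv && NF>3 { print a[$($1==$3?4:3)] > outf }

' file_B.fasta FS=, file*.txt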

$ cat match1.txt

>gene88 | shahid | ahifehhuh
TAGTCTTTCAAAAGA...
>gene67 | vdiic | behej
GTCAGTTTTTA...
>gene95 | siis | ahifehhniniuh
TAGTCTTTCAAAAGA..
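The script above keeps only the one line that follows each header (NR==nxtln), so it assumes every sequence sits on a single line. FASTA files often wrap long sequences. Below is a hypothetical variant, a sketch rather than the answerer's code, that appends every line up to the next header. The sample files are made up, and the CSV layout (gene,description,geneX,geneY, where one gene column repeats column 1) is inferred from the lookup $($1==$3?4:3), not taken from the question.

```shell
# Work in a scratch directory so the demo files do not overwrite real data.
cd "$(mktemp -d)" || exit 1

# Hypothetical FASTA with one sequence wrapped over two lines.
cat > file_B.fasta <<'EOF'
>gene88 | shahid | ahifehhuh
TAGTCTTTCA
AAAGA
>gene67 | vdiic | behej
GTCAGTTTTTA
EOF

# Hypothetical CSV rows: gene,description,geneX,geneY (assumed layout).
cat > file1.txt <<'EOF'
gene1,description1,gene1,gene88
gene56,description2,gene67,gene56
EOF

awk -F '|' '
  # FS is "|" only while reading the FASTA file
  FNR == 1 { inFasta = (FS == "|") }

  # Header: start a new record keyed by the bare gene name
  inFasta && /^>/ {
    gene = $1
    sub(/^>/, "", gene)
    sub(/[ \t]+$/, "", gene)
    a[gene] = $0
    next
  }
  # Any other FASTA line is sequence: append it to the current record
  inFasta { a[gene] = a[gene] ORS $0; next }

  # First line of each CSV file: switch to a new output file
  FNR == 1 { if (outf != "") close(outf); outf = "match" (++k) ".txt" }

  # Print the partner gene record: field 4 if field 1 equals field 3, else field 3
  NF > 3 { print a[$($1 == $3 ? 4 : 3)] > outf }
' file_B.fasta FS=, file1.txt

cat match1.txt
```

Two sub() calls with [ \t] stand in for the single gsub() with [[:space:]] as a portability choice, since some older mawk builds do not support POSIX character classes.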

Answer

awk '{if($1 == $3) {print $1,$2,$NF}else{if($1 == $NF){print $1,$2,$3}}}' filename

Output

gene1 description1 gene88
gene56 description2 gene67
gene6 description3 gene95
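The input file is not shown, so here is a run on a hypothetical whitespace-separated file laid out as the output above suggests: gene, description, and two gene columns, one of which repeats column 1. The program is the one-liner rewritten as pattern-action rules with next, which behaves the same as the nested if/else. If the real files are comma-separated, as the FS=, in the first answer implies, add -F, to the awk command.

```shell
# Work in a scratch directory so the demo file does not overwrite real data.
cd "$(mktemp -d)" || exit 1

# Hypothetical input: gene, description, and two gene columns,
# one of which repeats column 1 (assumed layout).
cat > pairs.txt <<'EOF'
gene1 description1 gene1 gene88
gene56 description2 gene67 gene56
gene6 description3 gene6 gene95
EOF

# If column 3 repeats column 1, the partner is the last column;
# otherwise, if the last column repeats column 1, the partner is column 3.
result=$(awk '
  $1 == $3  { print $1, $2, $NF; next }
  $1 == $NF { print $1, $2, $3 }
' pairs.txt)
printf '%s\n' "$result"
```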
