Finding the number of times a specific DNA sequence appears in a file

Question 1

cat dna_textfile 
aaccgtttgtaaccggaac 

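#!/bin/bash
dna_file=/path/to/dna_textfiles
# Prompt in red, then reset the colour so later output is unaffected.
printf '\e[31mNucleotide sequence?: \e[0m'
# Read at most 3 characters; keep asking until the input is non-empty.
read -r -e -n 3 userInput
while [[ -z "${userInput}" ]]
do
    read -r -e -n 3 userInput
done

# grep -o prints each match on its own line; wc -l counts those lines.
count=$(grep -o "${userInput}" "${dna_file}" | wc -l)

echo "${userInput}, ${count}"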

Output:

 ttt, 1

#!/bin/bash
# First argument: DNA file; second argument: base sequence to search for.

dna_file=$1
base=$2

count=$(grep -o -- "${base}" "${dna_file}" | wc -l)

echo "${base}, ${count}"

Output:

$ ./countmatches dnafile ttt
ttt, 1

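In reply to @Kusalananda's comment

The solutions above count non-overlapping occurrences in the string. For example, the string "acacaca" contains 2 non-overlapping occurrences of "aca" but 3 overlapping ones. To count overlapping occurrences: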

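#!/bin/bash
# First argument: the sequence to search in; second argument: the base sequence to count.
sequence=$1
base=$2
count=0
# Last start position at which base can still fit inside sequence.
diff_sequence_base=$(( ${#sequence} - ${#base} ))

for ((i = 0; i <= diff_sequence_base; i++)); do
    # Compare the substring starting at position i with base.
    [ "${sequence:i:${#base}}" = "$base" ] && ((count++))
done
echo "$base, $count"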


$ ./countmatches acacaca aca
aca, 3


$ ./countmatches aaccgtttttaaccggaac ttt
ttt, 3


Question 2

Matching the sequence ttt and reporting the number of matches is easy:

$ echo 'aaccgtttgtaaccggaac' | grep -o 'ttt' | wc -l

Or, if the sequence is in a file:

$ echo 'aaccgtttgtaaccggaac'>dnafile
$ grep -o 'ttt' dnafile | wc -l
1

$ grep -o 'aac' dnafile | wc -l
3

So all you need to do is put this idea into a bash script:

#!/bin/bash
dnafile=${1-./dnafile}                   # Name of the file to read (arg 1).
shift                                    # Drop arg 1.

for pat; do                              # Process all remaining command-line arguments.
    printf '%s ' "$pat"                  # Print the pattern used.
    grep -o "$pat" "$dnafile" | wc -l    # Count the matches.
done

Call the script like this (after making it executable with chmod u+x countmatches):

$ ./countmatches dnafile ttt aac ccgtttg ag
ttt 1
aac 3
ccgtttg 1
ag 0
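
The script pipes grep -o into wc -l rather than using grep -c because grep -c counts matching lines, not individual matches. A minimal sketch of the difference, using a temporary file created with mktemp (the variable names tmpfile, lines and matches are only for this example):

```bash
#!/bin/bash
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
echo 'aaccgtttgtaaccggaac' > "$tmpfile"

# Number of lines that contain at least one match.
lines=$(grep -c 'aac' "$tmpfile")
# Number of individual (non-overlapping) matches.
matches=$(grep -o 'aac' "$tmpfile" | wc -l | tr -d ' ')

echo "grep -c: $lines"             # prints "grep -c: 1"
echo "grep -o | wc -l: $matches"   # prints "grep -o | wc -l: 3"
```

With one sequence per line, grep -c can never report more than 1 for that line, however often the pattern occurs in it.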


Question 3

For non-overlapping bases in the lines of a file, e.g.

aaccgtttgtaaccggaac 
acacaca

, try

awk '{print gsub (base, "&")}' base="ttt" file
1
0

For overlapping ones, try:

awk '{while (0 < (T = index($0, base))) {CNT++; $0 = substr($0, T+1)}; print CNT+0; T = CNT = 0}' base="aca" file
0
3

If you need a count per file rather than per line, sum up the CNTs and print the total in the END section.
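
The per-file variant could look like the following sketch. The names TOTAL, line, file and total, and the sample file created with mktemp, are assumptions made for this example; the loop is the same overlapping search as in the one-liner, but it works on a copy of the line instead of modifying $0.

```bash
#!/bin/bash
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' 'aaccgtttgtaaccggaac' 'acacaca' > "$file"

# Add every overlapping match to a running total and print it once, in END.
total=$(awk '
    {
        line = $0
        while ((T = index(line, base)) > 0) {
            TOTAL++
            line = substr(line, T + 1)   # continue one character past the match start
        }
    }
    END { print TOTAL + 0 }              # + 0 prints 0 instead of an empty string
' base="aca" "$file")

echo "aca, $total"   # prints "aca, 3"
```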

