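I need to parse drug names from Medline abstracts. I tried to do this by taking the grep -wf output and then using paste to combine it with the grep -owf output, but the two outputs do not line up, because grep -owf produces one line of output per match, even when several matches are on the same line.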
Pattern file:
DrugA
DrugB
DrugC
DrugD
File to parse:
In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.
In our study, DrugC was found to be effective
In our study, DrugX was found to be effective
Desired output:
DrugA In our study, DrugA and DrugB were found to be effective. DrugA was more effective.
DrugB In our study, DrugA and DrugB were found to be effective. DrugA was more effective.
DrugC In our study, DrugC was found to be effective
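The mismatch described above can be reproduced with a small sketch. It assumes a grep that supports -o (GNU and BSD grep both do); the function name count_mismatch and the temporary file names are made up for illustration. It rebuilds the sample files and counts the lines each grep invocation prints.

```shell
# Hypothetical helper: count what each grep variant prints for the sample data.
count_mismatch() {
    dir=$(mktemp -d)
    printf '%s\n' DrugA DrugB DrugC DrugD > "$dir/patterns"
    printf '%s\n' \
        'In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.' \
        'In our study, DrugC was found to be effective' \
        'In our study, DrugX was found to be effective' > "$dir/input"
    # -w: whole-word matches only; -f: read the patterns from a file.
    lines=$(grep -wf "$dir/patterns" "$dir/input" | wc -l)
    # -o: print every match on its own output line.
    matches=$(grep -owf "$dir/patterns" "$dir/input" | wc -l)
    rm -rf "$dir"
    # $((...)) strips the padding that some wc implementations add.
    echo "matching lines: $((lines)), individual matches: $((matches))"
}

count_mismatch   # prints: matching lines: 2, individual matches: 5
```

Because the first abstract alone contributes four matches, paste has no way to pair the five names with the two matching lines.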
Answer 1
Perhaps an awk approach?
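awk '
    # First file: store each pattern and track the longest one.
    NR == FNR {
        a[$0] = 1
        n = length($0)
        w = n > w ? n : w
        next
    }
    # Second file: print each matching pattern, padded to width w, then the line.
    {
        for (i in a)
            if ($0 ~ i)
                printf "%-*s %s\n", w, i, $0
    }
' pattern_file.txt data_file.txt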
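Note that $0 ~ i matches the pattern anywhere in the line, so DrugA would also match inside a longer word such as DrugAB, unlike grep -w. A minimal sketch of a whole-word variant follows. It assumes drug names contain only letters, digits and underscores and no regex metacharacters. The function name tag_lines is hypothetical, and a tab is used as the separator instead of padding.

```shell
# Hypothetical helper: tag_lines PATTERN_FILE DATA_FILE
# Prints "pattern<TAB>line" for every pattern found as a whole word.
tag_lines() {
    awk '
        # First file: collect the non-empty patterns.
        NR == FNR { if ($0 != "") pats[$0] = 1; next }
        {
            for (p in pats) {
                # Require a non-word character (or a line edge) on both sides.
                re = "(^|[^A-Za-z0-9_])" p "([^A-Za-z0-9_]|$)"
                if ($0 ~ re)
                    printf "%s\t%s\n", p, $0
            }
        }
    ' "$1" "$2"
}
```

It would be called as tag_lines pattern_file.txt data_file.txt. The order in which awk visits the patterns in for (p in pats) is unspecified, so pipe the result through sort if a stable order matters.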
Answer 2
Strictly speaking this is not grep alone, but the following works:
while IFS= read -r pattern; do
    grep "$pattern" input | awk -v drug="$pattern" 'BEGIN {OFS="\t"} { print drug,$0}'
done < "patterns"
Answer 3
A sed one-liner:
sed 's|.*|/&/{h;s/^/&\\t/p;g}|' pattern_file | sed -nf - input
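To see what this does, it can help to look at the script that the first sed writes for the second one. The sketch below runs only that first stage on two sample patterns; the function name gen_script is made up for illustration. Each generated rule saves the matching line in the hold space (h), prefixes it with the drug name and \t and prints it (s/^/…/p), then restores the original line (g) so the next rule can test it. Interpreting \t as a tab in the replacement is assumed to be GNU sed behaviour.

```shell
# Hypothetical helper: turn each pattern on stdin into one sed rule.
gen_script() {
    sed 's|.*|/&/{h;s/^/&\\t/p;g}|'
}

printf 'DrugA\nDrugB\n' | gen_script
# Output:
# /DrugA/{h;s/^/DrugA\t/p;g}
# /DrugB/{h;s/^/DrugB\t/p;g}
```

As with the other answers, the generated addresses are plain regular expressions, so they match substrings rather than whole words.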