Finding English words in a file via the terminal

Question 1

With GNU grep, you can use the following options:

grep --only-matching --ignore-case --fixed-strings --file /usr/share/dict/british-english-insane /path/to/file.txt

This prints each matched string on its own line. /usr/share/dict/british-english-insane is a word list provided by the Debian package wbritish-insane.
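
Note that --fixed-strings on its own matches list entries anywhere in a line, so a short entry can also match inside a longer word. The following is a minimal sketch, assuming GNU grep, that adds --word-regexp (an addition, not part of the command above) and removes repeated hits. The function name find_words and its two arguments are hypothetical.

```shell
# Sketch: print each distinct whole-word match, one per line.
# $1 = word list (one entry per line), $2 = text file to search.
find_words() {
    grep --only-matching --ignore-case --word-regexp \
         --fixed-strings --file "$1" -- "$2" |
      LC_ALL=C sort -u
}

# Usage (paths are placeholders):
#   find_words /usr/share/dict/british-english-insane /path/to/file.txt
```

Matches are printed as they appear in the file, so "Cat" and "cat" count as two distinct results here.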

Question 2

This is interesting!

file=/usr/share/licenses/common/GPL3/license.txt
dict=/usr/share/dict/cracklib-small

while IFS= read -r word; do
    # -q: no output, stop at the first match; \< and \> anchor whole words
    grep -qi -- "\<$word\>" "$file" &&
        printf 'Word "%s" found in GPLv3...\n' "$word"
done < "$dict"

Output:

Word "a" found in GPLv3...
Word "ability" found in GPLv3...
Word "about" found in GPLv3...
(...)

The cracklib-small file is included in the cracklib package: http://sourceforge.net/projects/cracklib

Question 3

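grep-based solutions are often very slow, especially with large word lists.

You can take advantage of the fact that the word lists are already sorted (on my system, at least the British English one appears to be sorted in the POSIX/C locale, even though it is encoded in UTF-8).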

tr -cs "[:alpha:]'" '[\n*]' < /etc/passwd |
  LC_ALL=C sort -u |
  LC_ALL=C comm -12 - /usr/share/dict/british-english-insane

To match words case-insensitively, convert everything to lowercase (or uppercase) beforehand.
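
As a rough illustration of that case-folding step, here is a minimal sketch that lowercases both the text and the word list before comparing them. The function name common_words_ci is hypothetical, and re-sorting the lowercased list into a temporary file is an assumption of this sketch, needed because folding case can change the sort order that comm relies on.

```shell
# Sketch: case-insensitive intersection of a text file and a word list.
# $1 = text file, $2 = word list (one entry per line).
common_words_ci() {
    tmp=$(mktemp) || return 1
    # Lowercase the word list and re-sort it in the C locale.
    tr '[:upper:]' '[:lower:]' < "$2" | LC_ALL=C sort -u > "$tmp"
    # Split the text into words, lowercase them, and keep entries on both sides.
    tr -cs "[:alpha:]'" '[\n*]' < "$1" |
      tr '[:upper:]' '[:lower:]' |
      LC_ALL=C sort -u |
      LC_ALL=C comm -12 - "$tmp"
    rm -f "$tmp"
}

# Usage (paths are placeholders):
#   common_words_ci /etc/passwd /usr/share/dict/british-english-insane
```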

Question 4

file=/usr/lib/python2.6/LICENSE.txt
dict=/usr/share/dict/british-english-huge   # or any suitable list

sort "$dict" \
     <(sed "s/[].,\"?!;:#$%&()*+<>=@\^_{}|~[]\+/\n/g   # keep ' for now
            s|[-/[[:digit:][:blank:][:cntrl:]]\+|\n|g
            s/\<'\+/\n/; s/'\>\+/\n/                   # remove '
           " <(<"$file" tr '[:upper:]' '[:lower:]') ) |
uniq -c | awk '$1 > +1 {print $2}'

This found 382 words (case-insensitively) in the following time:

real   0m1.723s
user   0m1.872s
sys    0m0.048s
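
Because the pipeline above counts duplicate lines after merging both inputs, a word that occurs more than once in the file would presumably also be reported even when the list does not contain it. A minimal sketch of a variant that deduplicates each input first and then keeps only lines present in both follows. The function name words_in_both is hypothetical, and the tokenization is deliberately simplified (anything that is not a letter separates words, so list entries containing apostrophes will not match).

```shell
# Sketch: report words that occur in both the text and the word list.
# $1 = text file, $2 = word list (one entry per line).
words_in_both() {
    {
        # Each side is lowercased and deduplicated on its own ...
        tr '[:upper:]' '[:lower:]' < "$2" | LC_ALL=C sort -u
        tr '[:upper:]' '[:lower:]' < "$1" |
          tr -cs '[:alpha:]' '[\n*]' |
          LC_ALL=C sort -u
    } |
      # ... so a line that now appears twice must come from both inputs.
      LC_ALL=C sort | uniq -d
}

# Usage (paths are placeholders):
#   words_in_both /usr/lib/python2.6/LICENSE.txt /usr/share/dict/british-english-huge
```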
