How do I find the N most common words in a file and handle words hyphenated across lines?

Answer 1

This should do the trick:

sed ':1;/-$/{N;b1};s/-\n//g;y/ /\n/' file | sort | uniq -c
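
The pipeline above only counts the words. To get the N most common ones, the counts can be sorted in descending order and cut off with head. Here is a minimal sketch, not part of the original answer: the function name top_words and the sample input are made up for illustration, and the sed script is the same one split into separate -e expressions so that it should also run on a sed that does not accept the one-line form.

```shell
#!/bin/sh
# top_words FILE N: print the N most frequent words as "word:count".
# The function name and argument order are hypothetical.
top_words() {
    # Join lines ending in "-", drop the "-\n", then put one word per line.
    sed -e ':1' -e '/-$/{' -e 'N' -e 'b1' -e '}' -e 's/-\n//g' -e 'y/ /\n/' "$1" |
        sort | uniq -c |
        sort -k1,1nr -k2 |          # highest count first, ties by word
        head -n "$2" |
        awk '{print $2 ":" $1}'     # reformat "count word" as "word:count"
}

# Hypothetical sample input; the original file is not shown in the answer.
sample=$(mktemp)
printf '%s\n' 'hello world test hel-' 'lo world wor-' 'ds test words helo' > "$sample"

top_words "$sample" 3
# hello:2
# test:2
# words:2

rm -f "$sample"
```

Ties are broken alphabetically here only to make the output reproducible. Any other tie-break would be just as valid.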

Answer 2

Perl is handy for this. The -0777 switch slurps the entire file into a single string.

perl -0777 -ne '
   s/-\n//g;                  # join the hyphenated words
   $count{$_}++ for split;    # count all the words
   while (($k,$v) = each %count) {print "$k:$v\n"}
' file

world:2
helo:1
hello:2
words:2
test:2

The output is in no particular order.

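Here is something more obscure: Tcl. tclsh has no -e option like other languages, so a one-liner takes a bit more work. This approach has the advantage of preserving the order in which the words appear in the file.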

echo '
    set fh [open [lindex $argv 1] r]
    set data [read -nonewline $fh]
    close $fh
    foreach word [split [string map {"-\n" ""} $data]] {
        dict incr count $word
    }
    dict for {k v} $count {puts "$k:$v"}
' | tclsh -- file

hello:2
world:2
test:2
helo:1
words:2

Answer 3

Use a tr + sed + datamash pipeline.

$ tr ' ' '\n' <file | sed '/-/N;s/-\n//' | datamash -s -g1 --output-delimiter=':' count 1
hello:2
helo:1
test:2
words:2
world:2
