Capitalizing multiple letters within a single word

Question

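I wrote the following program in TXR Lisp to solve this problem. It relies on the dictionary file /usr/share/dict/words being available and adds some words of its own, namely common abbreviations used in identifiers in computer programs. I found some sources of these and added a few myself.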

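The program builds a trie from the dictionary using TXR Lisp's built-in trie functions. The trie structure is not optimized for large dictionaries and is memory-intensive, and the code is a little clumsy, so it could benefit from some streamlining.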
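
To illustrate what the trie buys us, here is a minimal Python sketch of a trie with a prefix walk. It is not the TXR Lisp trie implementation; the names make_trie, prefixes and END are hypothetical and chosen only for this example.

```python
# Illustrative nested-dict trie; not the TXR Lisp trie implementation.
END = object()  # marker key meaning "a word ends at this node"

def make_trie(words):
    """Build a trie in which each node is a dict keyed by character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def prefixes(trie, word):
    """Return every prefix of `word` that is itself a word in the trie."""
    found = []
    node = trie
    for i, ch in enumerate(word, start=1):
        node = node.get(ch)
        if node is None:      # no dictionary word continues this way
            break
        if END in node:       # a complete word ends here
            found.append(word[:i])
    return found

trie = make_trie(["a", "alb", "album", "albumin", "all"])
print(prefixes(trie, "albumin"))  # ['a', 'alb', 'album', 'albumin']
```

A single walk down the trie finds all dictionary prefixes and stops as soon as no word can continue, which is the operation the segmentation search repeats at every position.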

At the top level, the function camelize returns a list of the camel-case versions of the word that are considered good (usually just one), in order to meet the requirements of the question.

I have left it as just functions, which can be explored using the REPL:

$ txr -i camel.tl
Do not operate heavy equipment or motor vehicles while using TXR.
1> (camelize "shellexecuteex")
("ShellExecuteEx")
2> (camelize "getenvptr")
("GetEnvPtr")
3> (camelize "hownowbrowncow")
("HowNowBrownCow")
4> (camelize "arcanthill")
("ArcAnthill")
5> (camelize "calcmd5digest")
("CalcMd5Digest")
6> (camelize "themother")
("TheMother" "ThemOther")

The core of how it works is the break-word function, which returns a list of all the possible ways of breaking up the word.

The algorithm is recursive and works roughly as follows:

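1. Find all prefixes of the given word that are found in the dictionary. For example, albumin has the prefixes a, alb, album and albumin.
2. If one or more prefixes are found, iterate over them. For each prefix, find the possible ways of breaking the remaining suffix into words, and prepend the prefix to each of those possibilities.
3. If no prefix is found, the word begins with junk characters that are not in the dictionary. In that case, scan successive positions of the word to see whether a dictionary word starts there. For example, given 34albumin, the 3 and 4 are skipped and then a is found. Once the junk has been split off, treat it as a word: similarly to step 2, combine it with the recursively broken-up remainder of the word.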
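
The three steps above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: it uses a plain set of words instead of a trie (so it tests every prefix rather than stopping early), and it marks junk as a ("junk", text) tuple. The function name break_word mirrors the TXR Lisp function, but the Python code is my own paraphrase of it.

```python
def break_word(words, word):
    """Return a list of segmentations of `word`; each segmentation is a list
    whose items are dictionary words (str) or ("junk", text) tuples."""
    if not word:
        return []
    results = []
    # Steps 1 and 2: try every dictionary prefix and recurse on the suffix.
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in words:
            rest = break_word(words, word[i:])
            if rest:
                results.extend([prefix] + r for r in rest)
            else:                      # the prefix is the whole word
                results.append([prefix])
    # Step 3: no prefix matched, so skip leading junk up to the first
    # position at which some dictionary word starts.
    if not results:
        for j in range(1, len(word)):
            if any(word[j:k] in words for k in range(j + 1, len(word) + 1)):
                junk = ("junk", word[:j])
                results.extend([junk] + r for r in break_word(words, word[j:]))
                break
    # Nothing in the word is recognizable: the whole word is junk.
    if not results:
        results.append([("junk", word)])
    return results

words = {"the", "them", "mother", "other", "cat", "dog"}
print(break_word(words, "themother"))  # [['the', 'mother'], ['them', 'other']]
print(break_word(words, "cat3dog"))    # [['cat', ('junk', '3'), 'dog']]
```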

The camelize function takes the set of candidate segmentations from break-word and selects among them as follows:

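1. Each segmentation of the word is assigned a sort key consisting of a pair of numbers: the junk quantity of the segmentation and its length. The junk quantity is the number of characters that are not in the dictionary. The length is the number of elements in the segmentation.
2. The list of segmentations is sorted by the key identified in step 1. A segmentation containing more junk characters is considered strictly worse. If two segmentations contain the same number of junk characters, the one with more pieces is considered worse. For example, themother has two segmentations, the mother and them other. Both have a junk quantity of zero, and both are two elements long, so they are equivalent. The algorithm selects both.
3. After the sort in step 2, the segmentations are grouped by equivalent key and the best group is selected. CamelCase is then generated from this group.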
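
As a rough Python illustration of this selection (again a sketch, not the TXR Lisp code below), assume a segmentation is a list of strings in which junk is marked as a ("junk", text) tuple. The helper names sort_key, best_group and to_camel are hypothetical.

```python
from itertools import groupby

def junk_quantity(segmentation):
    """Total number of characters inside ("junk", text) items."""
    return sum(len(item[1]) for item in segmentation if isinstance(item, tuple))

def sort_key(segmentation):
    # Tuples compare lexicographically: junk count first, then piece count.
    return (junk_quantity(segmentation), len(segmentation))

def best_group(segmentations):
    """Return all segmentations that share the smallest sort key."""
    ordered = sorted(segmentations, key=sort_key)
    for _, group in groupby(ordered, key=sort_key):
        return list(group)   # only the first (best) group is needed
    return []

def to_camel(segmentation):
    """Upper-case the first character of each piece, junk included, and join."""
    texts = [item[1] if isinstance(item, tuple) else item for item in segmentation]
    return "".join(t[:1].upper() + t[1:] for t in texts)

candidates = [
    ["the", "mother"],
    ["them", "other"],
    ["the", "mot", "her"],
    ["them", ("junk", "o"), "the", ("junk", "r")],
]
print([to_camel(s) for s in best_group(candidates)])  # ['TheMother', 'ThemOther']
```

Because the key is a pair, any amount of extra junk outweighs any difference in the number of pieces, which is what "strictly worse" means in step 2.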

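Junk is identified within a segmentation by a list annotation of the form (:junk "grf"). For example, cat3dog produces the segmentation ("cat" (:junk "3") "dog"). The junk-quantity function and the code in camelize use some structural pattern matching to deal with this.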

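Long inputs take time. Compiling the code would speed it up. The break-word function would also benefit from memoization: it computes the same suffix segmentations many times, because the recursive search tries different combinations of prefixes that add up to the same overall prefix and therefore leave the same suffix. For example, the following input takes a few seconds: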

1> (camelize "nowisthetimeforallgoodmentocometotheaidoftheircountry")
("NoWistHeTimeForAllGoodMenToComeToTheAidOfTheirCountry" 
 "NoWistHeTimeForAllGoodMenToComeToTheAidOftHeirCountry"
 "NoWistHeTimeForAllGoodMenToComeTotHeAidOfTheirCountry"
 "NoWistHeTimeForAllGoodMenToComeTotHeAidOftHeirCountry"
 "NoWistHeTimeForaLlGoodMenToComeToTheAidOfTheirCountry"
 "NoWistHeTimeForaLlGoodMenToComeToTheAidOftHeirCountry"
 "NoWistHeTimeForaLlGoodMenToComeTotHeAidOfTheirCountry"
 "NoWistHeTimeForaLlGoodMenToComeTotHeAidOftHeirCountry"
 "NowIsTheTimeForAllGoodMenToComeToTheAidOfTheirCountry"
 "NowIsTheTimeForAllGoodMenToComeToTheAidOftHeirCountry"
 "NowIsTheTimeForAllGoodMenToComeTotHeAidOfTheirCountry"
 "NowIsTheTimeForAllGoodMenToComeTotHeAidOftHeirCountry"
 "NowIsTheTimeForaLlGoodMenToComeToTheAidOfTheirCountry"
 "NowIsTheTimeForaLlGoodMenToComeToTheAidOftHeirCountry"
 "NowIsTheTimeForaLlGoodMenToComeTotHeAidOfTheirCountry"
 "NowIsTheTimeForaLlGoodMenToComeTotHeAidOftHeirCountry")

(Why are there results like ForaLlGood? Because ll is used as an abbreviation in computing identifiers, and so it is listed as a word.)
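
The memoization mentioned above could look something like the following Python sketch. It is a simplified stand-in, not a port of break-word: junk handling is left out, the dictionary is a small hypothetical set, and results are returned as tuples so that cached values cannot be mutated by callers.

```python
from functools import lru_cache

def make_breaker(words):
    """Return a memoized function listing the segmentations of a string
    into dictionary words (junk handling omitted to keep the sketch short)."""
    @lru_cache(maxsize=None)
    def segmentations(suffix):
        if not suffix:
            return ((),)               # one way to break the empty string
        found = []
        for i in range(1, len(suffix) + 1):
            if suffix[:i] in words:
                # Each distinct suffix is computed once, then served from cache.
                for rest in segmentations(suffix[i:]):
                    found.append((suffix[:i],) + rest)
        return tuple(found)
    return segmentations

words = frozenset(["now", "no", "wist", "is", "he", "the", "time"])
segmentations = make_breaker(words)
for seg in segmentations("nowisthetime"):
    print(seg)
# ('no', 'wist', 'he', 'time')
# ('now', 'is', 'the', 'time')
print(segmentations.cache_info().hits > 0)  # True: the suffix "time" was reused
```

Here no + wist + he and now + is + the both consume the same eight characters, so the suffix time is broken up only once. Note that caching saves the repeated work but does not reduce the number of results, which can still grow quickly on long inputs.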

Now, just the code:

(defvarl %dict% "/usr/share/dict/words")

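;; Build a trie from the dictionary words longer than two characters,
;; plus a hand-picked list of short words and identifier abbreviations.
(defun trie-dict (dict)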
  (let ((trie (make-trie)))
    (each ((word dict))
      (if (> (len word) 2)
        (trie-add trie word t)))
    (each ((word '#"a I ad am an as at ax be by do ex go he hi \
                    id if in is it lo me mi my no of oh on or \
                    ow ox pa pi re so to un up us we abs act \
                    addr alloc alt arg attr app arr auth avg \
                    bat bg bin bool brk btn buf char calc cb \
                    cert cfg ch chr circ clr cmd cmp cnt \
                    concat conf config conn cont conv col coll \
                    com cord coord cos csum ctrl ctx cur cpy \
                    db dbg dec def def del dest dev dev diff \
                    dir dis disp doc drv dsc dt en enc env eq err \
                    expr exch exchg fig fmt fp func ge gen gt hex \
                    hdr hor hw id idx iface img inc info init int \
                    lang lat lib le len ll lon math max mem mcu \
                    mid min misc mng mod msg ne net num obj ord \
                    op os param pic pos posix pred pref prev proc \
                    prof ptr pwr px qry rand rect recv rem res \
                    ret rev req rng rx sem sel seq stat std str \
                    sin sqrt src swp sync temp temp tgl tmp tmr \
                    tran trans ts tx txt unix usr val var vert win \
                    xform xmit xref xtract"))
      (trie-add trie word t))
    trie))

(defvarl %trie% (trie-dict (file-get-lines %dict%)))

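;; Return a list of candidate segmentations of word. Runs of characters
;; not found in the dictionary appear as (:junk "...") elements.
(defun break-word (trie word)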
  (iflet ((lw (len word))
          ((plusp lw)))
    (build
      (let ((i 0)
            (cursor (trie-lookup-begin trie)))
        (whilet ((next (if (< i lw)
                         (trie-lookup-feed-char cursor [word i]))))
          (inc i)
          (set cursor next)
          (if (trie-value-at next)
            (let ((first-word [word 0..i])
                  (rest-words (break-word trie [word i..:])))
              (if rest-words
                (each ((rest-wordlist rest-words))
                  (add ^(,first-word ,*rest-wordlist)))
                (add ^(,first-word))))))
        (unless (get)
          (for ((j 1)) ((and (< j lw) (not (get)))) ((inc j))
            (let ((i j)
                  (cursor (trie-lookup-begin trie)))
              (whilet ((next (if (and (< i lw) (not (get)))
                               (trie-lookup-feed-char cursor [word i]))))
                (inc i)
                (set cursor next)
                (if (trie-value-at next)
                  (let ((junk-word [word 0..j])
                        (rest-words (break-word trie [word j..:])))
                    (each ((rest-wordlist rest-words))
                      (add ^((:junk ,junk-word) ,*rest-wordlist)))))))))
        (unless (get)
          (add ^((:junk ,word))))))))

;; Count the characters of a segmentation that are inside (:junk ...) elements.
(defun junk-quantity (broken-word)
  (let ((char-count 0))
    (each ((word broken-word))
      (if-match (:junk @str) word
        (inc char-count (len str))))
    char-count))

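;; Keep the segmentations with the least junk and then the fewest pieces,
;; and render each of them in CamelCase.
(defun camelize (word)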
  (if (empty word)
    word
    (flow (break-word %trie% word)
      (mapcar [juxt [juxt junk-quantity len] use])
      (sort @1 : first)
      (partition-by first)
      first
      (mapcar second)
      (mapcar
        (opip (mapcar (do match @(or `@{x 1}@y`
                                     (:junk `@{x 1}@y`))
                                @1
                         `@(upcase-str x)@y`))
              cat-str)))))

Answer 1