How can I debug or trace the ordering of "LC_COLLATE=en_US.UTF-8 ls -l"?

2024-6-2

(Context: an unexpected sort order in the en_US.UTF-8 locale; a basic explanation of UTF sorting.)

Consider the following test scenario:

# touch abc{.{g,t,t.t},@.s,-{s-w,t.c}}
# LC_COLLATE="en_US.UTF-8" ls -l test
total 0
-rw-r--r-- 1 root root 0 May  8 08:52 abc.g
-rw-r--r-- 1 root root 0 May  8 08:52 abc@.s
-rw-r--r-- 1 root root 0 May  8 08:52 abc-s-w
-rw-r--r-- 1 root root 0 May  8 08:52 abc.t
-rw-r--r-- 1 root root 0 May  8 08:52 abc-t.c
-rw-r--r-- 1 root root 0 May  8 08:52 abc.t.t

I expected either "all dots" or "all minus signs" to sort first, but the result looks like an interesting mix. The packages used are:

coreutils-8.25-13.7.1.x86_64
glibc-2.22-100.8.1.x86_64
glibc-locale-2.22-100.8.1.x86_64

With LC_COLLATE=POSIX, the result has the "correct" sort order:

# ls -l test
total 0
-rw-r--r-- 1 root root 0 May  8 08:52 abc-s-w
-rw-r--r-- 1 root root 0 May  8 08:52 abc-t.c
-rw-r--r-- 1 root root 0 May  8 08:52 abc.g
-rw-r--r-- 1 root root 0 May  8 08:52 abc.t
-rw-r--r-- 1 root root 0 May  8 08:52 abc.t.t
-rw-r--r-- 1 root root 0 May  8 08:52 abc@.s
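
The POSIX listing above is what a plain byte-value comparison produces: "-" (0x2D) sorts before "." (0x2E), which sorts before "@" (0x40). A minimal sketch in Python (chosen here for illustration, not used in the original post) that reproduces this order without touching the locale; the names "names" and "posix_order" are hypothetical:

```python
# File names produced by the brace expansion in the test scenario.
names = ["abc.g", "abc.t", "abc.t.t", "abc@.s", "abc-s-w", "abc-t.c"]

# Sort by the raw UTF-8 bytes, which is how the POSIX/C locale collates.
posix_order = sorted(names, key=lambda name: name.encode("utf-8"))

for name in posix_order:
    # Show the byte values so the position of '-', '.' and '@' is visible.
    print(f"{name:10} {[hex(b) for b in name.encode('utf-8')]}")
```

Because the comparison is purely byte-wise, every name starting with "abc-" comes before every name starting with "abc.", and "abc@.s" comes last.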

Some details:

# locale -k LC_COLLATE
collate-nrules=0
collate-rulesets=""
collate-symb-hash-sizemb=0
collate-codeset="ANSI_X3.4-1968"
# LC_COLLATE="en_US.UTF-8" locale -k LC_COLLATE
collate-nrules=4
collate-rulesets=""
collate-symb-hash-sizemb=2707
collate-codeset="UTF-8"

Is there a way to "explain" the sorting, such as a verbose trace or debug messages? It does not have to be the ls command; some simple demo code would work too.
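
One possible starting point, offered as a sketch rather than a definitive tracing tool: the C library's strxfrm() turns a string into a sort key whose plain comparison matches strcoll(), so dumping those keys shows what is actually being compared. The Python code below assumes the en_US.UTF-8 locale is installed (it reports when it is not) and uses the hypothetical helpers "explain" and "letters_only". The numeric layout of the keys is an implementation detail of the C library and should not be relied on. The "letters_only" key is an assumption used only to test the idea that punctuation is ignored at the first comparison levels; it is not how the C library computes its keys.

```python
import locale

NAMES = ["abc.g", "abc.t", "abc.t.t", "abc@.s", "abc-s-w", "abc-t.c"]
# Order reported by "LC_COLLATE=en_US.UTF-8 ls -l" in the question.
OBSERVED = ["abc.g", "abc@.s", "abc-s-w", "abc.t", "abc-t.c", "abc.t.t"]


def explain(names, loc):
    """Sort names by their strxfrm() key under locale `loc`.

    Returns a list of (name, key) pairs, or None if the locale is missing.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
    except locale.Error:
        return None
    try:
        rows = [(name, locale.strxfrm(name)) for name in names]
        rows.sort(key=lambda row: row[1])
        return rows
    finally:
        # Always restore the previous collation locale.
        locale.setlocale(locale.LC_COLLATE, saved)


def letters_only(name):
    """Hypothetical stand-in for a first-level key: drop punctuation."""
    return "".join(ch for ch in name if ch.isalnum())


for loc in ("C", "en_US.UTF-8"):
    rows = explain(NAMES, loc)
    if rows is None:
        print(f"{loc}: locale not available on this system")
        continue
    print(f"{loc}:")
    for name, key in rows:
        # Print each key as a list of code points for inspection.
        print(f"  {name:10} {[ord(ch) for ch in key]}")

# Check whether ignoring punctuation reproduces the observed order.
print(sorted(NAMES, key=letters_only) == OBSERVED)
```

For these six names, sorting on the letters alone (abcg, abcs, abcsw, abct, abctc, abctt) does reproduce the observed en_US.UTF-8 listing. That is consistent with punctuation being considered only at a later comparison level, though it does not prove it.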

Is there a safe alternative LC_COLLATE setting for UTF-8 that is a bit "more traditional"? In other words, is it safe to use LC_COLLATE=POSIX?
