How to check the Unicode encoding of a text document

Question

Answer 1

Emacs

C-x = (M-x what-cursor-position) displays code point information for the character at point in the echo area at the bottom of the screen, for example:

Char: И (1048, #o2030, #x418, file ...) point=7 of 8 (75%) column=0

C-u C-x = opens a window with additional information, including the code point, the byte representation, metadata about the Unicode character, and the font used to display it.

             position: 7 of 8 (75%), column: 0
            character: И (displayed as И) (codepoint 1048, #o2030, #x418)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x0418
               script: cyrillic
               syntax: w    which means: word
             category: .:Base, L:Left-to-right (strong), Y:2-byte Cyrillic, c:Chinese, h:Korean, j:Japanese, y:Cyrillic
             to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
          buffer code: #xD0 #x98
            file code: #xD0 #x98 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-DAMA-Ubuntu Mono-normal-normal-normal-*-17-*-*-*-m-0-iso10646-1 (#x2CB)
         Unicode data:
                 Name: CYRILLIC CAPITAL LETTER I
             Category: Letter, Uppercase
      Combining class: Lu
        Bidi category: Lu
             Old name: CYRILLIC CAPITAL LETTER II
            Lowercase: и

Character code properties: customize what to show
  name: CYRILLIC CAPITAL LETTER I
  old-name: CYRILLIC CAPITAL LETTER II
  general-category: Lu (Letter, Uppercase)
  decomposition: (1048) ('И')

[back]
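
Most of these fields can also be reproduced outside Emacs. The following is a minimal sketch in Python, assuming only the standard library's unicodedata module. The helper name describe_char is hypothetical and is not part of Emacs or of any tool mentioned in this answer. It covers the code point, the encoded bytes, and a few Unicode properties, but not font or syntax information.

```python
import unicodedata


def describe_char(ch: str, encoding: str = "utf-8") -> dict:
    """Collect roughly the fields that C-u C-x = reports for one character.

    describe_char is a hypothetical helper written for illustration only.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    cp = ord(ch)
    return {
        "codepoint": cp,
        "octal": f"#o{cp:o}",
        "hex": f"#x{cp:x}",
        # Bytes the character occupies in a file saved with this encoding.
        "file_code": " ".join(f"#x{b:02X}" for b in ch.encode(encoding)),
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "bidi": unicodedata.bidirectional(ch),
        "lowercase": ch.lower(),
    }


if __name__ == "__main__":
    for field, value in describe_char("И").items():
        print(f"{field}: {value}")
```

Passing a different encoding name, such as "utf-16-be", shows how the same character would be stored in a file saved with that encoding.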

Command line

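The unicode utility (available in some distributions such as Debian/Ubuntu/..., or via pip3 install unicode) displays information about one or more Unicode characters. Note that if you copy and paste from an editor, that editor may encode the clipboard differently from the file.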

$ unicode И
U+0418 CYRILLIC CAPITAL LETTER I
UTF-8: d0 98 UTF-16BE: 0418 Decimal: &#1048; Octal: \02030
И (и)
Lowercase: 0438
Category: Lu (Letter, Uppercase)
Bidi: L (Left-to-Right)
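
Because pasting can re-encode text, it may be safer to look at the bytes stored in the file itself. Below is a minimal Python sketch, assuming the file is small enough to read into memory. The names guess_encoding, hex_dump and inspect_file are hypothetical helpers, not part of the unicode utility. The check is only a heuristic: it looks for a byte order mark and otherwise tests whether the bytes are valid UTF-8, so a result of "utf-8" is a hint rather than proof (plain ASCII also passes).

```python
import codecs

# Byte order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes) -> str:
    """Return a plausible encoding label for raw file bytes (heuristic only)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name  # a byte order mark is present
    try:
        data.decode("utf-8")  # strict mode: raises on invalid UTF-8 sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


def hex_dump(data: bytes) -> str:
    """Render bytes in the style of the 'UTF-8: d0 98' line shown above."""
    return " ".join(f"{b:02x}" for b in data)


def inspect_file(path: str, preview: int = 16) -> str:
    """Read a file in binary mode (no implicit decoding) and summarize it."""
    with open(path, "rb") as fh:
        raw = fh.read()
    return f"{guess_encoding(raw)}: {hex_dump(raw[:preview])}"


if __name__ == "__main__":
    sample = "И".encode("utf-8")
    print(guess_encoding(sample), hex_dump(sample))
```

Calling inspect_file on the document in question reports the guessed encoding and the first few bytes, which can then be compared with the UTF-8 and UTF-16BE values that unicode prints for the expected character.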
