What is the complete range of Chinese characters in Unicode?

Date: 2009-09-02 06:13:29

Tags: unicode cjk

U+4E00..U+9FFF is part of the complete set, but not all of it.

5 answers:

Answer 0 (score: 92)

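You can find the complete list in the CJK Unicode FAQ (CJK stands for "Chinese, Japanese, and Korean").

The "East Asian Script" document does mention:

Blocks Containing Han Ideographs

Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2.

Table 12-2. Blocks Containing Han Ideographs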

Block                                   Range       Comment
CJK Unified Ideographs                  4E00-9FFF   Common
CJK Unified Ideographs Extension A      3400-4DBF   Rare
CJK Unified Ideographs Extension B      20000-2A6DF Rare, historic
CJK Unified Ideographs Extension C      2A700-2B73F Rare, historic
CJK Unified Ideographs Extension D      2B740-2B81F Uncommon, some in current use
CJK Unified Ideographs Extension E      2B820-2CEAF Rare, historic
CJK Compatibility Ideographs            F900-FAFF   Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement 2F800-2FA1F Unifiable variants

Note: block ranges can evolve over time. The latest version is in CJK Unified Ideographs.

See also Wikipedia.
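
As a rough illustration of how Table 12-2 can be used in code, the sketch below stores the quoted blocks as data and looks up which block a character falls in. The names `hanBlocks` and `hanBlockOf` are made up for this example, and the bounds are simply the ones quoted above, so they may lag behind newer Unicode versions.

```javascript
// Blocks from Table 12-2 as [name, first, last], bounds inclusive.
const hanBlocks = [
  ['CJK Unified Ideographs', 0x4E00, 0x9FFF],
  ['CJK Unified Ideographs Extension A', 0x3400, 0x4DBF],
  ['CJK Unified Ideographs Extension B', 0x20000, 0x2A6DF],
  ['CJK Unified Ideographs Extension C', 0x2A700, 0x2B73F],
  ['CJK Unified Ideographs Extension D', 0x2B740, 0x2B81F],
  ['CJK Unified Ideographs Extension E', 0x2B820, 0x2CEAF],
  ['CJK Compatibility Ideographs', 0xF900, 0xFAFF],
  ['CJK Compatibility Ideographs Supplement', 0x2F800, 0x2FA1F],
];

// Returns the block name for the first code point of `ch`,
// or null when it is outside every block in the table.
function hanBlockOf(ch) {
  const cp = ch.codePointAt(0);
  const hit = hanBlocks.find(([, first, last]) => cp >= first && cp <= last);
  return hit ? hit[0] : null;
}
```

`codePointAt(0)` is used rather than `charCodeAt(0)` so that characters above U+FFFF, such as those in Extension B, are read as one code point instead of a surrogate half.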

Answer 1 (score: 45)

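Unicode currently has 74605 CJK characters. CJK characters include not only the characters used in Chinese, but also Japanese Kanji, Korean Hanja, and Vietnamese Chu Nom. Some CJK characters are not Chinese characters.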

1) 20941 characters from the CJK Unified Ideographs block.

Code points U+4E00 to U+9FCC.

  1. U+4E00 - U+62FF
  2. U+6300 - U+77FF
  3. U+7800 - U+8CFF
  4. U+8D00 - U+9FCC

2) 6582 characters from the CJKUI Ext A block.

Code points U+3400 to U+4DB5. Unicode 3.0 (1999).

3) 42711 characters from the CJKUI Ext B block.

Code points U+20000 to U+2A6D6. Unicode 3.1 (2001).

  1. U+20000 - U+215FF
  2. U+21600 - U+230FF
  3. U+23100 - U+245FF
  4. U+24600 - U+260FF
  5. U+26100 - U+275FF
  6. U+27600 - U+290FF
  7. U+29100 - U+2A6DF

4) 4149 characters from the CJKUI Ext C block.

Code points U+2A700 to U+2B734. Unicode 5.2 (2009).

5) 222 characters from the CJKUI Ext D block.

Code points U+2B740 to U+2B81D. Unicode 6.0 (2010).

6) CJKUI Ext E block.

Coming soon.

If the above is not tangled enough, take a look at the known issues. Have fun =)
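
The per-block counts in this answer follow directly from the first and last assigned code points it lists. The sketch below recomputes them; `assignedRanges` and `rangeSize` are names chosen for this example, and the bounds are the ones given in this answer, not those of a newer Unicode version.

```javascript
// First and last assigned code points as listed in this answer (inclusive).
const assignedRanges = [
  { name: 'CJK Unified Ideographs', first: 0x4E00, last: 0x9FCC },
  { name: 'CJKUI Ext A', first: 0x3400, last: 0x4DB5 },
  { name: 'CJKUI Ext B', first: 0x20000, last: 0x2A6D6 },
  { name: 'CJKUI Ext C', first: 0x2A700, last: 0x2B734 },
  { name: 'CJKUI Ext D', first: 0x2B740, last: 0x2B81D },
];

// Number of code points in an inclusive range.
function rangeSize(range) {
  return range.last - range.first + 1;
}

// Sum over all listed ranges.
const totalCjk = assignedRanges.reduce((sum, r) => sum + rangeSize(r), 0);

for (const r of assignedRanges) {
  console.log(r.name, rangeSize(r));
}
console.log('total', totalCjk); // total 74605
```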

Answer 2 (score: 3)

Unicode version 11.0.0

In Unicode, the Chinese, Japanese, and Korean (CJK) scripts share a common background, and their characters are collectively known as CJK characters.

The ranges below often contain unassigned or reserved code points (such as U+2E9A and U+2EF4 - U+2EFF).

Chinese characters

bottom  top     reference (see also the wiki page)              block name
4E00    9FEF    http://www.unicode.org/charts/PDF/U4E00.pdf     CJK Unified Ideographs
3400    4DBF    http://www.unicode.org/charts/PDF/U3400.pdf     CJK Unified Ideographs Extension A
20000   2A6DF   http://www.unicode.org/charts/PDF/U20000.pdf    CJK Unified Ideographs Extension B
2A700   2B73F   http://www.unicode.org/charts/PDF/U2A700.pdf    CJK Unified Ideographs Extension C
2B740   2B81F   http://www.unicode.org/charts/PDF/U2B740.pdf    CJK Unified Ideographs Extension D
2B820   2CEAF   http://www.unicode.org/charts/PDF/U2B820.pdf    CJK Unified Ideographs Extension E
2CEB0   2EBEF   https://www.unicode.org/charts/PDF/U2CEB0.pdf   CJK Unified Ideographs Extension F
3007    3007    https://zh.wiktionary.org/wiki/%E3%80%87        in block CJK Symbols and Punctuation
  • In the CJK Unified Ideographs block, many answers use the upper bound 9FCC, but U+9FCD (鿍) is indeed a Chinese character. All characters in this block are Chinese characters (also used in Japanese, Korean, etc.).
  • Most characters in the CJK Unified Ideographs extensions are traditional Chinese characters that are rarely used in China. The exception is Ext F, where only 17% are Chinese characters.
  • 〇 is the Chinese character form of zero and is still in use today.

Therefore the range is

[0x3007,0x3007],[0x3400,0x4DBF],[0x4E00,0x9FEF],[0x20000,0x2EBEF]
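
A minimal sketch of how these ranges could be applied, assuming the goal is to count the Chinese characters in a string. The names `chineseRanges`, `isChineseCodePoint`, and `countChineseChars` are invented for this example, and the last upper bound is taken to be the end of Extension F from the table above.

```javascript
// Inclusive code point ranges from this answer.
// Assumption: the final bound is the end of Extension F (2EBEF) per the table.
const chineseRanges = [
  [0x3007, 0x3007],
  [0x3400, 0x4DBF],
  [0x4E00, 0x9FEF],
  [0x20000, 0x2EBEF],
];

// True when the code point falls inside any listed range.
function isChineseCodePoint(cp) {
  return chineseRanges.some(([first, last]) => cp >= first && cp <= last);
}

// Counts matching characters. for...of walks the string by code point,
// so characters above U+FFFF are not split into surrogate halves.
function countChineseChars(text) {
  let count = 0;
  for (const ch of text) {
    if (isChineseCodePoint(ch.codePointAt(0))) count += 1;
  }
  return count;
}
```

Note that the supplementary-plane range is contiguous, so it also spans the small unassigned gaps between the extension blocks.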

CJK characters never used in Chinese

These are Common Han characters used only for compatibility.

It is almost impossible to see them in any Chinese book, article, or other writing.

Nearly every character here has a corresponding glyph-identical Chinese character. For example, U+F90A and 金 (U+91D1) are identical in glyph.

F900    FAFF    https://www.unicode.org/charts/PDF/UF900.pdf    CJK Compatibility Ideographs
2F800   2FA1F   https://www.unicode.org/charts/PDF/U2F800.pdf   CJK Compatibility Ideographs Supplement
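
One practical consequence is that a compatibility ideograph and its unified twin compare as different strings even though they look the same. A small sketch, assuming a JavaScript runtime with the standard `String.prototype.normalize` method; the variable and function names are illustrative only.

```javascript
// U+F90A is a compatibility ideograph; U+91D1 is the unified ideograph.
const compatJin = '\uF90A';
const unifiedJin = '\u91D1';

// The raw strings differ because the code points differ.
const rawEqual = compatJin === unifiedJin;

// U+F90A has a canonical decomposition to U+91D1,
// so NFC normalization maps it to the unified character.
const nfcEqual = compatJin.normalize('NFC') === unifiedJin;

// Hypothetical helper: normalize text before comparing or range-checking it.
function foldCompatibilityIdeographs(text) {
  return text.normalize('NFC');
}

console.log(rawEqual, nfcEqual); // false true
```

Normalizing first means a range check against the unified blocks also catches text that happens to contain such compatibility forms.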

CJK-related symbols

2E80    2EFF    http://www.unicode.org/charts/PDF/U2E80.pdf     CJK Radicals Supplement
2F00    2FDF    http://www.unicode.org/charts/PDF/U2F00.pdf     Kangxi Radicals
2FF0    2FFF    https://unicode.org/charts/PDF/U2FF0.pdf        Ideographic Description Characters
3000    303F    https://www.unicode.org/charts/PDF/U3000.pdf    CJK Symbols and Punctuation
3100    312F    https://unicode.org/charts/PDF/U3100.pdf        Bopomofo
31A0    31BF    https://unicode.org/charts/PDF/U31A0.pdf        Bopomofo Extended
31C0    31EF    http://www.unicode.org/charts/PDF/U31C0.pdf     CJK Strokes
3200    32FF    https://unicode.org/charts/PDF/U3200.pdf        Enclosed CJK Letters and Months
3300    33FF    https://unicode.org/charts/PDF/U3300.pdf        CJK Compatibility
FE30    FE4F    https://www.unicode.org/charts/PDF/UFE30.pdf    CJK Compatibility Forms
FF00    FFEF    https://www.unicode.org/charts/PDF/UFF00.pdf    Halfwidth and Fullwidth Forms
1F200   1F2FF   https://www.unicode.org/charts/PDF/U1F200.pdf   Enclosed Ideographic Supplement
  • Some blocks, such as Hangul Compatibility Jamo, are omitted because they have no relation to Chinese.
  • Kangxi radicals are not Chinese characters. They are graphical components of Chinese characters and are used specifically to express radicals, e.g. ⼻ (U+2F3B) versus 彳 (U+5F73), and ⻜ (U+2EDC) versus 飞 (U+98DE).
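
The radical/character distinction shows up in code as well. The sketch below, which assumes the standard `String.prototype.normalize` method and uses illustrative names, shows that the Kangxi radical U+2F3B is a different code point from U+5F73 and only maps to it under compatibility normalization (NFKC), not under NFC.

```javascript
// Kangxi radical U+2F3B and the ordinary character U+5F73 look alike.
const radicalStep = '\u2F3B';
const charStep = '\u5F73';

// NFC leaves the radical untouched: its mapping is a compatibility one.
const nfcKeepsRadical = radicalStep.normalize('NFC') === radicalStep;

// NFKC applies compatibility mappings and yields the ordinary character.
const nfkcGivesChar = radicalStep.normalize('NFKC') === charStep;

// Simple range test for the assigned Kangxi radicals (U+2F00 to U+2FD5).
function isKangxiRadical(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0x2F00 && cp <= 0x2FD5;
}

console.log(nfcKeepsRadical, nfkcGivesChar); // true true
```

Not every radical code point has such a mapping, so NFKC should be treated as a partial fold rather than a complete radical-to-character conversion.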

Other common punctuation that appears in Chinese

This is a wide range. Some of this punctuation may never be used, while some, such as …… and ”“, is used heavily in Chinese.

0000    007F    https://unicode.org/charts/PDF/U0000.pdf    C0 Controls and Basic Latin 
2000    206F    https://unicode.org/charts/PDF/U2000.pdf    General Punctuation
……

There are also many Chinese-related symbols, such as Yijing Hexagram Symbols or Kanbun, but they are off-topic here. I list the non-Chinese characters in CJK to better explain what Chinese characters are. The ranges above already cover almost all characters that appear in Chinese writing, except math and other specialty notation.

Supplementary

CJK Symbols and Punctuation

 、。〃〄々〆〇〈〉《》「」『』【】〒〓〔〕〖〗〘〙〚〛〜〝〞〟〠〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬〰〱〲〳〴〵〶〷〸〹〺〻〼〽 〾 〿

Halfwidth and Fullwidth Forms

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⦅⦆。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑ하ᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ¢£¬ ̄¦¥₩│←↑→↓■○

References

  1. https://zh.wikipedia.org/wiki/%E6%B1%89%E5%AD%97 (in Chinese; note the right sidebar)
  2. https://zh.wikipedia.org/wiki/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%9B%B8%E5%AE%B9%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97 (note the table at the bottom)
  3. http://www.unicode.org

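Answer 3 (score: 1)

The Unicode blocks that others have listed certainly cover most Chinese Unicode characters, but take a look at some of these other blocks too.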

CJK_UNIFIED_IDEOGRAPHS
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
CJK_COMPATIBILITY
CJK_COMPATIBILITY_FORMS
CJK_COMPATIBILITY_IDEOGRAPHS
CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT
CJK_RADICALS_SUPPLEMENT
CJK_STROKES
CJK_SYMBOLS_AND_PUNCTUATION
ENCLOSED_CJK_LETTERS_AND_MONTHS
ENCLOSED_IDEOGRAPHIC_SUPPLEMENT
KANGXI_RADICALS
IDEOGRAPHIC_DESCRIPTION_CHARACTERS

See my more comprehensive discussion here. This site is handy for browsing Unicode.
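
As an alternative to listing blocks by hand, JavaScript regular expressions support Unicode property escapes (ES2018 and later). This is a different technique from the block list above, sketched here under the assumption that matching by script is acceptable; `hanPattern` and `containsHan` are names made up for the example.

```javascript
// \p{Script=Han} matches any code point whose Unicode script is Han.
// The `u` flag is required for property escapes.
const hanPattern = /\p{Script=Han}/u;

// True when the text contains at least one Han-script character.
function containsHan(text) {
  return hanPattern.test(text);
}
```

The Han script is broader than the unified ideograph blocks, since it also takes in radicals and compatibility ideographs, and the exact coverage depends on the Unicode version shipped with the JavaScript engine.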

Answer 4 (score: 0)

To sum up, it sounds like these are the ranges:

// Inclusive [first, last] code point ranges, sorted and non-overlapping.
var blocks = [
  [0x2E80, 0x2FD5],   // CJK Radicals Supplement through Kangxi Radicals
  [0x3190, 0x319F],   // Kanbun
  [0x3400, 0x4DBF],   // CJK Unified Ideographs Extension A
  [0x4E00, 0x9FCC],   // CJK Unified Ideographs
  [0xF900, 0xFAAD],   // CJK Compatibility Ideographs
  [0x20000, 0x2A6DF], // CJK Unified Ideographs Extension B
  [0x2A700, 0x2B734], // CJK Unified Ideographs Extension C
  [0x2B740, 0x2B81D]  // CJK Unified Ideographs Extension D
];
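
Range lists gathered from several sources tend to pick up overlapping or adjacent entries. A minimal sketch of collapsing such a list into sorted, non-overlapping ranges follows; `mergeRanges` is a hypothetical helper written for this example, not part of any library.

```javascript
// Merges inclusive [first, last] ranges that overlap or touch.
// Returns a new sorted array; the input is left unmodified.
function mergeRanges(ranges) {
  const sorted = ranges
    .map(([first, last]) => [first, last])
    .sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const merged = [];
  for (const [first, last] of sorted) {
    const prev = merged[merged.length - 1];
    if (prev && first <= prev[1] + 1) {
      // Overlapping or directly adjacent: extend the previous range.
      prev[1] = Math.max(prev[1], last);
    } else {
      merged.push([first, last]);
    }
  }
  return merged;
}

// Example: chart-page sub-ranges plus a duplicate Extension A entry.
const mergedExample = mergeRanges([
  [0x3400, 0x4DB5],
  [0x4E00, 0x62FF],
  [0x6300, 0x77FF],
  [0x3400, 0x4DBF],
]);
```

Here `mergedExample` holds two ranges, one for 0x3400 to 0x4DBF and one for 0x4E00 to 0x77FF. A merged list is shorter to scan and makes membership checks cheaper.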