Question

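From the unicodedata docs:

unicodedata.digit(chr[, default]): Returns the digit value assigned to the character chr as integer. If no such value is defined, default is returned, or, if not given, ValueError is raised.

unicodedata.numeric(chr[, default]): Returns the numeric value assigned to the character chr as float. If no such value is defined, default is returned, or, if not given, ValueError is raised.

Can someone explain the difference between these two functions?

The implementation of both functions can be read here, but from a quick look the difference is not obvious to me, because I am not familiar with the CPython implementation.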

Edit 1:

An example that shows the difference would be nice.

Edit 2:

Examples to complement the comments and the great answer from @user2357112:

import unicodedata

print(unicodedata.digit('1'))  # DIGIT ONE, a decimal digit.
print(unicodedata.digit('١'))  # ARABIC-INDIC DIGIT ONE.
print(unicodedata.digit('¼'))  # Not a digit, so this raises "ValueError: not a digit".

print(unicodedata.numeric('Ⅱ'))  # ROMAN NUMERAL TWO.
print(unicodedata.numeric('¼'))  # VULGAR FRACTION ONE QUARTER.

Answer 1

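Short answer:

If a character represents a decimal digit, so things like 1, ¹ (SUPERSCRIPT ONE), ① (CIRCLED DIGIT ONE), ١ (ARABIC-INDIC DIGIT ONE), unicodedata.digit will return the digit the character represents as an int (so 1 for all of these examples).

If the character represents any numeric value, so things like ⅐ (VULGAR FRACTION ONE SEVENTH) and all of the decimal digit examples, unicodedata.numeric will give that character's numeric value as a float.

For technical reasons, more recent digit characters like 🄌 (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO) may raise a ValueError from unicodedata.digit.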
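A minimal sketch of the three points above, assuming Python 3.5 or later. The variable names are made up for illustration, and the last character is looked up by name so the snippet does not depend on a font that can display it:

```python
import unicodedata

# Decimal digits and other digit-like characters: digit() returns an int.
digit_chars = ["1", "¹", "①", "١"]
digits = [unicodedata.digit(c) for c in digit_chars]
print(digits)

# numeric() accepts the same characters, plus fractions, and returns floats.
numerics = [unicodedata.numeric(c) for c in digit_chars]
seventh = unicodedata.numeric("⅐")
print(numerics, seventh)

# A more recently added digit character: numeric() has a value for it,
# but digit() does not (without the None default it would raise ValueError).
new_zero = unicodedata.lookup("DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO")
new_zero_numeric = unicodedata.numeric(new_zero)
new_zero_digit = unicodedata.digit(new_zero, None)
print(new_zero_numeric, new_zero_digit)
```

Passing a default as the second argument is just a convenient way to probe a character without wrapping every call in try/except.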

Long answer:

Unicode characters all have a Numeric_Type property. This property can have 4 possible values: Numeric_Type=Decimal, Numeric_Type=Digit, Numeric_Type=Numeric, or Numeric_Type=None.

Quoting the Unicode standard, version 10.0.0, section 4.6,

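The Numeric_Type=Decimal property value (which is correlated with the General_Category=Nd property value) is limited to those numeric characters that are used in decimal-radix numbers and for which a full set of digits has been encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range.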

Numeric_Type=Decimal characters are thus decimal digits fitting a few other specific technical requirements.
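One consequence of the contiguous-range requirement can be sketched as follows: for a Numeric_Type=Decimal character, subtracting its value from its code point should land on the zero digit of the same set. The helper name zero_of is made up, and the sketch relies on unicodedata.decimal, a sibling function not mentioned in the question, which only succeeds for Numeric_Type=Decimal characters:

```python
import unicodedata

def zero_of(ch):
    """Return the zero digit of the contiguous range that decimal digit ch belongs to."""
    # unicodedata.decimal() raises ValueError unless ch has Numeric_Type=Decimal.
    return chr(ord(ch) - unicodedata.decimal(ch))

ascii_zero = zero_of("5")
arabic_zero = zero_of("٧")  # ARABIC-INDIC DIGIT SEVEN
print(unicodedata.name(ascii_zero), "/", unicodedata.name(arabic_zero))
```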

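Decimal digits, as defined in the Unicode Standard by these property assignments, exclude some characters, such as the CJK ideographic digits (see the first ten entries in Table 4-5), which are not encoded in a contiguous sequence. Decimal digits also exclude the compatibility subscript and superscript digits to prevent simplistic parsers from misinterpreting their values in context. (For more information on superscripts and subscripts, see Section 22.4, Superscript and Subscript Symbols.) Traditionally, the Unicode Character Database has given these sets of noncontiguous or compatibility digits the value Numeric_Type=Digit, to recognize the fact that they consist of digit values but do not necessarily meet all the criteria for Numeric_Type=Decimal. However, the distinction between Numeric_Type=Digit and the more generic Numeric_Type=Numeric has proven not to be useful in implementations. As a result, future sets of digits which may be added to the standard and which do not meet the criteria for Numeric_Type=Decimal will simply be assigned the value Numeric_Type=Numeric.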

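So Numeric_Type=Digit was historically used for other digits not fitting the technical requirements of Numeric_Type=Decimal, but they decided that wasn't useful, and digit characters not meeting the Numeric_Type=Decimal requirements have simply been assigned Numeric_Type=Numeric since Unicode 6.3.0. For instance, 🄌 (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO), introduced in Unicode 7.0, has Numeric_Type=Numeric.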
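To see how the excluded characters from the quote behave in Python, here is a small sketch using a CJK ideographic digit and a superscript digit. The choice of characters and the variable names are mine; unicodedata.decimal is the sibling function that only accepts Numeric_Type=Decimal characters:

```python
import unicodedata

# "三" is the CJK ideograph for three; "²" is SUPERSCRIPT TWO.
results = {}
for ch in ["三", "²"]:
    results[ch] = (
        unicodedata.decimal(ch, None),  # value only for Numeric_Type=Decimal
        unicodedata.digit(ch, None),    # value for Decimal or Digit
        unicodedata.numeric(ch, None),  # value for any non-None Numeric_Type
    )
    print(unicodedata.name(ch), results[ch])
```

Neither character counts as a decimal digit. The superscript still has a digit value, while the ideograph is only visible to unicodedata.numeric.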

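Numeric_Type=Numeric is for all characters that represent numbers and don't fit in the other categories, and Numeric_Type=None is for characters that don't represent numbers (or at least, don't under normal usage).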

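All characters with a non-None Numeric_Type property have a Numeric_Value property representing their numeric value. unicodedata.digit will return that value as an int for characters with Numeric_Type=Decimal or Numeric_Type=Digit, and unicodedata.numeric will return that value as a float for characters with any non-None Numeric_Type.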
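The mapping above can be turned around to make a rough guess at a character's Numeric_Type from which lookups succeed. This is only a sketch: numeric_type is a hypothetical helper, not part of unicodedata, and it assumes unicodedata.decimal succeeds exactly for Numeric_Type=Decimal characters:

```python
import unicodedata

def numeric_type(ch):
    """Guess the Numeric_Type of ch from which unicodedata lookups succeed."""
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

# An ASCII digit, a superscript digit, a fraction, and a plain letter.
for ch in "5¹¼x":
    print(repr(ch), numeric_type(ch))
```

The checks run from the most restrictive function to the least restrictive one, so each character is reported under the narrowest type that accepts it.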
