Question

I learned from PEP 3131 that non-ASCII identifiers are supported in Python, though using them is not considered best practice.

However, I'm seeing this strange behavior, where my 𝜏 identifier (U+1D70F) seems to be automatically converted to τ (U+03C4).

class Base(object):
    def __init__(self):
        self.𝜏 = 5 # defined with U+1D70F

a = Base()
print(a.𝜏)     # 5             # (U+1D70F)
print(a.τ)     # 5 as well     # (U+03C4) ? another way to access it?
d = a.__dict__ # {'τ':  5}     # (U+03C4) ? seems converted
print(d['τ'])  # 5             # (U+03C4) ? consistent with the conversion
print(d['𝜏'])  # KeyError: '𝜏' # (U+1D70F) ?! unexpected!

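Is this expected behavior? Why does this silent conversion occur? Does it have anything to do with NFKC normalization? I thought that was only for canonically ordering Unicode character sequences...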

Answer 1

Per the documentation on identifiers:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
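To watch the parser do this, here is a small sketch that compiles a one-line source string using U+1D70F as a variable name and then inspects the resulting namespace. The names `source` and `namespace` are just illustrative, and the escape sequences are only there so the example does not depend on how your editor stores the character.

```python
# Source text that uses U+1D70F (mathematical italic small tau) as a name.
source = "\U0001D70F = 5"

namespace = {}
exec(source, namespace)

# exec() also adds __builtins__, so filter the dunder entries out.
names = [name for name in namespace if not name.startswith("__")]

print(names == ["\u03c4"])          # True: stored under U+03C4
print("\U0001D70F" in namespace)    # False: the original spelling is gone
```

The name is already normalized by the time it reaches the namespace dictionary, which is why `__dict__` in the question only ever shows U+03C4.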

Using unicodedata, you can see that U+03C4 is the appropriate result:

>>> import unicodedata
>>> unicodedata.normalize('NFKC', '𝜏')
'τ'
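On the "I thought this was only for canonical ordering" part of the question: that describes the canonical forms (NFC/NFD). The K forms also apply compatibility mappings, and U+1D70F has one pointing at U+03C4. A quick sketch to compare the four forms (variable names are mine):

```python
import unicodedata

math_tau = "\U0001D70F"   # MATHEMATICAL ITALIC SMALL TAU
greek_tau = "\u03c4"      # GREEK SMALL LETTER TAU

# Record, for each normalization form, whether it maps math_tau to greek_tau.
results = {
    form: unicodedata.normalize(form, math_tau) == greek_tau
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(results)

# The compatibility decomposition that the K forms apply.
print(unicodedata.decomposition(math_tau))
```

Only the two compatibility forms change the character; the canonical forms leave it alone.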

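However, this conversion doesn't apply to string literals, like the one you're using as a dictionary key, so it's looking for the unconverted character in a dictionary that only contains the converted character.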

self.𝜏 = 5  # implicitly converted to "self.τ = 5"
a.𝜏  # implicitly converted to "a.τ"
d['𝜏']  # not converted
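If you do need to index `__dict__` with a string that may not be normalized, one option is to normalize the key yourself first. A minimal sketch, where `lookup` is a hypothetical helper (not something Python provides) and `setattr` with the U+03C4 name stands in for what `self.𝜏 = 5` does after parsing:

```python
import unicodedata

class Base(object):
    def __init__(self):
        # Equivalent to `self.𝜏 = 5` in source: the stored name is U+03C4.
        setattr(self, "\u03c4", 5)

def lookup(mapping, key):
    """Hypothetical helper: NFKC-normalize the key before the dict lookup."""
    return mapping[unicodedata.normalize("NFKC", key)]

d = Base().__dict__
value = lookup(d, "\U0001D70F")   # key given as U+1D70F
print(value)                      # 5
```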

You can see similar problems with, e.g., string literals used with getattr:

>>> getattr(a, '𝜏')
Traceback (most recent call last):
  File "python", line 1, in <module>
AttributeError: 'Base' object has no attribute '𝜏'
>>> getattr(a, unicodedata.normalize('NFKC', '𝜏'))
5
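The same caveat runs the other way with setattr: a raw string name is stored as-is, so it can create an attribute that identifier syntax will never reach. A rough sketch, where `nfkc` is just a small helper I'm defining here and `eval` stands in for writing `a.𝜏` in source code:

```python
import unicodedata

def nfkc(name):
    """Normalize a name the same way the parser normalizes identifiers."""
    return unicodedata.normalize("NFKC", name)

class Base(object):
    pass

a = Base()
setattr(a, "\U0001D70F", 1)         # raw string: stored under U+1D70F
setattr(a, nfkc("\U0001D70F"), 2)   # normalized: stored under U+03C4

stored = sorted(a.__dict__)         # two distinct attributes now exist
via_getattr = getattr(a, nfkc("\U0001D70F"))
# Attribute access written in source goes through the parser's NFKC step.
via_source = eval("a.\U0001D70F", {"a": a})

print(len(stored), via_getattr, via_source)   # 2 2 2
```

So if names come from strings (config files, user input, and so on), normalizing them with NFKC before getattr/setattr keeps them consistent with what the parser does.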
