Question

When I want to print Unicode, I usually do this:

print("There are ", end="")
try:
    print(u"\u221E", end="")  # ∞
    unicode_support = True
except UnicodeError:
    print("infinity", end="")
    unicode_support = False
print(" ways to get Unicode wrong.")

if unicode_support:
    print(u"\U0001F440 see you have a Unicode font.")
else:
    print("You do not have Unicode support.")

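If I want to return a Unicode string from a method or something similar, this won't work, because Python always accepts string literals containing Unicode and only raises this error when printing to something without Unicode support. I would like to do something like this: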

import sys as _sys

UNICODE_SUPPORT = _sys.stdout.unicode_support  # hypothetical attribute; this is what I wish existed

def get_heart():
    if UNICODE_SUPPORT:
        return u"\u2665"  # ♥
    return "heart"

print("I{}U".format(get_heart().upper()))

I would like an equivalent of sys.stdout.supports_unicode that is True if the current stdout supports Unicode, and False otherwise.

Answer 1

This is mostly a hack, but something like this might work:

UNICODE_SUPPORT = sys.stdout.encoding in ('UTF-8', 'UTF-16', 'UTF-16LE', 'UTF-16BE', 'UTF-32', 'UTF-32LE', 'UTF-32BE')

Or (thanks to Martijn Pieters):

 UNICODE_SUPPORT = sys.stdout.encoding.lower().startswith('utf')
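Both one-liners above raise an exception if sys.stdout.encoding is None, and they depend on how the encoding name happens to be spelled. A minimal sketch of a more defensive variant follows; the function name stream_supports_unicode is hypothetical, and using codecs.lookup() to normalize the name is my own assumption about a reasonable approach, not something from the original answer.

```python
import codecs
import sys


def stream_supports_unicode(stream=None):
    """Guess whether a text stream uses one of the UTF-* encodings.

    Hypothetical helper: it only inspects the declared encoding name,
    so it says nothing about fonts or what the terminal can render.
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        # No encoding declared (None or empty string): assume no support.
        return False
    try:
        # codecs.lookup() normalizes aliases, e.g. "UTF8" -> "utf-8".
        name = codecs.lookup(encoding).name
    except LookupError:
        return False
    return name.startswith("utf")


UNICODE_SUPPORT = stream_supports_unicode()
```

Like the original one-liners, this is still a heuristic based on the encoding name alone.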

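Simply put, Unicode is a huge list of all the characters used to write the world's languages, including ancient languages and many common and uncommon symbols (U+1F4A9). Each item in that list is called a code point and is identified by a number.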

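UTF-8, UTF-16 and UTF-32 are encodings specifically designed to encode every code point as a sequence of bytes. UTF-8 and UTF-16 are variable-width (1 to 4 bytes and 2 or 4 bytes per code point, respectively), while UTF-32 always uses 4 bytes. UTF-16 and UTF-32 each exist in big-endian and little-endian variants.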
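The byte counts can be checked directly in Python. The sketch below uses a hypothetical helper, encoded_length, and the explicit little-endian codecs so that no byte-order mark is added to the result. Note that UTF-16 needs four bytes (a surrogate pair) for code points outside the Basic Multilingual Plane.

```python
def encoded_length(text, encoding):
    """Return how many bytes `text` occupies in `encoding` (hypothetical helper)."""
    return len(text.encode(encoding))


# "A" (U+0041), the infinity sign (U+221E), and the eyes emoji (U+1F440).
for char in ("A", "\u221e", "\U0001F440"):
    sizes = {
        enc: encoded_length(char, enc)
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }
    # Print only the code point number so the output itself stays ASCII.
    print("U+{:04X}".format(ord(char)), sizes)

# U+0041  -> utf-8: 1, utf-16-le: 2, utf-32-le: 4
# U+221E  -> utf-8: 3, utf-16-le: 2, utf-32-le: 4
# U+1F440 -> utf-8: 4, utf-16-le: 4, utf-32-le: 4
```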

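Unicode is designed to be universal; by definition, any encoding other than the UTF-... family supports only a subset of Unicode. cp1252 and iso-8859-15 are such encodings, supporting (part of) the Latin subset of Unicode.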
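Because non-UTF encodings cover only a subset, another option is to test the specific characters you intend to print rather than the encoding name. This is a sketch under that assumption; can_encode is a hypothetical helper, and get_heart is adapted from the question to take an optional encoding argument.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    if not encoding:
        return False
    try:
        text.encode(encoding)
    except (UnicodeError, LookupError):
        # UnicodeError: character not representable; LookupError: unknown codec.
        return False
    return True


def get_heart(encoding=None):
    """Return a heart symbol if the target encoding can represent it."""
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None)
    return "\u2665" if can_encode("\u2665", encoding) else "heart"
```

For example, can_encode("\u20ac", "iso-8859-15") is True because that encoding includes the euro sign, while can_encode("\u20ac", "latin-1") is False.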

Answer 2

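sys.stdout.encoding is None when no encoding has been set for it, for example when output is redirected to a file without special precautions. In that case print(u'fo\xe0ba') will fail (it tries to use the ascii encoding and fails).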
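The failure can be reproduced without redirecting anything by wrapping an in-memory buffer in an ascii-encoded text stream. The sketch below also shows one possible fallback, which retries with "?" substituted for unencodable characters; write_with_fallback is a hypothetical helper and the fallback policy is my own assumption, not part of the original answer.

```python
import io


def write_with_fallback(stream, text):
    """Write `text`; if the stream cannot encode it, retry with '?' replacements."""
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Treat a missing encoding as ascii, the most conservative choice.
        encoding = getattr(stream, "encoding", None) or "ascii"
        safe = text.encode(encoding, errors="replace").decode(encoding)
        stream.write(safe)


# Simulate a stdout that only understands ascii.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii")

write_with_fallback(ascii_stream, "fo\xe0ba")
ascii_stream.flush()
print(raw.getvalue())  # b'fo?ba'
```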

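Added: note that most encodings are not "universal"; each supports only a subset of Unicode. "Supports Unicode" is one thing; "supports all of Unicode" (a.k.a. "uses a universal encoding") is another.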

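UTF-8 is by far the most popular universal encoding, though you may occasionally run into UTF-16, or even UTF-32 (personally, I have never met the latter "in the wild" :-).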

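By the way, even if a device supports, say, utf-8, that does not imply its font library has the right glyphs to display every code point readably and unambiguously; that is a very different issue.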

Check if stdout supports Unicode?
