Question

Using Python 2.7. As part of a JSON response, an API returns the string:

<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>

I'm using a library which internally does:

six.u(json.dumps(s))

The json.dumps() output is:

'"<a href=\\"https://about.twitter.com/products/tweetdeck\\" rel=\\"nofollow\\">TweetDeck</a>"'

This output can be decoded correctly with json.loads.

But the call to six.u gives:

u'"<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>"'

Trying to decode this string with json.loads raises an error:

ValueError: Extra data: line 1 column 11 - line 1 column 86 (char 10 - 85)

It looks like the call to six.u has unescaped the href value, but I'm not entirely sure how to fix this.

Answer 1

six.u() is meant for unicode string literals, not for JSON output. You should not use it to decode JSON to a Unicode string.

From the six.u() documentation:

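A "fake" unicode literal. text should always be a normal string literal. In Python 2, u() returns unicode, and in Python 3, a string. Also, in Python 2, the string is decoded with the unicode-escape codec, which allows unicode escapes to be used in it.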

Emphasis mine.
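The failure can be reproduced without six installed. The sketch below is an approximation: it assumes that decoding with the unicode-escape codec is a fair stand-in for what six.u() does on Python 2, as the documentation quoted above describes. The variable names are illustrative.

```python
import json

s = '<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>'
json_string = json.dumps(s)

# Assumption: stand-in for six.u() on Python 2, which decodes its
# argument with the unicode-escape codec.
unescaped = json_string.encode('ascii').decode('unicode_escape')

# The \" escapes have been turned into bare quotes, so the first
# JSON string now ends right after href=
print(unescaped)

failed = False
try:
    json.loads(unescaped)
except ValueError as exc:
    failed = True
    print('decoding failed: %s' % exc)
```

The codec treats \" like a Python string-literal escape and drops the backslash, which is why the JSON parser then reports extra data after the first closing quote.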

Instead, decode the string if you are on Python 2:

json_string = json.dumps(s)
if hasattr(json_string, 'decode'):
    # Python 2; decode to a Unicode value
    json_string = json_string.decode('ascii')

Or use the unicode() function and catch the NameError on Python 3:

json_string = json.dumps(s)
try:
    # Python 2; decode to a Unicode value from ASCII
    json_string = unicode(json_string)
except NameError:
    # Python 3, already Unicode
    pass
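Either variant can be wrapped in a small function so the calling code does not need to care which Python version it runs on. This is only a sketch; dumps_text is a hypothetical name, not something six or the json module provides.

```python
import json


def dumps_text(obj):
    """Hypothetical helper: JSON-encode obj and always return text."""
    json_string = json.dumps(obj)
    if isinstance(json_string, bytes):
        # Python 2: json.dumps() returned a byte string (str).
        # With the default ensure_ascii=True it is pure ASCII.
        json_string = json_string.decode('ascii')
    return json_string


s = '<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>'
encoded = dumps_text(s)

# The escaped quotes survive, so the value round-trips.
roundtrip = json.loads(encoded)
```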

Or set ensure_ascii to False when calling json.dumps():

json_string = json.dumps(s, ensure_ascii=False)

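Note, however, that this can still return a str in Python 2, but only when the input contains nothing but ASCII data, so the output can be safely mixed with unicode values.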

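Either way you get consistent values between Python 2 and Python 3. The six.u() decoding also turns \uhhhh JSON Unicode escape sequences into Unicode code points, while the Python 3 JSON result leaves those sequences intact. With explicit decoding you keep the \uhhhh sequences in both Python 2 and 3; with ensure_ascii=False you get Unicode code points in both.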
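The difference between the two outputs only shows up with non-ASCII input. A minimal sketch, using an arbitrary sample string that is not taken from the question:

```python
import json

s = u'caf\u00e9'  # sample non-ASCII input (an assumption for illustration)

# Default settings: non-ASCII characters are written as \uhhhh escapes,
# so the JSON text itself is pure ASCII.
escaped = json.dumps(s)
if hasattr(escaped, 'decode'):
    escaped = escaped.decode('ascii')  # Python 2 only

# ensure_ascii=False: the character itself appears in the JSON text.
literal = json.dumps(s, ensure_ascii=False)

print(repr(escaped))
print(repr(literal))

# Both forms decode back to the same value.
same_value = json.loads(escaped) == json.loads(literal) == s
```

Both results are valid JSON for the same string; they differ only in how the non-ASCII character is spelled in the JSON text.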

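Since this is a third-party library, I filed a bug report. You cannot really recover from this error: inserting extra backslashes up front and removing them afterwards does not work, because you cannot distinguish them from normal backslashes.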
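One way to see that the damage is not limited to parse errors: some values come through as valid JSON with a different meaning. The sketch below again assumes that unicode-escape decoding approximates six.u() on Python 2; the input is a made-up example.

```python
import json

original = '\\n'  # two characters: a backslash followed by the letter n

json_string = json.dumps(original)  # JSON text with a doubled backslash

# Assumption: stand-in for six.u() on Python 2 (unicode-escape decoding).
mangled = json_string.encode('ascii').decode('unicode_escape')

# The mangled text still parses, but the doubled backslash has collapsed
# into a single one, so the JSON parser now reads \n as a newline escape.
result = json.loads(mangled)
print('%r -> %r' % (original, result))
```

No exception is raised here, so the caller has no signal that the value changed.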
