Question

From reading various posts, it seems that JavaScript's unescape() is equivalent to Python's urllib.unquote(), but when I test both I get different results:

In the browser console:

unescape('%u003c%u0062%u0072%u003e');

Output: <br>

In the Python interpreter:

import urllib
urllib.unquote('%u003c%u0062%u0072%u003e')

Output: %u003c%u0062%u0072%u003e

I would expect Python to return <br> as well. Any ideas about what I'm missing here?

Thanks!

Answer 1

%uxxxx is a non-standard URL encoding scheme that is not supported by urllib.parse.unquote() (Py 3) / urllib.unquote() (Py 2).

It was only ever part of ECMAScript (ECMA-262, 3rd edition); the format was rejected by the W3C and never became part of an RFC.
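As a quick illustration of the point above, the following Python 3 sketch feeds unquote() a standard %xx string and a %uxxxx string. The variable names and sample strings are only illustrative; the standard escapes are decoded while the %uxxxx sequences pass through unchanged.

```python
from urllib.parse import unquote

# Standard percent-encoding (%xx) is decoded as expected.
standard = unquote('%3Cbr%3E')

# The non-standard %uxxxx form is not recognised and is left untouched.
nonstandard = unquote('%u003c%u0062%u0072%u003e')

print(standard)
print(nonstandard)
```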

You could use a regular expression to convert such code points:

import re

try:
    unichr  # only in Python 2
except NameError:
    unichr = chr  # Python 3

# quoted is the %u-encoded input string
re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), quoted)

This decodes both the %uxxxx and the %uxx forms that ECMAScript 3rd ed can decode.

Demo:

>>> import re
>>> quoted = '%u003c%u0062%u0072%u003e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), quoted)
'<br>'
>>> altquoted = '%u3c%u0062%u0072%u3e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), altquoted)
'<br>'
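If the input may also contain ordinary %xx escapes, the same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: js_unescape is a hypothetical name, not a standard library function, and it treats each %xx as a single code point (as JavaScript's unescape() does) rather than as a UTF-8 byte. Surrogate pairs written as two %uxxxx escapes are not recombined here.

```python
import re

# Matches either %uXXXX (four hex digits) or %XX (two hex digits).
_ESCAPE_RE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})')

def js_unescape(text):
    """Hypothetical helper: decode %uXXXX and %XX escapes in text.

    Each escape is replaced by the character with that code point;
    anything that does not match either pattern is left as-is.
    """
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE_RE.sub(replace, text)

print(js_unescape('%u003c%u0062%u0072%u003e'))
print(js_unescape('%3Cbr%3E'))
```

Compiling the pattern once keeps the helper cheap to call repeatedly; a stray '%' that is not followed by valid hex digits is simply left in place.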

But you should avoid using this encoding altogether if possible.
