Question

I am trying to convert this unicode value:

string_value = u'd\xe9cid\xe9'

to

string_value = u'décidé'

I feel like I have tried everything:

decoded_str = string_value.decode('utf-8')

or

string_value = str(string_value)
decoded_str = string_value.encode('latin1').decode('utf-8')

or

string_value = string_value.decode('latin-1')

For this, the result is:

d\xc3\xa9cid\xc3\xa9

I get the same result if I do:

string_value = string_value.encode('utf-8')

I have read: How do I convert 'blah \xe9 blah' to 'blah é blah'

Also: Why does Python print unicode characters when the default encoding is ASCII?

And: How do I convert a unicode to a string at the Python level?

Edit:

My problem is that I need to work with the data. I mean, if I have:

string_value = u'mai 2017 \u2013 Aujourd\u2019hui'

which is:

mai 2017 – Aujourd’hui

I would like to do:

string_list = string_value.split('-')

But the result is:

[u'mai 2017 \u2013 Aujourd\u2019hui']

I would like:

['mai 2017', 'Aujourd’hui']

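New edit:

Thanks to your answers, I understand that I was going in the wrong direction. \xe9 is the correct representation of 'é'; it is not the problem. My real question is: why does json.loads() convert 'mai 2017 – Aujourd’hui' to 'mai 2017 \u2013 Aujourd\u2019hui'?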

Answer 1

I'm not sure what you are asking: \xe9 stands for code point 233 (e9 in hex), which simply is the letter "é":

>>> u'é' == u'\xe9'
True

Your confusion probably stems from the fact that the repr of Python strings (in Python 2) is ASCII, so non-ASCII characters are escaped. If you don't explicitly print, the Python console uses repr to display a value:

>>> print(repr(u'é'))
u'\xe9'

>>> print(repr(u'\xe9'))
u'\xe9'

However, when you print the value, the conversion doesn't happen and everything works as expected:

>>> print(u'é')
é
>>> print(u'\xe9')
é

Also note that in Python 3, repr returns Unicode:

Python 3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(repr(u'\xe9'))
'é'

Update after the question edit:

As has been pointed out in the comments, - is not the same character as \u2013 (just like a and b are separate characters). So you need to split on \u2013, not on -.
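
On the json.loads() part of the question, the following is a minimal sketch rather than anything taken from the original thread. It assumes Python 3, where the built-in ascii() escapes non-ASCII characters much like repr did in Python 2. The variable names are made up for illustration. It suggests that json.loads() returns the real characters whether or not the JSON text used \uXXXX escapes, and that the escaped form is only one way of displaying the same string.

```python
import json

# JSON text may carry the en dash and the apostrophe as \uXXXX escapes ...
escaped_json = '"mai 2017 \\u2013 Aujourd\\u2019hui"'
# ... or as the literal characters; both decode to the same text.
literal_json = '"mai 2017 \u2013 Aujourd\u2019hui"'

from_escaped = json.loads(escaped_json)
from_literal = json.loads(literal_json)

# ascii() shows an escaped view of the string without changing its content
escaped_view = ascii(from_escaped)

print(from_escaped == from_literal)  # True
print(escaped_view)                  # 'mai 2017 \u2013 Aujourd\u2019hui'
```

In other words, nothing was converted: the escapes appear only when the value is displayed through an ASCII-only representation.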

Answer 2

splitting a string with a unicode delimiter?

So...

print string_value.split(u"\u2013")
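
Splitting on the en dash alone leaves the surrounding spaces attached to each piece. A small sketch of one way to reach the list asked for in the question, assuming the separator is always U+2013 and that stripping whitespace is acceptable (the u prefix is accepted by both Python 2.7 and Python 3.3+):

```python
string_value = u'mai 2017 \u2013 Aujourd\u2019hui'

# The hyphen-minus '-' (U+002D) does not occur in the string, so nothing is split
unsplit = string_value.split(u'-')

# Split on the en dash (U+2013), then trim the spaces left around each piece
string_list = [part.strip() for part in string_value.split(u'\u2013')]

print(len(unsplit))      # 1
print(len(string_list))  # 2
```

The two elements of string_list are u'mai 2017' and u'Aujourd\u2019hui'; the second still contains the typographic apostrophe U+2019, which is part of the data rather than an encoding problem.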

Convert unicode \xe9 to é (Python 2.7)
