>>> test
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2
'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> print test
"Hello," he said.
"I am nine years oldâ"
>>> print test2
"Hello," he\u200b said\u200f\u200e.
"I\u200b am\u200b nine years old"
So how do I convert from test2 to test (i.e. so that the Unicode characters are printed)? .decode('utf-8') doesn't do it.
Answer 0 (score: 3)
You can use the unicode-escape encoding to decode '\\u200b' into u'\u200b'.
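For readers on Python 3, where str has no .decode() method, the same idea can be sketched by going through bytes first. This is only a sketch and not part of the original answer: the variable names raw and decoded are made up for illustration, and it assumes the input is pure ASCII, since the unicode_escape codec treats any non-escaped byte as Latin-1.

```python
# Python 3 sketch (assumption: the input string contains only ASCII characters).
# `raw` holds literal backslash-u sequences, like test2 in the question.
raw = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'

# str has no .decode() in Python 3, so encode to bytes first,
# then let the unicode_escape codec interpret the \uXXXX sequences.
decoded = raw.encode('ascii').decode('unicode_escape')

print(repr(decoded))
```

If the input may contain non-ASCII characters, this round trip is not safe as written, because encode('ascii') will raise UnicodeEncodeError.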
>>> test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2 = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> test2.decode('unicode-escape')
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old"'
>>> print test2.decode('unicode-escape')
"Hello," he said.
"I am nine years old"
Note: even then, test2 will not decode to an exact match for test1, because test1 contains u'\xe2' just before the closing quote (").
>>> test1 == test2.decode('unicode-escape')
False
>>> test1.replace(u'\xe2', '') == test2.decode('unicode-escape')
True
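To see where two such strings diverge, a small helper can report the first mismatching position. The sketch below is written for Python 3, and first_difference is a hypothetical helper name introduced here for illustration; it is not from the original answer or from any library.

```python
def first_difference(a, b):
    """Return the index of the first position where a and b differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None

# Python 3 string literals, matching test1 and the decoded test2 from above.
test1 = '"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
decoded = '"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old"'

idx = first_difference(test1, decoded)
print(idx, repr(test1[idx]))  # the character at idx is the stray '\xe2'
```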