Convert ASCII character to normal text

Time: 2013-09-28 15:35:06

Tags: python ascii

I have text like this:

&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217;wroteforumuser Ensorceled.

I know that &#8216; is an ASCII character. How can I convert it to a normal character without resorting to a cumbersome chain of .replace calls?

1 answer:

Answer 0: (score: 3)

You have HTML escapes. Use the HTMLParser.HTMLParser() class to unescape them:

from HTMLParser import HTMLParser

parser = HTMLParser()
unescaped = parser.unescape(escaped)

Demo:

>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> escaped = '&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217;wroteforumuser Ensorceled.'
>>> parser.unescape(escaped)
u'\u2018The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,\u2019wroteforumuser Ensorceled.'
>>> print parser.unescape(escaped)
‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’wroteforumuser Ensorceled.
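
For intuition about what such an escape encodes: &#8216; is a numeric character reference for the decimal code point 8216 (U+2018, LEFT SINGLE QUOTATION MARK), which lies outside the ASCII range. The following is only a rough sketch, written for Python 3, of decoding numeric references by hand. The helper name decode_numeric_refs is made up for illustration, it deliberately ignores named entities such as &amp;, and it does not guard against out-of-range code points, so the library call remains the better choice in practice.

```python
import re

# Matches decimal (&#8216;) and hexadecimal (&#x2018;) character references.
_NUMERIC_REF = re.compile(r'&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));')


def decode_numeric_refs(text):
    """Hypothetical helper: replace numeric character references with their characters."""
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        # Exactly one of the two groups is set, depending on the notation used.
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)

    return _NUMERIC_REF.sub(_replace, text)


print(decode_numeric_refs('&#8216;headache,&#8217;'))
```

Named entities pass through this sketch untouched, which is one reason to prefer the parser-based approach from the answer.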

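In Python 3, the HTMLParser module has been renamed to html.parser; adjust the import accordingly. Note that the parser.unescape() method was deprecated in Python 3.4 and removed in Python 3.9, where the html.unescape() function takes its place: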

from html.parser import HTMLParser
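
On current Python 3 releases the same conversion can be done without instantiating a parser. This is a minimal sketch using the standard-library html.unescape() function, assuming Python 3.4 or later; the variable names are simply carried over from the demo, and the second call is an extra illustration that is not part of the original question.

```python
from html import unescape  # standard library, available since Python 3.4

escaped = ('&#8216;The zoom animations everywhere on the new iOS 7 are literally '
           'making me nauseous and giving me a headache,&#8217;wroteforumuser Ensorceled.')

# Numeric references become the corresponding Unicode characters.
unescaped = unescape(escaped)
print(unescaped)

# Named and hexadecimal references are handled by the same call.
print(unescape('&lt;b&gt; &amp; &#x2018;quoted&#x2019;'))
```

Because str is already Unicode in Python 3, the result prints with the curly quotes directly, with no u'' prefix involved.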