I have text like this:
&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217;wroteforumuser Ensorceled.
I understand that #8216 is a character code. How do I convert it to the normal character without resorting to a cumbersome chain of .replace calls?
Answer 0 (score: 3)
What you have are HTML escapes. Use the HTMLParser.HTMLParser() class to unescape them:
from HTMLParser import HTMLParser
parser = HTMLParser()
unescaped = parser.unescape(escaped)
Demo:
>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> escaped = '&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217;wroteforumuser Ensorceled.'
>>> parser.unescape(escaped)
u'\u2018The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,\u2019wroteforumuser Ensorceled.'
>>> print parser.unescape(escaped)
‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’wroteforumuser Ensorceled.
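In Python 3, the HTMLParser module was renamed to html.parser; adjust the import accordingly (note that HTMLParser.unescape() was deprecated in Python 3.4 and removed in 3.9):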
from html.parser import HTMLParser
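On current Python 3 releases, the standard-library function html.unescape() does the same job without a parser instance. The following is a minimal sketch, assuming Python 3.4 or later and reusing the sample string from the question; the variable names are only illustrative. Note that &#8216; and &#8217; are numeric character references for U+2018 and U+2019, which lie outside ASCII, so the result is a Unicode string containing typographic quotes.

```python
import html

# Sample input from the question, still containing numeric character references.
escaped = ('&#8216;The zoom animations everywhere on the new iOS 7 are '
           'literally making me nauseous and giving me a headache,'
           '&#8217;wroteforumuser Ensorceled.')

# html.unescape() decodes both numeric (&#8216;) and named (&amp;) references.
unescaped = html.unescape(escaped)

print(unescaped)
```

The same call handles any other HTML character references in the text, so no per-character .replace is needed.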