I have a string like the following:
THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you&#39;ve read the first two bestselling collections of SAGA , you&#39;re all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist.
As you can see, the apostrophe (') is represented by its ASCII code:
&#39;
How would you suggest I decode this string?
Other ASCII codes appear as well:
&#34;
&#38;
Answer 0 (score: 0)
These are called HTML entities. The simplest way to decode them is to use HTMLParser from the standard library (Python 2):
>>> s = "THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you&#39;ve read the first two bestselling collections of SAGA , you&#39;re all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist."
>>> import HTMLParser
>>> HTMLParser.HTMLParser().unescape(s)
u"THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you've read the first two bestselling collections of SAGA , you're all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist."
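The snippet above is Python 2. On Python 3 the HTMLParser module was renamed to html.parser, and the usual replacement for unescape() is html.unescape (available since Python 3.4). A minimal sketch, assuming the input uses numeric entities such as &#39; (the variable names and the shortened sample string are only for illustration):

```python
import html

# Shortened excerpt of the string from the question, assumed to contain
# the numeric entity &#39; in place of each apostrophe.
s = "Now that you&#39;ve read the first two bestselling collections of SAGA , you&#39;re all caught up"

# html.unescape converts numeric and named character references to text.
decoded = html.unescape(s)
print(decoded)

# The other codes mentioned in the question decode the same way,
# whether they are written numerically or by name.
others = html.unescape("&#34; &#38; &quot; &amp;")
print(others)
```

Unlike the Python 2 call, this returns an ordinary str, so there is no u prefix in the result.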
See also: