Question

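I need to replace every ASCII symbol other than letters with its HTML number code (http://www.ascii.cl/htmlcodes.htm). Based on this post (Convert HTML entities to Unicode and vice versa), I can use the code below, but I still can't get * (and probably many other characters) to work.

What could the solution be? Is a plain string replace the only option?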

>>> from BeautifulSoup import BeautifulStoneSoup as bs
>>> import cgi
>>> cgi.escape("<*>").encode('ascii', 'xmlcharrefreplace')
'&lt;*&gt;'

Answer 1

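Your question is a bit vague. I'll assume that by "letters" you mean the characters a-z and their uppercase variants. In that case, you can get the result you want with a regular expression: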

>>> import re
>>> f = lambda s: re.sub(r'([^a-zA-Z])', lambda x: '&#{};'.format(ord(x.group(0))), s)
>>> f("<hi>")
'&#60;hi&#62;'
>>> f("<*>")
'&#60;&#42;&#62;'

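Note that, without knowing your particular application, this looks like an odd thing to do. There may be a better way to solve the real underlying problem.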
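As for why the original attempt leaves * alone: cgi.escape only rewrites &, < and > (plus the double quote when asked to), and the 'xmlcharrefreplace' error handler only applies to characters that cannot be encoded in the target codec, so any plain ASCII character such as * passes through unchanged. Below is a minimal sketch of the regex approach as a named function. The name to_numeric_refs is made up for illustration, and the code assumes Python 3, where text is Unicode by default.

```python
import re

# Matches any single character that is not an ASCII letter.
_NON_LETTER = re.compile(r'[^a-zA-Z]')


def to_numeric_refs(text):
    """Replace each non-letter character with its decimal HTML reference.

    Hypothetical helper name, not part of any library.
    """
    return _NON_LETTER.sub(lambda m: '&#{};'.format(ord(m.group(0))), text)


print(to_numeric_refs("<*>"))    # &#60;&#42;&#62;
print(to_numeric_refs("<hi>"))   # &#60;hi&#62;

# 'xmlcharrefreplace' only affects characters outside the target encoding,
# so an ASCII asterisk is left as-is while a non-ASCII letter is replaced.
print("*".encode('ascii', 'xmlcharrefreplace'))       # b'*'
print("\u00e9".encode('ascii', 'xmlcharrefreplace'))  # b'&#233;'
```

Note that with this pattern digits, spaces and non-ASCII letters are replaced too, since the character class only allows a-z and A-Z; widen the class if some of those should be kept.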