Question

I want to convert this:

&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual &amp;amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;

into this:

Charming boutique selling trendy casual dressy apparel for women, some plus sized items, swimwear, shoes jewelry.

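I'm confused about how to remove the special characters, along with some of the letters between them. Can anyone suggest a way to do this?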

Answer 1

Try the following:

import re

string = '&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual &amp;amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;'

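# Remove the escaped opening and closing tags, e.g. &lt;b&gt; and &lt;/b&gt;
string = re.sub(r'&lt;/?[a-z]+&gt;', '', string)
# Turn the doubly escaped ampersand back into a literal &
string = string.replace('&amp;amp;', '&')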

print(string)  # prints 'Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.'

The string you want to change looks like HTML that has been escaped more than once, so my solution only applies to this specific case.

I used a regular expression to replace the escaped tags with an empty string, and replaced the escaped ampersand with a literal &.

I hope this is what you wanted. Let me know if you run into any trouble.
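If the number of escaping rounds is not known in advance, one possible generalization is to unescape repeatedly until the string stops changing and only then strip the tags. The following is a minimal sketch using only the standard library; the function name `strip_escaped_html` and the `max_rounds` limit are hypothetical choices for illustration, and the tag regex only handles simple tags, not arbitrary HTML.

```python
import html
import re


def strip_escaped_html(text, max_rounds=5):
    """Unescape HTML entities until the text is stable, then drop simple tags.

    `max_rounds` is an assumed safety limit, not something from the question.
    """
    for _ in range(max_rounds):
        unescaped = html.unescape(text)
        if unescaped == text:
            break  # nothing left to unescape
        text = unescaped
    # Remove simple opening and closing tags such as <b> or </b>
    return re.sub(r'</?[a-zA-Z][^>]*>', '', text)


s = ('&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual '
     '&amp;amp; dressy apparel for women, some plus sized items, swimwear, '
     'shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;')

print(strip_escaped_html(s))
```

Be aware that unescaping until stable can over-unescape text that was meant to contain a literal entity, so this is only appropriate when the input is known to be over-escaped.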

Answer 2

You can use the html module and BeautifulSoup to get the text without the escaped tags:

from html import unescape

from bs4 import BeautifulSoup

s = "&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual &amp;amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;"

# Unescape once to recover the real tags, then let BeautifulSoup strip them
# (the 'lxml' parser requires the lxml package to be installed)
soup = BeautifulSoup(unescape(s), 'lxml')
print(soup.text)

Prints:

Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.
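Note that the output still contains the ampersands, whereas the target string in the question has none. If the ampersands should be dropped as well, a small post-processing step could look like the sketch below. The helper name `drop_ampersands` is hypothetical, and the sketch assumes every `&` in the cleaned text is unwanted.

```python
import re

# Cleaned text as produced by the approaches in the answers
cleaned = ('Charming boutique selling trendy casual & dressy apparel for women, '
           'some plus sized items, swimwear, shoes & jewelry.')


def drop_ampersands(text):
    # Replace each ampersand and the whitespace around it with one space
    return re.sub(r'\s*&\s*', ' ', text).strip()


print(drop_ampersands(cleaned))
```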

How do I remove special characters from a string in Python 3?
