How do I remove special characters from a string in Python 3?

Posted: 2018-07-30 19:46:09

Tags: python string

I want to convert this:

<b><i><u>Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.</u></i></b>

into this:

Charming boutique selling trendy casual dressy apparel for women, some plus sized items, swimwear, shoes jewelry.

I'm stuck on how to remove the special characters, along with the letters that sit between some of them. Can anyone suggest an approach?

2 answers:

Answer 0 (score: 2)

Try the following:

import re

text = '<b><i><u>Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.</u></i></b>'

# Strip bare opening and closing tags such as <b> and </u>
text = re.sub(r'</?[a-z]+>', '', text)
# Turn the escaped ampersand entity back into a literal &
text = text.replace('&amp;', '&')

print(text)  # prints 'Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.'

The string you want to change looks like HTML that has been escaped more than once, so my solution only handles this particular input.
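If the input really was escaped several times, one option is to call the standard library's `html.unescape` repeatedly until the string stops changing. This is a sketch and is not part of the original answer. The helper name `fully_unescape`, its round limit and the doubly escaped sample are all hypothetical. Be aware that it also decodes any entity that was meant to stay escaped in the final text.

```python
import html


def fully_unescape(text, max_rounds=10):
    """Apply html.unescape repeatedly until the string stops changing.

    max_rounds is an arbitrary safety limit chosen for this sketch.
    """
    for _ in range(max_rounds):
        unescaped = html.unescape(text)
        if unescaped == text:  # nothing left to decode
            break
        text = unescaped
    return text


# Hypothetical input: markup whose '&' characters were escaped twice over
double_escaped = '&amp;lt;b&amp;gt;casual &amp;amp; dressy&amp;lt;/b&amp;gt;'
print(fully_unescape(double_escaped))  # <b>casual & dressy</b>
```

Once the entities are decoded, the tags can be stripped with the regex from this answer.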

I use a regular expression to replace the tags with an empty string, and I replace the escaped ampersand entity with a literal &.
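The pattern `</?[a-z]+>` only matches bare lowercase tags such as `<b>` or `</u>`. The sketch below contrasts it with a somewhat broader pattern that also accepts uppercase names and attributes. That broader pattern and the sample string are assumptions made for illustration and do not come from the answer. Both patterns are still naive: for example, a `>` inside a quoted attribute value breaks the broader one. For arbitrary markup, an HTML parser is the safer choice.

```python
import re

# Pattern from the answer: only bare lowercase tags, no attributes
SIMPLE_TAG = re.compile(r'</?[a-z]+>')
# Broader (still naive) pattern: '<', optional '/', a letter, then
# everything up to the next '>'
ANY_TAG = re.compile(r'</?[a-zA-Z][^>]*>')

# Hypothetical sample with an attribute and uppercase tag names
sample = '<p class="x">Shoes &amp; <B>jewelry</B></p>'

print(SIMPLE_TAG.sub('', sample))  # <p class="x">Shoes &amp; <B>jewelry</B>
print(ANY_TAG.sub('', sample))     # Shoes &amp; jewelry
```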

I hope this is what you were after; let me know if you run into any trouble.

Answer 1 (score: 2)

You can use the standard-library html module to unescape the string and BeautifulSoup to strip the tags, leaving only the text:

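from html import unescape

from bs4 import BeautifulSoup

s = "<b><i><u>Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.</u></i></b>"

# Unescape entities first, then let BeautifulSoup parse the markup and return only its text
# ('lxml' needs the lxml package installed; the built-in 'html.parser' also works)
soup = BeautifulSoup(unescape(s), 'lxml')
print(soup.text)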

Prints:

Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.
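If installing BeautifulSoup is not an option, the standard library's `html.parser.HTMLParser` can do a similar job. The sketch below keeps only the data between tags. With `convert_charrefs=True`, entities such as `&amp;` are decoded before `handle_data` is called. The `TextExtractor` class and `html_to_text` helper are hypothetical names, and the sample assumes the raw text contains `&amp;` entities. The last line also drops the ampersands and collapses the leftover double spaces, which reproduces the exact target shown in the question.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content only; tags are parsed and discarded."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in the text
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return ''.join(parser.parts)


s = ('<b><i><u>Charming boutique selling trendy casual &amp; dressy apparel '
     'for women, some plus sized items, swimwear, shoes &amp; jewelry.</u></i></b>')

text = html_to_text(s)
print(text)
# Optional: drop '&' and collapse the double spaces it leaves behind
print(' '.join(text.replace('&', '').split()))
```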