Question

I have a list like this:

exclude = ["please", "hi", "team"]

And I have a string like this:

text = "Hi team, please help me out."

I want my string to end up looking like this:

text = ", help me out."

effectively removing any word that appears in the list exclude.

I tried the following:

if any(e in text.lower()) for e in exclude:
         print text.lower().strip(e)

But the if statement above only evaluates to a boolean, so I get the following error:

NameError: name 'e' is not defined

How can I get this to work?

Answer 1

Something like this?

>>> from string import punctuation
>>> ' '.join(x for x in (word.strip(punctuation) for word in text.split())
                                                   if x.lower() not in exclude)
'help me out'

If you want to keep the leading/trailing punctuation on the words that are not in exclude:

>>> ' '.join(word for word in text.split()
                             if word.strip(punctuation).lower() not in exclude)
'help me out.'

The first one is equivalent to:

>>> out = []
>>> for word in text.split():
        word = word.strip(punctuation)
        if word.lower() not in exclude:
            out.append(word)
>>> ' '.join(out)
'help me out'

Answer 2

You can use this (keep in mind that it is case-sensitive):

for word in exclude:
    text = text.replace(word, "")
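
One thing to watch with plain str.replace is that it also removes matches inside longer words (the "hi" in "this", for example). A minimal sketch of a case-insensitive, whole-word variant follows; the helper name remove_words is made up for illustration and is not part of the original answer.

```python
import re

def remove_words(text, words):
    # Hypothetical helper: build one alternation that only matches whole words.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    # Remove the matches regardless of case, leaving punctuation in place.
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the runs of whitespace left behind and trim the ends.
    return re.sub(r"\s+", " ", stripped).strip()

exclude = ["please", "hi", "team"]
print(remove_words("Hi team, please help me out.", exclude))  # , help me out.
```

re.escape is there so that an entry containing regex metacharacters is still treated as literal text.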

Answer 3

If you're not worried about punctuation:

>>> import re
>>> text = "Hi team, please help me out."
>>> text = re.findall(r"\w+", text)
>>> text
['Hi', 'team', 'please', 'help', 'me', 'out']
>>> " ".join(x for x in text if x.lower() not in exclude)
'help me out'

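In the code above, re.findall finds all the words and puts them in a list. \w matches word characters (A-Z, a-z, 0-9 and the underscore), and + means one or more occurrences.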
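
If it helps to see what \w+ picks up, here is a quick check on a made-up sample string (the string is only an illustration, it is not from the question):

```python
import re

# Digits and underscores count as word characters; punctuation and spaces do not.
sample = "snake_case v2, ok!"
words = re.findall(r"\w+", sample)
print(words)  # ['snake_case', 'v2', 'ok']
```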

Answer 4

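This replaces everything that is either non-alphanumeric or in the stopword list with a space, then splits the result into the words you want to keep. Finally, the list is joined back into a string with the words separated by spaces. Note: this is case-sensitive.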

' '.join(re.sub(r'\W|' + '|'.join(stopwords), ' ', sentence).split())

使用示例：

>>> import re
>>> stopwords=['please','hi','team']
>>> sentence='hi team, please help me out.'
>>> ' '.join(re.sub(r'\W|' + '|'.join(stopwords), ' ', sentence).split())
'help me out'
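
Because the stopwords are joined into the pattern as bare alternatives, this also matches them inside other words. The sketch below shows that on a made-up sentence and then one possible adjustment using \b and re.IGNORECASE; drop_stopwords is a hypothetical name, not something from the answer above.

```python
import re

stopwords = ["please", "hi", "team"]

# The one-liner as posted: "hi" is also removed from "this" and "high".
naive = " ".join(re.sub(r"\W|" + "|".join(stopwords), " ", "this is high").split())
print(naive)  # t s is gh

def drop_stopwords(sentence, stopwords):
    # Hypothetical helper: same pipeline, but stopwords must match as whole
    # words (\b) and are compared case-insensitively.
    alternatives = "|".join(re.escape(w) for w in stopwords)
    pattern = r"\W|\b(?:" + alternatives + r")\b"
    return " ".join(re.sub(pattern, " ", sentence, flags=re.IGNORECASE).split())

print(drop_stopwords("this is high", stopwords))                  # this is high
print(drop_stopwords("Hi team, please help me out.", stopwords))  # help me out
```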

Answer 5

Using a simple approach:

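import re

exclude = ["please", "hi", "team"]
text = "Hi team, please help me out."

# Upper-case the excluded words once so the comparison is case-insensitive
excluded = {name.upper() for name in exclude}

kept = []
# r"\w+" only yields non-empty words, so no empty-match check is needed
for word in re.findall(r"\w+", text):
    if word.upper() not in excluded:
        kept.append(word)
print(" ".join(kept))  # help me out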

Hope that helps.

Python: how to strip strings from a string based on the items in a list