Question

Goal: remove the items of a list of strings, strings_2_remove, from a series. I have a list of strings, e.g.:

strings_2_remove = [ "dogs are so cool", "cats have cute toe beans" ]

I also have a series of strings that looks like this:

df.Sentences.head()
0    dogs are so cool because they are nice and funny
1    many people love cats because cats have cute toe beans
2    hamsters are very small and furry creatures
3    i got a dog because i know dogs are so cool because they are nice and funny
4    birds are funny when they dance to music, they bop up and down
Name: Summary, dtype: object

After the strings in strings_2_remove have been removed from the strings in df.Sentences, no sentence should contain any of those phrases.

I tried to produce this output myself, but my attempt did not achieve the goal.

Any suggestions?

Answer 1

Try:

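result = df.Sentences
for stringToRemove in strings_2_remove:
    # .str.replace removes substrings; Series.replace with regex=False
    # would only replace cells that are exactly equal to stringToRemove
    result = result.str.replace(stringToRemove, '', regex=False)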
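Whether a replace call touches substrings or only whole cell values depends on which method is used. The sketch below uses a hypothetical two-row Series to contrast `Series.replace` with `regex=False` (whole-cell matches only) and `Series.str.replace` (substring matches inside each cell):

```python
import pandas as pd

# Hypothetical sample: one sentence containing the phrase, one cell equal to it
s = pd.Series([
    "dogs are so cool because they are nice and funny",
    "dogs are so cool",
])

# Series.replace with regex=False compares whole cell values,
# so only the second row (an exact match) changes.
whole_value = s.replace("dogs are so cool", "", regex=False)

# Series.str.replace works on substrings inside each cell.
substring = s.str.replace("dogs are so cool", "", regex=False)

print(whole_value.tolist())
print(substring.tolist())
```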

There are better-performing solutions that use regex. More information here.
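One common regex approach is to fold all phrases into a single pattern so each sentence is scanned once. A minimal sketch on plain Python strings, assuming the phrases should be matched literally; `build_pattern` is a hypothetical helper name, not part of any library:

```python
import re

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]

def build_pattern(phrases):
    # Hypothetical helper: escape each phrase so characters such as "." or "?"
    # are matched literally, then join them into one alternation.
    # Longer phrases go first so a phrase is not pre-empted by a shorter
    # phrase that happens to be its prefix.
    escaped = (re.escape(p) for p in sorted(phrases, key=len, reverse=True))
    return re.compile("|".join(escaped))

pattern = build_pattern(strings_2_remove)

sentences = [
    "dogs are so cool because they are nice and funny",
    "many people love cats because cats have cute toe beans",
]

# One pass over each sentence removes every occurrence of every phrase.
cleaned = [pattern.sub("", s) for s in sentences]
print(cleaned)
```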

Answer 2

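import re

# re.escape keeps characters such as "?" or "." in the phrases from acting as regex syntax
df.Sentences.apply(lambda x: re.sub('|'.join(map(re.escape, strings_2_remove)), '', x))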
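Joining raw phrases with `'|'` turns any regex metacharacters inside them into pattern syntax. The phrases below are hypothetical, chosen only to show the difference between joining the raw strings and joining `re.escape`d strings:

```python
import re

# Hypothetical phrases containing regex metacharacters
phrases = ["are you ok?", "c.t"]
text = "are you ok? the cat sat"

# Unescaped: "?" makes the "k" optional and "." matches any character,
# so "cat" is removed even though the literal phrase "c.t" never appears.
unescaped = re.sub("|".join(phrases), "", text)

# Escaped: each phrase is matched literally.
escaped = re.sub("|".join(map(re.escape, phrases)), "", text)

print(repr(unescaped))
print(repr(escaped))
```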

Answer 3

Use Series.replace:

df.Sentences.replace('|'.join(strings_2_remove), '', regex=True)

0                      because they are nice and funny
1                       many people love cats because 
2          hamsters are very small and furry creatures
3    i got a dog because i know  because they are n...
4    birds are funny when they dance to music, they...
Name: Sentences, dtype: object
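The output above keeps a leading space in row 0, a trailing space in row 1 and a doubled space in row 3. If normalized whitespace is wanted (the question does not say either way), a follow-up cleanup could look like this sketch, which uses `Series.str.replace` and `Series.str.strip` on a three-row sample:

```python
import re
import pandas as pd

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]
s = pd.Series([
    "dogs are so cool because they are nice and funny",
    "many people love cats because cats have cute toe beans",
    "i got a dog because i know dogs are so cool because they are nice and funny",
], name="Sentences")

pattern = "|".join(map(re.escape, strings_2_remove))

cleaned = (
    s.str.replace(pattern, "", regex=True)   # drop the phrases (substring match)
     .str.replace(r"\s+", " ", regex=True)   # collapse runs of whitespace
     .str.strip()                            # trim leading/trailing spaces
)
print(cleaned.tolist())
```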

Answer 4

I created the test DataFrame as:

import pandas as pd

df = pd.DataFrame({ 'Summary':[
    'dogs are so cool because they are nice and funny',
    'many people love cats because cats have cute toe beans',
    'hamsters are very small and furry creatures',
    'i got a dog because i know dogs are so cool because they are nice and funny',
    'birds are funny when they dance to music, they bop up and down']})

The first step is to convert strings_2_remove into a list of patterns (you must import re):

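import re

# re.escape makes each phrase match literally; the name s avoids shadowing the built-in str
pats = [re.compile(re.escape(s) + ' *') for s in strings_2_remove]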

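Note that each pattern is extended with ' *' (zero or more spaces). Without it, the resulting string could contain two adjacent spaces. As far as I can see, the other solutions miss this detail.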
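The effect of the ' *' suffix can be checked with plain `re` on single strings. This sketch compares a bare phrase with the phrase followed by ' *', and also shows that the suffix only consumes spaces *after* the phrase, so a phrase at the end of a sentence still leaves the space before it:

```python
import re

phrase = "dogs are so cool"
text = "i know dogs are so cool because they are nice"

# Without ' *' the spaces on both sides of the phrase survive.
plain = re.sub(re.escape(phrase), "", text)
# With ' *' the spaces after the phrase are removed too.
with_spaces = re.sub(re.escape(phrase) + " *", "", text)

# A phrase at the end of the sentence: the space before it remains.
at_end = re.sub(re.escape("cats have cute toe beans") + " *", "",
                "many people love cats because cats have cute toe beans")

print(repr(plain))
print(repr(with_spaces))
print(repr(at_end))
```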

Then define the function to apply:

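def fn(txt):
    # Apply every pattern in turn, so a sentence that contains
    # several of the phrases has all of them removed
    for pat in pats:
        txt = pat.sub('', txt)
    return txt

For each pattern, it replaces every match in the string with an empty string. A string that matches none of the patterns is returned unchanged.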

All that remains is to apply this function:

df.Summary.apply(fn)
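If a Python-level apply becomes slow on a large frame, a vectorized variant may help. This is a sketch, assuming the same ' *' behaviour described in this answer; the variable name `combined` is hypothetical. It folds all phrases into one compiled pattern and passes it to `Series.str.replace`:

```python
import re
import pandas as pd

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]
df = pd.DataFrame({"Summary": [
    "dogs are so cool because they are nice and funny",
    "many people love cats because cats have cute toe beans",
    "hamsters are very small and furry creatures",
    "i got a dog because i know dogs are so cool because they are nice and funny",
    "birds are funny when they dance to music, they bop up and down",
]})

# One alternation wrapped in a non-capturing group, so the trailing " *"
# applies to every phrase, not just the last alternative.
combined = re.compile(
    "(?:" + "|".join(map(re.escape, strings_2_remove)) + ") *"
)

# A compiled pattern requires regex=True in Series.str.replace.
result = df.Summary.str.replace(combined, "", regex=True)
print(result.tolist())
```

The non-capturing group is the key design choice: without it, `a|b *` would attach the optional spaces only to `b`.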

Remove a list of strings from a series of strings
