Remove a list of strings from a series of strings

Time: 2019-04-17 16:15:55

Tags: python string pandas

Goal: remove the items in the list strings_2_remove from a series. I have a list of strings, for example:

strings_2_remove = [ "dogs are so cool", "cats have cute toe beans" ]

I also have a series of strings that looks like this:

df.Sentences.head()
0    dogs are so cool because they are nice and funny
1    many people love cats because cats have cute toe beans
2    hamsters are very small and furry creatures
3    i got a dog because i know dogs are so cool because they are nice and funny
4    birds are funny when they dance to music, they bop up and down
Name: Summary, dtype: object

After removing the strings in the list from the series, none of the sentences should contain those phrases any longer.

I tried to produce the desired output, but my attempt did not achieve this goal.

Any suggestions?

4 answers:

Answer 0: (score: 1)

Try:

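result = df.Sentences
for stringToRemove in strings_2_remove:
    # Series.str.replace removes the substring inside each value;
    # Series.replace with regex=False would only match whole values.
    result = result.str.replace(stringToRemove, '', regex=False)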

There are better, more performant solutions that use regular expressions.
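To see how the literal, loop-based approach behaves, here is a minimal self-contained sketch. The three-row `sentences` series is an assumption built from the question's sample data. It contrasts `Series.replace` with `regex=False`, which only replaces values that equal the target exactly, with `Series.str.replace`, which works on substrings.

```python
import pandas as pd

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]

# Assumed sample data, taken from the first rows shown in the question
sentences = pd.Series([
    "dogs are so cool because they are nice and funny",
    "many people love cats because cats have cute toe beans",
    "hamsters are very small and furry creatures",
])

# Whole-value matching: no sentence equals the target, so nothing changes
whole = sentences.replace("dogs are so cool", "", regex=False)

# Substring matching: each phrase is cut out of the sentences that contain it
result = sentences
for s in strings_2_remove:
    result = result.str.replace(s, "", regex=False)

print(result.tolist())
```

Note that the leftover leading or trailing spaces are kept; this sketch does not trim them.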

Answer 1: (score: 1)

import re
df.Sentences.apply(lambda x: re.sub('|'.join(strings_2_remove),'',x))

Answer 2: (score: 1)

Use Series.replace:

df.Sentences.replace('|'.join(strings_2_remove), '', regex=True)

0                      because they are nice and funny
1                       many people love cats because 
2          hamsters are very small and furry creatures
3    i got a dog because i know  because they are n...
4    birds are funny when they dance to music, they...
Name: Sentences, dtype: object
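The regex-based answers join the strings with `'|'` as they are, which works for the sample phrases because they contain no regex metacharacters. If the strings to remove could contain characters such as `(`, `)` or `.`, one cautious variant is to escape each string first with `re.escape`. The sketch below uses hypothetical example strings (`to_remove`, `s`) that are not from the question.

```python
import re
import pandas as pd

# Hypothetical strings containing regex metacharacters
to_remove = ["(very) cool", "cute toe beans."]
s = pd.Series([
    "dogs are (very) cool indeed",
    "cats have cute toe beans. yes",
])

# Escape each string so it is matched literally, then build one alternation
pattern = "|".join(re.escape(x) for x in to_remove)

cleaned = s.replace(pattern, "", regex=True)
print(cleaned.tolist())
```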

Answer 3: (score: 0)

I created the following test DataFrame:

df = pd.DataFrame({ 'Summary':[
    'dogs are so cool because they are nice and funny',
    'many people love cats because cats have cute toe beans',
    'hamsters are very small and furry creatures',
    'i got a dog because i know dogs are so cool because they are nice and funny',
    'birds are funny when they dance to music, they bop up and down']})

The first step is to convert strings_2_remove into a list of patterns (you must import re):

# Loop variable renamed from str to s to avoid shadowing the built-in str
pats = [ re.compile(s + ' *') for s in strings_2_remove ]

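Note that each pattern is extended with ' *' (optional trailing spaces). Otherwise the resulting string could contain two adjacent spaces. As far as I can see, the other solutions miss this detail.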
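The effect of the trailing `' *'` can be checked on one of the sample sentences. This is only an illustration using plain `re.sub`; the names `plain` and `padded` are made up for the comparison.

```python
import re

text = "i got a dog because i know dogs are so cool because they are nice and funny"

# Without the trailing ' *', the spaces on both sides of the phrase remain
plain = re.sub("dogs are so cool", "", text)

# With ' *', the space after the phrase is removed together with it
padded = re.sub("dogs are so cool *", "", text)

print(repr(plain))
print(repr(padded))
```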

Then define the function to apply:

def fn(txt):
    for pat in pats:
        if pat.search(txt):
            return pat.sub('', txt)
    return txt

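For each pattern, it searches the source string and, if a match is found, returns the result of replacing the matched text with an empty string. Otherwise, it returns the source string unchanged.

The only thing left to do is to apply this function: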

df.Summary.apply(fn)
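One thing to be aware of: fn returns as soon as the first matching pattern has been substituted, so a sentence containing several of the strings loses only one of them. If every listed string should be removed, a possible variant applies all patterns in turn. In the sketch below, `fn_all` and the `demo` series are hypothetical names and data, and `re.escape` is added as a precaution, so this is not the answer's original code.

```python
import re
import pandas as pd

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]

# Each pattern matches the literal phrase plus any spaces that follow it
pats = [re.compile(re.escape(s) + " *") for s in strings_2_remove]

def fn_all(txt):
    # Apply every pattern, not just the first one that matches
    for pat in pats:
        txt = pat.sub("", txt)
    return txt

# Hypothetical data: the first sentence contains both phrases
demo = pd.Series([
    "dogs are so cool and cats have cute toe beans too",
    "hamsters are very small and furry creatures",
])

out = demo.apply(fn_all)
print(out.tolist())
```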