Goal: remove the items in a list from a series of strings.
I have a list of strings, strings_2_remove, for example:
strings_2_remove = [
    "dogs are so cool",
    "cats have cute toe beans"
]
I also have a series of strings that looks like this:
df.Sentences.head()
0 dogs are so cool because they are nice and funny
1 many people love cats because cats have cute toe beans
2 hamsters are very small and furry creatures
3 i got a dog because i know dogs are so cool because they are nice and funny
4 birds are funny when they dance to music, they bop up and down
Name: Summary, dtype: object
After the strings in strings_2_remove are removed from the series, no sentence should contain any of them.
What I have tried so far did not achieve this.
Any suggestions?
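Answer 0 (score: 1)
Try:
result = df.Sentences
for stringToRemove in strings_2_remove:
    # .str.replace removes the substring inside each value;
    # Series.replace with regex=False only matches whole values
    result = result.str.replace(stringToRemove, '', regex=False)
There are better-performing solutions that use regular expressions. More information here.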
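To illustrate why the choice of method matters in the loop above, here is a minimal sketch comparing Series.replace and Series.str.replace with regex=False. The two-row series is made up for the demonstration and is not the asker's data.

```python
import pandas as pd

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]

# Hypothetical two-row series: one long sentence, one exact match
sentences = pd.Series([
    "dogs are so cool because they are nice and funny",
    "dogs are so cool",
])

# Series.replace with regex=False only replaces values that are
# equal to the search string as a whole.
whole_value = sentences.replace(strings_2_remove[0], "", regex=False)

# Series.str.replace with regex=False replaces the literal substring
# wherever it occurs inside each value.
substring = sentences.str.replace(strings_2_remove[0], "", regex=False)

print(whole_value.tolist())
print(substring.tolist())
```

Only the second form shortens the first sentence, which is the behaviour the question asks for.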
Answer 1 (score: 1)
import re
df.Sentences.apply(lambda x: re.sub('|'.join(strings_2_remove), '', x))
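Joining the strings with '|' treats each one as a regular expression, so a string containing characters such as parentheses or '?' would not be matched literally. A hedged sketch of one way to guard against that with re.escape follows; the extra entry "(so cute)" and the sample sentences are hypothetical and only there to show the effect.

```python
import re

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]

# Hypothetical extra entry containing regex metacharacters
to_remove = strings_2_remove + ["(so cute)"]

# re.escape makes every string match literally inside the alternation
pattern = re.compile("|".join(re.escape(s) for s in to_remove))

cleaned_a = pattern.sub("", "many people love cats because cats have cute toe beans")
cleaned_b = pattern.sub("", "cats (so cute) nap")

print(repr(cleaned_a))
print(repr(cleaned_b))
```

Compiling the pattern once also avoids rebuilding it for every row when it is used inside apply.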
Answer 2 (score: 1)
df.Sentences.replace('|'.join(strings_2_remove), '', regex=True)
0 because they are nice and funny
1 many people love cats because
2 hamsters are very small and furry creatures
3 i got a dog because i know because they are n...
4 birds are funny when they dance to music, they...
Name: Sentences, dtype: object
Answer 3 (score: 0)
I created the test DataFrame as:
df = pd.DataFrame({'Summary': [
    'dogs are so cool because they are nice and funny',
    'many people love cats because cats have cute toe beans',
    'hamsters are very small and furry creatures',
    'i got a dog because i know dogs are so cool because they are nice and funny',
    'birds are funny when they dance to music, they bop up and down']})
The first step is to convert strings_2_remove into a list of compiled patterns (you must import re first):
pats = [re.compile(s + ' *') for s in strings_2_remove]
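Note that each pattern is extended with ' *' (zero or more trailing spaces).
Otherwise, the resulting string may contain two adjacent spaces where the text was removed.
As far as I can see, the other solutions miss this detail.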
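The effect of the trailing ' *' can be checked with a short sketch that uses only the standard re module. The sentence is one of the rows from the test data; the variable names are illustrative.

```python
import re

text = "i got a dog because i know dogs are so cool because they are nice and funny"
target = "dogs are so cool"

# Without the trailing ' *', the spaces on both sides of the match survive
plain = re.sub(target, "", text)

# With ' *', the spaces after the match are consumed together with it
padded = re.sub(target + " *", "", text)

print(repr(plain))
print(repr(padded))
```

When the removed text sits at the very end of a sentence, the space before it still remains, so a final strip may be wanted as well.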
Then define the function to apply:
def fn(txt):
    for pat in pats:
        if pat.search(txt):
            return pat.sub('', txt)
    return txt
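For each pattern, the function searches the source string. As soon as one pattern matches, it returns the result of replacing every match of that pattern with an empty string, so only the first matching pattern is removed. If no pattern matches, it returns the source string unchanged.
The only thing left to do is to apply this function: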
df.Summary.apply(fn)
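If a sentence can contain more than one of the strings, a variant that applies every pattern in turn may be closer to the goal. The following is a sketch under that assumption, not the answer's original function; remove_all is a hypothetical name and the second row is invented to contain both strings.

```python
import re
import pandas as pd

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]

df = pd.DataFrame({"Summary": [
    "dogs are so cool because they are nice and funny",
    # Hypothetical row containing both strings
    "dogs are so cool and cats have cute toe beans too",
]})

# Escape each string so it matches literally, then allow trailing spaces
pats = [re.compile(re.escape(s) + " *") for s in strings_2_remove]

def remove_all(txt):
    # Apply every pattern instead of returning after the first match
    for pat in pats:
        txt = pat.sub("", txt)
    return txt

result = df.Summary.apply(remove_all)
print(result.tolist())
```

Calling sub directly also makes the separate search step unnecessary, since sub returns the string unchanged when nothing matches.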