I need to remove a list of strings:
list_strings=['describe','include','any']
from a pandas column:
My_Column
include details about your goal
describe expected and actual results
show some code anywhere
I tried:
df['My_Column'] = df['My_Column'].str.replace('|'.join(list_strings), '', regex=True)
but it also removes parts of longer words.
For example:
My_Column
details about your goal
expected and actual results
show some code where # here it should be anywhere
My expected output:
My_Column
details about your goal
expected and actual results
show some code anywhere
Answer 0 (score: 2)
Use the "word boundary" expression \b, like this:
In [46]: df.My_Column.str.replace(r'\b(?:{})\b'.format('|'.join(list_strings)), '', regex=True)
Out[46]:
0 details about your goal
1 expected and actual results
2 show some code anywhere
Name: My_Column, dtype: object
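A note on the pattern above: in a regex, `|` has the lowest precedence, so `\bdescribe|include|any\b` means "`\bdescribe` OR `include` OR `any\b`", and the boundaries only protect the first and last words. Wrapping the alternatives in a non-capturing group `(?:...)` applies `\b` to every word. The sketch below shows the difference with Python's standard `re` module; the sample sentence is made up for illustration, and `re.escape` is an extra safeguard in case a word ever contains regex metacharacters.

```python
import re

list_strings = ['describe', 'include', 'any']

# Ungrouped: parsed as \bdescribe | include | any\b
ungrouped = r'\b{}\b'.format('|'.join(list_strings))

# Grouped: \b applies on both sides of every alternative
grouped = r'\b(?:{})\b'.format('|'.join(map(re.escape, list_strings)))

text = 'describe the included any anywhere'
print(repr(re.sub(ungrouped, '', text)))  # ' the d  anywhere'  ('include' cut out of 'included')
print(repr(re.sub(grouped, '', text)))    # ' the included  anywhere'
```

In pandas the same grouped pattern goes into `df['My_Column'].str.replace(grouped, '', regex=True)`; since pandas 2.0 `regex` defaults to `False`, so it has to be passed explicitly.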
Answer 1 (score: 0)
The first argument of the .str.replace() method must be a string or a compiled regular expression, not a list.
You probably want:
list_strings = ['Describe', 'Include', 'any']  # Note capital D and capital I
for pattern in [rf"\b{s}\b" for s in list_strings]:  # surround each word with word boundaries (\b)
    df['My_Column'] = df['My_Column'].str.replace(pattern, '', regex=True)
to get:
My_Column
0 details about your goal
1 expected and actual results
2 Show some code anywhere
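Instead of matching the list's capitalisation to the data by hand, one option is to let the regex ignore case. This is a sketch assuming pandas ≥ 1.x, where `Series.str.replace` accepts `regex` and `flags`; it builds one grouped pattern instead of looping, and the whitespace cleanup at the end is an optional extra step not in the original answer.

```python
import re
import pandas as pd

df = pd.DataFrame({'My_Column': ['Include details about your goal',
                                 'Describe expected and actual results',
                                 'Show some code anywhere']})
list_strings = ['describe', 'include', 'any']

# One pattern for all words; \b around the group so only whole words match
pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, list_strings)))

df['My_Column'] = (df['My_Column']
                   # re.IGNORECASE matches 'Include' and 'include' alike
                   .str.replace(pattern, '', regex=True, flags=re.IGNORECASE)
                   # collapse the leading/double spaces left by removed words
                   .str.replace(r'\s+', ' ', regex=True)
                   .str.strip())
print(df)
```

A single alternation pattern scans each string once, whereas the loop above runs one `str.replace` pass per word.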
Answer 2 (score: 0)
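Your problem is that pandas doesn't see words; it only sees a sequence of characters. So when you ask pandas to remove "any", it doesn't first split the text into words, and it removes "any" from "anywhere" too. One option is to split the text into words yourself, perhaps like this: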
import pandas as pd

# Your data
df = pd.DataFrame({'My_Column':
                   ['Include details about your goal',
                    'Describe expected and actual results',
                    'Show some code anywhere']})
list_strings = ['describe', 'include', 'any']  # make sure it's lower case

def remove_words(s):
    # Split on whitespace and keep only the words not in list_strings (case-insensitive)
    if s is not None:
        return ' '.join(x for x in s.split() if x.lower() not in list_strings)

# Apply the function to your column
df.My_Column = df.My_Column.map(remove_words)
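One caveat with the function above: missing values in a pandas column are often `NaN` (a float), not `None`, and `float` has no `.split()`. A slightly more defensive variant might look like the sketch below; the names `STOP_WORDS` and `remove_words_safe` are hypothetical, and using a set is just a choice for constant-time lookups.

```python
# Hypothetical names for illustration: STOP_WORDS, remove_words_safe
STOP_WORDS = {w.lower() for w in ['describe', 'include', 'any']}

def remove_words_safe(s):
    """Drop whole words found in STOP_WORDS (case-insensitive); pass non-strings through."""
    if not isinstance(s, str):
        # e.g. None or NaN in a pandas column: return it unchanged
        return s
    return ' '.join(w for w in s.split() if w.lower() not in STOP_WORDS)

print(remove_words_safe('Include details about your goal'))  # details about your goal
print(remove_words_safe('Show some code anywhere'))          # Show some code anywhere
```

It is applied the same way, `df.My_Column = df.My_Column.map(remove_words_safe)`. Because it splits on whitespace, a word followed by punctuation (for example `any,`) is not removed, and runs of spaces collapse to single spaces.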