Remove a list of strings from a pandas column

Time: 2020-10-11 21:59:49

Tags: python pandas

I need to remove a list of strings:

list_strings=['describe','include','any']

from a pandas column:

My_Column

include details about your goal
describe expected and actual results
show some code anywhere

I tried

df['My_Column']=df['My_Column'].str.replace('|'.join(list_strings), '')

but it also removes parts of words.

For example:

My_Column

details about your goal
expected and actual results
show some code where # here it should be anywhere

My expected output:

My_Column

details about your goal
expected and actual results
show some code anywhere 

3 Answers:

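Answer 0: (score: 2)

Use the word-boundary expression \b, and wrap the alternation in a non-capturing group (?:...) so that the boundaries apply to every word rather than only the first and last. Pass regex=True explicitly, since pandas 2.0 and later treat the pattern as a literal string by default:

In [46]: df.My_Column.str.replace(r'\b(?:{})\b'.format('|'.join(list_strings)), '', regex=True)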
Out[46]: 
0         details about your goal
1     expected and actual results
2         show some code anywhere
Name: My_Column, dtype: object
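
As a minimal sketch of the same word-boundary idea, the snippet below rebuilds the question's data, escapes each word with re.escape, and tidies the leftover whitespace. The names pattern, ungrouped and cleaned are illustrative only, and the whitespace cleanup is an extra step that the answer above does not perform.

```python
import re

import pandas as pd

df = pd.DataFrame({"My_Column": [
    "include details about your goal",
    "describe expected and actual results",
    "show some code anywhere",
]})
list_strings = ["describe", "include", "any"]

# Without a group, \b binds only to the first and last alternatives,
# so "include" and "any" can still match inside longer words.
ungrouped = r"\b{}\b".format("|".join(list_strings))
print(re.sub(ungrouped, "", "many includes"))  # prints "m s"

# Escape each word so regex metacharacters are taken literally, and wrap
# the alternation in a non-capturing group so \b applies to every word.
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, list_strings)))

cleaned = (
    df["My_Column"]
    .str.replace(pattern, "", regex=True)   # remove whole words only
    .str.replace(r"\s+", " ", regex=True)   # collapse repeated whitespace
    .str.strip()                            # drop leading/trailing spaces
)
print(cleaned.tolist())
```

With the grouped pattern, "anywhere" is left intact because there is no word boundary between "any" and "where".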

Answer 1: (score: 0)

The first argument of the .str.replace() method must be a string or a compiled regular expression, not a list.

You probably want

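list_strings = ['Describe', 'Include', 'any']          # Note capital D and capital I

for pattern in [rf"\b{s}\b" for s in list_strings]:    # each word surrounded by word boundaries (\b)
    # regex=True is needed in pandas >= 2.0, where the default is a literal match
    df['My_Column'] = df['My_Column'].str.replace(pattern, '', regex=True)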

to get

                     My_Column
0      details about your goal
1  expected and actual results
2      Show some code anywhere
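
If changing the capitalization of the list is not desirable, one possible alternative is to match case-insensitively. The sketch below assumes the capitalized sample data used in this answer and passes flags=re.IGNORECASE to .str.replace(). The trailing .str.strip() is an added cleanup step, not part of the loop above.

```python
import re

import pandas as pd

df = pd.DataFrame({"My_Column": [
    "Include details about your goal",
    "Describe expected and actual results",
    "Show some code anywhere",
]})
list_strings = ["describe", "include", "any"]  # lower case is fine here

# One pattern for all words; the group makes \b apply to each alternative.
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, list_strings)))

df["My_Column"] = (
    df["My_Column"]
    .str.replace(pattern, "", flags=re.IGNORECASE, regex=True)
    .str.strip()  # remove the space left behind at the start of a row
)
print(df)
```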

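Answer 2: (score: 0)

Your problem is that pandas does not see words, only a sequence of characters. So when you ask pandas to remove "any", it does not first split the text into words. One option is to do that yourself, perhaps like this: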

import pandas as pd

# Your data
df = pd.DataFrame({'My_Column':
    ['Include details about your goal',
     'Describe expected and actual results',
     'Show some code anywhere']})

list_strings = ['describe', 'include', 'any']  # make sure it's lower case

def remove_words(s):
    # Pass missing values (None/NaN) through unchanged
    if not isinstance(s, str):
        return s
    # Keep only the whole words that are not in list_strings (case-insensitive)
    return ' '.join(x for x in s.split() if x.lower() not in list_strings)

# Apply the function to your column
df.My_Column = df.My_Column.map(remove_words)
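
To make the behavior of this split-and-filter approach concrete, here is a small self-contained sketch. The names stop_words and drop_words are hypothetical, and the extra None row is added only to illustrate missing values. It also shows one limitation: a token with attached punctuation, such as "Any,", is not treated as the word "any".

```python
import pandas as pd

df = pd.DataFrame({"My_Column": [
    "Include details about your goal",
    "Describe expected and actual results",
    "Show some code anywhere",
    None,  # a missing value, added for illustration
]})

# Hypothetical name; a set gives constant-time membership tests.
stop_words = {"describe", "include", "any"}

def drop_words(text):
    # Leave missing values (None/NaN) untouched.
    if not isinstance(text, str):
        return text
    # Split on whitespace and keep tokens whose lower-cased form is not listed.
    return " ".join(w for w in text.split() if w.lower() not in stop_words)

result = df["My_Column"].map(drop_words)
print(result.tolist())

# Punctuation stays attached to the token, so "Any," is kept as-is.
print(drop_words("Any, really"))
```

Because the text is rejoined with single spaces, this approach also normalizes runs of whitespace, which may or may not be wanted.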