Question

I need to remove a list of strings:

list_strings=['describe','include','any']

from a pandas column:

My_Column

include details about your goal
describe expected and actual results
show some code anywhere

I tried

df['My_Column']=df['My_Column'].str.replace('|'.join(list_strings), '')

but it also removes parts of words.

For example:

My_Column

details about your goal
expected and actual results
show some code where # here it should be anywhere

My expected output:

My_Column

details about your goal
expected and actual results
show some code anywhere

Answer 1

Use the "word boundary" expression \b, like this:

In [46]: df.My_Column.str.replace(r'\b(?:{})\b'.format('|'.join(list_strings)), '', regex=True)
Out[46]: 
0         details about your goal
1     expected and actual results
2         show some code anywhere
Name: My_Column, dtype: object
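
A note on building the pattern: `|` binds more loosely than `\b`, so the alternation has to sit inside a group for the word boundaries to apply to every word in the list. Here is a minimal sketch using only the standard `re` module; `text`, `ungrouped` and `grouped` are made-up names for illustration, and `re.escape` is added on the assumption that the words might one day contain regex metacharacters.

```python
import re

list_strings = ['describe', 'include', 'any']

# Ungrouped: parsed as r'\bdescribe' OR r'include' OR r'any\b',
# so only the first and last alternatives get a boundary at all.
ungrouped = r'\b{}\b'.format('|'.join(list_strings))

# Grouped: \b applies on both sides of every alternative.
# re.escape keeps any regex metacharacters in the words literal.
grouped = r'\b(?:{})\b'.format('|'.join(map(re.escape, list_strings)))

text = 'many files included'  # illustrative string, not from the question
print(re.sub(ungrouped, '', text))  # m files d
print(re.sub(grouped, '', text))    # many files included
```

The same grouped pattern can be passed to `.str.replace(..., regex=True)`; in pandas 2.0 and later the pattern is treated as a literal string unless `regex=True` is given.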

Answer 2

The first argument of the .str.replace() method must be a string or a compiled regex; not a list.

You probably want

list_strings=['Describe','Include','any']            # Note capital D and capital I

for s in [f"\\b{s}\\b" for s in list_strings]:       # surrounded word boundaries (\b) 
    df['My_Column'] = df['My_Column'].str.replace(s, '', regex=True)

to get

                     My_Column
0      details about your goal
1  expected and actual results
2      Show some code anywhere
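
If you would rather not loop, or edit the capitalisation of the list by hand, the same idea can probably be done in one pass. This is only a sketch: it assumes the capitalised sample data shown in the output above, `pattern` is a name introduced here, and the two extra `.str` calls are an optional clean-up of the spaces left behind.

```python
import re
import pandas as pd

df = pd.DataFrame({'My_Column': [
    'Include details about your goal',
    'Describe expected and actual results',
    'Show some code anywhere',
]})
list_strings = ['describe', 'include', 'any']

# One pattern for all words; the group makes \b apply to each alternative
pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, list_strings)))

df['My_Column'] = (
    df['My_Column']
    .str.replace(pattern, '', flags=re.IGNORECASE, regex=True)  # match any casing
    .str.replace(r'\s+', ' ', regex=True)  # collapse runs of whitespace
    .str.strip()                           # drop leading/trailing spaces
)
print(df['My_Column'].tolist())
# ['details about your goal', 'expected and actual results', 'Show some code anywhere']
```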

Answer 3

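Your problem is that pandas doesn't see words; it only sees a sequence of characters. So when you ask pandas to remove "any", it doesn't start by delineating the words. One option is to do that yourself, perhaps like this: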

import pandas as pd

# Your data
df = pd.DataFrame({'My_Column':
['Include details about your goal',
'Describe expected and actual results',
'Show some code anywhere']})

list_strings=['describe','include','any'] # make sure it's lower case

def remove_words(s):
    if s is not None:
        return ' '.join(x for x in s.split() if x.lower() not in list_strings)

# Apply the function to your column
df.My_Column = df.My_Column.map(remove_words)
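
One thing to watch for: missing values in a pandas column usually show up as NaN (a float), which passes the `is not None` check and then fails on `.split()`. Below is a hedged variant that leaves anything that isn't a string untouched. `STOP_WORDS` is a hypothetical name, and using a set instead of a list is just a lookup convenience.

```python
# Hypothetical name; lower-case words to drop
STOP_WORDS = {'describe', 'include', 'any'}

def remove_words(s):
    # Leave non-strings (None, NaN, numbers) as they are instead of raising
    if not isinstance(s, str):
        return s
    # Compare whole whitespace-separated words, ignoring case
    return ' '.join(w for w in s.split() if w.lower() not in STOP_WORDS)

print(remove_words('Include details about your goal'))  # details about your goal
print(remove_words('Show some code anywhere'))          # Show some code anywhere
print(remove_words(None))                               # None
```

It can be applied the same way, with `df.My_Column.map(remove_words)`. Note that splitting on whitespace means a word followed by punctuation, such as "any,", would not be removed.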

Remove a list of strings from a pandas column
